Age | Commit message | Author
2019-06-17Linux 4.9.182v4.9.182Greg Kroah-Hartman
2019-06-17tcp: enforce tcp_min_snd_mss in tcp_mtu_probing()Eric Dumazet
commit 967c05aee439e6e5d7d805e195b3a20ef5c433d6 upstream. If mtu probing is enabled tcp_mtu_probing() could very well end up with a too small MSS. Use the new sysctl tcp_min_snd_mss to make sure MSS search is performed in an acceptable range. CVE-2019-11479 -- tcp mss hardcoded to 48 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jonathan Lemon <jonathan.lemon@gmail.com> Cc: Jonathan Looney <jtl@netflix.com> Acked-by: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: Bruce Curtis <brucec@netflix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-17tcp: add tcp_min_snd_mss sysctlEric Dumazet
commit 5f3e2bf008c2221478101ee72f5cb4654b9fc363 upstream. Some TCP peers announce a very small MSS option in their SYN and/or SYN/ACK messages. This forces the stack to send packets with a very high network/cpu overhead. Linux has enforced a minimal value of 48. Since this value includes the size of TCP options, and the options can consume up to 40 bytes, each segment can carry as little as 8 bytes of payload. In some cases, it can be useful to increase the minimal value to a saner value. We still leave the default at 48 (TCP_MIN_SND_MSS), for compatibility reasons. Note that the TCP_MAXSEG socket option enforces a minimal value of TCP_MIN_MSS. David Miller increased this minimal value in commit c39508d6f118 ("tcp: Make TCP_MAXSEG minimum more correct.") from 64 to 88. We might in the future merge TCP_MIN_SND_MSS and TCP_MIN_MSS. CVE-2019-11479 -- tcp mss hardcoded to 48 Signed-off-by: Eric Dumazet <edumazet@google.com> Suggested-by: Jonathan Looney <jtl@netflix.com> Acked-by: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: Bruce Curtis <brucec@netflix.com> Cc: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
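The overhead described in this commit message can be checked with a little arithmetic. The following is a minimal user-space sketch, not kernel code: TCP_MIN_SND_MSS and its value of 48 come from the message above, while MAX_TCP_OPTION_BYTES, payload_per_segment() and segments_needed() are hypothetical names introduced only for illustration.

```c
#include <assert.h>

/* Value taken from the commit message. */
#define TCP_MIN_SND_MSS       48  /* smallest MSS Linux accepts by default */
/* Hypothetical name: the message says options can consume up to 40 bytes. */
#define MAX_TCP_OPTION_BYTES  40

/* Payload bytes left in one segment once the options are subtracted. */
static int payload_per_segment(int mss, int option_bytes)
{
    int payload = mss - option_bytes;
    return payload > 0 ? payload : 0;
}

/* Number of segments needed to carry `bytes` of data, rounded up. */
static long segments_needed(long bytes, int mss, int option_bytes)
{
    int payload = payload_per_segment(mss, option_bytes);
    assert(payload > 0);   /* a zero payload could never make progress */
    return (bytes + payload - 1) / payload;
}
```

With the default minimum, payload_per_segment(TCP_MIN_SND_MSS, MAX_TCP_OPTION_BYTES) yields 8 bytes, so 64KB of data would need thousands of segments, whereas a larger minimum such as 536 would need far fewer.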
2019-06-17tcp: tcp_fragment() should apply sane memory limitsEric Dumazet
commit f070ef2ac66716357066b683fb0baf55f8191a2e upstream. Jonathan Looney reported that a malicious peer can force a sender to fragment its retransmit queue into tiny skbs, inflating memory usage and/or overflowing 32-bit counters. TCP allows an application to queue up to sk_sndbuf bytes, so we need to give some allowance for non-malicious splitting of the retransmit queue. A new SNMP counter is added to monitor how many times TCP refused to split an skb because the allowance was exceeded. Note that this counter might increase if applications use the SO_SNDBUF socket option to lower sk_sndbuf. CVE-2019-11478 : tcp_fragment, prevent fragmenting a packet when the socket is already using more than half the allowed space Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jonathan Looney <jtl@netflix.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Tyler Hicks <tyhicks@canonical.com> Cc: Bruce Curtis <brucec@netflix.com> Cc: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-17tcp: limit payload size of sacked skbsEric Dumazet
commit 3b4929f65b0d8249f19a50245cd88ed1a2f78cff upstream. Jonathan Looney reported that TCP can trigger the following crash in tcp_shifted_skb() : BUG_ON(tcp_skb_pcount(skb) < pcount); This can happen if the remote peer has advertised the smallest MSS that Linux TCP accepts : 48 An skb can hold 17 fragments, and each fragment can hold 32KB on x86, or 64KB on PowerPC. This means that the 16-bit width of TCP_SKB_CB(skb)->tcp_gso_segs can overflow. Note that tcp_sendmsg() builds skbs with less than 64KB of payload, so this problem needs SACK to be enabled. SACK blocks allow TCP to coalesce multiple skbs in the retransmit queue, thus filling the 17 fragments to maximal capacity. CVE-2019-11477 -- u16 overflow of TCP_SKB_CB(skb)->tcp_gso_segs Backport notes, provided by Joao Martins <joao.m.martins@oracle.com> v4.15, or any kernel since commit 737ff314563 ("tcp: use sequence distance to detect reordering"), has switched from packet-based FACK tracking to sequence-based tracking. v4.14 and older still have the old logic, and hence tcp_skb_shift_data() needs to retain its original logic and keep @fack_count in sync. In other words, we keep the increment of pcount with tcp_skb_pcount(skb) so that it can later be used to update fack_count. To make it more explicit we track the new skb that gets incremented to pcount in @next_pcount, and we get to avoid the constant invocation of tcp_skb_pcount(skb) altogether. Fixes: 832d11c5cd07 ("tcp: Try to restore large SKBs while SACK processing") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jonathan Looney <jtl@netflix.com> Acked-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Tyler Hicks <tyhicks@canonical.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Bruce Curtis <brucec@netflix.com> Cc: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
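The overflow condition above can be reproduced with plain arithmetic. This is a minimal user-space sketch, assuming the worst case of 8 payload bytes per segment (an MSS of 48 with 40 bytes of TCP options); the 17 fragments and 32KB per fragment come from the commit message, while the macro and function names are hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_FRAGS       17           /* fragments per skb, per the commit message */
#define FRAG_BYTES_X86  (32 * 1024)  /* bytes per fragment on x86 */
#define MIN_PAYLOAD     8            /* assumed: MSS 48 minus 40 option bytes */

/* Segment count for an skb whose fragments are all filled to capacity. */
static uint32_t segs_for_full_skb(uint32_t frags, uint32_t frag_bytes,
                                  uint32_t payload)
{
    return (frags * frag_bytes + payload - 1) / payload;
}

/* What a 16-bit counter would end up storing for that segment count. */
static uint16_t as_u16(uint32_t segs)
{
    return (uint16_t)segs;   /* truncates modulo 65536 */
}
```

Under these assumptions a full skb needs more segments than a 16-bit field can represent, so the stored count wraps to a much smaller number, which is the kind of inconsistency the BUG_ON() would detect.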
2019-06-17tcp: reduce tcp_fastretrans_alert() verbosityEric Dumazet
commit 8ba6ddaaf86c4c6814774e4e4ef158b732bd9f9f upstream. With upcoming rb-tree implementation, the checks will trigger more often, and this is expected. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Amit Shah <amit@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11Linux 4.9.181v4.9.181Greg Kroah-Hartman
2019-06-11ethtool: check the return value of get_regs_lenYunsheng Lin
commit f9fc54d313fab2834f44f516459cdc8ac91d797f upstream. The return type for get_regs_len in struct ethtool_ops is int, the hns3 driver may return error when failing to get the regs len by sending cmd to firmware. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11ipv4: Define __ipv4_neigh_lookup_noref when CONFIG_INET is disabledDavid Ahern
commit 9b3040a6aafd7898ece7fc7efcbca71e42aa8069 upstream. Define __ipv4_neigh_lookup_noref to return NULL when CONFIG_INET is disabled. Fixes: 4b2a2bfeb3f0 ("neighbor: Call __ipv4_neigh_lookup_noref in neigh_xmit") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11fuse: Add FOPEN_STREAM to use stream_open()Kirill Smelkov
commit bbd84f33652f852ce5992d65db4d020aba21f882 upstream. Starting from commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX") files opened even via nonseekable_open gate read and write via lock and do not allow them to be run simultaneously. This can create read vs write deadlock if a filesystem is trying to implement a socket-like file which is intended to be simultaneously used for both read and write from filesystem client. See commit 10dce8af3422 ("fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock") for details and e.g. commit 581d21a2d02a ("xenbus: fix deadlock on writes to /proc/xen/xenbus") for a similar deadlock example on /proc/xen/xenbus. To avoid such deadlock it was tempting to adjust fuse_finish_open to use stream_open instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE, and in particular GVFS which actually uses offset in its read and write handlers https://codesearch.debian.net/search?q=-%3Enonseekable+%3D https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080 https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346 https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481 so if we would do such a change it will break a real user. Add another flag (FOPEN_STREAM) for filesystem servers to indicate that the opened handler is having stream-like semantics; does not use file position and thus the kernel is free to issue simultaneous read and write request on opened file handle. This patch together with stream_open() should be added to stable kernels starting from v3.14+. This will allow to patch OSSPD and other FUSE filesystems that provide stream-like files to return FOPEN_STREAM | FOPEN_NONSEEKABLE in open handler and this way avoid the deadlock on all kernel versions. 
This should work because fuse_finish_open ignores unknown open flags returned from a filesystem and so passing FOPEN_STREAM to a kernel that is not aware of this flag cannot hurt. In turn the kernel that is not aware of FOPEN_STREAM will be < v3.14 where just FOPEN_NONSEEKABLE is sufficient to implement streams without read vs write deadlock. Cc: stable@vger.kernel.org # v3.14+ Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlockKirill Smelkov
commit 10dce8af34226d90fa56746a934f8da5dcdba3df upstream. Commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX") added locking for file.f_pos access and in particular made concurrent read and write not possible - now both those functions take f_pos lock for the whole run, and so if e.g. a read is blocked waiting for data, write will deadlock waiting for that read to complete. This caused a regression for stream-like files where previously read and write could run simultaneously, but after that patch could not do so anymore. See e.g. commit 581d21a2d02a ("xenbus: fix deadlock on writes to /proc/xen/xenbus") which fixes such a regression for the particular case of /proc/xen/xenbus. The patch that added f_pos lock in 2014 did so to guarantee POSIX thread safety for read/write/lseek and added the locking to file descriptors of all regular files. In 2014 that thread-safety problem was not new as it was already discussed earlier in 2006. However even though the 2006 version of Linus's patch was adding f_pos locking "only for files that are marked seekable with FMODE_LSEEK (thus avoiding the stream-like objects like pipes and sockets)", the 2014 version - the one that actually made it into the tree as 9c225f2655e3 - is doing so regardless of whether a file is seekable or not. See https://lore.kernel.org/lkml/53022DB1.4070805@gmail.com/ https://lwn.net/Articles/180387 https://lwn.net/Articles/180396 for historic context. The reason that it did so is, probably, that there are many files that are marked non-seekable, but e.g. their read implementation actually depends on knowing current position to correctly handle the read. Some examples: kernel/power/user.c snapshot_read fs/debugfs/file.c u32_array_read fs/fuse/control.c fuse_conn_waiting_read + ... drivers/hwmon/asus_atk0110.c atk_debugfs_ggrp_read arch/s390/hypfs/inode.c hypfs_read_iter ... 
Despite that, many nonseekable_open users implement read and write with pure stream semantics - they don't depend on passed ppos at all. And for those cases where read could wait for something inside, it creates a situation similar to xenbus - the write could be never made to go until read is done, and read is waiting for some, potentially external, event, for potentially unbounded time -> deadlock. Besides xenbus, there are 14 such places in the kernel that I've found with semantic patch (see below): drivers/xen/evtchn.c:667:8-24: ERROR: evtchn_fops: .read() can deadlock .write() drivers/isdn/capi/capi.c:963:8-24: ERROR: capi_fops: .read() can deadlock .write() drivers/input/evdev.c:527:1-17: ERROR: evdev_fops: .read() can deadlock .write() drivers/char/pcmcia/cm4000_cs.c:1685:7-23: ERROR: cm4000_fops: .read() can deadlock .write() net/rfkill/core.c:1146:8-24: ERROR: rfkill_fops: .read() can deadlock .write() drivers/s390/char/fs3270.c:488:1-17: ERROR: fs3270_fops: .read() can deadlock .write() drivers/usb/misc/ldusb.c:310:1-17: ERROR: ld_usb_fops: .read() can deadlock .write() drivers/hid/uhid.c:635:1-17: ERROR: uhid_fops: .read() can deadlock .write() net/batman-adv/icmp_socket.c:80:1-17: ERROR: batadv_fops: .read() can deadlock .write() drivers/media/rc/lirc_dev.c:198:1-17: ERROR: lirc_fops: .read() can deadlock .write() drivers/leds/uleds.c:77:1-17: ERROR: uleds_fops: .read() can deadlock .write() drivers/input/misc/uinput.c:400:1-17: ERROR: uinput_fops: .read() can deadlock .write() drivers/infiniband/core/user_mad.c:985:7-23: ERROR: umad_fops: .read() can deadlock .write() drivers/gnss/core.c:45:1-17: ERROR: gnss_fops: .read() can deadlock .write() In addition to the cases above another regression caused by f_pos locking is that now FUSE filesystems that implement open with FOPEN_NONSEEKABLE flag, can no longer implement bidirectional stream-like files - for the same reason as above e.g. read can deadlock write locking on file.f_pos in the kernel. 
FUSE's FOPEN_NONSEEKABLE was added in 2008 in a7c1b990f715 ("fuse: implement nonseekable open") to support OSSPD. OSSPD implements /dev/dsp in userspace with FOPEN_NONSEEKABLE flag, with corresponding read and write routines not depending on current position at all, and with both read and write being potentially blocking operations: See https://github.com/libfuse/osspd https://lwn.net/Articles/308445 https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1406 https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1438-L1477 https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1479-L1510 Corresponding libfuse example/test also describes FOPEN_NONSEEKABLE as "somewhat pipe-like files ..." with read handler not using offset. However that test implements only read without write and cannot exercise the deadlock scenario: https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L124-L131 https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L146-L163 https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L209-L216 I've actually hit the read vs write deadlock for real while implementing my FUSE filesystem where there is /head/watch file, for which open creates separate bidirectional socket-like stream in between filesystem and its user with both read and write being later performed simultaneously. And there it is semantically not easy to split the stream into two separate read-only and write-only channels: https://lab.nexedi.com/kirr/wendelin.core/blob/f13aa600/wcfs/wcfs.go#L88-169 Let's fix this regression. The plan is: 1. We can't change nonseekable_open to include &~FMODE_ATOMIC_POS - doing so would break many in-kernel nonseekable_open users which actually use ppos in read/write handlers. 2. Add stream_open() to kernel to open stream-like non-seekable file descriptors. Read and write on such file descriptors would never use nor change ppos. 
And with that property on stream-like files read and write will be running without taking f_pos lock - i.e. read and write could be running simultaneously. 3. With semantic patch search and convert to stream_open all in-kernel nonseekable_open users for which read and write actually do not depend on ppos and where there are no other methods in file_operations which assume @offset access. 4. Add FOPEN_STREAM to fs/fuse/ and open in-kernel file-descriptors via stream_open if that bit is present in filesystem open reply. It was tempting to change fs/fuse/ open handler to use stream_open instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE, and in particular GVFS which actually uses offset in its read and write handlers https://codesearch.debian.net/search?q=-%3Enonseekable+%3D https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080 https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346 https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481 so if we made such a change it would break a real user. 5. Add stream_open and FOPEN_STREAM handling to stable kernels starting from v3.14+ (the kernel where 9c225f2655 first appeared). This will allow to patch OSSPD and other FUSE filesystems that provide stream-like files to return FOPEN_STREAM | FOPEN_NONSEEKABLE in their open handler and this way avoid the deadlock on all kernel versions. This should work because fs/fuse/ ignores unknown open flags returned from a filesystem and so passing FOPEN_STREAM to a kernel that is not aware of this flag cannot hurt. In turn the kernel that is not aware of FOPEN_STREAM will be < v3.14 where just FOPEN_NONSEEKABLE is sufficient to implement streams without read vs write deadlock. 
This patch adds stream_open, converts /proc/xen/xenbus to it and adds semantic patch to automatically locate in-kernel places that are either required to be converted due to read vs write deadlock, or that are just safe to be converted because read and write do not use ppos and there are no other funky methods in file_operations. Regarding semantic patch I've verified each generated change manually - that it is correct to convert - and each other nonseekable_open instance left - that it is either not correct to convert there, or that it is not converted due to current stream_open.cocci limitations. The script also does not convert files that should be valid to convert, but that currently have .llseek = noop_llseek or generic_file_llseek for unknown reason despite file being opened with nonseekable_open (e.g. drivers/input/mousedev.c) Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yongzhi Pan <panyongzhi@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Juergen Gross <jgross@suse.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Tejun Heo <tj@kernel.org> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Nikolaus Rath <Nikolaus@rath.org> Cc: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11TTY: serial_core, add ->installJiri Slaby
commit 4cdd17ba1dff20ffc99fdbd2e6f0201fc7fe67df upstream. We need to compute the uart state only on the first open. This is usually what is done in the ->install hook. serial_core used to do this in ->open on every open. So move it to ->install. As a side effect, it ensures the state is set properly in the window after tty_init_dev is called, but before uart_open. This fixes a bunch of races between tty_open and flush_to_ldisc we were dealing with recently. One of such bugs was attempted to fix in commit fedb5760648a (serial: fix race between flush_to_ldisc and tty_open), but it only took care of a couple of functions (uart_start and uart_unthrottle). I was able to reproduce the crash on a SLE system, but in uart_write_room which is also called from flush_to_ldisc via process_echoes. I was *unable* to reproduce the bug locally. It is due to having this patch in my queue since 2012! general protection fault: 0000 [#1] SMP KASAN PTI CPU: 1 PID: 5 Comm: kworker/u4:0 Tainted: G L 4.12.14-396-default #1 SLE15-SP1 (unreleased) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c89-prebuilt.qemu.org 04/01/2014 Workqueue: events_unbound flush_to_ldisc task: ffff8800427d8040 task.stack: ffff8800427f0000 RIP: 0010:uart_write_room+0xc4/0x590 RSP: 0018:ffff8800427f7088 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 000000000000002f RSI: 00000000000000ee RDI: ffff88003888bd90 RBP: ffffffffb9545850 R08: 0000000000000001 R09: 0000000000000400 R10: ffff8800427d825c R11: 000000000000006e R12: 1ffff100084fee12 R13: ffffc900004c5000 R14: ffff88003888bb28 R15: 0000000000000178 FS: 0000000000000000(0000) GS:ffff880043300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000561da0794148 CR3: 000000000ebf4000 CR4: 00000000000006e0 Call Trace: tty_write_room+0x6d/0xc0 __process_echoes+0x55/0x870 n_tty_receive_buf_common+0x105e/0x26d0 tty_ldisc_receive_buf+0xb7/0x1c0 
tty_port_default_receive_buf+0x107/0x180 flush_to_ldisc+0x35d/0x5c0 ... 0 in rbx means tty->driver_data is NULL in uart_write_room. 0x178 is tried to be dereferenced (0x178 >> 3 is 0x2f in rdx) at uart_write_room+0xc4. 0x178 is exactly (struct uart_state *)NULL->refcount used in uart_port_lock from uart_write_room. So revert the upstream commit here as my local patch should fix the whole family. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Li RongQing <lirongqing@baidu.com> Cc: Wang Li <wangli39@baidu.com> Cc: Zhang Yu <zhangyu31@baidu.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11drm/i915: Fix I915_EXEC_RING_MASKChris Wilson
commit d90c06d57027203f73021bb7ddb30b800d65c636 upstream. This was supposed to be a mask of all known rings, but it is being used by execbuffer to filter out invalid rings, and so is instead mapping high unused values onto valid rings. Instead of a mask of all known rings, we need it to be the mask of all possible rings. Fixes: 549f7365820a ("drm/i915: Enable SandyBridge blitter ring") Fixes: de1add360522 ("drm/i915: Decouple execbuf uAPI from internal implementation") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v4.6+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190301140404.26690-21-chris@chris-wilson.co.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
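To see why a too-narrow mask is dangerous when it doubles as a validity filter, consider this minimal user-space sketch. Everything in it is hypothetical and chosen only for illustration: KNOWN_RING_MASK, POSSIBLE_RING_MASK, NUM_VALID_RINGS, ring_is_valid() and all numeric values are assumptions, not the i915 driver's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* All values below are illustrative assumptions, not i915 definitions. */
#define KNOWN_RING_MASK     0x07u  /* covers only the currently known rings */
#define POSSIBLE_RING_MASK  0x3fu  /* covers the whole ring selector field */
#define NUM_VALID_RINGS     5u     /* rings this hypothetical driver implements */

/* Returns 1 if the ring selected by `flags` under `mask` is implemented. */
static int ring_is_valid(uint64_t flags, uint64_t mask)
{
    uint64_t ring = flags & mask;   /* high bits outside the mask are dropped */
    return ring < NUM_VALID_RINGS;
}
```

With the narrow mask, a selector of 9 is silently folded onto ring 1 and accepted; with the wide mask the same selector stays 9 and is rejected, which matches the behaviour the commit message asks for.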
2019-06-11drm/radeon: prefer lower reference dividersChristian König
commit 2e26ccb119bde03584be53406bbd22e711b0d6e6 upstream. Instead of the closest reference divider prefer the lowest, this fixes flickering issues on HP Compaq nx9420. Bugs: https://bugs.freedesktop.org/show_bug.cgi?id=108514 Suggested-by: Paul Dufresne <dufresnep@gmail.com> Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11drm/gma500/cdv: Check vbt config bits when detecting lvds panelsPatrik Jakobsson
commit 7c420636860a719049fae9403e2c87804f53bdde upstream. Some machines have an lvds child device in vbt even though a panel is not attached. To make detection more reliable we now also check the lvds config bits available in the vbt. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1665766 Cc: stable@vger.kernel.org Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190416114607.1072-1-patrik.r.jakobsson@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11genwqe: Prevent an integer overflow in the ioctlDan Carpenter
commit 110080cea0d0e4dfdb0b536e7f8a5633ead6a781 upstream. There are a couple potential integer overflows here. round_up(m->size + (m->addr & ~PAGE_MASK), PAGE_SIZE); The first thing is that the "m->size + (...)" addition could overflow, and the second is that round_up() overflows to zero if the result is within PAGE_SIZE of the type max. In this code, the "m->size" variable is an u64 but we're saving the result in "map_size" which is an unsigned long and genwqe_user_vmap() takes an unsigned long as well. So I have used ULONG_MAX as the upper bound. From a practical perspective unsigned long is fine/better than trying to change all the types to u64. Fixes: eaf4722d4645 ("GenWQE Character device and DDCB queue") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
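The two overflow hazards described above (the addition wrapping, and the round-up wrapping to zero) can be guarded with a single bound check before any arithmetic is done. The following is a user-space sketch of that idea, assuming a 4096-byte page; SKETCH_PAGE_SIZE and checked_map_size() are hypothetical names, and this is not the driver's actual code.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumed page size for this sketch */

/*
 * Computes round_up(size + (addr % page), page) into *out.
 * Returns 0 on success, or -1 if the sum or the round-up would not fit
 * in an unsigned long.
 */
static int checked_map_size(uint64_t addr, uint64_t size, unsigned long *out)
{
    unsigned long offset = (unsigned long)(addr & (SKETCH_PAGE_SIZE - 1));

    /* Use ULONG_MAX as the upper bound, leaving room for offset and rounding. */
    if (size > ULONG_MAX - SKETCH_PAGE_SIZE - offset)
        return -1;

    unsigned long sum = (unsigned long)size + offset;
    *out = (sum + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
    return 0;
}
```

Checking the bound first means the later addition and rounding cannot wrap, so no separate test for a zero result is needed.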
2019-06-11Revert "MIPS: perf: ath79: Fix perfcount IRQ assignment"Greg Kroah-Hartman
This reverts commit f9b1baac265600a61d36ebaf9ba657119303b5b5 which is commit a1e8783db8e0d58891681bc1e6d9ada66eae8e20 upstream. Petr writes: Karl has reported to me today, that he's experiencing weird reboot hang on his devices with 4.9.180 kernel and that he has bisected it down to my backported patch. I would like to kindly ask you for removal of this patch. This patch should be reverted from all stable kernels up to 5.1, because perf counters were not broken on those kernels, and this patch won't work on the ath79 legacy IRQ code anyway, it needs new irqchip driver which was enabled on ath79 with commit 51fa4f8912c0 ("MIPS: ath79: drop legacy IRQ code"). Reported-by: Petr Štetiar <ynezz@true.cz> Cc: Kevin 'ldir' Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk> Cc: John Crispin <john@phrozen.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Paul Burton <paul.burton@mips.com> Cc: linux-mips@vger.kernel.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <jhogan@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11MIPS: pistachio: Build uImage.gz by defaultPaul Burton
commit e4f2d1af7163becb181419af9dece9206001e0a6 upstream. The pistachio platform uses the U-Boot bootloader & generally boots a kernel in the uImage format. As such it's useful to build one when building the kernel, but to do so currently requires the user to manually specify a uImage target on the make command line. Make uImage.gz the pistachio platform's default build target, so that the default is to build a kernel image that we can actually boot on a board such as the MIPS Creator Ci40. Marked for stable backport as far as v4.1 where pistachio support was introduced. This is primarily useful for CI systems such as kernelci.org which will benefit from us building a suitable image which can then be booted as part of automated testing, extending our test coverage to the affected stable branches. Signed-off-by: Paul Burton <paul.burton@mips.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Kevin Hilman <khilman@baylibre.com> Tested-by: Kevin Hilman <khilman@baylibre.com> URL: https://groups.io/g/kernelci/message/388 Cc: stable@vger.kernel.org # v4.1+ Cc: linux-mips@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11x86/power: Fix 'nosmt' vs hibernation triple fault during resumeJiri Kosina
commit ec527c318036a65a083ef68d8ba95789d2212246 upstream. As explained in 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") we always, no matter what, have to bring up x86 HT siblings during boot at least once in order to avoid the first MCE bringing the system to its knees. That means that whenever 'nosmt' is supplied on the kernel command line, all the HT siblings are as a result sitting in mwait or cpuidle after going through the online-offline cycle at least once. This causes a serious issue though when a kernel, which saw 'nosmt' on its command line, is going to perform resume from hibernation: if the resume from the hibernated image is successful, cr3 is flipped in order to point to the address space of the kernel that is being resumed, which in turn means that all the HT siblings are all of a sudden mwaiting on an address which is no longer valid. That results in a triple fault shortly after cr3 is switched, and the machine reboots. Fix this by always waking up all the SMT siblings before initiating the 'restore from hibernation' process; this guarantees that all the HT siblings will be properly carried over to the resumed kernel waiting in resume_play_dead(), and acted upon accordingly afterwards, based on the target kernel configuration. Symmetrically, the resumed kernel has to push the SMT siblings to mwait again in case it has SMT disabled; this means it has to online all the siblings when resuming (so that they come out of hlt) and offline them again to let them reach mwait. Cc: 4.19+ <stable@vger.kernel.org> # v4.19+ Debugged-by: Thomas Gleixner <tglx@linutronix.de> Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") Signed-off-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Pavel Machek <pavel@ucw.cz> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11fuse: fallocate: fix return with locked inodeMiklos Szeredi
commit 35d6fcbb7c3e296a52136347346a698a35af3fda upstream. Do the proper cleanup in case the size check fails. Tested with xfstests:generic/228 Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 0cbade024ba5 ("fuse: honor RLIMIT_FSIZE in fuse_file_fallocate") Cc: Liu Bo <bo.liu@linux.alibaba.com> Cc: <stable@vger.kernel.org> # v3.5 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11parisc: Use implicit space register selection for loading the coherence index of I/O pdirsJohn David Anglin
commit 63923d2c3800919774f5c651d503d1dd2adaddd5 upstream. We only support I/O to kernel space. Using %sr1 to load the coherence index may be racy unless interrupts are disabled. This patch changes the code used to load the coherence index to use implicit space register selection. This saves one instruction and eliminates the race. Tested on rp3440, c8000 and c3750. Signed-off-by: John David Anglin <dave.anglin@bell.net> Cc: stable@vger.kernel.org Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11rcu: locking and unlocking need to always be at least barriersLinus Torvalds
commit 66be4e66a7f422128748e3c3ef6ee72b20a6197b upstream. Herbert Xu pointed out that commit bb73c52bad36 ("rcu: Don't disable preemption for Tiny and Tree RCU readers") was incorrect in making the preempt_disable/enable() be conditional on CONFIG_PREEMPT_COUNT. If CONFIG_PREEMPT_COUNT isn't enabled, the preemption enable/disable is a no-op, but still is a compiler barrier. And RCU locking still _needs_ that compiler barrier. It is simply fundamentally not true that RCU locking would be a complete no-op: we still need to guarantee (for example) that things that can trap and cause preemption cannot migrate into the RCU locked region. The way we do that is by making it a barrier. See for example commit 386afc91144b ("spinlocks and preemption points need to be at least compiler barriers") from back in 2013 that had similar issues with spinlocks that become no-ops on UP: they must still constrain the compiler from moving other operations into the critical region. Now, it is true that a lot of RCU operations already use READ_ONCE() and WRITE_ONCE() (which in practice likely would never be re-ordered wrt anything remotely interesting), but it is also true that that is not globally the case, and that it's not even necessarily always possible (ie bitfields etc). Reported-by: Herbert Xu <herbert@gondor.apana.org.au> Fixes: bb73c52bad36 ("rcu: Don't disable preemption for Tiny and Tree RCU readers") Cc: stable@kernel.org Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
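The point that a lock primitive can emit no instructions and still constrain the compiler can be illustrated in user space. This is only a sketch under stated assumptions: it uses the GCC/Clang empty-asm idiom with a "memory" clobber as a compiler-only barrier, and sketch_barrier(), sketch_rcu_read_lock(), sketch_rcu_read_unlock(), shared_value and read_protected() are hypothetical names, not the kernel's RCU API.

```c
#include <assert.h>

/*
 * Compiler-only barrier (GCC/Clang syntax): generates no machine code, but
 * the "memory" clobber stops the compiler from moving memory accesses
 * across this point.
 */
#define sketch_barrier() __asm__ __volatile__("" ::: "memory")

static int shared_value = 41;   /* hypothetical data read under the "lock" */

/* Even when locking compiles to nothing, it must remain a barrier. */
static inline void sketch_rcu_read_lock(void)   { sketch_barrier(); }
static inline void sketch_rcu_read_unlock(void) { sketch_barrier(); }

static int read_protected(void)
{
    int v;

    sketch_rcu_read_lock();
    v = shared_value + 1;       /* must stay inside the critical region */
    sketch_rcu_read_unlock();
    return v;
}
```

If the two helpers were empty functions instead, the compiler would be free to hoist or sink the access to shared_value out of the region, which is the problem the commit message describes.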
2019-06-11Revert "fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied"Hangbin Liu
[ Upstream commit 4970b42d5c362bf873982db7d93245c5281e58f4 ] This reverts commit e9919a24d3022f72bcadc407e73a6ef17093a849. Nathan reported that the new behaviour breaks Android, as Android just adds new rules and deletes old ones. If we return 0 without adding duplicate rules, Android will remove the newly added rules, causing the system to soft-reboot. Fixes: e9919a24d302 ("fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied") Reported-by: Nathan Chancellor <natechancellor@gmail.com> Reported-by: Yaro Slav <yaro330@gmail.com> Reported-by: Maciej Żenczykowski <zenczykowski@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11Revert "fib_rules: fix error in backport of e9919a24d302 ("fib_rules: return 0...")"Greg Kroah-Hartman
This reverts commit d5c71a7c533e88a9fcc74fe1b5c25743868fa300 as the patch that this "fixes" is about to be reverted... Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11ipv6: use READ_ONCE() for inet->hdrincl as in ipv4Olivier Matz
[ Upstream commit 59e3e4b52663a9d97efbce7307f62e4bc5c9ce91 ] As it was done in commit 8f659a03a0ba ("net: ipv4: fix for a race condition in raw_sendmsg") and commit 20b50d79974e ("net: ipv4: emulate READ_ONCE() on ->hdrincl bit-field in raw_sendmsg()") for ipv4, copy the value of inet->hdrincl in a local variable, to avoid introducing a race condition in the next commit. Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11ipv6: fix EFAULT on sendto with icmpv6 and hdrinclOlivier Matz
[ Upstream commit b9aa52c4cb457e7416cc0c95f475e72ef4a61336 ] The following code returns EFAULT (Bad address): s = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6); setsockopt(s, SOL_IPV6, IPV6_HDRINCL, 1); sendto(ipv6_icmp6_packet, addr); /* returns -1, errno = EFAULT */ The IPv4 equivalent code works. A workaround is to use IPPROTO_RAW instead of IPPROTO_ICMPV6. The failure happens because 2 bytes are eaten from the msghdr by rawv6_probe_proto_opt() starting from commit 19e3c66b52ca ("ipv6 equivalent of "ipv4: Avoid reading user iov twice after raw_probe_proto_opt""), but at that time it was not a problem because IPV6_HDRINCL was not yet introduced. Only eat these 2 bytes if hdrincl == 0. Fixes: 715f504b1189 ("ipv6: add IPV6_HDRINCL option for raw sockets") Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11pktgen: do not sleep with the thread lock held.Paolo Abeni
[ Upstream commit 720f1de4021f09898b8c8443f3b3e995991b6e3a ]

Currently, the process issuing a "start" command on the pktgen procfs interface acquires the pktgen thread lock and never releases it until all pktgen threads are completed. This can block indefinitely any other pktgen command and any (even unrelated) netdevice removal, as the pktgen netdev notifier acquires the same lock.

The issue is demonstrated by the following script, reported by Matteo:

    ip -b - <<'EOF'
    link add type dummy
    link add type veth
    link set dummy0 up
    EOF
    modprobe pktgen
    echo reset >/proc/net/pktgen/pgctrl
    {
        echo rem_device_all
        echo add_device dummy0
    } >/proc/net/pktgen/kpktgend_0
    echo count 0 >/proc/net/pktgen/dummy0
    echo start >/proc/net/pktgen/pgctrl &
    sleep 1
    rmmod veth

Fix the above by releasing the thread lock around the sleep call.

Additionally, we must prevent racing with forceful rmmod, as the thread lock no longer protects against it. Instead, acquire a self-reference before waiting for any thread. As a side effect, running rmmod pktgen while some thread is running now fails with a "module in use" error; before this patch such a command hung indefinitely.

Note: the issue predates the commit reported in the fixes tag, but this fix can't be applied before the mentioned commit.

v1 -> v2:
 - no need to check for thread existence after flipping the lock, pktgen threads are freed only at net exit time

Fixes: 6146e6a43b35 ("[PKTGEN]: Removes thread_{un,}lock() macros.") Reported-and-tested-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11net: rds: fix memory leak in rds_ib_flush_mr_poolZhu Yanjun
[ Upstream commit 85cb928787eab6a2f4ca9d2a798b6f3bed53ced1 ]

When the following tests last for several hours, the problem will occur.

Server:
    rds-stress -r 1.1.1.16 -D 1M
Client:
    rds-stress -r 1.1.1.14 -s 1.1.1.16 -D 1M -T 30

The following will occur.

"
Starting up....
tsks   tx/s   rx/s  tx+rx K/s    mbi K/s    mbo K/s  tx us/c   rtt us  cpu %
   1      0      0       0.00       0.00       0.00     0.00     0.00  -1.00
   1      0      0       0.00       0.00       0.00     0.00     0.00  -1.00
   1      0      0       0.00       0.00       0.00     0.00     0.00  -1.00
   1      0      0       0.00       0.00       0.00     0.00     0.00  -1.00
"

From vmcore, we can find that clean_list is NULL.

From the source code, rds_mr_flushd calls rds_ib_mr_pool_flush_worker. Then rds_ib_mr_pool_flush_worker calls
"
    rds_ib_flush_mr_pool(pool, 0, NULL);
"
Then in function
"
    int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
                             int free_all, struct rds_ib_mr **ibmr_ret)
"
ibmr_ret is NULL.

In the source code,
"
    ...
    list_to_llist_nodes(pool, &unmap_list, &clean_nodes, &clean_tail);
    if (ibmr_ret)
            *ibmr_ret = llist_entry(clean_nodes, struct rds_ib_mr, llnode);

    /* more than one entry in llist nodes */
    if (clean_nodes->next)
            llist_add_batch(clean_nodes->next, clean_tail, &pool->clean_list);
    ...
"
When ibmr_ret is NULL, llist_entry is not executed. clean_nodes->next instead of clean_nodes is added in clean_list. So clean_nodes is discarded. It can not be used again. The workqueue is executed periodically. So more and more clean_nodes are discarded. Finally the clean_list is NULL. Then this problem will occur.

Fixes: 1bc144b62524 ("net, rds, Replace xlist in net/rds/xlist.h with llist") Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11net/mlx4_en: ethtool, Remove unsupported SFP EEPROM high pages queryErez Alfasi
[ Upstream commit 135dd9594f127c8a82d141c3c8430e9e2143216a ] Querying EEPROM high pages data for SFP module is currently not supported by our driver but is still tried, resulting in invalid FW queries. Set the EEPROM ethtool data length to 256 for SFP module to limit the reading for page 0 only and prevent invalid FW queries. Fixes: 7202da8b7f71 ("ethtool, net/mlx4_en: Cable info, get_module_info/eeprom ethtool support") Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11neighbor: Call __ipv4_neigh_lookup_noref in neigh_xmitDavid Ahern
[ Upstream commit 4b2a2bfeb3f056461a90bd621e8bd7d03fa47f60 ] Commit cd9ff4de0107 changed the key for IFF_POINTOPOINT devices to INADDR_ANY but neigh_xmit which is used for MPLS encapsulations was not updated to use the altered key. The result is that every packet Tx does a lookup on the gateway address which does not find an entry, a new one is created only to find the existing one in the table right before the insert since arp_constructor was updated to reset the primary key. This is seen in the allocs and destroys counters: ip -s -4 ntable show | head -10 | grep alloc which increase for each packet showing the unnecessary overhead. Fix by having neigh_xmit use __ipv4_neigh_lookup_noref for NEIGH_ARP_TABLE. Fixes: cd9ff4de0107 ("ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY") Reported-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: David Ahern <dsahern@gmail.com> Tested-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11ethtool: fix potential userspace buffer overflowVivien Didelot
[ Upstream commit 0ee4e76937d69128a6a66861ba393ebdc2ffc8a2 ] ethtool_get_regs() allocates a buffer of size ops->get_regs_len(), and pass it to the kernel driver via ops->get_regs() for filling. There is no restriction about what the kernel drivers can or cannot do with the open ethtool_regs structure. They usually set regs->version and ignore regs->len or set it to the same size as ops->get_regs_len(). But if userspace allocates a smaller buffer for the registers dump, we would cause a userspace buffer overflow in the final copy_to_user() call, which uses the regs.len value potentially reset by the driver. To fix this, make this case obvious and store regs.len before calling ops->get_regs(), to only copy as much data as requested by userspace, up to the value returned by ops->get_regs_len(). While at it, remove the redundant check for non-null regbuf. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
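The length handling described above can be modelled in userspace. This is a minimal sketch, not the kernel code: `struct regs_hdr`, `fake_get_regs()` and `dump_regs()` are hypothetical stand-ins for `struct ethtool_regs`, a driver's `ops->get_regs()` and `ethtool_get_regs()`, and `memcpy()` stands in for `copy_to_user()`. The point it illustrates is that the copy length is fixed before the driver callback runs, so a driver that rewrites the length field cannot enlarge the copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the header part of struct ethtool_regs. */
struct regs_hdr {
	unsigned int version;
	unsigned int len;	/* bytes the caller's buffer can hold */
};

/* Hypothetical driver callback: fills the dump and, as many drivers do,
 * overwrites hdr->len with the full register dump size. */
static void fake_get_regs(struct regs_hdr *hdr, unsigned char *buf, size_t reglen)
{
	hdr->version = 1;
	hdr->len = (unsigned int)reglen;
	memset(buf, 0xAB, reglen);
}

/* Copies at most min(requested length, reglen) bytes to user_buf and
 * returns the number of bytes copied (0 on allocation failure). */
static size_t dump_regs(struct regs_hdr *hdr, unsigned char *user_buf, size_t reglen)
{
	unsigned char *regbuf;
	size_t copy_len;

	assert(hdr != NULL && user_buf != NULL);

	/* Record the caller's length before the driver can change it. */
	copy_len = hdr->len < reglen ? hdr->len : reglen;

	regbuf = calloc(1, reglen ? reglen : 1);
	if (!regbuf)
		return 0;

	fake_get_regs(hdr, regbuf, reglen);

	/* Use the saved length, not the possibly rewritten hdr->len. */
	memcpy(user_buf, regbuf, copy_len);
	free(regbuf);
	return copy_len;
}
```

With a 4-byte request against a 16-byte dump, only 4 bytes reach the caller even though the header afterwards reports 16.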
2019-06-11media: uvcvideo: Fix uvc_alloc_entity() allocation alignmentNadav Amit
commit 89dd34caf73e28018c58cd193751e41b1f8bdc56 upstream. The use of ALIGN() in uvc_alloc_entity() is incorrect, since the size of (entity->pads) is not a power of two. As a stop-gap, until a better solution is adopted, use roundup() instead. Found by a static assertion. Compile-tested only. Fixes: 4ffc2d89f38a ("uvcvideo: Register subdevices for each entity") Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Doug Anderson <dianders@chromium.org> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
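To see why mask-based alignment breaks for sizes that are not a power of two, here is a small userspace sketch. The helpers `align_pow2()` and `round_up_any()` are illustrative names written in the style of the kernel's ALIGN() and roundup() macros; they are not the kernel definitions themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Mask-based alignment in the style of ALIGN():
 * the result is only a multiple of a when a is a power of two. */
static size_t align_pow2(size_t x, size_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Division-based rounding in the style of roundup():
 * correct for any non-zero multiple. */
static size_t round_up_any(size_t x, size_t multiple)
{
	assert(multiple != 0);
	return ((x + multiple - 1) / multiple) * multiple;
}
```

For x = 100 and a = 8 both helpers return 104. For a = 24 the mask version returns 104, which is not a multiple of 24, while the division version returns 120.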
2019-06-11efi/libstub: Unify command line param parsingArd Biesheuvel
commit 60f38de7a8d4e816100ceafd1b382df52527bd50 upstream. Merge the parsing of the command line carried out in arm-stub.c with the handling in efi_parse_options(). Note that this also fixes the missing handling of CONFIG_CMDLINE_FORCE=y, in which case the builtin command line should supersede the one passed by the firmware. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bhe@redhat.com Cc: bhsharma@redhat.com Cc: bp@alien8.de Cc: eugene@hp.com Cc: evgeny.kalugin@intel.com Cc: jhugo@codeaurora.org Cc: leif.lindholm@linaro.org Cc: linux-efi@vger.kernel.org Cc: mark.rutland@arm.com Cc: roy.franz@cavium.com Cc: rruigrok@codeaurora.org Link: http://lkml.kernel.org/r/20170404160910.28115-1-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org> [ardb: fix up merge conflicts with 4.9.180] Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11Revert "x86/build: Move _etext to actual end of .text"Greg Kroah-Hartman
This reverts commit 392bef709659abea614abfe53cf228e7a59876a4. It seems to cause lots of problems when using the gold linker, and no one really needs this at the moment, so just revert it from the stable trees. Cc: Sami Tolvanen <samitolvanen@google.com> Reported-by: Kees Cook <keescook@chromium.org> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Reported-by: Alec Ari <neotheuser@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11mm: make page ref count overflow check tighter and more explicitLinus Torvalds
commit f958d7b528b1b40c44cfda5eabe2d82760d868c3 upstream. We have a VM_BUG_ON() to check that the page reference count doesn't underflow (or get close to overflow) by checking the sign of the count. That's all fine, but we actually want to allow people to use a "get page ref unless it's already very high" helper function, and we want that one to use the sign of the page ref (without triggering this VM_BUG_ON). Change the VM_BUG_ON to only check for small underflows (or _very_ close to overflowing), and ignore overflows which have strayed into negative territory. Acked-by: Matthew Wilcox <willy@infradead.org> Cc: Jann Horn <jannh@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11mm: prevent get_user_pages() from overflowing page refcountLinus Torvalds
commit 8fde12ca79aff9b5ba951fce1a2641901b8d8e64 upstream. If the page refcount wraps around past zero, it will be freed while there are still four billion references to it. One of the possible avenues for an attacker to try to make this happen is by doing direct IO on a page multiple times. This patch makes get_user_pages() refuse to take a new page reference if there are already more than two billion references to the page. Reported-by: Jann Horn <jannh@google.com> Acked-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 4.9: - Add the "err" variable in follow_hugetlb_page() - Adjust context] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
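The "refuse once the count is already very high" idea can be sketched outside the kernel as follows. `struct fake_page` and `try_get_ref()` are hypothetical names for illustration; the kernel operates on an atomic signed counter, whereas this sketch uses a plain unsigned integer and treats any value above INT_MAX as the "negative" range.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page with a 32-bit reference count. */
struct fake_page {
	unsigned int refs;
};

/* Take a reference unless the count is zero (page is being freed) or has
 * already passed INT_MAX, i.e. would read as negative if viewed as signed.
 * Returns true if a reference was taken. */
static bool try_get_ref(struct fake_page *page)
{
	if (page->refs == 0 || page->refs > (unsigned int)INT_MAX)
		return false;

	page->refs++;
	return true;
}
```

Because new references are refused for the whole upper half of the range, the counter would need roughly two billion further increments from other paths before it could wrap to zero.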
2019-06-11mm, gup: ensure real head page is ref-counted when using hugepagesPunit Agrawal
commit d63206ee32b6e64b0e12d46e5d6004afd9913713 upstream. When speculatively taking references to a hugepage using page_cache_add_speculative() in gup_huge_pmd(), it is assumed that the page returned by pmd_page() is the head page. Although normally true, this assumption doesn't hold when the hugepage comprises of successive page table entries such as when using contiguous bit on arm64 at PTE or PMD levels. This can be addressed by ensuring that the page passed to page_cache_add_speculative() is the real head or by de-referencing the head page within the function. We take the first approach to keep the usage pattern aligned with page_cache_get_speculative() where users already pass the appropriate page, i.e., the de-referenced head. Apply the same logic to fix gup_huge_[pud|pgd]() as well. [punit.agrawal@arm.com: fix arm64 ltp failure] Link: http://lkml.kernel.org/r/20170619170145.25577-5-punit.agrawal@arm.com Link: http://lkml.kernel.org/r/20170522133604.11392-3-punit.agrawal@arm.com Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Acked-by: Steve Capper <steve.capper@arm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11mm, gup: remove broken VM_BUG_ON_PAGE compound check for hugepagesWill Deacon
commit a3e328556d41bb61c55f9dfcc62d6a826ea97b85 upstream. When operating on hugepages with DEBUG_VM enabled, the GUP code checks the compound head for each tail page prior to calling page_cache_add_speculative. This is broken, because on the fast-GUP path (where we don't hold any page table locks) we can be racing with a concurrent invocation of split_huge_page_to_list. split_huge_page_to_list deals with this race by using page_ref_freeze to freeze the page and force concurrent GUPs to fail whilst the component pages are modified. This modification includes clearing the compound_head field for the tail pages, so checking this prior to a successful call to page_cache_add_speculative can lead to false positives: In fact, page_cache_add_speculative *already* has this check once the page refcount has been successfully updated, so we can simply remove the broken calls to VM_BUG_ON_PAGE. Link: http://lkml.kernel.org/r/20170522133604.11392-2-punit.agrawal@arm.com Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Acked-by: Steve Capper <steve.capper@arm.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11fs: prevent page refcount overflow in pipe_buf_getMatthew Wilcox
commit 15fab63e1e57be9fdb5eec1bbc5916e9825e9acb upstream. Change pipe_buf_get() to return a bool indicating whether it succeeded in raising the refcount of the page (if the thing in the pipe is a page). This removes another mechanism for overflowing the page refcount. All callers converted to handle a failure. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 4.9: adjust context] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11binder: replace "%p" with "%pK"Todd Kjos
commit 8ca86f1639ec5890d400fff9211aca22d0a392eb upstream. The format specifier "%p" can leak kernel addresses. Use "%pK" instead. There were 4 remaining cases in binder.c. Signed-off-by: Todd Kjos <tkjos@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [bwh: Backported to 4.9: adjust context] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11binder: Replace "%p" with "%pK" for stableBen Hutchings
This was done as part of upstream commits fdfb4a99b6ab "8inder: separate binder allocator structure from binder proc", 19c987241ca1 "binder: separate out binder_alloc functions", and 7a4408c6bd3e "binder: make sure accesses to proc/thread are safe". However, those commits made lots of other changes that are not suitable for stable. Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11brcmfmac: add subtype check for event handling in data pathArend van Spriel
commit a4176ec356c73a46c07c181c6d04039fafa34a9f upstream. For USB there is no separate channel being used to pass events from firmware to the host driver, and as such they are passed over the data path. In order to detect mock event messages, an additional check is needed on the event subtype. This check is added conditionally using the unlikely() macro. Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com> Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Franky Lin <franky.lin@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11brcmfmac: assure SSID length from firmware is limitedArend van Spriel
commit 1b5e2423164b3670e8bc9174e4762d297990deff upstream. The SSID length as received from firmware should not exceed IEEE80211_MAX_SSID_LEN as that would result in heap overflow. Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com> Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Franky Lin <franky.lin@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> [bwh: Backported to 4.9: adjust context] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
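A bounded SSID copy of the kind described above might look like this in isolation. It is a sketch under assumptions: `struct ssid_buf` and `copy_ssid()` are made-up names, `MAX_SSID_LEN` mirrors the 32-byte limit that IEEE80211_MAX_SSID_LEN represents, and the source buffer is assumed to hold at least the clamped number of bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SSID_LEN 32	/* same value as IEEE80211_MAX_SSID_LEN */

/* Hypothetical fixed-size destination for an SSID. */
struct ssid_buf {
	unsigned char ssid[MAX_SSID_LEN];
	size_t len;
};

/* Copy an SSID whose length was reported by firmware, never trusting
 * reported_len beyond the size of the destination array. */
static void copy_ssid(struct ssid_buf *dst, const unsigned char *src,
		      size_t reported_len)
{
	size_t n;

	assert(dst != NULL && src != NULL);

	n = reported_len < MAX_SSID_LEN ? reported_len : MAX_SSID_LEN;
	memcpy(dst->ssid, src, n);
	dst->len = n;
}
```

A firmware-reported length of 64 is clamped to 32, so the copy stays inside the destination array instead of running past it.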
2019-06-11brcmfmac: add length checks in scheduled scan result handlerArend Van Spriel
commit 4835f37e3bafc138f8bfa3cbed2920dd56fed283 upstream. Assure the event data buffer is long enough to hold the array of netinfo items and that SSID length does not exceed the maximum of 32 characters as per 802.11 spec. Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com> Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Franky Lin <franky.lin@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> [bwh: Backported to 4.9: - Move the assignment to "data" along with the assignment to "netinfo_start" that depends on it - Adjust context, indentation] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11drm/vmwgfx: Don't send drm sysfs hotplug events on initial master setThomas Hellstrom
commit 63cb44441826e842b7285575b96db631cc9f2505 upstream. This may confuse user-space clients like plymouth that opens a drm file descriptor as a result of a hotplug event and then generates a new event... Cc: <stable@vger.kernel.org> Fixes: 5ea1734827bb ("drm/vmwgfx: Send a hotplug event at master_set") Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Deepak Rawat <drawat@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11gcc-plugins: Fix build failures under Darwin hostKees Cook
commit 7210e060155b9cf557fb13128353c3e494fa5ed3 upstream. The gcc-common.h file did not take into account certain macros that might have already been defined in the build environment. This updates the header to avoid redefining the macros, as seen on a Darwin host using gcc 4.9.2: HOSTCXX -fPIC scripts/gcc-plugins/arm_ssp_per_task_plugin.o - due to: scripts/gcc-plugins/gcc-common.h In file included from scripts/gcc-plugins/arm_ssp_per_task_plugin.c:3:0: scripts/gcc-plugins/gcc-common.h:153:0: warning: "__unused" redefined ^ In file included from /usr/include/stdio.h:64:0, from /Users/hns/Documents/Projects/QuantumSTEP/System/Library/Frameworks/System.framework/Versions-jessie/x86_64-apple-darwin15.0.0/gcc/arm-linux-gnueabi/bin/../lib/gcc/arm-linux-gnueabi/4.9.2/plugin/include/system.h:40, from /Users/hns/Documents/Projects/QuantumSTEP/System/Library/Frameworks/System.framework/Versions-jessie/x86_64-apple-darwin15.0.0/gcc/arm-linux-gnueabi/bin/../lib/gcc/arm-linux-gnueabi/4.9.2/plugin/include/gcc-plugin.h:28, from /Users/hns/Documents/Projects/QuantumSTEP/System/Library/Frameworks/System.framework/Versions-jessie/x86_64-apple-darwin15.0.0/gcc/arm-linux-gnueabi/bin/../lib/gcc/arm-linux-gnueabi/4.9.2/plugin/include/plugin.h:23, from scripts/gcc-plugins/gcc-common.h:9, from scripts/gcc-plugins/arm_ssp_per_task_plugin.c:3: /usr/include/sys/cdefs.h:161:0: note: this is the location of the previous definition ^ Reported-and-tested-by: "H. Nikolaus Schaller" <hns@goldelico.com> Fixes: 189af4657186 ("ARM: smp: add support for per-task stack canaries") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11CIFS: cifs_read_allocate_pages: don't iterate through whole page array on ENOMEMRoberto Bergantinos Corpas
commit 31fad7d41e73731f05b8053d17078638cf850fa6 upstream. In cifs_read_allocate_pages, in case of ENOMEM, we go through whole rdata->pages array but we have failed the allocation before nr_pages, therefore we may end up calling put_page with NULL pointer, causing oops Signed-off-by: Roberto Bergantinos Corpas <rbergant@redhat.com> Acked-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
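The cleanup rule behind this fix (on failure, release only the slots that were actually filled) can be sketched in userspace. `alloc_page_array()` and its `fail_at` parameter are hypothetical, with `malloc()`/`free()` standing in for page allocation and put_page(). Note that `free(NULL)` is harmless in userspace, whereas the commit message describes put_page() on a NULL pointer as causing an oops, so the bug itself would not reproduce with this model.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Fill pages[0..nr_pages) with allocations. fail_at simulates an
 * allocation failure at that index (pass nr_pages for "never fail").
 * On failure, only the entries allocated so far are released. */
static int alloc_page_array(void **pages, unsigned int nr_pages,
			    unsigned int fail_at)
{
	unsigned int i;

	assert(pages != NULL);

	for (i = 0; i < nr_pages; i++) {
		void *p = (i == fail_at) ? NULL : malloc(64);

		if (!p)
			goto err;
		pages[i] = p;
	}
	return 0;

err:
	/* Only slots [0, i) were populated; do not walk the whole array. */
	while (i-- > 0) {
		free(pages[i]);
		pages[i] = NULL;
	}
	return -ENOMEM;
}
```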
2019-06-11staging: vc04_services: prevent integer overflow in create_pagelist()Dan Carpenter
commit ca641bae6da977d638458e78cd1487b6160a2718 upstream. The create_pagelist() "count" parameter comes from the user in vchiq_ioctl() and it could overflow. If you look at how create_page() is called in vchiq_prepare_bulk_data(), then the "size" variable is an int so it doesn't make sense to allow negatives or larger than INT_MAX. I don't know this code terribly well, but I believe that typical values of "count" are typically quite low and I don't think this check will affect normal valid uses at all. The "pagelist_size" calculation can also overflow on 32 bit systems, but not on 64 bit systems. I have added an integer overflow check for that as well. The Raspberry PI doesn't offer the same level of memory protection that x86 does so these sorts of bugs are probably not super critical to fix. Fixes: 71bad7f08641 ("staging: add bcm2708 vchiq driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
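The two checks mentioned above (reject counts beyond INT_MAX, and reject a size computation that would overflow) can be illustrated generically. This sketch does not reproduce the driver's actual pagelist layout: `pagelist_size_ok()`, `header_size` and `per_page_size` are hypothetical names for a "header plus N fixed-size entries" calculation.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Validate a user-supplied byte count and compute
 * header_size + num_pages * per_page_size without overflowing size_t.
 * Returns false, leaving *total untouched, if either check fails. */
static bool pagelist_size_ok(size_t count, size_t num_pages,
			     size_t header_size, size_t per_page_size,
			     size_t *total)
{
	/* The consumer stores the size in an int, so larger values are invalid. */
	if (count > (size_t)INT_MAX)
		return false;

	/* header + n * per overflows exactly when n > (SIZE_MAX - header) / per. */
	if (per_page_size != 0 &&
	    num_pages > (SIZE_MAX - header_size) / per_page_size)
		return false;

	*total = header_size + num_pages * per_page_size;
	return true;
}
```

Dividing before multiplying keeps the check itself free of overflow, which is why the comparison is written against `(SIZE_MAX - header_size) / per_page_size` rather than against the product.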
2019-06-11docs: Fix conf.py for Sphinx 2.0Jonathan Corbet
commit 3bc8088464712fdcb078eefb68837ccfcc413c88 upstream. Our version check in Documentation/conf.py never envisioned a world where Sphinx moved beyond 1.x. Now that the unthinkable has happened, fix our version check to handle higher version numbers correctly. Cc: stable@vger.kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11kernel/signal.c: trace_signal_deliver when signal_group_exitZhenliang Wei
commit 98af37d624ed8c83f1953b1b6b2f6866011fc064 upstream. In the fixes commit, removing SIGKILL from each thread signal mask and executing "goto fatal" directly will skip the call to "trace_signal_deliver". At this point, the delivery tracking of the SIGKILL signal will be inaccurate. Therefore, we need to add trace_signal_deliver before "goto fatal" after executing sigdelset. Note: SEND_SIG_NOINFO matches the fact that SIGKILL doesn't have any info. Link: http://lkml.kernel.org/r/20190425025812.91424-1-weizhenliang@huawei.com Fixes: cf43a757fd4944 ("signal: Restore the stop PTRACE_EVENT_EXIT") Signed-off-by: Zhenliang Wei <weizhenliang@huawei.com> Reviewed-by: Christian Brauner <christian@brauner.io> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Ivan Delalande <colona@arista.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Deepa Dinamani <deepa.kernel@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>