summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-06-26Linux 4.16.18v4.16.18Greg Kroah-Hartman
2018-06-26mm, page_alloc: do not break __GFP_THISNODE by zonelist resetVlastimil Babka
commit 7810e6781e0fcbca78b91cf65053f895bf59e85f upstream. In __alloc_pages_slowpath() we reset zonelist and preferred_zoneref for allocations that can ignore memory policies. The zonelist is obtained from current CPU's node. This is a problem for __GFP_THISNODE allocations that want to allocate on a different node, e.g. because the allocating thread has been migrated to a different CPU. This has been observed to break SLAB in our 4.4-based kernel, because there it relies on __GFP_THISNODE working as intended. If a slab page is put on wrong node's list, then further list manipulations may corrupt the list because page_to_nid() is used to determine which node's list_lock should be locked and thus we may take a wrong lock and race. Current SLAB implementation seems to be immune by luck thanks to commit 511e3a058812 ("mm/slab: make cache_grow() handle the page allocated on arbitrary node") but there may be others assuming that __GFP_THISNODE works as promised. We can fix it by simply removing the zonelist reset completely. There is actually no reason to reset it, because memory policies and cpusets don't affect the zonelist choice in the first place. This was different when commit 183f6371aac2 ("mm: ignore mempolicies when using ALLOC_NO_WATERMARK") introduced the code, as mempolicies provided their own restricted zonelists. We might consider this for 4.17 although I don't know if there's anything currently broken. SLAB is currently not affected, but in kernels older than 4.7 that don't yet have 511e3a058812 ("mm/slab: make cache_grow() handle the page allocated on arbitrary node") it is. That's at least 4.4 LTS. Older ones I'll have to check. So stable backports should be more important, but will have to be reviewed carefully, as the code went through many changes. BTW I think that also the ac->preferred_zoneref reset is currently useless if we don't also reset ac->nodemask from a mempolicy to NULL first (which we probably should for the OOM victims etc?), but I would leave that for a separate patch. Link: http://lkml.kernel.org/r/20180525130853.13915-1-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Fixes: 183f6371aac2 ("mm: ignore mempolicies when using ALLOC_NO_WATERMARK") Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26fs/binfmt_misc.c: do not allow offset overflowThadeu Lima de Souza Cascardo
commit 5cc41e099504b77014358b58567c5ea6293dd220 upstream. When registering a new binfmt_misc handler, it is possible to overflow the offset to get a negative value, which might crash the system, or possibly leak kernel data. Here is a crash log when 2500000000 was used as an offset: BUG: unable to handle kernel paging request at ffff989cfd6edca0 IP: load_misc_binary+0x22b/0x470 [binfmt_misc] PGD 1ef3e067 P4D 1ef3e067 PUD 0 Oops: 0000 [#1] SMP NOPTI Modules linked in: binfmt_misc kvm_intel ppdev kvm irqbypass joydev input_leds serio_raw mac_hid parport_pc qemu_fw_cfg parpy CPU: 0 PID: 2499 Comm: bash Not tainted 4.15.0-22-generic #24-Ubuntu Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014 RIP: 0010:load_misc_binary+0x22b/0x470 [binfmt_misc] Call Trace: search_binary_handler+0x97/0x1d0 do_execveat_common.isra.34+0x667/0x810 SyS_execve+0x31/0x40 do_syscall_64+0x73/0x130 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 Use kstrtoint instead of simple_strtoul. It will work as the code already sets the delimiter byte to '\0' and we only do it when the field is not empty. Tested with offsets -1, 2500000000, UINT_MAX and INT_MAX. Also tested with examples documented at Documentation/admin-guide/binfmt-misc.rst and other registrations from packages on Ubuntu. Link: http://lkml.kernel.org/r/20180529135648.14254-1-cascardo@canonical.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
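For readers unfamiliar with the failure mode above, the following standalone C sketch (not the kernel code; parse_offset() is a hypothetical stand-in for a kstrtoint-style parse) shows how an unchecked unsigned parse of 2500000000 wraps to a negative int, while a range-checked parse rejects it:

    /* Userspace illustration only; the kernel fix simply switches to kstrtoint. */
    #include <errno.h>
    #include <limits.h>
    #include <stdio.h>
    #include <stdlib.h>

    static int parse_offset(const char *s, int *out)
    {
        char *end;
        long val;

        errno = 0;
        val = strtol(s, &end, 10);
        if (errno == ERANGE || val < 0 || val > INT_MAX || *end != '\0')
            return -EINVAL;    /* reject overflow, negatives and trailing junk */
        *out = (int)val;
        return 0;
    }

    int main(void)
    {
        int off;

        /* Unchecked path: the value silently truncates to a negative int on typical systems. */
        printf("unchecked: %d\n", (int)strtoul("2500000000", NULL, 10));

        /* Checked path: the parse fails instead of producing a bogus offset. */
        printf("checked:   %d\n", parse_offset("2500000000", &off));
        return 0;
    }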
2018-06-26vhost: fix info leak due to uninitialized memoryMichael S. Tsirkin
commit 670ae9caaca467ea1bfd325cb2a5c98ba87f94ad upstream. struct vhost_msg within struct vhost_msg_node is copied to userspace. Unfortunately it turns out on 64 bit systems vhost_msg has padding after type which gcc doesn't initialize, leaking 4 uninitialized bytes to userspace. This padding also unfortunately means 32 bit users of this interface are broken on a 64 bit kernel which will need to be fixed separately. Fixes: CVE-2018-1118 Cc: stable@vger.kernel.org Reported-by: Kevin Easton <kevin@guarana.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reported-by: syzbot+87cfa083e727a224754b@syzkaller.appspotmail.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
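A standalone illustration of the leak pattern described above (demo_msg is an illustrative struct, not the real vhost_msg layout): on 64-bit builds the compiler inserts 4 bytes of padding after the 32-bit type field, and copying out a never-zeroed struct ships whatever stale stack bytes sit in that hole; zeroing the whole struct before filling it is the usual fix.

    #include <stdio.h>
    #include <stddef.h>
    #include <string.h>
    #include <stdint.h>

    struct demo_msg {
        uint32_t type;      /* 4 bytes */
                            /* 4 bytes of compiler padding on 64-bit */
        uint64_t payload;   /* 8-byte alignment forces the hole above */
    };

    int main(void)
    {
        struct demo_msg msg;    /* uninitialized: the padding holds stale stack data */

        printf("sizeof=%zu, payload offset=%zu (hole of %zu bytes after type)\n",
               sizeof(msg), offsetof(struct demo_msg, payload),
               offsetof(struct demo_msg, payload) - sizeof(msg.type));

        /* The fix pattern: clear the whole struct before filling the fields
         * that will be copied to userspace, so padding can never leak. */
        memset(&msg, 0, sizeof(msg));
        msg.type = 1;
        msg.payload = 42;
        return 0;
    }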
2018-06-26HID: wacom: Correct logical maximum Y for 2nd-gen Intuos Pro largeJason Gerecke
commit d471b6b22d37bf9928c6d0202bdaaf76583b8b61 upstream. The HID descriptor for the 2nd-gen Intuos Pro large (PTH-860) contains a typo which defines an incorrect logical maximum Y value. This causes a small portion of the bottom of the tablet to become unusable (both because the area is below the "bottom" of the tablet and because 'wacom_wac_event' ignores out-of-range values). It also results in a skewed aspect ratio. To fix this, we add a quirk to 'wacom_usage_mapping' which overwrites the data with the correct value. Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com> CC: stable@vger.kernel.org # v4.10+ Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26HID: intel_ish-hid: ipc: register more pm callbacks to support hibernationEven Xu
commit ebeaa367548e9e92dd9374b9464ff6e7d157117b upstream. Current ISH driver only registers suspend/resume PM callbacks which don't support hibernation (suspend to disk). Basically after hibernation, the ISH can't resume properly and user may not see sensor events (for example: screen rotation may not work). User will not see a crash or panic or anything except the following message in log: hid-sensor-hub 001F:8086:22D8.0001: timeout waiting for response from ISHTP device So this patch adds support for S4/hibernation to ISH by using the SIMPLE_DEV_PM_OPS() macro instead of struct dev_pm_ops directly. The suspend and resume functions will now be used for both suspend to RAM and hibernation. If power management is disabled, SIMPLE_DEV_PM_OPS will do nothing, the suspend and resume related functions won't be used, so mark them as __maybe_unused to clarify that this is the intended behavior, and remove #ifdefs for power management. Cc: stable@vger.kernel.org Signed-off-by: Even Xu <even.xu@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
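A rough kernel-context sketch of the approach (function and variable names here are illustrative, not the actual intel-ish-hid code): SIMPLE_DEV_PM_OPS() binds one suspend/resume pair to both the suspend-to-RAM and hibernation transitions, and __maybe_unused keeps the build quiet when CONFIG_PM_SLEEP is off.

    #include <linux/pm.h>

    static int __maybe_unused ish_suspend(struct device *dev)
    {
        /* quiesce the device; also invoked on the way into hibernation */
        return 0;
    }

    static int __maybe_unused ish_resume(struct device *dev)
    {
        /* bring the device back; also invoked when restoring from a hibernation image */
        return 0;
    }

    static SIMPLE_DEV_PM_OPS(ish_pm_ops, ish_suspend, ish_resume);
    /* ...and the driver's .driver.pm then points at &ish_pm_ops */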
2018-06-26orangefs: report attributes_mask and attributes for statxMartin Brandenburg
commit 7f54910fa8dfe504f2e1563f4f6ddc3294dfbf3a upstream. OrangeFS formerly failed to set attributes_mask with the result that software could not see immutable and append flags present in the filesystem. Reported-by: Becky Ligon <ligon@clemson.edu> Signed-off-by: Martin Brandenburg <martin@omnibond.com> Fixes: 68a24a6cc4a6 ("orangefs: implement statx") Cc: stable@vger.kernel.org Cc: hubcap@omnibond.com Signed-off-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26orangefs: set i_size on new symlinkMartin Brandenburg
commit f6a4b4c9d07dda90c7c29dae96d6119ac6425dca upstream. As long as a symlink inode remains in-core, the destination (and therefore size) will not be re-fetched from the server, as it cannot change. The original implementation of the attribute cache assumed that setting the expiry time in the past was sufficient to cause a re-fetch of all attributes on the next getattr. That does not work in this case. The bug manifested itself as follows. When the command sequence touch foo; ln -s foo bar; ls -l bar is run, the output was lrwxrwxrwx. 1 fedora fedora 4906 Apr 24 19:10 bar -> foo However, after a re-mount, ls -l bar produces lrwxrwxrwx. 1 fedora fedora 3 Apr 24 19:10 bar -> foo After this commit, even before a re-mount, the output is lrwxrwxrwx. 1 fedora fedora 3 Apr 24 19:10 bar -> foo Reported-by: Becky Ligon <ligon@clemson.edu> Signed-off-by: Martin Brandenburg <martin@omnibond.com> Fixes: 71680c18c8f2 ("orangefs: Cache getattr results.") Cc: stable@vger.kernel.org Cc: hubcap@omnibond.com Signed-off-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26iwlwifi: fw: harden page loading codeLuca Coelho
commit 9039d985811d5b109b58b202b7594fd24e433fed upstream. The page loading code trusts the data provided in the firmware images a bit too much and may cause a buffer overflow or copy unknown data if the block sizes don't match what we expect. To prevent potential problems, harden the code by checking if the sizes we are copying are what we expect. Cc: stable@vger.kernel.org Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26x86/intel_rdt: Enable CMT and MBM on new Skylake steppingTony Luck
commit 1d9f3e20a56d33e55748552aeec597f58542f92d upstream. New stepping of Skylake has fixes for cache occupancy and memory bandwidth monitoring. Update the code to enable these by default on newer steppings. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: stable@vger.kernel.org # v4.14 Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Link: https://lkml.kernel.org/r/20180608160732.9842-1-tony.luck@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26genirq/migration: Avoid out of line call if pending is not setThomas Gleixner
commit d340ebd696f921d3ad01b8c0c29dd38f2ad2bf3e upstream. The upcoming fix for the -EBUSY return from affinity settings requires to use the irq_move_irq() functionality even on irq remapped interrupts. To avoid the out of line call, move the check for the pending bit into an inline helper. Preparatory change for the real fix. No functional change. Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <liu.song.a23@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Link: https://lkml.kernel.org/r/20180604162224.471925894@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26genirq/affinity: Defer affinity setting if irq chip is busyThomas Gleixner
commit 12f47073a40f6aa75119d8f5df4077b7f334cced upstream. The case that interrupt affinity setting fails with -EBUSY can be handled in the kernel completely by using the already available generic pending infrastructure. If an irq_chip::set_affinity() call fails with -EBUSY, handle it like the interrupts for which irq_chip::set_affinity() can only be invoked from interrupt context. Copy the new affinity mask to irq_desc::pending_mask and set the affinity pending bit. The next raised interrupt for the affected irq will check the pending bit and try to set the new affinity from the handler. This avoids returning -EBUSY when an affinity change is requested from user space and the previous change has not been cleaned up yet. The new affinity will take effect when the next interrupt is raised from the device. Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <liu.song.a23@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Link: https://lkml.kernel.org/r/20180604162224.819273597@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26genirq/generic_pending: Do not lose pending affinity updateThomas Gleixner
commit a33a5d2d16cb84bea8d5f5510f3a41aa48b5c467 upstream. The generic pending interrupt mechanism moves interrupts from the interrupt handler on the original target CPU to the new destination CPU. This is required for x86 and ia64 due to the way the interrupt delivery and acknowledge works if the interrupts are not remapped. However that update can fail for various reasons. Some of them are valid reasons to discard the pending update, but the case when the previous move has not been fully cleaned up yet is not a legitimate reason to fail. Check the return value of irq_do_set_affinity() for -EBUSY, which indicates a pending cleanup, and rearm the pending move in the irq descriptor so it's tried again when the next interrupt arrives. Fixes: 996c591227d9 ("x86/irq: Plug vector cleanup race") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <liu.song.a23@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Link: https://lkml.kernel.org/r/20180604162224.386544292@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26irq_remapping: Use apic_ack_irq()Thomas Gleixner
commit 8a2b7d142e7ac477d52f5f92251e59fc136d7ddd upstream. To address the EBUSY fail of interrupt affinity settings in case that the previous setting has not been cleaned up yet, use the new apic_ack_irq() function instead of the special ir_ack_apic_edge() implementation which is merely a wrapper around ack_APIC_irq(). Preparatory change for the real fix. Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <liu.song.a23@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Link: https://lkml.kernel.org/r/20180604162224.555716895@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26x86/platform/uv: Use apic_ack_irq()Thomas Gleixner
commit 839b0f1c4ef674cd929a42304c078afca278581a upstream. To address the EBUSY fail of interrupt affinity settings in case that the previous setting has not been cleaned up yet, use the new apic_ack_irq() function instead of the special uv_ack_apic() implementation which is merely a wrapper around ack_APIC_irq(). Preparatory change for the real fix. Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup") Reported-by: Song Liu <liu.song.a23@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Link: https://lkml.kernel.org/r/20180604162224.721691398@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26x86/ioapic: Use apic_ack_irq()Thomas Gleixner
commit 2b04e46d8d0b9b7ac08ded672e3eab823f01d77a upstream. To address the EBUSY fail of interrupt affinity settings in case that the previous setting has not been cleaned up yet, use the new apic_ack_irq() function instead of directly invoking ack_APIC_irq(). Preparatory change for the real fix Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <liu.song.a23@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Link: https://lkml.kernel.org/r/20180604162224.639011135@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26x86/apic: Provide apic_ack_irq()Thomas Gleixner
commit c0255770ccdc77ef2184d2a0a2e0cde09d2b44a4 upstream. apic_ack_edge() is explicitly for handling interrupt affinity cleanup when interrupt remapping is not available or disabled. Remapped interrupts and also some of the platform specific special interrupts, e.g. UV, invoke ack_APIC_irq() directly. To address the issue of failing an affinity update with -EBUSY the delayed affinity mechanism can be reused, but ack_APIC_irq() does not handle that. Adding this to ack_APIC_irq() is not possible, because that function is also used for exceptions and directly handled interrupts like IPIs. Create a new function, which just contains the conditional invocation of irq_move_irq() and the final ack_APIC_irq(). Reuse the new function in apic_ack_edge(). Preparatory change for the real fix. Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <liu.song.a23@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Link: https://lkml.kernel.org/r/20180604162224.471925894@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
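Based on the description above, the new helper is essentially the following (a sketch; the pending-move conditional lives inside irq_move_irq(), per the genirq/migration change in this same series):

    void apic_ack_irq(struct irq_data *irqd)
    {
        irq_move_irq(irqd);     /* perform a pending affinity move, if one is armed */
        ack_APIC_irq();         /* then acknowledge the interrupt at the local APIC */
    }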
2018-06-26x86/apic/vector: Prevent hlist corruption and leaksThomas Gleixner
commit 80ae7b1a918e78b0bae88b0c0ad413d3fdced968 upstream. Several people observed the WARN_ON() in irq_matrix_free() which triggers when the caller tries to free a vector which is not in the allocation range. Song provided the trace information which allowed the root cause to be decoded. The rework of the vector allocation mechanism failed to preserve a sanity check, which prevents setting a new target vector/CPU when the previous affinity change has not fully completed. As a result a half-finished affinity change can be overwritten, which can cause the leak of an irq descriptor pointer on the previous target CPU and double enqueue of the hlist head into the cleanup lists of two or more CPUs. After one CPU cleaned up its vector the next CPU will invoke the cleanup handler with vector 0, which triggers the out of range warning in the matrix allocator. Prevent this by checking the apic_data of the interrupt whether the move_in_progress flag is false and the hlist node is not hashed. Return -EBUSY if not. This prevents the damage and restores the behaviour before the vector allocation rework, but due to other changes in that area it also widens the chance that user space can observe -EBUSY. In theory this should be fine, but actually not all user space tools handle -EBUSY correctly. Addressing that is not part of this fix, but will be addressed in follow up patches. Fixes: 69cde0004a4b ("x86/vector: Use matrix allocator for vector assignment") Reported-by: Dmitry Safonov <0x7f454c46@gmail.com> Reported-by: Tariq Toukan <tariqt@mellanox.com> Reported-by: Song Liu <liu.song.a23@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/20180604162224.303870257@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26x86/vector: Fix the args of vector_alloc tracepointDou Liyang
commit 838d76d63ec4eaeaa12bedfa50f261480f615200 upstream. The vector_alloc tracepoint reversed the reserved and ret args, which made the trace print wrong. Exchange them. Fixes: 8d1e3dca7de6 ("x86/vector: Add tracepoints for vector management") Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: hpa@zytor.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180601065031.21872-1-douly.fnst@cn.fujitsu.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26w1: mxc_w1: Enable clock before calling clk_get_rate() on itStefan Potyra
commit 955bc61328dc0a297fb3baccd84e9d3aee501ed8 upstream. According to the API, you may only call clk_get_rate() after actually enabling it. Found by Linux Driver Verification project (linuxtesting.org). Fixes: a5fd9139f74c ("w1: add 1-wire master driver for i.MX27 / i.MX31") Signed-off-by: Stefan Potyra <Stefan.Potyra@elektrobit.com> Acked-by: Evgeniy Polyakov <zbr@ioremap.net> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
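The pattern the fix follows, sketched with illustrative variable names rather than the exact mxc_w1 code, is to enable the clock and only then query its rate:

    ret = clk_prepare_enable(mdev->clk);    /* enable first... */
    if (ret)
        return ret;

    clkrate = clk_get_rate(mdev->clk);      /* ...then the rate is valid to read */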
2018-06-26nvme/pci: Sync controller reset for AER slot_resetKeith Busch
commit cc1d5e749a2e1cf59fa940b976181e631d6985e1 upstream. AER handling expects that a successful return from slot_reset means the driver made the device functional again. The nvme driver had been using an asynchronous reset to recover the device, so the device may still be initializing after control is returned to the AER handler. This creates problems for subsequent event handling, causing the initialization to fail. This patch fixes that by syncing the controller reset before returning to the AER driver, and reporting the true state of the reset. Link: https://bugzilla.kernel.org/show_bug.cgi?id=199657 Reported-by: Alex Gagniuc <mr.nuke.me@gmail.com> Cc: Sinan Kaya <okaya@codeaurora.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Tested-by: Alex Gagniuc <mr.nuke.me@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26libata: Drop SanDisk SD7UB3Q*G1001 NOLPM quirkHans de Goede
commit 2cfce3a86b64b53f0a70e92a6a659c720c319b45 upstream. Commit 184add2ca23c ("libata: Apply NOLPM quirk for SanDisk SD7UB3Q*G1001 SSDs") disabled LPM for SanDisk SD7UB3Q*G1001 SSDs. This has led to several reports from users of that SSD for whom LPM was working fine and who now have a significantly increased idle power consumption on their laptops. Likely there is another problem on the T450s from the original reporter which gets exposed by the uncore reaching deeper sleep states (higher PC-states) due to LPM being enabled. The problem as reported, a hard freeze about once a day, already did not sound like it would be caused by LPM and the reports of the SSD working fine confirm this. The original reporter is ok with dropping the quirk. An X250 user has reported the same hard freeze problem and for him the problem went away after unrelated updates; I suspect some GPU driver stack changes fixed things. TL;DR: The original reporter's problem was triggered by LPM but is not an LPM issue, so drop the quirk for the SSD in question. BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1583207 Cc: stable@vger.kernel.org Cc: Richard W.M. Jones <rjones@redhat.com> Cc: Lorenzo Dalrio <lorenzo.dalrio@gmail.com> Reported-by: Lorenzo Dalrio <lorenzo.dalrio@gmail.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: "Richard W.M. Jones" <rjones@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26libata: zpodd: small read overflow in eject_tray()Dan Carpenter
commit 18c9a99bce2a57dfd7e881658703b5d7469cc7b9 upstream. We read from the cdb[] buffer in ata_exec_internal_sg(). It has to be ATAPI_CDB_LEN (16) bytes long, but this buffer is only 12 bytes. Fixes: 213342053db5 ("libata: handle power transition of ODD") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
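In essence the fix sizes the buffer to what ata_exec_internal_sg() will read rather than to the bytes explicitly initialized; an illustrative declaration (not the exact zpodd code):

    /* before: a 12-byte array, 4 bytes short of what the callee reads */
    /* after:  the full ATAPI CDB length, trailing bytes zero-initialized */
    u8 cdb[ATAPI_CDB_LEN] = { 0 };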
2018-06-26cpufreq: governors: Fix long idle detection logic in load calculationChen Yu
commit 7592019634f8473f0b0973ce79297183077bdbc2 upstream. According to the current code implementation, detecting the long idle period is done by checking if the interval between two adjacent utilization update handlers is long enough. Although this mechanism can detect if the idle period is long enough (no utilization hooks invoked during the idle period), it might not cover a corner case: if the task has occupied the CPU for so long that no context switches occur during that period, then no utilization handler will be launched until this high-priority task is scheduled out. As a result, the idle_periods field might be calculated incorrectly because it regards the 100% load as 0% and confuses the conservative governor, which uses this field. Change the detection to compare the idle_time with sampling_rate directly. Reported-by: Artem S. Tashkinov <t.artem@mailcity.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26cpufreq: ti-cpufreq: Fix an incorrect error return valueSuman Anna
commit e5d295b06d69a1924665a16a4987be475addd00f upstream. Commit 05829d9431df (cpufreq: ti-cpufreq: kfree opp_data when failure) has fixed a memory leak in the failure path, however the patch returned a positive value on get_cpu_device() failure instead of the previous negative value. Fix this incorrect error return value properly. Fixes: 05829d9431df (cpufreq: ti-cpufreq: kfree opp_data when failure) Cc: 4.14+ <stable@vger.kernel.org> # v4.14+ Signed-off-by: Suman Anna <s-anna@ti.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
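A sketch of the corrected error path (the label name and exact error code here are illustrative assumptions, not necessarily what the driver uses):

    cpu_dev = get_cpu_device(0);
    if (!cpu_dev) {
        ret = -ENODEV;          /* previously a positive value leaked out here */
        goto free_opp_data;     /* hypothetical label: still free opp_data on failure */
    }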
2018-06-26cpufreq: Fix new policy initialization during limits updates via sysfsTao Wang
commit c7d1f119c48f64bebf0fa1e326af577c6152fe30 upstream. If the policy limits are updated via cpufreq_update_policy() and subsequently via sysfs, the limits stored in user_policy may be set incorrectly. For example, if both min and max are set via sysfs to the maximum available frequency, user_policy.min and user_policy.max will also be the maximum. If a policy notifier triggered by cpufreq_update_policy() lowers both the min and the max at this point, that change is not reflected by the user_policy limits, so if the max is updated again via sysfs to the same lower value, then user_policy.max will be lower than user_policy.min which shouldn't happen. In particular, if one of the policy CPUs is then taken offline and back online, cpufreq_set_policy() will fail for it due to a failing limits check. To prevent that from happening, initialize the min and max fields of the new_policy object to the ones stored in user_policy that were previously set via sysfs. Signed-off-by: Kevin Wangtao <kevin.wangtao@hisilicon.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Subject & changelog ] Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
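The essence of the fix as described, sketched with the relevant assignments (simplified; surrounding code omitted):

    /* seed the policy being rebuilt from the sysfs-visible user limits */
    memcpy(&new_policy, policy, sizeof(*policy));
    new_policy.min = policy->user_policy.min;
    new_policy.max = policy->user_policy.max;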
2018-06-26bdi: Move cgroup bdi_writeback to a dedicated low concurrency workqueueTejun Heo
commit f183464684190bacbfb14623bd3e4e51b7575b4c upstream. cgwb_release() punts the actual release to cgwb_release_workfn() on system_wq. Depending on the number of cgroups or block devices, there can be a lot of cgwb_release_workfn() in flight at the same time. We're periodically seeing close to 256 kworkers getting stuck with the following stack trace and over time the entire system gets stuck. [<ffffffff810ee40c>] _synchronize_rcu_expedited.constprop.72+0x2fc/0x330 [<ffffffff810ee634>] synchronize_rcu_expedited+0x24/0x30 [<ffffffff811ccf23>] bdi_unregister+0x53/0x290 [<ffffffff811cd1e9>] release_bdi+0x89/0xc0 [<ffffffff811cd645>] wb_exit+0x85/0xa0 [<ffffffff811cdc84>] cgwb_release_workfn+0x54/0xb0 [<ffffffff810a68d0>] process_one_work+0x150/0x410 [<ffffffff810a71fd>] worker_thread+0x6d/0x520 [<ffffffff810ad3dc>] kthread+0x12c/0x160 [<ffffffff81969019>] ret_from_fork+0x29/0x40 [<ffffffffffffffff>] 0xffffffffffffffff The events leading to the lockup are... 1. A lot of cgwb_release_workfn() is queued at the same time and all system_wq kworkers are assigned to execute them. 2. They all end up calling synchronize_rcu_expedited(). One of them wins and tries to perform the expedited synchronization. 3. However, that involves queueing rcu_exp_work to system_wq and waiting for it. Because #1 is holding all available kworkers on system_wq, rcu_exp_work can't be executed. cgwb_release_workfn() is waiting for synchronize_rcu_expedited() which in turn is waiting for cgwb_release_workfn() to free up some of the kworkers. We shouldn't be scheduling hundreds of cgwb_release_workfn() at the same time. There's nothing to be gained from that. This patch updates the cgwb release path to use a dedicated percpu workqueue with @max_active of 1. While this resolves the problem at hand, it might be a good idea to isolate rcu_exp_work to its own workqueue too as it can be used from various paths and is prone to this sort of indirect A-A deadlock. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
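The shape of the fix (identifiers illustrative, not the exact backing-dev.c code): give the cgwb release work its own workqueue with max_active = 1 so it can no longer occupy every system_wq kworker.

    static struct workqueue_struct *cgwb_release_wq;

    /* created once during init; the init path fails if allocation fails */
    cgwb_release_wq = alloc_workqueue("cgwb_release", 0, 1);
    if (!cgwb_release_wq)
        return -ENOMEM;

    /* the release path then queues here instead of onto system_wq */
    queue_work(cgwb_release_wq, &wb->release_work);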
2018-06-26blk-mq: reinit q->tag_set_list entry only after grace periodRoman Pen
commit a347c7ad8edf4c5685154f3fdc3c12fc1db800ba upstream. It is not allowed to reinit the q->tag_set_list list entry while an RCU grace period has not completed yet, otherwise the following soft lockup in blk_mq_sched_restart() happens: [ 1064.252652] watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [fio:9270] [ 1064.254445] task: ffff99b912e8b900 task.stack: ffffa6d54c758000 [ 1064.254613] RIP: 0010:blk_mq_sched_restart+0x96/0x150 [ 1064.256510] Call Trace: [ 1064.256664] <IRQ> [ 1064.256824] blk_mq_free_request+0xea/0x100 [ 1064.256987] msg_io_conf+0x59/0xd0 [ibnbd_client] [ 1064.257175] complete_rdma_req+0xf2/0x230 [ibtrs_client] [ 1064.257340] ? ibtrs_post_recv_empty+0x4d/0x70 [ibtrs_core] [ 1064.257502] ibtrs_clt_rdma_done+0xd1/0x1e0 [ibtrs_client] [ 1064.257669] ib_create_qp+0x321/0x380 [ib_core] [ 1064.257841] ib_process_cq_direct+0xbd/0x120 [ib_core] [ 1064.258007] irq_poll_softirq+0xb7/0xe0 [ 1064.258165] __do_softirq+0x106/0x2a2 [ 1064.258328] irq_exit+0x92/0xa0 [ 1064.258509] do_IRQ+0x4a/0xd0 [ 1064.258660] common_interrupt+0x7a/0x7a [ 1064.258818] </IRQ> Meanwhile another context frees another queue but with the same set of shared tags: [ 1288.201183] INFO: task bash:5910 blocked for more than 180 seconds. [ 1288.201833] bash D 0 5910 5820 0x00000000 [ 1288.202016] Call Trace: [ 1288.202315] schedule+0x32/0x80 [ 1288.202462] schedule_timeout+0x1e5/0x380 [ 1288.203838] wait_for_completion+0xb0/0x120 [ 1288.204137] __wait_rcu_gp+0x125/0x160 [ 1288.204287] synchronize_sched+0x6e/0x80 [ 1288.204770] blk_mq_free_queue+0x74/0xe0 [ 1288.204922] blk_cleanup_queue+0xc7/0x110 [ 1288.205073] ibnbd_clt_unmap_device+0x1bc/0x280 [ibnbd_client] [ 1288.205389] ibnbd_clt_unmap_dev_store+0x169/0x1f0 [ibnbd_client] [ 1288.205548] kernfs_fop_write+0x109/0x180 [ 1288.206328] vfs_write+0xb3/0x1a0 [ 1288.206476] SyS_write+0x52/0xc0 [ 1288.206624] do_syscall_64+0x68/0x1d0 [ 1288.206774] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 What happened is the following: 1. There are several MQ queues with shared tags. 2. One queue is about to be freed and the task is now in blk_mq_del_queue_tag_set(). 3. Another CPU is in blk_mq_sched_restart() and loops over all queues in the tag list in order to find an hctx to restart. Because the linked list entry was modified in blk_mq_del_queue_tag_set() without proper waiting for a grace period, blk_mq_sched_restart() never ends, spinning in list_for_each_entry_rcu_rr(), thus the soft lockup. The fix is simple: reinit the list entry after an RCU grace period has elapsed. Fixes: 705cda97ee3a ("blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list") Cc: stable@vger.kernel.org Cc: Sagi Grimberg <sagi@grimberg.me> Cc: linux-block@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26nbd: use bd_set_size when updating disk sizeJosef Bacik
commit 9e2b19675d1338d2a38e99194756f2db44a081df upstream. When we stopped relying on the bdev everywhere I broke updating the block device size on the fly, which ceph relies on. We can't just do set_capacity, we also have to do bd_set_size so things like parted will notice the device size change. Fixes: 29eaadc ("nbd: stop using the bdev everywhere") cc: stable@vger.kernel.org Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
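The pattern described above, sketched with illustrative names: update both the gendisk capacity and the block device size so tools such as parted see the new size.

    set_capacity(nbd->disk, bytesize >> 9);     /* gendisk capacity, in 512-byte sectors */
    if (bdev)
        bd_set_size(bdev, bytesize);            /* keep the bdev inode size in sync too */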
2018-06-26nbd: update size when connectedJosef Bacik
commit c3f7c9397609705ef848cc98a5fb429b3e90c3c4 upstream. I messed up changing the size of an NBD device while it was connected by not actually updating the device or doing the uevent. Fix this by updating everything if we're connected and we change the size. cc: stable@vger.kernel.org Fixes: 639812a ("nbd: don't set the device size until we're connected") Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26nbd: fix nbd device deletionJosef Bacik
commit 8364da4751cf22201d74933d5e634176f44ed407 upstream. This fixes a use after free bug, we shouldn't be doing disk->queue right after we do del_gendisk(disk). Save the queue and do the cleanup after the del_gendisk. Fixes: c6a4759ea0c9 ("nbd: add device refcounting") cc: stable@vger.kernel.org Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26cifs: For SMB2 security information query, check for minimum sized security descriptor instead of sizeof FileAllInformation classShirish Pargaonkar
commit ee25c6dd7b05113783ce1f4fab6b30fc00d29b8d upstream. Validate_buf() function checks for an expected minimum sized response passed to query_info() function. For security information, the size of a security descriptor can be smaller (one subauthority, no ACEs) than the size of the structure that defines FileInfoClass of FileAllInformation. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199725 Cc: <stable@vger.kernel.org> Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Noah Morrison <noah.morrison@rubrik.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26CIFS: 511c54a2f69195b28afb9dd119f03787b1625bb4 adds a check for session expiryMark Syms
commit d81243c697ffc71f983736e7da2db31a8be0001f upstream. Handle this additional status in the same way as SESSION_EXPIRED. Signed-off-by: Mark Syms <mark.syms@citrix.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26smb3: on reconnect set PreviousSessionId fieldSteve French
commit b2adf22fdfba85a6701c481faccdbbb3a418ccfc upstream. The server detects reconnect by the (non-zero) value in PreviousSessionId of SMB2/SMB3 SessionSetup request, but this behavior regressed due to commit 166cea4dc3a4f66f020cfb9286225ecd228ab61d ("SMB2: Separate RawNTLMSSP authentication from SMB2_sess_setup") CC: Stable <stable@vger.kernel.org> CC: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26smb3: fix various xid leaksSteve French
commit cfe89091644c441a1ade6dae6d2e47b715648615 upstream. Fix a few cases where we were not freeing the xid which led to active requests being non-zero at unmount time. Signed-off-by: Steve French <smfrench@gmail.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26x86/MCE: Fix stack out-of-bounds write in mce-inject.c: Flags_read()Tony Luck
commit 985c78d3ff8e9c74450fa2bb08eb55e680d999ca upstream. Each of the strings that we want to put into the buf[MAX_FLAG_OPT_SIZE] in flags_read() is two characters long. But the sprintf() adds a trailing newline and will add a terminating NUL byte. So MAX_FLAG_OPT_SIZE needs to be 4. sprintf() calls vsnprintf() and *that* does return: " * The return value is the number of characters which would * be generated for the given input, excluding the trailing * '\0', as per ISO C99." Note the "excluding". Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20180427163707.ktaiysvbk3yhk4wm@agluck-desk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
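A standalone demonstration of the arithmetic: "hw" plus the '\n' appended by the format plus the terminating NUL needs four bytes, which snprintf()'s C99 return value makes visible.

    #include <stdio.h>

    int main(void)
    {
        char buf[4];                /* the corrected MAX_FLAG_OPT_SIZE */
        int needed = snprintf(buf, sizeof(buf), "%s\n", "hw");

        /* 'needed' excludes the trailing NUL, per C99, so 3 means 4 bytes total */
        printf("characters needed: %d (+1 for the NUL)\n", needed);
        return 0;
    }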
2018-06-26ALSA: hda: add dock and led support for HP ProBook 640 G4Dennis Wassenberg
commit 7eef32c1ef895a3a96463f9cbd04203007cd5555 upstream. This patch adds missing initialisation for HP 2013 UltraSlim Dock Line-In/Out PINs and activates keyboard mute/micmute leds for HP ProBook 640 G4 Signed-off-by: Dennis Wassenberg <dennis.wassenberg@secunet.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26ALSA: hda: add dock and led support for HP EliteBook 830 G5Dennis Wassenberg
commit 2861751f67b91e1d24e68010ced96614fb3140f4 upstream. This patch adds missing initialisation for HP 2013 UltraSlim Dock Line-In/Out PINs and activates keyboard mute/micmute leds for HP EliteBook 830 G5 Signed-off-by: Dennis Wassenberg <dennis.wassenberg@secunet.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26ALSA: hda - Handle kzalloc() failure in snd_hda_attach_pcm_stream()Bo Chen
commit a3aa60d511746bd6c0d0366d4eb90a7998bcde8b upstream. When 'kzalloc()' fails in 'snd_hda_attach_pcm_stream()', a new pcm instance is created without setting its operators via 'snd_pcm_set_ops()'. Following operations on the new pcm instance can trigger kernel null pointer dereferences and cause kernel oops. This bug was found with my work on building a gray-box fault-injection tool for linux-kernel-module binaries. A kernel null pointer dereference was confirmed from line 'substream->ops->open()' in function 'snd_pcm_open_substream()' in file 'sound/core/pcm_native.c'. This patch fixes the bug by calling 'snd_device_free()' in the error handling path of 'kzalloc()', which removes the new pcm instance from the snd card before returns with an error code. Signed-off-by: Bo Chen <chenbo@pdx.edu> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
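A sketch of the corrected error path based on the description above (identifiers simplified and illustrative, not the exact hda_controller.c code):

    err = snd_pcm_new(chip->card, name, device, playback_cnt, capture_cnt, &pcm);
    if (err < 0)
        return err;

    apcm = kzalloc(sizeof(*apcm), GFP_KERNEL);
    if (!apcm) {
        snd_device_free(chip->card, pcm);   /* don't leave an ops-less pcm on the card */
        return -ENOMEM;
    }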
2018-06-26ALSA: hda/conexant - Add fixup for HP Z2 G4 workstationTakashi Iwai
commit f16041df4c360eccacfe90f96673b37829e4c959 upstream. HP Z2 G4 requires the same workaround as other HP machines that have no mic-pin detection. Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26ALSA: hda/realtek - Enable mic-mute hotkey for several Lenovo AIOsHui Wang
commit 986376b68dcc95bb7df60ad30c2353c1f7578fa5 upstream. We have several Lenovo AIOs like M810z, M820z and M920z, they have the same design for mic-mute hotkey and led and they use the same codec with the same pin configuration, so use the pin conf table to apply fix to all of them. Fixes: 29693efcea0f ("ALSA: hda - Fix micmute hotkey problem for a lenovo AIO machine") Cc: <stable@vger.kernel.org> Signed-off-by: Hui Wang <hui.wang@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26ALSA: usb-audio: Disable the quirk for Nura headsetTakashi Iwai
commit 5ebf6b1e459606d7fbf4fc67d2c28a6540953d93 upstream. The commit 33193dca671c ("ALSA: usb-audio: Add a quirk for Nura's first gen headset") added a quirk for Nura headset with USB ID 0a12:1243, with a hope that it doesn't conflict with others. Unfortunately, other devices (e.g. Philips Wecall) with the very same ID got broken by this change, spewing an error like: usb 2-1.8.2: 2:1: cannot set freq 48000 to ep 0x3 Until we find a proper solution, fix the regression at first by disabling the added quirk entry. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199905 Fixes: 33193dca671c ("ALSA: usb-audio: Add a quirk for Nura's first gen headset") Reviewed-by: Martin Peres <martin.peres@free.fr> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26btrfs: scrub: Don't use inode pages for device replaceQu Wenruo
commit ac0b4145d662a3b9e34085dea460fb06ede9b69b upstream. [BUG] Btrfs can create a compressed extent without checksum (even though it shouldn't), and if we then try to replace a device containing such an extent, the resulting device will contain all the uncompressed data instead of the compressed one. Test case already submitted to fstests: https://patchwork.kernel.org/patch/10442353/ [CAUSE] When handling a compressed extent without checksum, device replace will go into the copy_nocow_pages() function. In that function, btrfs will get all inodes referring to these data extents and then use find_or_create_page() to get pages direct from that inode. The problem here is that pages taken directly from the inode are always uncompressed. And for a compressed data extent, they mismatch with the on-disk data. Thus this leads to a corrupted compressed data extent being written to the replace device. [FIX] In this attempt, we could just remove the "optimization" branch, and let the unified scrub_pages() handle it. Although scrub_pages() won't bother reusing the page cache, it will be a little slower, but it does the correct csum checking and won't cause such data corruption caused by the "optimization". Note about the fix: this is the minimal fix that can be backported to older stable trees without conflicts. The whole callchain from copy_nocow_pages() can be deleted, and will be in followup patches. Fixes: ff023aac3119 ("Btrfs: add code to scrub to copy read data to another disk") CC: stable@vger.kernel.org # 4.4+ Reported-by: James Harvey <jamespharvey20@gmail.com> Reviewed-by: James Harvey <jamespharvey20@gmail.com> Signed-off-by: Qu Wenruo <wqu@suse.com> [ remove code removal, add note why ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26btrfs: return error value if create_io_em failed in cow_file_rangeSu Yue
commit 090a127afa8f73e9618d4058d6755f7ec7453dd6 upstream. In cow_file_range(), create_io_em() may fail, but its return value is not recorded. The return value may then be 0 even though it failed, which is wrong behavior. Let cow_file_range() return PTR_ERR(em) if create_io_em() failed. Fixes: 6f9994dbabe5 ("Btrfs: create a helper to create em for IO") CC: stable@vger.kernel.org # 4.11+ Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
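A sketch of the fix described above (the argument list is elided and the cleanup label is a hypothetical placeholder):

    em = create_io_em(/* existing arguments unchanged */);
    if (IS_ERR(em)) {
        ret = PTR_ERR(em);      /* propagate the failure instead of returning 0 */
        goto out_reserve;       /* hypothetical cleanup label */
    }
    free_extent_map(em);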
2018-06-26Btrfs: fix memory and mount leak in btrfs_ioctl_rm_dev_v2()Omar Sandoval
commit fd4e994bd1f9dc9628e168a7f619bf69f6984635 upstream. If we have invalid flags set, when we error out we must drop our writer counter and free the buffer we allocated for the arguments. This bug is trivially reproduced with the following program on 4.7+: #include <fcntl.h> #include <stdint.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <sys/ioctl.h> #include <sys/stat.h> #include <sys/types.h> #include <linux/btrfs.h> #include <linux/btrfs_tree.h> int main(int argc, char **argv) { struct btrfs_ioctl_vol_args_v2 vol_args = { .flags = UINT64_MAX, }; int ret; int fd; if (argc != 2) { fprintf(stderr, "usage: %s PATH\n", argv[0]); return EXIT_FAILURE; } fd = open(argv[1], O_WRONLY); if (fd == -1) { perror("open"); return EXIT_FAILURE; } ret = ioctl(fd, BTRFS_IOC_RM_DEV_V2, &vol_args); if (ret == -1) perror("ioctl"); close(fd); return EXIT_SUCCESS; } When unmounting the filesystem, we'll hit the WARN_ON(mnt_get_writers(mnt)) in cleanup_mnt() and also may prevent the filesystem to be remounted read-only as the writer count will stay lifted. Fixes: 6b526ed70cf1 ("btrfs: introduce device delete by devid") CC: stable@vger.kernel.org # 4.9+ Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26Btrfs: fix clone vs chattr NODATASUM raceOmar Sandoval
commit b5c40d598f5408bd0ca22dfffa82f03cd9433f23 upstream. In btrfs_clone_files(), we must check the NODATASUM flag while the inodes are locked. Otherwise, it's possible that btrfs_ioctl_setflags() will change the flags after we check and we can end up with a partly checksummed file. The race window is only a few instructions in size, between the if and the locks which is: 3834 if (S_ISDIR(src->i_mode) || S_ISDIR(inode->i_mode)) 3835 return -EISDIR; where the setflags must be run and toggle the NODATASUM flag (provided the file size is 0). The clone will block on the inode lock, setflags takes the inode lock, changes the flags, releases the lock and the clone continues. Not impossible but still needs a lot of bad luck to hit unintentionally. Fixes: 0e7b824c4ef9 ("Btrfs: don't make a file partly checksummed through file clone") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26driver core: Don't ignore class_dir_create_and_add() failure.Tetsuo Handa
commit 84d0c27d6233a9ba0578b20f5a09701eb66cee42 upstream. syzbot is hitting WARN() at kernfs_add_one() [1]. This is because kernfs_create_link() is confused by a previous device_add() call which continued without setting the dev->kobj.parent field when get_device_parent() failed due to memory allocation fault injection. Fix this by propagating the error from class_dir_create_and_add() to the callers of get_device_parent(). [1] https://syzkaller.appspot.com/bug?id=fae0fb607989ea744526d1c082a5b8de6529116f Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: syzbot <syzbot+df47f81c226b31d89fb1@syzkaller.appspotmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26ext4: fix fencepost error in check for inode count overflow during resizeJan Kara
commit 4f2f76f751433908364ccff82f437a57d0e6e9b7 upstream. ext4_resize_fs() has an off-by-one bug when checking whether growing of a filesystem will not overflow inode count. As a result it allows a filesystem with 8192 inodes per group to grow to 64TB which overflows inode count to 0 and makes filesystem unusable. Fix it. Cc: stable@vger.kernel.org Fixes: 3f8a6411fbada1fa482276591e037f3b1adcf55b Reported-by: Jaco Kroon <jaco@uls.co.za> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
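A standalone illustration of the fencepost arithmetic behind the report above: with 8192 inodes per group, a strict '>' comparison against 0xFFFFFFFF / 8192 still admits the boundary group count, and the 32-bit inode count wraps to zero (this demonstrates the arithmetic only, not the exact kernel check):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t inodes_per_group = 8192;
        uint64_t limit = 0xFFFFFFFFULL / inodes_per_group;  /* 524287 */
        uint64_t n_group = limit;                           /* passes a '>' check */
        uint32_t inode_count = (uint32_t)((n_group + 1) * inodes_per_group);

        /* (524287 + 1) * 8192 == 2^32, so the 32-bit count prints 0 */
        printf("limit=%llu, resulting inode count=%u\n",
               (unsigned long long)limit, inode_count);
        return 0;
    }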
2018-06-26ext4: correctly handle a zero-length xattr with a non-zero e_value_offsTheodore Ts'o
commit 8a2b307c21d4b290e3cbe33f768f194286d07c23 upstream. Ext4 will always create ext4 extended attributes which do not have a value (where e_value_size is zero) with e_value_offs set to zero. In most places e_value_offs will not be used in a substantive way if e_value_size is zero. There was one exception to this, which is in ext4_xattr_set_entry(), where if there is a maliciously crafted file system where there is an extended attribute with e_value_offs is non-zero and e_value_size is 0, the attempt to remove this xattr will result in a negative value getting passed to memmove, leading to the following sadness: [ 41.225365] EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null) [ 44.538641] BUG: unable to handle kernel paging request at ffff9ec9a3000000 [ 44.538733] IP: __memmove+0x81/0x1a0 [ 44.538755] PGD 1249bd067 P4D 1249bd067 PUD 1249c1067 PMD 80000001230000e1 [ 44.538793] Oops: 0003 [#1] SMP PTI [ 44.539074] CPU: 0 PID: 1470 Comm: poc Not tainted 4.16.0-rc1+ #1 ... [ 44.539475] Call Trace: [ 44.539832] ext4_xattr_set_entry+0x9e7/0xf80 ... [ 44.539972] ext4_xattr_block_set+0x212/0xea0 ... [ 44.540041] ext4_xattr_set_handle+0x514/0x610 [ 44.540065] ext4_xattr_set+0x7f/0x120 [ 44.540090] __vfs_removexattr+0x4d/0x60 [ 44.540112] vfs_removexattr+0x75/0xe0 [ 44.540132] removexattr+0x4d/0x80 ... [ 44.540279] path_removexattr+0x91/0xb0 [ 44.540300] SyS_removexattr+0xf/0x20 [ 44.540322] do_syscall_64+0x71/0x120 [ 44.540344] entry_SYSCALL_64_after_hwframe+0x21/0x86 https://bugzilla.kernel.org/show_bug.cgi?id=199347 This addresses CVE-2018-10840. Reported-by: "Xu, Wen" <wen.xu@gatech.edu> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Cc: stable@kernel.org Fixes: dec214d00e0d7 ("ext4: xattr inode deduplication") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26ext4: bubble errors from ext4_find_inline_data_nolock() up to ext4_iget()Theodore Ts'o
commit eb9b5f01c33adebc31cbc236c02695f605b0e417 upstream. If ext4_find_inline_data_nolock() returns an error it needs to get reflected up to ext4_iget(). In order to fix this, ext4_iget_extra_inode() needs to return an error (and not return void). This is related to "ext4: do not allow external inodes for inline data" (which fixes CVE-2018-11412) in that in the errors=continue case, it would be useful to for userspace to receive an error indicating that file system is corrupted. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>