path: root/arch
Age  Commit message  (Author)
2019-11-15  x86/bugs: Add ITLB_MULTIHIT bug infrastructure  (Vineela Tummalapalli)
commit db4d30fbb71b47e4ecb11c4efa5d8aad4b03dfae upstream. Some processors may incur a machine check error possibly resulting in an unrecoverable CPU lockup when an instruction fetch encounters a TLB multi-hit in the instruction TLB. This can occur when the page size is changed along with either the physical address or cache type. The relevant erratum can be found here: https://bugzilla.kernel.org/show_bug.cgi?id=205195 There are other processors affected for which the erratum does not fully disclose the impact. This issue affects both bare-metal x86 page tables and EPT. It can be mitigated by either eliminating the use of large pages or by using careful TLB invalidations when changing the page size in the page tables. Just like Spectre, Meltdown, L1TF and MDS, a new bit has been allocated in MSR_IA32_ARCH_CAPABILITIES (PSCHANGE_MC_NO) and will be set on CPUs which are mitigated against this issue. Signed-off-by: Vineela Tummalapalli <vineela.tummalapalli@intel.com> Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [bwh: Backported to 3.16: - Use next available X86_BUG bit - Don't use BIT() in msr-index.h because it's a UAPI header - No support for X86_VENDOR_HYGON, ATOM_AIRMONT_NP - Adjust filename, context, indentation] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
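A minimal sketch of how the new capability bit is typically consumed when the bug bits are populated (illustrative only: the wrapper function is hypothetical and the per-model whitelist handling in the backport is not shown):

  static void check_itlb_multihit(u64 ia32_cap)
  {
          /* PSCHANGE_MC_NO clear means the CPU does not declare itself immune. */
          if (!(ia32_cap & ARCH_CAP_PSCHANGE_MC_NO))
                  setup_force_cpu_bug(X86_BUG_ITLB_MULTIHIT);
  }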
2019-11-15  x86/speculation/taa: Fix printing of TAA_MSG_SMT on IBRS_ALL CPUs  (Josh Poimboeuf)
commit 012206a822a8b6ac09125bfaa210a95b9eb8f1c1 upstream. For new IBRS_ALL CPUs, the Enhanced IBRS check at the beginning of cpu_bugs_smt_update() causes the function to return early, unintentionally skipping the MDS and TAA logic. This is not a problem for MDS, because there appears to be no overlap between IBRS_ALL and MDS-affected CPUs. So the MDS mitigation would be disabled and nothing would need to be done in this function anyway. But for TAA, the TAA_MSG_SMT string will never get printed on Cascade Lake and newer. The check is superfluous anyway: when 'spectre_v2_enabled' is SPECTRE_V2_IBRS_ENHANCED, 'spectre_v2_user' is always SPECTRE_V2_USER_NONE, and so the 'spectre_v2_user' switch statement handles it appropriately by doing nothing. So just remove the check. Fixes: 1b42f017415b ("x86/speculation/taa: Add mitigation for TSX Async Abort") Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tyler Hicks <tyhicks@canonical.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-11-15  x86/tsx: Add config options to set tsx=on|off|auto  (Michal Hocko)
commit db616173d787395787ecc93eef075fa975227b10 upstream. There is a general consensus that TSX usage is not largely spread while the history shows there is a non trivial space for side channel attacks possible. Therefore the tsx is disabled by default even on platforms that might have a safe implementation of TSX according to the current knowledge. This is a fair trade off to make. There are, however, workloads that really do benefit from using TSX and updating to a newer kernel with TSX disabled might introduce a noticeable regressions. This would be especially a problem for Linux distributions which will provide TAA mitigations. Introduce config options X86_INTEL_TSX_MODE_OFF, X86_INTEL_TSX_MODE_ON and X86_INTEL_TSX_MODE_AUTO to control the TSX feature. The config setting can be overridden by the tsx cmdline options. [ bp: Text cleanups from Josh. ] Suggested-by: Borislav Petkov <bpetkov@suse.de> Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> [bwh: Backported to 3.16: adjust doc filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-11-15  x86/tsx: Add "auto" option to the tsx= cmdline parameter  (Pawan Gupta)
commit 7531a3596e3272d1f6841e0d601a614555dc6b65 upstream. Platforms which are not affected by X86_BUG_TAA may want the TSX feature enabled. Add "auto" option to the TSX cmdline parameter. When tsx=auto disable TSX when X86_BUG_TAA is present, otherwise enable TSX. More details on X86_BUG_TAA can be found here: https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html [ bp: Extend the arg buffer to accommodate "auto\0". ] Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> [bwh: Backported to 4.4: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
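A sketch of the tsx=auto policy described above, assuming the tsx_enable()/tsx_disable() helpers and the X86_BUG_TAA bit introduced elsewhere in this series (not the literal backported code):

  static void tsx_auto(void)
  {
          /* tsx=auto: disable TSX only on parts affected by TAA. */
          if (boot_cpu_has_bug(X86_BUG_TAA))
                  tsx_disable();
          else
                  tsx_enable();
  }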
2019-11-15  kvm/x86: Export MDS_NO=0 to guests when TSX is enabled  (Pawan Gupta)
commit e1d38b63acd843cfdd4222bf19a26700fd5c699e upstream. Export the IA32_ARCH_CAPABILITIES MSR bit MDS_NO=0 to guests on TSX Async Abort(TAA) affected hosts that have TSX enabled and updated microcode. This is required so that the guests don't complain, "Vulnerable: Clear CPU buffers attempted, no microcode" when the host has the updated microcode to clear CPU buffers. Microcode update also adds support for MSR_IA32_TSX_CTRL which is enumerated by the ARCH_CAP_TSX_CTRL bit in IA32_ARCH_CAPABILITIES MSR. Guests can't do this check themselves when the ARCH_CAP_TSX_CTRL bit is not exported to the guests. In this case export MDS_NO=0 to the guests. When guests have CPUID.MD_CLEAR=1, they deploy MDS mitigation which also mitigates TAA. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Neelima Krishnan <neelima.krishnan@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-11-15  x86/speculation/taa: Add sysfs reporting for TSX Async Abort  (Pawan Gupta)
commit 6608b45ac5ecb56f9e171252229c39580cc85f0f upstream. Add the sysfs reporting file for TSX Async Abort. It exposes the vulnerability and the mitigation state similar to the existing files for the other hardware vulnerabilities. Sysfs file path is: /sys/devices/system/cpu/vulnerabilities/tsx_async_abort Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Neelima Krishnan <neelima.krishnan@intel.com> Reviewed-by: Mark Gross <mgross@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-11-15  x86/speculation/taa: Add mitigation for TSX Async Abort  (Pawan Gupta)
commit 1b42f017415b46c317e71d41c34ec088417a1883 upstream. TSX Async Abort (TAA) is a side channel vulnerability to the internal buffers in some Intel processors similar to Microachitectural Data Sampling (MDS). In this case, certain loads may speculatively pass invalid data to dependent operations when an asynchronous abort condition is pending in a TSX transaction. This includes loads with no fault or assist condition. Such loads may speculatively expose stale data from the uarch data structures as in MDS. Scope of exposure is within the same-thread and cross-thread. This issue affects all current processors that support TSX, but do not have ARCH_CAP_TAA_NO (bit 8) set in MSR_IA32_ARCH_CAPABILITIES. On CPUs which have their IA32_ARCH_CAPABILITIES MSR bit MDS_NO=0, CPUID.MD_CLEAR=1 and the MDS mitigation is clearing the CPU buffers using VERW or L1D_FLUSH, there is no additional mitigation needed for TAA. On affected CPUs with MDS_NO=1 this issue can be mitigated by disabling the Transactional Synchronization Extensions (TSX) feature. A new MSR IA32_TSX_CTRL in future and current processors after a microcode update can be used to control the TSX feature. There are two bits in that MSR: * TSX_CTRL_RTM_DISABLE disables the TSX sub-feature Restricted Transactional Memory (RTM). * TSX_CTRL_CPUID_CLEAR clears the RTM enumeration in CPUID. The other TSX sub-feature, Hardware Lock Elision (HLE), is unconditionally disabled with updated microcode but still enumerated as present by CPUID(EAX=7).EBX{bit4}. The second mitigation approach is similar to MDS which is clearing the affected CPU buffers on return to user space and when entering a guest. Relevant microcode update is required for the mitigation to work. More details on this approach can be found here: https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html The TSX feature can be controlled by the "tsx" command line parameter. If it is force-enabled then "Clear CPU buffers" (MDS mitigation) is deployed. The effective mitigation state can be read from sysfs. [ bp: - massage + comments cleanup - s/TAA_MITIGATION_TSX_DISABLE/TAA_MITIGATION_TSX_DISABLED/g - Josh. - remove partial TAA mitigation in update_mds_branch_idle() - Josh. - s/tsx_async_abort_cmdline/tsx_async_abort_parse_cmdline/g ] Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> [bwh: Backported to 3.16: - Use next available X86_BUG bit - Don't use BIT() in msr-index.h because it's a UAPI header - Add #include "cpu.h" in bugs.c - Drop __ro_after_init attribute - Drop "nosmt" support - Adjust context, indentation] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-11-15  x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default  (Pawan Gupta)
commit 95c5824f75f3ba4c9e8e5a4b1a623c95390ac266 upstream. Add a kernel cmdline parameter "tsx" to control the Transactional Synchronization Extensions (TSX) feature. On CPUs that support TSX control, use "tsx=on|off" to enable or disable TSX. Not specifying this option is equivalent to "tsx=off". This is because on certain processors TSX may be used as a part of a speculative side channel attack. Carve out the TSX controlling functionality into a separate compilation unit because TSX is a CPU feature while the TSX async abort control machinery will go to cpu/bugs.c. [ bp: - Massage, shorten and clear the arg buffer. - Clarifications of the tsx= possible options - Josh. - Expand on TSX_CTRL availability - Pawan. ] Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> [bwh: Backported to 3.16: - Drop __ro_after_init attribute - Adjust filenames, context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-11-15  x86/cpu: Add a helper function x86_read_arch_cap_msr()  (Pawan Gupta)
commit 286836a70433fb64131d2590f4bf512097c255e1 upstream. Add a helper function to read the IA32_ARCH_CAPABILITIES MSR. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Neelima Krishnan <neelima.krishnan@intel.com> Reviewed-by: Mark Gross <mgross@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
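The helper is small enough to sketch here; this mirrors the behaviour the commit describes, returning 0 when the MSR is not enumerated:

  u64 x86_read_arch_cap_msr(void)
  {
          u64 ia32_cap = 0;

          if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
                  rdmsrl(MSR_IA32_ARCH_CAPABILITIES, ia32_cap);

          return ia32_cap;
  }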
2019-11-15  x86/msr: Add the IA32_TSX_CTRL MSR  (Pawan Gupta)
commit c2955f270a84762343000f103e0640d29c7a96f3 upstream. Transactional Synchronization Extensions (TSX) may be used on certain processors as part of a speculative side channel attack. A microcode update for existing processors that are vulnerable to this attack will add a new MSR - IA32_TSX_CTRL to allow the system administrator the option to disable TSX as one of the possible mitigations. The CPUs which get this new MSR after a microcode upgrade are the ones which do not set MSR_IA32_ARCH_CAPABILITIES.MDS_NO (bit 5) because those CPUs have CPUID.MD_CLEAR, i.e., the VERW implementation which clears all CPU buffers takes care of the TAA case as well. [ Note that future processors that are not vulnerable will also support the IA32_TSX_CTRL MSR. ] Add defines for the new IA32_TSX_CTRL MSR and its bits. TSX has two sub-features: 1. Restricted Transactional Memory (RTM) is an explicitly-used feature where new instructions begin and end TSX transactions. 2. Hardware Lock Elision (HLE) is implicitly used when certain kinds of "old" style locks are used by software. Bit 7 of the IA32_ARCH_CAPABILITIES indicates the presence of the IA32_TSX_CTRL MSR. There are two control bits in IA32_TSX_CTRL MSR: Bit 0: When set, it disables the Restricted Transactional Memory (RTM) sub-feature of TSX (will force all transactions to abort on the XBEGIN instruction). Bit 1: When set, it disables the enumeration of the RTM and HLE feature (i.e. it will make CPUID(EAX=7).EBX{bit4} and CPUID(EAX=7).EBX{bit11} read as 0). The other TSX sub-feature, Hardware Lock Elision (HLE), is unconditionally disabled by the new microcode but still enumerated as present by CPUID(EAX=7).EBX{bit4}, unless disabled by IA32_TSX_CTRL_MSR[1] - TSX_CTRL_CPUID_CLEAR. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Neelima Krishnan <neelima.krishnan@intel.com> Reviewed-by: Mark Gross <mgross@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> [bwh: Backported to 3.16: - Don't use BIT() in msr-index.h because it's a UAPI header - Adjust filename, context, indentation] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
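For orientation, the new MSR and its two control bits map onto defines of roughly this shape, and disabling TSX is a read-modify-write of that MSR (a sketch; the backport spells the masks without BIT() because msr-index.h is a UAPI header):

  #define MSR_IA32_TSX_CTRL       0x00000122
  #define TSX_CTRL_RTM_DISABLE    (1ULL << 0)     /* force RTM transactions to abort */
  #define TSX_CTRL_CPUID_CLEAR    (1ULL << 1)     /* hide RTM/HLE enumeration in CPUID */

  static void tsx_disable(void)
  {
          u64 tsx;

          rdmsrl(MSR_IA32_TSX_CTRL, tsx);
          tsx |= TSX_CTRL_RTM_DISABLE | TSX_CTRL_CPUID_CLEAR;
          wrmsrl(MSR_IA32_TSX_CTRL, tsx);
  }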
2019-11-15  KVM: x86: use Intel speculation bugs and features as derived in generic x86 code  (Paolo Bonzini)
commit 0c54914d0c52a15db9954a76ce80fee32cf318f4 upstream. Similar to AMD bits, set the Intel bits from the vendor-independent feature and bug flags, because KVM_GET_SUPPORTED_CPUID does not care about the vendor and they should be set on AMD processors as well. Suggested-by: Jim Mattson <jmattson@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-11-15  KVM: Introduce kvm_get_arch_capabilities()  (Ben Hutchings)
Extracted from commit 5b76a3cff011 "KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry". We will need this to let a nested hypervisor know that we have applied the mitigation for TAA. Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-10-31  KVM: x86/vPMU: refine kvm_pmu err msg when event creation failed  (Like Xu)
commit 6fc3977ccc5d3c22e851f2dce2d3ce2a0a843842 upstream. If a perf_event creation fails for any reason in the host perf subsystem, there is no chance to log the corresponding event for the guest, which may cause abnormal sampling data in the guest's results. In debug mode, this message helps to understand the state of the vPMC; its occurrences are not limited to one, but it is still not printed in a spamming style. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-10-31  parisc: Fix kernel panic due invalid values in IAOQ0 or IAOQ1  (Helge Deller)
commit 10835c854685393a921b68f529bf740fa7c9984d upstream. On parisc the privilege level of a process is stored in the lowest two bits of the instruction pointers (IAOQ0 and IAOQ1). On Linux we use privilege level 0 for the kernel and privilege level 3 for user-space. So userspace should not be allowed to modify IAOQ0 or IAOQ1 of a ptraced process to change its privilege level to e.g. 0 to try to gain kernel privileges. This patch prevents such modifications by always setting the two lowest bits to one (which relates to privilege level 3 for user-space) if IAOQ0 or IAOQ1 are modified via ptrace calls in the native and compat ptrace paths. Link: https://bugs.gentoo.org/481768 Reported-by: Jeroen Roovers <jer@gentoo.org> Tested-by: Rolf Eike Beer <eike-kernel@sf-tec.de> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-10-31  ARC: hide unused function unw_hdr_alloc  (Arnd Bergmann)
commit fd5de2721ea7d16e2b16c4049ac49f229551b290 upstream. As kernelci.org reports, this function is not used in vdk_hs38_defconfig: arch/arc/kernel/unwind.c:188:14: warning: 'unw_hdr_alloc' defined but not used [-Wunused-function] Fixes: bc79c9a72165 ("ARC: dw2 unwind: Reinstante unwinding out of modules") Link: https://kernelci.org/build/id/5d1cae3f59b514300340c132/logs/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-10-31  x86/tls: Fix possible spectre-v1 in do_get_thread_area()  (Dianzhang Chen)
commit 993773d11d45c90cb1c6481c2638c3d9f092ea5b upstream. The index used to access the thread's TLS array is controlled by userspace via the sys_ptrace() syscall, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. The index can be controlled from: ptrace -> arch_ptrace -> do_get_thread_area. Fix this by sanitizing the user-supplied index before using it to access p->thread.tls_array. Signed-off-by: Dianzhang Chen <dianzhangchen0@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/1561524630-3642-1-git-send-email-dianzhangchen0@gmail.com Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
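The usual pattern for this class of Spectre v1 fix is array_index_nospec() from <linux/nospec.h>, which clamps the index even under speculation before the dependent array access (a sketch; the bound shown is the size of the TLS array used by do_get_thread_area()):

  #include <linux/nospec.h>

  /* index arrives from userspace via ptrace() */
  index = array_index_nospec(index, GDT_ENTRY_TLS_ENTRIES);
  desc = &p->thread.tls_array[index];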
2019-10-31  x86/ptrace: Fix possible spectre-v1 in ptrace_get_debugreg()  (Dianzhang Chen)
commit 31a2fbb390fee4231281b939e1979e810f945415 upstream. The index used to access the thread's ptrace_bps array is controlled by userspace via the sys_ptrace() syscall, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. The index can be controlled from: ptrace -> arch_ptrace -> ptrace_get_debugreg. Fix this by sanitizing the user-supplied index before using it to access thread->ptrace_bps. Signed-off-by: Dianzhang Chen <dianzhangchen0@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/1561476617-3759-1-git-send-email-dianzhangchen0@gmail.com [bwh: Backported to 3.16: fold in fix-up from commit 223cea6a4f05 "Merge branch 'x86-pti-for-linus' of ..."] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-10-31  s390: fix stfle zero padding  (Heiko Carstens)
commit 4f18d869ffd056c7858f3d617c71345cf19be008 upstream. The stfle inline assembly returns the number of double words written (condition code 0) or the double words it would have written (condition code 3), if the memory array it got as parameter would have been large enough. The current stfle implementation assumes that the array is always large enough and clears those parts of the array that have not been written to with a subsequent memset call. If however the array is not large enough memset will get a negative length parameter, which means that memset clears memory until it gets an exception and the kernel crashes. To fix this simply limit the maximum length. Move also the inline assembly to an extra function to avoid clobbering of register 0, which might happen because of the added min_t invocation together with code instrumentation. The bug was introduced with commit 14375bc4eb8d ("[S390] cleanup facility list handling") but was rather harmless, since it would only write to a rather large array. It became a potential problem with commit 3ab121ab1866 ("[S390] kernel: Add z/VM LGR detection"). Since then it writes to an array with only four double words, while some machines already deliver three double words. As soon as machines have a facility bit within the fifth double a crash on IPL would happen. Fixes: 14375bc4eb8d ("[S390] cleanup facility list handling") Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
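The shape of the fix, leaving out the s390 assembler details: clamp the count reported by the hardware before using it to size the memset (the __stfle_query() wrapper is hypothetical):

  unsigned long nr;

  nr = __stfle_query(fac_list, size);             /* hypothetical wrapper around the insn */
  nr = min_t(unsigned long, nr, size);            /* never trust the reported count */
  if (nr < size)
          memset(fac_list + nr, 0, (size - nr) * sizeof(*fac_list));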
2019-10-31  powerpc/watchpoint: Restore NV GPRs while returning from exception  (Ravi Bangoria)
commit f474c28fbcbe42faca4eb415172c07d76adcb819 upstream. powerpc hardware triggers watchpoint before executing the instruction. To make trigger-after-execute behavior, kernel emulates the instruction. If the instruction is 'load something into non-volatile register', exception handler should restore emulated register state while returning back, otherwise there will be register state corruption. eg, adding a watchpoint on a list can corrput the list: # cat /proc/kallsyms | grep kthread_create_list c00000000121c8b8 d kthread_create_list Add watchpoint on kthread_create_list->prev: # perf record -e mem:0xc00000000121c8c0 Run some workload such that new kthread gets invoked. eg, I just logged out from console: list_add corruption. next->prev should be prev (c000000001214e00), \ but was c00000000121c8b8. (next=c00000000121c8b8). WARNING: CPU: 59 PID: 309 at lib/list_debug.c:25 __list_add_valid+0xb4/0xc0 CPU: 59 PID: 309 Comm: kworker/59:0 Kdump: loaded Not tainted 5.1.0-rc7+ #69 ... NIP __list_add_valid+0xb4/0xc0 LR __list_add_valid+0xb0/0xc0 Call Trace: __list_add_valid+0xb0/0xc0 (unreliable) __kthread_create_on_node+0xe0/0x260 kthread_create_on_node+0x34/0x50 create_worker+0xe8/0x260 worker_thread+0x444/0x560 kthread+0x160/0x1a0 ret_from_kernel_thread+0x5c/0x70 List corruption happened because it uses 'load into non-volatile register' instruction: Snippet from __kthread_create_on_node: c000000000136be8: addis r29,r2,-19 c000000000136bec: ld r29,31424(r29) if (!__list_add_valid(new, prev, next)) c000000000136bf0: mr r3,r30 c000000000136bf4: mr r5,r28 c000000000136bf8: mr r4,r29 c000000000136bfc: bl c00000000059a2f8 <__list_add_valid+0x8> Register state from WARN_ON(): GPR00: c00000000059a3a0 c000007ff23afb50 c000000001344e00 0000000000000075 GPR04: 0000000000000000 0000000000000000 0000001852af8bc1 0000000000000000 GPR08: 0000000000000001 0000000000000007 0000000000000006 00000000000004aa GPR12: 0000000000000000 c000007ffffeb080 c000000000137038 c000005ff62aaa00 GPR16: 0000000000000000 0000000000000000 c000007fffbe7600 c000007fffbe7370 GPR20: c000007fffbe7320 c000007fffbe7300 c000000001373a00 0000000000000000 GPR24: fffffffffffffef7 c00000000012e320 c000007ff23afcb0 c000000000cb8628 GPR28: c00000000121c8b8 c000000001214e00 c000007fef5b17e8 c000007fef5b17c0 Watchpoint hit at 0xc000000000136bec. addis r29,r2,-19 => r29 = 0xc000000001344e00 + (-19 << 16) => r29 = 0xc000000001214e00 ld r29,31424(r29) => r29 = *(0xc000000001214e00 + 31424) => r29 = *(0xc00000000121c8c0) 0xc00000000121c8c0 is where we placed a watchpoint and thus this instruction was emulated by emulate_step. But because handle_dabr_fault did not restore emulated register state, r29 still contains stale value in above register state. Fixes: 5aae8a5370802 ("powerpc, hw_breakpoints: Implement hw_breakpoints for 64-bit server processors") Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-10-31  powerpc/32s: fix suspend/resume when IBATs 4-7 are used  (Christophe Leroy)
commit 6ecb78ef56e08d2119d337ae23cb951a640dc52d upstream. Previously, only IBAT1 and IBAT2 were used to map kernel linear mem. Since commit 63b2bc619565 ("powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX"), we may have all 8 BATs used for mapping kernel text. But the suspend/restore functions only save/restore BATs 0 to 3, and clears BATs 4 to 7. Make suspend and restore functions respectively save and reload the 8 BATs on CPUs having MMU_FTR_USE_HIGH_BATS feature. Reported-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-10-31  ARM: riscpc: fix DMA  (Russell King)
commit ffd9a1ba9fdb7f2bd1d1ad9b9243d34e96756ba2 upstream. DMA got broken a while back in two different ways: 1) a change in the behaviour of disable_irq() to wait for the interrupt to finish executing causes us to deadlock at the end of DMA. 2) a change to avoid modifying the scatterlist left the first transfer uninitialised. DMA is only used with expansion cards, so has gone unnoticed. Fixes: fa4e99899932 ("[ARM] dma: RiscPC: don't modify DMA SG entries") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-10-05  x86/speculation: Allow guests to use SSBD even if host does not  (Alejandro Jimenez)
commit c1f7fec1eb6a2c86d01bc22afce772c743451d88 upstream. The bits set in x86_spec_ctrl_mask are used to calculate the guest's value of SPEC_CTRL that is written to the MSR before VMENTRY, and control which mitigations the guest can enable. In the case of SSBD, unless the host has enabled SSBD always on mode (by passing "spec_store_bypass_disable=on" in the kernel parameters), the SSBD bit is not set in the mask and the guest can not properly enable the SSBD always on mitigation mode. This has been confirmed by running the SSBD PoC on a guest using the SSBD always on mitigation mode (booted with kernel parameter "spec_store_bypass_disable=on"), and verifying that the guest is vulnerable unless the host is also using SSBD always on mode. In addition, the guest OS incorrectly reports the SSB vulnerability as mitigated. Always set the SSBD bit in x86_spec_ctrl_mask when the host CPU supports it, allowing the guest to use SSBD whether or not the host has chosen to enable the mitigation in any of its modes. Fixes: be6fcb5478e9 ("x86/bugs: Rework spec_ctrl base and mask logic") Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Cc: bp@alien8.de Cc: rkrcmar@redhat.com Cc: kvm@vger.kernel.org Link: https://lkml.kernel.org/r/1560187210-11054-1-git-send-email-alejandro.j.jimenez@oracle.com Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
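The core of the change is a couple of lines in the mitigation setup (a sketch; the real check distinguishes the Intel and AMD SSBD feature bits):

  if (boot_cpu_has(X86_FEATURE_SSBD))
          x86_spec_ctrl_mask |= SPEC_CTRL_SSBD;   /* let the guest toggle SSBD itself */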
2019-10-05  MIPS: Add missing EHB in mtc0 -> mfc0 sequence.  (Dmitry Korotin)
commit 0b24cae4d535045f4c9e177aa228d4e97bad212c upstream. Add a missing EHB (Execution Hazard Barrier) in mtc0 -> mfc0 sequence. Without this execution hazard barrier it's possible for the value read back from the KScratch register to be the value from before the mtc0. Reproducible on P5600 & P6600. The hazard is documented in the MIPS Architecture Reference Manual Vol. III: MIPS32/microMIPS32 Privileged Resource Architecture (MD00088), rev 6.03 table 8.1 which includes: Producer | Consumer | Hazard ----------|----------|---------------------------- mtc0 | mfc0 | any coprocessor 0 register Signed-off-by: Dmitry Korotin <dkorotin@wavecomp.com> [paul.burton@mips.com: - Commit message tweaks. - Add Fixes tags. - Mark for stable back to v3.15 where P5600 support was introduced.] Signed-off-by: Paul Burton <paul.burton@mips.com> Fixes: 3d8bfdd03072 ("MIPS: Use C0_KScratch (if present) to hold PGD pointer.") Fixes: 829dcc0a956a ("MIPS: Add MIPS P5600 probe support") Cc: linux-mips@vger.kernel.org [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-10-05  x86/apic: Fix integer overflow on 10 bit left shift of cpu_khz  (Colin Ian King)
commit ea136a112d89bade596314a1ae49f748902f4727 upstream. The left shift of unsigned int cpu_khz will overflow for large values of cpu_khz, so cast it to a long long before shifting it to avoid overflow. For example, this can happen when cpu_khz is 4194305, i.e. ~4.2 GHz. Addresses-Coverity: ("Unintentional integer overflow") Fixes: 8c3ba8d04924 ("x86, apic: ack all pending irqs when crashed/on kexec") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: kernel-janitors@vger.kernel.org Link: https://lkml.kernel.org/r/20190619181446.13635-1-colin.king@canonical.com [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
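The overflow and the fix in miniature (the shift count is the one named in the subject):

  unsigned int khz = 4194305;                     /* ~4.2 GHz */
  u32 bad  = khz << 10;                           /* 32-bit shift wraps past 2^32 */
  u64 good = (unsigned long long)khz << 10;       /* widen before shifting */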
2019-10-05  KVM: arm64: Filter out invalid core register IDs in KVM_GET_REG_LIST  (Dave Martin)
commit df205b5c63281e4f32caac22adda18fd68795e80 upstream. Since commit d26c25a9d19b ("arm64: KVM: Tighten guest core register access from userspace"), KVM_{GET,SET}_ONE_REG rejects register IDs that do not correspond to a single underlying architectural register. KVM_GET_REG_LIST was not changed to match however: instead, it simply yields a list of 32-bit register IDs that together cover the whole kvm_regs struct. This means that if userspace tries to use the resulting list of IDs directly to drive calls to KVM_*_ONE_REG, some of those calls will now fail. This was not the intention. Instead, iterating KVM_*_ONE_REG over the list of IDs returned by KVM_GET_REG_LIST should be guaranteed to work. This patch fixes the problem by splitting validate_core_offset() into a backend core_reg_size_from_offset() which does all of the work except for checking that the size field in the register ID matches, and kvm_arm_copy_reg_indices() and num_core_regs() are converted to use this to enumerate the valid offsets. kvm_arm_copy_reg_indices() now also sets the register ID size field appropriately based on the value returned, so the register ID supplied to userspace is fully qualified for use with the register access ioctls. Fixes: d26c25a9d19b ("arm64: KVM: Tighten guest core register access from userspace") Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Tested-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> [bwh: Backported to 3.16: - Don't add unused vcpu parameter - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-10-05  s390/crypto: fix possible sleep during spinlock aquired  (Harald Freudenberger)
commit 1c2c7029c008922d4d48902cc386250502e73d51 upstream. This patch fixes a complaint about a possible sleep while a spinlock is held, "BUG: sleeping function called from invalid context at include/crypto/algapi.h:426", for the ctr(aes) and ctr(des) s390 specific ciphers. Instead of using a spinlock this patch introduces a mutex which is safe to hold in sleeping context. Please note a deadlock is not possible as mutex_trylock() is used. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reported-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> [bwh: Backported to 3.16: - Replace spin_unlock() on all exit paths - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-10-05  powerpc/perf: Fix MMCRA corruption by bhrb_filter  (Ravi Bangoria)
commit 3202e35ec1c8fc19cea24253ff83edf702a60a02 upstream. Consider a scenario where user creates two events:

1st event:
  attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
  attr.branch_sample_type = PERF_SAMPLE_BRANCH_ANY;
  fd = perf_event_open(attr, 0, 1, -1, 0);
This sets cpuhw->bhrb_filter to 0 and returns valid fd.

2nd event:
  attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
  attr.branch_sample_type = PERF_SAMPLE_BRANCH_CALL;
  fd = perf_event_open(attr, 0, 1, -1, 0);
It overrides cpuhw->bhrb_filter to -1 and returns with error.

Now if power_pmu_enable() gets called by any path other than power_pmu_add(), ppmu->config_bhrb(-1) will set MMCRA to -1. Fixes: 3925f46bb590 ("powerpc/perf: Enable branch stack sampling framework") Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> [bwh: Backported to 3.16: drop changes in power9-pmu.c] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-10-05  powerpc/perf: add missing put_cpu_var in power_pmu_event_init  (Jan Stancek)
commit 68de8867ea5d99127e836c23f6bccf4d44859623 upstream. One path in power_pmu_event_init() calls get_cpu_var(), but is missing matching call to put_cpu_var(), which causes preemption imbalance and crash in user-space: Page fault in user mode with in_atomic() = 1 mm = c000001fefa5a280 NIP = 3fff9bf2cae0 MSR = 900000014280f032 Oops: Weird page fault, sig: 11 [#23] SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: <snip> CPU: 43 PID: 10285 Comm: a.out Tainted: G D 4.0.0-rc5+ #1 task: c000001fe82c9200 ti: c000001fe835c000 task.ti: c000001fe835c000 NIP: 00003fff9bf2cae0 LR: 00003fff9bee4898 CTR: 00003fff9bf2cae0 REGS: c000001fe835fea0 TRAP: 0401 Tainted: G D (4.0.0-rc5+) MSR: 900000014280f032 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI> CR: 22000028 XER: 00000000 CFAR: 00003fff9bee4894 SOFTE: 1 GPR00: 00003fff9bee494c 00003fffe01c2ee0 00003fff9c084410 0000000010020068 GPR04: 0000000000000000 0000000000000002 0000000000000008 0000000000000001 GPR08: 0000000000000001 00003fff9c074a30 00003fff9bf2cae0 00003fff9bf2cd70 GPR12: 0000000052000022 00003fff9c10b700 NIP [00003fff9bf2cae0] 0x3fff9bf2cae0 LR [00003fff9bee4898] 0x3fff9bee4898 Call Trace: ---[ end trace 5d3d952b5d4185d4 ]--- BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:41 in_atomic(): 1, irqs_disabled(): 0, pid: 10285, name: a.out INFO: lockdep is turned off. CPU: 43 PID: 10285 Comm: a.out Tainted: G D 4.0.0-rc5+ #1 Call Trace: [c000001fe835f990] [c00000000089c014] .dump_stack+0x98/0xd4 (unreliable) [c000001fe835fa10] [c0000000000e4138] .___might_sleep+0x1d8/0x2e0 [c000001fe835faa0] [c000000000888da8] .down_read+0x38/0x110 [c000001fe835fb30] [c0000000000bf2f4] .exit_signals+0x24/0x160 [c000001fe835fbc0] [c0000000000abde0] .do_exit+0xd0/0xe70 [c000001fe835fcb0] [c00000000001f4c4] .die+0x304/0x450 [c000001fe835fd60] [c00000000088e1f4] .do_page_fault+0x2d4/0x900 [c000001fe835fe30] [c000000000008664] handle_page_fault+0x10/0x30 note: a.out[10285] exited with preempt_count 1 Reproducer: #include <stdio.h> #include <unistd.h> #include <syscall.h> #include <sys/types.h> #include <sys/stat.h> #include <linux/perf_event.h> #include <linux/hw_breakpoint.h> static struct perf_event_attr event = { .type = PERF_TYPE_RAW, .size = sizeof(struct perf_event_attr), .sample_type = PERF_SAMPLE_BRANCH_STACK, .branch_sample_type = PERF_SAMPLE_BRANCH_ANY_RETURN, }; int main() { syscall(__NR_perf_event_open, &event, 0, -1, -1, 0); } Signed-off-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
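The rule being restored: get_cpu_var() disables preemption, so it must be paired with put_cpu_var() on every exit path, including early error returns (a sketch; the unsupported-filter check is hypothetical):

  struct cpu_hw_events *cpuhw;

  cpuhw = &get_cpu_var(cpu_hw_events);            /* disables preemption */
  if (!branch_filter_supported(event)) {          /* hypothetical check */
          put_cpu_var(cpu_hw_events);             /* the pairing that was missing */
          return -EOPNOTSUPP;
  }
  /* ... */
  put_cpu_var(cpu_hw_events);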
2019-09-23  x86/speculation/mds: Revert CPU buffer clear on double fault exit  (Andy Lutomirski)
commit 88640e1dcd089879530a49a8d212d1814678dfe7 upstream. The double fault ESPFIX path doesn't return to user mode at all -- it returns back to the kernel by simulating a #GP fault. prepare_exit_to_usermode() will run on the way out of general_protection before running user code. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jon Masters <jcm@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 04dcbdb80578 ("x86/speculation/mds: Clear CPU buffers on exit to user") Link: http://lkml.kernel.org/r/ac97612445c0a44ee10374f6ea79c222fe22a5c4.1557865329.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-09-23  powerpc/booke64: set RI in default MSR  (Laurentiu Tudor)
commit 5266e58d6cd90ac85c187d673093ad9cb649e16d upstream. Set RI in the default kernel's MSR so that the architected way of detecting unrecoverable machine check interrupts has a chance to work. This is inline with the MSR setup of the rest of booke powerpc architectures configured here. Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-09-23  x86/uaccess: Dont leak the AC flag into __put_user() argument evaluation  (Peter Zijlstra)
commit 6ae865615fc43d014da2fd1f1bba7e81ee622d1b upstream. The __put_user() macro evaluates it's @ptr argument inside the __uaccess_begin() / __uaccess_end() region. While this would normally not be expected to be an issue, an UBSAN bug (it ignored -fwrapv, fixed in GCC 8+) would transform the @ptr evaluation for: drivers/gpu/drm/i915/i915_gem_execbuffer.c: if (unlikely(__put_user(offset, &urelocs[r-stack].presumed_offset))) { into a signed-overflow-UB check and trigger the objtool AC validation. Finish this commit: 2a418cf3f5f1 ("x86/uaccess: Don't leak the AC flag into __put_user() value evaluation") and explicitly evaluate all 3 arguments early. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: luto@kernel.org Fixes: 2a418cf3f5f1 ("x86/uaccess: Don't leak the AC flag into __put_user() value evaluation") Link: http://lkml.kernel.org/r/20190424072208.695962771@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-09-23  powerpc/83xx: Add missing of_node_put() after of_device_is_available()  (Julia Lawall)
commit 4df2cb633b5b22ba152511f1a55e718efca6c0d9 upstream. Add an of_node_put() when a tested device node is not available. Fixes: c026c98739c7e ("powerpc/83xx: Do not configure or probe disabled FSL DR USB controllers") Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
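The general pattern: an OF lookup returns a node with an elevated refcount, and that reference must be dropped on the "present but disabled" path as well (a sketch; the compatible string is illustrative):

  struct device_node *np;

  np = of_find_compatible_node(NULL, NULL, "fsl-usb2-dr");        /* takes a reference */
  if (np && !of_device_is_available(np)) {
          of_node_put(np);                                        /* drop it on this path too */
          return -ENODEV;
  }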
2019-09-23  crypto: arm/aes-neonbs - don't access already-freed walk.iv  (Eric Biggers)
commit 767f015ea0b7ab9d60432ff6cd06b664fd71f50f upstream. If the user-provided IV needs to be aligned to the algorithm's alignmask, then skcipher_walk_virt() copies the IV into a new aligned buffer walk.iv. But skcipher_walk_virt() can fail afterwards, and then if the caller unconditionally accesses walk.iv, it's a use-after-free. arm32 xts-aes-neonbs doesn't set an alignmask, so currently it isn't affected by this despite unconditionally accessing walk.iv. However this is more subtle than desired, and it was actually broken prior to the alignmask being removed by commit cc477bf64573 ("crypto: arm/aes - replace bit-sliced OpenSSL NEON code"). Thus, update xts-aes-neonbs to start checking the return value of skcipher_walk_virt(). Fixes: e4e7f10bfc40 ("ARM: add support for bit sliced AES using NEON instructions") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
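The rule the fix enforces: check the return value of skcipher_walk_virt() before touching walk.iv, because on failure walk.iv may point at freed memory (a sketch of an XTS-style setup path; ctx->tweak_tfm is illustrative):

  err = skcipher_walk_virt(&walk, req, true);
  if (err)
          return err;                             /* walk.iv must not be used here */

  /* Only now is walk.iv, the aligned copy of the user IV, known to be valid. */
  crypto_cipher_encrypt_one(ctx->tweak_tfm, walk.iv, walk.iv);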
2019-09-23  ARM: pxa: ssp: Fix "WARNING: invalid free of devm_ allocated data"  (YueHaibing)
commit 9ee8578d953023cc57e7e736ae48502c707c0210 upstream. Since commit 1c459de1e645 ("ARM: pxa: ssp: use devm_ functions") kfree, iounmap, clk_put etc are not needed anymore in remove path. Fixes: 1c459de1e645 ("ARM: pxa: ssp: use devm_ functions") Signed-off-by: YueHaibing <yuehaibing@huawei.com> [ commit message spelling fix ] Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-09-23  arm64: compat: Reduce address limit  (Vincenzo Frascino)
commit d263119387de9975d2acba1dfd3392f7c5979c18 upstream. Currently, compat tasks running on arm64 can allocate memory up to TASK_SIZE_32 (UL(0x100000000)). This means that mmap() allocations, if we treat them as returning an array, are not compliant with the sections 6.5.8 of the C standard (C99) which states that: "If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P". Redefine TASK_SIZE_32 to address the issue. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Jann Horn <jannh@google.com> Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> [will: fixed typo in comment] Signed-off-by: Will Deacon <will.deacon@arm.com> [bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-09-23  crypto: x86/crct10dif-pcl - fix use via crypto_shash_digest()  (Eric Biggers)
commit dec3d0b1071a0f3194e66a83d26ecf4aa8c5910e upstream. The ->digest() method of crct10dif-pclmul reads the current CRC value from the shash_desc context. But this value is uninitialized, causing crypto_shash_digest() to compute the wrong result. Fix it. Probably this wasn't noticed before because lib/crc-t10dif.c only uses crypto_shash_update(), not crypto_shash_digest(). Likewise, crypto_shash_digest() is not yet tested by the crypto self-tests because those only test the ahash API which only uses shash init/update/final. Fixes: 0b95a7f85718 ("crypto: crct10dif - Glue code to cast accelerated CRCT10DIF assembly as a crypto transform") Cc: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-09-23  ARM: OMAP2+: Fix potentially uninitialized return value for _setup_reset()  (Tony Lindgren)
commit 7f0d078667a494466991aa7133f49594f32ff6a2 upstream. Commit 747834ab8347 ("ARM: OMAP2+: hwmod: revise hardreset behavior") made the call to _enable() conditional based on no oh->rst_lines_cnt. This caused the return value to be potentially uninitialized. Curiously we see no compiler warnings for this, probably as this gets inlined. We call _setup_reset() from _setup() and only _setup_postsetup() if the return value is zero. Currently the return value can be uninitialized for cases where oh->rst_lines_cnt is set and HWMOD_INIT_NO_RESET is not set. Fixes: 747834ab8347 ("ARM: OMAP2+: hwmod: revise hardreset behavior") Cc: Paul Walmsley <paul@pwsan.com> Cc: Tero Kristo <t-kristo@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-09-23  ARM: dts: exynos: Fix interrupt for shared EINTs on Exynos5260  (Stuart Menefy)
commit b7ed69d67ff0788d8463e599dd5dd1b45c701a7e upstream. Fix the interrupt information for the GPIO lines with a shared EINT interrupt. Fixes: 16d7ff2642e7 ("ARM: dts: add dts files for exynos5260 SoC") Signed-off-by: Stuart Menefy <stuart.menefy@mathembedded.com> Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-08-13  powerpc/tm: Fix oops on sigreturn on systems without TM  (Michael Neuling)
commit f16d80b75a096c52354c6e0a574993f3b0dfbdfe upstream. On systems like P9 powernv where we have no TM (or P8 booted with ppc_tm=off), userspace can construct a signal context which still has the MSR TS bits set. The kernel tries to restore this context which results in the following crash: Unexpected TM Bad Thing exception at c0000000000022fc (msr 0x8000000102a03031) tm_scratch=800000020280f033 Oops: Unrecoverable exception, sig: 6 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 0 PID: 1636 Comm: sigfuz Not tainted 5.2.0-11043-g0a8ad0ffa4 #69 NIP: c0000000000022fc LR: 00007fffb2d67e48 CTR: 0000000000000000 REGS: c00000003fffbd70 TRAP: 0700 Not tainted (5.2.0-11045-g7142b497d8) MSR: 8000000102a03031 <SF,VEC,VSX,FP,ME,IR,DR,LE,TM[E]> CR: 42004242 XER: 00000000 CFAR: c0000000000022e0 IRQMASK: 0 GPR00: 0000000000000072 00007fffb2b6e560 00007fffb2d87f00 0000000000000669 GPR04: 00007fffb2b6e728 0000000000000000 0000000000000000 00007fffb2b6f2a8 GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR12: 0000000000000000 00007fffb2b76900 0000000000000000 0000000000000000 GPR16: 00007fffb2370000 00007fffb2d84390 00007fffea3a15ac 000001000a250420 GPR20: 00007fffb2b6f260 0000000010001770 0000000000000000 0000000000000000 GPR24: 00007fffb2d843a0 00007fffea3a14a0 0000000000010000 0000000000800000 GPR28: 00007fffea3a14d8 00000000003d0f00 0000000000000000 00007fffb2b6e728 NIP [c0000000000022fc] rfi_flush_fallback+0x7c/0x80 LR [00007fffb2d67e48] 0x7fffb2d67e48 Call Trace: Instruction dump: e96a0220 e96a02a8 e96a0330 e96a03b8 394a0400 4200ffdc 7d2903a6 e92d0c00 e94d0c08 e96d0c10 e82d0c18 7db242a6 <4c000024> 7db243a6 7db142a6 f82d0c18 The problem is the signal code assumes TM is enabled when CONFIG_PPC_TRANSACTIONAL_MEM is enabled. This may not be the case as with P9 powernv or if `ppc_tm=off` is used on P8. This means any local user can crash the system. Fix the problem by returning a bad stack frame to the user if they try to set the MSR TS bits with sigreturn() on systems where TM is not supported. Found with sigfuz kernel selftest on P9. This fixes CVE-2019-13648. Fixes: 2b0a576d15e0 ("powerpc: Add new transactional memory state to the signal context") Reported-by: Praveen Pandey <Praveen.Pandey@in.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190719050502.405-1-mikey@neuling.org Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-08-13  x86/speculation/swapgs: Exclude ATOMs from speculation through SWAPGS  (Thomas Gleixner)
commit f36cf386e3fec258a341d446915862eded3e13d8 upstream. Intel provided the following information: On all current Atom processors, instructions that use a segment register value (e.g. a load or store) will not speculatively execute before the last writer of that segment retires. Thus they will not use a speculatively written segment value. That means on ATOMs there is no speculation through SWAPGS, so the SWAPGS entry paths can be excluded from the extra LFENCE if PTI is disabled. Create a separate bug flag for the through SWAPGS speculation and mark all out-of-order ATOMs and AMD/HYGON CPUs as not affected. The in-order ATOMs are excluded from the whole mitigation mess anyway. Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tyler Hicks <tyhicks@canonical.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> [bwh: Backported to 3.16: - There's no whitelist entry (or any support) for Hygon CPUs - Use the next available X86_BUG number - Adjust context, indentation] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-08-13  x86/entry/64: Use JMP instead of JMPQ  (Josh Poimboeuf)
commit 64dbc122b20f75183d8822618c24f85144a5a94d upstream. Somehow the swapgs mitigation entry code patch ended up with a JMPQ instruction instead of JMP, where only the short jump is needed. Some assembler versions apparently fail to optimize JMPQ into a two-byte JMP when possible, instead always using a 7-byte JMP with relocation. For some reason that makes the entry code explode with a #GP during boot. Change it back to "JMP" as originally intended. Fixes: 18ec54fdd6d1 ("x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations") Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [bwh: Backported to 3.16: adjust filename, context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-08-13  x86/speculation: Enable Spectre v1 swapgs mitigations  (Josh Poimboeuf)
commit a2059825986a1c8143fd6698774fa9d83733bb11 upstream. The previous commit added macro calls in the entry code which mitigate the Spectre v1 swapgs issue if the X86_FEATURE_FENCE_SWAPGS_* features are enabled. Enable those features where applicable. The mitigations may be disabled with "nospectre_v1" or "mitigations=off". There are different features which can affect the risk of attack: - When FSGSBASE is enabled, unprivileged users are able to place any value in GS, using the wrgsbase instruction. This means they can write a GS value which points to any value in kernel space, which can be useful with the following gadget in an interrupt/exception/NMI handler: if (coming from user space) swapgs mov %gs:<percpu_offset>, %reg1 // dependent load or store based on the value of %reg // for example: mov %(reg1), %reg2 If an interrupt is coming from user space, and the entry code speculatively skips the swapgs (due to user branch mistraining), it may speculatively execute the GS-based load and a subsequent dependent load or store, exposing the kernel data to an L1 side channel leak. Note that, on Intel, a similar attack exists in the above gadget when coming from kernel space, if the swapgs gets speculatively executed to switch back to the user GS. On AMD, this variant isn't possible because swapgs is serializing with respect to future GS-based accesses. NOTE: The FSGSBASE patch set hasn't been merged yet, so the above case doesn't exist quite yet. - When FSGSBASE is disabled, the issue is mitigated somewhat because unprivileged users must use prctl(ARCH_SET_GS) to set GS, which restricts GS values to user space addresses only. That means the gadget would need an additional step, since the target kernel address needs to be read from user space first. Something like: if (coming from user space) swapgs mov %gs:<percpu_offset>, %reg1 mov (%reg1), %reg2 // dependent load or store based on the value of %reg2 // for example: mov %(reg2), %reg3 It's difficult to audit for this gadget in all the handlers, so while there are no known instances of it, it's entirely possible that it exists somewhere (or could be introduced in the future). Without tooling to analyze all such code paths, consider it vulnerable. Effects of SMAP on the !FSGSBASE case: - If SMAP is enabled, and the CPU reports RDCL_NO (i.e., not susceptible to Meltdown), the kernel is prevented from speculatively reading user space memory, even L1 cached values. This effectively disables the !FSGSBASE attack vector. - If SMAP is enabled, but the CPU *is* susceptible to Meltdown, SMAP still prevents the kernel from speculatively reading user space memory. But it does *not* prevent the kernel from reading the user value from L1, if it has already been cached. This is probably only a small hurdle for an attacker to overcome. Thanks to Dave Hansen for contributing the speculative_smap() function. Thanks to Andrew Cooper for providing the inside scoop on whether swapgs is serializing on AMD. [ tglx: Fixed the USER fence decision and polished the comment as suggested by Dave Hansen ] Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> [bwh: Backported to 3.16: - Check for X86_FEATURE_KAISER instead of X86_FEATURE_PTI - mitigations= parameter is x86-only here - powerpc doesn't have Spectre mitigations - Don't use __ro_after_init - Adjust filename, context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-08-13  x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations  (Josh Poimboeuf)
commit 18ec54fdd6d18d92025af097cd042a75cf0ea24c upstream. Spectre v1 isn't only about array bounds checks. It can affect any conditional checks. The kernel entry code interrupt, exception, and NMI handlers all have conditional swapgs checks. Those may be problematic in the context of Spectre v1, as kernel code can speculatively run with a user GS. For example: if (coming from user space) swapgs mov %gs:<percpu_offset>, %reg mov (%reg), %reg1 When coming from user space, the CPU can speculatively skip the swapgs, and then do a speculative percpu load using the user GS value. So the user can speculatively force a read of any kernel value. If a gadget exists which uses the percpu value as an address in another load/store, then the contents of the kernel value may become visible via an L1 side channel attack. A similar attack exists when coming from kernel space. The CPU can speculatively do the swapgs, causing the user GS to get used for the rest of the speculative window. The mitigation is similar to a traditional Spectre v1 mitigation, except: a) index masking isn't possible; because the index (percpu offset) isn't user-controlled; and b) an lfence is needed in both the "from user" swapgs path and the "from kernel" non-swapgs path (because of the two attacks described above). The user entry swapgs paths already have SWITCH_TO_KERNEL_CR3, which has a CR3 write when PTI is enabled. Since CR3 writes are serializing, the lfences can be skipped in those cases. On the other hand, the kernel entry swapgs paths don't depend on PTI. To avoid unnecessary lfences for the user entry case, create two separate features for alternative patching: X86_FEATURE_FENCE_SWAPGS_USER X86_FEATURE_FENCE_SWAPGS_KERNEL Use these features in entry code to patch in lfences where needed. The features aren't enabled yet, so there's no functional change. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> [bwh: Backported to 3.16: - Assign the CPU feature bits from word 7 - Add FENCE_SWAPGS_KERNEL_ENTRY to NMI entry, since it does not use paranoid_entry - Add a return after .Lerror_entry_from_usermode_after_swapgs, done upstream by commit f10750536fa7 "x86/entry/64: Fix irqflag tracing wrt context tracking" - Include <asm/cpufeatures.h> in calling.h - Adjust filenames, context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-08-13  x86/entry/64: Fix context tracking state warning when load_gs_index fails  (Wanpeng Li)
commit 2fa5f04f85730d0c4f49f984b7efeb4f8d5bd1fc upstream. This warning: WARNING: CPU: 0 PID: 3331 at arch/x86/entry/common.c:45 enter_from_user_mode+0x32/0x50 CPU: 0 PID: 3331 Comm: ldt_gdt_64 Not tainted 4.8.0-rc7+ #13 Call Trace: dump_stack+0x99/0xd0 __warn+0xd1/0xf0 warn_slowpath_null+0x1d/0x20 enter_from_user_mode+0x32/0x50 error_entry+0x6d/0xc0 ? general_protection+0x12/0x30 ? native_load_gs_index+0xd/0x20 ? do_set_thread_area+0x19c/0x1f0 SyS_set_thread_area+0x24/0x30 do_int80_syscall_32+0x7c/0x220 entry_INT80_compat+0x38/0x50 ... can be reproduced by running the GS testcase of the ldt_gdt test unit in the x86 selftests. do_int80_syscall_32() will call enter_form_user_mode() to convert context tracking state from user state to kernel state. The load_gs_index() call can fail with user gsbase, gsbase will be fixed up and proceed if this happen. However, enter_from_user_mode() will be called again in the fixed up path though it is context tracking kernel state currently. This patch fixes it by just fixing up gsbase and telling lockdep that IRQs are off once load_gs_index() failed with user gsbase. Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1475197266-3440-1-git-send-email-wanpeng.li@hotmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> [bwh: Backported to 3.16 as dependency of commit 18ec54fdd6d1 "x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations": - Adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-08-13  x86/entry/64: Really create an error-entry-from-usermode code path  (Andy Lutomirski)
commit cb6f64ed5a04036eef07e70b57dd5dd78f2fbcef upstream. In 539f51136500 ("x86/asm/entry/64: Disentangle error_entry/exit gsbase/ebx/usermode code"), I arranged the code slightly wrong -- IRET faults would skip the code path that was intended to execute on all error entries from user mode. Fix it up. While we're at it, make all the labels in error_entry local. This does not fix a bug, but we'll need it, and it slightly shrinks the code. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: paulmck@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/91e17891e49fa3d61357eadc451529ad48143ee1.1435952415.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> [bwh: Backported to 3.16 as dependency of commit 18ec54fdd6d1 "x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations": - Adjust filename, context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-08-13  x86/asm/entry/64: Disentangle error_entry/exit gsbase/ebx/usermode code  (Andy Lutomirski)
commit 539f5113650068ba221197f190267ab727296ef5 upstream. The error_entry/error_exit code to handle gsbase and whether we return to user mode was a mess:

- error_sti was misnamed. In particular, it did not enable interrupts.
- Error handling for gs_change was hopelessly tangled up with the normal usermode path. Separate it out. This saves a branch in normal entries from kernel mode.
- The comments were bad.

Fix it up. As a nice side effect, there's now a code path that happens on error entries from user mode. We'll use it soon. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/f1be898ab93360169fb845ab85185948832209ee.1433878454.git.luto@kernel.org [ Prettified it, clarified comments some more. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> [bwh: Backported to 3.16 as dependency of commit 18ec54fdd6d1 "x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations": - We do not use %ebx as a flag since we already have a backport of commit b3681dd548d0 "x86/entry/64: Remove %ebx handling from error_entry/exit", so don't add the comments about that - Adjust filename, context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-08-13  x86: cpufeatures: Renumber feature word 7  (Ben Hutchings)
Use the same bit numbers for all features that are also present in 4.4.y and 4.9.y, to make further backports slightly easier. Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-08-13  x86/kprobes: Avoid kretprobe recursion bug  (Masami Hiramatsu)
commit b191fa96ea6dc00d331dcc28c1f7db5e075693a0 upstream. Avoid the kretprobe recursion loop bug by setting a dummy kprobe in the current_kprobe per-CPU variable. This bug has been introduced with the asm-coded trampoline code, since previously it used another kprobe for hooking the function return placeholder (which only has a nop) and the trampoline handler was called from that kprobe. This revives the old lost kprobe again. With this fix, we don't see the deadlock anymore. And you can see that all inner-called kretprobes are skipped:

  event_1    235      0
  event_2  19375  19612

The 1st column is the recorded count and the 2nd is the missed count. The above shows (event_1 rec) + (event_2 rec) ~= (event_2 missed) (some difference is there because the counter is racy). Reported-by: Andrea Righi <righi.andrea@gmail.com> Tested-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: c9becf58d935 ("[PATCH] kretprobe: kretprobe-booster") Link: http://lkml.kernel.org/r/155094064889.6137.972160690963039.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-08-13  x86/kprobes: Verify stack frame on kretprobe  (Masami Hiramatsu)
commit 3ff9c075cc767b3060bdac12da72fc94dd7da1b8 upstream. Verify the stack frame pointer on kretprobe trampoline handler, If the stack frame pointer does not match, it skips the wrong entry and tries to find correct one. This can happen if user puts the kretprobe on the function which can be used in the path of ftrace user-function call. Such functions should not be probed, so this adds a warning message that reports which function should be blacklisted. Tested-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/155094059185.6137.15527904013362842072.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-08-13  kvm: mmu: Fix overflow on kvm mmu page limit calculation  (Ben Gardon)
commit bc8a3d8925a8fa09fa550e0da115d95851ce33c6 upstream. KVM bases its memory usage limits on the total number of guest pages across all memslots. However, those limits, and the calculations to produce them, use 32 bit unsigned integers. This can result in overflow if a VM has more guest pages that can be represented by a u32. As a result of this overflow, KVM can use a low limit on the number of MMU pages it will allocate. This makes KVM unable to map all of guest memory at once, prompting spurious faults. Tested: Ran all kvm-unit-tests on an Intel Haswell machine. This patch introduced no new failures. Signed-off-by: Ben Gardon <bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
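The arithmetic in question, done in 64-bit types so that a large guest cannot wrap it (the KVM_* constants are the existing tunables; the guest size is chosen so the page count no longer fits in a u32):

  u64 nr_pages = 5ULL << 30;      /* e.g. a 20 TiB guest: more than 2^32 pages of 4 KiB */
  u64 nr_mmu_pages;

  nr_mmu_pages = nr_pages * KVM_PERMILLE_MMU_PAGES / 1000;
  nr_mmu_pages = max_t(u64, nr_mmu_pages, KVM_MIN_ALLOC_MMU_PAGES);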