bcachefs.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author
2019-04-04	powerpc/tm: Limit TM code inside PPC_TRANSACTIONAL_MEM	Breno Leitao
	commit 897bc3df8c5aebb54c32d831f917592e873d0559 upstream. Commit e1c3743e1a20 ("powerpc/tm: Set MSR[TS] just prior to recheckpoint") moved a code block around and this block uses a 'msr' variable outside of the CONFIG_PPC_TRANSACTIONAL_MEM, however the 'msr' variable is declared inside a CONFIG_PPC_TRANSACTIONAL_MEM block, causing a possible error when CONFIG_PPC_TRANSACTION_MEM is not defined. error: 'msr' undeclared (first use in this function) This is not causing a compilation error in the mainline kernel, because 'msr' is being used as an argument of MSR_TM_ACTIVE(), which is defined as the following when CONFIG_PPC_TRANSACTIONAL_MEM is not set: #define MSR_TM_ACTIVE(x) 0 This patch just fixes this issue avoiding the 'msr' variable usage outside the CONFIG_PPC_TRANSACTIONAL_MEM block, avoiding trusting in the MSR_TM_ACTIVE() definition. Reported-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de> Fixes: e1c3743e1a20 ("powerpc/tm: Set MSR[TS] just prior to recheckpoint") Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-04-04	powerpc/tm: Unset MSR[TS] if not recheckpointing	Breno Leitao
	commit 6f5b9f018f4c7686fd944d920209d1382d320e4e upstream. There is a TM Bad Thing bug that can be caused when you return from a signal context in a suspended transaction but with ucontext MSR[TS] unset. This forces regs->msr[TS] to be set at syscall entrance (since the CPU state is transactional). It also calls treclaim() to flush the transaction state, which is done based on the live (mfmsr) MSR state. Since user context MSR[TS] is not set, then restore_tm_sigcontexts() is not called, thus, not executing recheckpoint, keeping the CPU state as not transactional. When calling rfid, SRR1 will have MSR[TS] set, but the CPU state is non transactional, causing the TM Bad Thing with the following stack: [ 33.862316] Bad kernel stack pointer 3fffd9dce3e0 at c00000000000c47c cpu 0x8: Vector: 700 (Program Check) at [c00000003ff7fd40] pc: c00000000000c47c: fast_exception_return+0xac/0xb4 lr: 00003fff865f442c sp: 3fffd9dce3e0 msr: 8000000102a03031 current = 0xc00000041f68b700 paca = 0xc00000000fb84800 softe: 0 irq_happened: 0x01 pid = 1721, comm = tm-signal-sigre Linux version 4.9.0-3-powerpc64le (debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26) WARNING: exception is not recoverable, can't continue The same problem happens on 32-bits signal handler, and the fix is very similar, if tm_recheckpoint() is not executed, then regs->msr[TS] should be zeroed. This patch also fixes a sparse warning related to lack of indentation when CONFIG_PPC_TRANSACTIONAL_MEM is set. Fixes: 2b0a576d15e0e ("powerpc: Add new transactional memory state to the signal context") Signed-off-by: Breno Leitao <leitao@debian.org> Tested-by: Michal Suchánek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-04-04	powerpc/tm: Set MSR[TS] just prior to recheckpoint	Breno Leitao
	commit e1c3743e1a20647c53b719dbf28b48f45d23f2cd upstream. On a signal handler return, the user could set a context with MSR[TS] bits set, and these bits would be copied to task regs->msr. At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set, several __get_user() are called and then a recheckpoint is executed. This is a problem since a page fault (in kernel space) could happen when calling __get_user(). If it happens, the process MSR[TS] bits were already set, but recheckpoint was not executed, and SPRs are still invalid. The page fault can cause the current process to be de-scheduled, with MSR[TS] active and without tm_recheckpoint() being called. More importantly, without TEXASR[FS] bit set also. Since TEXASR might not have the FS bit set, and when the process is scheduled back, it will try to reclaim, which will be aborted because of the CPU is not in the suspended state, and, then, recheckpoint. This recheckpoint will restore thread->texasr into TEXASR SPR, which might be zero, hitting a BUG_ON(). kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434! cpu 0xb: Vector: 700 (Program Check) at [c00000041f1576d0] pc: c000000000054550: restore_gprs+0xb0/0x180 lr: 0000000000000000 sp: c00000041f157950 msr: 8000000100021033 current = 0xc00000041f143000 paca = 0xc00000000fb86300 softe: 0 irq_happened: 0x01 pid = 1021, comm = kworker/11:1 kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434! Linux version 4.9.0-3-powerpc64le (debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26) enter ? for help [c00000041f157b30] c00000000001bc3c tm_recheckpoint.part.11+0x6c/0xa0 [c00000041f157b70] c00000000001d184 __switch_to+0x1e4/0x4c0 [c00000041f157bd0] c00000000082eeb8 __schedule+0x2f8/0x990 [c00000041f157cb0] c00000000082f598 schedule+0x48/0xc0 [c00000041f157ce0] c0000000000f0d28 worker_thread+0x148/0x610 [c00000041f157d80] c0000000000f96b0 kthread+0x120/0x140 [c00000041f157e30] c00000000000c0e0 ret_from_kernel_thread+0x5c/0x7c This patch simply delays the MSR[TS] set, so, if there is any page fault in the __get_user() section, it does not have regs->msr[TS] set, since the TM structures are still invalid, thus avoiding doing TM operations for in-kernel exceptions and possible process reschedule. With this patch, the MSR[TS] will only be set just before recheckpointing and setting TEXASR[FS] = 1, thus avoiding an interrupt with TM registers in invalid state. Other than that, if CONFIG_PREEMPT is set, there might be a preemption just after setting MSR[TS] and before tm_recheckpoint(), thus, this block must be atomic from a preemption perspective, thus, calling preempt_disable/enable() on this code. It is not possible to move tm_recheckpoint to happen earlier, because it is required to get the checkpointed registers from userspace, with __get_user(), thus, the only way to avoid this undesired behavior is delaying the MSR[TS] set. The 32-bits signal handler seems to be safe this current issue, but, it might be exposed to the preemption issue, thus, disabling preemption in this chunk of code. Changes from v2: * Run the critical section with preempt_disable. Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals") Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> [bwh: Backported to 3.16: - We don't forceably enable TM here; don't change that, and drop the comment about it - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-04-04	powerpc/configs: Don't enable PPC_EARLY_DEBUG in defconfigs	Michael Ellerman
	commit 2b874a5c7b75fdc90fdd1e2ffaa3ec5a9d21e253 upstream. This reverts the remains of commit b9ef7d6b11c1 ("powerpc: Update default configurations"). That commit was proceeded by a commit which added a config option to control use of BOOTX for early debug, ie. PPC_EARLY_DEBUG_BOOTX, and then the update of the defconfigs was intended to not change behaviour by then enabling the new config option. However enabling PPC_EARLY_DEBUG had other consequences, notably causing us to register the udbg console at the end of udbg_early_init(). This means on a system which doesn't have anything that BOOTX can use (most systems), we register the udbg console very early but the bootx code just throws everything away, meaning early boot messages are never printed to the console. What we want to happen is for the udbg console to only be registered later (from setup_arch()) once we've setup udbg_putc, and then all early boot messages will be replayed. Fixes: b9ef7d6b11c1 ("powerpc: Update default configurations") Reported-by: Torsten Duwe <duwe@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-04-04	KVM: arm/arm64: Fix VMID alloc race by reverting to lock-less	Christoffer Dall
	commit fb544d1ca65a89f7a3895f7531221ceeed74ada7 upstream. We recently addressed a VMID generation race by introducing a read/write lock around accesses and updates to the vmid generation values. However, kvm_arch_vcpu_ioctl_run() also calls need_new_vmid_gen() but does so without taking the read lock. As far as I can tell, this can lead to the same kind of race: VM 0, VCPU 0 VM 0, VCPU 1 ------------ ------------ update_vttbr (vmid 254) update_vttbr (vmid 1) // roll over read_lock(kvm_vmid_lock); force_vm_exit() local_irq_disable need_new_vmid_gen == false //because vmid gen matches enter_guest (vmid 254) kvm_arch.vttbr = <PGD>:<VMID 1> read_unlock(kvm_vmid_lock); enter_guest (vmid 1) Which results in running two VCPUs in the same VM with different VMIDs and (even worse) other VCPUs from other VMs could now allocate clashing VMID 254 from the new generation as long as VCPU 0 is not exiting. Attempt to solve this by making sure vttbr is updated before another CPU can observe the updated VMID generation. Fixes: f0cf47d939d0 "KVM: arm/arm64: Close VMID generation race" Reviewed-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> [bwh: Backported to 3.16: - Use ACCESS_ONCE() instead of {READ,WRITE}_ONCE() - Adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-04-04	kvm: vmx: Set IA32_TSC_AUX for legacy mode guests	Jim Mattson
	commit 0023ef39dc35c773c436eaa46ca539a26b308b55 upstream. RDTSCP is supported in legacy mode as well as long mode. The IA32_TSC_AUX MSR should be set to the correct guest value before entering any guest that supports RDTSCP. Fixes: 4e47c7a6d714 ("KVM: VMX: Add instruction rdtscp support for guest") Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Reviewed-by: Marc Orr <marcorr@google.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [bwh: Backported to 3.16: - Keep testing vmx->rdtscp_enabled instead of guest_cpuid_has() - Adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-04-04	mips: bpf: fix encoding bug for mm_srlv32_op	Jiong Wang
	commit 17f6c83fb5ebf7db4fcc94a5be4c22d5a7bfe428 upstream. For micro-mips, srlv inside POOL32A encoding space should use 0x50 sub-opcode, NOT 0x90. Some early version ISA doc describes the encoding as 0x90 for both srlv and srav, this looks to me was a typo. I checked Binutils libopcode implementation which is using 0x50 for srlv and 0x90 for srav. v1->v2: - Keep mm_srlv32_op sorted by value. Fixes: f31318fdf324 ("MIPS: uasm: Add srlv uasm instruction") Cc: Markos Chandras <markos.chandras@imgtec.com> Cc: Paul Burton <paul.burton@mips.com> Cc: linux-mips@vger.kernel.org Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-04-04	MIPS: Expand MIPS32 ASIDs to 64 bits	Paul Burton
	commit ff4dd232ec45a0e45ea69f28f069f2ab22b4908a upstream. ASIDs have always been stored as unsigned longs, ie. 32 bits on MIPS32 kernels. This is problematic because it is feasible for the ASID version to overflow & wrap around to zero. We currently attempt to handle this overflow by simply setting the ASID version to 1, using asid_first_version(), but we make no attempt to account for the fact that there may be mm_structs with stale ASIDs that have versions which we now reuse due to the overflow & wrap around. Encountering this requires that: 1) A struct mm_struct X is active on CPU A using ASID (V,n). 2) That mm is not used on CPU A for the length of time that it takes for CPU A's asid_cache to overflow & wrap around to the same version V that the mm had in step 1. During this time tasks using the mm could either be sleeping or only scheduled on other CPUs. 3) Some other mm Y becomes active on CPU A and is allocated the same ASID (V,n). 4) mm X now becomes active on CPU A again, and now incorrectly has the same ASID as mm Y. Where struct mm_struct ASIDs are represented above in the format (version, EntryHi.ASID), and on a typical MIPS32 system version will be 24 bits wide & EntryHi.ASID will be 8 bits wide. The length of time required in step 2 is highly dependent upon the CPU & workload, but for a hypothetical 2GHz CPU running a workload which generates a new ASID every 10000 cycles this period is around 248 days. Due to this long period of time & the fact that tasks need to be scheduled in just the right (or wrong, depending upon your inclination) way, this is obviously a difficult bug to encounter but it's entirely possible as evidenced by reports. In order to fix this, simply extend ASIDs to 64 bits even on MIPS32 builds. This will extend the period of time required for the hypothetical system above to encounter the problem from 28 days to around 3 trillion years, which feels safely outside of the realms of possibility. The cost of this is slightly more generated code in some commonly executed paths, but this is pretty minimal: \| Code Size Gain \| Percentage -----------------------\|----------------\|------------- decstation_defconfig \| +270 \| +0.00% 32r2el_defconfig \| +652 \| +0.01% 32r6el_defconfig \| +1000 \| +0.01% I have been unable to measure any change in performance of the LMbench lat_ctx or lat_proc tests resulting from the 64b ASIDs on either 32r2el_defconfig+interAptiv or 32r6el_defconfig+I6500 systems. Signed-off-by: Paul Burton <paul.burton@mips.com> Suggested-by: James Hogan <jhogan@kernel.org> References: https://lore.kernel.org/linux-mips/80B78A8B8FEE6145A87579E8435D78C30205D5F3@fzex.ruijie.com.cn/ References: https://lore.kernel.org/linux-mips/1488684260-18867-1-git-send-email-jiwei.sun@windriver.com/ Cc: Jiwei Sun <jiwei.sun@windriver.com> Cc: Yu Huabing <yhb@ruijie.com.cn> Cc: linux-mips@vger.kernel.org [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-04-04	MIPS: Align kernel load address to 64KB	Huacai Chen
	commit bec0de4cfad21bd284dbddee016ed1767a5d2823 upstream. KEXEC needs the new kernel's load address to be aligned on a page boundary (see sanity_check_segment_list()), but on MIPS the default vmlinuz load address is only explicitly aligned to 16 bytes. Since the largest PAGE_SIZE supported by MIPS kernels is 64KB, increase the alignment calculated by calc_vmlinuz_load_addr to 64KB. Signed-off-by: Huacai Chen <chenhc@lemote.com> Signed-off-by: Paul Burton <paul.burton@mips.com> Patchwork: https://patchwork.linux-mips.org/patch/21131/ Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <james.hogan@mips.com> Cc: Steven J . Hill <Steven.Hill@cavium.com> Cc: linux-mips@linux-mips.org Cc: Fuxin Zhang <zhangfx@lemote.com> Cc: Zhangjin Wu <wuzhangjin@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-04-04	MIPS: Ensure pmd_present() returns false after pmd_mknotpresent()	Huacai Chen
	commit 92aa0718c9fa5160ad2f0e7b5bffb52f1ea1e51a upstream. This patch is borrowed from ARM64 to ensure pmd_present() returns false after pmd_mknotpresent(). This is needed for THP. References: 5bb1cc0ff9a6 ("arm64: Ensure pmd_present() returns false after pmd_mknotpresent()") Reviewed-by: James Hogan <jhogan@kernel.org> Signed-off-by: Huacai Chen <chenhc@lemote.com> Signed-off-by: Paul Burton <paul.burton@mips.com> Patchwork: https://patchwork.linux-mips.org/patch/21135/ Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <james.hogan@mips.com> Cc: Steven J . Hill <Steven.Hill@cavium.com> Cc: linux-mips@linux-mips.org Cc: Fuxin Zhang <zhangfx@lemote.com> Cc: Zhangjin Wu <wuzhangjin@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-04-04	MIPS: SiByte: Enable ZONE_DMA32 for LittleSur	Maciej W. Rozycki
	commit 756d6d836dbfb04a5a486bc2ec89397aa4533737 upstream. The LittleSur board is marked for high memory support and therefore clearly must provide a way to have enough memory installed for some to be present outside the low 4GiB physical address range. With the memory map of the BCM1250 SOC it has been built around it means over 1GiB of actual DRAM, as only the first 1GiB is mapped in the low 4GiB physical address range[1]. Complement commit cce335ae47e2 ("[MIPS] 64-bit Sibyte kernels need DMA32.") then and also enable ZONE_DMA32 for LittleSur. References: [1] "BCM1250/BCM1125/BCM1125H User Manual", Revision 1250_1125-UM100-R, Broadcom Corporation, 21 Oct 2002, Section 3: "System Overview", "Memory Map", pp. 34-38 Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Paul Burton <paul.burton@mips.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Patchwork: https://patchwork.linux-mips.org/patch/21107/ Fixes: cce335ae47e2 ("[MIPS] 64-bit Sibyte kernels need DMA32.") Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-04-04	x86/PCI: Fix Broadcom CNB20LE unintended sign extension (redux)	Colin Ian King
	commit 53bb565fc5439f2c8c57a786feea5946804aa3e9 upstream. In the expression "word1 << 16", word1 starts as u16, but is promoted to a signed int, then sign-extended to resource_size_t, which is probably not what was intended. Cast to resource_size_t to avoid the sign extension. This fixes an identical issue as fixed by commit 0b2d70764bb3 ("x86/PCI: Fix Broadcom CNB20LE unintended sign extension") back in 2014. Detected by CoverityScan, CID#138749, 138750 ("Unintended sign extension") Fixes: 3f6ea84a3035 ("PCI: read memory ranges out of Broadcom CNB20LE host bridge") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Bjorn Helgaas <helgaas@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-03-25	KVM: x86: work around leak of uninitialized stack contents (CVE-2019-7222)	Paolo Bonzini
	commit 353c0956a618a07ba4bbe7ad00ff29fe70e8412a upstream. Bugzilla: 1671930 Emulation of certain instructions (VMXON, VMCLEAR, VMPTRLD, VMWRITE with memory operand, INVEPT, INVVPID) can incorrectly inject a page fault when passed an operand that points to an MMIO address. The page fault will use uninitialized kernel stack memory as the CR2 and error code. The right behavior would be to abort the VM with a KVM_EXIT_INTERNAL_ERROR exit to userspace; however, it is not an easy fix, so for now just ensure that the error code and CR2 are zero. Embargoed until Feb 7th 2019. Reported-by: Felix Wilhelm <fwilhelm@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-03-25	KVM: nVMX: unconditionally cancel preemption timer in free_nested ↵	Peter Shier
	(CVE-2019-7221) commit ecec76885bcfe3294685dc363fd1273df0d5d65f upstream. Bugzilla: 1671904 There are multiple code paths where an hrtimer may have been started to emulate an L1 VMX preemption timer that can result in a call to free_nested without an intervening L2 exit where the hrtimer is normally cancelled. Unconditionally cancel in free_nested to cover all cases. Embargoed until Feb 7th 2019. Signed-off-by: Peter Shier <pshier@google.com> Reported-by: Jim Mattson <jmattson@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reported-by: Felix Wilhelm <fwilhelm@google.com> Message-Id: <20181011184646.154065-1-pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [bwh: Backported to 3.16: adjust filename, context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-03-25	KVM: Protect device ops->create and list_add with kvm->lock	Christoffer Dall
	commit a28ebea2adc4a2bef5989a5a181ec238f59fbcad upstream. KVM devices were manipulating list data structures without any form of synchronization, and some implementations of the create operations also suffered from a lack of synchronization. Now when we've split the xics create operation into create and init, we can hold the kvm->lock mutex while calling the create operation and when manipulating the devices list. The error path in the generic code gets slightly ugly because we have to take the mutex again and delete the device from the list, but holding the mutex during anon_inode_getfd or releasing/locking the mutex in the common non-error path seemed wrong. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> [bwh: Backported to 3.16: - Drop change to a failure path that doesn't exist in kvm_vgic_create() - Adjust filename, context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-03-25	KVM: PPC: Move xics_debugfs_init out of create	Christoffer Dall
	commit 023e9fddc3616b005c3753fc1bb6526388cd7a30 upstream. As we are about to hold the kvm->lock during the create operation on KVM devices, we should move the call to xics_debugfs_init into its own function, since holding a mutex over extended amounts of time might not be a good idea. Introduce an init operation on the kvm_device_ops struct which cannot fail and call this, if configured, after the device has been created. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	x86/vdso: Fix vDSO syscall fallback asm constraint regression	Andy Lutomirski
	commit 02e425668f5c9deb42787d10001a3b605993ad15 upstream. When I added the missing memory outputs, I failed to update the index of the first argument (ebx) on 32-bit builds, which broke the fallbacks. Somehow I must have screwed up my testing or gotten lucky. Add another test to cover gettimeofday() as well. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 715bd9d12f84 ("x86/vdso: Fix asm constraints on vDSO syscall fallbacks") Link: http://lkml.kernel.org/r/21bd45ab04b6d838278fa5bebfa9163eceffa13c.1538608971.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> [bwh: Backported to 3.16: - Drop selftest changes - Adjust filename] Tested-by: Matthew Whitehead <tedheadster@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	x86/mtrr: Don't copy uninitialized gentry fields back to userspace	Colin Ian King
	commit 32043fa065b51e0b1433e48d118821c71b5cd65d upstream. Currently the copy_to_user of data in the gentry struct is copying uninitiaized data in field _pad from the stack to userspace. Fix this by explicitly memset'ing gentry to zero, this also will zero any compiler added padding fields that may be in struct (currently there are none). Detected by CoverityScan, CID#200783 ("Uninitialized scalar variable") Fixes: b263b31e8ad6 ("x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tyler Hicks <tyhicks@canonical.com> Cc: security@kernel.org Link: https://lkml.kernel.org/r/20181218172956.1440-1-colin.king@canonical.com Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	kvm: x86: Add AMD's EX_CFG to the list of ignored MSRs	Eduardo Habkost
	commit 0e1b869fff60c81b510c2d00602d778f8f59dd9a upstream. Some guests OSes (including Windows 10) write to MSR 0xc001102c on some cases (possibly while trying to apply a CPU errata). Make KVM ignore reads and writes to that MSR, so the guest won't crash. The MSR is documented as "Execution Unit Configuration (EX_CFG)", at AMD's "BIOS and Kernel Developer's Guide (BKDG) for AMD Family 15h Models 00h-0Fh Processors". Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	KVM: x86: Add MSR_AMD64_DC_CFG to the list of ignored MSRs	Ladi Prosek
	commit 405a353a0e20d09090ad96147da6afad9b0ce056 upstream. Hyper-V writes 0x800000000000 to MSR_AMD64_DC_CFG when running on AMD CPUs as recommended in erratum 383, analogous to our svm_init_erratum_383. By ignoring the MSR, this patch enables running Hyper-V in L1 on AMD. Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	KVM: Handle MSR_IA32_PERF_CTL	Dmitry Bilunov
	commit 0c2df2a1affd183ba9c114915f42a2d464b4f58f upstream. Intel CPUs having Turbo Boost feature implement an MSR to provide a control interface via rdmsr/wrmsr instructions. One could detect the presence of this feature by issuing one of these instructions and handling the #GP exception which is generated in case the referenced MSR is not implemented by the CPU. KVM's vCPU model behaves exactly as a real CPU in this case by injecting a fault when MSR_IA32_PERF_CTL is called (which KVM does not support). However, some operating systems use this register during an early boot stage in which their kernel is not capable of handling #GP correctly, causing #DP and finally a triple fault effectively resetting the vCPU. This patch implements a dummy handler for MSR_IA32_PERF_CTL to avoid the crashes. Signed-off-by: Dmitry Bilunov <kmeaw@yandex-team.ru> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	KVM: X86: Fix NULL deref in vcpu_scan_ioapic	Wanpeng Li
	commit dcbd3e49c2f0b2c2d8a321507ff8f3de4af76d7c upstream. Reported by syzkaller: CPU: 1 PID: 5962 Comm: syz-executor118 Not tainted 4.20.0-rc6+ #374 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:kvm_apic_hw_enabled arch/x86/kvm/lapic.h:169 [inline] RIP: 0010:vcpu_scan_ioapic arch/x86/kvm/x86.c:7449 [inline] RIP: 0010:vcpu_enter_guest arch/x86/kvm/x86.c:7602 [inline] RIP: 0010:vcpu_run arch/x86/kvm/x86.c:7874 [inline] RIP: 0010:kvm_arch_vcpu_ioctl_run+0x5296/0x7320 arch/x86/kvm/x86.c:8074 Call Trace: kvm_vcpu_ioctl+0x5c8/0x1150 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2596 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:509 [inline] do_vfs_ioctl+0x1de/0x1790 fs/ioctl.c:696 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:713 __do_sys_ioctl fs/ioctl.c:720 [inline] __se_sys_ioctl fs/ioctl.c:718 [inline] __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe The reason is that the testcase writes hyperv synic HV_X64_MSR_SINT14 msr and triggers scan ioapic logic to load synic vectors into EOI exit bitmap. However, irqchip is not initialized by this simple testcase, ioapic/apic objects should not be accessed. This patch fixes it by also considering whether or not apic is present. Reported-by: syzbot+39810e6c400efadfef71@syzkaller.appspotmail.com Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	ARM: mmp/mmp2: fix cpu_is_mmp2() on mmp2-dt	Lubomir Rintel
	commit 76f4e2c3b6a560cdd7a75b87df543e04d05a9e5f upstream. cpu_is_mmp2() was equivalent to cpu_is_pj4(), wouldn't be correct for multiplatform kernels. Fix it by also considering mmp_chip_id, as is done for cpu_is_pxa168() and cpu_is_pxa910() above. Moreover, it is only available with CONFIG_CPU_MMP2 and thus doesn't work on DT-based MMP2 machines. Enable it on CONFIG_MACH_MMP2_DT too. Note: CONFIG_CPU_MMP2 is only used for machines that use board files instead of DT. It should perhaps be renamed. I'm not doing it now, because I don't have a better idea. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Olof Johansson <olof@lixom.net> [bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	kvm: svm: Ensure an IBPB on all affected CPUs when freeing a vmcb	Jim Mattson
	commit fd65d3142f734bc4376053c8d75670041903134d upstream. Previously, we only called indirect_branch_prediction_barrier on the logical CPU that freed a vmcb. This function should be called on all logical CPUs that last loaded the vmcb in question. Fixes: 15d45071523d ("KVM/x86: Add IBPB support") Reported-by: Neel Natu <neelnatu@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	kvm: mmu: Fix race in emulated page table writes	Junaid Shahid
	commit 0e0fee5c539b61fdd098332e0e2cc375d9073706 upstream. When a guest page table is updated via an emulated write, kvm_mmu_pte_write() is called to update the shadow PTE using the just written guest PTE value. But if two emulated guest PTE writes happened concurrently, it is possible that the guest PTE and the shadow PTE end up being out of sync. Emulated writes do not mark the shadow page as unsync-ed, so this inconsistency will not be resolved even by a guest TLB flush (unless the page was marked as unsync-ed at some other point). This is fixed by re-reading the current value of the guest PTE after the MMU lock has been acquired instead of just using the value that was written prior to calling kvm_mmu_pte_write(). Signed-off-by: Junaid Shahid <junaids@google.com> Reviewed-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [bwh: Backported to 3.16: Use kvm_read_guest_atomic()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	xtensa: fix coprocessor part of ptrace_{get,set}xregs	Max Filippov
	commit 38a35a78c5e270cbe53c4fef6b0d3c2da90dd849 upstream. Layout of coprocessor registers in the elf_xtregs_t and xtregs_coprocessor_t may be different due to alignment. Thus it is not always possible to copy data between the xtregs_coprocessor_t structure and the elf_xtregs_t and get correct values for all registers. Use a table of offsets and sizes of individual coprocessor register groups to do coprocessor context copying in the ptrace_getxregs and ptrace_setxregs. This fixes incorrect coprocessor register values reading from the user process by the native gdb on an xtensa core with multiple coprocessors and registers with high alignment requirements. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	xtensa: fix coprocessor context offset definitions	Max Filippov
	commit 03bc996af0cc71c7f30c384d8ce7260172423b34 upstream. Coprocessor context offsets are used by the assembly code that moves coprocessor context between the individual fields of the thread_info::xtregs_cp structure and coprocessor registers. This fixes coprocessor context clobbering on flushing and reloading during normal user code execution and user process debugging in the presence of more than one coprocessor in the core configuration. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	xtensa: enable coprocessors that are being flushed	Max Filippov
	commit 2958b66694e018c552be0b60521fec27e8d12988 upstream. coprocessor_flush_all may be called from a context of a thread that is different from the thread being flushed. In that case contents of the cpenable special register may not match ti->cpenable of the target thread, resulting in unhandled coprocessor exception in the kernel context. Set cpenable special register to the ti->cpenable of the target register for the duration of the flush and restore it afterwards. This fixes the following crash caused by coprocessor register inspection in native gdb: (gdb) p/x $w0 Illegal instruction in kernel: sig: 9 [#1] PREEMPT Call Trace: ___might_sleep+0x184/0x1a4 __might_sleep+0x41/0xac exit_signals+0x14/0x218 do_exit+0xc9/0x8b8 die+0x99/0xa0 do_illegal_instruction+0x18/0x6c common_exception+0x77/0x77 coprocessor_flush+0x16/0x3c arch_ptrace+0x46c/0x674 sys_ptrace+0x2ce/0x3b4 system_call+0x54/0x80 common_exception+0x77/0x77 note: gdb[100] exited with preempt_count 1 Killed Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	powerpc: Fix COFF zImage booting on old powermacs	Paul Mackerras
	commit 5564597d51c8ff5b88d95c76255e18b13b760879 upstream. Commit 6975a783d7b4 ("powerpc/boot: Allow building the zImage wrapper as a relocatable ET_DYN", 2011-04-12) changed the procedure descriptor at the start of crt0.S to have a hard-coded start address of 0x500000 rather than a reference to _zimage_start, presumably because having a reference to a symbol introduced a relocation which is awkward to handle in a position-independent executable. Unfortunately, what is at 0x500000 in the COFF image is not the first instruction, but the procedure descriptor itself, that is, a word containing 0x500000, which is not a valid instruction. Hence, booting a COFF zImage results in a "DEFAULT CATCH!, code=FFF00700" message from Open Firmware. This fixes the problem by (a) putting the procedure descriptor in the data section and (b) adding a branch to _zimage_start as the first instruction in the program. Fixes: 6975a783d7b4 ("powerpc/boot: Allow building the zImage wrapper as a relocatable ET_DYN") Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	mips: fix mips_get_syscall_arg o32 check	Dmitry V. Levin
	commit c50cbd85cd7027d32ac5945bb60217936b4f7eaf upstream. When checking for TIF_32BIT_REGS flag, mips_get_syscall_arg() should use the task specified as its argument instead of the current task. This potentially affects all syscall_get_arguments() users who specify tasks different from the current. Fixes: c0ff3c53d4f99 ("MIPS: Enable HAVE_ARCH_TRACEHOOK.") Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Paul Burton <paul.burton@mips.com> Patchwork: https://patchwork.linux-mips.org/patch/21185/ Cc: Elvira Khabirova <lineprinter@altlinux.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <jhogan@kernel.org> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	xtensa: fix boot parameters address translation	Max Filippov
	commit 40dc948f234b73497c3278875eb08a01d5854d3f upstream. The bootloader may pass physical address of the boot parameters structure to the MMUv3 kernel in the register a2. Code in the _SetupMMU block in the arch/xtensa/kernel/head.S is supposed to map that physical address to the virtual address in the configured virtual memory layout. This code haven't been updated when additional 256+256 and 512+512 memory layouts were introduced and it may produce wrong addresses when used with these layouts. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	ARM: OMAP1: ams-delta: Fix possible use of uninitialized field	Janusz Krzysztofik
	commit cec83ff1241ec98113a19385ea9e9cfa9aa4125b upstream. While playing with initialization order of modem device, it has been discovered that under some circumstances (early console init, I believe) its .pm() callback may be called before the uart_port->private_data pointer is initialized from plat_serial8250_port->private_data, resulting in NULL pointer dereference. Fix it by checking for uninitialized pointer before using it in modem_pm(). Fixes: aabf31737a6a ("ARM: OMAP1: ams-delta: update the modem to use regulator API") Signed-off-by: Janusz Krzysztofik <jmkrzyszt@gmail.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	arch/alpha, termios: implement BOTHER, IBSHIFT and termios2	H. Peter Anvin (Intel)
	commit d0ffb805b729322626639336986bc83fc2e60871 upstream. Alpha has had c_ispeed and c_ospeed, but still set speeds in c_cflags using arbitrary flags. Because BOTHER is not defined, the general Linux code doesn't allow setting arbitrary baud rates, and because CBAUDEX == 0, we can have an array overrun of the baud_rate[] table in drivers/tty/tty_baudrate.c if (c_cflags & CBAUD) == 037. Resolve both problems by #defining BOTHER to 037 on Alpha. However, userspace still needs to know if setting BOTHER is actually safe given legacy kernels (does anyone actually care about that on Alpha anymore?), so enable the TCGETS2/TCSETS*2 ioctls on Alpha, even though they use the same structure. Define struct termios2 just for compatibility; it is the exact same structure as struct termios. In a future patchset, this will be cleaned up so the uapi headers are usable from libc. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Cc: Jiri Slaby <jslaby@suse.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Eugene Syromiatnikov <esyr@redhat.com> Cc: <linux-alpha@vger.kernel.org> Cc: <linux-serial@vger.kernel.org> Cc: Johan Hovold <johan@kernel.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	xtensa: make sure bFLT stack is 16 byte aligned	Max Filippov
	commit 0773495b1f5f1c5e23551843f87b5ff37e7af8f7 upstream. Xtensa ABI requires stack alignment to be at least 16. In noMMU configuration ARCH_SLAB_MINALIGN is used to align stack. Make it at least 16. This fixes the following runtime error in noMMU configuration, caused by interaction between insufficiently aligned stack and alloca function, that results in corruption of on-stack variable in the libc function glob: Caught unhandled exception in 'sh' (pid = 47, pc = 0x02d05d65) - should not happen EXCCAUSE is 15 Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	x86/hyper-v: Enable PIT shutdown quirk	Michael Kelley
	commit 1de72c706488b7be664a601cf3843bd01e327e58 upstream. Hyper-V emulation of the PIT has a quirk such that the normal PIT shutdown path doesn't work, because clearing the counter register restarts the timer. Disable the counter clearing on PIT shutdown. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org> Cc: "devel@linuxdriverproject.org" <devel@linuxdriverproject.org> Cc: "daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org> Cc: "virtualization@lists.linux-foundation.org" <virtualization@lists.linux-foundation.org> Cc: "jgross@suse.com" <jgross@suse.com> Cc: "akataria@vmware.com" <akataria@vmware.com> Cc: "olaf@aepfle.de" <olaf@aepfle.de> Cc: "apw@canonical.com" <apw@canonical.com> Cc: vkuznets <vkuznets@redhat.com> Cc: "jasowang@redhat.com" <jasowang@redhat.com> Cc: "marcelo.cerri@canonical.com" <marcelo.cerri@canonical.com> Cc: KY Srinivasan <kys@microsoft.com> Link: https://lkml.kernel.org/r/1541303219-11142-3-git-send-email-mikelley@microsoft.com [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	xtensa: add NOTES section to the linker script	Max Filippov
	commit 4119ba211bc4f1bf638f41e50b7a0f329f58aa16 upstream. This section collects all source .note.* sections together in the vmlinux image. Without it .note.Linux section may be placed at address 0, while the rest of the kernel is at its normal address, resulting in a huge vmlinux.bin image that may not be linked into the xtensa Image.elf. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	um: Give start_idle_thread() a return code	Richard Weinberger
	commit 7ff1e34bbdc15acab823b1ee4240e94623d50ee8 upstream. Fixes: arch/um/os-Linux/skas/process.c:613:1: warning: control reaches end of non-void function [-Wreturn-type] longjmp() never returns but gcc still warns that the end of the function can be reached. Add a return code and debug aid to detect this impossible case. Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	um: Drop own definition of PTRACE_SYSEMU/_SINGLESTEP	Richard Weinberger
	commit 0676b957c24bfb6e495449ba7b7e72c5b5d79233 upstream. 32bit UML used to define PTRACE_SYSEMU and PTRACE_SYSEMU_SINGLESTEP own its own because many years ago not all libcs had these request codes in their UAPI. These days PTRACE_SYSEMU/_SINGLESTEP is well known and part of glibc and our own define becomes problematic. With change c48831d0eebf ("linux/x86: sync sys/ptrace.h with Linux 4.14 [BZ #22433]") glibc turned PTRACE_SYSEMU/_SINGLESTEP into a enum and UML failed to build. Let's drop our define and rely on the fact that every libc has PTRACE_SYSEMU/_SINGLESTEP. Cc: Ritesh Raj Sarraf <rrs@researchut.com> Reported-and-tested-by: Ritesh Raj Sarraf <rrs@researchut.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	parisc: Fix address in HPMC IVA	John David Anglin
	commit 1138b6718ff74d2a934459643e3754423d23b5e2 upstream. Helge noticed that the address of the os_hpmc handler was not being correctly calculated in the hpmc macro. As a result, PDCE_CHECK would fail to call os_hpmc: <Cpu2> e800009802e00000 0000000000000000 CC_ERR_CHECK_HPMC <Cpu2> 37000f7302e00000 8040004000000000 CC_ERR_CPU_CHECK_SUMMARY <Cpu2> f600105e02e00000 fffffff0f0c00000 CC_MC_HPMC_MONARCH_SELECTED <Cpu2> 140003b202e00000 000000000000000b CC_ERR_HPMC_STATE_ENTRY <Cpu2> 5600100b02e00000 00000000000001a0 CC_MC_OS_HPMC_LEN_ERR <Cpu2> 5600106402e00000 fffffff0f0438e70 CC_MC_BR_TO_OS_HPMC_FAILED <Cpu2> e800009802e00000 0000000000000000 CC_ERR_CHECK_HPMC <Cpu2> 37000f7302e00000 8040004000000000 CC_ERR_CPU_CHECK_SUMMARY <Cpu2> 4000109f02e00000 0000000000000000 CC_MC_HPMC_INITIATED <Cpu2> 4000101902e00000 0000000000000000 CC_MC_MULTIPLE_HPMCS <Cpu2> 030010d502e00000 0000000000000000 CC_CPU_STOP The address problem can be seen by dumping the fault vector: 0000000040159000 <fault_vector_20>: 40159000: 63 6f 77 73 stb r15,-2447(dp) 40159004: 20 63 61 6e ldil L%b747000,r3 40159008: 20 66 6c 79 ldil L%-1c3b3000,r3 ... 40159020: 08 00 02 40 nop 40159024: 20 6e 60 02 ldil L%15d000,r3 40159028: 34 63 00 00 ldo 0(r3),r3 4015902c: e8 60 c0 02 bv,n r0(r3) 40159030: 08 00 02 40 nop 40159034: 00 00 00 00 break 0,0 40159038: c0 00 70 00 bb,*< r0,sar,40159840 <fault_vector_20+0x840> 4015903c: 00 00 00 00 break 0,0 Location 40159038 should contain the physical address of os_hpmc: 000000004015d000 <os_hpmc>: 4015d000: 08 1a 02 43 copy r26,r3 4015d004: 01 c0 08 a4 mfctl iva,r4 4015d008: 48 85 00 68 ldw 34(r4),r5 This patch moves the address setup into initialize_ivt to resolve the above problem. I tested the change by dumping the HPMC entry after setup: 0000000040209020: 8000240 0000000040209024: 206a2004 0000000040209028: 34630ac0 000000004020902c: e860c002 0000000040209030: 8000240 0000000040209034: 1bdddce6 0000000040209038: 15d000 000000004020903c: 1a0 Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	parisc: Fix map_pages() to not overwrite existing pte entries	Helge Deller
	commit 3c229b3f2dd8133f61bb81d3cb018be92f4bba39 upstream. Fix a long-existing small nasty bug in the map_pages() implementation which leads to overwriting already written pte entries with zero, if map_pages() is called a second time with an end address which isn't aligned on a pmd boundry. This happens for example if we want to remap only the text segment read/write in order to run alternative patching on the code. Exiting the loop when we reach the end address fixes this. Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	KVM: nVMX: Always reflect #NM VM-exits to L1	Jim Mattson
	commit 3c6e099fa15fdb6fb1892199ed8709012e1294f2 upstream. When bit 3 (corresponding to CR0.TS) of the VMCS12 cr0_guest_host_mask field is clear, the VMCS12 guest_cr0 field does not necessarily hold the current value of the L2 CR0.TS bit, so the code that checked for L2's CR0.TS bit being set was incorrect. Moreover, I'm not sure that the CR0.TS check was adequate. (What if L2's CR0.EM was set, for instance?) Fortunately, lazy FPU has gone away, so L0 has lost all interest in intercepting #NM exceptions. See commit bd7e5b0899a4 ("KVM: x86: remove code for lazy FPU handling"). Therefore, there is no longer any question of which hypervisor gets first dibs. The #NM VM-exit should always be reflected to L1. (Note that the corresponding bit must be set in the VMCS12 exception_bitmap field for there to be an #NM VM-exit at all.) Fixes: ccf9844e5d99c ("kvm, vmx: Really fix lazy FPU on nested guest") Reported-by: Abhiroop Dabral <adabral@paloaltonetworks.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Tested-by: Abhiroop Dabral <adabral@paloaltonetworks.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [bwh: Backported to 3.16: - is_no_device() hadn't been converted to use is_exception_n() - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	KVM: x86: remove code for lazy FPU handling	Paolo Bonzini
	commit bd7e5b0899a429445cc6e3037c13f8b5ae3be903 upstream. The FPU is always active now when running KVM. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [bwh: Backported to 3.16: - eagerfpu is still optional (but enabled by default) so disable KVM if eagerfpu is disabled - Remove one additional use of KVM_REQ_DEACTIVATE_FPU which was removed earlier upstream in commit c592b5734706 "x86/fpu: Remove use_eager_fpu()" - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	x86, hibernate: Fix nosave_regions setup for hibernation	Zhimin Gu
	commit cc55f7537db6af371e9c1c6a71161ee40f918824 upstream. On 32bit systems, nosave_regions(non RAM areas) located between max_low_pfn and max_pfn are not excluded from hibernation snapshot currently, which may result in a machine check exception when trying to access these unsafe regions during hibernation: [ 612.800453] Disabling lock debugging due to kernel taint [ 612.805786] mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 6: fe00000000801136 [ 612.814344] mce: [Hardware Error]: RIP !INEXACT! 60:<00000000d90be566> {swsusp_save+0x436/0x560} [ 612.823167] mce: [Hardware Error]: TSC 1f5939fe276 ADDR dd000000 MISC 30e0000086 [ 612.830677] mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1529487426 SOCKET 0 APIC 0 microcode 24 [ 612.839581] mce: [Hardware Error]: Run the above through 'mcelog --ascii' [ 612.846394] mce: [Hardware Error]: Machine check: Processor context corrupt [ 612.853380] Kernel panic - not syncing: Fatal machine check [ 612.858978] Kernel Offset: 0x18000000 from 0xc1000000 (relocation range: 0xc0000000-0xf7ffdfff) This is because on 32bit systems, pages above max_low_pfn are regarded as high memeory, and accessing unsafe pages might cause expected MCE. On the problematic 32bit system, there are reserved memory above low memory, which triggered the MCE: e820 memory mapping: [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009d7ff] usable [ 0.000000] BIOS-e820: [mem 0x000000000009d800-0x000000000009ffff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000d160cfff] usable [ 0.000000] BIOS-e820: [mem 0x00000000d160d000-0x00000000d1613fff] ACPI NVS [ 0.000000] BIOS-e820: [mem 0x00000000d1614000-0x00000000d1a44fff] usable [ 0.000000] BIOS-e820: [mem 0x00000000d1a45000-0x00000000d1ecffff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000d1ed0000-0x00000000d7eeafff] usable [ 0.000000] BIOS-e820: [mem 0x00000000d7eeb000-0x00000000d7ffffff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000d8000000-0x00000000d875ffff] usable [ 0.000000] BIOS-e820: [mem 0x00000000d8760000-0x00000000d87fffff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000d8800000-0x00000000d8fadfff] usable [ 0.000000] BIOS-e820: [mem 0x00000000d8fae000-0x00000000d8ffffff] ACPI data [ 0.000000] BIOS-e820: [mem 0x00000000d9000000-0x00000000da71bfff] usable [ 0.000000] BIOS-e820: [mem 0x00000000da71c000-0x00000000da7fffff] ACPI NVS [ 0.000000] BIOS-e820: [mem 0x00000000da800000-0x00000000dbb8bfff] usable [ 0.000000] BIOS-e820: [mem 0x00000000dbb8c000-0x00000000dbffffff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000dd000000-0x00000000df1fffff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000f8000000-0x00000000fbffffff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000fed00000-0x00000000fed03fff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved [ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000041edfffff] usable Fix this problem by changing pfn limit from max_low_pfn to max_pfn. This fix does not impact 64bit system because on 64bit max_low_pfn is the same as max_pfn. Signed-off-by: Zhimin Gu <kookoo.gu@intel.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	powerpc/pseries: Fix how we iterate over the DTL entries	Naveen N. Rao
	commit 9258227e9dd1da8feddb07ad9702845546a581c9 upstream. When CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set, we look up dtl_idx in the lppaca to determine the number of entries in the buffer. Since lppaca is in big endian, we need to do an endian conversion before using this in our calculation to determine the number of entries in the buffer. Without this, we do not iterate over the existing entries in the DTL buffer properly. Fixes: 7c105b63bd98 ("powerpc: Add CONFIG_CPU_LITTLE_ENDIAN kernel config option.") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	powerpc/pseries: Fix DTL buffer registration	Naveen N. Rao
	commit db787af1b8a6b4be428ee2ea7d409dafcaa4a43c upstream. When CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set, we register the DTL buffer for a cpu when the associated file under powerpc/dtl in debugfs is opened. When doing so, we need to set the size of the buffer being registered in the second u32 word of the buffer. This needs to be in big endian, but we are not doing the conversion resulting in the below error showing up in dmesg: dtl_start: DTL registration for cpu 0 (hw 0) failed with -4 Fix this in the obvious manner. Fixes: 7c105b63bd98 ("powerpc: Add CONFIG_CPU_LITTLE_ENDIAN kernel config option.") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	x86/corruption-check: Fix panic in memory_corruption_check() when boot ↵	He Zhe
	option without value is provided commit ccde460b9ae5c2bd5e4742af0a7f623c2daad566 upstream. memory_corruption_check[{_period\|_size}]()'s handlers do not check input argument before passing it to kstrtoul() or simple_strtoull(). The argument would be a NULL pointer if each of the kernel parameters, without its value, is set in command line and thus cause the following panic. PANIC: early exception 0xe3 IP 10:ffffffff73587c22 error 0 cr2 0x0 [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.18-rc8+ #2 [ 0.000000] RIP: 0010:kstrtoull+0x2/0x10 ... [ 0.000000] Call Trace [ 0.000000] ? set_corruption_check+0x21/0x49 [ 0.000000] ? do_early_param+0x4d/0x82 [ 0.000000] ? parse_args+0x212/0x330 [ 0.000000] ? rdinit_setup+0x26/0x26 [ 0.000000] ? parse_early_options+0x20/0x23 [ 0.000000] ? rdinit_setup+0x26/0x26 [ 0.000000] ? parse_early_param+0x2d/0x39 [ 0.000000] ? setup_arch+0x2f7/0xbf4 [ 0.000000] ? start_kernel+0x5e/0x4c2 [ 0.000000] ? load_ucode_bsp+0x113/0x12f [ 0.000000] ? secondary_startup_64+0xa5/0xb0 This patch adds checks to prevent the panic. Signed-off-by: He Zhe <zhe.he@windriver.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: gregkh@linuxfoundation.org Cc: kstewart@linuxfoundation.org Cc: pombredanne@nexb.com Link: http://lkml.kernel.org/r/1534260823-87917-1-git-send-email-zhe.he@windriver.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	ARM: dts: exynos: Disable pull control for MAX8997 interrupts on Origen	Marek Szyprowski
	commit f5e758b8358f6c27e8a351ddf0b441a64cdabb94 upstream. PMIC_IRQB and PMIC_KEYINB lines on Exynos4210-based Origen board have external pull-up resistors, so disable any pull control for those lines in respective pin controller node. This fixes support for MAX8997 interrupts and enables operation of wakeup from MAX8997 RTC alarm. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Fixes: 17419726aaa1 ("ARM: dts: add max8997 device node for exynos4210-origen board") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> [bwh: Backported to 3.16: - Use literal 0 instead of EXYNOS_PIN_PULL_NONE - Adjust context, indentation] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	sparc32: Fix inverted invalid_frame_pointer checks on sigreturns	Andreas Larsson
	commit 07b5ab3f71d318e52c18cc3b73c1d44c908aacfa upstream. Signed-off-by: Andreas Larsson <andreas@gaisler.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	ARM: fix put_user() for gcc-8	Arnd Bergmann
	Building kernels before linux-4.7 with gcc-8 results in many build failures when gcc triggers a check that was meant to catch broken compilers: /tmp/ccCGMQmS.s:648: Error: .err encountered According to the discussion in the gcc bugzilla, a local "register asm()" variable is still supposed to be the correct way to force an inline assembly to use a particular register, but marking it 'const' lets the compiler do optimizations that break that, i.e the compiler is free to treat the variable as either 'const' or 'register' in that case. Upstream commit 9f73bd8bb445 ("ARM: uaccess: remove put_user() code duplication") fixed this problem in linux-4.8 as part of a larger change, but seems a little too big to be backported to 4.4. Let's take the simplest fix and change only the one broken line in the same way as newer kernels. Suggested-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85745 Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86673 Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Johannes Pointner <h4nn35.work@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-02-11	x86/pae: use 64 bit atomic xchg function in native_ptep_get_and_clear	Juergen Gross
	commit b2d7a075a1ccef2fb321d595802190c8e9b39004 upstream. Using only 32-bit writes for the pte will result in an intermediate L1TF vulnerable PTE. When running as a Xen PV guest this will at once switch the guest to shadow mode resulting in a loss of performance. Use arch_atomic64_xchg() instead which will perform the requested operation atomically with all 64 bits. Some performance considerations according to: https://software.intel.com/sites/default/files/managed/ad/dc/Intel-Xeon-Scalable-Processor-throughput-latency.pdf The main number should be the latency, as there is no tight loop around native_ptep_get_and_clear(). "lock cmpxchg8b" has a latency of 20 cycles, while "lock xchg" (with a memory operand) isn't mentioned in that document. "lock xadd" (with xadd having 3 cycles less latency than xchg) has a latency of 11, so we can assume a latency of 14 for "lock xchg". Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jan Beulich <jbeulich@suse.com> Tested-by: Jason Andryuk <jandryuk@gmail.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> [bwh: Backported to 3.16: Use atomic64_cxhg()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>