path: root/arch
Age | Commit message | Author
2018-10-05 | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm | Greg Kroah-Hartman
Paolo writes: "KVM changes for 4.19-rc7
x86 and PPC bugfixes, mostly introduced in 4.19-rc1."
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: nVMX: fix entry with pending interrupt if APICv is enabled
  KVM: VMX: hide flexpriority from guest when disabled at the module level
  KVM: VMX: check for existence of secondary exec controls before accessing
  KVM: PPC: Book3S HV: Avoid crash from THP collapse during radix page fault
  KVM: x86: fix L1TF's MMIO GFN calculation
  tools/kvm_stat: cut down decimal places in update interval dialog
  KVM: nVMX: Fix emulation of VM_ENTRY_LOAD_BNDCFGS
  KVM: x86: Do not use kvm_x86_ops->mpx_supported() directly
  KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled
  KVM: x86: never trap MSR_KERNEL_GS_BASE
2018-10-05 | Merge tag 'kvm-ppc-fixes-4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-master | Paolo Bonzini
Third set of PPC KVM fixes for 4.19
One patch here, fixing a potential host crash introduced (or at least exacerbated) by a previous fix for corruption relating to radix guest page faults and THP operations.
2018-10-04 | Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm | Greg Kroah-Hartman
Russell writes: "A couple of small ARM fixes from Stefan and Thomas:
 - Adding the io_pgetevents syscall
 - Fixing a bounds check in pci_ioremap_io()"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 8799/1: mm: fix pci_ioremap_io() offset check
  ARM: 8787/1: wire up io_pgetevents syscall
2018-10-04 | Merge tag 'riscv-for-linus-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux | Greg Kroah-Hartman
Palmer writes: "A Single RISC-V Fix for 4.19-rc7
This tag contains a single patch that managed to get lost in the shuffle, which explains why it's so late. This single line has been floating around in various patch sets for months, and fixes our DMA32 region."
* tag 'riscv-for-linus-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
  RISCV: Fix end PFN for low memory
2018-10-04 | kvm: nVMX: fix entry with pending interrupt if APICv is enabled | Paolo Bonzini
Commit b5861e5cf2fcf83031ea3e26b0a69d887adf7d21 introduced a check on the interrupt-window and NMI-window CPU execution controls in order to inject an external interrupt vmexit before the first guest instruction executes. However, when APIC virtualization is enabled the host does not need a vmexit in order to inject an interrupt at the next interrupt window; instead, it just places the interrupt vector in RVI and the processor will inject it as soon as possible. Therefore, on machines with APICv it is not enough to check the CPU execution controls: the same scenario can also happen if RVI>vPPR. Fixes: b5861e5cf2fcf83031ea3e26b0a69d887adf7d21 Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Liran Alon <liran.alon@oracle.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
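The condition described above can be sketched in C. This is only an illustration, not the code from the patch: the function names and the boolean parameters are hypothetical, and comparing only the upper four bits (the priority class) of RVI and vPPR is an assumption based on how virtual-interrupt delivery prioritizes vectors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: with APICv, a virtual interrupt is deliverable when
 * the priority class of RVI (requesting virtual interrupt) exceeds the
 * priority class of vPPR (virtual processor priority register).
 * Assumption: only bits 7:4 take part in the comparison.
 */
static bool apicv_interrupt_pending(uint8_t rvi, uint8_t vppr)
{
	return (rvi & 0xf0) > (vppr & 0xf0);
}

/*
 * Hypothetical predicate: should an exit be forced before the first guest
 * instruction runs? Checking the window controls alone is not enough when
 * APICv is enabled, so the RVI/vPPR comparison is checked as well.
 */
static bool needs_immediate_exit(bool irq_window_requested,
				 bool nmi_window_requested,
				 bool apicv_enabled,
				 uint8_t rvi, uint8_t vppr)
{
	if (irq_window_requested || nmi_window_requested)
		return true;
	return apicv_enabled && apicv_interrupt_pending(rvi, vppr);
}
```

With these definitions, a vector 0x51 pending against a vPPR of 0x40 forces an exit on an APICv machine, while vector 0x41 against vPPR 0x4f does not, because both fall in the same priority class.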
2018-10-04 | KVM: VMX: hide flexpriority from guest when disabled at the module level | Paolo Bonzini
As of commit 8d860bbeedef ("kvm: vmx: Basic APIC virtualization controls have three settings"), KVM will disable VIRTUALIZE_APIC_ACCESSES when a nested guest writes APIC_BASE MSR and kvm-intel.flexpriority=0, whereas previously KVM would allow a nested guest to enable VIRTUALIZE_APIC_ACCESSES so long as it's supported in hardware. That is, KVM now advertises VIRTUALIZE_APIC_ACCESSES to a guest but doesn't (always) allow setting it when kvm-intel.flexpriority=0, and may even initially allow the control and then clear it when the nested guest writes APIC_BASE MSR, which is decidedly odd even if it doesn't cause functional issues. Hide the control completely when the module parameter is cleared. reported-by: Sean Christopherson <sean.j.christopherson@intel.com> Fixes: 8d860bbeedef ("kvm: vmx: Basic APIC virtualization controls have three settings") Cc: Jim Mattson <jmattson@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-04 | KVM: VMX: check for existence of secondary exec controls before accessing | Sean Christopherson
Return early from vmx_set_virtual_apic_mode() if the processor doesn't support VIRTUALIZE_APIC_ACCESSES or VIRTUALIZE_X2APIC_MODE, both of which reside in SECONDARY_VM_EXEC_CONTROL. This eliminates warnings due to VMWRITEs to SECONDARY_VM_EXEC_CONTROL (VMCS field 401e) failing on processors without secondary exec controls. Remove the similar check for TPR shadowing as it is incorporated in the flexpriority_enabled check and the APIC-related code in vmx_update_msr_bitmap() is further gated by VIRTUALIZE_X2APIC_MODE. Reported-by: Gerhard Wiesinger <redhat@wiesinger.com> Fixes: 8d860bbeedef ("kvm: vmx: Basic APIC virtualization controls have three settings") Cc: Jim Mattson <jmattson@google.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-04 | KVM: PPC: Book3S HV: Avoid crash from THP collapse during radix page fault | Paul Mackerras
Commit 71d29f43b633 ("KVM: PPC: Book3S HV: Don't use compound_order to determine host mapping size", 2018-09-11) added a call to __find_linux_pte() and a dereference of the returned PTE pointer to the radix page fault path in the common case where the page is normal system memory. Previously, __find_linux_pte() was only called for mappings to physical addresses which don't have a page struct (e.g. memory-mapped I/O) or where the page struct is marked as reserved memory. This exposes us to the possibility that the returned PTE pointer could be NULL, for example in the case of a concurrent THP collapse operation. Dereferencing the returned NULL pointer causes a host crash. To fix this, we check for NULL, and if it is NULL, we retry the operation by returning to the guest, with the expectation that it will generate the same page fault again (unless of course it has been fixed up by another CPU in the meantime). Fixes: 71d29f43b633 ("KVM: PPC: Book3S HV: Don't use compound_order to determine host mapping size") Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
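The check-and-retry pattern described above might look roughly like the following. This is a minimal sketch: the enum, the function name, and the plain `uint64_t` PTE type are hypothetical stand-ins, not the kernel's own types or the patch's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for the fault path. */
enum fault_result {
	FAULT_HANDLED,		/* PTE was read; carry on with the mapping */
	FAULT_RETRY_IN_GUEST,	/* go back to the guest and let it re-fault */
};

/*
 * Hypothetical sketch: ptep stands for the pointer returned by the PTE
 * lookup. It may be NULL, e.g. during a concurrent THP collapse, so it
 * must be checked before it is dereferenced.
 */
static enum fault_result read_host_pte(const uint64_t *ptep, uint64_t *pte_out)
{
	if (ptep == NULL)
		return FAULT_RETRY_IN_GUEST;

	*pte_out = *ptep;
	return FAULT_HANDLED;
}
```

Returning to the guest is cheap here because the guest is expected to take the same fault again unless another CPU has fixed up the mapping in the meantime.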
2018-10-02 | RISCV: Fix end PFN for low memory | Atish Patra
Use memblock_end_of_DRAM, which provides the correct last low memory PFN. Without that, the DMA32 region becomes empty, resulting in zero pages being allocated for DMA32. This patch is based on an earlier patch from Palmer that was never merged into 4.19. I just edited the commit text to make more sense.
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-10-01 | Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux | Greg Kroah-Hartman
Will writes: "Late arm64 fixes
 - Fix handling of young contiguous ptes for hugetlb mappings
 - Fix livelock when taking access faults on contiguous hugetlb mappings
 - Tighten up register accesses via KVM SET_ONE_REG ioctl()s"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: KVM: Sanitize PSTATE.M when being set from userspace
  arm64: KVM: Tighten guest core register access from userspace
  arm64: hugetlb: Avoid unnecessary clearing in huge_ptep_set_access_flags
  arm64: hugetlb: Fix handling of young ptes
2018-10-01 | Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc | Greg Kroah-Hartman
Olof writes: "ARM: SoC fixes
A handful of fixes that have been coming in the last couple of weeks:
 - Freescale fixes for on-chip accelerators
 - A DT fix for stm32 to avoid fallback to non-DMA SPI mode
 - Fixes for badly specified interrupts on BCM63xx SoCs
 - Allwinner A64 HDMI was incorrectly specified as fully compatible with R40
 - Drive strength fix for SAMA5D2 NAND pins on one board"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: dts: stm32: update SPI6 dmas property on stm32mp157c
  soc: fsl: qe: Fix copy/paste bug in ucc_get_tdm_sync_shift()
  soc: fsl: qbman: qman: avoid allocating from non existing gen_pool
  ARM: dts: BCM63xx: Fix incorrect interrupt specifiers
  MAINTAINERS: update the Annapurna Labs maintainer email
  ARM: dts: sun8i: drop A64 HDMI PHY fallback compatible from R40 DT
  ARM: dts: at91: sama5d2_ptc_ek: fix nand pinctrl
2018-10-01 | KVM: x86: fix L1TF's MMIO GFN calculation | Sean Christopherson
One defense against L1TF in KVM is to always set the upper five bits of the *legal* physical address in the SPTEs for non-present and reserved SPTEs, e.g. MMIO SPTEs. In the MMIO case, the GFN of the MMIO SPTE may overlap with the upper five bits that are being usurped to defend against L1TF. To preserve the GFN, the bits of the GFN that overlap with the repurposed bits are shifted left into the reserved bits, i.e. the GFN in the SPTE will be split into high and low parts. When retrieving the GFN from the MMIO SPTE, e.g. to check for an MMIO access, get_mmio_spte_gfn() unshifts the affected bits and restores the original GFN for comparison.
Unfortunately, get_mmio_spte_gfn() neglects to mask off the reserved bits in the SPTE that were used to store the upper chunk of the GFN. As a result, KVM fails to detect MMIO accesses whose GPA overlaps the repurposed bits, which in turn causes guest panics and hangs.
Fix the bug by generating a mask that covers the lower chunk of the GFN, i.e. the bits that aren't shifted by the L1TF mitigation. The alternative approach would be to explicitly zero the five reserved bits that are used to store the upper chunk of the GFN, but that requires additional run-time computation and makes an already-ugly bit of code even more inscrutable.
I considered adding a WARN_ON_ONCE(low_phys_bits-1 <= PAGE_SHIFT) to warn if GENMASK_ULL() generated a nonsensical value, but that seemed silly since that would mean a system that supports VMX has less than 18 bits of physical address space...
Reported-by: Sakari Ailus <sakari.ailus@iki.fi> Fixes: d9b47449c1a1 ("kvm: x86: Set highest physical address bits in non-present/reserved SPTEs") Cc: Junaid Shahid <junaids@google.com> Cc: Jim Mattson <jmattson@google.com> Cc: stable@vger.kernel.org Reviewed-by: Junaid Shahid <junaids@google.com> Tested-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
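The split-and-restore scheme in the commit message can be modeled in a few lines of C. This is a simplified sketch under stated assumptions, not the kernel's implementation: a 46-bit physical address width is assumed purely for illustration, all names are hypothetical, and the other fields of a real MMIO SPTE (access bits, MMIO marker value) are left out.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT	12
#define PHYS_BITS	46	/* assumed host physical address width */
#define RSVD_LEN	5	/* number of bits repurposed for L1TF */
#define LOW_PHYS_BITS	(PHYS_BITS - RSVD_LEN)

/* Mask with bits hi..lo (inclusive) set. */
static uint64_t genmask(unsigned int hi, unsigned int lo)
{
	return (~0ULL >> (63 - hi)) & (~0ULL << lo);
}

/* The five upper legal physical address bits that are always set. */
static uint64_t rsvd_mask(void)
{
	return genmask(PHYS_BITS - 1, LOW_PHYS_BITS);
}

/* The lower chunk of the GFN, i.e. the bits the mitigation does not move. */
static uint64_t lower_gfn_mask(void)
{
	return genmask(LOW_PHYS_BITS - 1, PAGE_SHIFT);
}

/* Set the repurposed bits and stash the GPA's real bits just above them. */
static uint64_t encode_mmio_spte(uint64_t gpa)
{
	uint64_t spte = gpa | rsvd_mask();

	spte |= (gpa & rsvd_mask()) << RSVD_LEN;
	return spte;
}

/*
 * Restore the GFN. Taking only the lower chunk first is the step the
 * commit message describes as missing: without that mask, the always-set
 * repurposed bits would leak into the result.
 */
static uint64_t decode_mmio_gfn(uint64_t spte)
{
	uint64_t gpa = spte & lower_gfn_mask();

	gpa |= (spte >> RSVD_LEN) & rsvd_mask();
	return gpa >> PAGE_SHIFT;
}
```

In this model, a GPA with bit 43 set round-trips through encode and decode unchanged, whereas a decode that skipped `lower_gfn_mask()` would report bits 41 to 45 as all ones.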
2018-10-01 | KVM: nVMX: Fix emulation of VM_ENTRY_LOAD_BNDCFGS | Liran Alon
L2 IA32_BNDCFGS should be updated with vmcs12->guest_bndcfgs only when VM_ENTRY_LOAD_BNDCFGS is specified in vmcs12->vm_entry_controls. Otherwise, L2 IA32_BNDCFGS should be set to vmcs01->guest_bndcfgs which is L1 IA32_BNDCFGS. Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
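The selection rule reads naturally as a small function. The sketch below is illustrative only: the struct is a hypothetical two-field stand-in for vmcs12, the function name is invented, and the value of the control bit (bit 16) is stated here as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of the "load IA32_BNDCFGS" VM-entry control (bit 16). */
#define VM_ENTRY_LOAD_BNDCFGS	0x00010000u

/* Hypothetical, heavily reduced stand-in for vmcs12. */
struct vmcs12_sketch {
	uint32_t vm_entry_controls;
	uint64_t guest_bndcfgs;
};

/*
 * Pick the IA32_BNDCFGS value L2 should run with: the vmcs12 value only
 * if L1 asked for it to be loaded, otherwise the value L1 itself was
 * using (vmcs01's guest_bndcfgs).
 */
static uint64_t l2_bndcfgs(const struct vmcs12_sketch *vmcs12,
			   uint64_t vmcs01_guest_bndcfgs)
{
	if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)
		return vmcs12->guest_bndcfgs;
	return vmcs01_guest_bndcfgs;
}
```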
2018-10-01 | KVM: x86: Do not use kvm_x86_ops->mpx_supported() directly | Liran Alon
Commit a87036add092 ("KVM: x86: disable MPX if host did not enable MPX XSAVE features") introduced kvm_mpx_supported() to return true iff MPX is enabled in the host. However, that commit seems to have missed converting some calls from kvm_x86_ops->mpx_supported() to kvm_mpx_supported(). Complete the original commit by replacing the remaining calls to kvm_x86_ops->mpx_supported() with kvm_mpx_supported().
Fixes: a87036add092 ("KVM: x86: disable MPX if host did not enable MPX XSAVE features")
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-01 | KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled | Liran Alon
Before this commit, KVM exposes MPX VMX controls to the L1 guest based only on whether KVM and the host processor support MPX virtualization. However, these controls should be exposed to the guest only if the guest vCPU supports MPX.
Without this change, an L1 guest running a kernel that lacks commit 691bd4340bef ("kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS") asserts in QEMU on the following:
  qemu-kvm: error: failed to set MSR 0xd90 to 0x0
  qemu-kvm: .../qemu-2.10.0/target/i386/kvm.c:1801 kvm_put_msrs: Assertion 'ret == cpu->kvm_msr_buf->nmsrs failed'
This is because L1 KVM kvm_init_msr_list() will see that vmx_mpx_supported() returns true (as it only checks for MPX VMX controls support), and therefore the KVM_GET_MSR_INDEX_LIST IOCTL will include MSR_IA32_BNDCFGS. However, when L1 later attempts to set this MSR via the KVM_SET_MSRS IOCTL, it fails because !guest_cpuid_has_mpx(vcpu).
Therefore, fix the issue by exposing MPX VMX controls to the L1 guest only when the vCPU supports MPX.
Fixes: 36be0b9deb23 ("KVM: x86: Add nested virtualization support for MPX")
Reported-by: Eyal Moscovici <eyal.moscovici@oracle.com>
Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-01 | arm64: KVM: Sanitize PSTATE.M when being set from userspace | Marc Zyngier
Not all execution modes are valid for a guest, and some of them depend on what the HW actually supports. Let's verify that what userspace provides is compatible with both the VM settings and the HW capabilities. Cc: <stable@vger.kernel.org> Fixes: 0d854a60b1d7 ("arm64: KVM: enable initialization of a 32bit vcpu") Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-10-01 | arm64: KVM: Tighten guest core register access from userspace | Dave Martin
We currently allow userspace to access the core register file in about any possible way, including straddling multiple registers and doing unaligned accesses. This is not the expected use of the ABI, and nobody is actually using it that way. Let's tighten it by explicitly checking the size and alignment for each field of the register file. Cc: <stable@vger.kernel.org> Fixes: 2f4a07c5f9fe ("arm64: KVM: guest one-reg interface") Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> [maz: rewrote Dave's initial patch to be more easily backported] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
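A size-and-alignment check of the kind described above could be sketched as follows. The register layout here is deliberately simplified and hypothetical (only 64-bit fields), whereas the real core register file also contains fields of other widths; the function name is invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified core register file: 64-bit fields only. */
struct core_regs_sketch {
	uint64_t regs[31];
	uint64_t sp;
	uint64_t pc;
	uint64_t pstate;
};

/*
 * Accept an access only if it covers exactly one field: the size must
 * match the field width, the offset must be aligned to it, and the field
 * must lie inside the structure. This rejects accesses that straddle
 * registers or start in the middle of one.
 */
static bool core_access_ok(size_t off, size_t size)
{
	if (size != sizeof(uint64_t))
		return false;
	if (off % sizeof(uint64_t) != 0)
		return false;
	return off <= sizeof(struct core_regs_sketch) - sizeof(uint64_t);
}
```

For example, an 8-byte access at the offset of `pc` passes, while an 8-byte access at offset 4 (straddling two registers) or a 16-byte access at offset 0 is refused.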
2018-09-29 | Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Greg Kroah-Hartman
Thomas writes: "A single fix for the AMD memory encryption boot code so it does not read random garbage instead of the cached encryption bit when a kexec kernel is allocated above the 32bit address limit."
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Fix kexec booting failure in the SEV bit detection code
2018-09-28 | Merge tag 'powerpc-4.19-3' of https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux | Greg Kroah-Hartman
Michael writes: "powerpc fixes for 4.19 #3
A reasonably big batch of fixes due to me being away for a few weeks.
A fix for the TM emulation support on Power9, which could result in corrupting the guest r11 when running under KVM.
Two fixes to the TM code which could lead to userspace GPR corruption if we take an SLB miss at exactly the wrong time.
Our dynamic patching code had a bug that meant we could patch freed __init text, which could lead to corrupting userspace memory.
csum_ipv6_magic() didn't work on little endian platforms since we optimised it recently.
A fix for an endian bug when reading a device tree property telling us how many storage keys the machine has available.
Fix a crash seen on some configurations of PowerVM when migrating the partition from one machine to another.
A fix for a regression in the setup of our CPU to NUMA node mapping in KVM guests.
A fix to our selftest Makefiles to make them work since a recent change to the shared Makefile logic."
* tag 'powerpc-4.19-3' of https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  selftests/powerpc: Fix Makefiles for headers_install change
  powerpc/numa: Use associativity if VPHN hcall is successful
  powerpc/tm: Avoid possible userspace r1 corruption on reclaim
  powerpc/tm: Fix userspace r13 corruption
  powerpc/pseries: Fix unitialized timer reset on migration
  powerpc/pkeys: Fix reading of ibm, processor-storage-keys property
  powerpc: fix csum_ipv6_magic() on little endian platforms
  powerpc/powernv/ioda2: Reduce upper limit for DMA window size (again)
  powerpc: Avoid code patching freed init sections
  KVM: PPC: Book3S HV: Fix guest r11 corruption with POWER9 TM workarounds
2018-09-27 | x86/boot: Fix kexec booting failure in the SEV bit detection code | Kairui Song
Commit 1958b5fc4010 ("x86/boot: Add early boot support when running with SEV active") can occasionally cause system resets when kexec-ing a second kernel even if SEV is not active. That's because get_sev_encryption_bit() uses 32-bit rIP-relative addressing to read the value of enc_bit - a variable which caches a previously detected encryption bit position - but kexec may allocate the early boot code to a higher location, beyond the 32-bit addressing limit. In this case, garbage will be read and get_sev_encryption_bit() will return the wrong value, leading to accessing memory with the wrong encryption setting. Therefore, remove enc_bit, and thus get rid of the need to do 32-bit rIP-relative addressing in the first place. [ bp: massage commit message heavily. ] Fixes: 1958b5fc4010 ("x86/boot: Add early boot support when running with SEV active") Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Kairui Song <kasong@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-kernel@vger.kernel.org Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: brijesh.singh@amd.com Cc: kexec@lists.infradead.org Cc: dyoung@redhat.com Cc: bhe@redhat.com Cc: ghook@redhat.com Link: https://lkml.kernel.org/r/20180927123845.32052-1-kasong@redhat.com
2018-09-25 | ARM: dts: stm32: update SPI6 dmas property on stm32mp157c | Amelie Delaunay
Remove unused parameter from SPI6 dmas property on stm32mp157c SoC.
Fixes: dc3f8c86c10d ("ARM: dts: stm32: add SPI support on stm32mp157c")
Signed-off-by: Amelie Delaunay <amelie.delaunay@st.com>
Signed-off-by: Alexandre Torgue <alexandre.torgue@st.com>
[olof: Without this patch, SPI6 will fall back to interrupt mode with lower performance]
Signed-off-by: Olof Johansson <olof@lixom.net>
2018-09-25 | Merge tag 'arm-soc/for-4.19/devicetree-fixes' of https://github.com/Broadcom/stblinux into fixes | Olof Johansson
This pull request contains Broadcom ARM-based SoCs Device Tree changes intended for 4.19, please pull the following:
 - Florian fixes the PPI and SPI interrupts in the BCM63138 (DSL) SoC DTS
* tag 'arm-soc/for-4.19/devicetree-fixes' of https://github.com/Broadcom/stblinux:
  ARM: dts: BCM63xx: Fix incorrect interrupt specifiers
Signed-off-by: Olof Johansson <olof@lixom.net>
2018-09-25 | Merge tag 'at91-4.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux into fixes | Olof Johansson
AT91 fixes for 4.19:
 - fix a NAND issue on sama5d2_ptc_ek (drive strength setting to fix corruption)
* tag 'at91-4.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux:
  ARM: dts: at91: sama5d2_ptc_ek: fix nand pinctrl
Signed-off-by: Olof Johansson <olof@lixom.net>
2018-09-25 | powerpc/numa: Use associativity if VPHN hcall is successful | Srikar Dronamraju
Currently associativity is used to lookup node-id even if the preceding VPHN hcall failed. However this can cause CPU to be made part of the wrong node, (most likely to be node 0). This is because VPHN is not enabled on KVM guests. With 2ea6263 ("powerpc/topology: Get topology for shared processors at boot"), associativity is used to set to the wrong node. Hence KVM guest topology is broken.
For example: a 4 node KVM guest before would have reported:
  [root@localhost ~]# numactl -H
  available: 4 nodes (0-3)
  node 0 cpus: 0 1 2 3
  node 0 size: 1746 MB
  node 0 free: 1604 MB
  node 1 cpus: 4 5 6 7
  node 1 size: 2044 MB
  node 1 free: 1765 MB
  node 2 cpus: 8 9 10 11
  node 2 size: 2044 MB
  node 2 free: 1837 MB
  node 3 cpus: 12 13 14 15
  node 3 size: 2044 MB
  node 3 free: 1903 MB
  node distances:
  node   0   1   2   3
    0:  10  40  40  40
    1:  40  10  40  40
    2:  40  40  10  40
    3:  40  40  40  10
Would now report:
  [root@localhost ~]# numactl -H
  available: 4 nodes (0-3)
  node 0 cpus: 0 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  node 0 size: 1746 MB
  node 0 free: 1244 MB
  node 1 cpus:
  node 1 size: 2044 MB
  node 1 free: 2032 MB
  node 2 cpus: 1
  node 2 size: 2044 MB
  node 2 free: 2028 MB
  node 3 cpus:
  node 3 size: 2044 MB
  node 3 free: 2032 MB
  node distances:
  node   0   1   2   3
    0:  10  40  40  40
    1:  40  10  40  40
    2:  40  40  10  40
    3:  40  40  40  10
Fix this by skipping associativity lookup if the VPHN hcall failed.
Fixes: 2ea626306810 ("powerpc/topology: Get topology for shared processors at boot")
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
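The fix amounts to an early return on hcall failure. The following is a minimal sketch with hypothetical names and plain integers in place of the kernel's data structures; it only illustrates the control flow, not the actual topology code.

```c
#include <assert.h>

/*
 * Hypothetical sketch: choose the node for a CPU.
 *   hcall_rc         - return code of the VPHN hcall (0 means success)
 *   node_from_assoc  - node derived from the returned associativity
 *   current_node     - node the CPU is already assigned to
 * If the hcall failed, the associativity buffer holds nothing useful, so
 * the lookup is skipped and the existing assignment is kept.
 */
static int pick_cpu_node(long hcall_rc, int node_from_assoc, int current_node)
{
	if (hcall_rc != 0)
		return current_node;
	return node_from_assoc;
}
```

On a KVM guest, where VPHN is not enabled and the hcall fails, every CPU therefore keeps its boot-time node instead of collapsing onto node 0.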
2018-09-25 | powerpc/tm: Avoid possible userspace r1 corruption on reclaim | Michael Neuling
Currently we store the userspace r1 to PACATMSCRATCH before finally saving it to the thread struct. In theory an exception could be taken here (like a machine check or SLB miss) that could write PACATMSCRATCH and hence corrupt the userspace r1. The SLB fault currently doesn't touch PACATMSCRATCH, but others do. We've never actually seen this happen but it's theoretically possible. Either way, the code is fragile as it is.
This patch saves r1 to the kernel stack (which can't fault) before we turn MSR[RI] back on. PACATMSCRATCH is still used but only with MSR[RI] off. We then copy r1 from the kernel stack to the thread struct once we have MSR[RI] back on.
Suggested-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-25 | powerpc/tm: Fix userspace r13 corruption | Michael Neuling
When we treclaim we store the userspace checkpointed r13 to a scratch SPR and then later save the scratch SPR to the user thread struct. Unfortunately, this doesn't work as accessing the user thread struct can take an SLB fault and the SLB fault handler will write the same scratch SPRG that now contains the userspace r13. To fix this, we store r13 to the kernel stack (which can't fault) before we access the user thread struct. Found by running P8 guest + powervm + disable_1tb_segments + TM. Seen as a random userspace segfault with r13 looking like a kernel address. Signed-off-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Breno Leitao <leitao@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-24 | RISC-V: include linux/ftrace.h in asm-prototypes.h | James Cowgill
Building a riscv kernel with CONFIG_FUNCTION_TRACER and CONFIG_MODVERSIONS enabled results in these two warnings:
  MODPOST vmlinux.o
  WARNING: EXPORT symbol "return_to_handler" [vmlinux] version generation failed, symbol will not be versioned.
  WARNING: EXPORT symbol "_mcount" [vmlinux] version generation failed, symbol will not be versioned.
When exporting symbols from an assembly file, the MODVERSIONS code requires their prototypes to be defined in asm-prototypes.h (see scripts/Makefile.build). Since both of these symbols have prototypes defined in linux/ftrace.h, include this header from RISC-V's asm-prototypes.h.
Reported-by: Karsten Merker <merker@debian.org>
Signed-off-by: James Cowgill <jcowgill@debian.org>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-09-24 | ARM: dts: BCM63xx: Fix incorrect interrupt specifiers | Florian Fainelli
A number of our interrupts were incorrectly specified, fix both the PPI and SPI interrupts to be correct. Fixes: b5762cacc411 ("ARM: bcm63138: add NAND DT support") Fixes: 46d4bca0445a ("ARM: BCM63XX: add BCM63138 minimal Device Tree") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
2018-09-24 | arm64: hugetlb: Avoid unnecessary clearing in huge_ptep_set_access_flags | Steve Capper
For contiguous hugetlb, huge_ptep_set_access_flags performs a get_clear_flush (which then flushes the TLBs) even when no change of ptes is necessary. Unfortunately, this behaviour can lead to back-to-back page faults being generated when running with multiple threads that access the same contiguous huge page.
  Thread 1                     | Thread 2
  -----------------------------+------------------------------
  hugetlb_fault                |
  huge_ptep_set_access_flags   |
    -> invalidate pte range    | hugetlb_fault
  continue processing          | wait for hugetlb_fault_mutex
  release mutex and return     | huge_ptep_set_access_flags
                               |   -> invalidate pte range
  hugetlb_fault
  ...
This patch changes huge_ptep_set_access_flags so that we first read the contiguous range of ptes (whilst preserving dirty information); the pte range is then invalidated only where necessary, which prevents further spurious page faults.
Fixes: d8bdcff28764 ("arm64: hugetlb: Add break-before-make logic for contiguous entries")
Reported-by: Lei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-09-24 | arm64: hugetlb: Fix handling of young ptes | Steve Capper
In the contiguous bit hugetlb break-before-make code we assume that all hugetlb pages are young. In fact, remove_migration_pte is able to place an old hugetlb pte so this assumption is not valid. This patch fixes the contiguous hugetlb scanning code to preserve young ptes. Fixes: d8bdcff28764 ("arm64: hugetlb: Add break-before-make logic for contiguous entries") Signed-off-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-09-24 | KVM: x86: never trap MSR_KERNEL_GS_BASE | Paolo Bonzini
KVM has an old optimization whereby accesses to the kernel GS base MSR are trapped when the guest is in 32-bit and not when it is in 64-bit mode. The idea is that swapgs is not available in 32-bit mode, thus the guest has no reason to access the MSR unless in 64-bit mode and 32-bit applications need not pay the price of switching the kernel GS base between the host and the guest values. However, this optimization adds complexity to the code for little benefit (these days most guests are going to be 64-bit anyway) and in fact broke after commit 678e315e78a7 ("KVM: vmx: add dedicated utility to access guest's kernel_gs_base", 2018-08-06); the guest kernel GS base can be corrupted across SMIs and UEFI Secure Boot is therefore broken (a secure boot Linux guest, for example, fails to reach the login prompt about half the time). This patch just removes the optimization; the kernel GS base MSR is now never trapped by KVM, similarly to the FS and GS base MSRs. Fixes: 678e315e78a780dbef384b92339c8414309dbc11 Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-24 | powerpc/pseries: Fix unitialized timer reset on migration | Michael Bringmann
After migration of a powerpc LPAR, the kernel executes code to update the system state to reflect new platform characteristics. Such changes include modifications to device tree properties provided to the system by PHYP. Property notifications received by the post_mobility_fixup() code are passed along to the kernel in general through a call to of_update_property() which in turn passes such events back to all modules through entries like the '.notifier_call' function within the NUMA module.
When the NUMA module updates its state, it resets its event timer. If this occurs after a previous call to stop_topology_update() or on a system without VPHN enabled, the code runs into an uninitialized timer structure and crashes. This patch adds a safety check along this path toward the problem code.
An example crash log is as follows.
  ibmvscsi 30000081: Re-enabling adapter!
  ------------[ cut here ]------------
  kernel BUG at kernel/time/timer.c:958!
  Oops: Exception in kernel mode, sig: 5 [#1]
  LE SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag inet_diag lockd unix_diag af_packet_diag netlink_diag grace fscache sunrpc xts vmx_crypto pseries_rng sg binfmt_misc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
  CPU: 11 PID: 3067 Comm: drmgr Not tainted 4.17.0+ #179
  ...
  NIP mod_timer+0x4c/0x400
  LR reset_topology_timer+0x40/0x60
  Call Trace:
    0xc0000003f9407830 (unreliable)
    reset_topology_timer+0x40/0x60
    dt_update_callback+0x100/0x120
    notifier_call_chain+0x90/0x100
    __blocking_notifier_call_chain+0x60/0x90
    of_property_notify+0x90/0xd0
    of_update_property+0x104/0x150
    update_dt_property+0xdc/0x1f0
    pseries_devicetree_update+0x2d0/0x510
    post_mobility_fixup+0x7c/0xf0
    migration_store+0xa4/0xc0
    kobj_attr_store+0x30/0x60
    sysfs_kf_write+0x64/0xa0
    kernfs_fop_write+0x16c/0x240
    __vfs_write+0x40/0x200
    vfs_write+0xc8/0x240
    ksys_write+0x5c/0x100
    system_call+0x58/0x6c
Fixes: 5d88aa85c00b ("powerpc/pseries: Update CPU maps when device tree is updated")
Cc: stable@vger.kernel.org # v3.10+
Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-23 | Merge tag 'sunxi-fixes-for-4.19-2' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into fixes | Olof Johansson
Allwinner fixes - round 2
One additional fix regarding HDMI on the R40 SoC. Based on preliminary tests and code dumps for the R40, it was thought that the whole HDMI block was the same on the R40 and A64. Recent tests regarding the A64 showed that this was not the case. The HDMI PHY on the A64 only has one clock parent. How this occurs at the hardware level is unclear, as Allwinner has not given any feedback on this matter. Nevertheless it is clear that the hardware acts differently between the A64 and R40 in such a way that the R40's HDMI PHY is not backward compatible with the A64's. As such we need to drop the fallback compatible string in the R40's device tree. This was added in v4.19-rc1.
* tag 'sunxi-fixes-for-4.19-2' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
  ARM: dts: sun8i: drop A64 HDMI PHY fallback compatible from R40 DT
Signed-off-by: Olof Johansson <olof@lixom.net>
2018-09-23 | Merge tag 'for-linus-4.19d-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip | Greg Kroah-Hartman
Juergen writes: "xen: Two small fixes for xen drivers."
* tag 'for-linus-4.19d-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: issue warning message when out of grant maptrack entries
  xen/x86/vpmu: Zero struct pt_regs before calling into sample handling code
2018-09-23 | Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Greg Kroah-Hartman
Thomas writes: "A set of fixes for x86:
 - Resolve the kvmclock regression on AMD systems with memory encryption enabled. The rework of the kvmclock memory allocation during early boot results in encrypted storage, which is not shareable with the hypervisor. Create a new section for this data which is mapped unencrypted and take care that the later allocations for shared kvmclock memory are unencrypted as well.
 - Fix the build regression in the paravirt code introduced by the recent spectre v2 updates.
 - Ensure that the initial static page tables cover the fixmap space correctly so early console always works. This worked so far by chance, but recent modifications to the fixmap layout can - depending on kernel configuration - move the relevant entries to a different place which is not covered by the initial static page tables.
 - Address the regressions and issues which got introduced with the recent extensions to the Intel Resource Director Technology code.
 - Update maintainer entries to document reality"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Expand static page table for fixmap space
  MAINTAINERS: Add X86 MM entry
  x86/intel_rdt: Add Reinette as co-maintainer for RDT
  MAINTAINERS: Add Borislav to the x86 maintainers
  x86/paravirt: Fix some warning messages
  x86/intel_rdt: Fix incorrect loop end condition
  x86/intel_rdt: Fix exclusive mode handling of MBA resource
  x86/intel_rdt: Fix incorrect loop end condition
  x86/intel_rdt: Do not allow pseudo-locking of MBA resource
  x86/intel_rdt: Fix unchecked MSR access
  x86/intel_rdt: Fix invalid mode warning when multiple resources are managed
  x86/intel_rdt: Global closid helper to support future fixes
  x86/intel_rdt: Fix size reporting of MBA resource
  x86/intel_rdt: Fix data type in parsing callbacks
  x86/kvm: Use __bss_decrypted attribute in shared variables
  x86/mm: Add .bss..decrypted section to hold shared variables
2018-09-21Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmGreg Kroah-Hartman
Paolo writes: "It's mostly small bugfixes and cleanups, mostly around x86 nested virtualization. One important change, not related to nested virtualization, is that the ability for the guest kernel to trap CPUID instructions (in Linux that's the ARCH_SET_CPUID arch_prctl) is now masked by default. This is because the feature is detected through an MSR; a very bad idea that Intel seems to like more and more. Some applications choke if the other fields of that MSR are not initialized as on real hardware, hence we have to disable the whole MSR by default, as was the case before Linux 4.12." * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (23 commits) KVM: nVMX: Fix bad cleanup on error of get/set nested state IOCTLs kvm: selftests: Add platform_info_test KVM: x86: Control guest reads of MSR_PLATFORM_INFO KVM: x86: Turbo bits in MSR_PLATFORM_INFO nVMX x86: Check VPID value on vmentry of L2 guests nVMX x86: check posted-interrupt descriptor addresss on vmentry of L2 KVM: nVMX: Wake blocked vCPU in guest-mode if pending interrupt in virtual APICv KVM: VMX: check nested state and CR4.VMXE against SMM kvm: x86: make kvm_{load|put}_guest_fpu() static x86/hyper-v: rename ipi_arg_{ex,non_ex} structures KVM: VMX: use preemption timer to force immediate VMExit KVM: VMX: modify preemption timer bit only when arming timer KVM: VMX: immediately mark preemption timer expired only for zero value KVM: SVM: Switch to bitmap_zalloc() KVM/MMU: Fix comment in walk_shadow_page_lockless_end() kvm: selftests: use -pthread instead of -lpthread KVM: x86: don't reset root in kvm_mmu_setup() kvm: mmu: Don't read PDPTEs when paging is not enabled x86/kvm/lapic: always disable MMIO interface in x2APIC mode KVM: s390: Make huge pages unavailable in ucontrol VMs ...
2018-09-20x86/mm: Expand static page table for fixmap spaceFeng Tang
We hit a kernel panic when enabling earlycon because the fixmap address of earlycon is not statically set up. Currently the static fixmap setup in head_64.S only covers 2M of virtual address space, while the fixmap can actually span 4M with different kernel configurations, e.g. when VSYSCALL emulation is disabled. So increase the static space to 4M for now by defining FIXMAP_PMD_NUM to 2, and add a build time check to ensure that the fixmap is covered by the initial static page tables. Fixes: 1ad83c858c7d ("x86_64,vsyscall: Make vsyscall emulation configurable") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: kernel test robot <rong.a.chen@intel.com> Reviewed-by: Juergen Gross <jgross@suse.com> (Xen parts) Cc: H Peter Anvin <hpa@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andy Lutomirsky <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180920025828.23699-1-feng.tang@intel.com
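The 2M and 4M figures in the message above follow from x86-64 page table geometry: one PTE page holds 512 entries of 4 KiB each. The following is only a sketch of that arithmetic and of what a build-time coverage check could look like. FIXMAP_PMD_NUM comes from the commit message; every SKETCH_ name and the slot count are hypothetical and are not the kernel's actual definitions.

```c
/* x86-64 paging geometry: 4 KiB pages, 512 entries per page table page. */
#define SKETCH_PAGE_SIZE     4096UL
#define SKETCH_PTRS_PER_PTE  512UL

/* From the commit message: two PTE pages are set up statically. */
#define FIXMAP_PMD_NUM       2UL

/* Hypothetical number of fixmap slots for some kernel configuration. */
#define SKETCH_FIXMAP_SLOTS  600UL

/* Bytes of virtual address space covered by `pmd_entries` PTE pages. */
static unsigned long sketch_fixmap_coverage(unsigned long pmd_entries)
{
    return pmd_entries * SKETCH_PTRS_PER_PTE * SKETCH_PAGE_SIZE;
}

/*
 * Build-time check in the spirit of the commit: compilation fails if the
 * fixmap needs more slots than the static page tables provide.
 */
_Static_assert(SKETCH_FIXMAP_SLOTS <= FIXMAP_PMD_NUM * SKETCH_PTRS_PER_PTE,
               "fixmap is not covered by the initial static page tables");
```

With a single PTE page the 600-slot example would not fit (512 slots, 2M), which is the situation the commit describes; with FIXMAP_PMD_NUM set to 2 the assertion holds.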
2018-09-20KVM: nVMX: Fix bad cleanup on error of get/set nested state IOCTLsLiran Alon
The handlers of IOCTLs in kvm_arch_vcpu_ioctl() are expected to set their return value in the "r" local variable and break out of the switch block when they encounter an error. This is because vcpu_load() is called before the switch block, which has a matching vcpu_put() cleanup afterwards. However, the KVM_{GET,SET}_NESTED_STATE IOCTL handlers just return immediately on error without performing the above-mentioned cleanup. Thus, change these handlers to behave as expected. Fixes: 8fcc4b5923af ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE") Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Reviewed-by: Patrick Colp <patrick.colp@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
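The "set r and break" convention described above can be illustrated with a small user-space sketch. This is not the KVM code: the sketch_ functions and the enum are hypothetical stand-ins, and the load/put pair only tracks a counter so that an unbalanced exit path would be visible.

```c
#include <errno.h>

/* Hypothetical stand-ins for vcpu_load()/vcpu_put(); they only track balance. */
static int sketch_load_depth;

static void sketch_vcpu_load(void) { sketch_load_depth++; }
static void sketch_vcpu_put(void)  { sketch_load_depth--; }

enum sketch_ioctl { SKETCH_GET_NESTED_STATE, SKETCH_OTHER };

/* `fail` != 0 simulates an error inside the nested-state handler. */
static int sketch_vcpu_ioctl(enum sketch_ioctl cmd, int fail)
{
    int r;

    sketch_vcpu_load();
    switch (cmd) {
    case SKETCH_GET_NESTED_STATE:
        r = -EFAULT;
        if (fail)
            break;  /* a bare "return -EFAULT" here would skip the put below */
        r = 0;
        break;
    default:
        r = -EINVAL;
        break;
    }
    sketch_vcpu_put();  /* reached on every path out of the switch */
    return r;
}
```

Because every handler leaves through the end of the switch, the counter returns to zero regardless of which case failed.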
2018-09-20powerpc/pkeys: Fix reading of ibm, processor-storage-keys propertyThiago Jung Bauermann
scan_pkey_feature() uses of_property_read_u32_array() to read the ibm,processor-storage-keys property and calls be32_to_cpu() on the value it gets. The problem is that of_property_read_u32_array() already returns the value converted to the CPU byte order. The value of pkeys_total ends up more or less sane because there's a min() call in pkey_initialize() which reduces pkeys_total to 32. So in practice the kernel ignores the fact that the hypervisor reserved one key for itself (the device tree advertises 31 keys in my test VM). This is wrong, but the effect in practice is that when a process tries to allocate the 32nd key, it gets an -EINVAL error instead of -ENOSPC, which would indicate that there aren't any keys available. Fixes: cf43d3b26452 ("powerpc: Enable pkey subsystem") Cc: stable@vger.kernel.org # v4.16+ Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
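To see why the double conversion went unnoticed, the effect can be modelled in user space. The sketch below assumes a little-endian CPU, where be32_to_cpu() amounts to a byte swap; all function names are hypothetical and only model the behaviour described in the message.

```c
#include <stdint.h>

/* Plain 32-bit byte swap. */
static uint32_t sketch_bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* On a little-endian CPU, be32_to_cpu() is a byte swap (modelled here). */
static uint32_t sketch_be32_to_cpu_le(uint32_t v)
{
    return sketch_bswap32(v);
}

static uint32_t sketch_min_u32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/* Buggy path: the value is already in CPU order but gets swapped again. */
static uint32_t sketch_pkeys_total_buggy(uint32_t cpu_order_value)
{
    return sketch_min_u32(sketch_be32_to_cpu_le(cpu_order_value), 32);
}

/* Fixed path: use the value as returned by the property read. */
static uint32_t sketch_pkeys_total_fixed(uint32_t cpu_order_value)
{
    return sketch_min_u32(cpu_order_value, 32);
}
```

For the 31 keys advertised in the test VM, the extra swap yields 0x1f000000, which the min() clamps to 32, so the reserved key is silently ignored; without the swap the result is 31.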
2018-09-20powerpc: fix csum_ipv6_magic() on little endian platformsChristophe Leroy
On little endian platforms, csum_ipv6_magic() keeps len and proto in CPU byte order. This generates bad results, leading to ICMPv6 packets from other hosts being dropped by powerpc64le platforms. In order to fix this, len and proto should be converted to network byte order, i.e. big-endian byte order. However, checksumming 0x12345678 and 0x56341278 provides the exact same result, so it is enough to rotate the sum of len and proto by 1 byte. PPC32 only supports big-endian, so the fix is needed for PPC64 only. Fixes: e9c4943a107b ("powerpc: Implement csum_ipv6_magic in assembly") Reported-by: Jianlin Shi <jishi@redhat.com> Reported-by: Xin Long <lucien.xin@gmail.com> Cc: <stable@vger.kernel.org> # 4.18+ Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
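The claim that a one-byte rotation is sufficient can be checked with a small model of the ones' complement fold used by the Internet checksum. This is an illustrative sketch, not the powerpc assembly; the csum_sketch_ names are hypothetical.

```c
#include <stdint.h>

/* Fold a 32-bit value into 16 bits with end-around carry. */
static uint16_t csum_sketch_fold32(uint32_t v)
{
    uint32_t s = (v >> 16) + (v & 0xffffu);

    s = (s >> 16) + (s & 0xffffu);  /* add back a possible carry */
    return (uint16_t)s;
}

/* Rotate left by one byte. */
static uint32_t csum_sketch_rotl8(uint32_t v)
{
    return (v << 8) | (v >> 24);
}

/* Full 32-bit byte swap, i.e. a CPU-to-network conversion on little endian. */
static uint32_t csum_sketch_bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}
```

The fold only cares which bytes land in the high and low half of a 16-bit word, not which word they land in. 0x12345678 and 0x56341278 therefore both fold to 0x68ac, and rotating by one byte folds to the same value as a full byte swap (0xac68 for 0x12345678).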
2018-09-20powerpc/powernv/ioda2: Reduce upper limit for DMA window size (again)Alexey Kardashevskiy
mpe: This was fixed originally in commit d3d4ffaae439 ("powerpc/powernv/ioda2: Reduce upper limit for DMA window size"), but contrary to what the merge commit says was inadvertently lost by me in commit ce57c6610cc2 ("Merge branch 'topic/ppc-kvm' into next") which brought in changes that moved the code to a new file. So reapply it to the new file. Original commit message follows: We use PHB in mode1 which uses bit 59 to select a correct DMA window. However there is mode2 which uses bits 59:55 and allows up to 32 DMA windows per PE. Even though the documentation does not clearly specify that, it seems that the actual hardware does not support bits 59:55 even in mode1; in other words, we can create a window as big as 1<<58 but DMA simply won't work. This reduces the upper limit from 59 to 55 bits to let the userspace know about the hardware limits. Fixes: ce57c6610cc2 ("Merge branch 'topic/ppc-kvm' into next") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-20ARM: dts: sun8i: drop A64 HDMI PHY fallback compatible from R40 DTIcenowy Zheng
The R40 HDMI PHY seems to be different from the A64 one: the A64 one has no input mux, but the R40 one does. Drop the A64 fallback compatible from the HDMI PHY node in R40 DT. Signed-off-by: Icenowy Zheng <icenowy@aosc.io> Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com> [wens@csie.org: Fix subject prefix order] Signed-off-by: Chen-Yu Tsai <wens@csie.org>
2018-09-20KVM: x86: Control guest reads of MSR_PLATFORM_INFODrew Schmitt
Add KVM_CAP_MSR_PLATFORM_INFO so that userspace can disable guest access to reads of MSR_PLATFORM_INFO. Disabling access to reads of this MSR gives userspace the control to "expose" this platform-dependent information to guests in a clear way. As it exists today, guests that read this MSR would get unpopulated information if userspace hadn't already set it (and prior to this patch series, only the CPUID faulting information could have been populated). This existing interface could be confusing if guests don't handle the potential for incorrect/incomplete information gracefully (e.g. zero reported for base frequency). Signed-off-by: Drew Schmitt <dasch@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20KVM: x86: Turbo bits in MSR_PLATFORM_INFODrew Schmitt
Allow userspace to set turbo bits in MSR_PLATFORM_INFO. Previously, only the CPUID faulting bit was settable. But now any bit in MSR_PLATFORM_INFO would be settable. This can be used, for example, to convey frequency information about the platform on which the guest is running. Signed-off-by: Drew Schmitt <dasch@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20nVMX x86: Check VPID value on vmentry of L2 guestsKrish Sadhukhan
According to section "Checks on VMX Controls" in Intel SDM vol 3C, the following check needs to be enforced on vmentry of L2 guests: If the 'enable VPID' VM-execution control is 1, the value of the VPID VM-execution control field must not be 0000H. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
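The rule quoted above reduces to a single predicate. The sketch below is a user-space illustration rather than the KVM implementation; the function and macro names are hypothetical. It assumes that "enable VPID" is bit 5 of the secondary processor-based VM-execution controls.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of "enable VPID" in the secondary execution controls. */
#define SKETCH_SECONDARY_EXEC_ENABLE_VPID (1u << 5)

/*
 * Returns true if the VPID field passes the check: with "enable VPID" set,
 * a VPID of 0000H must be rejected; with it clear, the field is not checked.
 */
static bool sketch_vpid_valid(uint32_t secondary_exec_ctl, uint16_t vpid)
{
    if ((secondary_exec_ctl & SKETCH_SECONDARY_EXEC_ENABLE_VPID) && vpid == 0)
        return false;
    return true;
}
```

A caller emulating vmentry for L2 would presumably fail the entry with an invalid-control-field error when this predicate returns false.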
2018-09-20nVMX x86: check posted-interrupt descriptor addresss on vmentry of L2Krish Sadhukhan
According to section "Checks on VMX Controls" in Intel SDM vol 3C, the following check needs to be enforced on vmentry of L2 guests: - Bits 5:0 of the posted-interrupt descriptor address are all 0. - The posted-interrupt descriptor address does not set any bits beyond the processor's physical-address width. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
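The two address checks listed above can be sketched as a standalone predicate. This is an illustration only: the function name is hypothetical, the physical-address width is passed in as a plain parameter, and the sketch does not model the condition under which the checks apply.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if `addr` is acceptable as a posted-interrupt descriptor
 * address for a processor with `maxphyaddr` physical-address bits.
 */
static bool sketch_pi_desc_addr_valid(uint64_t addr, unsigned int maxphyaddr)
{
    /* Bits 5:0 must be zero, i.e. the descriptor is 64-byte aligned. */
    if (addr & 0x3fu)
        return false;

    /* No bits at or above the physical-address width may be set. */
    if (maxphyaddr < 64 && (addr >> maxphyaddr) != 0)
        return false;

    return true;
}
```

The `maxphyaddr < 64` guard avoids shifting a 64-bit value by 64, which is undefined behaviour in C.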
2018-09-20KVM: nVMX: Wake blocked vCPU in guest-mode if pending interrupt in virtual APICvLiran Alon
In case L1 does not intercept L2 HLT or enters L2 in HLT activity-state, it is possible for a vCPU to be blocked while it is in guest-mode. According to Intel SDM 26.6.5 Interrupt-Window Exiting and Virtual-Interrupt Delivery: "These events wake the logical processor if it just entered the HLT state because of a VM entry". Therefore, if L1 enters L2 in HLT activity-state and L2 has a pending deliverable interrupt in vmcs12->guest_intr_status.RVI, then the vCPU should be woken from the HLT state and injected with the interrupt. In addition, if the vCPU receives a nested posted-interrupt while it is blocked (while it is in guest-mode), then the vCPU should also be woken and injected with the posted interrupt. To handle these cases, this patch enhances kvm_vcpu_has_events() to also check if there is a pending interrupt in L2 virtual APICv provided by L1. That is, it evaluates if there is a pending virtual interrupt for L2 by checking RVI[7:4] > VPPR[7:4] as specified in Intel SDM 29.2.1 Evaluation of Pending Interrupts. Note that this also handles the case of nested posted-interrupt by the fact RVI is updated in vmx_complete_nested_posted_interrupt() which is called from kvm_vcpu_check_block() -> kvm_arch_vcpu_runnable() -> kvm_vcpu_running() -> vmx_check_nested_events() -> vmx_complete_nested_posted_interrupt(). Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
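The RVI[7:4] > VPPR[7:4] evaluation mentioned above compares only the priority class (upper nibble) of the two values. The following is a minimal sketch with hypothetical names, assuming RVI is the low byte of the 16-bit guest interrupt status field; it is not the KVM code.

```c
#include <stdbool.h>
#include <stdint.h>

/* RVI is assumed to be the low byte of the guest interrupt status field. */
static uint8_t sketch_rvi_from_status(uint16_t guest_intr_status)
{
    return (uint8_t)(guest_intr_status & 0xffu);
}

/*
 * A virtual interrupt is pending for delivery when the priority class of
 * RVI is strictly higher than the priority class of VPPR.
 */
static bool sketch_has_pending_virtual_intr(uint8_t rvi, uint8_t vppr)
{
    return (rvi >> 4) > (vppr >> 4);
}
```

For example, RVI 0x51 against VPPR 0x40 is deliverable (class 5 above class 4), while RVI 0x4f against VPPR 0x40 is not, because both fall in class 4.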
2018-09-20KVM: VMX: check nested state and CR4.VMXE against SMMPaolo Bonzini
VMX cannot be enabled under SMM; check this when CR4 is set and when nested virtualization state is restored. This should fix some WARNs reported by syzkaller, mostly around alloc_shadow_vmcs. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20kvm: x86: make kvm_{load|put}_guest_fpu() staticSebastian Andrzej Siewior
The functions kvm_load_guest_fpu() and kvm_put_guest_fpu() are only used locally, so make them static. This also requires moving both functions, because they are used before their implementation. Those functions were exported (via EXPORT_SYMBOL) before commit e5bb40251a920 ("KVM: Drop kvm_{load,put}_guest_fpu() exports"). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20x86/hyper-v: rename ipi_arg_{ex,non_ex} structuresVitaly Kuznetsov
These structures are going to be used from KVM code so let's make their names reflect their Hyper-V origin. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>