summaryrefslogtreecommitdiff
path: root/arch/x86/kernel
AgeCommit message (Collapse)Author
2020-10-01x86/nmi: Fix nmi_handle() duration miscalculationx86_misc_for_v5.10Libing Zhou
When nmi_check_duration() is checking the time an NMI handler took to execute, the whole_msecs value used should be read from the @duration argument, not from the ->max_duration, the latter being used to store the current maximal duration. [ bp: Rewrite commit message. ] Fixes: 248ed51048c4 ("x86/nmi: Remove irq_work from the long duration NMI handler") Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Libing Zhou <libing.zhou@nokia-sbell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Changbin Du <changbin.du@gmail.com> Link: https://lkml.kernel.org/r/20200820025641.44075-1-libing.zhou@nokia-sbell.com
2020-10-01x86/asm: Replace __force_order with a memory clobberx86_asm_for_v5.10Arvind Sankar
The CRn accessor functions use __force_order as a dummy operand to prevent the compiler from reordering CRn reads/writes with respect to each other. The fact that the asm is volatile should be enough to prevent this: volatile asm statements should be executed in program order. However GCC 4.9.x and 5.x have a bug that might result in reordering. This was fixed in 8.1, 7.3 and 6.5. Versions prior to these, including 5.x and 4.9.x, may reorder volatile asm statements with respect to each other. There are some issues with __force_order as implemented: - It is used only as an input operand for the write functions, and hence doesn't do anything additional to prevent reordering writes. - It allows memory accesses to be cached/reordered across write functions, but CRn writes affect the semantics of memory accesses, so this could be dangerous. - __force_order is not actually defined in the kernel proper, but the LLVM toolchain can in some cases require a definition: LLVM (as well as GCC 4.9) requires it for PIE code, which is why the compressed kernel has a definition, but also the clang integrated assembler may consider the address of __force_order to be significant, resulting in a reference that requires a definition. Fix this by: - Using a memory clobber for the write functions to additionally prevent caching/reordering memory accesses across CRn writes. - Using a dummy input operand with an arbitrary constant address for the read functions, instead of a global variable. This will prevent reads from being reordered across writes, while allowing memory loads to be cached/reordered across CRn reads, which should be safe. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602 Link: https://lore.kernel.org/lkml/20200527135329.1172644-1-arnd@arndb.de/ Link: https://lkml.kernel.org/r/20200902232152.3709896-1-nivedita@alum.mit.edu
2020-09-30x86/mce: Use idtentry_nmi_enter/exit()Thomas Gleixner
The recent fix for NMI vs. IRQ state tracking missed to apply the cure to the MCE handler. Fixes: ba1f2b2eaa2a ("x86/entry: Fix NMI vs IRQ state tracking") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/87mu17ism2.fsf@nanos.tec.linutronix.de
2020-09-30x86/mce: Drop AMD-specific "DEFERRED" case from Intel severity rule listTony Luck
Way back in v3.19 Intel and AMD shared the same machine check severity grading code. So it made sense to add a case for AMD DEFERRED errors in commit e3480271f592 ("x86, mce, severity: Extend the the mce_severity mechanism to handle UCNA/DEFERRED error") But later in v4.2 AMD switched to a separate grading function in commit bf80bbd7dcf5 ("x86/mce: Add an AMD severities-grading function") Belatedly drop the DEFERRED case from the Intel rule list. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200930021313.31810-3-tony.luck@intel.com
2020-09-30x86/mce: Add Skylake quirk for patrol scrub reported errorsBorislav Petkov
The patrol scrubber in Skylake and Cascade Lake systems can be configured to report uncorrected errors using a special signature in the machine check bank and to signal using CMCI instead of machine check. Update the severity calculation mechanism to allow specifying the model, minimum stepping and range of machine check bank numbers. Add a new rule to detect the special signature (on model 0x55, stepping >=4 in any of the memory controller banks). [ bp: Rewrite it. aegl: Productize it. ] Suggested-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Co-developed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200930021313.31810-2-tony.luck@intel.com
2020-09-28cpuidle-haltpoll: fix error comments in arch_haltpoll_disableLi Qiang
The 'arch_haltpoll_disable' is used to disable guest halt poll. Correct the comments. Fixes: a1c4423b02b21 ("cpuidle-haltpoll: disable host side polling when kvm virtualized") Signed-off-by: Li Qiang <liq3ea@163.com> Message-Id: <20200924155800.4939-1-liq3ea@163.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28x86/hyperv: Remove aliases with X64 in their nameJoseph Salisbury
In the architecture independent version of hyperv-tlfs.h, commit c55a844f46f958b removed the "X64" in the symbol names so they would make sense for both x86 and ARM64. That commit added aliases with the "X64" in the x86 version of hyperv-tlfs.h so that existing x86 code would continue to compile. As a cleanup, update the x86 code to use the symbols without the "X64", then remove the aliases. There's no functional change. Signed-off-by: Joseph Salisbury <joseph.salisbury@microsoft.com> Link: https://lore.kernel.org/r/1601130386-11111-1-git-send-email-jsalisbury@linux.microsoft.com Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2020-09-27x86/apic/msi: Unbreak DMAR and HPET MSIThomas Gleixner
Switching the DMAR and HPET MSI code to use the generic MSI domain ops missed to add the flag which tells the core code to update the domain operations with the defaults. As a consequence the core code crashes when an interrupt in one of those domains is allocated. Add the missing flags. Fixes: 9006c133a422 ("x86/msi: Use generic MSI domain ops") Reported-by: Qian Cai <cai@redhat.com> Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/87wo0fli8b.fsf@nanos.tec.linutronix.de
2020-09-27x86/hyperv: Remove aliases with X64 in their namex86-hyperv-2020-10-12Joseph Salisbury
In the architecture independent version of hyperv-tlfs.h, commit c55a844f46f958b removed the "X64" in the symbol names so they would make sense for both x86 and ARM64. That commit added aliases with the "X64" in the x86 version of hyperv-tlfs.h so that existing x86 code would continue to compile. As a cleanup, update the x86 code to use the symbols without the "X64", then remove the aliases. There's no functional change. Signed-off-by: Joseph Salisbury <joseph.salisbury@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/1601130386-11111-1-git-send-email-jsalisbury@linux.microsoft.com
2020-09-25x86/sev-es: Use GHCB accessor for setting the MMIO scratch bufferx86_seves_for_v5.10Tom Lendacky
Use ghcb_set_sw_scratch() to set the GHCB scratch field, which will also set the corresponding bit in the GHCB valid_bitmap field to denote that sw_scratch is actually valid. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Joerg Roedel <jroedel@suse.de> Link: https://lkml.kernel.org/r/ba84deabdf44a7a880454fb351d189c6ad79d4ba.1601041106.git.thomas.lendacky@amd.com
2020-09-25dma-mapping: add a new dma_alloc_pages APIChristoph Hellwig
This API is the equivalent of alloc_pages, except that the returned memory is guaranteed to be DMA addressable by the passed in device. The implementation will also be used to provide a more sensible replacement for DMA_ATTR_NON_CONSISTENT flag. Additionally dma_alloc_noncoherent is switched over to use dma_alloc_pages as its backend. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> (MIPS part)
2020-09-25Merge branch 'master' of ↵Christoph Hellwig
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into dma-mapping-for-next Pull in the latest 5.9 tree for the commit to revert the V4L2_FLAG_MEMORY_NON_CONSISTENT uapi addition.
2020-09-23x86/ioapic: Unbreak check_timer()x86-urgent-2020-09-27Thomas Gleixner
Several people reported in the kernel bugzilla that between v4.12 and v4.13 the magic which works around broken hardware and BIOSes to find the proper timer interrupt delivery mode stopped working for some older affected platforms which need to fall back to ExtINT delivery mode. The reason is that the core code changed to keep track of the masked and disabled state of an interrupt line more accurately to avoid the expensive hardware operations. That broke an assumption in i8259_make_irq() which invokes disable_irq_nosync(); irq_set_chip_and_handler(); enable_irq(); Up to v4.12 this worked because enable_irq() unconditionally unmasked the interrupt line, but after the state tracking improvements this is not longer the case because the IO/APIC uses lazy disabling. So the line state is unmasked which means that enable_irq() does not call into the new irq chip to unmask it. In principle this is a shortcoming of the core code, but it's more than unclear whether the core code should try to reset state. At least this cannot be done unconditionally as that would break other existing use cases where the chip type is changed, e.g. when changing the trigger type, but the callers expect the state to be preserved. As the way how check_timer() is switching the delivery modes is truly unique, the obvious fix is to simply unmask the i8259 manually after changing the mode to ExtINT delivery and switching the irq chip to the legacy PIC. Note, that the fixes tag is not really precise, but identifies the commit which broke the assumptions in the IO/APIC and i8259 code and that's the kernel version to which this needs to be backported. Fixes: bf22ff45bed6 ("genirq: Avoid unnecessary low level irq function calls") Reported-by: p_c_chan@hotmail.com Reported-by: ecm4@mail.com Reported-by: perdigao1@yahoo.com Reported-by: matzes@users.sourceforge.net Reported-by: rvelascog@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: p_c_chan@hotmail.com Tested-by: matzes@users.sourceforge.net Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=197769
2020-09-22x86/irq: Make run_on_irqstack_cond() typesafeThomas Gleixner
Sami reported that run_on_irqstack_cond() requires the caller to cast functions to mismatching types, which trips indirect call Control-Flow Integrity (CFI) in Clang. Instead of disabling CFI on that function, provide proper helpers for the three call variants. The actual ASM code stays the same as that is out of reach. [ bp: Fix __run_on_irqstack() prototype to match. ] Fixes: 931b94145981 ("x86/entry: Provide helpers for executing on the irqstack") Reported-by: Nathan Chancellor <natechancellor@gmail.com> Reported-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Sami Tolvanen <samitolvanen@google.com> Cc: <stable@vger.kernel.org> Link: https://github.com/ClangBuiltLinux/linux/issues/1052 Link: https://lkml.kernel.org/r/87pn6eb5tv.fsf@nanos.tec.linutronix.de
2020-09-22Merge branch 'x86-seves-for-paolo' of ↵Paolo Bonzini
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD
2020-09-22x86/fpu: Handle FPU-related and clearcpuid command line arguments earlierx86_fpu_for_v5.10Mike Hommey
FPU initialization handles them currently. However, in the case of clearcpuid=, some other early initialization code may check for features before the FPU initialization code is called. Handling the argument earlier allows the command line to influence those early initializations. Signed-off-by: Mike Hommey <mh@glandium.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200921215638.37980-1-mh@glandium.org
2020-09-21Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM: - fix fault on page table writes during instruction fetch s390: - doc improvement x86: - The obvious patches are always the ones that turn out to be completely broken. /me hangs his head in shame" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: Revert "KVM: Check the allocation of pv cpu mask" KVM: arm64: Remove S1PTW check from kvm_vcpu_dabt_iswrite() KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch docs: kvm: add documentation for KVM_CAP_S390_DIAG318
2020-09-20Merge tag 'x86_urgent_for_v5.9_rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - A defconfig fix (Daniel Díaz) - Disable relocation relaxation for the compressed kernel when not built as -pie as in that case kernels built with clang and linked with LLD fail to boot due to the linker optimizing some instructions in non-PIE form; the gory details in the commit message (Arvind Sankar) - A fix for the "bad bp value" warning issued by the frame-pointer unwinder (Josh Poimboeuf) * tag 'x86_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/unwind/fp: Fix FP unwinding in ret_from_fork x86/boot/compressed: Disable relocation relaxation x86/defconfigs: Explicitly unset CONFIG_64BIT in i386_defconfig
2020-09-20Revert "KVM: Check the allocation of pv cpu mask"Vitaly Kuznetsov
The commit 0f990222108d ("KVM: Check the allocation of pv cpu mask") we have in 5.9-rc5 has two issue: 1) Compilation fails for !CONFIG_SMP, see: https://bugzilla.kernel.org/show_bug.cgi?id=209285 2) This commit completely disables PV TLB flush, see https://lore.kernel.org/kvm/87y2lrnnyf.fsf@vitty.brq.redhat.com/ The allocation problem is likely a theoretical one, if we don't have memory that early in boot process we're likely doomed anyway. Let's solve it properly later. This reverts commit 0f990222108d214a0924d920e6095b58107d7b59. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-18stacktrace: Remove reliable argument from arch_stack_walk() callbackMark Brown
Currently the callback passed to arch_stack_walk() has an argument called reliable passed to it to indicate if the stack entry is reliable, a comment says that this is used by some printk() consumers. However in the current kernel none of the arch_stack_walk() implementations ever set this flag to true and the only callback implementation we have is in the generic stacktrace code which ignores the flag. It therefore appears that this flag is redundant so we can simplify and clarify things by removing it. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Link: https://lore.kernel.org/r/20200914153409.25097-2-broonie@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2020-09-18x86/mce: Annotate mce_rd/wrmsrl() with noinstrBorislav Petkov
They do get called from the #MC handler which is already marked "noinstr". Commit e2def7d49d08 ("x86/mce: Make mce_rdmsrl() panic on an inaccessible MSR") already got rid of the instrumentation in the MSR accessors, fix the annotation now too, in order to get rid of: vmlinux.o: warning: objtool: do_machine_check()+0x4a: call to mce_rdmsrl() leaves .noinstr.text section Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200915194020.28807-1-bp@alien8.de
2020-09-18x86/cpu: Add hardware-enforced cache coherency as a CPUID featureKrish Sadhukhan
In some hardware implementations, coherency between the encrypted and unencrypted mappings of the same physical page is enforced. In such a system, it is not required for software to flush the page from all CPU caches in the system prior to changing the value of the C-bit for a page. This hardware- enforced cache coherency is indicated by EAX[10] in CPUID leaf 0x8000001f. [ bp: Use one of the free slots in word 3. ] Suggested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200917212038.5090-2-krish.sadhukhan@oracle.com
2020-09-18x86/unwind/fp: Fix FP unwinding in ret_from_forkx86_urgent_for_v5.9_rc6Josh Poimboeuf
There have been some reports of "bad bp value" warnings printed by the frame pointer unwinder: WARNING: kernel stack regs at 000000005bac7112 in sh:1014 has bad 'bp' value 0000000000000000 This warning happens when unwinding from an interrupt in ret_from_fork(). If entry code gets interrupted, the state of the frame pointer (rbp) may be undefined, which can confuse the unwinder, resulting in warnings like the above. There's an in_entry_code() check which normally silences such warnings for entry code. But in this case, ret_from_fork() is getting interrupted. It recently got moved out of .entry.text, so the in_entry_code() check no longer works. It could be moved back into .entry.text, but that would break the noinstr validation because of the call to schedule_tail(). Instead, initialize each new task's RBP to point to the task's entry regs via an encoded frame pointer. That will allow the unwinder to reach the end of the stack gracefully. Fixes: b9f6976bfb94 ("x86/entry/64: Move non entry code into .text section") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reported-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/f366bbf5a8d02e2318ee312f738112d0af74d16f.1600103007.git.jpoimboe@redhat.com
2020-09-17x86/mmu: Allocate/free a PASIDFenghua Yu
A PASID is allocated for an "mm" the first time any thread binds to an SVA-capable device and is freed from the "mm" when the SVA is unbound by the last thread. It's possible for the "mm" to have different PASID values in different binding/unbinding SVA cycles. The mm's PASID (non-zero for valid PASID or 0 for invalid PASID) is propagated to a per-thread PASID MSR for all threads within the mm through IPI, context switch, or inherited. This is done to ensure that a running thread has the right PASID in the MSR matching the mm's PASID. [ bp: s/SVM/SVA/g; massage. ] Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/1600187413-163670-10-git-send-email-fenghua.yu@intel.com
2020-09-17x86/fpu/xstate: Add supervisor PASID state for ENQCMDYu-cheng Yu
The ENQCMD instruction reads a PASID from the IA32_PASID MSR. The MSR is stored in the task's supervisor XSAVE* PASID state and is context-switched by XSAVES/XRSTORS. [ bp: Add (in-)definite articles and massage. ] Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/1600187413-163670-6-git-send-email-fenghua.yu@intel.com
2020-09-17x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructionsFenghua Yu
Work submission instruction comes in two flavors. ENQCMD can be called both in ring 3 and ring 0 and always uses the contents of a PASID MSR when shipping the command to the device. ENQCMDS allows a kernel driver to submit commands on behalf of a user process. The driver supplies the PASID value in ENQCMDS. There isn't any usage of ENQCMD in the kernel as of now. The CPU feature flag is shown as "enqcmd" in /proc/cpuinfo. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/1600187413-163670-5-git-send-email-fenghua.yu@intel.com
2020-09-16efi: Support for MOK variable config tableLenny Szubowicz
Because of system-specific EFI firmware limitations, EFI volatile variables may not be capable of holding the required contents of the Machine Owner Key (MOK) certificate store when the certificate list grows above some size. Therefore, an EFI boot loader may pass the MOK certs via a EFI configuration table created specifically for this purpose to avoid this firmware limitation. An EFI configuration table is a much more primitive mechanism compared to EFI variables and is well suited for one-way passage of static information from a pre-OS environment to the kernel. This patch adds initial kernel support to recognize, parse, and validate the EFI MOK configuration table, where named entries contain the same data that would otherwise be provided in similarly named EFI variables. Additionally, this patch creates a sysfs binary file for each EFI MOK configuration table entry found. These files are read-only to root and are provided for use by user space utilities such as mokutil. A subsequent patch will load MOK certs into the trusted platform key ring using this infrastructure. Signed-off-by: Lenny Szubowicz <lszubowi@redhat.com> Link: https://lore.kernel.org/r/20200905013107.10457-2-lszubowi@redhat.com Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-09-16x86/irq: Cleanup the arch_*_msi_irqs() leftoversThomas Gleixner
Get rid of all the gunk and remove the 'select PCI_MSI_ARCH_FALLBACK' from the x86 Kconfig so the weak functions in the PCI core are replaced by stubs which emit a warning, which ensures that any fail to set the irq domain pointer results in a warning when the device is used. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200826112334.086003720@linutronix.de
2020-09-16x86/pci: Set default irq domain in pcibios_add_device()Thomas Gleixner
Now that interrupt remapping sets the irqdomain pointer when a PCI device is added it's possible to store the default irq domain in the device struct in pcibios_add_device(). If the bus to which a device is connected has an irq domain associated then this domain is used otherwise the default domain (PCI/MSI native or XEN PCI/MSI) is used. Using the bus domain ensures that special MSI bus domains like VMD work. This makes XEN and the non-remapped native case work solely based on the irq domain pointer in struct device for PCI/MSI and allows to remove the arch fallback and make most of the x86_msi ops private to XEN in the next steps. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200826112333.900423047@linutronix.de
2020-09-16x86/irq: Initialize PCI/MSI domain at PCI init timeThomas Gleixner
No point in initializing the default PCI/MSI interrupt domain early and no point to create it when XEN PV/HVM/DOM0 are active. Move the initialization to pci_arch_init() and convert it to init ops so that XEN can override it as XEN has it's own PCI/MSI management. The XEN override comes in a later step. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200826112332.859209894@linutronix.de
2020-09-16x86/irq: Move apic_post_init() invocation to one placeThomas Gleixner
No point to call it from both 32bit and 64bit implementations of default_setup_apic_routing(). Move it to the caller. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200826112332.658496557@linutronix.de
2020-09-16x86/msi: Use generic MSI domain opsThomas Gleixner
pci_msi_get_hwirq() and pci_msi_set_desc are not longer special. Enable the generic MSI domain ops in the core and PCI MSI code unconditionally and get rid of the x86 specific implementations in the X86 MSI code and in the hyperv PCI driver. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200826112332.564274859@linutronix.de
2020-09-16x86/msi: Consolidate MSI allocationThomas Gleixner
Convert the interrupt remap drivers to retrieve the pci device from the msi descriptor and use info::hwirq. This is the first step to prepare x86 for using the generic MSI domain ops. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Wei Liu <wei.liu@kernel.org> Acked-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20200826112332.466405395@linutronix.de
2020-09-16PCI/MSI: Rework pci_msi_domain_calc_hwirq()Thomas Gleixner
Retrieve the PCI device from the msi descriptor instead of doing so at the call sites. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200826112332.352583299@linutronix.de
2020-09-16x86/irq: Consolidate DMAR irq allocationThomas Gleixner
None of the DMAR specific fields are required. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200826112332.163462706@linutronix.de
2020-09-16x86_ioapic_Consolidate_IOAPIC_allocationThomas Gleixner
Move the IOAPIC specific fields into their own struct and reuse the common devid. Get rid of the #ifdeffery as it does not matter at all whether the alloc info is a couple of bytes longer or not. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Wei Liu <wei.liu@kernel.org> Acked-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20200826112332.054367732@linutronix.de
2020-09-16x86/msi: Consolidate HPET allocationThomas Gleixner
None of the magic HPET fields are required in any way. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20200826112331.943993771@linutronix.de
2020-09-16iommu/irq_remapping: Consolidate irq domain lookupThomas Gleixner
Now that the iommu implementations handle the X86_*_GET_PARENT_DOMAIN types, consolidate the two getter functions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20200826112331.741909337@linutronix.de
2020-09-16x86/irq: Add allocation type for parent domain retrievalThomas Gleixner
irq_remapping_ir_irq_domain() is used to retrieve the remapping parent domain for an allocation type. irq_remapping_irq_domain() is for retrieving the actual device domain for allocating interrupts for a device. The two functions are similar and can be unified by using explicit modes for parent irq domain retrieval. Add X86_IRQ_ALLOC_TYPE_IOAPIC/HPET_GET_PARENT and use it in the iommu implementations. Drop the parent domain retrieval for PCI_MSI/X as that is unused. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20200826112331.436350257@linutronix.de
2020-09-16x86_irq_Rename_X86_IRQ_ALLOC_TYPE_MSI_to_reflect_PCI_dependencyThomas Gleixner
No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20200826112331.343103175@linutronix.de
2020-09-16x86/msi: Remove pointless vcpu_affinity callbackThomas Gleixner
Setting the irq_set_vcpu_affinity() callback to irq_chip_set_vcpu_affinity_parent() is a pointless exercise because the function which utilizes it searchs the domain hierarchy to find a parent domain which has such a callback. Remove the useless indirection. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200826112331.250130127@linutronix.de
2020-09-16x86/msi: Move compose message callback where it belongsThomas Gleixner
Composing the MSI message at the MSI chip level is wrong because the underlying parent domain is the one which knows how the message should be composed for the direct vector delivery or the interrupt remapping table entry. The interrupt remapping aware PCI/MSI chip does that already. Make the direct delivery chip do the same and move the composition of the direct delivery MSI message to the vector domain irq chip. This prepares for the upcoming device MSI support to avoid having architecture specific knowledge in the device MSI domain irq chips. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200826112331.157603198@linutronix.de
2020-09-16genirq/chip: Use the first chip in irq_chip_compose_msi_msg()Thomas Gleixner
The documentation of irq_chip_compose_msi_msg() claims that with hierarchical irq domains the first chip in the hierarchy which has an irq_compose_msi_msg() callback is chosen. But the code just keeps iterating after it finds a chip with a compose callback. The x86 HPET MSI implementation relies on that behaviour, but that does not make it more correct. The message should always be composed at the domain which manages the underlying resource (e.g. APIC or remap table) because that domain knows about the required layout of the message. On X86 the following hierarchies exist: 1) vector -------- PCI/MSI 2) vector -- IR -- PCI/MSI The vector domain has a different message format than the IR (remapping) domain. So obviously the PCI/MSI domain can't compose the message without having knowledge about the parent domain, which is exactly the opposite of what hierarchical domains want to achieve. X86 actually has two different PCI/MSI chips where #1 has a compose callback and #2 does not. #2 delegates the composition to the remap domain where it belongs, but #1 does it at the PCI/MSI level. For the upcoming device MSI support it's necessary to change this and just let the first domain which can compose the message take care of it. That way the top level chip does not have to worry about it and the device MSI code does not need special knowledge about topologies. It just sets the compose callback to NULL and lets the hierarchy pick the first chip which has one. Due to that the attempt to move the compose callback from the direct delivery PCI/MSI domain to the vector domain made the system fail to boot with interrupt remapping enabled because in the remapping case irq_chip_compose_msi_msg() keeps iterating and choses the compose callback of the vector domain which obviously creates the wrong format for the remap table. Break out of the loop when the first irq chip with a compose callback is found and fixup the HPET code temporarily. That workaround will be removed once the direct delivery compose callback is moved to the place where it belongs in the vector domain. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200826112331.047917603@linutronix.de
2020-09-16x86/init: Remove unused init opsThomas Gleixner
Some past platform removal forgot to get rid of this unused ballast. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200826112330.806095671@linutronix.de
2020-09-15x86/mce/dev-mcelog: Do not update kflags on AMD systemsSmita Koralahalli
The mcelog utility is not commonly used on AMD systems. Therefore, errors logged only by the dev_mce_log() notifier will be missed. This may occur if the EDAC modules are not loaded, in which case it's preferable to print the error record by the default notifier. However, the mce->kflags set by dev_mce_log() notifier makes the default notifier skip over the errors assuming they are processed by dev_mce_log(). Do not update kflags in the dev_mce_log() notifier on AMD systems. Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200903234531.162484-3-Smita.KoralahalliChannabasappa@amd.com
2020-09-14x86/mce: Stop mce_reign() from re-computing severity for every CPUTony Luck
Back in commit: 20d51a426fe9 ("x86/mce: Reuse one of the u16 padding fields in 'struct mce'") a field was added to "struct mce" to save the computed error severity. Make use of this in mce_reign() to avoid re-computing the severity for every CPU. In the case where the machine panics, one call to mce_severity() is still needed in order to provide the correct message giving the reason for the panic. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200908175519.14223-2-tony.luck@intel.com
2020-09-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "A bit on the bigger side, mostly due to me being on vacation, then busy, then on parental leave, but there's nothing worrisome. ARM: - Multiple stolen time fixes, with a new capability to match x86 - Fix for hugetlbfs mappings when PUD and PMD are the same level - Fix for hugetlbfs mappings when PTE mappings are enforced (dirty logging, for example) - Fix tracing output of 64bit values x86: - nSVM state restore fixes - Async page fault fixes - Lots of small fixes everywhere" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits) KVM: emulator: more strict rsm checks. KVM: nSVM: more strict SMM checks when returning to nested guest SVM: nSVM: setup nested msr permission bitmap on nested state load SVM: nSVM: correctly restore GIF on vmexit from nesting after migration x86/kvm: don't forget to ACK async PF IRQ x86/kvm: properly use DEFINE_IDTENTRY_SYSVEC() macro KVM: VMX: Don't freeze guest when event delivery causes an APIC-access exit KVM: SVM: avoid emulation with stale next_rip KVM: x86: always allow writing '0' to MSR_KVM_ASYNC_PF_EN KVM: SVM: Periodically schedule when unregistering regions on destroy KVM: MIPS: Change the definition of kvm type kvm x86/mmu: use KVM_REQ_MMU_SYNC to sync when needed KVM: nVMX: Fix the update value of nested load IA32_PERF_GLOBAL_CTRL control KVM: fix memory leak in kvm_io_bus_unregister_dev() KVM: Check the allocation of pv cpu mask KVM: nVMX: Update VMCS02 when L2 PAE PDPTE updates detected KVM: arm64: Update page shift if stage 2 block mapping not supported KVM: arm64: Fix address truncation in traces KVM: arm64: Do not try to map PUDs when they are folded into PMD arm64/x86: KVM: Introduce steal-time cap ...
2020-09-12x86/kvm: don't forget to ACK async PF IRQVitaly Kuznetsov
Merge commit 26d05b368a5c0 ("Merge branch 'kvm-async-pf-int' into HEAD") tried to adapt the new interrupt based async PF mechanism to the newly introduced IDTENTRY magic but unfortunately it missed the fact that DEFINE_IDTENTRY_SYSVEC() doesn't call ack_APIC_irq() on its own and all DEFINE_IDTENTRY_SYSVEC() users have to call it manually. As the result all multi-CPU KVM guest hang on boot when KVM_FEATURE_ASYNC_PF_INT is present. The breakage went unnoticed because no KVM userspace (e.g. QEMU) currently set it (and thus async PF mechanism is currently disabled) but we're about to change that. Fixes: 26d05b368a5c0 ("Merge branch 'kvm-async-pf-int' into HEAD") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200908135350.355053-3-vkuznets@redhat.com> Tested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-12x86/kvm: properly use DEFINE_IDTENTRY_SYSVEC() macroVitaly Kuznetsov
DEFINE_IDTENTRY_SYSVEC() already contains irqentry_enter()/ irqentry_exit(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200908135350.355053-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-11KVM: Check the allocation of pv cpu maskHaiwei Li
check the allocation of per-cpu __pv_cpu_mask. Initialize ops only when successful. Signed-off-by: Haiwei Li <lihaiwei@tencent.com> Message-Id: <d59f05df-e6d3-3d31-a036-cc25a2b2f33f@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>