summaryrefslogtreecommitdiff
path: root/arch/x86/kvm/svm
AgeCommit message (Collapse)Author
2020-11-17KVM: SVM: fix error return code in svm_create_vcpu()Chen Zhou
Fix to return a negative error code from the error handling case instead of 0 in function svm_create_vcpu(), as done elsewhere in this function. Fixes: f4c847a95654 ("KVM: SVM: refactor msr permission bitmap allocation") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Message-Id: <20201117025426.167824-1-chenzhou10@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-16KVM: SVM: Fix offset computation bug in __sev_dbg_decrypt().Ashish Kalra
Fix offset computation in __sev_dbg_decrypt() to include the source paddr before it is rounded down to be aligned to 16 bytes as required by SEV API. This fixes incorrect guest memory dumps observed when using qemu monitor. Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Message-Id: <20201110224205.29444-1-Ashish.Kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-13KVM: SVM: Update cr3_lm_rsvd_bits for AMD SEV guestsBabu Moger
For AMD SEV guests, update the cr3_lm_rsvd_bits to mask the memory encryption bit in reserved bits. Signed-off-by: Babu Moger <babu.moger@amd.com> Message-Id: <160521948301.32054.5783800787423231162.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-23Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "For x86, there is a new alternative and (in the future) more scalable implementation of extended page tables that does not need a reverse map from guest physical addresses to host physical addresses. For now it is disabled by default because it is still lacking a few of the existing MMU's bells and whistles. However it is a very solid piece of work and it is already available for people to hammer on it. Other updates: ARM: - New page table code for both hypervisor and guest stage-2 - Introduction of a new EL2-private host context - Allow EL2 to have its own private per-CPU variables - Support of PMU event filtering - Complete rework of the Spectre mitigation PPC: - Fix for running nested guests with in-kernel IRQ chip - Fix race condition causing occasional host hard lockup - Minor cleanups and bugfixes x86: - allow trapping unknown MSRs to userspace - allow userspace to force #GP on specific MSRs - INVPCID support on AMD - nested AMD cleanup, on demand allocation of nested SVM state - hide PV MSRs and hypercalls for features not enabled in CPUID - new test for MSR_IA32_TSC writes from host and guest - cleanups: MMU, CPUID, shared MSRs - LAPIC latency optimizations ad bugfixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (232 commits) kvm: x86/mmu: NX largepage recovery for TDP MMU kvm: x86/mmu: Don't clear write flooding count for direct roots kvm: x86/mmu: Support MMIO in the TDP MMU kvm: x86/mmu: Support write protection for nesting in tdp MMU kvm: x86/mmu: Support disabling dirty logging for the tdp MMU kvm: x86/mmu: Support dirty logging for the TDP MMU kvm: x86/mmu: Support changed pte notifier in tdp MMU kvm: x86/mmu: Add access tracking for tdp_mmu kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU kvm: x86/mmu: Allocate struct kvm_mmu_pages for all pages in TDP MMU kvm: x86/mmu: Add TDP MMU PF handler kvm: x86/mmu: Remove disallowed_hugepage_adjust shadow_walk_iterator arg kvm: x86/mmu: Support zapping SPTEs in the TDP MMU KVM: Cache as_id in kvm_memory_slot kvm: x86/mmu: Add functions to handle changed TDP SPTEs kvm: x86/mmu: Allocate and free TDP MMU roots kvm: x86/mmu: Init / Uninit the TDP MMU kvm: x86/mmu: Introduce tdp_iter KVM: mmu: extract spte.h and spte.c KVM: mmu: Separate updating a PTE from kvm_set_pte_rmapp ...
2020-10-21Merge branch 'kvm-fixes' into 'next'Paolo Bonzini
Pick up bugfixes from 5.9, otherwise various tests fail.
2020-10-21KVM: SVM: Initialize prev_ga_tag before useSuravee Suthikulpanit
The function amd_ir_set_vcpu_affinity makes use of the parameter struct amd_iommu_pi_data.prev_ga_tag to determine if it should delete struct amd_iommu_pi_data from a list when not running in AVIC mode. However, prev_ga_tag is initialized only when AVIC is enabled. The non-zero uninitialized value can cause unintended code path, which ends up making use of the struct vcpu_svm.ir_list and ir_list_lock without being initialized (since they are intended only for the AVIC case). This triggers NULL pointer dereference bug in the function vm_ir_list_del with the following call trace: svm_update_pi_irte+0x3c2/0x550 [kvm_amd] ? proc_create_single_data+0x41/0x50 kvm_arch_irq_bypass_add_producer+0x40/0x60 [kvm] __connect+0x5f/0xb0 [irqbypass] irq_bypass_register_producer+0xf8/0x120 [irqbypass] vfio_msi_set_vector_signal+0x1de/0x2d0 [vfio_pci] vfio_msi_set_block+0x77/0xe0 [vfio_pci] vfio_pci_set_msi_trigger+0x25c/0x2f0 [vfio_pci] vfio_pci_set_irqs_ioctl+0x88/0xb0 [vfio_pci] vfio_pci_ioctl+0x2ea/0xed0 [vfio_pci] ? alloc_file_pseudo+0xa5/0x100 vfio_device_fops_unl_ioctl+0x26/0x30 [vfio] ? vfio_device_fops_unl_ioctl+0x26/0x30 [vfio] __x64_sys_ioctl+0x96/0xd0 do_syscall_64+0x37/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Therefore, initialize prev_ga_tag to zero before use. This should be safe because ga_tag value 0 is invalid (see function avic_vm_init). Fixes: dfa20099e26e ("KVM: SVM: Refactor AVIC vcpu initialization into avic_init_vcpu()") Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20201003232707.4662-1-suravee.suthikulpanit@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21KVM: nSVM: implement on demand allocation of the nested stateMaxim Levitsky
This way we don't waste memory on VMs which don't use nesting virtualization even when the host enabled it for them. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20201001112954.6258-5-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21KVM: x86: allow kvm_x86_ops.set_efer to return an error valueMaxim Levitsky
This will be used to signal an error to the userspace, in case the vendor code failed during handling of this msr. (e.g -ENOMEM) Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20201001112954.6258-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-14Merge tag 'x86_seves_for_v5.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SEV-ES support from Borislav Petkov: "SEV-ES enhances the current guest memory encryption support called SEV by also encrypting the guest register state, making the registers inaccessible to the hypervisor by en-/decrypting them on world switches. Thus, it adds additional protection to Linux guests against exfiltration, control flow and rollback attacks. With SEV-ES, the guest is in full control of what registers the hypervisor can access. This is provided by a guest-host exchange mechanism based on a new exception vector called VMM Communication Exception (#VC), a new instruction called VMGEXIT and a shared Guest-Host Communication Block which is a decrypted page shared between the guest and the hypervisor. Intercepts to the hypervisor become #VC exceptions in an SEV-ES guest so in order for that exception mechanism to work, the early x86 init code needed to be made able to handle exceptions, which, in itself, brings a bunch of very nice cleanups and improvements to the early boot code like an early page fault handler, allowing for on-demand building of the identity mapping. With that, !KASLR configurations do not use the EFI page table anymore but switch to a kernel-controlled one. The main part of this series adds the support for that new exchange mechanism. The goal has been to keep this as much as possibly separate from the core x86 code by concentrating the machinery in two SEV-ES-specific files: arch/x86/kernel/sev-es-shared.c arch/x86/kernel/sev-es.c Other interaction with core x86 code has been kept at minimum and behind static keys to minimize the performance impact on !SEV-ES setups. Work by Joerg Roedel and Thomas Lendacky and others" * tag 'x86_seves_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (73 commits) x86/sev-es: Use GHCB accessor for setting the MMIO scratch buffer x86/sev-es: Check required CPU features for SEV-ES x86/efi: Add GHCB mappings when SEV-ES is active x86/sev-es: Handle NMI State x86/sev-es: Support CPU offline/online x86/head/64: Don't call verify_cpu() on starting APs x86/smpboot: Load TSS and getcpu GDT entry before loading IDT x86/realmode: Setup AP jump table x86/realmode: Add SEV-ES specific trampoline entry point x86/vmware: Add VMware-specific handling for VMMCALL under SEV-ES x86/kvm: Add KVM-specific VMMCALL handling under SEV-ES x86/paravirt: Allow hypervisor-specific VMMCALL handling under SEV-ES x86/sev-es: Handle #DB Events x86/sev-es: Handle #AC Events x86/sev-es: Handle VMMCALL Events x86/sev-es: Handle MWAIT/MWAITX Events x86/sev-es: Handle MONITOR/MONITORX Events x86/sev-es: Handle INVD Events x86/sev-es: Handle RDPMC Events x86/sev-es: Handle RDTSC(P) Events ...
2020-10-14Merge tag 'objtool-core-2020-10-13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool updates from Ingo Molnar: "Most of the changes are cleanups and reorganization to make the objtool code more arch-agnostic. This is in preparation for non-x86 support. Other changes: - KASAN fixes - Handle unreachable trap after call to noreturn functions better - Ignore unreachable fake jumps - Misc smaller fixes & cleanups" * tag 'objtool-core-2020-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) perf build: Allow nested externs to enable BUILD_BUG() usage objtool: Allow nested externs to enable BUILD_BUG() objtool: Permit __kasan_check_{read,write} under UACCESS objtool: Ignore unreachable trap after call to noreturn functions objtool: Handle calling non-function symbols in other sections objtool: Ignore unreachable fake jumps objtool: Remove useless tests before save_reg() objtool: Decode unwind hint register depending on architecture objtool: Make unwind hint definitions available to other architectures objtool: Only include valid definitions depending on source file type objtool: Rename frame.h -> objtool.h objtool: Refactor jump table code to support other architectures objtool: Make relocation in alternative handling arch dependent objtool: Abstract alternative special case handling objtool: Move macros describing structures to arch-dependent code objtool: Make sync-check consider the target architecture objtool: Group headers to check in a single list objtool: Define 'struct orc_entry' only when needed objtool: Skip ORC entry creation for non-text sections objtool: Move ORC logic out of check() ...
2020-10-12Merge tag 'x86_cpu_for_v5.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Borislav Petkov: - Add support for hardware-enforced cache coherency on AMD which obviates the need to flush cachelines before changing the PTE encryption bit (Krish Sadhukhan) - Add Centaur initialization support for families >= 7 (Tony W Wang-oc) - Add a feature flag for, and expose TSX suspend load tracking feature to KVM (Cathy Zhang) - Emulate SLDT and STR so that windows programs don't crash on UMIP machines (Brendan Shanks and Ricardo Neri) - Use the new SERIALIZE insn on Intel hardware which supports it (Ricardo Neri) - Misc cleanups and fixes * tag 'x86_cpu_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: KVM: SVM: Don't flush cache if hardware enforces cache coherency across encryption domains x86/mm/pat: Don't flush cache if hardware enforces cache coherency across encryption domnains x86/cpu: Add hardware-enforced cache coherency as a CPUID feature x86/cpu/centaur: Add Centaur family >=7 CPUs initialization support x86/cpu/centaur: Replace two-condition switch-case with an if statement x86/kvm: Expose TSX Suspend Load Tracking feature x86/cpufeatures: Enumerate TSX suspend load address tracking instructions x86/umip: Add emulation/spoofing for SLDT and STR instructions x86/cpu: Fix typos and improve the comments in sync_core() x86/cpu: Use XGETBV and XSETBV mnemonics in fpu/internal.h x86/cpu: Use SERIALIZE in sync_core() when available
2020-09-28KVM: nSVM: delay MSR permission processing to first nested VM runPaolo Bonzini
Allow userspace to set up the memory map after KVM_SET_NESTED_STATE; to do so, move the call to nested_svm_vmrun_msrpm inside the KVM_REQ_GET_NESTED_STATE_PAGES handler (which is currently not used by nSVM). This is similar to what VMX does already. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: x86: SVM: Prevent MSR passthrough when MSR access is deniedAlexander Graf
We will introduce the concept of MSRs that may not be handled in kernel space soon. Some MSRs are directly passed through to the guest, effectively making them handled by KVM from user space's point of view. This patch introduces all logic required to ensure that MSRs that user space wants trapped are not marked as direct access for guests. Signed-off-by: Alexander Graf <graf@amazon.com> Message-Id: <20200925143422.21718-6-graf@amazon.com> [Make terminology a bit more similar to VMX. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: x86: Prepare MSR bitmaps for userspace tracked MSRsAaron Lewis
Prepare vmx and svm for a subsequent change that ensures the MSR permission bitmap is set to allow an MSR that userspace is tracking to force a vmx_vmexit in the guest. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Oliver Upton <oupton@google.com> [agraf: rebase, adapt SVM scheme to nested changes that came in between] Signed-off-by: Alexander Graf <graf@amazon.com> Message-Id: <20200925143422.21718-5-graf@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: x86: Use common definition for kvm_nested_vmexit tracepointSean Christopherson
Use the newly introduced TRACE_EVENT_KVM_EXIT to define the guts of kvm_nested_vmexit so that it captures and prints the same information as kvm_exit. This has the bonus side effect of fixing the interrupt info and error code printing for the case where they're invalid, e.g. if the exit was a failed VM-Entry. This also sets the stage for retrieving EXIT_QUALIFICATION and VM_EXIT_INTR_INFO in nested_vmx_reflect_vmexit() if and only if the VM-Exit is being routed to L1. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923201349.16097-7-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: x86: Add intr/vectoring info and error code to kvm_exit tracepointSean Christopherson
Extend the kvm_exit tracepoint to align it with kvm_nested_vmexit in terms of what information is captured. On SVM, add interrupt info and error code, while on VMX it add IDT vectoring and error code. This sets the stage for macrofying the kvm_exit tracepoint definition so that it can be reused for kvm_nested_vmexit without loss of information. Opportunistically stuff a zero for VM_EXIT_INTR_INFO if the VM-Enter failed, as the field is guaranteed to be invalid. Note, it'd be possible to further filter the interrupt/exception fields based on the VM-Exit reason, but the helper is intended only for tracepoints, i.e. an extra VMREAD or two is a non-issue, the failed VM-Enter case is just low hanging fruit. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923201349.16097-5-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: x86: Read guest RIP from within the kvm_nested_vmexit tracepointSean Christopherson
Use kvm_rip_read() to read the guest's RIP for the nested VM-Exit tracepoint instead of having the caller pass in an argument. Params that are passed into a tracepoint are evaluated even if the tracepoint is disabled, i.e. passing in RIP for VMX incurs a VMREAD and retpoline to retrieve a value that may never be used, e.g. if the exit is due to a hardware interrupt. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923201349.16097-3-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: SEV: shorten comments around sev_clflush_pagesPaolo Bonzini
Very similar content is present in four comments in sev.c. Unfortunately there are small differences that make it harder to place the comment in sev_clflush_pages itself, but at least we can make it more concise. Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: SVM: Mark SEV launch secret pages as dirty.Cfir Cohen
The LAUNCH_SECRET command performs encryption of the launch secret memory contents. Mark pinned pages as dirty, before unpinning them. This matches the logic in sev_launch_update_data(). Signed-off-by: Cfir Cohen <cfir@google.com> Message-Id: <20200808003746.66687-1-cfir@google.com> Reviewed-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: nSVM: CR3 MBZ bits are only 63:52Krish Sadhukhan
Commit 761e4169346553c180bbd4a383aedd72f905bc9a created a wrong mask for the CR3 MBZ bits. According to APM vol 2, only the upper 12 bits are MBZ. Fixes: 761e41693465 ("KVM: nSVM: Check that MBZ bits in CR3 and CR4 are not set on vmrun of nested guests", 2020-07-08) Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Message-Id: <20200829004824.4577-2-krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: SVM: Add tracepoint for cr_interceptionHaiwei Li
Add trace_kvm_cr_write and trace_kvm_cr_read for svm. Signed-off-by: Haiwei Li <lihaiwei@tencent.com> Message-Id: <f3031602-db3b-c4fe-b719-d402663b0a2b@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: SVM: Analyze is_guest_mode() in svm_vcpu_run()Wanpeng Li
Analyze is_guest_mode() in svm_vcpu_run() instead of svm_exit_handlers_fastpath() in conformity with VMX version. Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1600066548-4343-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: x86: Add kvm_x86_ops hook to short circuit emulationSean Christopherson
Replace the existing kvm_x86_ops.need_emulation_on_page_fault() with a more generic is_emulatable(), and unconditionally call the new function in x86_emulate_instruction(). KVM will use the generic hook to support multiple security related technologies that prevent emulation in one way or another. Similar to the existing AMD #NPF case where emulation of the current instruction is not possible due to lack of information, AMD's SEV-ES and Intel's SGX and TDX will introduce scenarios where emulation is impossible due to the guest's register state being inaccessible. And again similar to the existing #NPF case, emulation can be initiated by kvm_mmu_page_fault(), i.e. outside of the control of vendor-specific code. While the cause and architecturally visible behavior of the various cases are different, e.g. SGX will inject a #UD, AMD #NPF is a clean resume or complete shutdown, and SEV-ES and TDX "return" an error, the impact on the common emulation code is identical: KVM must stop emulation immediately and resume the guest. Query is_emulatable() in handle_ud() as well so that the force_emulation_prefix code doesn't incorrectly modify RIP before calling emulate_instruction() in the absurdly unlikely scenario that KVM encounters forced emulation in conjunction with "do not emulate". Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200915232702.15945-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: SVM: use __GFP_ZERO instead of clear_page()Haiwei Li
Use __GFP_ZERO while alloc_page(). Signed-off-by: Haiwei Li <lihaiwei@tencent.com> Message-Id: <20200916083621.5512-1-lihaiwei.kernel@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: SVM: Enable INVPCID feature on AMDBabu Moger
The following intercept bit has been added to support VMEXIT for INVPCID instruction: Code Name Cause A2h VMEXIT_INVPCID INVPCID instruction The following bit has been added to the VMCB layout control area to control intercept of INVPCID: Byte Offset Bit(s) Function 14h 2 intercept INVPCID Enable the interceptions when the the guest is running with shadow page table enabled and handle the tlbflush based on the invpcid instruction type. For the guests with nested page table (NPT) support, the INVPCID feature works as running it natively. KVM does not need to do any special handling in this case. AMD documentation for INVPCID feature is available at "AMD64 Architecture Programmer’s Manual Volume 2: System Programming, Pub. 24593 Rev. 3.34(or later)" The documentation can be obtained at the links below: Link: https://www.amd.com/system/files/TechDocs/24593.pdf Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985255929.11252.17346684135277453258.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: SVM: Remove set_cr_intercept, clr_cr_intercept and is_cr_interceptBabu Moger
Remove set_cr_intercept, clr_cr_intercept and is_cr_intercept. Instead call generic svm_set_intercept, svm_clr_intercept an dsvm_is_intercep tfor all cr intercepts. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985253016.11252.16945893859439811480.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: SVM: Add new intercept word in vmcb_control_areaBabu Moger
The new intercept bits have been added in vmcb control area to support few more interceptions. Here are the some of them. - INTERCEPT_INVLPGB, - INTERCEPT_INVLPGB_ILLEGAL, - INTERCEPT_INVPCID, - INTERCEPT_MCOMMIT, - INTERCEPT_TLBSYNC, Add a new intercept word in vmcb_control_area to support these instructions. Also update kvm_nested_vmrun trace function to support the new addition. AMD documentation for these instructions is available at "AMD64 Architecture Programmer’s Manual Volume 2: System Programming, Pub. 24593 Rev. 3.34(or later)" The documentation can be obtained at the links below: Link: https://www.amd.com/system/files/TechDocs/24593.pdf Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985251547.11252.16994139329949066945.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: SVM: Modify 64 bit intercept field to two 32 bit vectorsBabu Moger
Convert all the intercepts to one array of 32 bit vectors in vmcb_control_area. This makes it easy for future intercept vector additions. Also update trace functions. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985250813.11252.5736581193881040525.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: SVM: Modify intercept_exceptions to generic interceptsBabu Moger
Modify intercept_exceptions to generic intercepts in vmcb_control_area. Use the generic vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept to set/clear/test the intercept_exceptions bits. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985250037.11252.1361972528657052410.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: SVM: Change intercept_dr to generic interceptsBabu Moger
Modify intercept_dr to generic intercepts in vmcb_control_area. Use the generic vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept to set/clear/test the intercept_dr bits. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985249255.11252.10000868032136333355.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: SVM: Change intercept_cr to generic interceptsBabu Moger
Change intercept_cr to generic intercepts in vmcb_control_area. Use the new vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept where applicable. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985248506.11252.9081085950784508671.stgit@bmoger-ubuntu> [Change constant names. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: SVM: Introduce vmcb_(set_intercept/clr_intercept/_is_intercept)Babu Moger
This is in preparation for the future intercept vector additions. Add new functions vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept using kernel APIs __set_bit, __clear_bit and test_bit espectively. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985247876.11252.16039238014239824460.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: nSVM: Remove unused fieldBabu Moger
host_intercept_exceptions is not used anywhere. Clean it up. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985252277.11252.8819848322175521354.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: SVM: refactor exit labels in svm_create_vcpuMaxim Levitsky
Kernel coding style suggests not to use labels like error1,error2 Suggested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200827171145.374620-6-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: SVM: use __GFP_ZERO instead of clear_pageMaxim Levitsky
Another small refactoring. Suggested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200827171145.374620-5-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: SVM: refactor msr permission bitmap allocationMaxim Levitsky
Replace svm_vcpu_init_msrpm with svm_vcpu_alloc_msrpm, that also allocates the msr bitmap and add svm_vcpu_free_msrpm to free it. This will be used later to move the nested msr permission bitmap allocation to nested.c Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200827171145.374620-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: nSVM: rename nested vmcb to vmcb12Maxim Levitsky
This is to be more consistient with VMX, and to support upcoming addition of vmcb02 Hopefully no functional changes. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200827171145.374620-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: SVM: rename a variable in the svm_create_vcpuMaxim Levitsky
The 'page' is to hold the vcpu's vmcb so name it as such to avoid confusion. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20200827171145.374620-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: LAPIC: Reduce world switch latency caused by timer_advance_nsWanpeng Li
All the checks in lapic_timer_int_injected(), __kvm_wait_lapic_expire(), and these function calls waste cpu cycles when the timer mode is not tscdeadline. We can observe ~1.3% world switch time overhead by kvm-unit-tests/vmexit.flat vmcall testing on AMD server. This patch reduces the world switch latency caused by timer_advance_ns feature when the timer mode is not tscdeadline by simpling move the check against apic->lapic_timer.expired_tscdeadline much earlier. Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1599731444-3525-7-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: nSVM: Avoid freeing uninitialized pointers in svm_set_nested_state()Vitaly Kuznetsov
The save and ctl pointers are passed uninitialized to kfree() when svm_set_nested_state() follows the 'goto out_set_gif' path. While the issue could've been fixed by initializing these on-stack varialbles to NULL, it seems preferable to eliminate 'out_set_gif' label completely as it is not actually a failure path and duplicating a single svm_set_gif() call doesn't look too bad. [ bp: Drop obscure Addresses-Coverity: tag. ] Fixes: 6ccbd29ade0d ("KVM: SVM: nested: Don't allocate VMCB structures on stack") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reported-by: Joerg Roedel <jroedel@suse.de> Reported-by: Colin King <colin.king@canonical.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Acked-by: Joerg Roedel <jroedel@suse.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lkml.kernel.org/r/20200914133725.650221-1-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-25Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull more kvm fixes from Paolo Bonzini: "Five small fixes. The nested migration bug will be fixed with a better API in 5.10 or 5.11, for now this is a fix that works with existing userspace but keeps the current ugly API" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: SVM: Add a dedicated INVD intercept routine KVM: x86: Reset MMU context if guest toggles CR4.SMAP or CR4.PKE KVM: x86: fix MSR_IA32_TSC read for nested migration selftests: kvm: Fix assert failure in single-step test KVM: x86: VMX: Make smaller physical guest address space support user-configurable
2020-09-25KVM: SVM: Add a dedicated INVD intercept routineTom Lendacky
The INVD instruction intercept performs emulation. Emulation can't be done on an SEV guest because the guest memory is encrypted. Provide a dedicated intercept routine for the INVD intercept. And since the instruction is emulated as a NOP, just skip it instead. Fixes: 1654efcbc431 ("KVM: SVM: Add KVM_SEV_INIT command") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <a0b9a19ffa7fef86a3cc700c7ea01cb2731e04e5.1600972918.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-22Merge branch 'x86-seves-for-paolo' of ↵Paolo Bonzini
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD
2020-09-19KVM: SVM: Don't flush cache if hardware enforces cache coherency across ↵x86_cpu_for_v5.10Krish Sadhukhan
encryption domains In some hardware implementations, coherency between the encrypted and unencrypted mappings of the same physical page in a VM is enforced. In such a system, it is not required for software to flush the VM's page from all CPU caches in the system prior to changing the value of the C-bit for the page. So check that bit before flushing the cache. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lkml.kernel.org/r/20200917212038.5090-4-krish.sadhukhan@oracle.com
2020-09-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "A bit on the bigger side, mostly due to me being on vacation, then busy, then on parental leave, but there's nothing worrisome. ARM: - Multiple stolen time fixes, with a new capability to match x86 - Fix for hugetlbfs mappings when PUD and PMD are the same level - Fix for hugetlbfs mappings when PTE mappings are enforced (dirty logging, for example) - Fix tracing output of 64bit values x86: - nSVM state restore fixes - Async page fault fixes - Lots of small fixes everywhere" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits) KVM: emulator: more strict rsm checks. KVM: nSVM: more strict SMM checks when returning to nested guest SVM: nSVM: setup nested msr permission bitmap on nested state load SVM: nSVM: correctly restore GIF on vmexit from nesting after migration x86/kvm: don't forget to ACK async PF IRQ x86/kvm: properly use DEFINE_IDTENTRY_SYSVEC() macro KVM: VMX: Don't freeze guest when event delivery causes an APIC-access exit KVM: SVM: avoid emulation with stale next_rip KVM: x86: always allow writing '0' to MSR_KVM_ASYNC_PF_EN KVM: SVM: Periodically schedule when unregistering regions on destroy KVM: MIPS: Change the definition of kvm type kvm x86/mmu: use KVM_REQ_MMU_SYNC to sync when needed KVM: nVMX: Fix the update value of nested load IA32_PERF_GLOBAL_CTRL control KVM: fix memory leak in kvm_io_bus_unregister_dev() KVM: Check the allocation of pv cpu mask KVM: nVMX: Update VMCS02 when L2 PAE PDPTE updates detected KVM: arm64: Update page shift if stage 2 block mapping not supported KVM: arm64: Fix address truncation in traces KVM: arm64: Do not try to map PUDs when they are folded into PMD arm64/x86: KVM: Introduce steal-time cap ...
2020-09-12KVM: nSVM: more strict SMM checks when returning to nested guestMaxim Levitsky
* check that guest is 64 bit guest, otherwise the SVM related fields in the smm state area are not defined * If the SMM area indicates that SMM interrupted a running guest, check that EFER.SVME which is also saved in this area is set, otherwise the guest might have tampered with SMM save area, and so indicate emulation failure which should triple fault the guest. * Check that that guest CPUID supports SVM (due to the same issue as above) Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200827162720.278690-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-12SVM: nSVM: setup nested msr permission bitmap on nested state loadMaxim Levitsky
This code was missing and was forcing the L2 run with L1's msr permission bitmap Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200827162720.278690-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-12SVM: nSVM: correctly restore GIF on vmexit from nesting after migrationMaxim Levitsky
Currently code in svm_set_nested_state copies the current vmcb control area to L1 control area (hsave->control), under assumption that it mostly reflects the defaults that kvm choose, and later qemu overrides these defaults with L2 state using standard KVM interfaces, like KVM_SET_REGS. However nested GIF (which is AMD specific thing) is by default is true, and it is copied to hsave area as such. This alone is not a big deal since on VMexit, GIF is always set to false, regardless of what it was on VM entry. However in nested_svm_vmexit we were first were setting GIF to false, but then we overwrite the control fields with value from the hsave area. (including the nested GIF field itself if GIF virtualization is enabled). Now on normal vm entry this is not a problem, since GIF is usually false prior to normal vm entry, and this is the value that copied to hsave, and then restored, but this is not always the case when the nested state is loaded as explained above. To fix this issue, move svm_set_gif after we restore the L1 control state in nested_svm_vmexit, so that even with wrong GIF in the saved L1 control area, we still clear GIF as the spec says. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200827162720.278690-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-12KVM: SVM: avoid emulation with stale next_ripWanpeng Li
svm->next_rip is reset in svm_vcpu_run() only after calling svm_exit_handlers_fastpath(), which will cause SVM's skip_emulated_instruction() to write a stale RIP. We can move svm_exit_handlers_fastpath towards the end of svm_vcpu_run(). To align VMX with SVM, keep svm_complete_interrupts() close as well. Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Paul K. <kronenpj@kronenpj.dyndns.org> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> [Also move vmcb_mark_all_clean before any possible write to the VMCB. - Paolo]
2020-09-11KVM: SVM: Periodically schedule when unregistering regions on destroyDavid Rientjes
There may be many encrypted regions that need to be unregistered when a SEV VM is destroyed. This can lead to soft lockups. For example, on a host running 4.15: watchdog: BUG: soft lockup - CPU#206 stuck for 11s! [t_virtual_machi:194348] CPU: 206 PID: 194348 Comm: t_virtual_machi RIP: 0010:free_unref_page_list+0x105/0x170 ... Call Trace: [<0>] release_pages+0x159/0x3d0 [<0>] sev_unpin_memory+0x2c/0x50 [kvm_amd] [<0>] __unregister_enc_region_locked+0x2f/0x70 [kvm_amd] [<0>] svm_vm_destroy+0xa9/0x200 [kvm_amd] [<0>] kvm_arch_destroy_vm+0x47/0x200 [<0>] kvm_put_kvm+0x1a8/0x2f0 [<0>] kvm_vm_release+0x25/0x30 [<0>] do_exit+0x335/0xc10 [<0>] do_group_exit+0x3f/0xa0 [<0>] get_signal+0x1bc/0x670 [<0>] do_signal+0x31/0x130 Although the CLFLUSH is no longer issued on every encrypted region to be unregistered, there are no other changes that can prevent soft lockups for very large SEV VMs in the latest kernel. Periodically schedule if necessary. This still holds kvm->lock across the resched, but since this only happens when the VM is destroyed this is assumed to be acceptable. Signed-off-by: David Rientjes <rientjes@google.com> Message-Id: <alpine.DEB.2.23.453.2008251255240.2987727@chino.kir.corp.google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>