summaryrefslogtreecommitdiff
path: root/include/kvm
AgeCommit message (Collapse)Author
2021-06-01KVM: arm64: vgic: Implement SW-driven deactivationMarc Zyngier
In order to deal with these systems that do not offer HW-based deactivation of interrupts, let implement a SW-based approach: - When the irq is queued into a LR, treat it as a pure virtual interrupt and set the EOI flag in the LR. - When the interrupt state is read back from the LR, force a deactivation when the state is invalid (neither active nor pending) Interrupts requiring such treatment get the VGIC_SW_RESAMPLE flag. Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-01KVM: arm64: vgic: move irq->get_input_level into an ops structureMarc Zyngier
We already have the option to attach a callback to an interrupt to retrieve its pending state. As we are planning to expand this facility, move this callback into its own data structure. This will limit the size of individual interrupts as the ops structures can be shared across multiple interrupts. Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-01KVM: arm64: vgic: Let an interrupt controller advertise lack of HW deactivationMarc Zyngier
The vGIC, as architected by ARM, allows a virtual interrupt to trigger the deactivation of a physical interrupt. This allows the following interrupt to be delivered without requiring an exit. However, some implementations have choosen not to implement this, meaning that we will need some unsavoury workarounds to deal with this. On detecting such a case, taint the kernel and spit a nastygram. We'll deal with this in later patches. Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-04-22Merge branch 'kvm-arm64/kill_oprofile_dependency' into kvmarm-master/nextMarc Zyngier
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-04-22KVM: arm64: Divorce the perf code from oprofile helpersMarc Zyngier
KVM/arm64 is the sole user of perf_num_counters(), and really could do without it. Stop using the obsolete API by relying on the existing probing code. Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210414134409.1266357-2-maz@kernel.org
2021-04-06KVM: arm64: vgic-v3: Expose GICR_TYPER.Last for userspaceEric Auger
Commit 23bde34771f1 ("KVM: arm64: vgic-v3: Drop the reporting of GICR_TYPER.Last for userspace") temporarily fixed a bug identified when attempting to access the GICR_TYPER register before the redistributor region setting, but dropped the support of the LAST bit. Emulating the GICR_TYPER.Last bit still makes sense for architecture compliance though. This patch restores its support (if the redistributor region was set) while keeping the code safe. We introduce a new helper, vgic_mmio_vcpu_rdist_is_last() which computes whether a redistributor is the highest one of a series of redistributor contributor pages. With this new implementation we do not need to have a uaccess read accessor anymore. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210405163941.510258-9-eric.auger@redhat.com
2021-03-06KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static keyMarc Zyngier
We currently find out about the presence of a HW PMU (or the handling of that PMU by perf, which amounts to the same thing) in a fairly roundabout way, by checking the number of counters available to perf. That's good enough for now, but we will soon need to find about about that on paths where perf is out of reach (in the world switch). Instead, let's turn kvm_arm_support_pmu_v3() into a static key. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20210209114844.3278746-2-maz@kernel.org Message-Id: <20210305185254.3730990-5-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-04KVM: arm64: Replace KVM_ARM_PMU with HW_PERF_EVENTSMarc Zyngier
KVM_ARM_PMU only existed for the benefit of 32bit ARM hosts, and makes no sense now that we are 64bit only. Get rid of it. Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-12-04Merge remote-tracking branch 'origin/kvm-arm64/misc-5.11' into ↵Marc Zyngier
kvmarm-master/queue Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-11-30KVM: arm64: Delay the polling of the GICR_VPENDBASER.Dirty bitShenming Lu
In order to reduce the impact of the VPT parsing happening on the GIC, we can split the vcpu reseidency in two phases: - programming GICR_VPENDBASER: this still happens in vcpu_load() - checking for the VPT parsing to be complete: this can happen on vcpu entry (in kvm_vgic_flush_hwstate()) This allows the GIC and the CPU to work in parallel, rewmoving some of the entry overhead. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Shenming Lu <lushenming@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201128141857.983-3-lushenming@huawei.com
2020-11-27KVM: arm64: Get rid of the PMU ready stateMarc Zyngier
The PMU ready state has no user left. Goodbye. Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-10-23Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "For x86, there is a new alternative and (in the future) more scalable implementation of extended page tables that does not need a reverse map from guest physical addresses to host physical addresses. For now it is disabled by default because it is still lacking a few of the existing MMU's bells and whistles. However it is a very solid piece of work and it is already available for people to hammer on it. Other updates: ARM: - New page table code for both hypervisor and guest stage-2 - Introduction of a new EL2-private host context - Allow EL2 to have its own private per-CPU variables - Support of PMU event filtering - Complete rework of the Spectre mitigation PPC: - Fix for running nested guests with in-kernel IRQ chip - Fix race condition causing occasional host hard lockup - Minor cleanups and bugfixes x86: - allow trapping unknown MSRs to userspace - allow userspace to force #GP on specific MSRs - INVPCID support on AMD - nested AMD cleanup, on demand allocation of nested SVM state - hide PV MSRs and hypercalls for features not enabled in CPUID - new test for MSR_IA32_TSC writes from host and guest - cleanups: MMU, CPUID, shared MSRs - LAPIC latency optimizations ad bugfixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (232 commits) kvm: x86/mmu: NX largepage recovery for TDP MMU kvm: x86/mmu: Don't clear write flooding count for direct roots kvm: x86/mmu: Support MMIO in the TDP MMU kvm: x86/mmu: Support write protection for nesting in tdp MMU kvm: x86/mmu: Support disabling dirty logging for the tdp MMU kvm: x86/mmu: Support dirty logging for the TDP MMU kvm: x86/mmu: Support changed pte notifier in tdp MMU kvm: x86/mmu: Add access tracking for tdp_mmu kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU kvm: x86/mmu: Allocate struct kvm_mmu_pages for all pages in TDP MMU kvm: x86/mmu: Add TDP MMU PF handler kvm: x86/mmu: Remove disallowed_hugepage_adjust shadow_walk_iterator arg kvm: x86/mmu: Support zapping SPTEs in the TDP MMU KVM: Cache as_id in kvm_memory_slot kvm: x86/mmu: Add functions to handle changed TDP SPTEs kvm: x86/mmu: Allocate and free TDP MMU roots kvm: x86/mmu: Init / Uninit the TDP MMU kvm: x86/mmu: Introduce tdp_iter KVM: mmu: extract spte.h and spte.c KVM: mmu: Separate updating a PTE from kvm_set_pte_rmapp ...
2020-09-29KVM: arm64: Mask out filtered events in PCMEID{0,1}_EL1Marc Zyngier
As we can now hide events from the guest, let's also adjust its view of PCMEID{0,1}_EL1 so that it can figure out why some common events are not counting as they should. The astute user can still look into the TRM for their CPU and find out they've been cheated, though. Nobody's perfect. Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-09-28KVM: arm64: pmu: Make overflow handler NMI safeJulien Thierry
kvm_vcpu_kick() is not NMI safe. When the overflow handler is called from NMI context, defer waking the vcpu to an irq_work queue. A vcpu can be freed while it's not running by kvm_destroy_vm(). Prevent running the irq_work for a non-existent vcpu by calling irq_work_sync() on the PMU destroy path. [Alexandru E.: Added irq_work_sync()] Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox) Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Pouloze <suzuki.poulose@arm.com> Cc: kvm@vger.kernel.org Cc: kvmarm@lists.cs.columbia.edu Link: https://lore.kernel.org/r/20200924110706.254996-6-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-07-07KVM: arm64: timers: Move timer registers to the sys_regs fileMarc Zyngier
Move the timer gsisters to the sysreg file. This will further help when they are directly changed by a nesting hypervisor in the VNCR page. This requires moving the initialisation of the timer struct so that some of the helpers (such as arch_timer_ctx_index) can work correctly at an early stage. Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-07-07KVM: arm64: timers: Rename kvm_timer_sync_hwstate to kvm_timer_sync_userMarc Zyngier
kvm_timer_sync_hwstate() has nothing to do with the timer HW state, but more to do with the state of a userspace interrupt controller. Change the suffix from _hwstate to_user, in keeping with the rest of the code. Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-05-28KVM: arm64: vgic-v3: Take cpu_if pointer directly instead of vcpuChristoffer Dall
If we move the used_lrs field to the version-specific cpu interface structure, the following functions only operate on the struct vgic_v3_cpu_if and not the full vcpu: __vgic_v3_save_state __vgic_v3_restore_state __vgic_v3_activate_traps __vgic_v3_deactivate_traps __vgic_v3_save_aprs __vgic_v3_restore_aprs This is going to be very useful for nested virt, so move the used_lrs field and change the prototypes and implementations of these functions to take the cpu_if parameter directly. No functional change. Reviewed-by: James Morse <james.morse@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-03-24KVM: arm64: GICv4.1: Allow SGIs to switch between HW and SW interruptsMarc Zyngier
In order to let a guest buy in the new, active-less SGIs, we need to be able to switch between the two modes. Handle this by stopping all guest activity, transfer the state from one mode to the other, and resume the guest. Nothing calls this code so far, but a later patch will plug it into the MMIO emulation. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20200304203330.4967-20-maz@kernel.org
2020-03-24irqchip/gic-v4.1: Move doorbell management to the GICv4 abstraction layerMarc Zyngier
In order to hide some of the differences between v4.0 and v4.1, move the doorbell management out of the KVM code, and into the GICv4-specific layer. This allows the calling code to ask for the doorbell when blocking, and otherwise to leave the doorbell permanently disabled. This matches the v4.1 code perfectly, and only results in a minor refactoring of the v4.0 code. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20200304203330.4967-14-maz@kernel.org
2019-11-08Merge remote-tracking branch 'kvmarm/misc-5.5' into kvmarm/nextMarc Zyngier
2019-10-29KVM: arm/arm64: vgic: Fix some comments typoZenghui Yu
Fix various comments, including wrong function names, grammar mistakes and specification references. Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20191029071919.177-3-yuzenghui@huawei.com
2019-10-29KVM: arm/arm64: vgic: Remove the declaration of kvm_send_userspace_msi()Zenghui Yu
The callsite of kvm_send_userspace_msi() is currently arch agnostic. There seems no reason to keep an extra declaration of it in arm_vgic.h (we already have one in include/linux/kvm_host.h). Remove it. Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20191029071919.177-2-yuzenghui@huawei.com
2019-10-28KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/putMarc Zyngier
When the VHE code was reworked, a lot of the vgic stuff was moved around, but the GICv4 residency code did stay untouched, meaning that we come in and out of residency on each flush/sync, which is obviously suboptimal. To address this, let's move things around a bit: - Residency entry (flush) moves to vcpu_load - Residency exit (sync) moves to vcpu_put - On blocking (entry to WFI), we "put" - On unblocking (exit from WFI), we "load" Because these can nest (load/block/put/load/unblock/put, for example), we now have per-VPE tracking of the residency state. Additionally, vgic_v4_put gains a "need doorbell" parameter, which only gets set to true when blocking because of a WFI. This allows a finer control of the doorbell, which now also gets disabled as soon as it gets signaled. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20191027144234.8395-2-maz@kernel.org
2019-10-21KVM: arm/arm64: Factor out hypercall handling from PSCI codeChristoffer Dall
We currently intertwine the KVM PSCI implementation with the general dispatch of hypercall handling, which makes perfect sense because PSCI is the only category of hypercalls we support. However, as we are about to support additional hypercalls, factor out this functionality into a separate hypercall handler file. Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> [steven.price@arm.com: rebased] Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-25KVM: arm/arm64: vgic: Use a single IO device per redistributorEric Auger
At the moment we use 2 IO devices per GICv3 redistributor: one one for the RD_base frame and one for the SGI_base frame. Instead we can use a single IO device per redistributor (the 2 frames are contiguous). This saves slots on the KVM_MMIO_BUS which is currently limited to NR_IOBUS_DEVS (1000). This change allows to instantiate up to 512 redistributors and may speed the guest boot with a large number of VCPUs. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-18KVM: arm/arm64: vgic: Add LPI translation cache definitionMarc Zyngier
Add the basic data structure that expresses an MSI to LPI translation as well as the allocation/release hooks. The size of the cache is arbitrarily defined as 16*nr_vcpus. Tested-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-05KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to blockMarc Zyngier
Since commit commit 328e56647944 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put"), we leave ICH_VMCR_EL2 (or its GICv2 equivalent) loaded as long as we can, only syncing it back when we're scheduled out. There is a small snag with that though: kvm_vgic_vcpu_pending_irq(), which is indirectly called from kvm_vcpu_check_block(), needs to evaluate the guest's view of ICC_PMR_EL1. At the point were we call kvm_vcpu_check_block(), the vcpu is still loaded, and whatever changes to PMR is not visible in memory until we do a vcpu_put(). Things go really south if the guest does the following: mov x0, #0 // or any small value masking interrupts msr ICC_PMR_EL1, x0 [vcpu preempted, then rescheduled, VMCR sampled] mov x0, #ff // allow all interrupts msr ICC_PMR_EL1, x0 wfi // traps to EL2, so samping of VMCR [interrupt arrives just after WFI] Here, the hypervisor's view of PMR is zero, while the guest has enabled its interrupts. kvm_vgic_vcpu_pending_irq() will then say that no interrupts are pending (despite an interrupt being received) and we'll block for no reason. If the guest doesn't have a periodic interrupt firing once it has blocked, it will stay there forever. To avoid this unfortuante situation, let's resync VMCR from kvm_arch_vcpu_blocking(), ensuring that a following kvm_vcpu_check_block() will observe the latest value of PMR. This has been found by booting an arm64 Linux guest with the pseudo NMI feature, and thus using interrupt priorities to mask interrupts instead of the usual PSTATE masking. Cc: stable@vger.kernel.org # 4.12 Fixes: 328e56647944 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put") Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-07-23KVM: arm/arm64: Introduce kvm_pmu_vcpu_init() to setup PMU counter indexZenghui Yu
We use "pmc->idx" and the "chained" bitmap to determine if the pmc is chained, in kvm_pmu_pmc_is_chained(). But idx might be uninitialized (and random) when we doing this decision, through a KVM_ARM_VCPU_INIT ioctl -> kvm_pmu_vcpu_reset(). And the test_bit() against this random idx will potentially hit a KASAN BUG [1]. In general, idx is the static property of a PMU counter that is not expected to be modified across resets, as suggested by Julien. It looks more reasonable if we can setup the PMU counter idx for a vcpu in its creation time. Introduce a new function - kvm_pmu_vcpu_init() for this basic setup. Oh, and the KASAN BUG will get fixed this way. [1] https://www.spinics.net/lists/kvm-arm/msg36700.html Fixes: 80f393a23be6 ("KVM: arm/arm64: Support chained PMU counters") Suggested-by: Andrew Murray <andrew.murray@arm.com> Suggested-by: Julien Thierry <julien.thierry@arm.com> Acked-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-07-05KVM: arm/arm64: Support chained PMU countersAndrew Murray
ARMv8 provides support for chained PMU counters, where an event type of 0x001E is set for odd-numbered counters, the event counter will increment by one for each overflow of the preceding even-numbered counter. Let's emulate this in KVM by creating a 64 bit perf counter when a user chains two emulated counters together. For chained events we only support generating an overflow interrupt on the high counter. We use the attributes of the low counter to determine the attributes of the perf event. Suggested-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Andrew Murray <andrew.murray@arm.com> Reviewed-by: Julien Thierry <julien.thierry@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-07-05KVM: arm/arm64: Remove pmc->bitmaskAndrew Murray
We currently use pmc->bitmask to determine the width of the pmc - however it's superfluous as the pmc index already describes if the pmc is a cycle counter or event counter. The architecture clearly describes the widths of these counters. Let's remove the bitmask to simplify the code. Signed-off-by: Andrew Murray <andrew.murray@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-07-05KVM: arm/arm64: Rename kvm_pmu_{enable/disable}_counter functionsAndrew Murray
The kvm_pmu_{enable/disable}_counter functions can enable/disable multiple counters at once as they operate on a bitmask. Let's make this clearer by renaming the function. Suggested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Andrew Murray <andrew.murray@arm.com> Reviewed-by: Julien Thierry <julien.thierry@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not see http www gnu org licenses extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 503 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Enrico Weigelt <info@metux.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 342Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not see http www gnu org licenses extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 15 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190530000437.237481593@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 333Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 59 temple place suite 330 boston ma 02111 1307 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 136 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190530000436.384967451@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "ARM: - some cleanups - direct physical timer assignment - cache sanitization for 32-bit guests s390: - interrupt cleanup - introduction of the Guest Information Block - preparation for processor subfunctions in cpu models PPC: - bug fixes and improvements, especially related to machine checks and protection keys x86: - many, many cleanups, including removing a bunch of MMU code for unnecessary optimizations - AVIC fixes Generic: - memcg accounting" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (147 commits) kvm: vmx: fix formatting of a comment KVM: doc: Document the life cycle of a VM and its resources MAINTAINERS: Add KVM selftests to existing KVM entry Revert "KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()" KVM: PPC: Book3S: Add count cache flush parameters to kvmppc_get_cpu_char() KVM: PPC: Fix compilation when KVM is not enabled KVM: Minor cleanups for kvm_main.c KVM: s390: add debug logging for cpu model subfunctions KVM: s390: implement subfunction processor calls arm64: KVM: Fix architecturally invalid reset value for FPEXC32_EL2 KVM: arm/arm64: Remove unused timer variable KVM: PPC: Book3S: Improve KVM reference counting KVM: PPC: Book3S HV: Fix build failure without IOMMU support Revert "KVM: Eliminate extra function calls in kvm_get_dirty_log_protect()" x86: kvmguest: use TSC clocksource if invariant TSC is exposed KVM: Never start grow vCPU halt_poll_ns from value below halt_poll_ns_grow_start KVM: Expose the initial start value in grow_halt_poll_ns() as a module parameter KVM: grow_halt_poll_ns() should never shrink vCPU halt_poll_ns KVM: x86/mmu: Consolidate kvm_mmu_zap_all() and kvm_mmu_zap_mmio_sptes() KVM: x86/mmu: WARN if zapping a MMIO spte results in zapping children ...
2019-02-19KVM: arm/arm64: Rework the timer code to use a timer_mapChristoffer Dall
We are currently emulating two timers in two different ways. When we add support for nested virtualization in the future, we are going to be emulating either two timers in two diffferent ways, or four timers in a single way. We need a unified data structure to keep track of how we map virtual state to physical state and we need to cleanup some of the timer code to operate more independently on a struct arch_timer_context instead of trying to consider the global state of the VCPU and recomputing all state. Co-written with Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
2019-02-19KVM: arm/arm64: arch_timer: Assign the phys timer on VHE systemsChristoffer Dall
VHE systems don't have to emulate the physical timer, we can simply assign the EL1 physical timer directly to the VM as the host always uses the EL2 timers. In order to minimize the amount of cruft, AArch32 gets definitions for the physical timer too, but is should be generally unused on this architecture. Co-written with Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
2019-02-19KVM: arm/arm64: timer: Rework data structures for multiple timersChristoffer Dall
Prepare for having 4 timer data structures (2 for now). Move loaded to the cpu data structure and not the individual timer structure, in preparation for assigning the EL1 phys timer as well. Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-02-19KVM: arm/arm64: consolidate arch timer trap handlersAndre Przywara
At the moment we have separate system register emulation handlers for each timer register. Actually they are quite similar, and we rely on kvm_arm_timer_[gs]et_reg() for the actual emulation anyways, so let's just merge all of those handlers into one function, which just marshalls the arguments and then hands off to a set of common accessors. This makes extending the emulation to include EL2 timers much easier. Signed-off-by: Andre Przywara <andre.przywara@arm.com> [Fixed 32-bit VM breakage and reduced to reworking existing code] Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> [Fixed 32bit host, general cleanup] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-02-19KVM: arm/arm64: Simplify bg_timer programmingChristoffer Dall
Instead of calling into kvm_timer_[un]schedule from the main kvm blocking path, test if the VCPU is on the wait queue from the load/put path and perform the background timer setup/cancel in this path. This has the distinct advantage that we no longer race between load/put and schedule/unschedule and programming and canceling of the bg_timer always happens when the timer state is not loaded. Note that we must now remove the checks in kvm_timer_blocking that do not schedule a background timer if one of the timers can fire, because we no longer have a guarantee that kvm_vcpu_check_block() will be called before kvm_timer_blocking. Reported-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-01-24KVM: arm/arm64: vgic: Make vgic_cpu->ap_list_lock a raw_spinlockJulien Thierry
vgic_cpu->ap_list_lock must always be taken with interrupts disabled as it is used in interrupt context. For configurations such as PREEMPT_RT_FULL, this means that it should be a raw_spinlock since RT spinlocks are interruptible. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Acked-by: Christoffer Dall <christoffer.dall@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
2019-01-24KVM: arm/arm64: vgic: Make vgic_dist->lpi_list_lock a raw_spinlockJulien Thierry
vgic_dist->lpi_list_lock must always be taken with interrupts disabled as it is used in interrupt context. For configurations such as PREEMPT_RT_FULL, this means that it should be a raw_spinlock since RT spinlocks are interruptible. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Acked-by: Christoffer Dall <christoffer.dall@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
2019-01-24KVM: arm/arm64: vgic: Make vgic_irq->irq_lock a raw_spinlockJulien Thierry
vgic_irq->irq_lock must always be taken with interrupts disabled as it is used in interrupt context. For configurations such as PREEMPT_RT_FULL, this means that it should be a raw_spinlock since RT spinlocks are interruptible. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Acked-by: Christoffer Dall <christoffer.dall@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
2018-12-19KVM: arm/arm64: Remove arch timer workqueueChristoffer Dall
The use of a work queue in the hrtimer expire function for the bg_timer is a leftover from the time when we would inject interrupts when the bg_timer expired. Since we are no longer doing that, we can instead call kvm_vcpu_wake_up() directly from the hrtimer function and remove all workqueue functionality from the arch timer code. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-08-12KVM: arm/arm64: vgic-v3: Add core support for Group0 SGIsMarc Zyngier
Although vgic-v3 now supports Group0 interrupts, it still doesn't deal with Group0 SGIs. As usually with the GIC, nothing is simple: - ICC_SGI1R can signal SGIs of both groups, since GICD_CTLR.DS==1 with KVM (as per 8.1.10, Non-secure EL1 access) - ICC_SGI0R can only generate Group0 SGIs - ICC_ASGI1R sees its scope refocussed to generate only Group0 SGIs (as per the note at the bottom of Table 8-14) We only support Group1 SGIs so far, so no material change. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-07-21KVM: arm/arm64: vgic: Let userspace opt-in to writable v2 IGROUPRChristoffer Dall
Simply letting IGROUPR be writable from userspace would break migration from old kernels to newer kernels, because old kernels incorrectly report interrupt groups as group 1. This would not be a big problem if userspace wrote GICD_IIDR as read from the kernel, because we could detect the incompatibility and return an error to userspace. Unfortunately, this is not the case with current userspace implementations and simply letting IGROUPR be writable from userspace for an emulated GICv2 silently breaks migration and causes the destination VM to no longer run after migration. We now encourage userspace to write the read and expected value of GICD_IIDR as the first part of a GIC register restore, and if we observe a write to GICD_IIDR we know that userspace has been updated and has had a chance to cope with older kernels (VGICv2 IIDR.Revision == 0) incorrectly reporting interrupts as group 1, and therefore we now allow groups to be user writable. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-07-21KVM: arm/arm64: vgic: Add group field to struct irqChristoffer Dall
In preparation for proper group 0 and group 1 support in the vgic, we add a field in the struct irq to store the group of all interrupts. We initialize the group to group 0 when emulating GICv2 and to group 1 when emulating GICv3, just like we treat them today. LPIs are always group 1. We also continue to ignore writes from the guest, preserving existing functionality, for now. Finally, we also add this field to the vgic debug logic to show the group for all interrupts. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-07-21KVM: arm/arm64: vgic: Keep track of implementation revisionChristoffer Dall
As we are about to tweak implementation aspects of the VGIC emulation, while still preserving some level of backwards compatibility support, add a field to keep track of the implementation revision field which is reported to the VM and to userspace. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-05-25KVM: arm/arm64: Bump VGIC_V3_MAX_CPUS to 512Eric Auger
Let's raise the number of supported vcpus along with vgic v3 now that HW is looming with more physical CPUs. Signed-off-by: Eric Auger <eric.auger@redhat.com> Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-05-25KVM: arm/arm64: Remove kvm_vgic_vcpu_early_initEric Auger
kvm_vgic_vcpu_early_init gets called after kvm_vgic_cpu_init which is confusing. The call path is as follows: kvm_vm_ioctl_create_vcpu |_ kvm_arch_cpu_create |_ kvm_vcpu_init |_ kvm_arch_vcpu_init |_ kvm_vgic_vcpu_init |_ kvm_arch_vcpu_postcreate |_ kvm_vgic_vcpu_early_init Static initialization currently done in kvm_vgic_vcpu_early_init() can be moved to kvm_vgic_vcpu_init(). So let's move the code and remove kvm_vgic_vcpu_early_init(). kvm_arch_vcpu_postcreate() does nothing. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>