summaryrefslogtreecommitdiff
path: root/include/linux/kvm_host.h
AgeCommit message (Collapse)Author
2009-08-05KVM: fix ack not being delivered when msi presentMichael S. Tsirkin
kvm_notify_acked_irq does not check irq type, so that it sometimes interprets msi vector as irq. As a result, ack notifiers are not called, which typially hangs the guest. The fix is to track and check irq type. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-28KVM: protect concurrent make_all_cpus_requestMarcelo Tosatti
make_all_cpus_request contains a race condition which can trigger false request completed status, as follows: CPU0 CPU1 if (test_and_set_bit(req,&vcpu->requests)) .... if (test_and_set_bit(req,&vcpu->requests)) .. return proceed to smp_call_function_many(wait=1) Use a spinlock to serialize concurrent CPUs. Cc: stable@kernel.org Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10KVM: protect assigned dev workqueue, int handler and irq ackerMarcelo Tosatti
kvm_assigned_dev_ack_irq is vulnerable to a race condition with the interrupt handler function. It does: if (dev->host_irq_disabled) { enable_irq(dev->host_irq); dev->host_irq_disabled = false; } If an interrupt triggers before the host->dev_irq_disabled assignment, it will disable the interrupt and set dev->host_irq_disabled to true. On return to kvm_assigned_dev_ack_irq, dev->host_irq_disabled is set to false, and the next kvm_assigned_dev_ack_irq call will fail to reenable it. Other than that, having the interrupt handler and work handlers run in parallel sounds like asking for trouble (could not spot any obvious problem, but better not have to, its fragile). CC: sheng.yang@intel.com Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10KVM: use smp_send_reschedule in kvm_vcpu_kickMarcelo Tosatti
KVM uses a function call IPI to cause the exit of a guest running on a physical cpu. For virtual interrupt notification there is no need to wait on IPI receival, or to execute any function. This is exactly what the reschedule IPI does, without the overhead of function IPI. So use it instead of smp_call_function_single in kvm_vcpu_kick. Also change the "guest_mode" variable to a bit in vcpu->requests, and use that to collapse multiple IPI's that would be issued between the first one and zeroing of guest mode. This allows kvm_vcpu_kick to called with interrupts disabled. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10KVM: Enable snooping control for supported hardwareSheng Yang
Memory aliases with different memory type is a problem for guest. For the guest without assigned device, the memory type of guest memory would always been the same as host(WB); but for the assigned device, some part of memory may be used as DMA and then set to uncacheable memory type(UC/WC), which would be a conflict of host memory type then be a potential issue. Snooping control can guarantee the cache correctness of memory go through the DMA engine of VT-d. [avi: fix build on ia64] Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10KVM: Fix interrupt unhalting a vcpu when it shouldn'tGleb Natapov
kvm_vcpu_block() unhalts vpu on an interrupt/timer without checking if interrupt window is actually opened. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10KVM: Device assignment framework reworkSheng Yang
After discussion with Marcelo, we decided to rework device assignment framework together. The old problems are kernel logic is unnecessary complex. So Marcelo suggest to split it into a more elegant way: 1. Split host IRQ assign and guest IRQ assign. And userspace determine the combination. Also discard msi2intx parameter, userspace can specific KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_INTX in assigned_irq->flags to enable MSI to INTx convertion. 2. Split assign IRQ and deassign IRQ. Import two new ioctls: KVM_ASSIGN_DEV_IRQ and KVM_DEASSIGN_DEV_IRQ. This patch also fixed the reversed _IOR vs _IOW in definition(by deprecated the old interface). [avi: replace homemade bitcount() by hweight_long()] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10KVM: consolidate ioapic/ipi interrupt delivery logicGleb Natapov
Use kvm_apic_match_dest() in kvm_get_intr_delivery_bitmask() instead of duplicating the same code. Use kvm_get_intr_delivery_bitmask() in apic_send_ipi() to figure out ipi destination instead of reimplementing the logic. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-06-10KVM: ioapic/msi interrupt delivery consolidationGleb Natapov
ioapic_deliver() and kvm_set_msi() have code duplication. Move the code into ioapic_deliver_entry() function and call it from both places. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-06-10KVM: declare ioapic functions only on affected hardwareChristian Borntraeger
Since "KVM: Unify the delivery of IOAPIC and MSI interrupts" I get the following warnings: CC [M] arch/s390/kvm/kvm-s390.o In file included from arch/s390/kvm/kvm-s390.c:22: include/linux/kvm_host.h:357: warning: 'struct kvm_ioapic' declared inside parameter list include/linux/kvm_host.h:357: warning: its scope is only this definition or declaration, which is probably not what you want This patch limits IOAPIC functions for architectures that have one. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10KVM: Add MSI-X interrupt injection logicSheng Yang
We have to handle more than one interrupt with one handler for MSI-X. Avi suggested to use a flag to indicate the pending. So here is it. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10KVM: Ioctls for init MSI-X entrySheng Yang
Introduce KVM_SET_MSIX_NR and KVM_SET_MSIX_ENTRY two ioctls. This two ioctls are used by userspace to specific guest device MSI-X entry number and correlate MSI-X entry with GSI during the initialization stage. MSI-X should be well initialzed before enabling. Don't support change MSI-X entry number for now. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10KVM: Unify the delivery of IOAPIC and MSI interruptsSheng Yang
Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24KVM: Report IRQ injection status to userspace.Gleb Natapov
IRQ injection status is either -1 (if there was no CPU found that should except the interrupt because IRQ was masked or ioapic was misconfigured or ...) or >= 0 in that case the number indicates to how many CPUs interrupt was injected. If the value is 0 it means that the interrupt was coalesced and probably should be reinjected. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24KVM: Fix kvmclock on !constant_tsc boxesGerd Hoffmann
kvmclock currently falls apart on machines without constant tsc. This patch fixes it. Changes: * keep tsc frequency in a per-cpu variable. * handle kvmclock update using a new request flag, thus checking whenever we need an update each time we enter guest context. * use a cpufreq notifier to track frequency changes and force kvmclock updates. * send ipis to kick cpu out of guest context if needed to make sure the guest doesn't see stale values. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24KVM: Use irq routing API for MSISheng Yang
Merge MSI userspace interface with IRQ routing table. Notice the API have been changed, and using IRQ routing table would be the only interface kvm-userspace supported. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24KVM: make irq ack notifications aware of routing tableMarcelo Tosatti
IRQ ack notifications assume an identity mapping between pin->gsi, which might not be the case with, for example, HPET. Translate before acking. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Gleb Natapov <gleb@redhat.com>
2009-03-24KVM: Userspace controlled irq routingAvi Kivity
Currently KVM has a static routing from GSI numbers to interrupts (namely, 0-15 are mapped 1:1 to both PIC and IOAPIC, and 16:23 are mapped 1:1 to the IOAPIC). This is insufficient for several reasons: - HPET requires non 1:1 mapping for the timer interrupt - MSIs need a new method to assign interrupt numbers and dispatch them - ACPI APIC mode needs to be able to reassign the PCI LINK interrupts to the ioapics This patch implements an interrupt routing table (as a linked list, but this can be easily changed) and a userspace interface to replace the table. The routing table is initialized according to the current hardwired mapping. Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24KVM: Interrupt mask notifiers for ioapicAvi Kivity
Allow clients to request notifications when the guest masks or unmasks a particular irq line. This complements irq ack notifications, as the guest will not ack an irq line that is masked. Currently implemented for the ioapic only. Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24KVM: Remove duplicated prototype of kvm_arch_destroy_vmSheng Yang
Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24KVM: New guest debug interfaceJan Kiszka
This rips out the support for KVM_DEBUG_GUEST and introduces a new IOCTL instead: KVM_SET_GUEST_DEBUG. The IOCTL payload consists of a generic part, controlling the "main switch" and the single-step feature. The arch specific part adds an x86 interface for intercepting both types of debug exceptions separately and re-injecting them when the host was not interested. Moveover, the foundation for guest debugging via debug registers is layed. To signal breakpoint events properly back to userland, an arch-specific data block is now returned along KVM_EXIT_DEBUG. For x86, the arch block contains the PC, the debug exception, and relevant debug registers to tell debug events properly apart. The availability of this new interface is signaled by KVM_CAP_SET_GUEST_DEBUG. Empty stubs for not yet supported archs are provided. Note that both SVM and VTX are supported, but only the latter was tested yet. Based on the experience with all those VTX corner case, I would be fairly surprised if SVM will work out of the box. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-02-15KVM: Add kvm_arch_sync_events to sync with asynchronize eventsSheng Yang
kvm_arch_sync_events is introduced to quiet down all other events may happen contemporary with VM destroy process, like IRQ handler and work struct for assigned device. For kvm_arch_sync_events is called at the very beginning of kvm_destroy_vm(), so the state of KVM here is legal and can provide a environment to quiet down other events. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-01-03KVM: change KVM to use IOMMU APIJoerg Roedel
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-01-03Deassign device in kvm_free_assgined_deviceWeidong Han
In kvm_iommu_unmap_memslots(), assigned_dev_head is already empty. Signed-off-by: Weidong Han <weidong.han@intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-01-03KVM: support device deassignmentWeidong Han
Support device deassignment, it can be used in device hotplug. Signed-off-by: Weidong Han <weidong.han@intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-01-03KVM: use the new intel iommu APIsWeidong Han
intel iommu APIs are updated, use the new APIs. In addition, change kvm_iommu_map_guest() to just create the domain, let kvm_iommu_assign_device() assign device. Signed-off-by: Weidong Han <weidong.han@intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2008-12-31KVM: fix handling of ACK from shared guest IRQMark McLoughlin
If an assigned device shares a guest irq with an emulated device then we currently interpret an ack generated by the emulated device as originating from the assigned device leading to e.g. "Unbalanced enable for IRQ 4347" from the enable_irq() in kvm_assigned_dev_ack_irq(). The fix is fairly simple - don't enable the physical device irq unless it was previously disabled. Of course, this can still lead to a situation where a non-assigned device ACK can cause the physical device irq to be reenabled before the device was serviced. However, being level sensitive, the interrupt will merely be regenerated. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: Add fields for MSI device assignmentSheng Yang
Prepared for kvm_arch_assigned_device_msi_dispatch(). Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: Replace irq_requested with more generic irq_requested_typeSheng Yang
Separate guest irq type and host irq type, for we can support guest using INTx with host using MSI (but not opposite combination). Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: IRQ ACK notifier should be used with in-kernel irqchipSheng Yang
Also remove unnecessary parameter of unregister irq ack notifier. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-28KVM: Fix guest shared interrupt with in-kernel irqchipSheng Yang
Every call of kvm_set_irq() should offer an irq_source_id, which is allocated by kvm_request_irq_source_id(). Based on irq_source_id, we identify the irq source and implement logical OR for shared level interrupts. The allocated irq_source_id can be freed by kvm_free_irq_source_id(). Currently, we support at most sizeof(unsigned long) different irq sources. [Amit: - rebase to kvm.git HEAD - move definition of KVM_USERSPACE_IRQ_SOURCE_ID to common file - move kvm_request_irq_source_id to the update_irq ioctl] [Xiantao: - Add kvm/ia64 stuff and make it work for kvm/ia64 guests] Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15KVM: Separate irq ack notification out of arch/x86/kvm/irq.cXiantao Zhang
Moving irq ack notification logic as common, and make it shared with ia64 side. Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15KVM: Change is_mmio_pfn to kvm_is_mmio_pfn, and make it common for all archsXiantao Zhang
Add a kvm_ prefix to avoid polluting kernel's name space. Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15KVM: Move device assignment logic to common codeXiantao Zhang
To share with other archs, this patch moves device assignment logic to common parts. Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15KVM: MMU: out of sync shadow coreMarcelo Tosatti
Allow guest pagetables to go out of sync. Instead of emulating write accesses to guest pagetables, or unshadowing them, we un-write-protect the page table and allow the guest to modify it at will. We rely on invlpg executions to synchronize individual ptes, and will synchronize the entire pagetable on tlb flushes. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15KVM: Device Assignment with VT-dBen-Ami Yassour
Based on a patch by: Kay, Allen M <allen.m.kay@intel.com> This patch enables PCI device assignment based on VT-d support. When a device is assigned to the guest, the guest memory is pinned and the mapping is updated in the VT-d IOMMU. [Amit: Expose KVM_CAP_IOMMU so we can check if an IOMMU is present and also control enable/disable from userspace] Signed-off-by: Kay, Allen M <allen.m.kay@intel.com> Signed-off-by: Weidong Han <weidong.han@intel.com> Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com> Signed-off-by: Amit Shah <amit.shah@qumranet.com> Acked-by: Mark Gross <mgross@linux.intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15KVM: x86: do not execute halted vcpusMarcelo Tosatti
Offline or uninitialized vcpu's can be executed if requested to perform userspace work. Follow Avi's suggestion to handle halted vcpu's in the main loop, simplifying kvm_emulate_halt(). Introduce a new vcpu->requests bit to indicate events that promote state from halted to running. Also standardize vcpu wake sites. Signed-off-by: Marcelo Tosatti <mtosatti <at> redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15KVM: Move KVM TRACE DEFINITIONS to common headerHollis Blanchard
Move KVM trace definitions from x86 specific kvm headers to common kvm headers to create a cross-architecture numbering scheme for trace events. This means the kvmtrace_format userspace tool won't need to know which architecture produced the log file being processed. Signed-off-by: Jerone Young <jyoung5@us.ibm.com> Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-29KVM: Synchronize guest physical memory map to host virtual memory mapAndrea Arcangeli
Synchronize changes to host virtual addresses which are part of a KVM memory slot to the KVM shadow mmu. This allows pte operations like swapping, page migration, and madvise() to transparently work with KVM. Signed-off-by: Andrea Arcangeli <andrea@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20KVM: MMU: nuke shadowed pgtable pages and ptes on memslot destructionMarcelo Tosatti
Flush the shadow mmu before removing regions to avoid stale entries. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20KVM: Add coalesced MMIO support (common part)Laurent Vivier
This patch adds all needed structures to coalesce MMIOs. Until an architecture uses it, it is not compiled. Coalesced MMIO introduces two ioctl() to define where are the MMIO zones that can be coalesced: - KVM_REGISTER_COALESCED_MMIO registers a coalesced MMIO zone. It requests one parameter (struct kvm_coalesced_mmio_zone) which defines a memory area where MMIOs can be coalesced until the next switch to user space. The maximum number of MMIO zones is KVM_COALESCED_MMIO_ZONE_MAX. - KVM_UNREGISTER_COALESCED_MMIO cancels all registered zones inside the given bounds (bounds are also given by struct kvm_coalesced_mmio_zone). The userspace client can check kernel coalesced MMIO availability by asking ioctl(KVM_CHECK_EXTENSION) for the KVM_CAP_COALESCED_MMIO capability. The ioctl() call to KVM_CAP_COALESCED_MMIO will return 0 if not supported, or the page offset where will be stored the ring buffer. The page offset depends on the architecture. After an ioctl(KVM_RUN), the first page of the KVM memory mapped points to a kvm_run structure. The offset given by KVM_CAP_COALESCED_MMIO is an offset to the coalesced MMIO ring expressed in PAGE_SIZE relatively to the address of the start of th kvm_run structure. The MMIO ring buffer is defined by the structure kvm_coalesced_mmio_ring. [akio: fix oops during guest shutdown] Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Akio Takebe <takebe_akio@jp.fujitsu.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20KVM: kvm_io_device: extend in_range() to manage len and write attributeLaurent Vivier
Modify member in_range() of structure kvm_io_device to pass length and the type of the I/O (write or read). This modification allows to use kvm_io_device with coalesced MMIO. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20KVM: Remove decache_vcpus_on_cpu() and related callbacksAvi Kivity
Obsoleted by the vmx-specific per-cpu list. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: close timer injection race window in __vcpu_runMarcelo Tosatti
If a timer fires after kvm_inject_pending_timer_irqs() but before local_irq_disable() the code will enter guest mode and only inject such timer interrupt the next time an unrelated event causes an exit. It would be simpler if the timer->pending irq conversion could be done with IRQ's disabled, so that the above problem cannot happen. For now introduce a new vcpu requests bit to cancel guest entry. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-06KVM: migrate PIT timerMarcelo Tosatti
Migrate the PIT timer to the physical CPU which vcpu0 is scheduled on, similarly to what is done for the LAPIC timers, otherwise PIT interrupts will be delayed until an unrelated event causes an exit. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: kill file->f_count abuse in kvmAl Viro
Use kvm own refcounting instead of playing with ->filp->f_count. That will allow to get rid of a lot of crap in anon_inode_getfd() and kill a race in kvm_dev_ioctl_create_vm() (file might have been closed immediately by another thread, so ->filp might point to already freed struct file when we get around to setting it). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: Rename debugfs_dir to kvm_debugfs_dirHollis Blanchard
It's a globally exported symbol now. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: add ioctls to save/store mpstateMarcelo Tosatti
So userspace can save/restore the mpstate during migration. [avi: export the #define constants describing the value] [christian: add s390 stubs] [avi: ditto for ia64] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: hlt emulation should take in-kernel APIC/PIT timers into accountMarcelo Tosatti
Timers that fire between guest hlt and vcpu_block's add_wait_queue() are ignored, possibly resulting in hangs. Also make sure that atomic_inc and waitqueue_active tests happen in the specified order, otherwise the following race is open: CPU0 CPU1 if (waitqueue_active(wq)) add_wait_queue() if (!atomic_read(pit_timer->pending)) schedule() atomic_inc(pit_timer->pending) Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: Add kvm trace userspace interfaceFeng(Eric) Liu
This interface allows user a space application to read the trace of kvm related events through relayfs. Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>