summaryrefslogtreecommitdiff
path: root/arch/x86/xen
AgeCommit message (Collapse)Author
2014-03-19Merge remote-tracking branch 'xen-tip/linux-next'Stephen Rothwell
2014-03-14Merge branch 'x86/vdso'Ingo Molnar
2014-03-14Merge branch 'x86/nuke-platforms'Ingo Molnar
2014-03-13x86, vdso, xen: Remove stray reference to FIX_VDSOH. Peter Anvin
Checkin b0b49f2673f0 x86, vdso: Remove compat vdso support ... removed the VDSO from the fixmap, and thus FIX_VDSO; remove a stray reference in Xen. Found by Fengguang Wu's test robot. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Link: http://lkml.kernel.org/r/4bb4690899106eb11430b1186d5cc66ca9d1660c.1394751608.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-10xen/grant-table: Refactor gnttab_[un]map_refs to avoid m2p_overrideZoltan Kiss
The grant mapping API does m2p_override unnecessarily: only gntdev needs it, for blkback and future netback patches it just cause a lock contention, as those pages never go to userspace. Therefore this series does the following: - the bulk of the original function (everything after the mapping hypercall) is moved to arch-dependent set/clear_foreign_p2m_mapping - the "if (xen_feature(XENFEAT_auto_translated_physmap))" branch goes to ARM - therefore the ARM function could be much smaller, the m2p_override stubs could be also removed - on x86 the set_phys_to_machine calls were moved up to this new funcion from m2p_override functions - and m2p_override functions are only called when there is a kmap_ops param It also removes a stray space from arch/x86/include/asm/xen/page.h. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Suggested-by: Anthony Liguori <aliguori@amazon.com> Suggested-by: David Vrabel <david.vrabel@citrix.com> Suggested-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-03-03xen: remove XEN_PRIVILEGED_GUESTMichael Opdenacker
This patch removes the Kconfig symbol XEN_PRIVILEGED_GUEST which is used nowhere in the tree. We do know grub2 has a script that greps kernel configuration files for its macro. It shouldn't do that. As Linus summarized: This is a grub bug. It really is that simple. Treat it as one. Besides, grub2's grepping for that macro is actually superfluous. See, that script currently contains this test (simplified): grep -x CONFIG_XEN_DOM0=y $config || grep -x CONFIG_XEN_PRIVILEGED_GUEST=y $config But since XEN_DOM0 and XEN_PRIVILEGED_GUEST are by definition equal, removing XEN_PRIVILEGED_GUEST cannot influence this test. So there's no reason to not remove this symbol, like we do with all unused Kconfig symbols. [pebolle@tiscali.nl: rewrote commit explanation.] Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com> Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-02-28x86: remove the Xen-specific _PAGE_IOMAP PTE flagDavid Vrabel
The _PAGE_IO_MAP PTE flag was only used by Xen PV guests to mark PTEs that were used to map I/O regions that are 1:1 in the p2m. This allowed Xen to obtain the correct PFN when converting the MFNs read from a PTE back to their PFN. Xen guests no longer use _PAGE_IOMAP for this. Instead mfn_to_pfn() returns the correct PFN by using a combination of the m2p and p2m to determine if an MFN corresponds to a 1:1 mapping in the the p2m. Remove _PAGE_IOMAP, replacing it with _PAGE_UNUSED2 to allow for future uses of the PTE flag. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org -- This depends on the preceeding Xen changes, so I think it would be best if this was acked by an x86 maintainer and merged via the Xen tree. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-02-28x86/xen: do not use _PAGE_IOMAP PTE flag for I/O mappingsDavid Vrabel
Since mfn_to_pfn() returns the correct PFN for identity mappings (as used for MMIO regions), the use of _PAGE_IOMAP is not required in pte_mfn_to_pfn(). Do not set the _PAGE_IOMAP flag in pte_pfn_to_mfn() and do not use it in pte_mfn_to_pfn(). This will allow _PAGE_IOMAP to be removed, making it available for future use. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-02-28x86/xen: do not use _PAGE_IOMAP in xen_remap_domain_mfn_range()David Vrabel
_PAGE_IOMAP is used in xen_remap_domain_mfn_range() to prevent the pfn_pte() call in remap_area_mfn_pte_fn() from using the p2m to translate the MFN. If mfn_pte() is used instead, the p2m look up is avoided and the use of _PAGE_IOMAP is no longer needed. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-02-28x86/xen: set regions above the end of RAM as 1:1David Vrabel
PCI devices may have BARs located above the end of RAM so mark such frames as identity frames in the p2m (instead of the default of missing). PFNs outside the p2m (above MAX_P2M_PFN) are also considered to be identity frames for the same reason. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-02-28x86/xen: only warn once if bad MFNs are found during setupDavid Vrabel
In xen_add_extra_mem(), if the WARN() checks for bad MFNs trigger it is likely that they will trigger at lot, spamming the log. Use WARN_ONCE() instead. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-02-28x86/xen: compactly store large identity ranges in the p2mDavid Vrabel
Large (multi-GB) identity ranges currently require a unique middle page (filled with p2m_identity entries) per 1 GB region. Similar to the common p2m_mid_missing middle page for large missing regions, introduce a p2m_mid_identity page (filled with p2m_identity entries) which can be used instead. set_phys_range_identity() thus only needs to allocate new middle pages at the beginning and end of the range. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-02-28x86/xen: fix set_phys_range_identity() if pfn_e > MAX_P2M_PFNDavid Vrabel
Allow set_phys_range_identity() to work with a range that overlaps MAX_P2M_PFN by clamping pfn_e to MAX_P2M_PFN. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-02-28x86/xen: rename early_p2m_alloc() and early_p2m_alloc_middle()David Vrabel
early_p2m_alloc_middle() allocates a new leaf page and early_p2m_alloc() allocates a new middle page. This is confusing. Swap the names so they match what the functions actually do. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-02-27x86, platforms: Remove SGI Visual WorkstationH. Peter Anvin
The SGI Visual Workstation seems to be dead; remove support so we don't have to continue maintaining it. Cc: Andrey Panin <pazke@donpac.ru> Cc: Michael Reed <mdr@sgi.com> Link: http://lkml.kernel.org/r/530CFD6C.7040705@zytor.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-10xen: properly account for _PAGE_NUMA during xen pte translationsMel Gorman
Steven Noonan forwarded a users report where they had a problem starting vsftpd on a Xen paravirtualized guest, with this in dmesg: BUG: Bad page map in process vsftpd pte:8000000493b88165 pmd:e9cc01067 page:ffffea00124ee200 count:0 mapcount:-1 mapping: (null) index:0x0 page flags: 0x2ffc0000000014(referenced|dirty) addr:00007f97eea74000 vm_flags:00100071 anon_vma:ffff880e98f80380 mapping: (null) index:7f97eea74 CPU: 4 PID: 587 Comm: vsftpd Not tainted 3.12.7-1-ec2 #1 Call Trace: dump_stack+0x45/0x56 print_bad_pte+0x22e/0x250 unmap_single_vma+0x583/0x890 unmap_vmas+0x65/0x90 exit_mmap+0xc5/0x170 mmput+0x65/0x100 do_exit+0x393/0x9e0 do_group_exit+0xcc/0x140 SyS_exit_group+0x14/0x20 system_call_fastpath+0x1a/0x1f Disabling lock debugging due to kernel taint BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:0 val:-1 BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:1 val:1 The issue could not be reproduced under an HVM instance with the same kernel, so it appears to be exclusive to paravirtual Xen guests. He bisected the problem to commit 1667918b6483 ("mm: numa: clear numa hinting information on mprotect") that was also included in 3.12-stable. The problem was related to how xen translates ptes because it was not accounting for the _PAGE_NUMA bit. This patch splits pte_present to add a pteval_present helper for use by xen so both bare metal and xen use the same code when checking if a PTE is present. [mgorman@suse.de: wrote changelog, proposed minor modifications] [akpm@linux-foundation.org: fix typo in comment] Reported-by: Steven Noonan <steven@uplinklabs.net> Tested-by: Steven Noonan <steven@uplinklabs.net> Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: <stable@vger.kernel.org> [3.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-05Merge tag 'stable/for-linus-3.14-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen fixes from Konrad Rzeszutek Wilk: "Bug-fixes: - Revert "xen/grant-table: Avoid m2p_override during mapping" as it broke Xen ARM build. - Fix CR4 not being set on AP processors in Xen PVH mode" * tag 'stable/for-linus-3.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/pvh: set CR4 flags for APs Revert "xen/grant-table: Avoid m2p_override during mapping"
2014-02-03xen/pvh: set CR4 flags for APsMukesh Rathor
During bootup in the 'probe_page_size_mask' these CR4 flags are set in there. But for AP processors they are not set as we do not use 'secondary_startup_64' which the baremetal kernels uses. Instead do it in this function which we use in Xen PVH during our startup for AP processors. As such fix it up to make sure we have that flag set. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-02-03Revert "xen/grant-table: Avoid m2p_override during mapping"Konrad Rzeszutek Wilk
This reverts commit 08ece5bb2312b4510b161a6ef6682f37f4eac8a1. As it breaks ARM builds and needs more attention on the ARM side. Acked-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-01-31Merge tag 'stable/for-linus-3.14-rc0-late-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen bugfixes from Konrad Rzeszutek Wilk: "Bug-fixes for the new features that were added during this cycle. There are also two fixes for long-standing issues for which we have a solution: grant-table operations extra work that was not needed causing performance issues and the self balloon code was too aggressive causing OOMs. Details: - Xen ARM couldn't use the new FIFO events - Xen ARM couldn't use the SWIOTLB if compiled as 32-bit with 64-bit PCIe devices. - Grant table were doing needless M2P operations. - Ratchet down the self-balloon code so it won't OOM. - Fix misplaced kfree in Xen PVH error code paths" * tag 'stable/for-linus-3.14-rc0-late-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/pvh: Fix misplaced kfree from xlated_setup_gnttab_pages drivers: xen: deaggressive selfballoon driver xen/grant-table: Avoid m2p_override during mapping xen/gnttab: Use phys_addr_t to describe the grant frame base address xen: swiotlb: handle sizeof(dma_addr_t) != sizeof(phys_addr_t) arm/xen: Initialize event channels earlier
2014-01-31xen/pvh: Fix misplaced kfree from xlated_setup_gnttab_pagesDave Jones
Passing a freed 'pages' to free_xenballooned_pages will end badly on kernels with slub debug enabled. This looks out of place between the rc assign and the check, but we do want to kfree pages regardless of which path we take. Signed-off-by: Dave Jones <davej@fedoraproject.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-01-31xen/grant-table: Avoid m2p_override during mappingZoltan Kiss
The grant mapping API does m2p_override unnecessarily: only gntdev needs it, for blkback and future netback patches it just cause a lock contention, as those pages never go to userspace. Therefore this series does the following: - the original functions were renamed to __gnttab_[un]map_refs, with a new parameter m2p_override - based on m2p_override either they follow the original behaviour, or just set the private flag and call set_phys_to_machine - gnttab_[un]map_refs are now a wrapper to call __gnttab_[un]map_refs with m2p_override false - a new function gnttab_[un]map_refs_userspace provides the old behaviour It also removes a stray space from page.h and change ret to 0 if XENFEAT_auto_translated_physmap, as that is the only possible return value there. v2: - move the storing of the old mfn in page->index to gnttab_map_refs - move the function header update to a separate patch v3: - a new approach to retain old behaviour where it needed - squash the patches into one v4: - move out the common bits from m2p* functions, and pass pfn/mfn as parameter - clear page->private before doing anything with the page, so m2p_find_override won't race with this v5: - change return value handling in __gnttab_[un]map_refs - remove a stray space in page.h - add detail why ret = 0 now at some places v6: - don't pass pfn to m2p* functions, just get it locally Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Suggested-by: David Vrabel <david.vrabel@citrix.com> Acked-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-01-30Merge branch 'x86-asmlinkage-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asmlinkage (LTO) changes from Peter Anvin: "This patchset adds more infrastructure for link time optimization (LTO). This patchset was pulled into my tree late because of a miscommunication (part of the patchset was picked up by other maintainers). However, the patchset is strictly build-related and seems to be okay in testing" * 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, asmlinkage, xen: Fix type of NMI x86, asmlinkage, xen, kvm: Make {xen,kvm}_lock_spinning global and visible x86: Use inline assembler instead of global register variable to get sp x86, asmlinkage, paravirt: Make paravirt thunks global x86, asmlinkage, paravirt: Don't rely on local assembler labels x86, asmlinkage, lguest: Fix C functions used by inline assembler
2014-01-29x86, asmlinkage, xen: Fix type of NMIAndi Kleen
LTO requires consistent types of symbols over all files. So "nmi" cannot be declared as a char [] here, need to use the correct function type. Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1382458079-24450-8-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-01-29x86, asmlinkage, xen, kvm: Make {xen,kvm}_lock_spinning global and visibleAndi Kleen
These functions are called from inline assembler stubs, thus need to be global and visible. Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1382458079-24450-7-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-01-29x86, asmlinkage, paravirt: Make paravirt thunks globalAndi Kleen
The paravirt thunks use a hack of using a static reference to a static function to reference that function from the top level statement. This assumes that gcc always generates static function names in a specific format, which is not necessarily true. Simply make these functions global and asmlinkage or __visible. This way the static __used variables are not needed and everything works. Functions with arguments are __visible to keep the register calling convention on 32bit. Changed in paravirt and in all users (Xen and vsmp) v2: Use __visible for functions with arguments Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Ido Yariv <ido@wizery.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1382458079-24450-5-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-01-21xen/pvh: Set X86_CR0_WP and others in CR0 (v2)Roger Pau Monne
otherwise we will get for some user-space applications that use 'clone' with CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID end up hitting an assert in glibc manifested by: general protection ip:7f80720d364c sp:7fff98fd8a80 error:0 in libc-2.13.so[7f807209e000+180000] This is due to the nature of said operations which sets and clears the PID. "In the successful one I can see that the page table of the parent process has been updated successfully to use a different physical page, so the write of the tid on that page only affects the child... On the other hand, in the failed case, the write seems to happen before the copy of the original page is done, so both the parent and the child end up with the same value (because the parent copies the page after the write of the child tid has already happened)." (Roger's analysis). The nature of this is due to the Xen's commit of 51e2cac257ec8b4080d89f0855c498cbbd76a5e5 "x86/pvh: set only minimal cr0 and cr4 flags in order to use paging" the CR0_WP was removed so COW features of the Linux kernel were not operating properly. While doing that also update the rest of the CR0 flags to be inline with what a baremetal Linux kernel would set them to. In 'secondary_startup_64' (baremetal Linux) sets: X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | X86_CR0_NE | X86_CR0_WP | X86_CR0_AM | X86_CR0_PG The hypervisor for HVM type guests (which PVH is a bit) sets: X86_CR0_PE | X86_CR0_ET | X86_CR0_TS For PVH it specifically sets: X86_CR0_PG Which means we need to set the rest: X86_CR0_MP | X86_CR0_NE | X86_CR0_WP | X86_CR0_AM to have full parity. Signed-off-by: Roger Pau Monne <roger.pau@citrix.com> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [v1: Took out the cr4 writes to be a seperate patch] [v2: 0-DAY kernel found xen_setup_gdt to be missing a static]
2014-01-10xen/pvh: Use 'depend' instead of 'select'.Konrad Rzeszutek Wilk
The usage of 'select' means it will enable the CONFIG options without checking their dependencies. That meant we would inadvertently turn on CONFIG_XEN_PVHM while its core dependency (CONFIG_PCI) was turned off. This patch fixes the warnings and compile failures: warning: (XEN_PVH) selects XEN_PVHVM which has unmet direct dependencies (HYPERVISOR_GUEST && XEN && PCI && X86_LOCAL_APIC) Reported-by: Jim Davis <jim.epost@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-01-07xen/pvh: remove duplicated include from enlighten.cWei Yongjun
Remove duplicated include. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-01-07xen/pvh: Fix compile issues with xen_pvh_domain()Konrad Rzeszutek Wilk
Oddly enough it compiles for my ancient compiler but with the supplied .config it does blow up. Fix is easy enough. Reported-by: kbuild test robot <fengguang.wu@intel.com> Reported-by: Jim Davis <jim.epost@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-01-06xen/pvh: Support ParaVirtualized Hardware extensions (v3).Mukesh Rathor
PVH allows PV linux guest to utilize hardware extended capabilities, such as running MMU updates in a HVM container. The Xen side defines PVH as (from docs/misc/pvh-readme.txt, with modifications): "* the guest uses auto translate: - p2m is managed by Xen - pagetables are owned by the guest - mmu_update hypercall not available * it uses event callback and not vlapic emulation, * IDT is native, so set_trap_table hcall is also N/A for a PVH guest. For a full list of hcalls supported for PVH, see pvh_hypercall64_table in arch/x86/hvm/hvm.c in xen. From the ABI prespective, it's mostly a PV guest with auto translate, although it does use hvm_op for setting callback vector." Use .ascii and .asciz to define xen feature string. Note, the PVH string must be in a single line (not multiple lines with \) to keep the assembler from putting null char after each string before \. This patch allows it to be configured and enabled. We also use introduce the 'XEN_ELFNOTE_SUPPORTED_FEATURES' ELF note to tell the hypervisor that 'hvm_callback_vector' is what the kernel needs. We can not put it in 'XEN_ELFNOTE_FEATURES' as older hypervisor parse fields they don't understand as errors and refuse to load the kernel. This work-around fixes the problem. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2014-01-06xen/pvh: Piggyback on PVHVM for grant driver (v4)Konrad Rzeszutek Wilk
In PVH the shared grant frame is the PFN and not MFN, hence its mapped via the same code path as HVM. The allocation of the grant frame is done differently - we do not use the early platform-pci driver and have an ioremap area - instead we use balloon memory and stitch all of the non-contingous pages in a virtualized area. That means when we call the hypervisor to replace the GMFN with a XENMAPSPACE_grant_table type, we need to lookup the old PFN for every iteration instead of assuming a flat contingous PFN allocation. Lastly, we only use v1 for grants. This is because PVHVM is not able to use v2 due to no XENMEM_add_to_physmap calls on the error status page (see commit 69e8f430e243d657c2053f097efebc2e2cd559f0 xen/granttable: Disable grant v2 for HVM domains.) Until that is implemented this workaround has to be in place. Also per suggestions by Stefano utilize the PVHVM paths as they share common functionality. v2 of this patch moves most of the PVH code out in the arch/x86/xen/grant-table driver and touches only minimally the generic driver. v3, v4: fixes us some of the code due to earlier patches. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2014-01-06xen/pvh: Piggyback on PVHVM for event channels (v2)Mukesh Rathor
PVH is a PV guest with a twist - there are certain things that work in it like HVM and some like PV. There is a similar mode - PVHVM where we run in HVM mode with PV code enabled - and this patch explores that. The most notable PV interfaces are the XenBus and event channels. We will piggyback on how the event channel mechanism is used in PVHVM - that is we want the normal native IRQ mechanism and we will install a vector (hvm callback) for which we will call the event channel mechanism. This means that from a pvops perspective, we can use native_irq_ops instead of the Xen PV specific. Albeit in the future we could support pirq_eoi_map. But that is a feature request that can be shared with PVHVM. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2014-01-06xen/pvh: Update E820 to work with PVH (v2)Mukesh Rathor
In xen_add_extra_mem() we can skip updating P2M as it's managed by Xen. PVH maps the entire IO space, but only RAM pages need to be repopulated. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2014-01-06xen/pvh: Secondary VCPU bringup (non-bootup CPUs)Mukesh Rathor
The VCPU bringup protocol follows the PV with certain twists. From xen/include/public/arch-x86/xen.h: Also note that when calling DOMCTL_setvcpucontext and VCPU_initialise for HVM and PVH guests, not all information in this structure is updated: - For HVM guests, the structures read include: fpu_ctxt (if VGCT_I387_VALID is set), flags, user_regs, debugreg[*] - PVH guests are the same as HVM guests, but additionally use ctrlreg[3] to set cr3. All other fields not used should be set to 0. This is what we do. We piggyback on the 'xen_setup_gdt' - but modify a bit - we need to call 'load_percpu_segment' so that 'switch_to_new_gdt' can load per-cpu data-structures. It has no effect on the VCPU0. We also piggyback on the %rdi register to pass in the CPU number - so that when we bootup a new CPU, the cpu_bringup_and_idle will have passed as the first parameter the CPU number (via %rdi for 64-bit). Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-01-06xen/pvh: Load GDT/GS in early PV bootup code for BSP.Mukesh Rathor
During early bootup we start life using the Xen provided GDT, which means that we are running with %cs segment set to FLAT_KERNEL_CS (FLAT_RING3_CS64 0xe033, GDT index 261). But for PVH we want to be use HVM type mechanism for segment operations. As such we need to switch to the HVM one and also reload ourselves with the __KERNEL_CS:eip to run in the proper GDT and segment. For HVM this is usually done in 'secondary_startup_64' in (head_64.S) but since we are not taking that bootup path (we start in PV - xen_start_kernel) we need to do that in the early PV bootup paths. For good measure we also zero out the %fs, %ds, and %es (not strictly needed as Xen has already cleared them for us). The %gs is loaded by 'switch_to_new_gdt'. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com>
2014-01-06xen/pvh: Setup up shared_info.Mukesh Rathor
For PVHVM the shared_info structure is provided via the same way as for normal PV guests (see include/xen/interface/xen.h). That is during bootup we get 'xen_start_info' via the %esi register in startup_xen. Then later we extract the 'shared_info' from said structure (in xen_setup_shared_info) and start using it. The 'xen_setup_shared_info' is all setup to work with auto-xlat guests, but there are two functions which it calls that are not: xen_setup_mfn_list_list and xen_setup_vcpu_info_placement. This patch modifies the P2M code (xen_setup_mfn_list_list) while the "Piggyback on PVHVM for event channels" modifies the xen_setup_vcpu_info_placement. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-01-06xen/pvh/mmu: Use PV TLB instead of native.Mukesh Rathor
We also optimize one - the TLB flush. The native operation would needlessly IPI offline VCPUs causing extra wakeups. Using the Xen one avoids that and lets the hypervisor determine which VCPU needs the TLB flush. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-01-06xen/pvh: MMU changes for PVH (v2)Mukesh Rathor
.. which are surprisingly small compared to the amount for PV code. PVH uses mostly native mmu ops, we leave the generic (native_*) for the majority and just overwrite the baremetal with the ones we need. At startup, we are running with pre-allocated page-tables courtesy of the tool-stack. But we still need to graft them in the Linux initial pagetables. However there is no need to unpin/pin and change them to R/O or R/W. Note that the xen_pagetable_init due to 7836fec9d0994cc9c9150c5a33f0eb0eb08a335a "xen/mmu/p2m: Refactor the xen_pagetable_init code." does not need any changes - we just need to make sure that xen_post_allocator_init does not alter the pvops from the default native one. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2014-01-06xen/mmu: Cleanup xen_pagetable_p2m_copy a bit.Konrad Rzeszutek Wilk
Stefano noticed that the code runs only under 64-bit so the comments about 32-bit are pointless. Also we change the condition for xen_revector_p2m_tree returning the same value (because it could not allocate a swath of space to put the new P2M in) or it had been called once already. In such we return early from the function. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2014-01-06xen/mmu/p2m: Refactor the xen_pagetable_init code (v2).Konrad Rzeszutek Wilk
The revectoring and copying of the P2M only happens when !auto-xlat and on 64-bit builds. It is not obvious from the code, so lets have seperate 32 and 64-bit functions. We also invert the check for auto-xlat to make the code flow simpler. Suggested-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-01-06xen/pvh: Don't setup P2M tree.Konrad Rzeszutek Wilk
P2M is not available for PVH. Fortunatly for us the P2M code already has mostly the support for auto-xlat guest thanks to commit 3d24bbd7dddbea54358a9795abaf051b0f18973c "grant-table: call set_phys_to_machine after mapping grant refs" which: " introduces set_phys_to_machine calls for auto_translated guests (even on x86) in gnttab_map_refs and gnttab_unmap_refs. translated by swiotlb-xen... " so we don't need to muck much. with above mentioned "commit you'll get set_phys_to_machine calls from gnttab_map_refs and gnttab_unmap_refs but PVH guests won't do anything with them " (Stefano Stabellini) which is OK - we want them to be NOPs. This is because we assume that an "IOMMU is always present on the plaform and Xen is going to make the appropriate IOMMU pagetable changes in the hypercall implementation of GNTTABOP_map_grant_ref and GNTTABOP_unmap_grant_ref, then eveything should be transparent from PVH priviligied point of view and DMA transfers involving foreign pages keep working with no issues[sp] Otherwise we would need a P2M (and an M2P) for PVH priviligied to track these foreign pages .. (see arch/arm/xen/p2m.c)." (Stefano Stabellini). We still have to inhibit the building of the P2M tree. That had been done in the past by not calling xen_build_dynamic_phys_to_machine (which setups the P2M tree and gives us virtual address to access them). But we are missing a check for xen_build_mfn_list_list - which was continuing to setup the P2M tree and would blow up at trying to get the virtual address of p2m_missing (which would have been setup by xen_build_dynamic_phys_to_machine). Hence a check is needed to not call xen_build_mfn_list_list when running in auto-xlat mode. Instead of replicating the check for auto-xlat in enlighten.c do it in the p2m.c code. The reason is that the xen_build_mfn_list_list is called also in xen_arch_post_suspend without any checks for auto-xlat. So for PVH or PV with auto-xlat - we would needlessly allocate space for an P2M tree. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2014-01-06xen/pvh: Early bootup changes in PV code (v4).Mukesh Rathor
We don't use the filtering that 'xen_cpuid' is doing because the hypervisor treats 'XEN_EMULATE_PREFIX' as an invalid instruction. This means that all of the filtering will have to be done in the hypervisor/toolstack. Without the filtering we expose to the guest the: - cpu topology (sockets, cores, etc); - the APERF (which the generic scheduler likes to use), see 5e626254206a709c6e937f3dda69bf26c7344f6f "xen/setup: filter APERFMPERF cpuid feature out" - and the inability to figure out whether MWAIT_LEAF should be exposed or not. See df88b2d96e36d9a9e325bfcd12eb45671cbbc937 "xen/enlighten: Disable MWAIT_LEAF so that acpi-pad won't be loaded." - x2apic, see 4ea9b9aca90cfc71e6872ed3522356755162932c "xen: mask x2APIC feature in PV" We also check for vector callback early on, as it is a required feature. PVH also runs at default kernel IOPL. Finally, pure PV settings are moved to a separate function that are only called for pure PV, ie, pv with pvmmu. They are also #ifdef with CONFIG_XEN_PVMMU. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2014-01-06xen/pvh/x86: Define what an PVH guest is (v3).Mukesh Rathor
Which is a PV guest with auto page translation enabled and with vector callback. It is a cross between PVHVM and PV. The Xen side defines PVH as (from docs/misc/pvh-readme.txt, with modifications): "* the guest uses auto translate: - p2m is managed by Xen - pagetables are owned by the guest - mmu_update hypercall not available * it uses event callback and not vlapic emulation, * IDT is native, so set_trap_table hcall is also N/A for a PVH guest. For a full list of hcalls supported for PVH, see pvh_hypercall64_table in arch/x86/hvm/hvm.c in xen. From the ABI prespective, it's mostly a PV guest with auto translate, although it does use hvm_op for setting callback vector." Also we use the PV cpuid, albeit we can use the HVM (native) cpuid. However, we do have a fair bit of filtering in the xen_cpuid and we can piggyback on that until the hypervisor/toolstack filters the appropiate cpuids. Once that is done we can swap over to use the native one. We setup a Kconfig entry that is disabled by default and cannot be enabled. Note that on ARM the concept of PVH is non-existent. As Ian put it: "an ARM guest is neither PV nor HVM nor PVHVM. It's a bit like PVH but is different also (it's further towards the H end of the spectrum than even PVH).". As such these options (PVHVM, PVH) are never enabled nor seen on ARM compilations. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-01-06xen/x86: set VIRQ_TIMER priority to maximumDavid Vrabel
Commit bee980d9e (xen/events: Handle VIRQ_TIMER before any other hardirq in event loop) effectively made the VIRQ_TIMER the highest priority event when using the 2-level ABI. Set the VIRQ_TIMER priority to the highest so this behaviour is retained when using the FIFO-based ABI. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2014-01-03xen/pvhvm: Remove the xen_platform_pci int.Konrad Rzeszutek Wilk
Since we have xen_has_pv_devices,xen_has_pv_disk_devices, xen_has_pv_nic_devices, and xen_has_pv_and_legacy_disk_devices to figure out the different 'unplug' behaviors - lets use those instead of this single int. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-01-03xen/pvhvm: If xen_platform_pci=0 is set don't blow up (v4).Konrad Rzeszutek Wilk
The user has the option of disabling the platform driver: 00:02.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev 01) which is used to unplug the emulated drivers (IDE, Realtek 8169, etc) and allow the PV drivers to take over. If the user wishes to disable that they can set: xen_platform_pci=0 (in the guest config file) or xen_emul_unplug=never (on the Linux command line) except it does not work properly. The PV drivers still try to load and since the Xen platform driver is not run - and it has not initialized the grant tables, most of the PV drivers stumble upon: input: Xen Virtual Keyboard as /devices/virtual/input/input5 input: Xen Virtual Pointer as /devices/virtual/input/input6M ------------[ cut here ]------------ kernel BUG at /home/konrad/ssd/konrad/linux/drivers/xen/grant-table.c:1206! invalid opcode: 0000 [#1] SMP Modules linked in: xen_kbdfront(+) xenfs xen_privcmd CPU: 6 PID: 1389 Comm: modprobe Not tainted 3.13.0-rc1upstream-00021-ga6c892b-dirty #1 Hardware name: Xen HVM domU, BIOS 4.4-unstable 11/26/2013 RIP: 0010:[<ffffffff813ddc40>] [<ffffffff813ddc40>] get_free_entries+0x2e0/0x300 Call Trace: [<ffffffff8150d9a3>] ? evdev_connect+0x1e3/0x240 [<ffffffff813ddd0e>] gnttab_grant_foreign_access+0x2e/0x70 [<ffffffffa0010081>] xenkbd_connect_backend+0x41/0x290 [xen_kbdfront] [<ffffffffa0010a12>] xenkbd_probe+0x2f2/0x324 [xen_kbdfront] [<ffffffff813e5757>] xenbus_dev_probe+0x77/0x130 [<ffffffff813e7217>] xenbus_frontend_dev_probe+0x47/0x50 [<ffffffff8145e9a9>] driver_probe_device+0x89/0x230 [<ffffffff8145ebeb>] __driver_attach+0x9b/0xa0 [<ffffffff8145eb50>] ? driver_probe_device+0x230/0x230 [<ffffffff8145eb50>] ? driver_probe_device+0x230/0x230 [<ffffffff8145cf1c>] bus_for_each_dev+0x8c/0xb0 [<ffffffff8145e7d9>] driver_attach+0x19/0x20 [<ffffffff8145e260>] bus_add_driver+0x1a0/0x220 [<ffffffff8145f1ff>] driver_register+0x5f/0xf0 [<ffffffff813e55c5>] xenbus_register_driver_common+0x15/0x20 [<ffffffff813e76b3>] xenbus_register_frontend+0x23/0x40 [<ffffffffa0015000>] ? 0xffffffffa0014fff [<ffffffffa001502b>] xenkbd_init+0x2b/0x1000 [xen_kbdfront] [<ffffffff81002049>] do_one_initcall+0x49/0x170 .. snip.. which is hardly nice. This patch fixes this by having each PV driver check for: - if running in PV, then it is fine to execute (as that is their native environment). - if running in HVM, check if user wanted 'xen_emul_unplug=never', in which case bail out and don't load any PV drivers. - if running in HVM, and if PCI device 5853:0001 (xen_platform_pci) does not exist, then bail out and not load PV drivers. - (v2) if running in HVM, and if the user wanted 'xen_emul_unplug=ide-disks', then bail out for all PV devices _except_ the block one. Ditto for the network one ('nics'). - (v2) if running in HVM, and if the user wanted 'xen_emul_unplug=unnecessary' then load block PV driver, and also setup the legacy IDE paths. In (v3) make it actually load PV drivers. Reported-by: Sander Eikelenboom <linux@eikelenboom.it Reported-by: Anthony PERARD <anthony.perard@citrix.com> Reported-and-Tested-by: Fabio Fantoni <fabio.fantoni@m2r.biz> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [v2: Add extra logic to handle the myrid ways 'xen_emul_unplug' can be used per Ian and Stefano suggestion] [v3: Make the unnecessary case work properly] [v4: s/disks/ide-disks/ spotted by Fabio] Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> [for PCI parts] CC: stable@vger.kernel.org
2013-11-15Merge tag 'stable/for-linus-3.13-rc0-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen updates from Konrad Rzeszutek Wilk: "This has tons of fixes and two major features which are concentrated around the Xen SWIOTLB library. The short <blurb> is that the tracing facility (just one function) has been added to SWIOTLB to make it easier to track I/O progress. Additionally under Xen and ARM (32 & 64) the Xen-SWIOTLB driver "is used to translate physical to machine and machine to physical addresses of foreign[guest] pages for DMA operations" (Stefano) when booting under hardware without proper IOMMU. There are also bug-fixes, cleanups, compile warning fixes, etc. The commit times for some of the commits is a bit fresh - that is b/c we wanted to make sure we have the Ack's from the ARM folks - which with the string of back-to-back conferences took a bit of time. Rest assured - the code has been stewing in #linux-next for some time. Features: - SWIOTLB has tracing added when doing bounce buffer. - Xen ARM/ARM64 can use Xen-SWIOTLB. This work allows Linux to safely program real devices for DMA operations when running as a guest on Xen on ARM, without IOMMU support. [*1] - xen_raw_printk works with PVHVM guests if needed. Bug-fixes: - Make memory ballooning work under HVM with large MMIO region. - Inform hypervisor of MCFG regions found in ACPI DSDT. - Remove deprecated IRQF_DISABLED. - Remove deprecated __cpuinit. [*1]: "On arm and arm64 all Xen guests, including dom0, run with second stage translation enabled. As a consequence when dom0 programs a device for a DMA operation is going to use (pseudo) physical addresses instead machine addresses. This work introduces two trees to track physical to machine and machine to physical mappings of foreign pages. Local pages are assumed mapped 1:1 (physical address == machine address). It enables the SWIOTLB-Xen driver on ARM and ARM64, so that Linux can translate physical addresses to machine addresses for dma operations when necessary. " (Stefano)" * tag 'stable/for-linus-3.13-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (32 commits) xen/arm: pfn_to_mfn and mfn_to_pfn return the argument if nothing is in the p2m arm,arm64/include/asm/io.h: define struct bio_vec swiotlb-xen: missing include dma-direction.h pci-swiotlb-xen: call pci_request_acs only ifdef CONFIG_PCI arm: make SWIOTLB available xen: delete new instances of added __cpuinit xen/balloon: Set balloon's initial state to number of existing RAM pages xen/mcfg: Call PHYSDEVOP_pci_mmcfg_reserved for MCFG areas. xen: remove deprecated IRQF_DISABLED x86/xen: remove deprecated IRQF_DISABLED swiotlb-xen: fix error code returned by xen_swiotlb_map_sg_attrs swiotlb-xen: static inline xen_phys_to_bus, xen_bus_to_phys, xen_virt_to_bus and range_straddles_page_boundary grant-table: call set_phys_to_machine after mapping grant refs arm,arm64: do not always merge biovec if we are running on Xen swiotlb: print a warning when the swiotlb is full swiotlb-xen: use xen_dma_map/unmap_page, xen_dma_sync_single_for_cpu/device xen: introduce xen_dma_map/unmap_page and xen_dma_sync_single_for_cpu/device tracing/events: Fix swiotlb tracepoint creation swiotlb-xen: use xen_alloc/free_coherent_pages xen: introduce xen_alloc/free_coherent_pages ...
2013-11-15mm: dynamically allocate page->ptl if it cannot be embedded to struct pageKirill A. Shutemov
If split page table lock is in use, we embed the lock into struct page of table's page. We have to disable split lock, if spinlock_t is too big be to be embedded, like when DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC enabled. This patch add support for dynamic allocation of split page table lock if we can't embed it to struct page. page->ptl is unsigned long now and we use it as spinlock_t if sizeof(spinlock_t) <= sizeof(long), otherwise it's pointer to spinlock_t. The spinlock_t allocated in pgtable_page_ctor() for PTE table and in pgtable_pmd_page_ctor() for PMD table. All other helpers converted to support dynamically allocated page->ptl. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-15mm: rename USE_SPLIT_PTLOCKS to USE_SPLIT_PTE_PTLOCKSKirill A. Shutemov
We're going to introduce split page table lock for PMD level. Let's rename existing split ptlock for PTE level to avoid confusion. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Alex Thorlton <athorlton@sgi.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Jones <davej@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kees Cook <keescook@chromium.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Robin Holt <robinmholt@gmail.com> Cc: Sedat Dilek <sedat.dilek@gmail.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>