summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2024-03-31Merge tag 'perf_urgent_for_v6.9_rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf fixes from Borislav Petkov: - Define the correct set of default hw events on AMD Zen4 - Use the correct stalled cycles PMCs on AMD Zen2 and newer - Fix detection of the LBR freeze feature on AMD * tag 'perf_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/amd/core: Define a proper ref-cycles event for Zen 4 and later perf/x86/amd/core: Update and fix stalled-cycles-* events for Zen 2 and later perf/x86/amd/lbr: Use freeze based on availability x86/cpufeatures: Add new word for scattered features
2024-03-31Merge tag 'x86_urgent_for_v6.9_rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Make sure single object builds in arch/x86/virt/ ala make ... arch/x86/virt/vmx/tdx/seamcall.o work again - Do not do ROM range scans and memory validation when the kernel is running as a SEV-SNP guest as those can get problematic and, before that, are not really needed in such a guest - Exclude the build-time generated vdso-image-x32.o object from objtool validation and in particular the return sites in there due to a warning which fires when an unpatched return thunk is being used - Improve the NMI CPUs stall message to show additional information about the state of each CPU wrt the NMI handler - Enable gcc named address spaces support only on !KCSAN configs due to compiler options incompatibility - Revert a change which was trying to use GB pages for mapping regions only when the regions would be large enough but that change lead to kexec failing - A documentation fixlet * tag 'x86_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/build: Use obj-y to descend into arch/x86/virt/ x86/sev: Skip ROM range scans and validation for SEV-SNP guests x86/vdso: Fix rethunk patching for vdso-image-x32.o too x86/nmi: Upgrade NMI backtrace stall checks & messages x86/percpu: Disable named address spaces for KCSAN Revert "x86/mm/ident_map: Use gbpages only where full GB page should be mapped." Documentation/x86: Fix title underline length
2024-03-30x86/build: Use obj-y to descend into arch/x86/virt/Masahiro Yamada
Commit c33621b4c5ad ("x86/virt/tdx: Wire up basic SEAMCALL functions") introduced a new instance of core-y instead of the standardized obj-y syntax. X86 Makefiles descend into subdirectories of arch/x86/virt inconsistently; into arch/x86/virt/ via core-y defined in arch/x86/Makefile, but into arch/x86/virt/svm/ via obj-y defined in arch/x86/Kbuild. This is problematic when you build a single object in parallel because multiple threads attempt to build the same file. $ make -j$(nproc) arch/x86/virt/vmx/tdx/seamcall.o [ snip ] AS arch/x86/virt/vmx/tdx/seamcall.o AS arch/x86/virt/vmx/tdx/seamcall.o fixdep: error opening file: arch/x86/virt/vmx/tdx/.seamcall.o.d: No such file or directory make[4]: *** [scripts/Makefile.build:362: arch/x86/virt/vmx/tdx/seamcall.o] Error 2 Use the obj-y syntax, as it works correctly. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240330060554.18524-1-masahiroy@kernel.org
2024-03-29x86/bugs: Fix the SRSO mitigation on Zen3/4Borislav Petkov (AMD)
The original version of the mitigation would patch in the calls to the untraining routines directly. That is, the alternative() in UNTRAIN_RET will patch in the CALL to srso_alias_untrain_ret() directly. However, even if commit e7c25c441e9e ("x86/cpu: Cleanup the untrain mess") meant well in trying to clean up the situation, due to micro- architectural reasons, the untraining routine srso_alias_untrain_ret() must be the target of a CALL instruction and not of a JMP instruction as it is done now. Reshuffle the alternative macros to accomplish that. Fixes: e7c25c441e9e ("x86/cpu: Cleanup the untrain mess") Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-03-29Merge tag 'efi-fixes-for-v6.9-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fixes from Ard Biesheuvel: "These address all the outstanding EFI/x86 boot related regressions: - Revert to the old initrd memory allocation soft limit of INT_MAX, which was dropped inadvertently - Ensure that startup_32() is entered with a valid boot_params pointer when using the new EFI mixed mode protocol - Fix a compiler warning introduced by a fix from the previous pull" * tag 'efi-fixes-for-v6.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: x86/efistub: Reinstate soft limit for initrd loading efi/libstub: Cast away type warning in use of max() x86/efistub: Add missing boot_params for mixed mode compat entry
2024-03-26crash: use macro to add crashk_res into iomem early for specific archBaoquan He
There are regression reports[1][2] that crashkernel region on x86_64 can't be added into iomem tree sometime. This causes the later failure of kdump loading. This happened after commit 4a693ce65b18 ("kdump: defer the insertion of crashkernel resources") was merged. Even though, these reported issues are proved to be related to other component, they are just exposed after above commmit applied, I still would like to keep crashk_res and crashk_low_res being added into iomem early as before because the early adding has been always there on x86_64 and working very well. For safety of kdump, Let's change it back. Here, add a macro HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY to limit that only ARCH defining the macro can have the early adding crashk_res/_low_res into iomem. Then define HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY on x86 to enable it. Note: In reserve_crashkernel_low(), there's a remnant of crashk_low_res handling which was mistakenly added back in commit 85fcde402db1 ("kexec: split crashkernel reservation code out from crash_core.c"). [1] [PATCH V2] x86/kexec: do not update E820 kexec table for setup_data https://lore.kernel.org/all/Zfv8iCL6CT2JqLIC@darkstar.users.ipa.redhat.com/T/#u [2] Question about Address Range Validation in Crash Kernel Allocation https://lore.kernel.org/all/4eeac1f733584855965a2ea62fa4da58@huawei.com/T/#u Link: https://lkml.kernel.org/r/ZgDYemRQ2jxjLkq+@MiWiFi-R3L-srv Fixes: 4a693ce65b18 ("kdump: defer the insertion of crashkernel resources") Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Huacai Chen <chenhuacai@loongson.cn> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Bohac <jbohac@suse.cz> Cc: Li Huafei <lihuafei1@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-26x86/sev: Skip ROM range scans and validation for SEV-SNP guestsKevin Loughlin
SEV-SNP requires encrypted memory to be validated before access. Because the ROM memory range is not part of the e820 table, it is not pre-validated by the BIOS. Therefore, if a SEV-SNP guest kernel wishes to access this range, the guest must first validate the range. The current SEV-SNP code does indeed scan the ROM range during early boot and thus attempts to validate the ROM range in probe_roms(). However, this behavior is neither sufficient nor necessary for the following reasons: * With regards to sufficiency, if EFI_CONFIG_TABLES are not enabled and CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK is set, the kernel will attempt to access the memory at SMBIOS_ENTRY_POINT_SCAN_START (which falls in the ROM range) prior to validation. For example, Project Oak Stage 0 provides a minimal guest firmware that currently meets these configuration conditions, meaning guests booting atop Oak Stage 0 firmware encounter a problematic call chain during dmi_setup() -> dmi_scan_machine() that results in a crash during boot if SEV-SNP is enabled. * With regards to necessity, SEV-SNP guests generally read garbage (which changes across boots) from the ROM range, meaning these scans are unnecessary. The guest reads garbage because the legacy ROM range is unencrypted data but is accessed via an encrypted PMD during early boot (where the PMD is marked as encrypted due to potentially mapping actually-encrypted data in other PMD-contained ranges). In one exceptional case, EISA probing treats the ROM range as unencrypted data, which is inconsistent with other probing. Continuing to allow SEV-SNP guests to use garbage and to inconsistently classify ROM range encryption status can trigger undesirable behavior. For instance, if garbage bytes appear to be a valid signature, memory may be unnecessarily reserved for the ROM range. Future code or other use cases may result in more problematic (arbitrary) behavior that should be avoided. While one solution would be to overhaul the early PMD mapping to always treat the ROM region of the PMD as unencrypted, SEV-SNP guests do not currently rely on data from the ROM region during early boot (and even if they did, they would be mostly relying on garbage data anyways). As a simpler solution, skip the ROM range scans (and the otherwise- necessary range validation) during SEV-SNP guest early boot. The potential SEV-SNP guest crash due to lack of ROM range validation is thus avoided by simply not accessing the ROM range. In most cases, skip the scans by overriding problematic x86_init functions during sme_early_init() to SNP-safe variants, which can be likened to x86_init overrides done for other platforms (ex: Xen); such overrides also avoid the spread of cc_platform_has() checks throughout the tree. In the exceptional EISA case, still use cc_platform_has() for the simplest change, given (1) checks for guest type (ex: Xen domain status) are already performed here, and (2) these checks occur in a subsys initcall instead of an x86_init function. [ bp: Massage commit message, remove "we"s. ] Fixes: 9704c07bf9f7 ("x86/kernel: Validate ROM memory before accessing when SEV-SNP is active") Signed-off-by: Kevin Loughlin <kevinloughlin@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/20240313121546.2964854-1-kevinloughlin@google.com
2024-03-26x86/vdso: Fix rethunk patching for vdso-image-x32.o tooBorislav Petkov (AMD)
In a similar fashion to b388e57d4628 ("x86/vdso: Fix rethunk patching for vdso-image-{32,64}.o") annotate vdso-image-x32.o too for objtool so that it gets annotated properly and the unused return thunk warning doesn't fire. Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202403251454.23df6278-lkp@intel.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/202403251454.23df6278-lkp@intel.com
2024-03-26x86/nmi: Upgrade NMI backtrace stall checks & messagesPaul E. McKenney
The commit to improve NMI stall debuggability: 344da544f177 ("x86/nmi: Print reasons why backtrace NMIs are ignored") ... has shown value, but widespread use has also identified a few opportunities for improvement. The systems have (as usual) shown far more creativity than that commit's author, demonstrating yet again that failing CPUs can do whatever they want. In addition, the current message format is less friendly than one might like to those attempting to use these messages to identify failing CPUs. Therefore, separately flag CPUs that, during the full time that the stack-backtrace request was waiting, were always in an NMI handler, were never in an NMI handler, or exited one NMI handler. Also, split the message identifying the CPU and the time since that CPU's last NMI-related activity so that a single line identifies the CPU without any other variable information, greatly reducing the processing overhead required to identify repeat-offender CPUs. Co-developed-by: Breno Leitao <leitao@debian.org> Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/ab4d70c8-c874-42dc-b206-643018922393@paulmck-laptop
2024-03-26perf/x86/amd/core: Define a proper ref-cycles event for Zen 4 and laterSandipan Das
Add the "ref-cycles" event for AMD processors based on Zen 4 and later microarchitectures. The backing event is based on PMCx120 which counts cycles not in halt state in P0 frequency (same as MPERF). Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/089155f19f7c7e65aeb1caa727a882e2ca9b8b04.1711352180.git.sandipan.das@amd.com
2024-03-26perf/x86/amd/core: Update and fix stalled-cycles-* events for Zen 2 and laterSandipan Das
AMD processors based on Zen 2 and later microarchitectures do not support PMCx087 (instruction pipe stalls) which is used as the backing event for "stalled-cycles-frontend" and "stalled-cycles-backend". Use PMCx0A9 (cycles where micro-op queue is empty) instead to count frontend stalls and remove the entry for backend stalls since there is no direct replacement. Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Fixes: 3fe3331bb285 ("perf/x86/amd: Add event map for AMD Family 17h") Link: https://lore.kernel.org/r/03d7fc8fa2a28f9be732116009025bdec1b3ec97.1711352180.git.sandipan.das@amd.com
2024-03-26x86/efistub: Add missing boot_params for mixed mode compat entryArd Biesheuvel
The pure EFI stub entry point does not take a struct boot_params from the boot loader, but creates it from scratch, and populates only the fields that still have meaning in this context (command line, initrd base and size, etc) The original mixed mode implementation used the EFI handover protocol instead, where the boot loader (i.e., GRUB) populates a boot_params struct and passes it to a special Linux specific EFI entry point that takes the boot_params pointer as its third argument. When the new mixed mode implementation was introduced, using a special 32-bit PE entrypoint in the 64-bit kernel, it adopted the pure approach, and relied on the EFI stub to create the struct boot_params. This is preferred because it makes the bootloader side much easier to implement, as it does not need any x86-specific knowledge on how struct boot_params and struct setup_header are put together. This mixed mode implementation was adopted by systemd-boot version 252 and later. When commit e2ab9eab324c ("x86/boot/compressed: Move 32-bit entrypoint code into .text section") refactored this code and moved it out of head_64.S, the fact that ESI was populated with the address of the base of the image was overlooked, and to simplify the code flow, ESI is now zeroed and stored to memory unconditionally in shared code, so that the NULL-ness of that variable can still be used later to determine which mixed mode boot protocol is in use. With ESI pointing to the base of the image, it can serve as a struct boot_params pointer for startup_32(), which only accesses the init_data and kernel_alignment fields (and the scratch field as a temporary stack). Zeroing ESI means that those accesses produce garbage now, even though things appear to work if the first page of memory happens to be zeroed, and the region right before LOAD_PHYSICAL_ADDR (== 16 MiB) happens to be free. The solution is to pass a special, temporary struct boot_params to startup_32() via ESI, one that is sufficient for getting it to create the page tables correctly and is discarded right after. This involves setting a minimal alignment of 4k, only to get the statically allocated page tables line up correctly, and setting init_size to the executable image size (_end - startup_32). This ensures that the page tables are covered by the static footprint of the PE image. Given that EFI boot no longer calls the decompressor and no longer pads the image to permit the decompressor to execute in place, the same temporary struct boot_params should be used in the EFI handover protocol based mixed mode implementation as well, to prevent the page tables from being placed outside of allocated memory. Fixes: e2ab9eab324c ("x86/boot/compressed: Move 32-bit entrypoint code into .text section") Cc: <stable@kernel.org> # v6.1+ Closes: https://lore.kernel.org/all/20240321150510.GI8211@craftyguy.net/ Reported-by: Clayton Craft <clayton@craftyguy.net> Tested-by: Clayton Craft <clayton@craftyguy.net> Tested-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-03-25x86/percpu: Disable named address spaces for KCSANUros Bizjak
-fsanitize=thread (KCSAN) is at the moment incompatible with named address spaces in a similar way as KASAN - see GCC PR sanitizer/111736: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111736 The patch disables named address spaces with KCSAN. Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20240325110128.615933-1-ubizjak@gmail.com
2024-03-25Revert "x86/mm/ident_map: Use gbpages only where full GB page should be mapped."Ingo Molnar
This reverts commit d794734c9bbfe22f86686dc2909c25f5ffe1a572. While the original change tries to fix a bug, it also unintentionally broke existing systems, see the regressions reported at: https://lore.kernel.org/all/3a1b9909-45ac-4f97-ad68-d16ef1ce99db@pavinjoseph.com/ Since d794734c9bbf was also marked for -stable, let's back it out before causing more damage. Note that due to another upstream change the revert was not 100% automatic: 0a845e0f6348 mm/treewide: replace pud_large() with pud_leaf() Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Cc: Russ Anderson <rja@hpe.com> Cc: Steve Wahl <steve.wahl@hpe.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/3a1b9909-45ac-4f97-ad68-d16ef1ce99db@pavinjoseph.com/ Fixes: d794734c9bbf ("x86/mm/ident_map: Use gbpages only where full GB page should be mapped.")
2024-03-25perf/x86/amd/lbr: Use freeze based on availabilitySandipan Das
Currently, the LBR code assumes that LBR Freeze is supported on all processors when X86_FEATURE_AMD_LBR_V2 is available i.e. CPUID leaf 0x80000022[EAX] bit 1 is set. This is incorrect as the availability of the feature is additionally dependent on CPUID leaf 0x80000022[EAX] bit 2 being set, which may not be set for all Zen 4 processors. Define a new feature bit for LBR and PMC freeze and set the freeze enable bit (FLBRI) in DebugCtl (MSR 0x1d9) conditionally. It should still be possible to use LBR without freeze for profile-guided optimization of user programs by using an user-only branch filter during profiling. When the user-only filter is enabled, branches are no longer recorded after the transition to CPL 0 upon PMI arrival. When branch entries are read in the PMI handler, the branch stack does not change. E.g. $ perf record -j any,u -e ex_ret_brn_tkn ./workload Since the feature bit is visible under flags in /proc/cpuinfo, it can be used to determine the feasibility of use-cases which require LBR Freeze to be supported by the hardware such as profile-guided optimization of kernels. Fixes: ca5b7c0d9621 ("perf/x86/amd/lbr: Add LbrExtV2 branch record support") Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/69a453c97cfd11c6f2584b19f937fe6df741510f.1711091584.git.sandipan.das@amd.com
2024-03-25x86/cpufeatures: Add new word for scattered featuresSandipan Das
Add a new word for scattered features because all free bits among the existing Linux-defined auxiliary flags have been exhausted. Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/8380d2a0da469a1f0ad75b8954a79fb689599ff6.1711091584.git.sandipan.das@amd.com
2024-03-24Merge tag 'efi-fixes-for-v6.9-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fixes from Ard Biesheuvel: - Fix logic that is supposed to prevent placement of the kernel image below LOAD_PHYSICAL_ADDR - Use the firmware stack in the EFI stub when running in mixed mode - Clear BSS only once when using mixed mode - Check efi.get_variable() function pointer for NULL before trying to call it * tag 'efi-fixes-for-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi: fix panic in kdump kernel x86/efistub: Don't clear BSS twice in mixed mode x86/efistub: Call mixed mode boot services on the firmware's stack efi/libstub: fix efi_random_alloc() to allocate memory at alloc_min or higher address
2024-03-24Merge tag 'x86-urgent-2024-03-24' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: - Ensure that the encryption mask at boot is properly propagated on 5-level page tables, otherwise the PGD entry is incorrectly set to non-encrypted, which causes system crashes during boot. - Undo the deferred 5-level page table setup as it cannot work with memory encryption enabled. - Prevent inconsistent XFD state on CPU hotplug, where the MSR is reset to the default value but the cached variable is not, so subsequent comparisons might yield the wrong result and as a consequence the result prevents updating the MSR. - Register the local APIC address only once in the MPPARSE enumeration to prevent triggering the related WARN_ONs() in the APIC and topology code. - Handle the case where no APIC is found gracefully by registering a fake APIC in the topology code. That makes all related topology functions work correctly and does not affect the actual APIC driver code at all. - Don't evaluate logical IDs during early boot as the local APIC IDs are not yet enumerated and the invoked function returns an error code. Nothing requires the logical IDs before the final CPUID enumeration takes place, which happens after the enumeration. - Cure the fallout of the per CPU rework on UP which misplaced the copying of boot_cpu_data to per CPU data so that the final update to boot_cpu_data got lost which caused inconsistent state and boot crashes. - Use copy_from_kernel_nofault() in the kprobes setup as there is no guarantee that the address can be safely accessed. - Reorder struct members in struct saved_context to work around another kmemleak false positive - Remove the buggy code which tries to update the E820 kexec table for setup_data as that is never passed to the kexec kernel. - Update the resource control documentation to use the proper units. - Fix a Kconfig warning observed with tinyconfig * tag 'x86-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/64: Move 5-level paging global variable assignments back x86/boot/64: Apply encryption mask to 5-level pagetable update x86/cpu: Add model number for another Intel Arrow Lake mobile processor x86/fpu: Keep xfd_state in sync with MSR_IA32_XFD Documentation/x86: Document that resctrl bandwidth control units are MiB x86/mpparse: Register APIC address only once x86/topology: Handle the !APIC case gracefully x86/topology: Don't evaluate logical IDs during early boot x86/cpu: Ensure that CPU info updates are propagated on UP kprobes/x86: Use copy_from_kernel_nofault() to read from unsafe address x86/pm: Work around false positive kmemleak report in msr_build_context() x86/kexec: Do not update E820 kexec table for setup_data x86/config: Fix warning for 'make ARCH=x86_64 tinyconfig'
2024-03-24x86/efistub: Call mixed mode boot services on the firmware's stackArd Biesheuvel
Normally, the EFI stub calls into the EFI boot services using the stack that was live when the stub was entered. According to the UEFI spec, this stack needs to be at least 128k in size - this might seem large but all asynchronous processing and event handling in EFI runs from the same stack and so quite a lot of space may be used in practice. In mixed mode, the situation is a bit different: the bootloader calls the 32-bit EFI stub entry point, which calls the decompressor's 32-bit entry point, where the boot stack is set up, using a fixed allocation of 16k. This stack is still in use when the EFI stub is started in 64-bit mode, and so all calls back into the EFI firmware will be using the decompressor's limited boot stack. Due to the placement of the boot stack right after the boot heap, any stack overruns have gone unnoticed. However, commit 5c4feadb0011983b ("x86/decompressor: Move global symbol references to C code") moved the definition of the boot heap into C code, and now the boot stack is placed right at the base of BSS, where any overruns will corrupt the end of the .data section. While it would be possible to work around this by increasing the size of the boot stack, doing so would affect all x86 systems, and mixed mode systems are a tiny (and shrinking) fraction of the x86 installed base. So instead, record the firmware stack pointer value when entering from the 32-bit firmware, and switch to this stack every time a EFI boot service call is made. Cc: <stable@kernel.org> # v6.1+ Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-03-24x86/boot/64: Move 5-level paging global variable assignments backTom Lendacky
Commit 63bed9660420 ("x86/startup_64: Defer assignment of 5-level paging global variables") moved assignment of 5-level global variables to later in the boot in order to avoid having to use RIP relative addressing in order to set them. However, when running with 5-level paging and SME active (mem_encrypt=on), the variables are needed as part of the page table setup needed to encrypt the kernel (using pgd_none(), p4d_offset(), etc.). Since the variables haven't been set, the page table manipulation is done as if 4-level paging is active, causing the system to crash on boot. While only a subset of the assignments that were moved need to be set early, move all of the assignments back into check_la57_support() so that these assignments aren't spread between two locations. Instead of just reverting the fix, this uses the new RIP_REL_REF() macro when assigning the variables. Fixes: 63bed9660420 ("x86/startup_64: Defer assignment of 5-level paging global variables") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/2ca419f4d0de719926fd82353f6751f717590a86.1711122067.git.thomas.lendacky@amd.com
2024-03-24x86/boot/64: Apply encryption mask to 5-level pagetable updateTom Lendacky
When running with 5-level page tables, the kernel mapping PGD entry is updated to point to the P4D table. The assignment uses _PAGE_TABLE_NOENC, which, when SME is active (mem_encrypt=on), results in a page table entry without the encryption mask set, causing the system to crash on boot. Change the assignment to use _PAGE_TABLE instead of _PAGE_TABLE_NOENC so that the encryption mask is set for the PGD entry. Fixes: 533568e06b15 ("x86/boot/64: Use RIP_REL_REF() to access early_top_pgt[]") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/8f20345cda7dbba2cf748b286e1bc00816fe649a.1711122067.git.thomas.lendacky@amd.com
2024-03-24x86/cpu: Add model number for another Intel Arrow Lake mobile processorTony Luck
This one is the regular laptop CPU. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240322161725.195614-1-tony.luck@intel.com
2024-03-24x86/fpu: Keep xfd_state in sync with MSR_IA32_XFDAdamos Ttofari
Commit 672365477ae8 ("x86/fpu: Update XFD state where required") and commit 8bf26758ca96 ("x86/fpu: Add XFD state to fpstate") introduced a per CPU variable xfd_state to keep the MSR_IA32_XFD value cached, in order to avoid unnecessary writes to the MSR. On CPU hotplug MSR_IA32_XFD is reset to the init_fpstate.xfd, which wipes out any stale state. But the per CPU cached xfd value is not reset, which brings them out of sync. As a consequence a subsequent xfd_update_state() might fail to update the MSR which in turn can result in XRSTOR raising a #NM in kernel space, which crashes the kernel. To fix this, introduce xfd_set_state() to write xfd_state together with MSR_IA32_XFD, and use it in all places that set MSR_IA32_XFD. Fixes: 672365477ae8 ("x86/fpu: Update XFD state where required") Signed-off-by: Adamos Ttofari <attofari@amazon.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240322230439.456571-1-chang.seok.bae@intel.com Closes: https://lore.kernel.org/lkml/20230511152818.13839-1-attofari@amazon.de
2024-03-23x86/mpparse: Register APIC address only onceThomas Gleixner
The APIC address is registered twice. First during the early detection and afterwards when actually scanning the table for APIC IDs. The APIC and topology core warn about the second attempt. Restrict it to the early detection call. Fixes: 81287ad65da5 ("x86/apic: Sanitize APIC address setup") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20240322185305.297774848@linutronix.de
2024-03-23x86/topology: Handle the !APIC case gracefullyThomas Gleixner
If there is no local APIC enumerated and registered then the topology bitmaps are empty. Therefore, topology_init_possible_cpus() will die with a division by zero exception. Prevent this by registering a fake APIC id to populate the topology bitmap. This also allows to use all topology query interfaces unconditionally. It does not affect the actual APIC code because either the local APIC address was not registered or no local APIC could be detected. Fixes: f1f758a80516 ("x86/topology: Add a mechanism to track topology via APIC IDs") Reported-by: Guenter Roeck <linux@roeck-us.net> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20240322185305.242709302@linutronix.de
2024-03-23x86/topology: Don't evaluate logical IDs during early bootThomas Gleixner
The local APICs have not yet been enumerated so the logical ID evaluation from the topology bitmaps does not work and would return an error code. Skip the evaluation during the early boot CPUID evaluation and only apply it on the final run. Fixes: 380414be78bf ("x86/cpu/topology: Use topology logical mapping mechanism") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20240322185305.186943142@linutronix.de
2024-03-23x86/cpu: Ensure that CPU info updates are propagated on UPThomas Gleixner
The boot sequence evaluates CPUID information twice: 1) During early boot 2) When finalizing the early setup right before mitigations are selected and alternatives are patched. In both cases the evaluation is stored in boot_cpu_data, but on UP the copying of boot_cpu_data to the per CPU info of the boot CPU happens between #1 and #2. So any update which happens in #2 is never propagated to the per CPU info instance. Consolidate the whole logic and copy boot_cpu_data right before applying alternatives as that's the point where boot_cpu_data is in it's final state and not supposed to change anymore. This also removes the voodoo mb() from smp_prepare_cpus_common() which had absolutely no purpose. Fixes: 71eb4893cfaf ("x86/percpu: Cure per CPU madness on UP") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20240322185305.127642785@linutronix.de
2024-03-22kprobes/x86: Use copy_from_kernel_nofault() to read from unsafe addressMasami Hiramatsu (Google)
Read from an unsafe address with copy_from_kernel_nofault() in arch_adjust_kprobe_addr() because this function is used before checking the address is in text or not. Syzcaller bot found a bug and reported the case if user specifies inaccessible data area, arch_adjust_kprobe_addr() will cause a kernel panic. [ mingo: Clarified the comment. ] Fixes: cc66bb914578 ("x86/ibt,kprobes: Cure sym+0 equals fentry woes") Reported-by: Qiang Zhang <zzqq0103.hey@gmail.com> Tested-by: Jinghao Jia <jinghao7@illinois.edu> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/171042945004.154897.2221804961882915806.stgit@devnote2
2024-03-22x86/pm: Work around false positive kmemleak report in msr_build_context()Anton Altaparmakov
Since: 7ee18d677989 ("x86/power: Make restore_processor_context() sane") kmemleak reports this issue: unreferenced object 0xf68241e0 (size 32): comm "swapper/0", pid 1, jiffies 4294668610 (age 68.432s) hex dump (first 32 bytes): 00 cc cc cc 29 10 01 c0 00 00 00 00 00 00 00 00 ....)........... 00 42 82 f6 cc cc cc cc cc cc cc cc cc cc cc cc .B.............. backtrace: [<461c1d50>] __kmem_cache_alloc_node+0x106/0x260 [<ea65e13b>] __kmalloc+0x54/0x160 [<c3858cd2>] msr_build_context.constprop.0+0x35/0x100 [<46635aff>] pm_check_save_msr+0x63/0x80 [<6b6bb938>] do_one_initcall+0x41/0x1f0 [<3f3add60>] kernel_init_freeable+0x199/0x1e8 [<3b538fde>] kernel_init+0x1a/0x110 [<938ae2b2>] ret_from_fork+0x1c/0x28 Which is a false positive. Reproducer: - Run rsync of whole kernel tree (multiple times if needed). - start a kmemleak scan - Note this is just an example: a lot of our internal tests hit these. The root cause is similar to the fix in: b0b592cf0836 x86/pm: Fix false positive kmemleak report in msr_build_context() ie. the alignment within the packed struct saved_context which has everything unaligned as there is only "u16 gs;" at start of struct where in the past there were four u16 there thus aligning everything afterwards. The issue is with the fact that Kmemleak only searches for pointers that are aligned (see how pointers are scanned in kmemleak.c) so when the struct members are not aligned it doesn't see them. Testing: We run a lot of tests with our CI, and after applying this fix we do not see any kmemleak issues any more whilst without it we see hundreds of the above report. From a single, simple test run consisting of 416 individual test cases on kernel 5.10 x86 with kmemleak enabled we got 20 failures due to this, which is quite a lot. With this fix applied we get zero kmemleak related failures. Fixes: 7ee18d677989 ("x86/power: Make restore_processor_context() sane") Signed-off-by: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: "Rafael J. Wysocki" <rafael@kernel.org> Cc: stable@vger.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20240314142656.17699-1-anton@tuxera.com
2024-03-22x86/kexec: Do not update E820 kexec table for setup_dataDave Young
crashkernel reservation failed on a Thinkpad t440s laptop recently. Actually the memblock reservation succeeded, but later insert_resource() failed. Test steps: kexec load -> /* make sure add crashkernel param eg. crashkernel=160M */ kexec reboot -> dmesg|grep "crashkernel reserved"; crashkernel memory range like below reserved successfully: 0x00000000d0000000 - 0x00000000da000000 But no such "Crash kernel" region in /proc/iomem The background story: Currently the E820 code reserves setup_data regions for both the current kernel and the kexec kernel, and it inserts them into the resources list. Before the kexec kernel reboots nobody passes the old setup_data, and kexec only passes fresh SETUP_EFI/SETUP_IMA/SETUP_RNG_SEED if needed. Thus the old setup data memory is not used at all. Due to old kernel updates the kexec e820 table as well so kexec kernel sees them as E820_TYPE_RESERVED_KERN regions, and later the old setup_data regions are inserted into resources list in the kexec kernel by e820__reserve_resources(). Note, due to no setup_data is passed in for those old regions they are not early reserved (by function early_reserve_memory), and the crashkernel memblock reservation will just treat them as usable memory and it could reserve the crashkernel region which overlaps with the old setup_data regions. And just like the bug I noticed here, kdump insert_resource failed because e820__reserve_resources has added the overlapped chunks in /proc/iomem already. Finally, looking at the code, the old setup_data regions are not used at all as no setup_data is passed in by the kexec boot loader. Although something like SETUP_PCI etc could be needed, kexec should pass the info as new setup_data so that kexec kernel can take care of them. This should be taken care of in other separate patches if needed. Thus drop the useless buggy code here. Signed-off-by: Dave Young <dyoung@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Jiri Bohac <jbohac@suse.cz> Cc: Eric DeVolder <eric.devolder@oracle.com> Cc: Baoquan He <bhe@redhat.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: https://lore.kernel.org/r/Zf0T3HCG-790K-pZ@darkstar.users.ipa.redhat.com
2024-03-21Merge tag 'kbuild-v6.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Generate a list of built DTB files (arch/*/boot/dts/dtbs-list) - Use more threads when building Debian packages in parallel - Fix warnings shown during the RPM kernel package uninstallation - Change OBJECT_FILES_NON_STANDARD_*.o etc. to take a relative path to Makefile - Support GCC's -fmin-function-alignment flag - Fix a null pointer dereference bug in modpost - Add the DTB support to the RPM package - Various fixes and cleanups in Kconfig * tag 'kbuild-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (67 commits) kconfig: tests: test dependency after shuffling choices kconfig: tests: add a test for randconfig with dependent choices kconfig: tests: support KCONFIG_SEED for the randconfig runner kbuild: rpm-pkg: add dtb files in kernel rpm kconfig: remove unneeded menu_is_visible() call in conf_write_defconfig() kconfig: check prompt for choice while parsing kconfig: lxdialog: remove unused dialog colors kconfig: lxdialog: fix button color for blackbg theme modpost: fix null pointer dereference kbuild: remove GCC's default -Wpacked-bitfield-compat flag kbuild: unexport abs_srctree and abs_objtree kbuild: Move -Wenum-{compare-conditional,enum-conversion} into W=1 kconfig: remove named choice support kconfig: use linked list in get_symbol_str() to iterate over menus kconfig: link menus to a symbol kbuild: fix inconsistent indentation in top Makefile kbuild: Use -fmin-function-alignment when available alpha: merge two entries for CONFIG_ALPHA_GAMMA alpha: merge two entries for CONFIG_ALPHA_EV4 kbuild: change DTC_FLAGS_<basetarget>.o to take the path relative to $(obj) ...
2024-03-21Merge tag 'char-misc-6.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc and other driver subsystem updates from Greg KH: "Here is the big set of char/misc and a number of other driver subsystem updates for 6.9-rc1. Included in here are: - IIO driver updates, loads of new ones and evolution of existing ones - coresight driver updates - const cleanups for many driver subsystems - speakup driver additions - platform remove callback void cleanups - mei driver updates - mhi driver updates - cdx driver updates for MSI interrupt handling - nvmem driver updates - other smaller driver updates and cleanups, full details in the shortlog All of these have been in linux-next for a long time with no reported issue, other than a build warning for the speakup driver" The build warning hits clang and is a gcc (and C23) extension, and is fixed up in the merge. Link: https://lore.kernel.org/all/20240321134831.GA2762840@dev-arch.thelio-3990X/ * tag 'char-misc-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (279 commits) binder: remove redundant variable page_addr uio_dmem_genirq: UIO_MEM_DMA_COHERENT conversion uio_pruss: UIO_MEM_DMA_COHERENT conversion cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT uio: introduce UIO_MEM_DMA_COHERENT type cdx: add MSI support for CDX bus pps: use cflags-y instead of EXTRA_CFLAGS speakup: Add /dev/synthu device speakup: Fix 8bit characters from direct synth parport: sunbpp: Convert to platform remove callback returning void parport: amiga: Convert to platform remove callback returning void char: xillybus: Convert to platform remove callback returning void vmw_balloon: change maintainership MAINTAINERS: change the maintainer for hpilo driver char: xilinx_hwicap: Fix NULL vs IS_ERR() bug hpet: remove hpets::hp_clocksource platform: goldfish: move the separate 'default' propery for CONFIG_GOLDFISH char: xilinx_hwicap: drop casting to void in dev_set_drvdata greybus: move is_gb_* functions out of greybus.h greybus: Remove usage of the deprecated ida_simple_xx() API ...
2024-03-21Merge tag 'hyperv-next-signed-20240320' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Use Hyper-V entropy to seed guest random number generator (Michael Kelley) - Convert to platform remove callback returning void for vmbus (Uwe Kleine-König) - Introduce hv_get_hypervisor_version function (Nuno Das Neves) - Rename some HV_REGISTER_* defines for consistency (Nuno Das Neves) - Change prefix of generic HV_REGISTER_* MSRs to HV_MSR_* (Nuno Das Neves) - Cosmetic changes for hv_spinlock.c (Purna Pavan Chandra Aekkaladevi) - Use per cpu initial stack for vtl context (Saurabh Sengar) * tag 'hyperv-next-signed-20240320' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: x86/hyperv: Use Hyper-V entropy to seed guest random number generator x86/hyperv: Cosmetic changes for hv_spinlock.c hyperv-tlfs: Rename some HV_REGISTER_* defines for consistency hv: vmbus: Convert to platform remove callback returning void mshyperv: Introduce hv_get_hypervisor_version function x86/hyperv: Use per cpu initial stack for vtl context hyperv-tlfs: Change prefix of generic HV_REGISTER_* MSRs to HV_MSR_*
2024-03-21x86/config: Fix warning for 'make ARCH=x86_64 tinyconfig'Masahiro Yamada
Kconfig emits a warning for the following command: $ make ARCH=x86_64 tinyconfig ... .config:1380:warning: override: UNWINDER_GUESS changes choice state When X86_64=y, the unwinder is exclusively selected from the following three options: - UNWINDER_ORC - UNWINDER_FRAME_POINTER - UNWINDER_GUESS However, arch/x86/configs/tiny.config only specifies the values of the last two. UNWINDER_ORC must be explicitly disabled. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240320154313.612342-1-masahiroy@kernel.org
2024-03-19Merge tag 'for-linus-6.9-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: - Xen event channel handling fix for a regression with a rare kernel config and some added hardening - better support of running Xen dom0 in PVH mode - a cleanup for the xen grant-dma-iommu driver * tag 'for-linus-6.9-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/events: increment refcnt only if event channel is refcounted xen/evtchn: avoid WARN() when unbinding an event channel x86/xen: attempt to inflate the memory balloon on PVH xen/grant-dma-iommu: Convert to platform remove callback returning void
2024-03-18x86/hyperv: Use Hyper-V entropy to seed guest random number generatorMichael Kelley
A Hyper-V host provides its guest VMs with entropy in a custom ACPI table named "OEM0". The entropy bits are updated each time Hyper-V boots the VM, and are suitable for seeding the Linux guest random number generator (rng). See a brief description of OEM0 in [1]. Generation 2 VMs on Hyper-V use UEFI to boot. Existing EFI code in Linux seeds the rng with entropy bits from the EFI_RNG_PROTOCOL. Via this path, the rng is seeded very early during boot with good entropy. The ACPI OEM0 table provided in such VMs is an additional source of entropy. Generation 1 VMs on Hyper-V boot from BIOS. For these VMs, Linux doesn't currently get any entropy from the Hyper-V host. While this is not fundamentally broken because Linux can generate its own entropy, using the Hyper-V host provided entropy would get the rng off to a better start and would do so earlier in the boot process. Improve the rng seeding for Generation 1 VMs by having Hyper-V specific code in Linux take advantage of the OEM0 table to seed the rng. For Generation 2 VMs, use the OEM0 table to provide additional entropy beyond the EFI_RNG_PROTOCOL. Because the OEM0 table is custom to Hyper-V, parse it directly in the Hyper-V code in the Linux kernel and use add_bootloader_randomness() to add it to the rng. Once the entropy bits are read from OEM0, zero them out in the table so they don't appear in /sys/firmware/acpi/tables/OEM0 in the running VM. The zero'ing is done out of an abundance of caution to avoid potential security risks to the rng. Also set the OEM0 data length to zero so a kexec or other subsequent use of the table won't try to use the zero'ed bits. [1] https://download.microsoft.com/download/1/c/9/1c9813b8-089c-4fef-b2ad-ad80e79403ba/Whitepaper%20-%20The%20Windows%2010%20random%20number%20generation%20infrastructure.pdf Signed-off-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20240318155408.216851-1-mhklinux@outlook.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20240318155408.216851-1-mhklinux@outlook.com>
2024-03-18x86/hyperv: Cosmetic changes for hv_spinlock.cPurna Pavan Chandra Aekkaladevi
Fix issues reported by checkpatch.pl script for hv_spinlock.c file. - Place __initdata after variable name - Add missing blank line after enum declaration No functional changes intended. Signed-off-by: Purna Pavan Chandra Aekkaladevi <paekkaladevi@linux.microsoft.com> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Link: https://lore.kernel.org/r/1710763751-14137-1-git-send-email-paekkaladevi@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1710763751-14137-1-git-send-email-paekkaladevi@linux.microsoft.com>
2024-03-18hyperv-tlfs: Rename some HV_REGISTER_* defines for consistencyNuno Das Neves
Rename HV_REGISTER_GUEST_OSID to HV_REGISTER_GUEST_OS_ID. This matches the existing HV_X64_MSR_GUEST_OS_ID. Rename HV_REGISTER_CRASH_* to HV_REGISTER_GUEST_CRASH_*. Including GUEST_ is consistent with other #defines such as HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE. The new names also match the TLFS document more accurately, i.e. HvRegisterGuestCrash*. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Link: https://lore.kernel.org/r/1710285687-9160-1-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1710285687-9160-1-git-send-email-nunodasneves@linux.microsoft.com>
2024-03-17Merge tag 'perf-urgent-2024-03-17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf event fixes from Ingo Molnar: - Work around AMD erratum to filter out bogus LBR stack entries - Fix incorrect PMU reset that can result in warnings (or worse) during suspend/hibernation * tag 'perf-urgent-2024-03-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/amd/core: Avoid register reset when CPU is dead perf/x86/amd/lbr: Discard erroneous branch entries
2024-03-16x86/CPU/AMD: Update the Zenbleed microcode revisionsBorislav Petkov (AMD)
Update them to the correct revision numbers. Fixes: 522b1d69219d ("x86/cpu/amd: Add a Zenbleed fix") Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-03-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "S390: - Changes to FPU handling came in via the main s390 pull request - Only deliver to the guest the SCLP events that userspace has requested - More virtual vs physical address fixes (only a cleanup since virtual and physical address spaces are currently the same) - Fix selftests undefined behavior x86: - Fix a restriction that the guest can't program a PMU event whose encoding matches an architectural event that isn't included in the guest CPUID. The enumeration of an architectural event only says that if a CPU supports an architectural event, then the event can be programmed *using the architectural encoding*. The enumeration does NOT say anything about the encoding when the CPU doesn't report support the event *in general*. It might support it, and it might support it using the same encoding that made it into the architectural PMU spec - Fix a variety of bugs in KVM's emulation of RDPMC (more details on individual commits) and add a selftest to verify KVM correctly emulates RDMPC, counter availability, and a variety of other PMC-related behaviors that depend on guest CPUID and therefore are easier to validate with selftests than with custom guests (aka kvm-unit-tests) - Zero out PMU state on AMD if the virtual PMU is disabled, it does not cause any bug but it wastes time in various cases where KVM would check if a PMC event needs to be synthesized - Optimize triggering of emulated events, with a nice ~10% performance improvement in VM-Exit microbenchmarks when a vPMU is exposed to the guest - Tighten the check for "PMI in guest" to reduce false positives if an NMI arrives in the host while KVM is handling an IRQ VM-Exit - Fix a bug where KVM would report stale/bogus exit qualification information when exiting to userspace with an internal error exit code - Add a VMX flag in /proc/cpuinfo to report 5-level EPT support - Rework TDP MMU root unload, free, and alloc to run with mmu_lock held for read, e.g. to avoid serializing vCPUs when userspace deletes a memslot - Tear down TDP MMU page tables at 4KiB granularity (used to be 1GiB). KVM doesn't support yielding in the middle of processing a zap, and 1GiB granularity resulted in multi-millisecond lags that are quite impolite for CONFIG_PREEMPT kernels - Allocate write-tracking metadata on-demand to avoid the memory overhead when a kernel is built with i915 virtualization support but the workloads use neither shadow paging nor i915 virtualization - Explicitly initialize a variety of on-stack variables in the emulator that triggered KMSAN false positives - Fix the debugregs ABI for 32-bit KVM - Rework the "force immediate exit" code so that vendor code ultimately decides how and when to force the exit, which allowed some optimization for both Intel and AMD - Fix a long-standing bug where kvm_has_noapic_vcpu could be left elevated if vCPU creation ultimately failed, causing extra unnecessary work - Cleanup the logic for checking if the currently loaded vCPU is in-kernel - Harden against underflowing the active mmu_notifier invalidation count, so that "bad" invalidations (usually due to bugs elsehwere in the kernel) are detected earlier and are less likely to hang the kernel x86 Xen emulation: - Overlay pages can now be cached based on host virtual address, instead of guest physical addresses. This removes the need to reconfigure and invalidate the cache if the guest changes the gpa but the underlying host virtual address remains the same - When possible, use a single host TSC value when computing the deadline for Xen timers in order to improve the accuracy of the timer emulation - Inject pending upcall events when the vCPU software-enables its APIC to fix a bug where an upcall can be lost (and to follow Xen's behavior) - Fall back to the slow path instead of warning if "fast" IRQ delivery of Xen events fails, e.g. if the guest has aliased xAPIC IDs RISC-V: - Support exception and interrupt handling in selftests - New self test for RISC-V architectural timer (Sstc extension) - New extension support (Ztso, Zacas) - Support userspace emulation of random number seed CSRs ARM: - Infrastructure for building KVM's trap configuration based on the architectural features (or lack thereof) advertised in the VM's ID registers - Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to x86's WC) at stage-2, improving the performance of interacting with assigned devices that can tolerate it - Conversion of KVM's representation of LPIs to an xarray, utilized to address serialization some of the serialization on the LPI injection path - Support for _architectural_ VHE-only systems, advertised through the absence of FEAT_E2H0 in the CPU's ID register - Miscellaneous cleanups, fixes, and spelling corrections to KVM and selftests LoongArch: - Set reserved bits as zero in CPUCFG - Start SW timer only when vcpu is blocking - Do not restart SW timer when it is expired - Remove unnecessary CSR register saving during enter guest - Misc cleanups and fixes as usual Generic: - Clean up Kconfig by removing CONFIG_HAVE_KVM, which was basically always true on all architectures except MIPS (where Kconfig determines the available depending on CPU capabilities). It is replaced either by an architecture-dependent symbol for MIPS, and IS_ENABLED(CONFIG_KVM) everywhere else - Factor common "select" statements in common code instead of requiring each architecture to specify it - Remove thoroughly obsolete APIs from the uapi headers - Move architecture-dependent stuff to uapi/asm/kvm.h - Always flush the async page fault workqueue when a work item is being removed, especially during vCPU destruction, to ensure that there are no workers running in KVM code when all references to KVM-the-module are gone, i.e. to prevent a very unlikely use-after-free if kvm.ko is unloaded - Grab a reference to the VM's mm_struct in the async #PF worker itself instead of gifting the worker a reference, so that there's no need to remember to *conditionally* clean up after the worker Selftests: - Reduce boilerplate especially when utilize selftest TAP infrastructure - Add basic smoke tests for SEV and SEV-ES, along with a pile of library support for handling private/encrypted/protected memory - Fix benign bugs where tests neglect to close() guest_memfd files" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits) selftests: kvm: remove meaningless assignments in Makefiles KVM: riscv: selftests: Add Zacas extension to get-reg-list test RISC-V: KVM: Allow Zacas extension for Guest/VM KVM: riscv: selftests: Add Ztso extension to get-reg-list test RISC-V: KVM: Allow Ztso extension for Guest/VM RISC-V: KVM: Forward SEED CSR access to user space KVM: riscv: selftests: Add sstc timer test KVM: riscv: selftests: Change vcpu_has_ext to a common function KVM: riscv: selftests: Add guest helper to get vcpu id KVM: riscv: selftests: Add exception handling support LoongArch: KVM: Remove unnecessary CSR register saving during enter guest LoongArch: KVM: Do not restart SW timer when it is expired LoongArch: KVM: Start SW timer only when vcpu is blocking LoongArch: KVM: Set reserved bits as zero in CPUCFG KVM: selftests: Explicitly close guest_memfd files in some gmem tests KVM: x86/xen: fix recursive deadlock in timer injection KVM: pfncache: simplify locking and make more self-contained KVM: x86/xen: remove WARN_ON_ONCE() with false positives in evtchn delivery KVM: x86/xen: inject vCPU upcall vector when local APIC is enabled KVM: x86/xen: improve accuracy of Xen timers ...
2024-03-15Merge tag 'devicetree-for-6.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree updates from Rob Herring: "DT core: - Add cleanup.h based auto release of struct device_node pointers via __free marking and new for_each_child_of_node_scoped() iterator to use it. - Always create a base skeleton DT when CONFIG_OF is enabled. This supports several usecases of adding DT data on non-DT booted systems. - Move around some /reserved-memory code in preparation for further improvements - Add a stub for_each_property_of_node() for !OF - Adjust the printk levels on some messages - Fix __be32 sparse warning - Drop RESERVEDMEM_OF_DECLARE usage from Freescale qbman driver (currently orphaned) - Add Saravana Kannan and drop Frank Rowand as DT maintainers DT bindings: - Convert Mediatek timer, Mediatek sysirq, fsl,imx6ul-tsc, fsl,imx6ul-pinctrl, Atmel AIC, Atmel HLCDC, FPGA region, and xlnx,sd-fec to DT schemas - Add existing, but undocumented fsl,imx-anatop binding - Add bunch of undocumented vendor prefixes used in compatible strings - Drop obsolete brcm,bcm2835-pm-wdt binding - Drop obsolete i2c.txt which as been replaced with schema in dtschema - Add DPS310 device and sort trivial-devices.yaml - Enable undocumented compatible checks on DT binding examples - More QCom maintainer fixes/updates - Updates to writing-schema.rst and DT submitting-patches.rst to cover some frequent review comments - Clean-up SPDX tags to use 'OR' rather than 'or'" * tag 'devicetree-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (56 commits) dt-bindings: soc: imx: fsl,imx-anatop: add imx6q regulators of: unittest: Use for_each_child_of_node_scoped() of: Introduce for_each_*_child_of_node_scoped() to automate of_node_put() handling of: Add cleanup.h based auto release via __free(device_node) markings of: Move all FDT reserved-memory handling into of_reserved_mem.c of: Add KUnit test to confirm DTB is loaded of: unittest: treat missing of_root as error instead of fixing up x86/of: Unconditionally call unflatten_and_copy_device_tree() um: Unconditionally call unflatten_device_tree() of: Create of_root if no dtb provided by firmware of: Always unflatten in unflatten_and_copy_device_tree() dt-bindings: timer: mediatek: Convert to json-schema dt-bindings: interrupt-controller: fsl,intmux: Include power-domains support soc: fsl: qbman: Remove RESERVEDMEM_OF_DECLARE usage dt-bindings: fsl-imx-sdma: fix HDMI audio index dt-bindings: soc: imx: fsl,imx-iomuxc-gpr: add imx6 dt-bindings: soc: imx: fsl,imx-anatop: add binding dt-bindings: input: touchscreen: fsl,imx6ul-tsc convert to YAML dt-bindings: pinctrl: fsl,imx6ul-pinctrl: convert to YAML of: make for_each_property_of_node() available to to !OF ...
2024-03-14Merge tag 'mm-nonmm-stable-2024-03-14-09-36' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - Kuan-Wei Chiu has developed the well-named series "lib min_heap: Min heap optimizations". - Kuan-Wei Chiu has also sped up the library sorting code in the series "lib/sort: Optimize the number of swaps and comparisons". - Alexey Gladkov has added the ability for code running within an IPC namespace to alter its IPC and MQ limits. The series is "Allow to change ipc/mq sysctls inside ipc namespace". - Geert Uytterhoeven has contributed some dhrystone maintenance work in the series "lib: dhry: miscellaneous cleanups". - Ryusuke Konishi continues nilfs2 maintenance work in the series "nilfs2: eliminate kmap and kmap_atomic calls" "nilfs2: fix kernel bug at submit_bh_wbc()" - Nathan Chancellor has updated our build tools requirements in the series "Bump the minimum supported version of LLVM to 13.0.1". - Muhammad Usama Anjum continues with the selftests maintenance work in the series "selftests/mm: Improve run_vmtests.sh". - Oleg Nesterov has done some maintenance work against the signal code in the series "get_signal: minor cleanups and fix". Plus the usual shower of singleton patches in various parts of the tree. Please see the individual changelogs for details. * tag 'mm-nonmm-stable-2024-03-14-09-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits) nilfs2: prevent kernel bug at submit_bh_wbc() nilfs2: fix failure to detect DAT corruption in btree and direct mappings ocfs2: enable ocfs2_listxattr for special files ocfs2: remove SLAB_MEM_SPREAD flag usage assoc_array: fix the return value in assoc_array_insert_mid_shortcut() buildid: use kmap_local_page() watchdog/core: remove sysctl handlers from public header nilfs2: use div64_ul() instead of do_div() mul_u64_u64_div_u64: increase precision by conditionally swapping a and b kexec: copy only happens before uchunk goes to zero get_signal: don't initialize ksig->info if SIGNAL_GROUP_EXIT/group_exec_task get_signal: hide_si_addr_tag_bits: fix the usage of uninitialized ksig get_signal: don't abuse ksig->info.si_signo and ksig->sig const_structs.checkpatch: add device_type Normalise "name (ad@dr)" MODULE_AUTHORs to "name <ad@dr>" dyndbg: replace kstrdup() + strchr() with kstrdup_and_replace() list: leverage list_is_head() for list_entry_is_head() nilfs2: MAINTAINERS: drop unreachable project mirror site smp: make __smp_processor_id() 0-argument macro fat: fix uninitialized field in nostale filehandles ...
2024-03-14Merge tag 'mm-stable-2024-03-13-20-04' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames from hotplugged memory rather than only from main memory. Series "implement "memmap on memory" feature on s390". - More folio conversions from Matthew Wilcox in the series "Convert memcontrol charge moving to use folios" "mm: convert mm counter to take a folio" - Chengming Zhou has optimized zswap's rbtree locking, providing significant reductions in system time and modest but measurable reductions in overall runtimes. The series is "mm/zswap: optimize the scalability of zswap rb-tree". - Chengming Zhou has also provided the series "mm/zswap: optimize zswap lru list" which provides measurable runtime benefits in some swap-intensive situations. - And Chengming Zhou further optimizes zswap in the series "mm/zswap: optimize for dynamic zswap_pools". Measured improvements are modest. - zswap cleanups and simplifications from Yosry Ahmed in the series "mm: zswap: simplify zswap_swapoff()". - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has contributed several DAX cleanups as well as adding a sysfs tunable to control the memmap_on_memory setting when the dax device is hotplugged as system memory. - Johannes Weiner has added the large series "mm: zswap: cleanups", which does that. - More DAMON work from SeongJae Park in the series "mm/damon: make DAMON debugfs interface deprecation unignorable" "selftests/damon: add more tests for core functionalities and corner cases" "Docs/mm/damon: misc readability improvements" "mm/damon: let DAMOS feeds and tame/auto-tune itself" - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs extension" Rakie Kim has developed a new mempolicy interleaving policy wherein we allocate memory across nodes in a weighted fashion rather than uniformly. This is beneficial in heterogeneous memory environments appearing with CXL. - Christophe Leroy has contributed some cleanup and consolidation work against the ARM pagetable dumping code in the series "mm: ptdump: Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute". - Luis Chamberlain has added some additional xarray selftesting in the series "test_xarray: advanced API multi-index tests". - Muhammad Usama Anjum has reworked the selftest code to make its human-readable output conform to the TAP ("Test Anything Protocol") format. Amongst other things, this opens up the use of third-party tools to parse and process out selftesting results. - Ryan Roberts has added fork()-time PTE batching of THP ptes in the series "mm/memory: optimize fork() with PTE-mapped THP". Mainly targeted at arm64, this significantly speeds up fork() when the process has a large number of pte-mapped folios. - David Hildenbrand also gets in on the THP pte batching game in his series "mm/memory: optimize unmap/zap with PTE-mapped THP". It implements batching during munmap() and other pte teardown situations. The microbenchmark improvements are nice. - And in the series "Transparent Contiguous PTEs for User Mappings" Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte mappings"). Kernel build times on arm64 improved nicely. Ryan's series "Address some contpte nits" provides some followup work. - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has fixed an obscure hugetlb race which was causing unnecessary page faults. He has also added a reproducer under the selftest code. - In the series "selftests/mm: Output cleanups for the compaction test", Mark Brown did what the title claims. - Kinsey Ho has added the series "mm/mglru: code cleanup and refactoring". - Even more zswap material from Nhat Pham. The series "fix and extend zswap kselftests" does as claimed. - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX regression" Mathieu Desnoyers has cleaned up and fixed rather a mess in our handling of DAX on archiecctures which have virtually aliasing data caches. The arm architecture is the main beneficiary. - Lokesh Gidra's series "per-vma locks in userfaultfd" provides dramatic improvements in worst-case mmap_lock hold times during certain userfaultfd operations. - Some page_owner enhancements and maintenance work from Oscar Salvador in his series "page_owner: print stacks and their outstanding allocations" "page_owner: Fixup and cleanup" - Uladzislau Rezki has contributed some vmalloc scalability improvements in his series "Mitigate a vmap lock contention". It realizes a 12x improvement for a certain microbenchmark. - Some kexec/crash cleanup work from Baoquan He in the series "Split crash out from kexec and clean up related config items". - Some zsmalloc maintenance work from Chengming Zhou in the series "mm/zsmalloc: fix and optimize objects/page migration" "mm/zsmalloc: some cleanup for get/set_zspage_mapping()" - Zi Yan has taught the MM to perform compaction on folios larger than order=0. This a step along the path to implementaton of the merging of large anonymous folios. The series is named "Enable >0 order folio memory compaction". - Christoph Hellwig has done quite a lot of cleanup work in the pagecache writeback code in his series "convert write_cache_pages() to an iterator". - Some modest hugetlb cleanups and speedups in Vishal Moola's series "Handle hugetlb faults under the VMA lock". - Zi Yan has changed the page splitting code so we can split huge pages into sizes other than order-0 to better utilize large folios. The series is named "Split a folio to any lower order folios". - David Hildenbrand has contributed the series "mm: remove total_mapcount()", a cleanup. - Matthew Wilcox has sought to improve the performance of bulk memory freeing in his series "Rearrange batched folio freeing". - Gang Li's series "hugetlb: parallelize hugetlb page init on boot" provides large improvements in bootup times on large machines which are configured to use large numbers of hugetlb pages. - Matthew Wilcox's series "PageFlags cleanups" does that. - Qi Zheng's series "minor fixes and supplement for ptdesc" does that also. S390 is affected. - Cleanups to our pagemap utility functions from Peter Xu in his series "mm/treewide: Replace pXd_large() with pXd_leaf()". - Nico Pache has fixed a few things with our hugepage selftests in his series "selftests/mm: Improve Hugepage Test Handling in MM Selftests". - Also, of course, many singleton patches to many things. Please see the individual changelogs for details. * tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits) mm/zswap: remove the memcpy if acomp is not sleepable crypto: introduce: acomp_is_async to expose if comp drivers might sleep memtest: use {READ,WRITE}_ONCE in memory scanning mm: prohibit the last subpage from reusing the entire large folio mm: recover pud_leaf() definitions in nopmd case selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements selftests/mm: skip uffd hugetlb tests with insufficient hugepages selftests/mm: dont fail testsuite due to a lack of hugepages mm/huge_memory: skip invalid debugfs new_order input for folio split mm/huge_memory: check new folio order when split a folio mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure mm: add an explicit smp_wmb() to UFFDIO_CONTINUE mm: fix list corruption in put_pages_list mm: remove folio from deferred split list before uncharging it filemap: avoid unnecessary major faults in filemap_fault() mm,page_owner: drop unnecessary check mm,page_owner: check for null stack_record before bumping its refcount mm: swap: fix race between free_swap_and_cache() and swapoff() mm/treewide: align up pXd_leaf() retval across archs mm/treewide: drop pXd_large() ...
2024-03-14Merge tag 'probes-v6.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes updates from Masami Hiramatsu: "x86 kprobes: - Use boolean for some function return instead of 0 and 1 - Prohibit probing on INT/UD. This prevents user to put kprobe on INTn/INT1/INT3/INTO and UD0/UD1/UD2 because these are used for a special purpose in the kernel - Boost Grp instructions. Because a few percent of kernel instructions are Grp 2/3/4/5 and those are safe to be executed without ip register fixup, allow those to be boosted (direct execution on the trampoline buffer with a JMP) tracing: - Add function argument access from return events (kretprobe and fprobe). This allows user to compare how a data structure field is changed after executing a function. With BTF, return event also accepts function argument access by name. - Fix a wrong comment (using "Kretprobe" in fprobe) - Cleanup a big probe argument parser function into three parts, type parser, post-processing function, and main parser - Cleanup to set nr_args field when initializing trace_probe instead of counting up it while parsing - Cleanup a redundant #else block from tracefs/README source code - Update selftests to check entry argument access from return probes - Documentation update about entry argument access from return probes" * tag 'probes-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: Documentation: tracing: Add entry argument access at function exit selftests/ftrace: Add test cases for entry args at function exit tracing/probes: Support $argN in return probe (kprobe and fprobe) tracing: Remove redundant #else block for BTF args from README tracing/probes: cleanup: Set trace_probe::nr_args at trace_probe_init tracing/probes: Cleanup probe argument parser tracing/fprobe-event: cleanup: Fix a wrong comment in fprobe event x86/kprobes: Boost more instructions from grp2/3/4/5 x86/kprobes: Prohibit kprobing on INT and UD x86/kprobes: Refactor can_{probe,boost} return type to bool
2024-03-14Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "The major features are support for LPA2 (52-bit VA/PA with 4K and 16K pages), the dpISA extension and Rust enabled on arm64. The changes are mostly contained within the usual arch/arm64/, drivers/perf, the arm64 Documentation and kselftests. The exception is the Rust support which touches some generic build files. Summary: - Reorganise the arm64 kernel VA space and add support for LPA2 (at stage 1, KVM stage 2 was merged earlier) - 52-bit VA/PA address range with 4KB and 16KB pages - Enable Rust on arm64 - Support for the 2023 dpISA extensions (data processing ISA), host only - arm64 perf updates: - StarFive's StarLink (integrates one or more CPU cores with a shared L3 memory system) PMU support - Enable HiSilicon Erratum 162700402 quirk for HIP09 - Several updates for the HiSilicon PCIe PMU driver - Arm CoreSight PMU support - Convert all drivers under drivers/perf/ to use .remove_new() - Miscellaneous: - Don't enable workarounds for "rare" errata by default - Clean up the DAIF flags handling for EL0 returns (in preparation for NMI support) - Kselftest update for ptrace() - Update some of the sysreg field definitions - Slight improvement in the code generation for inline asm I/O accessors to permit offset addressing - kretprobes: acquire regs via a BRK exception (previously done via a trampoline handler) - SVE/SME cleanups, comment updates - Allow CALL_OPS+CC_OPTIMIZE_FOR_SIZE with clang (previously disabled due to gcc silently ignoring -falign-functions=N)" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (134 commits) Revert "mm: add arch hook to validate mmap() prot flags" Revert "arm64: mm: add support for WXN memory translation attribute" Revert "ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512" ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512 kselftest/arm64: Add 2023 DPISA hwcap test coverage kselftest/arm64: Add basic FPMR test kselftest/arm64: Handle FPMR context in generic signal frame parser arm64/hwcap: Define hwcaps for 2023 DPISA features arm64/ptrace: Expose FPMR via ptrace arm64/signal: Add FPMR signal handling arm64/fpsimd: Support FEAT_FPMR arm64/fpsimd: Enable host kernel access to FPMR arm64/cpufeature: Hook new identification registers up to cpufeature docs: perf: Fix build warning of hisi-pcie-pmu.rst perf: starfive: Only allow COMPILE_TEST for 64-bit architectures MAINTAINERS: Add entry for StarFive StarLink PMU docs: perf: Add description for StarFive's StarLink PMU dt-bindings: perf: starfive: Add JH8100 StarLink PMU perf: starfive: Add StarLink PMU support docs: perf: Update usage for target filter of hisi-pcie-pmu ...
2024-03-14Merge tag 'pci-v6.9-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Consolidate interrupt related code in irq.c (Ilpo Järvinen) - Reduce kernel size by replacing sysfs resource macros with functions (Ilpo Järvinen) - Reduce kernel size by compiling sysfs support only when CONFIG_SYSFS=y (Lukas Wunner) - Avoid using Extended Tags on 3ware-9650SE Root Port to work around an apparent hardware defect (Jörg Wedekind) Resource management: - Fix an MMIO mapping leak in pci_iounmap() (Philipp Stanner) - Move pci_iomap.c and other PCI-specific devres code to drivers/pci (Philipp Stanner) - Consolidate PCI devres code in devres.c (Philipp Stanner) Power management: - Avoid D3cold on Asus B1400 PCI-NVMe bridge, where firmware doesn't know how to return correctly to D0, and remove previous quirk that wasn't as specific (Daniel Drake) - Allow runtime PM when the driver enables it but doesn't need any runtime PM callbacks (Raag Jadav) - Drain runtime-idle callbacks before driver removal to avoid races between .remove() and .runtime_idle(), which caused intermittent page faults when the rtsx .runtime_idle() accessed registers that its .remove() had already unmapped (Rafael J. Wysocki) Virtualization: - Avoid Secondary Bus Reset on LSI FW643 so it can be assigned to VMs with VFIO, e.g., for professional audio software on many Apple machines, at the cost of leaking state between VMs (Edmund Raile) Error handling: - Print all logged TLP Prefixes, not just the first, after AER or DPC errors (Ilpo Järvinen) - Quirk the DPC PIO log size for Intel Raptor Lake Root Ports, which still don't advertise a legal size (Paul Menzel) - Ignore expected DPC Surprise Down errors on hot removal (Smita Koralahalli) - Block runtime suspend while handling AER errors to avoid races that prevent the device form being resumed from D3hot (Stanislaw Gruszka) Peer-to-peer DMA: - Use atomic XA allocation in RCU read section (Christophe JAILLET) ASPM: - Collect bits of ASPM-related code that we need even without CONFIG_PCIEASPM into aspm.c (David E. Box) - Save/restore L1 PM Substates config for suspend/resume (David E. Box) - Update save_save when ASPM config is changed, so a .slot_reset() during error recovery restores the changed config, not the .probe()-time config (Vidya Sagar) Endpoint framework: - Refactor and improve pci_epf_alloc_space() API (Niklas Cassel) - Clean up endpoint BAR descriptions (Niklas Cassel) - Fix ntb_register_device() name leak in error path (Yang Yingliang) - Return actual error code for pci_vntb_probe() failure (Yang Yingliang) Broadcom STB PCIe controller driver: - Fix MDIO write polling, which previously never waited for completion (Jonathan Bell) Cadence PCIe endpoint driver: - Clear the ARI "Next Function Number" of last function (Jasko-EXT Wojciech) Freescale i.MX6 PCIe controller driver: - Simplify by replacing switch statements with function pointers for different hardware variants (Frank Li) - Simplify by using clk_bulk*() API (Frank Li) - Remove redundant DT clock and reg/reg-name details (Frank Li) - Add i.MX95 DT and driver support for both Root Complex and Endpoint mode (Frank Li) Microsoft Hyper-V host bridge driver: - Reduce memory usage by limiting ring buffer size to 16KB instead of 4 pages (Michael Kelley) Qualcomm PCIe controller driver: - Add X1E80100 DT and driver support (Abel Vesa) - Add DT 'required-opps' for SoCs that require a minimum performance level (Johan Hovold) - Make DT 'msi-map-mask' optional, depending on how MSI interrupts are mapped (Johan Hovold) - Disable ASPM L0s for sc8280xp, sa8540p and sa8295p because the PHY configuration isn't tuned correctly for L0s (Johan Hovold) - Split dt-binding qcom,pcie.yaml into qcom,pcie-common.yaml and separate files for SA8775p, SC7280, SC8180X, SC8280XP, SM8150, SM8250, SM8350, SM8450, SM8550 for easier reviewing (Krzysztof Kozlowski) - Enable BDF to SID translation by disabling bypass mode (Manivannan Sadhasivam) - Add endpoint MHI support for Snapdragon SA8775P SoC (Mrinmay Sarkar) Synopsys DesignWare PCIe controller driver: - Allocate 64-bit MSI address if no 32-bit address is available (Ajay Agarwal) - Fix endpoint Resizable BAR to actually advertise the required 1MB size (Niklas Cassel) MicroSemi Switchtec management driver: - Release resources if the .probe() fails (Christophe JAILLET) Miscellaneous: - Make pcie_port_bus_type const (Ricardo B. Marliere)" * tag 'pci-v6.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (77 commits) PCI/ASPM: Update save_state when configuration changes PCI/ASPM: Disable L1 before configuring L1 Substates PCI/ASPM: Call pci_save_ltr_state() from pci_save_pcie_state() PCI/ASPM: Save L1 PM Substates Capability for suspend/resume PCI: hv: Fix ring buffer size calculation PCI: dwc: endpoint: Fix advertised resizable BAR size PCI: cadence: Clear the ARI Capability Next Function Number of the last function PCI: dwc: Strengthen the MSI address allocation logic PCI: brcmstb: Fix broken brcm_pcie_mdio_write() polling PCI: qcom: Add X1E80100 PCIe support dt-bindings: PCI: qcom: Document the X1E80100 PCIe Controller PCI: qcom: Enable BDF to SID translation properly PCI/AER: Generalize TLP Header Log reading PCI/AER: Use explicit register size for PCI_ERR_CAP PCI: qcom: Disable ASPM L0s for sc8280xp, sa8540p and sa8295p dt-bindings: PCI: qcom: Do not require 'msi-map-mask' dt-bindings: PCI: qcom: Allow 'required-opps' PCI/AER: Block runtime suspend when handling errors PCI/ASPM: Move pci_save_ltr_state() to aspm.c PCI/ASPM: Always build aspm.c ...
2024-03-14Merge tag 'platform-drivers-x86-v6.9-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver updates from Ilpo Järvinen: - New acer-wmi HW support - Support for new revision of amd/pmf heartbeat notify - Correctly handle asus-wmi HW without LEDs - fujitsu-laptop battery charge control support - Support for new hp-wmi thermal profiles - Support ideapad-laptop refresh rate key - Put intel/pmc AI accelerator (GNA) into D3 if it has no driver to allow entry into low-power modes, and temporarily removed Lunar Lake SSRAM support due to breaking FW changes causing probe fail (further breaking FW changes are still pending) - Report pmc/punit_atom devices that prevent reacing low power levels - Surface Fan speed function support - Support for more sperial keys and complete the list of models with non-standard fan registers in thinkpad_acpi - New DMI touchscreen HW support - Continued modernization efforts of wmi - Removal of obsoleted ledtrig-audio call and the related dependency - Debug & metrics interface improvements - Miscellaneous cleanups / fixes / improvements * tag 'platform-drivers-x86-v6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (87 commits) platform/x86/intel/pmc: Improve PKGC residency counters debug platform/x86: asus-wmi: Consider device is absent when the read is ~0 Documentation/x86/amd/hsmp: Updating urls platform/mellanox: mlxreg-hotplug: Remove redundant NULL-check platform/x86/amd/pmf: Update sps power thermals according to the platform-profiles platform/x86/amd/pmf: Add support to get sps default APTS index values platform/x86/amd/pmf: Add support to get APTS index numbers for static slider platform/x86/amd/pmf: Add support to notify sbios heart beat event platform/x86/amd/pmf: Add support to get sbios requests in PMF driver platform/x86/amd/pmf: Disable debugfs support for querying power thermals platform/x86/amd/pmf: Differentiate PMF ACPI versions x86/platform/atom: Check state of Punit managed devices on s2idle platform/x86: pmc_atom: Check state of PMC clocks on s2idle platform/x86: pmc_atom: Check state of PMC managed devices on s2idle platform/x86: pmc_atom: Annotate d3_sts register bit defines clk: x86: Move clk-pmc-atom register defines to include/linux/platform_data/x86/pmc_atom.h platform/x86: make fw_attr_class constant platform/x86/intel/tpmi: Change vsec offset to u64 platform/x86: intel_scu_pcidrv: Remove unused intel-mid.h platform/x86: intel_scu_wdt: Remove unused intel-mid.h ...
2024-03-13Merge tag 'efi-next-for-v6.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI updates from Ard Biesheuvel: - Measure initrd and command line using the CC protocol if the ordinary TCG2 protocol is not implemented, typically on TDX confidential VMs - Avoid creating mappings that are both writable and executable while running in the EFI boot services. This is a prerequisite for getting the x86 shim loader signed by MicroSoft again, which allows the distros to install on x86 PCs that ship with EFI secure boot enabled. - API update for struct platform_driver::remove() * tag 'efi-next-for-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: virt: efi_secret: Convert to platform remove callback returning void x86/efistub: Remap kernel text read-only before dropping NX attribute efi/libstub: Add get_event_log() support for CC platforms efi/libstub: Measure into CC protocol if TCG2 protocol is absent efi/libstub: Add Confidential Computing (CC) measurement typedefs efi/tpm: Use symbolic GUID name from spec for final events table efi/libstub: Use TPM event typedefs from the TCG PC Client spec
2024-03-13Merge tag 'acpi-6.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "These modify the ACPI device events and processor enumeration code to take the 'enabled' _STA bit into account as mandated by the ACPI specification, convert several platform drivers to using a remove callback that returns void, add some new quirks for ACPI IRQ override and other things, address assorted issues and clean up code. Specifics: - Rearrange Device Check and Bus Check notification handling in the ACPI device hotplug code to make it get the "enabled" _STA bit into account (Rafael Wysocki) - Modify acpi_processor_add() to skip processors with the "enabled" _STA bit clear, as per the specification (Rafael Wysocki) - Stop failing Device Check notification handling without a valid reason (Rafael Wysocki) - Defer enumeration of devices that depend on a device with an ACPI device ID equalt to INTC10CF to address probe ordering issues on some platforms (Wentong Wu) - Constify acpi_bus_type (Ricardo Marliere) - Make the ACPI-specific suspend-to-idle code take the Low-Power S0 Idle MSFT UUID into account on non-AMD systems (Rafael Wysocki) - Add ACPI IRQ override quirks for some new platforms (Sergey Kalinichev, Maxim Kudinov, Alexey Froloff, Sviatoslav Harasymchuk, Nicolas Haye) - Make the NFIT parsing code use acpi_evaluate_dsm_typed() (Andy Shevchenko) - Fix a memory leak in acpi_processor_power_exit() (Armin Wolf) - Make it possible to quirk the CSI-2 and MIPI DisCo for Imaging properties parsing and add a quirk for Dell XPS 9315 (Sakari Ailus) - Prevent false-positive static checker warnings from triggering by intializing some variables in the ACPI thermal code to zero (Colin Ian King) - Add DELL0501 handling to acpi_quirk_skip_serdev_enumeration() and make that function generic (Hans de Goede) - Make the ACPI backlight code handle fetching EDID that is longer than 256 bytes (Mario Limonciello) - Skip initialization of GHES_ASSIST structures for Machine Check Architecture in APEI (Avadhut Naik) - Convert several plaform drivers in the ACPI subsystem to using a remove callback that returns void (Uwe Kleine-König) - Drop the long-deprecated custom_method debugfs interface that is problematic from the security standpoint (Rafael Wysocki) - Use %pe in a couple of places in the ACPI code for easier error decoding (Onkarnath) - Fix register width information handling during system memory accesses in the ACPI CPPC library (Jarred White) - Add AMD CPPC V2 support for family 17h processors to the ACPI CPPC library (Perry Yuan)" * tag 'acpi-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (35 commits) ACPI: resource: Use IRQ override on Maibenben X565 ACPI: CPPC: Use access_width over bit_width for system memory accesses ACPI: CPPC: enable AMD CPPC V2 support for family 17h processors ACPI: APEI: Skip initialization of GHES_ASSIST structures for Machine Check Architecture ACPI: scan: Consolidate Device Check and Bus Check notification handling ACPI: scan: Rework Device Check and Bus Check notification handling ACPI: scan: Make acpi_processor_add() check the device enabled bit ACPI: scan: Relocate acpi_bus_trim_one() ACPI: scan: Fix device check notification handling ACPI: resource: Add MAIBENBEN X577 to irq1_edge_low_force_override ACPI: pfr_update: Convert to platform remove callback returning void ACPI: pfr_telemetry: Convert to platform remove callback returning void ACPI: fan: Convert to platform remove callback returning void ACPI: GED: Convert to platform remove callback returning void ACPI: DPTF: Convert to platform remove callback returning void ACPI: AGDI: Convert to platform remove callback returning void ACPI: TAD: Convert to platform remove callback returning void ACPI: APEI: GHES: Convert to platform remove callback returning void ACPI: property: Polish ignoring bad data nodes ACPI: thermal_lib: Initialize temp_decik to zero ...