summaryrefslogtreecommitdiff
path: root/arch/x86/kernel
AgeCommit message (Collapse)Author
2020-08-03Merge tag 'x86-cleanups-2020-08-03' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Misc cleanups all around the place" * tag 'x86-cleanups-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ioperm: Initialize pointer bitmap with NULL rather than 0 x86: uv: uv_hub.h: Delete duplicated word x86: cmpxchg_32.h: Delete duplicated word x86: bootparam.h: Delete duplicated word x86/mm: Remove the unused mk_kernel_pgd() #define x86/tsc: Remove unused "US_SCALE" and "NS_SCALE" leftover macros x86/ioapic: Remove unused "IOAPIC_AUTO" define x86/mm: Drop unused MAX_PHYSADDR_BITS x86/msr: Move the F15h MSRs where they belong x86/idt: Make idt_descr static initrd: Remove erroneous comment x86/mm/32: Fix -Wmissing prototypes warnings for init.c cpu/speculation: Add prototype for cpu_show_srbds() x86/mm: Fix -Wmissing-prototypes warnings for arch/x86/mm/init.c x86/asm: Unify __ASSEMBLY__ blocks x86/cpufeatures: Mark two free bits in word 3 x86/msr: Lift AMD family 0x15 power-specific MSRs
2020-08-03Merge tag 'x86-alternatives-2020-08-03' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/alternatives update from Ingo Molnar: "A single commit that improves the alternatives patching syslog debug output" * tag 'x86-alternatives-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/alternatives: Add pr_fmt() to debug macros
2020-08-03Merge tag 'sched-core-2020-08-03' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Improve uclamp performance by using a static key for the fast path - Add the "sched_util_clamp_min_rt_default" sysctl, to optimize for better power efficiency of RT tasks on battery powered devices. (The default is to maximize performance & reduce RT latencies.) - Improve utime and stime tracking accuracy, which had a fixed boundary of error, which created larger and larger relative errors as the values become larger. This is now replaced with more precise arithmetics, using the new mul_u64_u64_div_u64() helper in math64.h. - Improve the deadline scheduler, such as making it capacity aware - Improve frequency-invariant scheduling - Misc cleanups in energy/power aware scheduling - Add sched_update_nr_running tracepoint to track changes to nr_running - Documentation additions and updates - Misc cleanups and smaller fixes * tag 'sched-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) sched/doc: Factorize bits between sched-energy.rst & sched-capacity.rst sched/doc: Document capacity aware scheduling sched: Document arch_scale_*_capacity() arm, arm64: Fix selection of CONFIG_SCHED_THERMAL_PRESSURE Documentation/sysctl: Document uclamp sysctl knobs sched/uclamp: Add a new sysctl to control RT default boost value sched/uclamp: Fix a deadlock when enabling uclamp static key sched: Remove duplicated tick_nohz_full_enabled() check sched: Fix a typo in a comment sched/uclamp: Remove unnecessary mutex_init() arm, arm64: Select CONFIG_SCHED_THERMAL_PRESSURE sched: Cleanup SCHED_THERMAL_PRESSURE kconfig entry arch_topology, sched/core: Cleanup thermal pressure definition trace/events/sched.h: fix duplicated word linux/sched/mm.h: drop duplicated words in comments smp: Fix a potential usage of stale nr_cpus sched/fair: update_pick_idlest() Select group with lowest group_util when idle_cpus are equal sched: nohz: stop passing around unused "ticks" parameter. sched: Better document ttwu() sched: Add a tracepoint to track rq->nr_running ...
2020-08-03Merge tag 'perf-core-2020-08-03' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf event updates from Ingo Molnar: "HW support updates: - Add uncore support for Intel Comet Lake - Add RAPL support for Hygon Fam18h - Add Intel "IIO stack to PMON mapping" support on Skylake-SP CPUs, which enumerates per device performance counters via sysfs and enables the perf stat --iiostat functionality - Add support for Intel "Architectural LBRs", which generalized the model specific LBR hardware tracing feature into a model-independent, architected performance monitoring feature. Usage is mostly seamless to tooling, as the pre-existing LBR features are kept, but there's a couple of advantages under the hood, such as faster context-switching, faster LBR reads, cleaner exposure of LBR features to guest kernels, etc. ( Since architectural LBRs are supported via XSAVE, there's related changes to the x86 FPU code as well. ) ftrace/perf updates: - Add support to add a text poke event to record changes to kernel text (i.e. self-modifying code) in order to support tracers like Intel PT decoding through jump labels, kprobes and ftrace trampolines. Misc cleanups, smaller fixes..." * tag 'perf-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits) perf/x86/rapl: Add Hygon Fam18h RAPL support kprobes: Remove unnecessary module_mutex locking from kprobe_optimizer() x86/perf: Fix a typo perf: <linux/perf_event.h>: drop a duplicated word perf/x86/intel/lbr: Support XSAVES for arch LBR read perf/x86/intel/lbr: Support XSAVES/XRSTORS for LBR context switch x86/fpu/xstate: Add helpers for LBR dynamic supervisor feature x86/fpu/xstate: Support dynamic supervisor feature for LBR x86/fpu: Use proper mask to replace full instruction mask perf/x86: Remove task_ctx_size perf/x86/intel/lbr: Create kmem_cache for the LBR context data perf/core: Use kmem_cache to allocate the PMU specific data perf/core: Factor out functions to allocate/free the task_ctx_data perf/x86/intel/lbr: Support Architectural LBR perf/x86/intel/lbr: Factor out intel_pmu_store_lbr perf/x86/intel/lbr: Factor out rdlbr_all() and wrlbr_all() perf/x86/intel/lbr: Mark the {rd,wr}lbr_{to,from} wrappers __always_inline perf/x86/intel/lbr: Unify the stored format of LBR information perf/x86/intel/lbr: Support LBR_CTL perf/x86: Expose CPUID enumeration bits for arch LBR ...
2020-08-03Merge tag 'locking-core-2020-08-03' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: - LKMM updates: mostly documentation changes, but also some new litmus tests for atomic ops. - KCSAN updates: the most important change is that GCC 11 now has all fixes in place to support KCSAN, so GCC support can be enabled again. Also more annotations. - futex updates: minor cleanups and simplifications - seqlock updates: merge preparatory changes/cleanups for the 'associated locks' facilities. - lockdep updates: - simplify IRQ trace event handling - add various new debug checks - simplify header dependencies, split out <linux/lockdep_types.h>, decouple lockdep from other low level headers some more - fix NMI handling - misc cleanups and smaller fixes * tag 'locking-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits) kcsan: Improve IRQ state trace reporting lockdep: Refactor IRQ trace events fields into struct seqlock: lockdep assert non-preemptibility on seqcount_t write lockdep: Add preemption enabled/disabled assertion APIs seqlock: Implement raw_seqcount_begin() in terms of raw_read_seqcount() seqlock: Add kernel-doc for seqcount_t and seqlock_t APIs seqlock: Reorder seqcount_t and seqlock_t API definitions seqlock: seqcount_t latch: End read sections with read_seqcount_retry() seqlock: Properly format kernel-doc code samples Documentation: locking: Describe seqlock design and usage locking/qspinlock: Do not include atomic.h from qspinlock_types.h locking/atomic: Move ATOMIC_INIT into linux/types.h lockdep: Move list.h inclusion into lockdep.h locking/lockdep: Fix TRACE_IRQFLAGS vs. NMIs futex: Remove unused or redundant includes futex: Consistently use fshared as boolean futex: Remove needless goto's futex: Remove put_futex_key() rwsem: fix commas in initialisation docs: locking: Replace HTTP links with HTTPS ones ...
2020-08-03Merge tag 'irq-urgent-2020-08-02' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Ingo Molnar: "Fix a recent IRQ affinities regression, add in a missing debugfs printout that helps the debugging of IRQ affinity logic bugs, and fix a memory leak" * tag 'irq-urgent-2020-08-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/debugfs: Add missing irqchip flags genirq/affinity: Make affinity setting if activated opt-in irqdomain/treewide: Free firmware node after domain removal
2020-07-31Merge branch 'linus' into locking/core, to resolve conflictIngo Molnar
Conflicts: arch/arm/include/asm/percpu.h As Stephen Rothwell noted, there's a conflict between this commit in locking/core: a21ee6055c30 ("lockdep: Change hardirq{s_enabled,_context} to per-cpu variables") and this fresh upstream commit: aa54ea903abb ("ARM: percpu.h: fix build error") a21ee6055c30 is a simpler solution to the dependency problem and doesn't further increase header hell - so this conflict resolution effectively reverts aa54ea903abb and uses the a21ee6055c30 solution. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-07-30initrd: remove support for multiple floppiesChristoph Hellwig
Remove the special handling for multiple floppies in the initrd code. No one should be using floppies for booting these days. (famous last words..) Includes a spelling fix from Colin Ian King <colin.king@canonical.com>. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-29x86/i8259: Use printk_deferred() to prevent deadlockx86-urgent-2020-08-02Thomas Gleixner
0day reported a possible circular locking dependency: Chain exists of: &irq_desc_lock_class --> console_owner --> &port_lock_key Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&port_lock_key); lock(console_owner); lock(&port_lock_key); lock(&irq_desc_lock_class); The reason for this is a printk() in the i8259 interrupt chip driver which is invoked with the irq descriptor lock held, which reverses the lock operations vs. printk() from arbitrary contexts. Switch the printk() to printk_deferred() to avoid that. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87365abt2v.fsf@nanos.tec.linutronix.de
2020-07-28Merge tag 'v5.8-rc7' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-07-27x86: switch to ->regset_get()Al Viro
All instances of ->get() in arch/x86 switched; that might or might not be worth splitting up. Notes: * for xstateregs_get() the amount we want to store is determined at the boot time; see init_xstate_size() and update_regset_xstate_info() for details. task->thread.fpu.state.xsave ends with a flexible array member and the amount of data in it depends upon the FPU features supported/enabled. * fpregs_get() writes slightly less than full ->thread.fpu.state.fsave (the last word is not copied); we pass the full size of state.fsave and let membuf_write() trim to the amount declared by regset - __regset_get() will make sure that the space in buffer is no more than that. * copy_xstate_to_user() and its helpers are gone now. * fpregs_soft_get() was getting user_regset_copyout() arguments wrong. Since "x86: x86 user_regset math_emu" back in 2008... I really doubt that it's worth splitting out for -stable, though - you need a 486SX box for that to trigger... [Kevin's braino fix for copy_xstate_to_kernel() essentially duplicated here] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-07-27genirq/affinity: Make affinity setting if activated opt-inThomas Gleixner
John reported that on a RK3288 system the perf per CPU interrupts are all affine to CPU0 and provided the analysis: "It looks like what happens is that because the interrupts are not per-CPU in the hardware, armpmu_request_irq() calls irq_force_affinity() while the interrupt is deactivated and then request_irq() with IRQF_PERCPU | IRQF_NOBALANCING. Now when irq_startup() runs with IRQ_STARTUP_NORMAL, it calls irq_setup_affinity() which returns early because IRQF_PERCPU and IRQF_NOBALANCING are set, leaving the interrupt on its original CPU." This was broken by the recent commit which blocked interrupt affinity setting in hardware before activation of the interrupt. While this works in general, it does not work for this particular case. As contrary to the initial analysis not all interrupt chip drivers implement an activate callback, the safe cure is to make the deferred interrupt affinity setting at activation time opt-in. Implement the necessary core logic and make the two irqchip implementations for which this is required opt-in. In hindsight this would have been the right thing to do, but ... Fixes: baedb87d1b53 ("genirq/affinity: Handle affinity setting on inactive interrupts correctly") Reported-by: John Keeping <john@metanate.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/87blk4tzgm.fsf@nanos.tec.linutronix.de
2020-07-27x86/cpu: Relocate sync_core() to sync_core.hRicardo Neri
Having sync_core() in processor.h is problematic since it is not possible to check for hardware capabilities via the *cpu_has() family of macros. The latter needs the definitions in processor.h. It also looks more intuitive to relocate the function to sync_core.h. This changeset does not make changes in functionality. Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20200727043132.15082-3-ricardo.neri-calderon@linux.intel.com
2020-07-27Merge tag 'v5.8-rc7' into x86/cpu, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-07-26Merge branch 'x86/urgent' into x86/cleanupsIngo Molnar
Refresh the branch for a dependent commit. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-07-26Merge branch 'locking/nmi' into x86/entryIngo Molnar
Resolve conflicts with ongoing lockdep work that fixed the NMI entry code. Conflicts: arch/x86/entry/common.c arch/x86/include/asm/idtentry.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-07-25Merge tag 'v5.8-rc6' into locking/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-07-25x86/split_lock: Enable the split lock feature on Sapphire Rapids and Alder ↵Fenghua Yu
Lake CPUs Add Sapphire Rapids and Alder Lake processors to CPU list to enumerate and enable the split lock feature. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/1595634320-79689-1-git-send-email-fenghua.yu@intel.com
2020-07-25Merge tag 'v5.8-rc6' into x86/cpu, to refresh the branch before adding new ↵Ingo Molnar
commits Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-07-24x86/entry: Cleanup idtentry_enter/exitThomas Gleixner
Remove the temporary defines and fixup all references. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200722220520.855839271@linutronix.de
2020-07-24x86/entry: Cleanup idtentry_entry/exit_userThomas Gleixner
Cleanup the temporary defines and use irqentry_ instead of idtentry_. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200722220520.602603691@linutronix.de
2020-07-24x86/entry: Use generic syscall exit functionalityThomas Gleixner
Replace the x86 variant with the generic version. Provide the relevant architecture specific helper functions and defines. Use a temporary define for idtentry_exit_user which will be cleaned up seperately. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200722220520.494648601@linutronix.de
2020-07-24Merge branch 'core/entry' into x86/entryThomas Gleixner
Pick up generic entry code to migrate x86 over.
2020-07-24x86: Correct noinstr qualifiersIra Weiny
The noinstr qualifier is to be specified before the return type in the same way inline is used. These 2 cases were missed by previous patches. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200723161405.852613-1-ira.weiny@intel.com
2020-07-24Merge v5.8-rc6 into drm-nextDave Airlie
I've got a silent conflict + two trees based on fixes to merge. Fixes a silent merge with amdgpu Signed-off-by: Dave Airlie <airlied@redhat.com>
2020-07-23irqdomain/treewide: Free firmware node after domain removalJon Derrick
Commit 711419e504eb ("irqdomain: Add the missing assignment of domain->fwnode for named fwnode") unintentionally caused a dangling pointer page fault issue on firmware nodes that were freed after IRQ domain allocation. Commit e3beca48a45b fixed that dangling pointer issue by only freeing the firmware node after an IRQ domain allocation failure. That fix no longer frees the firmware node immediately, but leaves the firmware node allocated after the domain is removed. The firmware node must be kept around through irq_domain_remove, but should be freed it afterwards. Add the missing free operations after domain removal where where appropriate. Fixes: e3beca48a45b ("irqdomain/treewide: Keep firmware node unconditionally allocated") Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # drivers/pci Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1595363169-7157-1-git-send-email-jonathan.derrick@intel.com
2020-07-22x86/dumpstack: Show registers dump with trace's log levelx86-core-2020-08-03Dmitry Safonov
show_trace_log_lvl() provides x86 platform-specific way to unwind backtrace with a given log level. Unfortunately, registers dump(s) are not printed with the same log level - instead, KERN_DEFAULT is always used. Arista's switches uses quite common setup with rsyslog, where only urgent messages goes to console (console_log_level=KERN_ERR), everything else goes into /var/log/ as the console baud-rate often is indecently slow (9600 bps). Backtrace dumps without registers printed have proven to be as useful as morning standups. Furthermore, in order to introduce KERN_UNSUPPRESSED (which I believe is still the most elegant way to fix raciness of sysrq[1]) the log level should be passed down the stack to register dumping functions. Besides, there is a potential use-case for printing traces with KERN_DEBUG level [2] (where registers dump shouldn't appear with higher log level). After all preparations are done, provide log_lvl parameter for show_regs_if_on_stack() and wire up to actual log level used as an argument for show_trace_log_lvl(). [1]: https://lore.kernel.org/lkml/20190528002412.1625-1-dima@arista.com/ [2]: https://lore.kernel.org/linux-doc/20190724170249.9644-1-dima@arista.com/ Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Petr Mladek <pmladek@suse.com> Link: https://lkml.kernel.org/r/20200629144847.492794-4-dima@arista.com
2020-07-22x86/dumpstack: Add log_lvl to __show_regs()Dmitry Safonov
show_trace_log_lvl() provides x86 platform-specific way to unwind backtrace with a given log level. Unfortunately, registers dump(s) are not printed with the same log level - instead, KERN_DEFAULT is always used. Arista's switches uses quite common setup with rsyslog, where only urgent messages goes to console (console_log_level=KERN_ERR), everything else goes into /var/log/ as the console baud-rate often is indecently slow (9600 bps). Backtrace dumps without registers printed have proven to be as useful as morning standups. Furthermore, in order to introduce KERN_UNSUPPRESSED (which I believe is still the most elegant way to fix raciness of sysrq[1]) the log level should be passed down the stack to register dumping functions. Besides, there is a potential use-case for printing traces with KERN_DEBUG level [2] (where registers dump shouldn't appear with higher log level). Add log_lvl parameter to __show_regs(). Keep the used log level intact to separate visible change. [1]: https://lore.kernel.org/lkml/20190528002412.1625-1-dima@arista.com/ [2]: https://lore.kernel.org/linux-doc/20190724170249.9644-1-dima@arista.com/ Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Petr Mladek <pmladek@suse.com> Link: https://lkml.kernel.org/r/20200629144847.492794-3-dima@arista.com
2020-07-22x86/dumpstack: Add log_lvl to show_iret_regs()Dmitry Safonov
show_trace_log_lvl() provides x86 platform-specific way to unwind backtrace with a given log level. Unfortunately, registers dump(s) are not printed with the same log level - instead, KERN_DEFAULT is always used. Arista's switches uses quite common setup with rsyslog, where only urgent messages goes to console (console_log_level=KERN_ERR), everything else goes into /var/log/ as the console baud-rate often is indecently slow (9600 bps). Backtrace dumps without registers printed have proven to be as useful as morning standups. Furthermore, in order to introduce KERN_UNSUPPRESSED (which I believe is still the most elegant way to fix raciness of sysrq[1]) the log level should be passed down the stack to register dumping functions. Besides, there is a potential use-case for printing traces with KERN_DEBUG level [2] (where registers dump shouldn't appear with higher log level). Add log_lvl parameter to show_iret_regs() as a preparation to add it to __show_regs() and show_regs_if_on_stack(). [1]: https://lore.kernel.org/lkml/20190528002412.1625-1-dima@arista.com/ [2]: https://lore.kernel.org/linux-doc/20190724170249.9644-1-dima@arista.com/ Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Petr Mladek <pmladek@suse.com> Link: https://lkml.kernel.org/r/20200629144847.492794-2-dima@arista.com
2020-07-22x86/dumpstack: Dump user space code correctly againx86-urgent-2020-07-25Thomas Gleixner
H.J. reported that post 5.7 a segfault of a user space task does not longer dump the Code bytes when /proc/sys/debug/exception-trace is enabled. It prints 'Code: Bad RIP value.' instead. This was broken by a recent change which made probe_kernel_read() reject non-kernel addresses. Update show_opcodes() so it retrieves user space opcodes via copy_from_user_nmi(). Fixes: 98a23609b103 ("maccess: always use strict semantics for probe_kernel_read") Reported-by: H.J. Lu <hjl.tools@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/87h7tz306w.fsf@nanos.tec.linutronix.de
2020-07-22x86/stacktrace: Fix reliable check for empty user task stacksJosh Poimboeuf
If a user task's stack is empty, or if it only has user regs, ORC reports it as a reliable empty stack. But arch_stack_walk_reliable() incorrectly treats it as unreliable. That happens because the only success path for user tasks is inside the loop, which only iterates on non-empty stacks. Generally, a user task must end in a user regs frame, but an empty stack is an exception to that rule. Thanks to commit 71c95825289f ("x86/unwind/orc: Fix error handling in __unwind_start()"), unwind_start() now sets state->error appropriately. So now for both ORC and FP unwinders, unwind_done() and !unwind_error() always means the end of the stack was successfully reached. So the success path for kthreads is no longer needed -- it can also be used for empty user tasks. Reported-by: Wang ShaoBo <bobo.shaobowang@huawei.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Wang ShaoBo <bobo.shaobowang@huawei.com> Link: https://lkml.kernel.org/r/f136a4e5f019219cbc4f4da33b30c2f44fa65b84.1594994374.git.jpoimboe@redhat.com
2020-07-22x86/unwind/orc: Fix ORC for newly forked tasksJosh Poimboeuf
The ORC unwinder fails to unwind newly forked tasks which haven't yet run on the CPU. It correctly reads the 'ret_from_fork' instruction pointer from the stack, but it incorrectly interprets that value as a call stack address rather than a "signal" one, so the address gets incorrectly decremented in the call to orc_find(), resulting in bad ORC data. Fix it by forcing 'ret_from_fork' frames to be signal frames. Reported-by: Wang ShaoBo <bobo.shaobowang@huawei.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Wang ShaoBo <bobo.shaobowang@huawei.com> Link: https://lkml.kernel.org/r/f91a8778dde8aae7f71884b5df2b16d552040441.1594994374.git.jpoimboe@redhat.com
2020-07-22Merge branch 'sched/urgent'Peter Zijlstra
2020-07-22x86, vmlinux.lds: Page-align end of ..page_aligned sectionsJoerg Roedel
On x86-32 the idt_table with 256 entries needs only 2048 bytes. It is page-aligned, but the end of the .bss..page_aligned section is not guaranteed to be page-aligned. As a result, objects from other .bss sections may end up on the same 4k page as the idt_table, and will accidentially get mapped read-only during boot, causing unexpected page-faults when the kernel writes to them. This could be worked around by making the objects in the page aligned sections page sized, but that's wrong. Explicit sections which store only page aligned objects have an implicit guarantee that the object is alone in the page in which it is placed. That works for all objects except the last one. That's inconsistent. Enforcing page sized objects for these sections would wreckage memory sanitizers, because the object becomes artificially larger than it should be and out of bound access becomes legit. Align the end of the .bss..page_aligned and .data..page_aligned section on page-size so all objects places in these sections are guaranteed to have their own page. [ tglx: Amended changelog ] Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200721093448.10417-1-joro@8bytes.org
2020-07-21exec: Implement kernel_execveEric W. Biederman
To allow the kernel not to play games with set_fs to call exec implement kernel_execve. The function kernel_execve takes pointers into kernel memory and copies the values pointed to onto the new userspace stack. The calls with arguments from kernel space of do_execve are replaced with calls to kernel_execve. The calls do_execve and do_execveat are made static as there are now no callers outside of exec. The comments that mention do_execve are updated to refer to kernel_execve or execve depending on the circumstances. In addition to correcting the comments, this makes it easy to grep for do_execve and verify it is not used. Inspired-by: https://lkml.kernel.org/r/20200627072704.2447163-1-hch@lst.de Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/87wo365ikj.fsf@x220.int.ebiederm.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-07-19copy_xstate_to_kernel: Fix typo which caused GDB regressionKevin Buettner
This fixes a regression encountered while running the gdb.base/corefile.exp test in GDB's test suite. In my testing, the typo prevented the sw_reserved field of struct fxregs_state from being output to the kernel XSAVES area. Thus the correct mask corresponding to XCR0 was not present in the core file for GDB to interrogate, resulting in the following behavior: [kev@f32-1 gdb]$ ./gdb -q testsuite/outputs/gdb.base/corefile/corefile testsuite/outputs/gdb.base/corefile/corefile.core Reading symbols from testsuite/outputs/gdb.base/corefile/corefile... [New LWP 232880] warning: Unexpected size of section `.reg-xstate/232880' in core file. With the typo fixed, the test works again as expected. Signed-off-by: Kevin Buettner <kevinb@redhat.com> Fixes: 9e4636545933 ("copy_xstate_to_kernel(): don't leave parts of destination uninitialized") Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Airlie <airlied@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-19Merge tag 'x86-urgent-2020-07-19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into master Pull x86 fixes from Thomas Gleixner: "A pile of fixes for x86: - Fix the I/O bitmap invalidation on XEN PV, which was overlooked in the recent ioperm/iopl rework. This caused the TSS and XEN's I/O bitmap to get out of sync. - Use the proper vectors for HYPERV. - Make disabling of stack protector for the entry code work with GCC builds which enable stack protector by default. Removing the option is not sufficient, it needs an explicit -fno-stack-protector to shut it off. - Mark check_user_regs() noinstr as it is called from noinstr code. The missing annotation causes it to be placed in the text section which makes it instrumentable. - Add the missing interrupt disable in exc_alignment_check() - Fixup a XEN_PV build dependency in the 32bit entry code - A few fixes to make the Clang integrated assembler happy - Move EFI stub build to the right place for out of tree builds - Make prepare_exit_to_usermode() static. It's not longer called from ASM code" * tag 'x86-urgent-2020-07-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Don't add the EFI stub to targets x86/entry: Actually disable stack protector x86/ioperm: Fix io bitmap invalidation on Xen PV x86: math-emu: Fix up 'cmp' insn for clang ias x86/entry: Fix vectors to IDTENTRY_SYSVEC for CONFIG_HYPERV x86/entry: Add compatibility with IAS x86/entry/common: Make prepare_exit_to_usermode() static x86/entry: Mark check_user_regs() noinstr x86/traps: Disable interrupts in exc_aligment_check() x86/entry/32: Fix XEN_PV build dependency
2020-07-18x86/ioperm: Fix io bitmap invalidation on Xen PVAndy Lutomirski
tss_invalidate_io_bitmap() wasn't wired up properly through the pvop machinery, so the TSS and Xen's io bitmap would get out of sync whenever disabling a valid io bitmap. Add a new pvop for tss_invalidate_io_bitmap() to fix it. This is XSA-329. Fixes: 22fe5b0439dd ("x86/ioperm: Move TSS bitmap update to exit to user work") Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/d53075590e1f91c19f8af705059d3ff99424c020.1595030016.git.luto@kernel.org
2020-07-17genirq/affinity: Handle affinity setting on inactive interrupts correctlyirq-urgent-2020-07-19Thomas Gleixner
Setting interrupt affinity on inactive interrupts is inconsistent when hierarchical irq domains are enabled. The core code should just store the affinity and not call into the irq chip driver for inactive interrupts because the chip drivers may not be in a state to handle such requests. X86 has a hacky workaround for that but all other irq chips have not which causes problems e.g. on GIC V3 ITS. Instead of adding more ugly hacks all over the place, solve the problem in the core code. If the affinity is set on an inactive interrupt then: - Store it in the irq descriptors affinity mask - Update the effective affinity to reflect that so user space has a consistent view - Don't call into the irq chip driver This is the core equivalent of the X86 workaround and works correctly because the affinity setting is established in the irq chip when the interrupt is activated later on. Note, that this is only effective when hierarchical irq domains are enabled by the architecture. Doing it unconditionally would break legacy irq chip implementations. For hierarchial irq domains this works correctly as none of the drivers can have a dependency on affinity setting in inactive state by design. Remove the X86 workaround as it is not longer required. Fixes: 02edee152d6e ("x86/apic/vector: Ignore set_affinity call for inactive interrupts") Reported-by: Ali Saidi <alisaidi@amazon.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Ali Saidi <alisaidi@amazon.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200529015501.15771-1-alisaidi@amazon.com Link: https://lkml.kernel.org/r/877dv2rv25.fsf@nanos.tec.linutronix.de
2020-07-17x86/efi: Remove references to no-longer-used efi_have_uv1_memmap()steve.wahl@hpe.com
In removing UV1 support, efi_have_uv1_memmap is no longer used. Signed-off-by: Steve Wahl <steve.wahl@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lkml.kernel.org/r/20200713212955.786177105@hpe.com
2020-07-17x86/platform/uv: Remove support for UV1 platform from x2apic_uv_xsteve.wahl@hpe.com
UV1 is not longer supported. Signed-off-by: Steve Wahl <steve.wahl@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200713212954.846026992@hpe.com
2020-07-16treewide: Remove uninitialized_var() usageKees Cook
Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \ xargs perl -pi -e \ 's/\buninitialized_var\(([^\)]+)\)/\1/g; s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-14irqdomain/treewide: Keep firmware node unconditionally allocatedThomas Gleixner
Quite some non OF/ACPI users of irqdomains allocate firmware nodes of type IRQCHIP_FWNODE_NAMED or IRQCHIP_FWNODE_NAMED_ID and free them right after creating the irqdomain. The only purpose of these FW nodes is to convey name information. When this was introduced the core code did not store the pointer to the node in the irqdomain. A recent change stored the firmware node pointer in irqdomain for other reasons and missed to notice that the usage sites which do the alloc_fwnode/create_domain/free_fwnode sequence are broken by this. Storing a dangling pointer is dangerous itself, but in case that the domain is destroyed later on this leads to a double free. Remove the freeing of the firmware node after creating the irqdomain from all affected call sites to cure this. Fixes: 711419e504eb ("irqdomain: Add the missing assignment of domain->fwnode for named fwnode") Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/873661qakd.fsf@nanos.tec.linutronix.de
2020-07-10x86/entry: Fix NMI vs IRQ state trackingPeter Zijlstra
While the nmi_enter() users did trace_hardirqs_{off_prepare,on_finish}() there was no matching lockdep_hardirqs_*() calls to complete the picture. Introduce idtentry_{enter,exit}_nmi() to enable proper IRQ state tracking across the NMIs. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200623083721.216740948@infradead.org
2020-07-10Merge branch 'x86/urgent' into x86/entry to pick up upstream fixes.Thomas Gleixner
2020-07-09x86/traps: Disable interrupts in exc_aligment_check()Thomas Gleixner
exc_alignment_check() fails to disable interrupts before returning to the entry code. Fixes: ca4c6a9858c2 ("x86/traps: Make interrupt enable/disable symmetric in C code") Reported-by: syzbot+0889df9502bc0f112b31@syzkaller.appspotmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20200708192934.076519438@linutronix.de
2020-07-08x86/kvm: Add "nopvspin" parameter to disable PV spinlocksZhenzhong Duan
There are cases where a guest tries to switch spinlocks to bare metal behavior (e.g. by setting "xen_nopvspin" on XEN platform and "hv_nopvspin" on HYPER_V). That feature is missed on KVM, add a new parameter "nopvspin" to disable PV spinlocks for KVM guest. The new 'nopvspin' parameter will also replace Xen and Hyper-V specific parameters in future patches. Define variable nopvsin as global because it will be used in future patches as above. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Jim Mattson <jmattson@google.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08x86/kvm: Change print code to use pr_*() formatZhenzhong Duan
pr_*() is preferred than printk(KERN_* ...), after change all the print in arch/x86/kernel/kvm.c will have "kvm-guest: xxx" style. No functional change. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Jim Mattson <jmattson@google.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08Revert "KVM: X86: Fix setup the virt_spin_lock_key before static key get ↵Zhenzhong Duan
initialized" This reverts commit 34226b6b70980a8f81fff3c09a2c889f77edeeff. Commit 8990cac6e5ea ("x86/jump_label: Initialize static branching early") adds jump_label_init() call in setup_arch() to make static keys initialized early, so we could use the original simpler code again. The similar change for XEN is in commit 090d54bcbc54 ("Revert "x86/paravirt: Set up the virt_spin_lock_key after static keys get initialized"") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Jim Mattson <jmattson@google.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08Merge branch 'kvm-async-pf-int' into HEADPaolo Bonzini