path: root/arch/powerpc/kernel
Age  Commit message  Author
2021-05-22  powerpc/iommu: Annotate nested lock for lockdep  (Alexey Kardashevskiy)
[ Upstream commit cc7130bf119add37f36238343a593b71ef6ecc1e ]

The IOMMU table is divided into pools for concurrent mappings, and each pool has a separate spinlock. When taking ownership of an IOMMU group to pass through a device to a VM, we lock these spinlocks, which triggers a false positive warning in lockdep (below). This fixes it by annotating the large pool's spinlock as a nest lock, which stops lockdep from complaining about nested locks when the nest lock is already held.

===
WARNING: possible recursive locking detected
5.11.0-le_syzkaller_a+fstn1 #100 Not tainted
--------------------------------------------
qemu-system-ppc/4129 is trying to acquire lock:
c0000000119bddb0 (&(p->lock)/1){....}-{2:2}, at: iommu_take_ownership+0xac/0x1e0

but task is already holding lock:
c0000000119bdd30 (&(p->lock)/1){....}-{2:2}, at: iommu_take_ownership+0xac/0x1e0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(p->lock)/1);
  lock(&(p->lock)/1);
===

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210301063653.51003-1-aik@ozlabs.ru
Signed-off-by: Sasha Levin <sashal@kernel.org>
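A minimal sketch of the nest-lock annotation pattern (kernel-style fragment; the iommu_table field names are assumed from the commit text, not quoted from the patch):

	/* Take the large pool's lock first, then mark each small pool's
	 * lock as nested under it, so lockdep treats the group as one
	 * ordered acquisition rather than recursive locking. */
	unsigned long flags;
	int i;

	spin_lock_irqsave(&tbl->large_pool.lock, flags);
	for (i = 0; i < tbl->nr_pools; i++)
		spin_lock_nest_lock(&tbl->pools[i].lock, &tbl->large_pool.lock);

spin_lock_nest_lock() is the stock lockdep helper for exactly this case: it records that every pool lock is only ever taken under the large pool's lock, so acquiring several locks of the same class no longer looks like recursion.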
2021-05-22  powerpc/smp: Set numa node before updating mask  (Srikar Dronamraju)
[ Upstream commit 6980d13f0dd189846887bbbfa43793d9a41768d3 ]

Geethika reported a trace when doing a dlpar CPU add.

------------[ cut here ]------------
WARNING: CPU: 152 PID: 1134 at kernel/sched/topology.c:2057
CPU: 152 PID: 1134 Comm: kworker/152:1 Not tainted 5.12.0-rc5-master #5
Workqueue: events cpuset_hotplug_workfn
NIP: c0000000001cfc14 LR: c0000000001cfc10 CTR: c0000000007e3420
REGS: c0000034a08eb260 TRAP: 0700 Not tainted (5.12.0-rc5-master+)
MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 28828422 XER: 00000020
CFAR: c0000000001fd888 IRQMASK: 0
GPR00: c0000000001cfc10 c0000034a08eb500 c000000001f35400 0000000000000027
GPR04: c0000035abaa8010 c0000035abb30a00 0000000000000027 c0000035abaa8018
GPR08: 0000000000000023 c0000035abaaef48 00000035aa540000 c0000035a49dffe8
GPR12: 0000000028828424 c0000035bf1a1c80 0000000000000497 0000000000000004
GPR16: c00000000347a258 0000000000000140 c00000000203d468 c000000001a1a490
GPR20: c000000001f9c160 c0000034adf70920 c0000034aec9fd20 0000000100087bd3
GPR24: 0000000100087bd3 c0000035b3de09f8 0000000000000030 c0000035b3de09f8
GPR28: 0000000000000028 c00000000347a280 c0000034aefe0b00 c0000000010a2a68
NIP [c0000000001cfc14] build_sched_domains+0x6a4/0x1500
LR [c0000000001cfc10] build_sched_domains+0x6a0/0x1500
Call Trace:
[c0000034a08eb500] [c0000000001cfc10] build_sched_domains+0x6a0/0x1500 (unreliable)
[c0000034a08eb640] [c0000000001d1e6c] partition_sched_domains_locked+0x3ec/0x530
[c0000034a08eb6e0] [c0000000002936d4] rebuild_sched_domains_locked+0x524/0xbf0
[c0000034a08eb7e0] [c000000000296bb0] rebuild_sched_domains+0x40/0x70
[c0000034a08eb810] [c000000000296e74] cpuset_hotplug_workfn+0x294/0xe20
[c0000034a08ebc30] [c000000000178dd0] process_one_work+0x300/0x670
[c0000034a08ebd10] [c0000000001791b8] worker_thread+0x78/0x520
[c0000034a08ebda0] [c000000000185090] kthread+0x1a0/0x1b0
[c0000034a08ebe10] [c00000000000ccec] ret_from_kernel_thread+0x5c/0x70
Instruction dump:
7d2903a6 4e800421 e8410018 7f67db78 7fe6fb78 7f45d378 7f84e378 7c681b78
3c62ff1a 3863c6f8 4802dc35 60000000 <0fe00000> 3920fff4 f9210070 e86100a0
---[ end trace 532d9066d3d4d7ec ]---

Some of the per-CPU masks use cpu_cpu_mask as a filter to limit the search for related CPUs. On a dlpar add of a CPU, update cpu_cpu_mask before updating the per-CPU masks. This will ensure the cpu_cpu_mask is updated correctly before it's used in setting the masks. Setting the numa_node will ensure that when cpu_cpu_mask() gets called, the correct node number is used. This code movement helped fix the above call trace.

Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210401154200.150077-1-srikar@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-22  powerpc/prom: Mark identical_pvr_fixup as __init  (Nathan Chancellor)
[ Upstream commit 1ef1dd9c7ed27b080445e1576e8a05957e0e4dfc ]

If identical_pvr_fixup() is not inlined, there are two modpost warnings:

WARNING: modpost: vmlinux.o(.text+0x54e8): Section mismatch in reference from the function identical_pvr_fixup() to the function .init.text:of_get_flat_dt_prop()
The function identical_pvr_fixup() references the function __init of_get_flat_dt_prop(). This is often because identical_pvr_fixup lacks a __init annotation or the annotation of of_get_flat_dt_prop is wrong.

WARNING: modpost: vmlinux.o(.text+0x551c): Section mismatch in reference from the function identical_pvr_fixup() to the function .init.text:identify_cpu()
The function identical_pvr_fixup() references the function __init identify_cpu(). This is often because identical_pvr_fixup lacks a __init annotation or the annotation of identify_cpu is wrong.

identical_pvr_fixup() calls two functions marked as __init and is only called by a function marked as __init, so it should be marked as __init as well. At the same time, remove the inline keyword as it is not necessary to inline this function. The compiler is still free to do so if it feels it is worthwhile, since commit 889b3c1245de ("compiler: remove CONFIG_OPTIMIZE_INLINING entirely").

Fixes: 14b3d926a22b ("[POWERPC] 4xx: update 440EP(x)/440GR(x) identical PVR issue workaround")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://github.com/ClangBuiltLinux/linux/issues/1316
Link: https://lore.kernel.org/r/20210302200829.2680663-1-nathan@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
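The shape of the change is simple; a hedged sketch (the function body is elided, and the signature is assumed from the warning text):

	/* Before: plain "static inline", so the compiler may emit a
	 * copy in .text, where references into .init.text trip modpost. */
	static inline void identical_pvr_fixup(unsigned long node);

	/* After: placed in .init.text alongside its __init caller and
	 * the __init functions it calls; "inline" dropped as redundant. */
	static void __init identical_pvr_fixup(unsigned long node)
	{
		/* ... calls of_get_flat_dt_prop() and identify_cpu(),
		 * both of which are __init ... */
	}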
2021-05-22  powerpc/eeh: Fix EEH handling for hugepages in ioremap space.  (Mahesh Salgaonkar)
commit 5ae5bc12d0728db60a0aa9b62160ffc038875f1a upstream.

During the EEH MMIO error checking, the current implementation fails to map the (virtual) MMIO address back to the pci device on radix with hugepage mappings for I/O. This results in a failure to dispatch the EEH event, with no recovery even when EEH capability has been enabled on the device.

eeh_check_failure(token)        # token = virtual MMIO address
  addr = eeh_token_to_phys(token);
  edev = eeh_addr_cache_get_dev(addr);
  if (!edev)
    return 0;
  eeh_dev_check_failure(edev);  <= Dispatch the EEH event

In case of hugepage mappings, eeh_token_to_phys() has a bug in the virt -> phys translation that results in a wrong physical address, which is then passed to eeh_addr_cache_get_dev() to match it against cached pci I/O address ranges to get to a PCI device. Hence, it fails to find a match and the EEH event never gets dispatched, leaving the device in a failed state.

Commit 33439620680b ("powerpc/eeh: Handle hugepages in ioremap space") introduced the following logic to translate virt to phys for hugepage mappings:

eeh_token_to_phys():
+	pa = pte_pfn(*ptep);
+
+	/* On radix we can do hugepage mappings for io, so handle that */
+	if (hugepage_shift) {
+		pa <<= hugepage_shift;	<= This is wrong
+		pa |= token & ((1ul << hugepage_shift) - 1);
+	}

This patch fixes the virt -> phys translation in the eeh_token_to_phys() function.

$ cat /sys/kernel/debug/powerpc/eeh_address_cache
mem addr range [0x0000040080000000-0x00000400807fffff]: 0030:01:00.1
mem addr range [0x0000040080800000-0x0000040080ffffff]: 0030:01:00.1
mem addr range [0x0000040081000000-0x00000400817fffff]: 0030:01:00.0
mem addr range [0x0000040081800000-0x0000040081ffffff]: 0030:01:00.0
mem addr range [0x0000040082000000-0x000004008207ffff]: 0030:01:00.1
mem addr range [0x0000040082080000-0x00000400820fffff]: 0030:01:00.0
mem addr range [0x0000040082100000-0x000004008210ffff]: 0030:01:00.1
mem addr range [0x0000040082110000-0x000004008211ffff]: 0030:01:00.0

Above is the list of cached io address ranges of pci 0030:01:00.<fn>.

Before this patch: tracing 'arg1' of function eeh_addr_cache_get_dev() during error injection clearly shows that 'addr=' contains a wrong physical address:

kworker/u16:0-7 [001] .... 108.883775: eeh_addr_cache_get_dev: (eeh_addr_cache_get_dev+0xc/0xf0) addr=0x80103000a510

dmesg shows no EEH recovery messages:

[  108.563768] bnx2x: [bnx2x_timer:5801(eth2)]MFW seems hanged: drv_pulse (0x9ae) != mcp_pulse (0x7fff)
[  108.563788] bnx2x: [bnx2x_hw_stats_update:870(eth2)]NIG timer max (4294967295)
[  108.883788] bnx2x: [bnx2x_acquire_hw_lock:2013(eth1)]lock_status 0xffffffff resource_bit 0x1
[  108.884407] bnx2x 0030:01:00.0 eth1: MDC/MDIO access timeout
[  108.884976] bnx2x 0030:01:00.0 eth1: MDC/MDIO access timeout
<..>

After this patch: the eeh_addr_cache_get_dev() trace shows the correct physical address:

<idle>-0 [001] ..s. 1043.123828: eeh_addr_cache_get_dev: (eeh_addr_cache_get_dev+0xc/0xf0) addr=0x40080bc7cd8

dmesg shows EEH recovery being triggered:

[  964.323980] bnx2x: [bnx2x_timer:5801(eth2)]MFW seems hanged: drv_pulse (0x746f) != mcp_pulse (0x7fff)
[  964.323991] EEH: Recovering PHB#30-PE#10000
[  964.324002] EEH: PE location: N/A, PHB location: N/A
[  964.324006] EEH: Frozen PHB#30-PE#10000 detected
<..>

Fixes: 33439620680b ("powerpc/eeh: Handle hugepages in ioremap space")
Cc: stable@vger.kernel.org # v5.3+
Reported-by: Dominic DeMarco <ddemarc@us.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/161821396263.48361.2796709239866588652.stgit@jupiter
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
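A hedged sketch of the corrected translation, reconstructed from the commit text: pte_pfn() returns a PAGE_SIZE-granule frame number even for hugepage mappings, so the conversion to a physical address must shift by PAGE_SHIFT; hugepage_shift only governs how many low bits of the token carry the offset within the mapping:

	pa = pte_pfn(*ptep);

	/* pte_pfn() counts PAGE_SIZE frames regardless of mapping size,
	 * so the page number always converts via PAGE_SHIFT ... */
	pa <<= PAGE_SHIFT;

	/* ... and only the offset-within-mapping depends on the
	 * (possibly larger) hugepage size. */
	if (hugepage_shift)
		pa |= token & ((1ul << hugepage_shift) - 1);
	else
		pa |= token & (PAGE_SIZE - 1);

The buggy version shifted the PFN by hugepage_shift, over-shifting by (hugepage_shift - PAGE_SHIFT) bits, which lands outside every cached I/O range.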
2021-03-17  powerpc: improve handling of unrecoverable system reset  (Nicholas Piggin)
[ Upstream commit 11cb0a25f71818ca7ab4856548ecfd83c169aa4d ] If an unrecoverable system reset hits in process context, the system does not have to panic. Similar to machine check, call nmi_exit() before die(). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-26-npiggin@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-03  powerpc/8xx: Fix software emulation interrupt  (Christophe Leroy)
[ Upstream commit 903178d0ce6bb30ef80a3604ab9ee2b57869fbc9 ] For unimplemented instructions or unimplemented SPRs, the 8xx triggers a "Software Emulation Exception" (0x1000). That interrupt doesn't set reason bits in SRR1 as the "Program Check Exception" does. Go through emulation_assist_interrupt() to set REASON_ILLEGAL. Fixes: fbbcc3bb139e ("powerpc/8xx: Remove SoftwareEmulation()") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ad782af87a222efc79cfb06079b0fd23d4224eaf.1612515180.git.christophe.leroy@csgroup.eu Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-29  powerpc/rtas: Fix typo of ibm,open-errinjct in RTAS filter  (Tyrel Datwyler)
commit f10881a46f8914428110d110140a455c66bdf27b upstream. Commit bd59380c5ba4 ("powerpc/rtas: Restrict RTAS requests from userspace") introduced the following error when invoking the errinjct userspace tool: [root@ltcalpine2-lp5 librtas]# errinjct open [327884.071171] sys_rtas: RTAS call blocked - exploit attempt? [327884.071186] sys_rtas: token=0x26, nargs=0 (called by errinjct) errinjct: Could not open RTAS error injection facility errinjct: librtas: open: Unexpected I/O error The entry for ibm,open-errinjct in rtas_filter array has a typo where the "j" is omitted in the rtas call name. After fixing this typo the errinjct tool functions again as expected. [root@ltcalpine2-lp5 linux]# errinjct open RTAS error injection facility open, token = 1 Fixes: bd59380c5ba4 ("powerpc/rtas: Restrict RTAS requests from userspace") Cc: stable@vger.kernel.org Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201208195434.8289-1-tyreld@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
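A hedged sketch of the table entry in question (the field layout is assumed; the filter table in rtas.c carries per-call buffer/size argument indices, with -1 meaning "not applicable"):

	static struct rtas_filter rtas_filters[] __ro_after_init = {
		/* ... */
		/* was "ibm,open-errinct": the missing "j" meant the
		 * allow-list lookup never matched, so the call was
		 * blocked as "exploit attempt?" */
		{ "ibm,open-errinjct", -1, -1, -1, -1 },
		/* ... */
	};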
2020-11-22  powerpc/8xx: Always fault when _PAGE_ACCESSED is not set  (Christophe Leroy)
commit 29daf869cbab69088fe1755d9dd224e99ba78b56 upstream.

The kernel expects pte_young() to work regardless of CONFIG_SWAP. Make sure a minor fault is taken to set _PAGE_ACCESSED when it is not already set, regardless of the selection of CONFIG_SWAP. This adds at least 3 instructions to the TLB miss exception handlers' fast path. A following patch will reduce this overhead.

Also update the rotation instruction to the correct number of bits to reflect all changes done to _PAGE_ACCESSED over time.

Fixes: d069cb4373fe ("powerpc/8xx: Don't touch ACCESSED when no SWAP.")
Fixes: 5f356497c384 ("powerpc/8xx: remove unused _PAGE_WRITETHRU")
Fixes: e0a8e0d90a9f ("powerpc/8xx: Handle PAGE_USER via APG bits")
Fixes: 5b2753fc3e8a ("powerpc/8xx: Implementation of PAGE_EXEC")
Fixes: a891c43b97d3 ("powerpc/8xx: Prepare handlers for _PAGE_HUGE for 512k pages.")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/af834e8a0f1fa97bfae65664950f0984a70c4750.1602492856.git.christophe.leroy@csgroup.eu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-22  powerpc/64s: flush L1D after user accesses  (Nicholas Piggin)
commit 9a32a7e78bd0cd9a9b6332cbdc345ee5ffd0c5de upstream. IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch flushes the L1 cache after user accesses. This is part of the fix for CVE-2020-4788. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-22  powerpc/64s: flush L1D on kernel entry  (Nicholas Piggin)
commit f79643787e0a0762d2409b7b8334e83f22d85695 upstream. IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch flushes the L1 cache on kernel entry. This is part of the fix for CVE-2020-4788. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-22  powerpc/64s: move some exception handlers out of line  (Daniel Axtens)
(backport only) We're about to grow the exception handlers, which will make a bunch of them no longer fit within the space available. We move them out of line. This is a fiddly and error-prone business, so in the interests of reviewability I haven't merged this in with the addition of the entry flush. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-05  powerpc: Warn about use of smt_snooze_delay  (Joel Stanley)
commit a02f6d42357acf6e5de6ffc728e6e77faf3ad217 upstream. It's not done anything for a long time. Save the percpu variable, and emit a warning to remind users to not expect it to do anything. This uses pr_warn_once instead of pr_warn_ratelimit as testing 'ppc64_cpu --smt=off' on a 24 core / 4 SMT system showed the warning to be noisy, as the online/offline loop is slow. Fixes: 3fa8cad82b94 ("powerpc/pseries/cpuidle: smt-snooze-delay cleanup.") Cc: stable@vger.kernel.org # v3.14 Signed-off-by: Joel Stanley <joel@jms.id.au> Acked-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200902000012.3440389-1-joel@jms.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-05  powerpc/rtas: Restrict RTAS requests from userspace  (Andrew Donnellan)
commit bd59380c5ba4147dcbaad3e582b55ccfd120b764 upstream. A number of userspace utilities depend on making calls to RTAS to retrieve information and update various things. The existing API through which we expose RTAS to userspace exposes more RTAS functionality than we actually need, through the sys_rtas syscall, which allows root (or anyone with CAP_SYS_ADMIN) to make any RTAS call they want with arbitrary arguments. Many RTAS calls take the address of a buffer as an argument, and it's up to the caller to specify the physical address of the buffer as an argument. We allocate a buffer (the "RMO buffer") in the Real Memory Area that RTAS can access, and then expose the physical address and size of this buffer in /proc/powerpc/rtas/rmo_buffer. Userspace is expected to read this address, poke at the buffer using /dev/mem, and pass an address in the RMO buffer to the RTAS call. However, there's nothing stopping the caller from specifying whatever address they want in the RTAS call, and it's easy to construct a series of RTAS calls that can overwrite arbitrary bytes (even without /dev/mem access). Additionally, there are some RTAS calls that do potentially dangerous things and for which there are no legitimate userspace use cases. In the past, this would not have been a particularly big deal as it was assumed that root could modify all system state freely, but with Secure Boot and lockdown we need to care about this. We can't fundamentally change the ABI at this point, however we can address this by implementing a filter that checks RTAS calls against a list of permitted calls and forces the caller to use addresses within the RMO buffer. The list is based off the list of calls that are used by the librtas userspace library, and has been tested with a number of existing userspace RTAS utilities. For compatibility with any applications we are not aware of that require other calls, the filter can be turned off at build time. Cc: stable@vger.kernel.org Reported-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200820044512.7543-1-ajd@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
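A minimal sketch of the two checks the filter performs, with hypothetical names (the real table layout and helpers in rtas.c may differ): each call must appear on the allow-list, and any buffer argument must point into the kernel-allocated RMO buffer rather than at an arbitrary physical address.

	static bool in_rmo_buf(u32 base, u32 end)
	{
		/* rtas_rmo_buf is the buffer the kernel itself allocated
		 * and advertised via /proc/powerpc/rtas/rmo_buffer */
		return base >= rtas_rmo_buf &&
		       base < (rtas_rmo_buf + RTAS_RMOBUF_MAX) &&
		       base <= end &&
		       end < (rtas_rmo_buf + RTAS_RMOBUF_MAX);
	}

	/* Reject the syscall unless the token is allow-listed and every
	 * buffer-carrying argument stays inside the RMO buffer. */
	static bool block_rtas_call(int token, int nargs, struct rtas_args *args)
	{
		/* ... look up token in rtas_filters[]; if found, validate
		 * each buffer/size argument pair with in_rmo_buf() ... */
		return true; /* not on the list: block */
	}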
2020-10-29  powerpc/tau: Disable TAU between measurements  (Finn Thain)
[ Upstream commit e63d6fb5637e92725cf143559672a34b706bca4f ] Enabling CONFIG_TAU_INT causes random crashes: Unrecoverable exception 1700 at c0009414 (msr=1000) Oops: Unrecoverable exception, sig: 6 [#1] BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-pmac-00043-gd5f545e1a8593 #5 NIP: c0009414 LR: c0009414 CTR: c00116fc REGS: c0799eb8 TRAP: 1700 Not tainted (5.7.0-pmac-00043-gd5f545e1a8593) MSR: 00001000 <ME> CR: 22000228 XER: 00000100 GPR00: 00000000 c0799f70 c076e300 00800000 0291c0ac 00e00000 c076e300 00049032 GPR08: 00000001 c00116fc 00000000 dfbd3200 ffffffff 007f80a8 00000000 00000000 GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 c075ce04 GPR24: c075ce04 dfff8880 c07b0000 c075ce04 00080000 00000001 c079ef98 c079ef5c NIP [c0009414] arch_cpu_idle+0x24/0x6c LR [c0009414] arch_cpu_idle+0x24/0x6c Call Trace: [c0799f70] [00000001] 0x1 (unreliable) [c0799f80] [c0060990] do_idle+0xd8/0x17c [c0799fa0] [c0060ba4] cpu_startup_entry+0x20/0x28 [c0799fb0] [c072d220] start_kernel+0x434/0x44c [c0799ff0] [00003860] 0x3860 Instruction dump: XXXXXXXX XXXXXXXX XXXXXXXX 3d20c07b XXXXXXXX XXXXXXXX XXXXXXXX 7c0802a6 XXXXXXXX XXXXXXXX XXXXXXXX 4e800421 XXXXXXXX XXXXXXXX XXXXXXXX 7d2000a6 ---[ end trace 3a0c9b5cb216db6b ]--- Resolve this problem by disabling each THRMn comparator when handling the associated THRMn interrupt and by disabling the TAU entirely when updating THRMn thresholds. Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2") Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5a0ba3dc5612c7aac596727331284a3676c08472.1599260540.git.fthain@telegraphics.com.au Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29  powerpc/tau: Remove duplicated set_thresholds() call  (Finn Thain)
[ Upstream commit 420ab2bc7544d978a5d0762ee736412fe9c796ab ] The commentary at the call site seems to disagree with the code. The conditional prevents calling set_thresholds() via the exception handler, which appears to crash. Perhaps that's because it immediately triggers another TAU exception. Anyway, calling set_thresholds() from TAUupdate() is redundant because tau_timeout() does so. Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2") Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d7c7ee33232cf72a6a6bbb6ef05838b2e2b113c0.1599260540.git.fthain@telegraphics.com.au Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29  powerpc/tau: Use appropriate temperature sample interval  (Finn Thain)
[ Upstream commit 66943005cc41f48e4d05614e8f76c0ca1812f0fd ] According to the MPC750 Users Manual, the SITV value in Thermal Management Register 3 is 13 bits long. The present code calculates the SITV value as 60 * 500 cycles. This would overflow to give 10 us on a 500 MHz CPU rather than the intended 60 us. (But according to the Microprocessor Datasheet, there is also a factor of 266 that has to be applied to this value on certain parts i.e. speed sort above 266 MHz.) Always use the maximum cycle count, as recommended by the Datasheet. Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2") Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/896f542e5f0f1d6cf8218524c2b67d79f3d69b3c.1599260540.git.fthain@telegraphics.com.au Signed-off-by: Sasha Levin <sashal@kernel.org>
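The overflow is easy to see with the numbers from the message; a hedged sketch (the THRM3_SITV()/THRM3_E encodings are assumed from reg.h):

	/* 60 us at 500 MHz = 30000 cycles, but SITV is a 13-bit field:
	 * 30000 & 0x1fff = 5424 cycles, i.e. roughly 10 us -- the silent
	 * truncation the commit describes. */
	/* old: mtspr(SPRN_THRM3, THRM3_SITV(500 * 60) | THRM3_E); */

	/* Program the maximum representable interval instead: */
	mtspr(SPRN_THRM3, THRM3_SITV(0x1fff) | THRM3_E);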
2020-10-01  powerpc/traps: Make unrecoverable NMIs die instead of panic  (Nicholas Piggin)
[ Upstream commit 265d6e588d87194c2fe2d6c240247f0264e0c19b ] System Reset and Machine Check interrupts that are not recoverable due to being nested or interrupting when RI=0 currently panic. This is not necessary, and can often just kill the current context and recover. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Link: https://lore.kernel.org/r/20200508043408.886394-16-npiggin@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01  powerpc/eeh: Only dump stack once if an MMIO loop is detected  (Oliver O'Halloran)
[ Upstream commit 4e0942c0302b5ad76b228b1a7b8c09f658a1d58a ]

Many drivers don't check for errors when they get a 0xFFs response from an MMIO load. As a result, after an EEH event occurs a driver can get stuck in a polling loop unless it has some kind of internal timeout logic.

Currently EEH tries to detect and report stuck drivers by dumping a stack trace after eeh_dev_check_failure() is called EEH_MAX_FAILS times on an already frozen PE. The value of EEH_MAX_FAILS was chosen so that a dump would occur every few seconds if the driver was spinning in a loop. This results in a lot of spurious stack traces in the kernel log.

Fix this by limiting it to printing one stack trace for each PE freeze. If the driver is truly stuck, the kernel's hung task detector is better suited to reporting the problem anyway.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Tested-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191016012536.22588-1-oohall@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-23  powerpc/dma: Fix dma_map_ops::get_required_mask  (Alexey Kardashevskiy)
commit 437ef802e0adc9f162a95213a3488e8646e5fc03 upstream.

There are 2 problems with it:

1. "<" vs expected "<<"

2. the shift number is an IOMMU page number mask, not an address mask, as the IOMMU page shift is missing.

This did not hit us before f1565c24b596 ("powerpc: use the generic dma_ops_bypass mode") because we had additional code to handle the bypass mask, so this chunk (almost?) never executed. However there were reports that aacraid does not work with "iommu=nobypass". After f1565c24b596, aacraid (and probably others which call dma_get_required_mask() before setting the mask) was unable to enable 64bit DMA and fell back to using the IOMMU, which was known not to work; one of the problems is a double free of an IOMMU page.

This fixes DMA for aacraid, both with and without "iommu=nobypass" in the kernel command line. Verified with "stress-ng -d 4".

Fixes: 6a5c7be5e484 ("powerpc: Override dma_get_required_mask by platform hook and ops")
Cc: stable@vger.kernel.org # v3.2+
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200908015106.79661-1-aik@ozlabs.ru
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
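A hedged sketch of the two-part fix in the mask computation (field names from the common iommu_table structure; the exact upstream diff may differ):

	/* buggy: "<" makes this a 0/1 comparison, and even with "<<"
	 * the result would be a page-number bound, not an address bound */
	mask = 1ULL < (fls_long(tbl->it_offset + tbl->it_size) - 1);

	/* fixed: find the highest reachable IOMMU page number, convert
	 * it to an address by adding it_page_shift, then fill in all
	 * lower bits to form a mask */
	mask = 1ULL << (fls_long(tbl->it_offset + tbl->it_size) +
			tbl->it_page_shift - 1);
	mask += mask - 1;	/* 2^k becomes 2^(k+1) - 1 */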
2020-09-03  powerpc/64s: Don't init FSCR_DSCR in __init_FSCR()  (Michael Ellerman)
commit 0828137e8f16721842468e33df0460044a0c588b upstream. __init_FSCR() was added originally in commit 2468dcf641e4 ("powerpc: Add support for context switching the TAR register") (Feb 2013), and only set FSCR_TAR. At that point FSCR (Facility Status and Control Register) was not context switched, so the setting was permanent after boot. Later we added initialisation of FSCR_DSCR to __init_FSCR(), in commit 54c9b2253d34 ("powerpc: Set DSCR bit in FSCR setup") (Mar 2013), again that was permanent after boot. Then commit 2517617e0de6 ("powerpc: Fix context switch DSCR on POWER8") (Aug 2013) added a limited context switch of FSCR, just the FSCR_DSCR bit was context switched based on thread.dscr_inherit. That commit said "This clears the H/FSCR DSCR bit initially", but it didn't, it left the initialisation of FSCR_DSCR in __init_FSCR(). However the initial context switch from init_task to pid 1 would clear FSCR_DSCR because thread.dscr_inherit was 0. That commit also introduced the requirement that FSCR_DSCR be clear for user processes, so that we can take the facility unavailable interrupt in order to manage dscr_inherit. Then in commit 152d523e6307 ("powerpc: Create context switch helpers save_sprs() and restore_sprs()") (Dec 2015) FSCR was added to thread_struct. However it still wasn't fully context switched, we just took the existing value and set FSCR_DSCR if the new thread had dscr_inherit set. FSCR was still initialised at boot to FSCR_DSCR | FSCR_TAR, but that value was not propagated into the thread_struct, so the initial context switch set FSCR_DSCR back to 0. Finally commit b57bd2de8c6c ("powerpc: Improve FSCR init and context switching") (Jun 2016) added a full context switch of the FSCR, and added an initialisation of init_task.thread.fscr to FSCR_TAR | FSCR_EBB, but omitted FSCR_DSCR. The end result is that swapper runs with FSCR_DSCR set because of the initialisation in __init_FSCR(), but no other processes do, they use the value from init_task.thread.fscr. Having FSCR_DSCR set for swapper allows it to access SPR 3 from userspace, but swapper never runs userspace, so it has no useful effect. It's also confusing to have the value initialised in two places to two different values. So remove FSCR_DSCR from __init_FSCR(), this at least gets us to the point where there's a single value of FSCR, even if it's still set in two places. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Alistair Popple <alistair@popple.id.au> Link: https://lore.kernel.org/r/20200527145843.2761782-1-mpe@ellerman.id.au Cc: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-21  powerpc/vdso: Fix vdso cpu truncation  (Milton Miller)
[ Upstream commit a9f675f950a07d5c1dbcbb97aabac56f5ed085e3 ]

The code in vdso_cpu_init that exposes the cpu and numa node to userspace via SPRG_VDSO incorrectly masks the cpu to 12 bits. This means that any kernel running on a box with more than 4096 threads (NR_CPUS advertises a limit of 8192 cpus) would expose userspace to two cpu contexts running at the same time with the same cpu number.

Note: I'm not aware of any distro shipping a kernel with support for more than 4096 threads today, nor of any system image that currently exceeds 4096 threads. Found via code browsing.

Fixes: 18ad51dd342a7eb09dbcd059d0b451b616d4dafc ("powerpc: Add VDSO version of getcpu")
Signed-off-by: Milton Miller <miltonm@us.ibm.com>
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200715233704.1352257-1-anton@ozlabs.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
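A hedged sketch of the packing (variable names assumed; the point is the mask width): SPRG_VDSO carries the node in the upper half-word and the cpu in the lower, so the cpu mask must cover 16 bits, not 12:

	/* before: cpus above 4095 alias, since (cpu & 0xfff) wraps */
	unsigned long val = (cpu & 0xfff) | ((node & 0xffff) << 16);

	/* after: 16 bits comfortably covers the advertised NR_CPUS
	 * limit of 8192 */
	val = (cpu & 0xffff) | ((node & 0xffff) << 16);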
2020-06-25  powerpc/crashkernel: Take "mem=" option into account  (Pingfan Liu)
[ Upstream commit be5470e0c285a68dc3afdea965032f5ddc8269d7 ]

The "mem=" option is an easy way to put high pressure on memory during testing. Hence, after applying the memory limit, the actual usable memory rather than the total memory should be considered when reserving memory for crashkernel. Otherwise boot-up may experience an OOM issue.

E.g. it would reserve 4G prior to the change and 512M afterward, if passing crashkernel="2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G", and mem=5G on a 256G machine.

This issue is powerpc specific because it puts higher priority on fadump and kdump reservation than on "mem=". Referring to the following code:

	if (fadump_reserve_mem() == 0)
		reserve_crashkernel();
	...
	/* Ensure that total memory size is page-aligned. */
	limit = ALIGN(memory_limit ?: memblock_phys_mem_size(), PAGE_SIZE);
	memblock_enforce_memory_limit(limit);

While on other arches, the effect of "mem=" takes a higher priority and passes through memblock_phys_mem_size() before calling reserve_crashkernel().

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1585749644-4148-1-git-send-email-kernelfans@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-20  powerpc/64s: Save FSCR to init_task.thread.fscr after feature init  (Michael Ellerman)
commit 912c0a7f2b5daa3cbb2bc10f303981e493de73bd upstream. At boot the FSCR is initialised via one of two paths. On most systems it's set to a hard coded value in __init_FSCR(). On newer skiboot systems we use the device tree CPU features binding, where firmware can tell Linux what bits to set in FSCR (and HFSCR). In both cases the value that's configured at boot is not propagated into the init_task.thread.fscr value prior to the initial fork of init (pid 1), which means the value is not used by any processes other than swapper (the idle task). For the __init_FSCR() case this is OK, because the value in init_task.thread.fscr is initialised to something sensible. However it does mean that the value set in __init_FSCR() is not used other than for swapper, which is odd and confusing. The bigger problem is for the device tree CPU features case it prevents firmware from setting (or clearing) FSCR bits for use by user space. This means all existing kernels can not have features enabled/disabled by firmware if those features require setting/clearing FSCR bits. We can handle both cases by saving the FSCR value into init_task.thread.fscr after we have initialised it at boot. This fixes the bug for device tree CPU features, and will allow us to simplify the initialisation for the __init_FSCR() case in a future patch. Fixes: 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200527145843.2761782-3-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-20  powerpc/64s: Don't let DT CPU features set FSCR_DSCR  (Michael Ellerman)
commit 993e3d96fd08c3ebf7566e43be9b8cd622063e6d upstream. The device tree CPU features binding includes FSCR bit numbers which Linux is instructed to set by firmware. Whether that's a good idea or not, in the case of the DSCR the Linux implementation has a hard requirement that the FSCR_DSCR bit not be set by default. We use it to track when a process reads/writes to DSCR, so it must be clear to begin with. So if firmware tells us to set FSCR_DSCR we must ignore it. Currently this does not cause a bug in our DSCR handling because the value of FSCR that the device tree CPU features code establishes is only used by swapper. All other tasks use the value hard coded in init_task.thread.fscr. However we'd like to fix that in a future commit, at which point this will become necessary. Fixes: 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200527145843.2761782-2-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-10  powerpc/pci/of: Parse unassigned resources  (Alexey Kardashevskiy)
commit dead1c845dbe97e0061dae2017eaf3bd8f8f06ee upstream. The pseries platform uses the PCI_PROBE_DEVTREE method of PCI probing which reads "assigned-addresses" of every PCI device and initializes the device resources. However if the property is missing or zero sized, then there is no fallback of any kind and the PCI resources remain undiscovered, i.e. pdev->resource[] array remains empty. This adds a fallback which parses the "reg" property in pretty much same way except it marks resources as "unset" which later make Linux assign those resources proper addresses. This has an effect when: 1. a hypervisor failed to assign any resource for a device; 2. /chosen/linux,pci-probe-only=0 is in the DT so the system may try assigning a resource. Neither is likely to happen under PowerVM. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-02  powerpc/setup_64: Set cache-line-size based on cache-block-size  (Chris Packham)
commit 94c0b013c98583614e1ad911e8795ca36da34a85 upstream. If {i,d}-cache-block-size is set and {i,d}-cache-line-size is not, use the block-size value for both. Per the devicetree spec cache-line-size is only needed if it differs from the block size. Originally the code would fallback from block size to line size. An error message was printed if both properties were missing. Later the code was refactored to use clearer names and logic but it inadvertently made line size a required property, meaning on systems without a line size property we fall back to the default from the cputable. On powernv (OPAL) platforms, since the introduction of device tree CPU features (5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features")), that has led to the wrong value being used, as the fallback value is incorrect for Power8/Power9 CPUs. The incorrect values flow through to the VDSO and also to the sysconf values, SC_LEVEL1_ICACHE_LINESIZE etc. Fixes: bd067f83b084 ("powerpc/64: Fix naming of cache block vs. cache line") Cc: stable@vger.kernel.org # v4.11+ Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Reported-by: Qian Cai <cai@lca.pw> [mpe: Add even more detail to change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200416221908.7886-1-chris.packham@alliedtelesis.co.nz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
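A minimal sketch of the fallback logic using the generic OF helpers (the property names come from the commit text; the real setup_64.c code is structured differently):

	u32 block = 0, line = 0;

	/* the block size is the authoritative property */
	of_property_read_u32(np, "d-cache-block-size", &block);

	/* per the DT spec, the line size is only present when it
	 * differs from the block size -- so fall back, don't error */
	if (of_property_read_u32(np, "d-cache-line-size", &line))
		line = block;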
2020-04-24  powerpc: Make setjmp/longjmp signature standard  (Clement Courbet)
commit c17eb4dca5a353a9dbbb8ad6934fe57af7165e91 upstream. Declaring setjmp()/longjmp() as taking longs makes the signature non-standard, and makes clang complain. In the past, this has been worked around by adding -ffreestanding to the compile flags. The implementation looks like it only ever propagates the value (in longjmp) or sets it to 1 (in setjmp), and we only call longjmp with integer parameters. This allows removing -ffreestanding from the compilation flags. Fixes: c9029ef9c957 ("powerpc: Avoid clang warnings around setjmp and longjmp") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Clement Courbet <courbet@google.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200330080400.124803-1-courbet@google.com Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
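A hedged sketch of the signature change in asm/setjmp.h (JMP_BUF_LEN's value is assumed; the attributes are shown because both declarations need them regardless of signature):

	/* before: non-standard, triggers clang's setjmp diagnostics */
	extern long setjmp(long *);
	extern void longjmp(long *, long);

	/* after: the standard shape, with the kernel's own jmp_buf */
	#define JMP_BUF_LEN	23	/* value assumed */
	typedef long jmp_buf[JMP_BUF_LEN];

	extern int setjmp(jmp_buf env) __attribute__((returns_twice));
	extern void longjmp(jmp_buf env, int val) __attribute__((noreturn));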
2020-04-24  powerpc/kprobes: Ignore traps that happened in real mode  (Christophe Leroy)
commit 21f8b2fa3ca5b01f7a2b51b89ce97a3705a15aa0 upstream.

When a program check exception happens while MMU translation is disabled, the following Oops happens in kprobe_handler() at the line:

	} else if (*addr != BREAKPOINT_INSTRUCTION) {

BUG: Unable to handle kernel data access on read at 0x0000e268
Faulting instruction address: 0xc000ec34
Oops: Kernel access of bad area, sig: 11 [#1]
BE PAGE_SIZE=16K PREEMPT CMPC885
Modules linked in:
CPU: 0 PID: 429 Comm: cat Not tainted 5.6.0-rc1-s3k-dev-00824-g84195dc6c58a #3267
NIP: c000ec34 LR: c000ecd8 CTR: c019cab8
REGS: ca4d3b58 TRAP: 0300 Not tainted (5.6.0-rc1-s3k-dev-00824-g84195dc6c58a)
MSR: 00001032 <ME,IR,DR,RI> CR: 2a4d3c52 XER: 00000000
DAR: 0000e268 DSISR: c0000000
GPR00: c000b09c ca4d3c10 c66d0620 00000000 ca4d3c60 00000000 00009032 00000000
GPR08: 00020000 00000000 c087de44 c000afe0 c66d0ad0 100d3dd6 fffffff3 00000000
GPR16: 00000000 00000041 00000000 ca4d3d70 00000000 00000000 0000416d 00000000
GPR24: 00000004 c53b6128 00000000 0000e268 00000000 c07c0000 c07bb6fc ca4d3c60
NIP [c000ec34] kprobe_handler+0x128/0x290
LR [c000ecd8] kprobe_handler+0x1cc/0x290
Call Trace:
[ca4d3c30] [c000b09c] program_check_exception+0xbc/0x6fc
[ca4d3c50] [c000e43c] ret_from_except_full+0x0/0x4
--- interrupt: 700 at 0xe268
Instruction dump:
913e0008 81220000 38600001 3929ffff 91220000 80010024 bb410008 7c0803a6
38210020 4e800020 38600000 4e800020 <813b0000> 6d2a7fe0 2f8a0008 419e0154
---[ end trace 5b9152d4cdadd06d ]---

kprobes is not prepared to handle events in real mode, and functions running in real mode should have been blacklisted, so kprobe_handler() can safely bail out, telling 'this trap is not mine', for any trap that happened while in real mode. If the trap happened with MSR_IR or MSR_DR cleared, return 0 immediately.

Reported-by: Larry Finger <Larry.Finger@lwfinger.net>
Fixes: 6cc89bad60a6 ("powerpc/kprobes: Invoke handlers directly")
Cc: stable@vger.kernel.org # v4.10+
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/424331e2006e7291a1bfe40e7f3fa58825f565e1.1582054578.git.christophe.leroy@c-s.fr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
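The bail-out described in the last paragraph is a two-line check; a sketch, placed at the top of kprobe_handler() per the commit description:

	/* kprobes can't cope with events taken in real mode: if either
	 * instruction or data translation was off, the trap isn't ours */
	if (!(regs->msr & MSR_IR) || !(regs->msr & MSR_DR))
		return 0;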
2020-04-24  powerpc/64/tm: Don't let userspace set regs->trap via sigreturn  (Michael Ellerman)
commit c7def7fbdeaa25feaa19caf4a27c5d10bd8789e4 upstream. In restore_tm_sigcontexts() we take the trap value directly from the user sigcontext with no checking: err |= __get_user(regs->trap, &sc->gp_regs[PT_TRAP]); This means we can be in the kernel with an arbitrary regs->trap value. Although that's not immediately problematic, there is a risk we could trigger one of the uses of CHECK_FULL_REGS(): #define CHECK_FULL_REGS(regs) BUG_ON(regs->trap & 1) It can also cause us to unnecessarily save non-volatile GPRs again in save_nvgprs(), which shouldn't be problematic but is still wrong. It's also possible it could trick the syscall restart machinery, which relies on regs->trap not being == 0xc00 (see 9a81c16b5275 ("powerpc: fix double syscall restarts")), though I haven't been able to make that happen. Finally it doesn't match the behaviour of the non-TM case, in restore_sigcontext() which zeroes regs->trap. So change restore_tm_sigcontexts() to zero regs->trap. This was discovered while testing Nick's upcoming rewrite of the syscall entry path. In that series the call to save_nvgprs() prior to signal handling (do_notify_resume()) is removed, which leaves the low-bit of regs->trap uncleared which can then trigger the FULL_REGS() WARNs in setup_tm_sigcontexts(). Fixes: 2b0a576d15e0 ("powerpc: Add new transactional memory state to the signal context") Cc: stable@vger.kernel.org # v3.9+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200401023836.3286664-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
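The fix itself is small; a hedged sketch of the change in restore_tm_sigcontexts(), mirroring the non-TM path's behaviour as the message describes:

	/* before: a user-controlled value lands in regs->trap
	 * err |= __get_user(regs->trap, &sc->gp_regs[PT_TRAP]); */

	/* after: match restore_sigcontext() and force a sane value */
	regs->trap = 0;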
2020-04-24  powerpc/powernv/idle: Restore AMR/UAMOR/AMOR after idle  (Michael Ellerman)
commit 53a712bae5dd919521a58d7bad773b949358add0 upstream. In order to implement KUAP (Kernel Userspace Access Protection) on Power9 we will be using the AMR, and therefore indirectly the UAMOR/AMOR. So save/restore these regs in the idle code. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> [ajd: Backport to 4.14 tree, CVE-2020-11669] Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-02  powerpc: Include .BTF section  (Naveen N. Rao)
[ Upstream commit cb0cc635c7a9fa8a3a0f75d4d896721819c63add ] Selecting CONFIG_DEBUG_INFO_BTF results in the below warning from ld: ld: warning: orphan section `.BTF' from `.btf.vmlinux.bin.o' being placed in section `.BTF' Include .BTF section in vmlinux explicitly to fix the same. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200220113132.857132-1-naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-11  powerpc: fix hardware PMU exception bug on PowerVM compatibility mode systems  (Desnes A. Nunes do Rosario)
commit fc37a1632d40c80c067eb1bc235139f5867a2667 upstream. PowerVM systems running compatibility mode on a few Power8 revisions are still vulnerable to the hardware defect that loses PMU exceptions arriving prior to a context switch. The software fix for this issue is enabled through the CPU_FTR_PMAO_BUG cpu_feature bit, nevertheless this bit also needs to be set for PowerVM compatibility mode systems. Fixes: 68f2f0d431d9ea4 ("powerpc: Add a cpu feature CPU_FTR_PMAO_BUG") Signed-off-by: Desnes A. Nunes do Rosario <desnesn@linux.ibm.com> Reviewed-by: Leonardo Bras <leonardo@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200227134715.9715-1-desnesn@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-28  powerpc/sriov: Remove VF eeh_dev state when disabling SR-IOV  (Oliver O'Halloran)
[ Upstream commit 1fb4124ca9d456656a324f1ee29b7bf942f59ac8 ] When disabling virtual functions on an SR-IOV adapter we currently do not correctly remove the EEH state for the now-dead virtual functions. When removing the pci_dn that was created for the VF when SR-IOV was enabled we free the corresponding eeh_dev without removing it from the child device list of the eeh_pe that contained it. This can result in crashes due to the use-after-free. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Tested-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190821062655.19735-1-oohall@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-28  powerpc/powernv/iov: Ensure the pdn for VFs always contains a valid PE number  (Oliver O'Halloran)
[ Upstream commit 3b5b9997b331e77ce967eba2c4bc80dc3134a7fe ]

On pseries there is a bug with adding hotplugged devices to an IOMMU group. For a number of dumb reasons, fixing that bug first requires re-working how VFs are configured on PowerNV. For background, on PowerNV we use the pcibios_sriov_enable() hook to do two things:

1. Create a pci_dn structure for each of the VFs, and

2. Configure the PHB's internal BARs so the MMIO range for each VF maps to a unique PE.

Roughly speaking, a PE is the hardware counterpart to a Linux IOMMU group since all the devices in a PE share the same IOMMU table. A PE also defines the set of devices that should be isolated in response to a PCI error (i.e. bad DMA, UR/CA, AER events, etc). When isolated, all MMIO and DMA traffic to and from devices in the PE is blocked by the root complex until the PE is recovered by the OS.

The requirement to block MMIO causes a giant headache because the P8 PHB generally uses a fixed mapping between MMIO addresses and PEs. As a result we need to delay configuring the IOMMU groups for devices until after MMIO resources are assigned. For physical devices (i.e. non-VFs) the PE assignment is done in pcibios_setup_bridge(), which is called immediately after the MMIO resources for downstream devices (and the bridge's windows) are assigned. For VFs the setup is more complicated because:

a) pcibios_setup_bridge() is not called again when VFs are activated, and

b) the pci_devs for VFs are created by generic code which runs after pcibios_sriov_enable() is called.

The workaround for this is a two-step process:

1. A fixup in pcibios_add_device() is used to initialise the cached pe_number in pci_dn, then

2. A bus notifier then adds the device to the IOMMU group for the PE specified in pci_dn->pe_number.

A side effect of fixing the pseries bug mentioned in the first paragraph is moving the fixup out of pcibios_add_device() and into pcibios_bus_add_device(), which is called much later. This results in step 2 failing because pci_dn->pe_number won't be initialised when the bus notifier is run.

We can fix this by removing the need for the fixup. The PE for a VF is known before the VF is even scanned, so we can initialise pci_dn->pe_number in pcibios_sriov_enable() instead. Unfortunately, moving the initialisation causes two problems:

1. We trip the WARN_ON() in the current fixup code, and

2. The EEH core clears pdn->pe_number when recovering a VF and relies on the fixup to correctly re-set it.

The only justification for either of these is a comment in eeh_rmv_device() suggesting that pdn->pe_number *must* be set to IODA_INVALID_PE in order for the VF to be scanned. However, this comment appears to have no basis in reality. Both bugs can be fixed by just deleting the code.

Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191028085424.12006-1-oohall@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27  powerpc/cacheinfo: add cacheinfo_teardown, cacheinfo_rebuild  (Nathan Lynch)
[ Upstream commit d4aa219a074a5abaf95a756b9f0d190b5c03a945 ] Allow external callers to force the cacheinfo code to release all its references to cache nodes, e.g. before processing device tree updates post-migration, and to rebuild the hierarchy afterward. CPU online/offline must be blocked by callers; enforce this. Fixes: 410bccf97881 ("powerpc/pseries: Partition migration in the kernel") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27  powerpc/64s: Fix logic when handling unknown CPU features  (Michael Ellerman)
[ Upstream commit 8cfaf106918a8c13abb24c641556172afbb9545c ] In cpufeatures_process_feature(), if a provided CPU feature is unknown and enable_unknown is false, we erroneously print that the feature is being enabled and return true, even though no feature has been enabled, and may also set feature bits based on the last entry in the match table. Fix this so that we only set feature bits from the match table if we have actually enabled a feature from that table, and when failing to enable an unknown feature, always print the "not enabling" message and return false. Coincidentally, some older gccs (<GCC 7), when invoked with -fsanitize-coverage=trace-pc, cause a spurious uninitialised variable warning in this function: arch/powerpc/kernel/dt_cpu_ftrs.c: In function ‘cpufeatures_process_feature’: arch/powerpc/kernel/dt_cpu_ftrs.c:686:7: warning: ‘m’ may be used uninitialized in this function [-Wmaybe-uninitialized] if (m->cpu_ftr_bit_mask) An upcoming patch will enable support for kcov, which requires this option. This patch avoids the warning. Fixes: 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features") Reported-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> [ajd: add commit message] Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-04  powerpc/security: Fix wrong message when RFI Flush is disabled  (Gustavo L. F. Walbon)
[ Upstream commit 4e706af3cd8e1d0503c25332b30cad33c97ed442 ]

The issue was that sysfs showed a "Mitigation" message whatever the state of "RFI Flush", but it should show "Vulnerable" when RFI Flush is disabled. If you have the "L1D private" feature enabled and not "RFI Flush", you are vulnerable to Meltdown attacks. "RFI Flush" is the key feature to mitigate Meltdown whatever the "L1D private" state.

SEC_FTR_L1D_THREAD_PRIV is a feature for Power9 only. So the message should be as the truth table shows:

CPU | L1D private | RFI Flush | sysfs
----|-------------|-----------|----------------------------------------------
P9  | False       | False     | Vulnerable
P9  | False       | True      | Mitigation: RFI Flush
P9  | True        | False     | Vulnerable: L1D private per thread
P9  | True        | True      | Mitigation: RFI Flush, L1D private per thread
P8  | False       | False     | Vulnerable
P8  | False       | True      | Mitigation: RFI Flush

Output before this fix:
# cat /sys/devices/system/cpu/vulnerabilities/meltdown
Mitigation: RFI Flush, L1D private per thread
# echo 0 > /sys/kernel/debug/powerpc/rfi_flush
# cat /sys/devices/system/cpu/vulnerabilities/meltdown
Mitigation: L1D private per thread

Output after fix:
# cat /sys/devices/system/cpu/vulnerabilities/meltdown
Mitigation: RFI Flush, L1D private per thread
# echo 0 > /sys/kernel/debug/powerpc/rfi_flush
# cat /sys/devices/system/cpu/vulnerabilities/meltdown
Vulnerable: L1D private per thread

Signed-off-by: Gustavo L. F. Walbon <gwalbon@linux.ibm.com>
Signed-off-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190502210907.42375-1-gwalbon@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-04  powerpc/security/book3s64: Report L1TF status in sysfs  (Anthony Steinhauser)
[ Upstream commit 8e6b6da91ac9b9ec5a925b6cb13f287a54bd547d ] Some PowerPC CPUs are vulnerable to L1TF to the same extent as to Meltdown. It is also mitigated by flushing the L1D on privilege transition. Currently the sysfs gives a false negative on L1TF on CPUs that I verified to be vulnerable, a Power9 Talos II Boston 004e 1202, PowerNV T2P9D01. Signed-off-by: Anthony Steinhauser <asteinhauser@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> [mpe: Just have cpu_show_l1tf() call cpu_show_meltdown() directly] Link: https://lore.kernel.org/r/20191029190759.84821-1-asteinhauser@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-04  powerpc/pseries: Mark accumulate_stolen_time() as notrace  (Michael Ellerman)
[ Upstream commit eb8e20f89093b64f48975c74ccb114e6775cee22 ] accumulate_stolen_time() is called prior to interrupt state being reconciled, which can trip the warning in arch_local_irq_restore(): WARNING: CPU: 5 PID: 1017 at arch/powerpc/kernel/irq.c:258 .arch_local_irq_restore+0x9c/0x130 ... NIP .arch_local_irq_restore+0x9c/0x130 LR .rb_start_commit+0x38/0x80 Call Trace: .ring_buffer_lock_reserve+0xe4/0x620 .trace_function+0x44/0x210 .function_trace_call+0x148/0x170 .ftrace_ops_no_ops+0x180/0x1d0 ftrace_call+0x4/0x8 .accumulate_stolen_time+0x1c/0xb0 decrementer_common+0x124/0x160 For now just mark it as notrace. We may change the ordering to call it after interrupt state has been reconciled, but that is a larger change. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191024055932.27940-1-mpe@ellerman.id.au Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-31  powerpc/irq: fix stack overflow verification  (Christophe Leroy)
commit 099bc4812f09155da77eeb960a983470249c9ce1 upstream. Before commit 0366a1c70b89 ("powerpc/irq: Run softirqs off the top of the irq stack"), check_stack_overflow() was called by do_IRQ(), before switching to the irq stack. In that commit, do_IRQ() was renamed __do_irq(), and is now executing on the irq stack, so check_stack_overflow() has just become almost useless. Move check_stack_overflow() call in do_IRQ() to do the check while still on the current stack. Fixes: 0366a1c70b89 ("powerpc/irq: Run softirqs off the top of the irq stack") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e033aa8116ab12b7ca9a9c75189ad0741e3b9b5f.1575872340.git.christophe.leroy@c-s.fr Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
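A hedged sketch of the restored ordering in do_IRQ() (helper names assumed; call_do_irq() is the stack-switching trampoline on 32-bit powerpc):

	void __irq_entry do_IRQ(struct pt_regs *regs)
	{
		struct pt_regs *old_regs = set_irq_regs(regs);

		/* run the check while still on the interrupted stack;
		 * on the irq stack it would almost never fire */
		check_stack_overflow();

		/* only then switch to the irq stack and run __do_irq() */
		call_do_irq(regs, irq_stack_ptr);

		set_irq_regs(old_regs);
	}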
2019-12-17  powerpc: Fix vDSO clock_getres()  (Vincenzo Frascino)
[ Upstream commit 552263456215ada7ee8700ce022d12b0cffe4802 ] clock_getres in the vDSO library has to preserve the same behaviour of posix_get_hrtimer_res(). In particular, posix_get_hrtimer_res() does: sec = 0; ns = hrtimer_resolution; and hrtimer_resolution depends on the enablement of the high resolution timers that can happen either at compile or at run time. Fix the powerpc vdso implementation of clock_getres keeping a copy of hrtimer_resolution in vdso data and using that directly. Fixes: a7f290dad32e ("[PATCH] powerpc: Merge vdso's and add vdso support to 32 bits kernel") Cc: stable@vger.kernel.org Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Acked-by: Shuah Khan <skhan@linuxfoundation.org> [chleroy: changed CLOCK_REALTIME_RES to CLOCK_HRTIMER_RES] Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a55eca3a5e85233838c2349783bcb5164dae1d09.1575273217.git.christophe.leroy@c-s.fr Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-17  powerpc: Avoid clang warnings around setjmp and longjmp  (Nathan Chancellor)
[ Upstream commit c9029ef9c95765e7b63c4d9aa780674447db1ec0 ] Commit aea447141c7e ("powerpc: Disable -Wbuiltin-requires-header when setjmp is used") disabled -Wbuiltin-requires-header because of a warning about the setjmp and longjmp declarations. r367387 in clang added another diagnostic around this, complaining that there is no jmp_buf declaration. In file included from ../arch/powerpc/xmon/xmon.c:47: ../arch/powerpc/include/asm/setjmp.h:10:13: error: declaration of built-in function 'setjmp' requires the declaration of the 'jmp_buf' type, commonly provided in the header <setjmp.h>. [-Werror,-Wincomplete-setjmp-declaration] extern long setjmp(long *); ^ ../arch/powerpc/include/asm/setjmp.h:11:13: error: declaration of built-in function 'longjmp' requires the declaration of the 'jmp_buf' type, commonly provided in the header <setjmp.h>. [-Werror,-Wincomplete-setjmp-declaration] extern void longjmp(long *, long); ^ 2 errors generated. We are not using the standard library's longjmp/setjmp implementations for obvious reasons; make this clear to clang by using -ffreestanding on these files. Cc: stable@vger.kernel.org # 4.14+ Suggested-by: Segher Boessenkool <segher@kernel.crashing.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191119045712.39633-3-natechancellor@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-17powerpc: Allow flush_icache_range to work across ranges >4GBAlastair D'Silva
commit 29430fae82073d39b1b881a3cd507416a56a363f upstream. When calling flush_icache_range with a size >4GB, we were masking off the upper 32 bits, so we would incorrectly flush a range smaller than intended. This patch replaces the 32 bit shifts with 64 bit ones, so that the full size is accounted for. Signed-off-by: Alastair D'Silva <alastair@d-silva.org> Cc: stable@vger.kernel.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191104023305.9581-2-alastair@au1.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
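The truncation is easy to reproduce in plain C; this fragment only illustrates the arithmetic, while the real fix is in the assembler routine where the 32-bit shifts were replaced by 64-bit ones:

    #include <stdio.h>

    int main(void)
    {
        unsigned long start = 0;
        unsigned long stop  = 5UL << 30;    /* a 5 GiB range */

        /* 32-bit shift: the size is truncated to its low 32 bits, so
         * only 1 GiB worth of 128-byte cache lines would be flushed. */
        unsigned long bad  = (unsigned int)(stop - start) >> 7;

        /* 64-bit shift: the full 5 GiB is accounted for. */
        unsigned long good = (stop - start) >> 7;

        printf("bad=%lu good=%lu\n", bad, good);
        return 0;
    }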
2019-12-17powerpc: Allow 64bit VDSO __kernel_sync_dicache to work across ranges >4GBAlastair D'Silva
commit f9ec11165301982585e5e5f606739b5bae5331f3 upstream. When calling __kernel_sync_dicache with a size >4GB, we were masking off the upper 32 bits, so we would incorrectly flush a range smaller than intended. This patch replaces the 32 bit shifts with 64 bit ones, so that the full size is accounted for. Signed-off-by: Alastair D'Silva <alastair@d-silva.org> Cc: stable@vger.kernel.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191104023305.9581-3-alastair@au1.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-05powerpc/83xx: handle machine check caused by watchdog timerChristophe Leroy
[ Upstream commit 0deae39cec6dab3a66794f3e9e83ca4dc30080f1 ] When the watchdog timer is set in interrupt mode, it causes a machine check when it times out. The purpose of this mode is to ease debugging, not to crash the kernel and reboot the machine. This patch implements special handling for that, so that the kernel does not crash if the watchdog times out while in an interrupt or within the idle task. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [scottwood: added missing #include] Signed-off-by: Scott Wood <oss@buserror.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
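A hypothetical sketch of the kind of handling described; the hook, the helper, and the exact policy are assumptions for illustration, not the commit's code:

    /* Hypothetical: recognise a watchdog-originated machine check and
     * report it instead of dying, even from interrupt or the idle task. */
    static int wdt_mcheck_handled(struct pt_regs *regs)
    {
        if (!wdt_caused_machine_check())  /* assumed platform helper */
            return 0;                     /* not ours: normal path */

        pr_crit("Watchdog timeout surfaced as machine check, nip %lx\n",
                regs->nip);
        return 1;    /* treated as recovered: no panic, no reboot */
    }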
2019-12-05powerpc/prom: fix early DEBUG messagesChristophe Leroy
[ Upstream commit b18f0ae92b0a1db565c3e505fa87b6971ad3b641 ] This patch fixes early DEBUG messages in prom.c: - Use %px instead of %p to show the actual addresses - Cast memblock_phys_mem_size() to (unsigned long long) to avoid a build failure when phys_addr_t is not 64 bits. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
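A sketch of both fixes; the call sites are illustrative, with DBG() being the local debug macro in prom.c:

    /* %p hashes pointers, which hides the very addresses these early
     * messages exist to show; %px prints the raw value. */
    DBG("Device tree struct 0x%px -> 0x%px\n", start, end);

    /* phys_addr_t may be 32-bit, so cast to match the %llx format: */
    DBG("Phys. mem: %llx\n", (unsigned long long)memblock_phys_mem_size());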
2019-12-01KVM: PPC: Book3S HV: Flush link stack on guest exit to host kernelMichael Ellerman
commit af2e8c68b9c5403f77096969c516f742f5bb29e0 upstream. On some systems that are vulnerable to Spectre v2, it is up to software to flush the link stack (return address stack), in order to protect against Spectre-RSB. When exiting from a guest we do some housekeeping and then potentially exit to C code which is several stack frames deep in the host kernel. We will then execute a series of returns without preceding calls, opening up the possibility that the guest could have poisoned the link stack and could direct speculative execution of the host to a gadget of some sort. To prevent this we add a flush of the link stack on exit from a guest. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> [dja: straightforward backport to v4.14] Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-01powerpc/book3s64: Fix link stack flush on context switchMichael Ellerman
commit 39e72bf96f5847ba87cc5bd7a3ce0fed813dc9ad upstream. In commit ee13cb249fab ("powerpc/64s: Add support for software count cache flush"), I added support for software to flush the count cache (indirect branch cache) on context switch if firmware told us that was the required mitigation for Spectre v2. As part of that code we also added a software flush of the link stack (return address stack), which protects against Spectre-RSB between user processes. That is all correct for CPUs that activate that mitigation, which is currently Power9 Nimbus DD2.3. What I got wrong is that on older CPUs, where firmware has disabled the count cache, we also need to flush the link stack on context switch. To fix it we create a new feature bit which is not set by firmware, which tells us we need to flush the link stack. We set that when firmware tells us that either of the existing Spectre v2 mitigations is enabled. Then we adjust the patching code so that if we see that feature bit we enable the link stack flush. If we're also told to flush the count cache in software then we fall through and do that also. On the older CPUs we don't need to do the software count cache flush, since firmware has disabled it, so in that case we patch in an early return after the link stack flush. The naming of some of the functions is awkward after this patch, because they're called "count cache" but they also handle the link stack. But we'll fix that up in a later commit to ease backporting. This is the fix for CVE-2019-18660. Reported-by: Anthony Steinhauser <asteinhauser@google.com> Fixes: ee13cb249fab ("powerpc/64s: Add support for software count cache flush") Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> [dja: straightforward backport to v4.14] Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
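A sketch of the feature-bit logic described, modelled on arch/powerpc/kernel/security.c; treat the SEC_FTR_* names as my reading of the upstream commit rather than authoritative:

    static void setup_count_cache_flush(void)
    {
        /* Firmware never sets this bit. Set it ourselves whenever
         * either existing Spectre v2 mitigation is reported, so that
         * older CPUs with the count cache disabled in firmware still
         * get the software link stack flush on context switch. */
        if (security_ftr_enabled(SEC_FTR_BCCTRL_SERIALISED) ||
            security_ftr_enabled(SEC_FTR_COUNT_CACHE_DISABLED))
            security_ftr_set(SEC_FTR_FLUSH_LINK_STACK);

        toggle_count_cache_flush(true);
    }

The patching side then keys off SEC_FTR_FLUSH_LINK_STACK: it enables the link stack flush, and either falls through to the software count cache flush or patches in an early return when firmware has disabled the count cache.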
2019-12-01powerpc/64s: support nospectre_v2 cmdline optionChristopher M. Riedl
commit d8f0e0b073e1ec52a05f0c2a56318b47387d2f10 upstream. Add support for disabling the kernel-implemented Spectre v2 mitigation (count cache flush on context switch) via the nospectre_v2 and mitigations=off cmdline options. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190524024647.381-1-cmr@informatik.wtf Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
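The cmdline side follows the usual early_param() pattern; a sketch per my reading of the upstream commit, with the surrounding mitigation gate summarised in the trailing comment:

    static bool no_spectrev2;

    static int __init handle_nospectre_v2(char *p)
    {
        no_spectrev2 = true;
        return 0;
    }
    early_param("nospectre_v2", handle_nospectre_v2);

    /* setup_count_cache_flush() then skips enabling the software flush
     * when no_spectrev2 || cpu_mitigations_off() is true. */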
2019-12-01powerpc/process: Fix flush_all_to_thread for SPEFelipe Rechia
[ Upstream commit e901378578c62202594cba0f6c076f3df365ec91 ] Fix a bug introduced by the creation of flush_all_to_thread() for processors that have SPE (Signal Processing Engine) and use it to compute floating-point operations. From a userspace perspective, the problem was seen when computing floating-point operations that should generate exceptions. For example: fork(); float x = 0.0 / 0.0; isnan(x); // forked process returns False (should be True) The operation above should also always cause the SPEFSCR FINV bit to be set. However, the SPE floating-point exceptions were turned off after a fork(). Kernel versions prior to the bug used flush_spe_to_thread(), which first saves the SPEFSCR register values in tsk->thread and then calls giveup_spe(tsk). After commit 579e633e764e, save_all() called giveup_spe() first, and only then were the SPEFSCR register values saved in tsk->thread. This saved the SPEFSCR register values after disabling SPE for that thread, causing the bug described above. Fixes: 579e633e764e ("powerpc: create flush_all_to_thread()") Signed-off-by: Felipe Rechia <felipe.rechia@datacom.com.br> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
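A hedged sketch of the reordering described, per my reading of the upstream commit: read SPEFSCR while the thread's SPE state is still live, that is, before save_all() ends up in giveup_spe():

    void flush_all_to_thread(struct task_struct *tsk)
    {
        if (tsk->thread.regs) {
            preempt_disable();
            BUG_ON(tsk != current);
    #ifdef CONFIG_SPE
            /* Must precede save_all(): giveup_spe() disables SPE,
             * after which SPEFSCR no longer reflects this task. */
            if (tsk->thread.regs->msr & MSR_SPE)
                tsk->thread.spefscr = mfspr(SPRN_SPEFSCR);
    #endif
            save_all(tsk);
            preempt_enable();
        }
    }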