Age | Commit message (Collapse) | Author |
|
commit f43c27188a49111b58e9611afa2f0365b0b55625 upstream.
On arm64 the TTBR0_EL1 register is set to either the reserved TTBR0
page tables on boot or to the active_mm mappings belonging to user space
processes, it must never be set to swapper_pg_dir page tables mappings.
When a CPU is booted its active_mm is set to init_mm even though its
TTBR0_EL1 points at the reserved TTBR0 page mappings. This implies
that when __cpu_suspend is triggered the active_mm can point at
init_mm even if the current TTBR0_EL1 register contains the reserved
TTBR0_EL1 mappings.
Therefore, the mm save and restore executed in __cpu_suspend might
turn out to be erroneous in that, if the current->active_mm corresponds
to init_mm, on resume from low power it ends up restoring in the
TTBR0_EL1 the init_mm mappings that are global and can cause speculation
of TLB entries which end up being propagated to user space.
This patch fixes the issue by checking the active_mm pointer before
restoring the TTBR0 mappings. If the current active_mm == &init_mm,
the code sets the TTBR0_EL1 to the reserved TTBR0 mapping instead of
switching back to the active_mm, which is the expected behaviour
corresponding to the TTBR0_EL1 settings when __cpu_suspend was entered.
Fixes: 95322526ef62 ("arm64: kernel: cpu_{suspend/resume} implementation")
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c3684fbb446501b48dec6677a6a9f61c215053de upstream.
The function cpu_resume currently lives in the .data section.
There's no reason for it to be there since we can use relative
instructions without a problem. Move a few cpu_resume data
structures out of the assembly file so the .data annotation
can be dropped completely and cpu_resume ends up in the read
only text section.
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tested-by: Kees Cook <keescook@chromium.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 714f59925595b9c2ea9c22b107b340d38e3b3bc9 upstream.
CPU suspend is the standard kernel interface to be used to enter
low-power states on ARM64 systems. Current cpu_suspend implementation
by default assumes that all low power states are losing the CPU context,
so the CPU registers must be saved and cleaned to DRAM upon state
entry. Furthermore, the current cpu_suspend() implementation assumes
that if the CPU suspend back-end method returns when called, this has
to be considered an error regardless of the return code (which can be
successful) since the CPU was not expected to return from a code path that
is different from cpu_resume code path - eg returning from the reset vector.
All in all this means that the current API does not cope well with low-power
states that preserve the CPU context when entered (ie retention states),
since first of all the context is saved for nothing on state entry for
those states and a successful state entry can return as a normal function
return, which is considered an error by the current CPU suspend
implementation.
This patch refactors the cpu_suspend() API so that it can be split in
two separate functionalities. The arm64 cpu_suspend API just provides
a wrapper around CPU suspend operation hook. A new function is
introduced (for architecture code use only) for states that require
context saving upon entry:
__cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
__cpu_suspend() saves the context on function entry and calls the
so called suspend finisher (ie fn) to complete the suspend operation.
The finisher is not expected to return, unless it fails in which case
the error is propagated back to the __cpu_suspend caller.
The API refactoring results in the following pseudo code call sequence for a
suspending CPU, when triggered from a kernel subsystem:
/*
* int cpu_suspend(unsigned long idx)
* @idx: idle state index
*/
{
-> cpu_suspend(idx)
|---> CPU operations suspend hook called, if present
|--> if (retention_state)
|--> direct suspend back-end call (eg PSCI suspend)
else
|--> __cpu_suspend(idx, &back_end_finisher);
}
By refactoring the cpu_suspend API this way, the CPU operations back-end
has a chance to detect whether idle states require state saving or not
and can call the required suspend operations accordingly either through
simple function call or indirectly through __cpu_suspend() which carries out
state saving and suspend finisher dispatching to complete idle state entry.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 18ab7db6b749ac27aac08d572afbbd2f4d937934 upstream.
Suspend init function must be marked as __init, since it is not needed
after the kernel has booted. This patch moves the cpu_suspend_init()
function to the __init section.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7d57511d2dba03a8046c8b428dd9192a4bfc1e73 upstream.
Commit a469abd0f868 (ARM: elf: add new hwcap for identifying atomic
ldrd/strd instructions) introduces HWCAP_ELF for 32-bit ARM
applications. As LPAE is always present on arm64, report the
corresponding compat HWCAP to user space.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 899d5933b2dd2720f2b20b01eaa07871aa6ad096 upstream.
When experimenting with patches to provide kprobes support for aarch64
smp machines would hang when inserting breakpoints into kernel code.
The hangs were caused by a race condition in the code called by
aarch64_insn_patch_text_sync(). The first processor in the
aarch64_insn_patch_text_cb() function would patch the code while other
processors were still entering the function and incrementing the
cpu_count field. This resulted in some processors never observing the
exit condition and exiting the function. Thus, processors in the
system hung.
The first processor to enter the patching function performs the
patching and signals that the patching is complete with an increment
of the cpu_count field. When all the processors have incremented the
cpu_count field the cpu_count will be num_cpus_online()+1 and they
will return to normal execution.
Fixes: ae16480785de arm64: introduce interfaces to hotpatch kernel and module code
Signed-off-by: William Cohen <wcohen@redhat.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 97fc15436b36ee3956efad83e22a557991f7d19d upstream.
ARM64 currently doesn't fix up faults on the single-byte (strb) case of
__clear_user... which means that we can cause a nasty kernel panic as an
ordinary user with any multiple PAGE_SIZE+1 read from /dev/zero.
i.e.: dd if=/dev/zero of=foo ibs=1 count=1 (or ibs=65537, etc.)
This is a pretty obscure bug in the general case since we'll only
__do_kernel_fault (since there's no extable entry for pc) if the
mmap_sem is contended. However, with CONFIG_DEBUG_VM enabled, we'll
always fault.
if (!down_read_trylock(&mm->mmap_sem)) {
if (!user_mode(regs) && !search_exception_tables(regs->pc))
goto no_context;
retry:
down_read(&mm->mmap_sem);
} else {
/*
* The above down_read_trylock() might have succeeded in
* which
* case, we'll have missed the might_sleep() from
* down_read().
*/
might_sleep();
if (!user_mode(regs) && !search_exception_tables(regs->pc))
goto no_context;
}
Fix that by adding an extable entry for the strb instruction, since it
touches user memory, similar to the other stores in __clear_user.
Signed-off-by: Kyle McMartin <kyle@redhat.com>
Reported-by: Miloš Prchlík <mprchlik@redhat.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 971a5b6fe634bb7b617d8c5f25b6a3ddbc600194 upstream.
The compat_elf_prpsinfo structure does not match the arch/arm struct
elf_pspsinfo definition. As result NT_PRPSINFO note in core file
created by arm64 kernel for aarch32 (compat) process has wrong size.
So gdb cannot display command that caused process crash.
Fix is to change size of __compat_uid_t, __compat_gid_t so it would
match size of similar fields in arch/arm case.
Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 27d7ff273c2aad37b28f6ff0cab2cfa35b51e648 upstream.
I'm not sure what I was on when I wrote this, but when iterating over
the hardware watchpoint array (hbp_watch_array), our index is off by
ARM_MAX_BRP, so we walk off the end of our thread_struct...
... except, a dodgy condition in the loop means that it never executes
at all (bp cannot be NULL).
This patch fixes the code so that we remove the bp check and use the
correct index for accessing the watchpoint structures.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f6edbbf36da3a27b298b66c7955fc84e1dcca305 upstream.
X-Gene u-boot runs in EL2 mode with MMU enabled hence we might
have stale EL2 tlb enteris when we enable EL2 MMU on each host CPU.
This can happen on any ARM/ARM64 board running bootloader in
Hyp-mode (or EL2-mode) with MMU enabled.
This patch ensures that we flush all Hyp-mode (or EL2-mode) TLBs
on each host CPU before enabling Hyp-mode (or EL2-mode) MMU.
Tested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 05e0127f9e362b36aa35f17b1a3d52bca9322a3a upstream.
The architecture specifies that when the processor wakes up from a WFE
or WFI instruction, the instruction is considered complete, however we
currrently return to EL1 (or EL0) at the WFI/WFE instruction itself.
While most guests may not be affected by this because their local
exception handler performs an exception returning setting the event bit
or with an interrupt pending, some guests like UEFI will get wedged due
this little mishap.
Simply skip the instruction when we have completed the emulation.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3d8afe3099ebc602848aa7f09235cce3a9a023ce upstream.
The arm64 interrupt migration code on cpu offline calls
irqchip.irq_set_affinity() with the argument force=true. Originally
this argument had no effect because it was not used by any interrupt
chip driver and there was no semantics defined.
This changed with commit 01f8fa4f01d8 ("genirq: Allow forcing cpu
affinity of interrupts") which made the force argument useful to route
interrupts to not yet online cpus without checking the target cpu
against the cpu online mask. The following commit ffde1de64012
("irqchip: gic: Support forced affinity setting") implemented this for
the GIC interrupt controller.
As a consequence the cpu offline irq migration fails if CPU0 is
offlined, because CPU0 is still set in the affinity mask and the
validation against cpu online mask is skipped to the force argument
being true. The following first_cpu(mask) selection always selects
CPU0 as the target.
Commit 601c942176d8("arm64: use cpu_online_mask when using forced
irq_set_affinity") intended to fix the above mentioned issue but
introduced another issue where affinity can be migrated to a wrong
CPU due to unconditional copy of cpu_online_mask.
As with for arm, solve the issue by calling irq_set_affinity() with
force=false from the CPU offline irq migration code so the GIC driver
validates the affinity mask against CPU online mask and therefore
removes CPU0 from the possible target candidates. Also revert the
changes done in the commit 601c942176d8 as it's no longer needed.
Tested on Juno platform.
Fixes: 601c942176d8("arm64: use cpu_online_mask when using forced
irq_set_affinity")
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit eb35bdd7bca29a13c8ecd44e6fd747a84ce675db upstream.
Nathan reports that we leak TLS information from the parent context
during an exec, as we don't clear the TLS registers when flushing the
thread state.
This patch updates the flushing code so that we:
(1) Unconditionally zero the tpidr_el0 register (since this is fully
context switched for native tasks and zeroed for compat tasks)
(2) Zero the tp_value state in thread_info before clearing the
tpidrr0_el0 register for compat tasks (since this is only writable
by the set_tls compat syscall and therefore not fully switched).
A missing compiler barrier is also added to the compat set_tls syscall.
Acked-by: Nathan Lynch <Nathan_Lynch@mentor.com>
Reported-by: Nathan Lynch <Nathan_Lynch@mentor.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4badad352a6bb202ec68afa7a574c0bb961e5ebc upstream.
The optimistic spin code assumes regular stores and cmpxchg() play nice;
this is found to not be true for at least: parisc, sparc32, tile32,
metag-lock1, arc-!llsc and hexagon.
There is further wreckage, but this in particular seemed easy to
trigger, so blacklist this.
Opt in for known good archs.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: John David Anglin <dave.anglin@bell.net>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/20140606175316.GV13930@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fa2ec3ea10bd377f9d55772b1dab65178425a1a2 upstream.
include/linux/sched.h implements TASK_SIZE_OF as TASK_SIZE if it
is not set by the architecture headers. TASK_SIZE uses the
current task to determine the size of the virtual address space.
On a 64-bit kernel this will cause reading /proc/pid/pagemap of a
64-bit process from a 32-bit process to return EOF when it reads
past 0xffffffff.
Implement TASK_SIZE_OF exactly the same as TASK_SIZE with
test_tsk_thread_flag instead of test_thread_flag.
Signed-off-by: Colin Cross <ccross@android.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3906c2b53cd23c2ae03e6ce41432c8e7f0a3cbbb upstream.
The value of ESR has been stored into x1, and should be directly pass to
do_sp_pc_abort function, "MOV x1, x25" is an extra operation and do_sp_pc_abort
will get the wrong value of ESR.
Signed-off-by: ChiaHao <andy.jhshiu@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 923b8f5044da753e4985ab15c1374ced2cdf616c upstream.
The __sync_icache_dcache routine will only flush the dcache for the
first page of a compound page, potentially leading to stale icache
data residing further on in a hugetlb page.
This patch addresses this issue by taking into consideration the
order of the page when flushing the dcache.
Reported-by: Mark Brown <broonie@linaro.org>
Tested-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f3a183cb422574014538017b5b291a416396f97e upstream.
Arm64 does not define dma_get_required_mask() function.
Therefore, it should not define the ARCH_HAS_DMA_GET_REQUIRED_MASK.
This causes build errors in some device drivers (e.g. mpt2sas)
Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2227901a0230d8fde81ba9c602d649839390f56b upstream.
Currently core file of aarch32 process prstatus note has empty
registers set. As result aarch32 core files create by V8 kernel are
not very useful.
It happens because compat_gpr_get and compat_gpr_set functions can
copy registers values to/from either kbuf or ubuf. ELF core file
collection function fill_thread_core_info calls compat_gpr_get
with kbuf set and ubuf set to 0. But current compat_gpr_get and
compat_gpr_set function handle copy to/from only ubuf case.
Fix is to handle kbuf and ubuf as two separate cases in similar
way as other functions like user_regset_copyout, user_regset_copyin do.
Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c168870704bcde6bb63d05f7882b620dd3985a46 upstream.
Our compat PTRACE_POKEUSR implementation simply passes the user data to
regset_copy_from_user after some simple range checking. Unfortunately,
the data in question has already been copied to the kernel stack by this
point, so the subsequent access_ok check fails and the ptrace request
returns -EFAULT. This causes problems tracing fork() with older versions
of strace.
This patch briefly changes the fs to KERNEL_DS, so that the access_ok
check passes even with a kernel address.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c177c81e09e517bbf75b67762cdab1b83aba6976 upstream.
Currently hugepage migration is available for all archs which support
pmd-level hugepage, but testing is done only for x86_64 and there're
bugs for other archs. So to avoid breaking such archs, this patch
limits the availability strictly to x86_64 until developers of other
archs get interested in enabling this feature.
Simply disabling hugepage migration on non-x86_64 archs is not enough to
fix the reported problem where sys_move_pages() hits the BUG_ON() in
follow_page(FOLL_GET), so let's fix this by checking if hugepage
migration is supported in vma_migratable().
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 601c942176d8ad8334118bddb747e3720bed24f8 upstream.
Commit 01f8fa4f01d8("genirq: Allow forcing cpu affinity of interrupts")
enabled the forced irq_set_affinity which previously refused to route an
interrupt to an offline cpu.
Commit ffde1de64012("irqchip: Gic: Support forced affinity setting")
implements this force logic and disables the cpu online check for GIC
interrupt controller.
When __cpu_disable calls migrate_irqs, it disables the current cpu in
cpu_online_mask and uses forced irq_set_affinity to migrate the IRQs
away from the cpu but passes affinity mask with the cpu being offlined
also included in it.
When calling irq_set_affinity with force == true in a cpu hotplug path,
the caller must ensure that the cpu being offlined is not present in the
affinity mask or it may be selected as the target CPU, leading to the
interrupt not being migrated.
This patch uses cpu_online_mask when using forced irq_set_affinity so
that the IRQs are properly migrated away.
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4797ec2dc83a43be35bad56037d1b53db9e2b5d5 upstream.
The following happens when trying to run a kvm guest on a kernel
configured for 64k pages. This doesn't happen with 4k pages:
BUG: failure at include/linux/mm.h:297/put_page_testzero()!
Kernel panic - not syncing: BUG!
CPU: 2 PID: 4228 Comm: qemu-system-aar Tainted: GF 3.13.0-0.rc7.31.sa2.k32v1.aarch64.debug #1
Call trace:
[<fffffe0000096034>] dump_backtrace+0x0/0x16c
[<fffffe00000961b4>] show_stack+0x14/0x1c
[<fffffe000066e648>] dump_stack+0x84/0xb0
[<fffffe0000668678>] panic+0xf4/0x220
[<fffffe000018ec78>] free_reserved_area+0x0/0x110
[<fffffe000018edd8>] free_pages+0x50/0x88
[<fffffe00000a759c>] kvm_free_stage2_pgd+0x30/0x40
[<fffffe00000a5354>] kvm_arch_destroy_vm+0x18/0x44
[<fffffe00000a1854>] kvm_put_kvm+0xf0/0x184
[<fffffe00000a1938>] kvm_vm_release+0x10/0x1c
[<fffffe00001edc1c>] __fput+0xb0/0x288
[<fffffe00001ede4c>] ____fput+0xc/0x14
[<fffffe00000d5a2c>] task_work_run+0xa8/0x11c
[<fffffe0000095c14>] do_notify_resume+0x54/0x58
In arch/arm/kvm/mmu.c:unmap_range(), we end up doing an extra put_page()
on the stage2 pgd which leads to the BUG in put_page_testzero(). This
happens because a pud_huge() test in unmap_range() returns true when it
should always be false with 2-level pages tables used by 64k pages.
This patch removes support for huge puds if 2-level pagetables are
being used.
Signed-off-by: Mark Salter <msalter@redhat.com>
[catalin.marinas@arm.com: removed #ifndef around PUD_SIZE check]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 838977f0178334bf3d7f3e974ea3154b68979be0 upstream.
This fixes commit 6290b53de025 (arm64: compat: Wire up new AArch32 syscalls)
which did not update __NR_compat_syscalls accordingly.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit de2db7432917a82b62d55bb59635586eeca6d1bd upstream.
pgprot_{dmacoherent,writecombine,noncached} don't need to generate
executable mappings with side-effects like __sync_icache_dcache() being
called when the mapping is in user space.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Tested-by: Laura Abbott <lauraa@codeaurora.org>
Tested-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 71fdb6bf61bf0692f004f9daf5650392c0cfe300 upstream.
Special pte mappings are not intended to be executable and do not even
have an associated struct page. This patch ensures that we do not call
__sync_icache_dcache() on such ptes.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Steve Capper <Steve.Capper@arm.com>
Tested-by: Laura Abbott <lauraa@codeaurora.org>
Tested-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Pull KVM fixes from Paolo Bonzini:
"Three x86 fixes and one for ARM/ARM64.
In particular, nested virtualization on Intel is broken in 3.13 and
fixed by this pull request"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm, vmx: Really fix lazy FPU on nested guest
kvm: x86: fix emulator buffer overflow (CVE-2014-0049)
arm/arm64: KVM: detect CPU reset on CPU_PM_EXIT
KVM: MMU: drop read-only large sptes when creating lower level sptes
|
|
Commit fb4a96029c8a (arm64: kernel: fix per-cpu offset restore on
resume) uses per_cpu_offset() unconditionally during CPU wakeup,
however, this is only defined for the SMP case.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Dave P Martin <Dave.Martin@arm.com>
|
|
Page table entries on ARM64 are 64 bits, and some pte functions such as
pte_dirty return a bitwise-and of a flag with the pte value. If the
flag to be tested resides in the upper 32 bits of the pte, then we run
into the danger of the result being dropped if downcast.
For example:
gather_stats(page, md, pte_dirty(*pte), 1);
where pte_dirty(*pte) is downcast to an int.
This patch adds a double logical invert to all the pte_ accessors to
ensure predictable downcasting.
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Commit 1fcf7ce0c602 (arm: kvm: implement CPU PM notifier) added
support for CPU power-management, using a cpu_notifier to re-init
KVM on a CPU that entered CPU idle.
The code assumed that a CPU entering idle would actually be powered
off, loosing its state entierely, and would then need to be
reinitialized. It turns out that this is not always the case, and
some HW performs CPU PM without actually killing the core. In this
case, we try to reinitialize KVM while it is still live. It ends up
badly, as reported by Andre Przywara (using a Calxeda Midway):
[ 3.663897] Kernel panic - not syncing: unexpected prefetch abort in Hyp mode at: 0x685760
[ 3.663897] unexpected data abort in Hyp mode at: 0xc067d150
[ 3.663897] unexpected HVC/SVC trap in Hyp mode at: 0xc0901dd0
The trick here is to detect if we've been through a full re-init or
not by looking at HVBAR (VBAR_EL2 on arm64). This involves
implementing the backend for __hyp_get_vectors in the main KVM HYP
code (rather small), and checking the return value against the
default one when the CPU notifier is called on CPU_PM_EXIT.
Reported-by: Andre Przywara <osp@andrep.de>
Tested-by: Andre Przywara <osp@andrep.de>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Rob Herring <rob.herring@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The frame PC value in the unwind code used to just take the saved LR
value and use that. That's incorrect as a stack trace, since it shows
the return path stack, not the call path stack.
In particular, it shows faulty information in case the bl is done as
the very last instruction of one label, since the return point will be
in the next label. That can easily be seen with tail calls to panic(),
which is marked __noreturn and thus doesn't have anything useful after it.
Easiest here is to just correct the unwind code and do a -4, to get the
actual call site for the backtrace instead of the return site.
Signed-off-by: Olof Johansson <olof@lixom.net>
Cc: stable@vger.kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Pull KVM fixes from Paolo Bonzini:
"A small error handling problem and a compile breakage for ARM64"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
arm64: KVM: Add VGIC device control for arm64
KVM: return an error code in kvm_vm_ioctl_register_coalesced_mmio()
|
|
This fixes the build breakage introduced by
c07a0191ef2de1f9510f12d1f88e3b0b5cd8d66f and adds support for the device
control API and save/restore of the VGIC state for ARMv8.
The defines were simply missing from the arm64 header files and
uaccess.h must be implicitly imported from somewhere else on arm.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
FPGA implementations of the Cortex-A57 and Cortex-A53 are now available
in the form of the SMM-A57 and SMM-A53 Soft Macrocell Models (SMMs) for
Versatile Express. As these attach to a Motherboard Express V2M-P1 it
would be useful to have support for some V2M-P1 peripherals enabled by
default.
Additionally a couple of of features have been introduced since the last
defconfig update (CMA, jump labels) that would be good to have enabled
by default to ensure they are build and boot tested.
This patch updates the arm64 defconfig to enable support for these
devices and features. The arm64 Kconfig is modified to select
HAVE_PATA_PLATFORM, which is required to enable support for the
CompactFlash controller on the V2M-P1.
A few options which don't need to appear in defconfig are trimmed:
* BLK_DEV - selected by default
* EXPERIMENTAL - otherwise gone from the kernel
* MII - selected by drivers which require it
* USB_SUPPORT - selected by default
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
cbnz/tbnz don't update the condition flags, so remove the "cc" clobbers
from inline asm blocks that only use these instructions to implement
conditional branches.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Linux requires a number of atomic operations to provide full barrier
semantics, that is no memory accesses after the operation can be
observed before any accesses up to and including the operation in
program order.
On arm64, these operations have been incorrectly implemented as follows:
// A, B, C are independent memory locations
<Access [A]>
// atomic_op (B)
1: ldaxr x0, [B] // Exclusive load with acquire
<op(B)>
stlxr w1, x0, [B] // Exclusive store with release
cbnz w1, 1b
<Access [C]>
The assumption here being that two half barriers are equivalent to a
full barrier, so the only permitted ordering would be A -> B -> C
(where B is the atomic operation involving both a load and a store).
Unfortunately, this is not the case by the letter of the architecture
and, in fact, the accesses to A and C are permitted to pass their
nearest half barrier resulting in orderings such as Bl -> A -> C -> Bs
or Bl -> C -> A -> Bs (where Bl is the load-acquire on B and Bs is the
store-release on B). This is a clear violation of the full barrier
requirement.
The simple way to fix this is to implement the same algorithm as ARMv7
using explicit barriers:
<Access [A]>
// atomic_op (B)
dmb ish // Full barrier
1: ldxr x0, [B] // Exclusive load
<op(B)>
stxr w1, x0, [B] // Exclusive store
cbnz w1, 1b
dmb ish // Full barrier
<Access [C]>
but this has the undesirable effect of introducing *two* full barrier
instructions. A better approach is actually the following, non-intuitive
sequence:
<Access [A]>
// atomic_op (B)
1: ldxr x0, [B] // Exclusive load
<op(B)>
stlxr w1, x0, [B] // Exclusive store with release
cbnz w1, 1b
dmb ish // Full barrier
<Access [C]>
The simple observations here are:
- The dmb ensures that no subsequent accesses (e.g. the access to C)
can enter or pass the atomic sequence.
- The dmb also ensures that no prior accesses (e.g. the access to A)
can pass the atomic sequence.
- Therefore, no prior access can pass a subsequent access, or
vice-versa (i.e. A is strictly ordered before C).
- The stlxr ensures that no prior access can pass the store component
of the atomic operation.
The only tricky part remaining is the ordering between the ldxr and the
access to A, since the absence of the first dmb means that we're now
permitting re-ordering between the ldxr and any prior accesses.
From an (arbitrary) observer's point of view, there are two scenarios:
1. We have observed the ldxr. This means that if we perform a store to
[B], the ldxr will still return older data. If we can observe the
ldxr, then we can potentially observe the permitted re-ordering
with the access to A, which is clearly an issue when compared to
the dmb variant of the code. Thankfully, the exclusive monitor will
save us here since it will be cleared as a result of the store and
the ldxr will retry. Notice that any use of a later memory
observation to imply observation of the ldxr will also imply
observation of the access to A, since the stlxr/dmb ensure strict
ordering.
2. We have not observed the ldxr. This means we can perform a store
and influence the later ldxr. However, that doesn't actually tell
us anything about the access to [A], so we've not lost anything
here either when compared to the dmb variant.
This patch implements this solution for our barriered atomic operations,
ensuring that we satisfy the full barrier requirements where they are
needed.
Cc: <stable@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The dsb instruction takes an option specifying both the target access
types and shareability domain.
This patch allows such an option to be passed to the dsb macro,
resulting in potentially more efficient code. Currently the option is
ignored until all callers are updated (unlike ARM, the option is
mandated by the assembler).
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
This patch enables sys_compat, sys_finit_module, sys_sched_setattr and
sys_sched_getattr for compat (AArch32) applications.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Update wall-to-monotonic fields in the VDSO data page
unconditionally. These are used to service CLOCK_MONOTONIC_COARSE,
which is not guarded by use_syscall.
Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
When __kernel_clock_gettime is called with a CLOCK_MONOTONIC_COARSE or
CLOCK_REALTIME_COARSE clock id, it returns incorrectly to whatever the
caller has placed in x2 ("ret x2" to return from the fast path). Fix
this by saving x30/LR to x2 only in code that will call
__do_get_tspec, restoring x30 afterward, and using a plain "ret" to
return from the routine.
Also: while the resulting tv_nsec value for CLOCK_REALTIME and
CLOCK_MONOTONIC must be computed using intermediate values that are
left-shifted by cs_shift (x12, set by __do_get_tspec), the results for
coarse clocks should be calculated using unshifted values
(xtime_coarse_nsec is in units of actual nanoseconds). The current
code shifts intermediate values by x12 unconditionally, but x12 is
uninitialized when servicing a coarse clock. Fix this by setting x12
to 0 once we know we are dealing with a coarse clock id.
Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Currently pgd_alloc has a redundant NULL check in its return path that
can be removed with no ill effects. With that removed it's also possible
to return early and eliminate the new_pgd temporary variable.
This patch applies said modifications, making the logic of pgd_alloc
correspond 1-1 with that of pgd_free.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Somehow SERROR has acquired an additional 'R' in a couple of headers.
This patch removes them before they spread further. As neither instance
is in use yet, no other sites need to be fixed up.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
With the 64K page size configuration, __create_page_tables in head.S
maps enough memory to get started but using 64K pages rather than 512M
sections with a single pgd/pud/pmd entry pointing to a pte table.
create_mapping() may override the pgd/pud/pmd table entry with a block
(section) one if the RAM size is more than 512MB and aligned correctly.
For the end of this block to be accessible, the old TLB entry must be
invalidated.
Cc: <stable@vger.kernel.org>
Reported-by: Mark Salter <msalter@redhat.com>
Tested-by: Mark Salter <msalter@redhat.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
dma_alloc_from_contiguous takes number of pages for a size.
Align up the dma size passed in to page size to avoid truncation
and allocation failures on sizes less than PAGE_SIZE.
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Add DSB after icache flush to complete the cache maintenance operation.
The function __flush_icache_all() is used only for user space mappings
and an ISB is not required because of an exception return before executing
user instructions. An exception return would behave like an ISB.
Signed-off-by: Vinayak Kale <vkale@apm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Whilst the text segment for our VDSO is marked as PT_LOAD in the ELF
headers, it is mapped by the kernel and not actually subject to
demand-paging. ld doesn't realise this, and emits a p_align field of 64k
(the maximum supported page size), which conflicts with the load address
picked by the kernel on 4k systems, which will be 4k aligned. This
causes GDB to fail with "Failed to read a valid object file image from
memory" when attempting to load the VDSO.
This patch passes the -n option to ld, which prevents it from aligning
PT_LOAD segments to the maximum page size.
Cc: <stable@vger.kernel.org>
Reported-by: Kyle McMartin <kyle@redhat.com>
Acked-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pyll ARM64 patches from Catalin Marinas:
- Build fix with DMA_CMA enabled
- Introduction of PTE_WRITE to distinguish between writable but clean
and truly read-only pages
- FIQs enabling/disabling clean-up (they aren't used on arm64)
- CPU resume fix for the per-cpu offset restoring
- Code comment typos
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mm: Introduce PTE_WRITE
arm64: mm: Remove PTE_BIT_FUNC macro
arm64: FIQs are unused
arm64: mm: fix the function name in comment of cpu_do_switch_mm
arm64: fix build error if DMA_CMA is enabled
arm64: kernel: fix per-cpu offset restore on resume
arm64: mm: fix the function name in comment of __flush_dcache_area
arm64: mm: use ubfm for dcache_line_size
|
|
We have the following means for encoding writable or dirty ptes:
PTE_DIRTY PTE_RDONLY
!pte_dirty && !pte_write 0 1
!pte_dirty && pte_write 0 1
pte_dirty && !pte_write 1 1
pte_dirty && pte_write 1 0
So we can't distinguish between writable clean ptes and read only
ptes. This can cause problems with ptes being incorrectly flagged as
read only when they are writable but not dirty.
This patch introduces a new software bit PTE_WRITE which allows us to
correctly identify writable ptes. PTE_RDONLY is now only clear for
valid ptes where a page is both writable and dirty.
Signed-off-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Expand out the pte manipulation functions. This makes our life easier
when using things like tags and cscope.
Signed-off-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
So any FIQ handling is superfluous at the moment. The functions to
disable/enable FIQs is kept around if ever someone needs them in the
future, but existing calling sites including arch_cpu_idle_prepare()
may go for now.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|