summaryrefslogtreecommitdiff
path: root/init
AgeCommit message (Collapse)Author
2021-07-14sched/core: Initialize the idle task with preemption disabledValentin Schneider
[ Upstream commit f1a0a376ca0c4ef1fc3d24e3e502acbb5b795674 ] As pointed out by commit de9b8f5dcbd9 ("sched: Fix crash trying to dequeue/enqueue the idle thread") init_idle() can and will be invoked more than once on the same idle task. At boot time, it is invoked for the boot CPU thread by sched_init(). Then smp_init() creates the threads for all the secondary CPUs and invokes init_idle() on them. As the hotplug machinery brings the secondaries to life, it will issue calls to idle_thread_get(), which itself invokes init_idle() yet again. In this case it's invoked twice more per secondary: at _cpu_up(), and at bringup_cpu(). Given smp_init() already initializes the idle tasks for all *possible* CPUs, no further initialization should be required. Now, removing init_idle() from idle_thread_get() exposes some interesting expectations with regards to the idle task's preempt_count: the secondary startup always issues a preempt_disable(), requiring some reset of the preempt count to 0 between hot-unplug and hotplug, which is currently served by idle_thread_get() -> idle_init(). Given the idle task is supposed to have preemption disabled once and never see it re-enabled, it seems that what we actually want is to initialize its preempt_count to PREEMPT_DISABLED and leave it there. Do that, and remove init_idle() from idle_thread_get(). Secondary startups were patched via coccinelle: @begone@ @@ -preempt_disable(); ... cpu_startup_entry(CPUHP_AP_ONLINE_IDLE); Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210512094636.2958515-1-valentin.schneider@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10pid: take a reference when initializing `cad_pid`Mark Rutland
commit 0711f0d7050b9e07c44bc159bbc64ac0a1022c7f upstream. During boot, kernel_init_freeable() initializes `cad_pid` to the init task's struct pid. Later on, we may change `cad_pid` via a sysctl, and when this happens proc_do_cad_pid() will increment the refcount on the new pid via get_pid(), and will decrement the refcount on the old pid via put_pid(). As we never called get_pid() when we initialized `cad_pid`, we decrement a reference we never incremented, can therefore free the init task's struct pid early. As there can be dangling references to the struct pid, we can later encounter a use-after-free (e.g. when delivering signals). This was spotted when fuzzing v5.13-rc3 with Syzkaller, but seems to have been around since the conversion of `cad_pid` to struct pid in commit 9ec52099e4b8 ("[PATCH] replace cad_pid by a struct pid") from the pre-KASAN stone age of v2.6.19. Fix this by getting a reference to the init task's struct pid when we assign it to `cad_pid`. Full KASAN splat below. ================================================================== BUG: KASAN: use-after-free in ns_of_pid include/linux/pid.h:153 [inline] BUG: KASAN: use-after-free in task_active_pid_ns+0xc0/0xc8 kernel/pid.c:509 Read of size 4 at addr ffff23794dda0004 by task syz-executor.0/273 CPU: 1 PID: 273 Comm: syz-executor.0 Not tainted 5.12.0-00001-g9aef892b2d15 #1 Hardware name: linux,dummy-virt (DT) Call trace: ns_of_pid include/linux/pid.h:153 [inline] task_active_pid_ns+0xc0/0xc8 kernel/pid.c:509 do_notify_parent+0x308/0xe60 kernel/signal.c:1950 exit_notify kernel/exit.c:682 [inline] do_exit+0x2334/0x2bd0 kernel/exit.c:845 do_group_exit+0x108/0x2c8 kernel/exit.c:922 get_signal+0x4e4/0x2a88 kernel/signal.c:2781 do_signal arch/arm64/kernel/signal.c:882 [inline] do_notify_resume+0x300/0x970 arch/arm64/kernel/signal.c:936 work_pending+0xc/0x2dc Allocated by task 0: slab_post_alloc_hook+0x50/0x5c0 mm/slab.h:516 slab_alloc_node mm/slub.c:2907 [inline] slab_alloc mm/slub.c:2915 [inline] kmem_cache_alloc+0x1f4/0x4c0 mm/slub.c:2920 alloc_pid+0xdc/0xc00 kernel/pid.c:180 copy_process+0x2794/0x5e18 kernel/fork.c:2129 kernel_clone+0x194/0x13c8 kernel/fork.c:2500 kernel_thread+0xd4/0x110 kernel/fork.c:2552 rest_init+0x44/0x4a0 init/main.c:687 arch_call_rest_init+0x1c/0x28 start_kernel+0x520/0x554 init/main.c:1064 0x0 Freed by task 270: slab_free_hook mm/slub.c:1562 [inline] slab_free_freelist_hook+0x98/0x260 mm/slub.c:1600 slab_free mm/slub.c:3161 [inline] kmem_cache_free+0x224/0x8e0 mm/slub.c:3177 put_pid.part.4+0xe0/0x1a8 kernel/pid.c:114 put_pid+0x30/0x48 kernel/pid.c:109 proc_do_cad_pid+0x190/0x1b0 kernel/sysctl.c:1401 proc_sys_call_handler+0x338/0x4b0 fs/proc/proc_sysctl.c:591 proc_sys_write+0x34/0x48 fs/proc/proc_sysctl.c:617 call_write_iter include/linux/fs.h:1977 [inline] new_sync_write+0x3ac/0x510 fs/read_write.c:518 vfs_write fs/read_write.c:605 [inline] vfs_write+0x9c4/0x1018 fs/read_write.c:585 ksys_write+0x124/0x240 fs/read_write.c:658 __do_sys_write fs/read_write.c:670 [inline] __se_sys_write fs/read_write.c:667 [inline] __arm64_sys_write+0x78/0xb0 fs/read_write.c:667 __invoke_syscall arch/arm64/kernel/syscall.c:37 [inline] invoke_syscall arch/arm64/kernel/syscall.c:49 [inline] el0_svc_common.constprop.1+0x16c/0x388 arch/arm64/kernel/syscall.c:129 do_el0_svc+0xf8/0x150 arch/arm64/kernel/syscall.c:168 el0_svc+0x28/0x38 arch/arm64/kernel/entry-common.c:416 el0_sync_handler+0x134/0x180 arch/arm64/kernel/entry-common.c:432 el0_sync+0x154/0x180 arch/arm64/kernel/entry.S:701 The buggy address belongs to the object at ffff23794dda0000 which belongs to the cache pid of size 224 The buggy address is located 4 bytes inside of 224-byte region [ffff23794dda0000, ffff23794dda00e0) The buggy address belongs to the page: page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4dda0 head:(____ptrval____) order:1 compound_mapcount:0 flags: 0x3fffc0000010200(slab|head) raw: 03fffc0000010200 dead000000000100 dead000000000122 ffff23794d40d080 raw: 0000000000000000 0000000000190019 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff23794dd9ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff23794dd9ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff23794dda0000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff23794dda0080: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc ffff23794dda0100: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00 ================================================================== Link: https://lkml.kernel.org/r/20210524172230.38715-1-mark.rutland@arm.com Fixes: 9ec52099e4b8678a ("[PATCH] replace cad_pid by a struct pid") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Christian Brauner <christian@brauner.io> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-14seccomp: Fix CONFIG tests for Seccomp_filtersKenta.Tada@sony.com
[ Upstream commit 64bdc0244054f7d4bb621c8b4455e292f4e421bc ] Strictly speaking, seccomp filters are only used when CONFIG_SECCOMP_FILTER. This patch fixes the condition to enable "Seccomp_filters" in /proc/$pid/status. Signed-off-by: Kenta Tada <Kenta.Tada@sony.com> Fixes: c818c03b661c ("seccomp: Report number of loaded filters in /proc/$pid/status") Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/OSBPR01MB26772D245E2CF4F26B76A989F5669@OSBPR01MB2677.jpnprd01.prod.outlook.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-10init/Kconfig: make COMPILE_TEST depend on HAS_IOMEMMasahiro Yamada
commit ea29b20a828511de3348334e529a3d046a180416 upstream. I read the commit log of the following two: - bc083a64b6c0 ("init/Kconfig: make COMPILE_TEST depend on !UML") - 334ef6ed06fa ("init/Kconfig: make COMPILE_TEST depend on !S390") Both are talking about HAS_IOMEM dependency missing in many drivers. So, 'depends on HAS_IOMEM' seems the direct, sensible solution to me. This does not change the behavior of UML. UML still cannot enable COMPILE_TEST because it does not provide HAS_IOMEM. The current dependency for S390 is too strong. Under the condition of CONFIG_PCI=y, S390 provides HAS_IOMEM, hence can enable COMPILE_TEST. I also removed the meaningless 'default n'. Link: https://lkml.kernel.org/r/20210224140809.1067582-1-masahiroy@kernel.org Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Arnd Bergmann <arnd@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KP Singh <kpsingh@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Quentin Perret <qperret@google.com> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: "Enrico Weigelt, metux IT consult" <lkml@metux.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-10init/Kconfig: make COMPILE_TEST depend on !S390Heiko Carstens
commit 334ef6ed06fa1a54e35296b77b693bcf6d63ee9e upstream. While allmodconfig and allyesconfig build for s390 there are also various bots running compile tests with randconfig, where PCI is disabled. This reveals that a lot of drivers should actually depend on HAS_IOMEM. Adding this to each device driver would be a never ending story, therefore just disable COMPILE_TEST for s390. The reasoning is more or less the same as described in commit bc083a64b6c0 ("init/Kconfig: make COMPILE_TEST depend on !UML"). Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04kgdb: fix to kill breakpoints on initmem after bootSumit Garg
commit d54ce6158e354f5358a547b96299ecd7f3725393 upstream. Currently breakpoints in kernel .init.text section are not handled correctly while allowing to remove them even after corresponding pages have been freed. Fix it via killing .init.text section breakpoints just prior to initmem pages being freed. Doug: "HW breakpoints aren't handled by this patch but it's probably not such a big deal". Link: https://lkml.kernel.org/r/20210224081652.587785-1-sumit.garg@linaro.org Signed-off-by: Sumit Garg <sumit.garg@linaro.org> Suggested-by: Doug Anderson <dianders@chromium.org> Acked-by: Doug Anderson <dianders@chromium.org> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Tested-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04kcmp: Support selection of SYS_kcmp without CHECKPOINT_RESTOREChris Wilson
commit bfe3911a91047557eb0e620f95a370aee6a248c7 upstream. Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for os_same_file_description() in order to identify when two fd (e.g. device or dmabuf) point to the same struct file. Since they depend on it for core functionality, lift SYS_kcmp out of the non-default CONFIG_CHECKPOINT_RESTORE into the selectable syscall category. Rasmus Villemoes also pointed out that systemd uses SYS_kcmp to deduplicate the per-service file descriptor store. Note that some distributions such as Ubuntu are already enabling CHECKPOINT_RESTORE in their configs and so, by extension, SYS_kcmp. References: https://gitlab.freedesktop.org/drm/intel/-/issues/3046 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: stable@vger.kernel.org Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> # DRM depends on kcmp Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> # systemd uses kcmp Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210205220012.1983-1-chris@chris-wilson.co.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-10fgraph: Initialize tracing_graph_pause at task creationSteven Rostedt (VMware)
commit 7e0a9220467dbcfdc5bc62825724f3e52e50ab31 upstream. On some archs, the idle task can call into cpu_suspend(). The cpu_suspend() will disable or pause function graph tracing, as there's some paths in bringing down the CPU that can have issues with its return address being modified. The task_struct structure has a "tracing_graph_pause" atomic counter, that when set to something other than zero, the function graph tracer will not modify the return address. The problem is that the tracing_graph_pause counter is initialized when the function graph tracer is enabled. This can corrupt the counter for the idle task if it is suspended in these architectures. CPU 1 CPU 2 ----- ----- do_idle() cpu_suspend() pause_graph_tracing() task_struct->tracing_graph_pause++ (0 -> 1) start_graph_tracing() for_each_online_cpu(cpu) { ftrace_graph_init_idle_task(cpu) task-struct->tracing_graph_pause = 0 (1 -> 0) unpause_graph_tracing() task_struct->tracing_graph_pause-- (0 -> -1) The above should have gone from 1 to zero, and enabled function graph tracing again. But instead, it is set to -1, which keeps it disabled. There's no reason that the field tracing_graph_pause on the task_struct can not be initialized at boot up. Cc: stable@vger.kernel.org Fixes: 380c4b1411ccd ("tracing/function-graph-tracer: append the tracing_graph_flag") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211339 Reported-by: pierre.gondois@arm.com Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-19rcu-tasks: Move RCU-tasks initialization to before early_initcall()Uladzislau Rezki (Sony)
[ Upstream commit 1b04fa9900263b4e217ca2509fd778b32c2b4eb2 ] PowerPC testing encountered boot failures due to RCU Tasks not being fully initialized until core_initcall() time. This commit therefore initializes RCU Tasks (along with Rude RCU and RCU Tasks Trace) just before early_initcall() time, thus allowing waiting on RCU Tasks grace periods from early_initcall() handlers. Link: https://lore.kernel.org/rcu/87eekfh80a.fsf@dja-thinkpad.axtens.net/ Fixes: 36dadef23fcc ("kprobes: Init kprobes in early_initcall") Tested-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-09exec: Transform exec_update_mutex into a rw_semaphoreEric W. Biederman
[ Upstream commit f7cfd871ae0c5008d94b6f66834e7845caa93c15 ] Recently syzbot reported[0] that there is a deadlock amongst the users of exec_update_mutex. The problematic lock ordering found by lockdep was: perf_event_open (exec_update_mutex -> ovl_i_mutex) chown (ovl_i_mutex -> sb_writes) sendfile (sb_writes -> p->lock) by reading from a proc file and writing to overlayfs proc_pid_syscall (p->lock -> exec_update_mutex) While looking at possible solutions it occured to me that all of the users and possible users involved only wanted to state of the given process to remain the same. They are all readers. The only writer is exec. There is no reason for readers to block on each other. So fix this deadlock by transforming exec_update_mutex into a rw_semaphore named exec_update_lock that only exec takes for writing. Cc: Jann Horn <jannh@google.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Bernd Edlinger <bernd.edlinger@hotmail.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Christopher Yeoh <cyeoh@au1.ibm.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Sargun Dhillon <sargun@sargun.me> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Fixes: eea9673250db ("exec: Add exec_update_mutex to replace cred_guard_mutex") [0] https://lkml.kernel.org/r/00000000000063640c05ade8e3de@google.com Reported-by: syzbot+db9cdf3dd1f64252c6ef@syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/87ft4mbqen.fsf@x220.int.ebiederm.org Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-11initramfs: fix clang build failureArnd Bergmann
There is only one function in init/initramfs.c that is in the .text section, and it is marked __weak. When building with clang-12 and the integrated assembler, this leads to a bug with recordmcount: ./scripts/recordmcount "init/initramfs.o" Cannot find symbol for section 2: .text. init/initramfs.o: failed I'm not quite sure what exactly goes wrong, but I notice that this function is only ever called from an __init function, and normally inlined. Marking it __init as well is clearly correct and it leads to recordmcount no longer complaining. Link: https://lkml.kernel.org/r/20201204165742.3815221-1-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Barret Rhoden <brho@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-06Merge tag 'kbuild-fixes-v5.10-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Move -Wcast-align to W=3, which tends to be false-positive and there is no tree-wide solution. - Pass -fmacro-prefix-map to KBUILD_CPPFLAGS because it is a preprocessor option and makes sense for .S files as well. - Disable -gdwarf-2 for Clang's integrated assembler to avoid warnings. - Disable --orphan-handling=warn for LLD 10.0.1 to avoid warnings. - Fix undesirable line breaks in *.mod files. * tag 'kbuild-fixes-v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: avoid split lines in .mod files kbuild: Disable CONFIG_LD_ORPHAN_WARN for ld.lld 10.0.1 kbuild: Hoist '--orphan-handling' into Kconfig Kbuild: do not emit debug info for assembly with LLVM_IAS=1 kbuild: use -fmacro-prefix-map for .S sources Makefile.extrawarn: move -Wcast-align to W=3
2020-12-02Merge tag 'trace-v5.10-rc6-bootconfig' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull bootconfig fixes from Steven Rostedt: "Have bootconfig size and checksum be little endian In case the bootconfig is created on one kind of endian machine, and then read on the other kind of endian kernel, the size and checksum will be incorrect. Instead, have both the size and checksum always be little endian and have the tool and the kernel convert it from little endian to or from the host endian" * tag 'trace-v5.10-rc6-bootconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: docs: bootconfig: Add the endianness of fields tools/bootconfig: Store size and checksum in footer as le32 bootconfig: Load size and checksum in the footer as le32
2020-12-01kbuild: Disable CONFIG_LD_ORPHAN_WARN for ld.lld 10.0.1Nathan Chancellor
ld.lld 10.0.1 spews a bunch of various warnings about .rela sections, along with a few others. Newer versions of ld.lld do not have these warnings. As a result, do not add '--orphan-handling=warn' to LDFLAGS_vmlinux if ld.lld's version is not new enough. Link: https://github.com/ClangBuiltLinux/linux/issues/1187 Link: https://github.com/ClangBuiltLinux/linux/issues/1193 Reported-by: Arvind Sankar <nivedita@alum.mit.edu> Reported-by: kernelci.org bot <bot@kernelci.org> Reported-by: Mark Brown <broonie@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-12-01kbuild: Hoist '--orphan-handling' into KconfigNathan Chancellor
Currently, '--orphan-handling=warn' is spread out across four different architectures in their respective Makefiles, which makes it a little unruly to deal with in case it needs to be disabled for a specific linker version (in this case, ld.lld 10.0.1). To make it easier to control this, hoist this warning into Kconfig and the main Makefile so that disabling it is simpler, as the warning will only be enabled in a couple places (main Makefile and a couple of compressed boot folders that blow away LDFLAGS_vmlinx) and making it conditional is easier due to Kconfig syntax. One small additional benefit of this is saving a call to ld-option on incremental builds because we will have already evaluated it for CONFIG_LD_ORPHAN_WARN. To keep the list of supported architectures the same, introduce CONFIG_ARCH_WANT_LD_ORPHAN_WARN, which an architecture can select to gain this automatically after all of the sections are specified and size asserted. A special thanks to Kees Cook for the help text on this config. Link: https://github.com/ClangBuiltLinux/linux/issues/1187 Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-11-30bootconfig: Load size and checksum in the footer as le32Masami Hiramatsu
Load the size and the checksum fields in the footer as le32 instead of u32. This will allow us to apply bootconfig to the cross build initrd without caring the endianness. Link: https://lkml.kernel.org/r/160583934457.547349.10504070298990791074.stgit@devnote2 Reported-by: Steven Rostedt <rostedt@goodmis.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-27Merge tag 'printk-for-5.10-rc6-fixup' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk fixes from Petr Mladek: - do not lose trailing newline in pr_cont() calls - two trivial fixes for a dead store and a config description * tag 'printk-for-5.10-rc6-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk: finalize records with trailing newlines printk: remove unneeded dead-store assignment init/Kconfig: Fix CPU number in LOG_CPU_MAX_BUF_SHIFT description
2020-11-12bootconfig: Extend the magic check range to the preceding 3 bytesMasami Hiramatsu
Since Grub may align the size of initrd to 4 if user pass initrd from cpio, we have to check the preceding 3 bytes as well. Link: https://lkml.kernel.org/r/160520205132.303174.4876760192433315429.stgit@devnote2 Cc: stable@vger.kernel.org Fixes: 85c46b78da58 ("bootconfig: Add bootconfig magic word for indicating bootconfig explicitly") Reported-by: Chen Yu <yu.chen.surf@gmail.com> Tested-by: Chen Yu <yu.chen.surf@gmail.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-03init/Kconfig: Fix CPU number in LOG_CPU_MAX_BUF_SHIFT descriptionPaul Menzel
Currently, LOG_BUF_SHIFT defaults to 17, which is 2 ^ 17 bytes = 128 KB, and LOG_CPU_MAX_BUF_SHIFT defaults to 12, which is 2 ^ 12 bytes = 4 KB. Half of 128 KB is 64 KB, so more than 16 CPUs are required for the value to be used, as then the sum of contributions is greater than 64 KB for the first time. My guess is, that the description was written with the configuration values used in the SUSE in mind. Fixes: 23b2899f7f194f06e ("printk: allow increasing the ring buffer depending on the number of CPUs") Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20200811092924.6256-1-pmenzel@molgen.mpg.de
2020-10-18Merge tag 'linux-kselftest-kunit-5.10-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull more Kunit updates from Shuah Khan: - add Kunit to kernel_init() and remove KUnit from init calls entirely. This addresses the concern that Kunit would not work correctly during late init phase. - add a linker section where KUnit can put references to its test suites. This is the first step in transitioning to dispatching all KUnit tests from a centralized executor rather than having each as its own separate late_initcall. - add a centralized executor to dispatch tests rather than relying on late_initcall to schedule each test suite separately. Centralized execution is for built-in tests only; modules will execute tests when loaded. - convert bitfield test to use KUnit framework - Documentation updates for naming guidelines and how kunit_test_suite() works. - add test plan to KUnit TAP format * tag 'linux-kselftest-kunit-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: lib: kunit: Fix compilation test when using TEST_BIT_FIELD_COMPILE lib: kunit: add bitfield test conversion to KUnit Documentation: kunit: add a brief blurb about kunit_test_suite kunit: test: add test plan to KUnit TAP format init: main: add KUnit to kernel init kunit: test: create a single centralized executor for all tests vmlinux.lds.h: add linker section for KUnit test suites Documentation: kunit: Add naming guidelines
2020-10-15Merge tag 'net-next-5.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: - Add redirect_neigh() BPF packet redirect helper, allowing to limit stack traversal in common container configs and improving TCP back-pressure. Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain. - Expand netlink policy support and improve policy export to user space. (Ge)netlink core performs request validation according to declared policies. Expand the expressiveness of those policies (min/max length and bitmasks). Allow dumping policies for particular commands. This is used for feature discovery by user space (instead of kernel version parsing or trial and error). - Support IGMPv3/MLDv2 multicast listener discovery protocols in bridge. - Allow more than 255 IPv4 multicast interfaces. - Add support for Type of Service (ToS) reflection in SYN/SYN-ACK packets of TCPv6. - In Multi-patch TCP (MPTCP) support concurrent transmission of data on multiple subflows in a load balancing scenario. Enhance advertising addresses via the RM_ADDR/ADD_ADDR options. - Support SMC-Dv2 version of SMC, which enables multi-subnet deployments. - Allow more calls to same peer in RxRPC. - Support two new Controller Area Network (CAN) protocols - CAN-FD and ISO 15765-2:2016. - Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit kernel problem. - Add TC actions for implementing MPLS L2 VPNs. - Improve nexthop code - e.g. handle various corner cases when nexthop objects are removed from groups better, skip unnecessary notifications and make it easier to offload nexthops into HW by converting to a blocking notifier. - Support adding and consuming TCP header options by BPF programs, opening the doors for easy experimental and deployment-specific TCP option use. - Reorganize TCP congestion control (CC) initialization to simplify life of TCP CC implemented in BPF. - Add support for shipping BPF programs with the kernel and loading them early on boot via the User Mode Driver mechanism, hence reusing all the user space infra we have. - Support sleepable BPF programs, initially targeting LSM and tracing. - Add bpf_d_path() helper for returning full path for given 'struct path'. - Make bpf_tail_call compatible with bpf-to-bpf calls. - Allow BPF programs to call map_update_elem on sockmaps. - Add BPF Type Format (BTF) support for type and enum discovery, as well as support for using BTF within the kernel itself (current use is for pretty printing structures). - Support listing and getting information about bpf_links via the bpf syscall. - Enhance kernel interfaces around NIC firmware update. Allow specifying overwrite mask to control if settings etc. are reset during update; report expected max time operation may take to users; support firmware activation without machine reboot incl. limits of how much impact reset may have (e.g. dropping link or not). - Extend ethtool configuration interface to report IEEE-standard counters, to limit the need for per-vendor logic in user space. - Adopt or extend devlink use for debug, monitoring, fw update in many drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx, dpaa2-eth). - In mlxsw expose critical and emergency SFP module temperature alarms. Refactor port buffer handling to make the defaults more suitable and support setting these values explicitly via the DCBNL interface. - Add XDP support for Intel's igb driver. - Support offloading TC flower classification and filtering rules to mscc_ocelot switches. - Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as fixed interval period pulse generator and one-step timestamping in dpaa-eth. - Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3) offload. - Add Lynx PHY/PCS MDIO module, and convert various drivers which have this HW to use it. Convert mvpp2 to split PCS. - Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as 7-port Mediatek MT7531 IP. - Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver, and wcn3680 support in wcn36xx. - Improve performance for packets which don't require much offloads on recent Mellanox NICs by 20% by making multiple packets share a descriptor entry. - Move chelsio inline crypto drivers (for TLS and IPsec) from the crypto subtree to drivers/net. Move MDIO drivers out of the phy directory. - Clean up a lot of W=1 warnings, reportedly the actively developed subsections of networking drivers should now build W=1 warning free. - Make sure drivers don't use in_interrupt() to dynamically adapt their code. Convert tasklets to use new tasklet_setup API (sadly this conversion is not yet complete). * tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits) Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH" net, sockmap: Don't call bpf_prog_put() on NULL pointer bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo bpf, sockmap: Add locking annotations to iterator netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements net: fix pos incrementment in ipv6_route_seq_next net/smc: fix invalid return code in smcd_new_buf_create() net/smc: fix valid DMBE buffer sizes net/smc: fix use-after-free of delayed events bpfilter: Fix build error with CONFIG_BPFILTER_UMH cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info bpf: Fix register equivalence tracking. rxrpc: Fix loss of final ack on shutdown rxrpc: Fix bundle counting for exclusive connections netfilter: restore NF_INET_NUMHOOKS ibmveth: Identify ingress large send packets. ibmveth: Switch order of ibmveth_helper calls. cxgb4: handle 4-tuple PEDIT to NAT mode translation selftests: Add VRF route leaking tests ...
2020-10-15Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial updates from Jiri Kosina: "The latest advances in computer science from the trivial queue" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: xtensa: fix Kconfig typo spelling.txt: Remove some duplicate entries mtd: rawnand: oxnas: cleanup/simplify code selftests: vm: add fragment CONFIG_GUP_BENCHMARK perf: Fix opt help text for --no-bpf-event HID: logitech-dj: Fix spelling in comment bootconfig: Fix kernel message mentioning CONFIG_BOOT_CONFIG MAINTAINERS: rectify MMP SUPPORT after moving cputype.h scif: Fix spelling of EACCES printk: fix global comment lib/bitmap.c: fix spello fs: Fix missing 'bit' in comment
2020-10-13Merge tag 'printk-for-5.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: "The big new thing is the fully lockless ringbuffer implementation, including the support for continuous lines. It will allow to store and read messages in any situation wihtout the risk of deadlocks and without the need of temporary per-CPU buffers. The access is still serialized by logbuf_lock. It synchronizes few more operations, for example, temporary buffer for formatting the message, syslog and kmsg_dump operations. The lock removal is being discussed and should be ready for the next release. The continuous lines are handled exactly the same way as before to avoid regressions in user space. It means that they are appended to the last message when the caller is the same. Only the last message can be extended. The data ring includes plain text of the messages. Except for an integer at the beginning of each message that points back to the descriptor ring with other metadata. The dictionary has to stay. journalctl uses it to filter the log. It allows to show messages related to a given device. The dictionary values are stored in the descriptor ring with the other metadata. This is the first part of the printk rework as discussed at Plumbers 2019, see https://lore.kernel.org/r/87k1acz5rx.fsf@linutronix.de. The next big step will be handling consoles by kthreads during the normal system operation. It will require special handling of situations when the kthreads could not get scheduled, for example, early boot, suspend, panic. Other changes: - Add John Ogness as a reviewer for printk subsystem. He is author of the rework and is familiar with the code and history. - Fix locking in serial8250_do_startup() to prevent lockdep report. - Few code cleanups" * tag 'printk-for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: (27 commits) printk: Use fallthrough pseudo-keyword printk: reduce setup_text_buf size to LOG_LINE_MAX printk: avoid and/or handle record truncation printk: remove dict ring printk: move dictionary keys to dev_printk_info printk: move printk_info into separate array printk: reimplement log_cont using record extension printk: ringbuffer: add finalization/extension support printk: ringbuffer: change representation of states printk: ringbuffer: clear initial reserved fields printk: ringbuffer: add BLK_DATALESS() macro printk: ringbuffer: relocate get_data() printk: ringbuffer: avoid memcpy() on state_var printk: ringbuffer: fix setting state in desc_read() kernel.h: Move oops_in_progress to printk.h scripts/gdb: update for lockless printk ringbuffer scripts/gdb: add utils.read_ulong() docs: vmcoreinfo: add lockless printk ringbuffer vmcoreinfo printk: reduce LOG_BUF_SHIFT range for H8300 printk: ringbuffer: support dataless records ...
2020-10-13Merge tag 'io_uring-5.10-2020-10-12' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring updates from Jens Axboe: - Add blkcg accounting for io-wq offload (Dennis) - A use-after-free fix for io-wq (Hillf) - Cancelation fixes and improvements - Use proper files_struct references for offload - Cleanup of io_uring_get_socket() since that can now go into our own header - SQPOLL fixes and cleanups, and support for sharing the thread - Improvement to how page accounting is done for registered buffers and huge pages, accounting the real pinned state - Series cleaning up the xarray code (Willy) - Various cleanups, refactoring, and improvements (Pavel) - Use raw spinlock for io-wq (Sebastian) - Add support for ring restrictions (Stefano) * tag 'io_uring-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (62 commits) io_uring: keep a pointer ref_node in file_data io_uring: refactor *files_register()'s error paths io_uring: clean file_data access in files_register io_uring: don't delay io_init_req() error check io_uring: clean leftovers after splitting issue io_uring: remove timeout.list after hrtimer cancel io_uring: use a separate struct for timeout_remove io_uring: improve submit_state.ios_left accounting io_uring: simplify io_file_get() io_uring: kill extra check in fixed io_file_get() io_uring: clean up ->files grabbing io_uring: don't io_prep_async_work() linked reqs io_uring: Convert advanced XArray uses to the normal API io_uring: Fix XArray usage in io_uring_add_task_file io_uring: Fix use of XArray in __io_uring_files_cancel io_uring: fix break condition for __io_uring_register() waiting io_uring: no need to call xa_destroy() on empty xarray io_uring: batch account ->req_issue and task struct references io_uring: kill callback_head argument for io_req_task_work_add() io_uring: move req preps out of io_issue_sqe() ...
2020-10-12Merge tag 'docs-5.10' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "As hoped, things calmed down for docs this cycle; fewer changes and almost no conflicts at all. This includes: - A reworked and expanded user-mode Linux document - Some simplifications and improvements for submitting-patches.rst - An emergency fix for (some) problems with Sphinx 3.x - Some welcome automarkup improvements to automatically generate cross-references to struct definitions and other documents - The usual collection of translation updates, typo fixes, etc" * tag 'docs-5.10' of git://git.lwn.net/linux: (81 commits) gpiolib: Update indentation in driver.rst for code excerpts Documentation/admin-guide: tainted-kernels: Fix typo occured Documentation: better locations for sysfs-pci, sysfs-tagging docs: programming-languages: refresh blurb on clang support Documentation: kvm: fix a typo Documentation: Chinese translation of Documentation/arm64/amu.rst doc: zh_CN: index files in arm64 subdirectory mailmap: add entry for <mstarovoitov@marvell.com> doc: seq_file: clarify role of *pos in ->next() docs: trace: ring-buffer-design.rst: use the new SPDX tag Documentation: kernel-parameters: clarify "module." parameters Fix references to nommu-mmap.rst docs: rewrite admin-guide/sysctl/abi.rst docs: fb: Remove vesafb scrollback boot option docs: fb: Remove sstfb scrollback boot option docs: fb: Remove matroxfb scrollback boot option docs: fb: Remove framebuffer scrollback boot option docs: replace the old User Mode Linux HowTo with a new one Documentation/admin-guide: blockdev/ramdisk: remove use of "rdev" Documentation/admin-guide: README & svga: remove use of "rdev" ...
2020-10-12Merge branch 'printk-rework' into for-linusPetr Mladek
2020-10-09init: main: add KUnit to kernel initBrendan Higgins
Although we have not seen any actual examples where KUnit doesn't work because it runs in the late init phase of the kernel, it has been a concern for some time that this could potentially be an issue in the future. So, remove KUnit from init calls entirely, instead call directly from kernel_init() so that KUnit runs after late init. Co-developed-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Brendan Higgins <brendanhiggins@google.com> Reviewed-by: Stephen Boyd <sboyd@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-09-30io_uring: don't rely on weak ->files referencesJens Axboe
Grab actual references to the files_struct. To avoid circular references issues due to this, we add a per-task note that keeps track of what io_uring contexts a task has used. When the tasks execs or exits its assigned files, we cancel requests based on this tracking. With that, we can grab proper references to the files table, and no longer need to rely on stashing away ring_fd and ring_file to check if the ring_fd may have been closed. Cc: stable@vger.kernel.org # v5.5+ Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-24Fix references to nommu-mmap.rstStephen Kitt
nommu-mmap.rst was moved to Documentation/admin-guide/mm; this patch updates the remaining stale references to Documentation/mm. Fixes: 800c02f5d030 ("docs: move nommu-mmap.txt to admin-guide and rename to ReST") Signed-off-by: Stephen Kitt <steve@sk2.org> Link: https://lore.kernel.org/r/20200812092230.27541-1-steve@sk2.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-09-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Two minor conflicts: 1) net/ipv4/route.c, adding a new local variable while moving another local variable and removing it's initial assignment. 2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes. One pretty prints the port mode differently, whilst another changes the driver to try and obtain the port mode from the port node rather than the switch node. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-18bootconfig: init: make xbc_namebuf staticJason Yan
This eliminates the following sparse warning: init/main.c:306:6: warning: symbol 'xbc_namebuf' was not declared. Should it be static? Link: https://lkml.kernel.org/r/20200915070324.2239473-1-yanaijie@huawei.com Reported-by: Hulk Robot <hulkci@huawei.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-09-18kprobes: tracing/kprobes: Fix to kill kprobes on initmem after bootMasami Hiramatsu
Since kprobe_event= cmdline option allows user to put kprobes on the functions in initmem, kprobe has to make such probes gone after boot. Currently the probes on the init functions in modules will be handled by module callback, but the kernel init text isn't handled. Without this, kprobes may access non-exist text area to disable or remove it. Link: https://lkml.kernel.org/r/159972810544.428528.1839307531600646955.stgit@devnote2 Fixes: 970988e19eb0 ("tracing/kprobe: Add kprobe_event= boot parameter") Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-09-08printk: reduce LOG_BUF_SHIFT range for H8300John Ogness
The .bss section for the h8300 is relatively small. A value of CONFIG_LOG_BUF_SHIFT that is larger than 19 will create a static printk ringbuffer that is too large. Limit the range appropriately for the H8300. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20200812073122.25412-1-john.ogness@linutronix.de
2020-09-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
We got slightly different patches removing a double word in a comment in net/ipv4/raw.c - picked the version from net. Simple conflict in drivers/net/ethernet/ibm/ibmvnic.c. Use cached values instead of VNIC login response buffer (following what commit 507ebe6444a4 ("ibmvnic: Fix use-after-free of VNIC login response buffer") did). Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-04init: fix error check in clean_path()Barret Rhoden
init_stat() returns 0 on success, same as vfs_lstat(). When it replaced vfs_lstat(), the '!' was dropped. Fixes: 716308a5331b ("init: add an init_stat helper") Signed-off-by: Barret Rhoden <brho@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-01bootconfig: Fix kernel message mentioning CONFIG_BOOT_CONFIGShaokun Zhang
Fix up one typo: CONFIG_BOOTCONFIG -> CONFIG_BOOT_CONFIG Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2020-08-28bpf: Introduce sleepable BPF programsAlexei Starovoitov
Introduce sleepable BPF programs that can request such property for themselves via BPF_F_SLEEPABLE flag at program load time. In such case they will be able to use helpers like bpf_copy_from_user() that might sleep. At present only fentry/fexit/fmod_ret and lsm programs can request to be sleepable and only when they are attached to kernel functions that are known to allow sleeping. The non-sleepable programs are relying on implicit rcu_read_lock() and migrate_disable() to protect life time of programs, maps that they use and per-cpu kernel structures used to pass info between bpf programs and the kernel. The sleepable programs cannot be enclosed into rcu_read_lock(). migrate_disable() maps to preempt_disable() in non-RT kernels, so the progs should not be enclosed in migrate_disable() as well. Therefore rcu_read_lock_trace is used to protect the life time of sleepable progs. There are many networking and tracing program types. In many cases the 'struct bpf_prog *' pointer itself is rcu protected within some other kernel data structure and the kernel code is using rcu_dereference() to load that program pointer and call BPF_PROG_RUN() on it. All these cases are not touched. Instead sleepable bpf programs are allowed with bpf trampoline only. The program pointers are hard-coded into generated assembly of bpf trampoline and synchronize_rcu_tasks_trace() is used to protect the life time of the program. The same trampoline can hold both sleepable and non-sleepable progs. When rcu_read_lock_trace is held it means that some sleepable bpf program is running from bpf trampoline. Those programs can use bpf arrays and preallocated hash/lru maps. These map types are waiting on programs to complete via synchronize_rcu_tasks_trace(); Updates to trampoline now has to do synchronize_rcu_tasks_trace() and synchronize_rcu_tasks() to wait for sleepable progs to finish and for trampoline assembly to finish. This is the first step of introducing sleepable progs. Eventually dynamically allocated hash maps can be allowed and networking program types can become sleepable too. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/bpf/20200827220114.69225-3-alexei.starovoitov@gmail.com
2020-08-20bpf: Add kernel module with user mode driver that populates bpffs.Alexei Starovoitov
Add kernel module with user mode driver that populates bpffs with BPF iterators. $ mount bpffs /my/bpffs/ -t bpf $ ls -la /my/bpffs/ total 4 drwxrwxrwt 2 root root 0 Jul 2 00:27 . drwxr-xr-x 19 root root 4096 Jul 2 00:09 .. -rw------- 1 root root 0 Jul 2 00:27 maps.debug -rw------- 1 root root 0 Jul 2 00:27 progs.debug The user mode driver will load BPF Type Formats, create BPF maps, populate BPF maps, load two BPF programs, attach them to BPF iterators, and finally send two bpf_link IDs back to the kernel. The kernel will pin two bpf_links into newly mounted bpffs instance under names "progs.debug" and "maps.debug". These two files become human readable. $ cat /my/bpffs/progs.debug id name attached 11 dump_bpf_map bpf_iter_bpf_map 12 dump_bpf_prog bpf_iter_bpf_prog 27 test_pkt_access 32 test_main test_pkt_access test_pkt_access 33 test_subprog1 test_pkt_access_subprog1 test_pkt_access 34 test_subprog2 test_pkt_access_subprog2 test_pkt_access 35 test_subprog3 test_pkt_access_subprog3 test_pkt_access 36 new_get_skb_len get_skb_len test_pkt_access 37 new_get_skb_ifindex get_skb_ifindex test_pkt_access 38 new_get_constant get_constant test_pkt_access The BPF program dump_bpf_prog() in iterators.bpf.c is printing this data about all BPF programs currently loaded in the system. This information is unstable and will change from kernel to kernel as ".debug" suffix conveys. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200819042759.51280-4-alexei.starovoitov@gmail.com
2020-08-14Merge tag 'for-linus' of git://github.com/openrisc/linuxLinus Torvalds
Pull OpenRISC updates from Stafford Horne: "A few patches all over the place during this cycle, mostly bug and sparse warning fixes for OpenRISC, but a few enhancements too. Note, there are 2 non OpenRISC specific fixups. Non OpenRISC fixes: - In init we need to align the init_task correctly to fix an issue with MUTEX_FLAGS, reviewed by Peter Z. No one picked this up so I kept it on my tree. - In asm-generic/io.h I fixed up some sparse warnings, OK'd by Arnd. Arnd asked to merge it via my tree. OpenRISC fixes: - Many fixes for OpenRISC sprase warnings. - Add support OpenRISC SMP tlb flushing rather than always flushing the entire TLB on every CPU. - Fix bug when dumping stack via /proc/xxx/stack of user threads" * tag 'for-linus' of git://github.com/openrisc/linux: openrisc: uaccess: Add user address space check to access_ok openrisc: signal: Fix sparse address space warnings openrisc: uaccess: Remove unused macro __addr_ok openrisc: uaccess: Use static inline function in access_ok openrisc: uaccess: Fix sparse address space warnings openrisc: io: Fixup defines and move include to the end asm-generic/io.h: Fix sparse warnings on big-endian architectures openrisc: Implement proper SMP tlb flushing openrisc: Fix oops caused when dumping stack openrisc: Add support for external initrd images init: Align init_task to avoid conflict with MUTEX_FLAGS openrisc: fix __user in raw_copy_to_user()'s prototype
2020-08-10Merge tag 'locking-urgent-2020-08-10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Thomas Gleixner: "A set of locking fixes and updates: - Untangle the header spaghetti which causes build failures in various situations caused by the lockdep additions to seqcount to validate that the write side critical sections are non-preemptible. - The seqcount associated lock debug addons which were blocked by the above fallout. seqcount writers contrary to seqlock writers must be externally serialized, which usually happens via locking - except for strict per CPU seqcounts. As the lock is not part of the seqcount, lockdep cannot validate that the lock is held. This new debug mechanism adds the concept of associated locks. sequence count has now lock type variants and corresponding initializers which take a pointer to the associated lock used for writer serialization. If lockdep is enabled the pointer is stored and write_seqcount_begin() has a lockdep assertion to validate that the lock is held. Aside of the type and the initializer no other code changes are required at the seqcount usage sites. The rest of the seqcount API is unchanged and determines the type at compile time with the help of _Generic which is possible now that the minimal GCC version has been moved up. Adding this lockdep coverage unearthed a handful of seqcount bugs which have been addressed already independent of this. While generally useful this comes with a Trojan Horse twist: On RT kernels the write side critical section can become preemtible if the writers are serialized by an associated lock, which leads to the well known reader preempts writer livelock. RT prevents this by storing the associated lock pointer independent of lockdep in the seqcount and changing the reader side to block on the lock when a reader detects that a writer is in the write side critical section. - Conversion of seqcount usage sites to associated types and initializers" * tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) locking/seqlock, headers: Untangle the spaghetti monster locking, arch/ia64: Reduce <asm/smp.h> header dependencies by moving XTP bits into the new <asm/xtp.h> header x86/headers: Remove APIC headers from <asm/smp.h> seqcount: More consistent seqprop names seqcount: Compress SEQCNT_LOCKNAME_ZERO() seqlock: Fold seqcount_LOCKNAME_init() definition seqlock: Fold seqcount_LOCKNAME_t definition seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g hrtimer: Use sequence counter with associated raw spinlock kvm/eventfd: Use sequence counter with associated spinlock userfaultfd: Use sequence counter with associated spinlock NFSv4: Use sequence counter with associated spinlock iocost: Use sequence counter with associated spinlock raid5: Use sequence counter with associated spinlock vfs: Use sequence counter with associated spinlock timekeeping: Use sequence counter with associated raw spinlock xfrm: policy: Use sequence counters with associated lock netfilter: nft_set_rbtree: Use sequence counter with associated rwlock netfilter: conntrack: Use sequence counter with associated spinlock sched: tasks: Use sequence counter with associated spinlock ...
2020-08-07Merge tag 'trace-v5.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: - The biggest news in that the tracing ring buffer can now time events that interrupted other ring buffer events. Before this change, if an interrupt came in while recording another event, and that interrupt also had an event, those events would all have the same time stamp as the event it interrupted. Now, with the new design, those events will have a unique time stamp and rightfully display the time for those events that were recorded while interrupting another event. - Bootconfig how has an "override" operator that lets the users have a default config, but then add options to override the default. - A fix was made to properly filter function graph tracing to the ftrace PIDs. This came in at the end of the -rc cycle, and needs to be backported. - Several clean ups, performance updates, and minor fixes as well. * tag 'trace-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (39 commits) tracing: Add trace_array_init_printk() to initialize instance trace_printk() buffers kprobes: Fix compiler warning for !CONFIG_KPROBES_ON_FTRACE tracing: Use trace_sched_process_free() instead of exit() for pid tracing bootconfig: Fix to find the initargs correctly Documentation: bootconfig: Add bootconfig override operator tools/bootconfig: Add testcases for value override operator lib/bootconfig: Add override operator support kprobes: Remove show_registers() function prototype tracing/uprobe: Remove dead code in trace_uprobe_register() kprobes: Fix NULL pointer dereference at kprobe_ftrace_handler ftrace: Fix ftrace_trace_task return value tracepoint: Use __used attribute definitions from compiler_attributes.h tracepoint: Mark __tracepoint_string's __used trace : Have tracing buffer info use kvzalloc instead of kzalloc tracing: Remove outdated comment in stack handling ftrace: Do not let direct or IPMODIFY ftrace_ops be added to module and set trampolines ftrace: Setup correct FTRACE_FL_REGS flags for module tracing/hwlat: Honor the tracing_cpumask tracing/hwlat: Drop the duplicate assignment in start_kthread() tracing: Save one trace_event->type by using __TRACE_LAST_TYPE ...
2020-08-07Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc updates from Andrew Morton: - a few MM hotfixes - kthread, tools, scripts, ntfs and ocfs2 - some of MM Subsystems affected by this patch series: kthread, tools, scripts, ntfs, ocfs2 and mm (hofixes, pagealloc, slab-generic, slab, slub, kcsan, debug, pagecache, gup, swap, shmem, memcg, pagemap, mremap, mincore, sparsemem, vmalloc, kasan, pagealloc, hugetlb and vmscan). * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (162 commits) mm: vmscan: consistent update to pgrefill mm/vmscan.c: fix typo khugepaged: khugepaged_test_exit() check mmget_still_valid() khugepaged: retract_page_tables() remember to test exit khugepaged: collapse_pte_mapped_thp() protect the pmd lock khugepaged: collapse_pte_mapped_thp() flush the right range mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible mm: thp: replace HTTP links with HTTPS ones mm/page_alloc: fix memalloc_nocma_{save/restore} APIs mm/page_alloc.c: skip setting nodemask when we are in interrupt mm/page_alloc: fallbacks at most has 3 elements mm/page_alloc: silence a KASAN false positive mm/page_alloc.c: remove unnecessary end_bitidx for [set|get]_pfnblock_flags_mask() mm/page_alloc.c: simplify pageblock bitmap access mm/page_alloc.c: extract the common part in pfn_to_bitidx() mm/page_alloc.c: replace the definition of NR_MIGRATETYPE_BITS with PB_migratetype_bits mm/shuffle: remove dynamic reconfiguration mm/memory_hotplug: document why shuffle_zone() is relevant mm/page_alloc: remove nr_free_pagecache_pages() mm: remove vm_total_pages ...
2020-08-07kasan, arm64: don't instrument functions that enable kasanAndrey Konovalov
This patch prepares Software Tag-Based KASAN for stack tagging support. With stack tagging enabled, KASAN tags stack variable in each function in its prologue. In start_kernel() stack variables get tagged before KASAN is enabled via setup_arch()->kasan_init(). As the result the tags for start_kernel()'s stack variables end up in the temporary shadow memory. Later when KASAN gets enabled, switched to normal shadow, and starts checking tags, this leads to false-positive reports, as proper tags are missing in normal shadow. Disable KASAN instrumentation for start_kernel(). Also disable it for arm64's setup_arch() as a precaution (it doesn't have any stack variables right now). [andreyknvl@google.com: reorder attributes for start_kernel()] Link: http://lkml.kernel.org/r/26fb6165a17abcf61222eda5184c030fb6b133d1.1596544734.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Elena Petrova <lenaptr@google.com> Cc: Marco Elver <elver@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Walter Wu <walter-zh.wu@mediatek.com> Cc: Ard Biesheuvel <ardb@kernel.org> Link: http://lkml.kernel.org/r/55d432671a92e931ab8234b03dc36b14d4c21bfb.1596199677.git.andreyknvl@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm/slab: expand CONFIG_SLAB_FREELIST_HARDENED to include SLABKees Cook
Patch series "mm: Expand CONFIG_SLAB_FREELIST_HARDENED to include SLAB" In reviewing Vlastimil Babka's latest slub debug series, I realized[1] that several checks under CONFIG_SLAB_FREELIST_HARDENED weren't being applied to SLAB. Fix this by expanding the Kconfig coverage, and adding a simple double-free test for SLAB. This patch (of 2): Include SLAB caches when performing kmem_cache pointer verification. A defense against such corruption[1] should be applied to all the allocators. With this added, the "SLAB_FREE_CROSS" and "SLAB_FREE_PAGE" LKDTM tests now pass on SLAB: lkdtm: Performing direct entry SLAB_FREE_CROSS lkdtm: Attempting cross-cache slab free ... ------------[ cut here ]------------ cache_from_obj: Wrong slab cache. lkdtm-heap-b but object is from lkdtm-heap-a WARNING: CPU: 2 PID: 2195 at mm/slab.h:530 kmem_cache_free+0x8d/0x1d0 ... lkdtm: Performing direct entry SLAB_FREE_PAGE lkdtm: Attempting non-Slab slab free ... ------------[ cut here ]------------ virt_to_cache: Object is not a Slab page! WARNING: CPU: 1 PID: 2202 at mm/slab.h:489 kmem_cache_free+0x196/0x1d0 Additionally clean up neighboring Kconfig entries for clarity, readability, and redundant option removal. [1] https://github.com/ThomasKing2014/slides/raw/master/Building%20universal%20Android%20rooting%20with%20a%20type%20confusion%20vulnerability.pdf Fixes: 598a0717a816 ("mm/slab: validate cache membership under freelist hardening") Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Alexander Popov <alex.popov@linux.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Jann Horn <jannh@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Garrett <mjg59@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <guro@fb.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: Vinayak Menon <vinmenon@codeaurora.org> Link: http://lkml.kernel.org/r/20200625215548.389774-1-keescook@chromium.org Link: http://lkml.kernel.org/r/20200625215548.389774-2-keescook@chromium.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07Merge branch 'hch.init_path' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull init and set_fs() cleanups from Al Viro: "Christoph's 'getting rid of ksys_...() uses under KERNEL_DS' series" * 'hch.init_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (50 commits) init: add an init_dup helper init: add an init_utimes helper init: add an init_stat helper init: add an init_mknod helper init: add an init_mkdir helper init: add an init_symlink helper init: add an init_link helper init: add an init_eaccess helper init: add an init_chmod helper init: add an init_chown helper init: add an init_chroot helper init: add an init_chdir helper init: add an init_rmdir helper init: add an init_unlink helper init: add an init_umount helper init: add an init_mount helper init: mark create_dev as __init init: mark console_on_rootfs as __init init: initialize ramdisk_execute_command at compile time devtmpfs: refactor devtmpfsd() ...
2020-08-04Merge tag 'docs-5.9' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "It's been a busy cycle for documentation - hopefully the busiest for a while to come. Changes include: - Some new Chinese translations - Progress on the battle against double words words and non-HTTPS URLs - Some block-mq documentation - More RST conversions from Mauro. At this point, that task is essentially complete, so we shouldn't see this kind of churn again for a while. Unless we decide to switch to asciidoc or something...:) - Lots of typo fixes, warning fixes, and more" * tag 'docs-5.9' of git://git.lwn.net/linux: (195 commits) scripts/kernel-doc: optionally treat warnings as errors docs: ia64: correct typo mailmap: add entry for <alobakin@marvell.com> doc/zh_CN: add cpu-load Chinese version Documentation/admin-guide: tainted-kernels: fix spelling mistake MAINTAINERS: adjust kprobes.rst entry to new location devices.txt: document rfkill allocation PCI: correct flag name docs: filesystems: vfs: correct flag name docs: filesystems: vfs: correct sync_mode flag names docs: path-lookup: markup fixes for emphasis docs: path-lookup: more markup fixes docs: path-lookup: fix HTML entity mojibake CREDITS: Replace HTTP links with HTTPS ones docs: process: Add an example for creating a fixes tag doc/zh_CN: add Chinese translation prefer section doc/zh_CN: add clearing-warn-once Chinese version doc/zh_CN: add admin-guide index doc:it_IT: process: coding-style.rst: Correct __maybe_unused compiler label futex: MAINTAINERS: Re-add selftests directory ...
2020-08-04init: add an init_dup helperChristoph Hellwig
Add a simple helper to grab a reference to a file and install it at the next available fd, and switch the early init code over to it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-08-04Merge branch 'exec-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull execve updates from Eric Biederman: "During the development of v5.7 I ran into bugs and quality of implementation issues related to exec that could not be easily fixed because of the way exec is implemented. So I have been diggin into exec and cleaning up what I can. This cycle I have been looking at different ideas and different implementations to see what is possible to improve exec, and cleaning the way exec interfaces with in kernel users. Only cleaning up the interfaces of exec with rest of the kernel has managed to stabalize and make it through review in time for v5.9-rc1 resulting in 2 sets of changes this cycle. - Implement kernel_execve - Make the user mode driver code a better citizen With kernel_execve the code size got a little larger as the copying of parameters from userspace and copying of parameters from userspace is now separate. The good news is kernel threads no longer need to play games with set_fs to use exec. Which when combined with the rest of Christophs set_fs changes should security bugs with set_fs much more difficult" * 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (23 commits) exec: Implement kernel_execve exec: Factor bprm_stack_limits out of prepare_arg_pages exec: Factor bprm_execve out of do_execve_common exec: Move bprm_mm_init into alloc_bprm exec: Move initialization of bprm->filename into alloc_bprm exec: Factor out alloc_bprm exec: Remove unnecessary spaces from binfmts.h umd: Stop using split_argv umd: Remove exit_umh bpfilter: Take advantage of the facilities of struct pid exit: Factor thread_group_exited out of pidfd_poll umd: Track user space drivers with struct pid bpfilter: Move bpfilter_umh back into init data exec: Remove do_execve_file umh: Stop calling do_execve_file umd: Transform fork_usermode_blob into fork_usermode_driver umd: Rename umd_info.cmdline umd_info.driver_name umd: For clarity rename umh_info umd_info umh: Separate the user mode driver and the user mode helper support umh: Remove call_usermodehelper_setup_file. ...
2020-08-04Merge tag 'seccomp-v5.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp updates from Kees Cook: "There are a bunch of clean ups and selftest improvements along with two major updates to the SECCOMP_RET_USER_NOTIF filter return: EPOLLHUP support to more easily detect the death of a monitored process, and being able to inject fds when intercepting syscalls that expect an fd-opening side-effect (needed by both container folks and Chrome). The latter continued the refactoring of __scm_install_fd() started by Christoph, and in the process found and fixed a handful of bugs in various callers. - Improved selftest coverage, timeouts, and reporting - Add EPOLLHUP support for SECCOMP_RET_USER_NOTIF (Christian Brauner) - Refactor __scm_install_fd() into __receive_fd() and fix buggy callers - Introduce 'addfd' command for SECCOMP_RET_USER_NOTIF (Sargun Dhillon)" * tag 'seccomp-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (30 commits) selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD seccomp: Introduce addfd ioctl to seccomp user notifier fs: Expand __receive_fd() to accept existing fd pidfd: Replace open-coded receive_fd() fs: Add receive_fd() wrapper for __receive_fd() fs: Move __scm_install_fd() to __receive_fd() net/scm: Regularize compat handling of scm_detach_fds() pidfd: Add missing sock updates for pidfd_getfd() net/compat: Add missing sock updates for SCM_RIGHTS selftests/seccomp: Check ENOSYS under tracing selftests/seccomp: Refactor to use fixture variants selftests/harness: Clean up kern-doc for fixtures seccomp: Use -1 marker for end of mode 1 syscall list seccomp: Fix ioctl number for SECCOMP_IOCTL_NOTIF_ID_VALID selftests/seccomp: Rename user_trap_syscall() to user_notif_syscall() selftests/seccomp: Make kcmp() less required seccomp: Use pr_fmt selftests/seccomp: Improve calibration loop selftests/seccomp: use 90s as timeout selftests/seccomp: Expand benchmark to per-filter measurements ...
2020-08-04bootconfig: Fix to find the initargs correctlyMasami Hiramatsu
Since the parse_args() stops parsing at '--', bootconfig_params() will never get the '--' as param and initargs_found never be true. In the result, if we pass some init arguments via the bootconfig, those are always appended to the kernel command line with '--' even if the kernel command line already has '--'. To fix this correctly, check the return value of parse_args() and set initargs_found true if the return value is not an error but a valid address. Link: https://lkml.kernel.org/r/159650953285.270383.14822353843556363851.stgit@devnote2 Fixes: f61872bb58a1 ("bootconfig: Use parse_args() to find bootconfig and '--'") Cc: stable@vger.kernel.org Reported-by: Arvind Sankar <nivedita@alum.mit.edu> Suggested-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>