summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2021-02-26kernel: delete repeated words in commentsRandy Dunlap
Drop repeated words in kernel/events/. {if, the, that, with, time} Drop repeated words in kernel/locking/. {it, no, the} Drop repeated words in kernel/sched/. {in, not} Link: https://lkml.kernel.org/r/20210127023412.26292-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Will Deacon <will@kernel.org> [kernel/locking/] Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26groups: simplify struct group_info allocationHubert Jasudowicz
Combine kmalloc and vmalloc into a single call. Use struct_size macro instead of direct size calculation. Link: https://lkml.kernel.org/r/ba9ba5beea9a44b7196c41a0d9528abd5f20dd2e.1611620846.git.hubert.jasudowicz@gmail.com Signed-off-by: Hubert Jasudowicz <hubert.jasudowicz@gmail.com> Cc: Gao Xiang <xiang@kernel.org> Cc: Micah Morton <mortonm@chromium.org> Cc: Michael Kelley <mikelley@microsoft.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Thomas Cedeno <thomascedeno@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26sysctl.c: fix underflow value setting risk in vm_tableLin Feng
Apart from subsystem specific .proc_handler handler, all ctl_tables with extra1 and extra2 members set should use proc_dointvec_minmax instead of proc_dointvec, or the limit set in extra* never work and potentially echo underflow values(negative numbers) is likely make system unstable. Especially vfs_cache_pressure and zone_reclaim_mode, -1 is apparently not a valid value, but we can set to them. And then kernel may crash. # echo -1 > /proc/sys/vm/vfs_cache_pressure Link: https://lkml.kernel.org/r/20201223105535.2875-1-linf@wangsu.com Signed-off-by: Lin Feng <linf@wangsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26tracing: add error_report_end trace pointAlexander Potapenko
Patch series "Add error_report_end tracepoint to KFENCE and KASAN", v3. This patchset adds a tracepoint, error_repor_end, that is to be used by KFENCE, KASAN, and potentially other bug detection tools, when they print an error report. One of the possible use cases is userspace collection of kernel error reports: interested parties can subscribe to the tracing event via tracefs, and get notified when an error report occurs. This patch (of 3): Introduce error_report_end tracepoint. It can be used in debugging tools like KASAN, KFENCE, etc. to provide extensions to the error reporting mechanisms (e.g. allow tests hook into error reporting, ease error report collection from production kernels). Another benefit would be making use of ftrace for debugging or benchmarking the tools themselves. Should we need it, the tracepoint name leaves us with the possibility to introduce a complementary error_report_start tracepoint in the future. Link: https://lkml.kernel.org/r/20210121131915.1331302-1-glider@google.com Link: https://lkml.kernel.org/r/20210121131915.1331302-2-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Suggested-by: Marco Elver <elver@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-25Merge tag 'kbuild-v5.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Fix false-positive build warnings for ARCH=ia64 builds - Optimize dictionary size for module compression with xz - Check the compiler and linker versions in Kconfig - Fix misuse of extra-y - Support DWARF v5 debug info - Clamp SUBLEVEL to 255 because stable releases 4.4.x and 4.9.x exceeded the limit - Add generic syscall{tbl,hdr}.sh for cleanups across arches - Minor cleanups of genksyms - Minor cleanups of Kconfig * tag 'kbuild-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (38 commits) initramfs: Remove redundant dependency of RD_ZSTD on BLK_DEV_INITRD kbuild: remove deprecated 'always' and 'hostprogs-y/m' kbuild: parse C= and M= before changing the working directory kbuild: reuse this-makefile to define abs_srctree kconfig: unify rule of config, menuconfig, nconfig, gconfig, xconfig kconfig: omit --oldaskconfig option for 'make config' kconfig: fix 'invalid option' for help option kconfig: remove dead code in conf_askvalue() kconfig: clean up nested if-conditionals in check_conf() kconfig: Remove duplicate call to sym_get_string_value() Makefile: Remove # characters from compiler string Makefile: reuse CC_VERSION_TEXT kbuild: check the minimum linker version in Kconfig kbuild: remove ld-version macro scripts: add generic syscallhdr.sh scripts: add generic syscalltbl.sh arch: syscalls: remove $(srctree)/ prefix from syscall tables arch: syscalls: add missing FORCE and fix 'targets' to make if_changed work gen_compile_commands: prune some directories kbuild: simplify access to the kernel's version ...
2021-02-24Merge tag 'x86-entry-2021-02-24' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 irq entry updates from Thomas Gleixner: "The irq stack switching was moved out of the ASM entry code in course of the entry code consolidation. It ended up being suboptimal in various ways. This reworks the X86 irq stack handling: - Make the stack switching inline so the stackpointer manipulation is not longer at an easy to find place. - Get rid of the unnecessary indirect call. - Avoid the double stack switching in interrupt return and reuse the interrupt stack for softirq handling. - A objtool fix for CONFIG_FRAME_POINTER=y builds where it got confused about the stack pointer manipulation" * tag 'x86-entry-2021-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Fix stack-swizzle for FRAME_POINTER=y um: Enforce the usage of asm-generic/softirq_stack.h x86/softirq/64: Inline do_softirq_own_stack() softirq: Move do_softirq_own_stack() to generic asm header softirq: Move __ARCH_HAS_DO_SOFTIRQ to Kconfig x86: Select CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK x86/softirq: Remove indirection in do_softirq_own_stack() x86/entry: Use run_sysvec_on_irqstack_cond() for XEN upcall x86/entry: Convert device interrupts to inline stack switching x86/entry: Convert system vectors to irq stack macro x86/irq: Provide macro for inlining irq stack switching x86/apic: Split out spurious handling code x86/irq/64: Adjust the per CPU irq stack pointer by 8 x86/irq: Sanitize irq stack tracking x86/entry: Fix instrumentation annotation
2021-02-24Merge tag 'driver-core-5.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core / debugfs update from Greg KH: "Here is the "big" driver core and debugfs update for 5.12-rc1 This set of driver core patches caused a bunch of problems in linux-next for the past few weeks, when Saravana tried to set fw_devlink=on as the default functionality. This caused a number of systems to stop booting, and lots of bugs were fixed in this area for almost all of the reported systems, but this option is not ready to be turned on just yet for the default operation based on this testing, so I've reverted that change at the very end so we don't have to worry about regressions in 5.12 We will try to turn this on for 5.13 if testing goes better over the next few months. Other than the fixes caused by the fw_devlink testing in here, there's not much more: - debugfs fixes for invalid input into debugfs_lookup() - kerneldoc cleanups - warn message if platform drivers return an error on their remove callback (a futile effort, but good to catch). All of these have been in linux-next for a while now, and the regressions have gone away with the revert of the fw_devlink change" * tag 'driver-core-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (35 commits) Revert "driver core: Set fw_devlink=on by default" of: property: fw_devlink: Ignore interrupts property for some configs debugfs: do not attempt to create a new file before the filesystem is initalized debugfs: be more robust at handling improper input in debugfs_lookup() driver core: auxiliary bus: Fix calling stage for auxiliary bus init of: irq: Fix the return value for of_irq_parse_one() stub of: irq: make a stub for of_irq_parse_one() clk: Mark fwnodes when their clock provider is added/removed PM: domains: Mark fwnodes when their powerdomain is added/removed irqdomain: Mark fwnodes when their irqdomain is added/removed driver core: fw_devlink: Handle suppliers that don't use driver core of: property: Add fw_devlink support for optional properties driver core: Add fw_devlink.strict kernel param of: property: Don't add links to absent suppliers driver core: fw_devlink: Detect supplier devices that will never be added driver core: platform: Emit a warning if a remove callback returned non-zero of: property: Fix fw_devlink handling of interrupts/interrupts-extended gpiolib: Don't probe gpio_device if it's not the primary device device.h: Remove bogus "the" in kerneldoc gpiolib: Bind gpio_device to a driver to enable fw_devlink=on by default ...
2021-02-24Merge tag 'dma-mapping-5.12' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds
Pull dma-mapping updates from Christoph Hellwig: - add support to emulate processing delays in the DMA API benchmark selftest (Barry Song) - remove support for non-contiguous noncoherent allocations, which aren't used and will be replaced by a different API * tag 'dma-mapping-5.12' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: remove the {alloc,free}_noncoherent methods dma-mapping: benchmark: pretend DMA is transmitting
2021-02-23Merge tag 'keys-misc-20210126' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull keyring updates from David Howells: "Here's a set of minor keyrings fixes/cleanups that I've collected from various people for the upcoming merge window. A couple of them might, in theory, be visible to userspace: - Make blacklist_vet_description() reject uppercase letters as they don't match the all-lowercase hex string generated for a blacklist search. This may want reconsideration in the future, but, currently, you can't add to the blacklist keyring from userspace and the only source of blacklist keys generates lowercase descriptions. - Fix blacklist_init() to use a new KEY_ALLOC_* flag to indicate that it wants KEY_FLAG_KEEP to be set rather than passing KEY_FLAG_KEEP into keyring_alloc() as KEY_FLAG_KEEP isn't a valid alloc flag. This isn't currently a problem as the blacklist keyring isn't currently writable by userspace. The rest of the patches are cleanups and I don't think they should have any visible effect" * tag 'keys-misc-20210126' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: watch_queue: rectify kernel-doc for init_watch() certs: Replace K{U,G}IDT_INIT() with GLOBAL_ROOT_{U,G}ID certs: Fix blacklist flag type confusion PKCS#7: Fix missing include certs: Fix blacklisted hexadecimal hash string check certs/blacklist: fix kernel doc interface issue crypto: public_key: Remove redundant header file from public_key.h keys: remove trailing semicolon in macro definition crypto: pkcs7: Use match_string() helper to simplify the code PKCS#7: drop function from kernel-doc pkcs7_validate_trust_one encrypted-keys: Replace HTTP links with HTTPS ones crypto: asymmetric_keys: fix some comments in pkcs7_parser.h KEYS: remove redundant memset security: keys: delete repeated words in comments KEYS: asymmetric: Fix kerneldoc security/keys: use kvfree_sensitive() watch_queue: Drop references to /dev/watch_queue keys: Remove outdated __user annotations security: keys: Fix fall-through warnings for Clang
2021-02-23Merge tag 'clang-lto-v5.12-rc1-part2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull more clang LTO updates from Kees Cook: "Clang LTO x86 enablement. Full disclosure: while this has _not_ been in linux-next (since it initially looked like the objtool dependencies weren't going to make v5.12), it has been under daily build and runtime testing by Sami for quite some time. These x86 portions have been discussed on lkml, with Peter, Josh, and others helping nail things down. The bulk of the changes are to get objtool working happily. The rest of the x86 enablement is very small. Summary: - Generate __mcount_loc in objtool (Peter Zijlstra) - Support running objtool against vmlinux.o (Sami Tolvanen) - Clang LTO enablement for x86 (Sami Tolvanen)" Link: https://lore.kernel.org/lkml/20201013003203.4168817-26-samitolvanen@google.com/ Link: https://lore.kernel.org/lkml/cover.1611263461.git.jpoimboe@redhat.com/ * tag 'clang-lto-v5.12-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: kbuild: lto: force rebuilds when switching CONFIG_LTO x86, build: allow LTO to be selected x86, cpu: disable LTO for cpu.c x86, vdso: disable LTO only for vDSO kbuild: lto: postpone objtool objtool: Split noinstr validation from --vmlinux x86, build: use objtool mcount tracing: add support for objtool mcount objtool: Don't autodetect vmlinux.o objtool: Fix __mcount_loc generation with Clang's assembler objtool: Add a pass for generating __mcount_loc
2021-02-23Merge tag 'pm-5.12-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "These are fixes and cleanups on top of the power management material for 5.12-rc1 merged previously. Specifics: - Address cpufreq regression introduced in 5.11 that causes CPU frequency reporting to be distorted on systems with CPPC that use acpi-cpufreq as the scaling driver (Rafael Wysocki). - Fix regression introduced during the 5.10 development cycle related to CPU hotplug and policy recreation in the qcom-cpufreq-hw driver (Shawn Guo). - Fix recent regression in the operating performance points (OPP) framework that may cause frequency updates to be skipped by mistake in some cases (Jonathan Marek). - Simplify schedutil governor code and remove a misleading comment from it (Yue Hu). - Fix kerneldoc comment typo in the cpufreq core (Yue Hu)" * tag 'pm-5.12-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: Fix typo in kerneldoc comment cpufreq: schedutil: Remove update_lock comment from struct sugov_policy definition cpufreq: schedutil: Remove needless sg_policy parameter from ignore_dl_rate_limit() cpufreq: ACPI: Set cpuinfo.max_freq directly if max boost is known cpufreq: qcom-hw: drop devm_xxx() calls from init/exit hooks opp: Don't skip freq update for different frequency
2021-02-23Merge tag 'idmapped-mounts-v5.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull idmapped mounts from Christian Brauner: "This introduces idmapped mounts which has been in the making for some time. Simply put, different mounts can expose the same file or directory with different ownership. This initial implementation comes with ports for fat, ext4 and with Christoph's port for xfs with more filesystems being actively worked on by independent people and maintainers. Idmapping mounts handle a wide range of long standing use-cases. Here are just a few: - Idmapped mounts make it possible to easily share files between multiple users or multiple machines especially in complex scenarios. For example, idmapped mounts will be used in the implementation of portable home directories in systemd-homed.service(8) where they allow users to move their home directory to an external storage device and use it on multiple computers where they are assigned different uids and gids. This effectively makes it possible to assign random uids and gids at login time. - It is possible to share files from the host with unprivileged containers without having to change ownership permanently through chown(2). - It is possible to idmap a container's rootfs and without having to mangle every file. For example, Chromebooks use it to share the user's Download folder with their unprivileged containers in their Linux subsystem. - It is possible to share files between containers with non-overlapping idmappings. - Filesystem that lack a proper concept of ownership such as fat can use idmapped mounts to implement discretionary access (DAC) permission checking. - They allow users to efficiently changing ownership on a per-mount basis without having to (recursively) chown(2) all files. In contrast to chown (2) changing ownership of large sets of files is instantenous with idmapped mounts. This is especially useful when ownership of a whole root filesystem of a virtual machine or container is changed. With idmapped mounts a single syscall mount_setattr syscall will be sufficient to change the ownership of all files. - Idmapped mounts always take the current ownership into account as idmappings specify what a given uid or gid is supposed to be mapped to. This contrasts with the chown(2) syscall which cannot by itself take the current ownership of the files it changes into account. It simply changes the ownership to the specified uid and gid. This is especially problematic when recursively chown(2)ing a large set of files which is commong with the aforementioned portable home directory and container and vm scenario. - Idmapped mounts allow to change ownership locally, restricting it to specific mounts, and temporarily as the ownership changes only apply as long as the mount exists. Several userspace projects have either already put up patches and pull-requests for this feature or will do so should you decide to pull this: - systemd: In a wide variety of scenarios but especially right away in their implementation of portable home directories. https://systemd.io/HOME_DIRECTORY/ - container runtimes: containerd, runC, LXD:To share data between host and unprivileged containers, unprivileged and privileged containers, etc. The pull request for idmapped mounts support in containerd, the default Kubernetes runtime is already up for quite a while now: https://github.com/containerd/containerd/pull/4734 - The virtio-fs developers and several users have expressed interest in using this feature with virtual machines once virtio-fs is ported. - ChromeOS: Sharing host-directories with unprivileged containers. I've tightly synced with all those projects and all of those listed here have also expressed their need/desire for this feature on the mailing list. For more info on how people use this there's a bunch of talks about this too. Here's just two recent ones: https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf https://fosdem.org/2021/schedule/event/containers_idmap/ This comes with an extensive xfstests suite covering both ext4 and xfs: https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts It covers truncation, creation, opening, xattrs, vfscaps, setid execution, setgid inheritance and more both with idmapped and non-idmapped mounts. It already helped to discover an unrelated xfs setgid inheritance bug which has since been fixed in mainline. It will be sent for inclusion with the xfstests project should you decide to merge this. In order to support per-mount idmappings vfsmounts are marked with user namespaces. The idmapping of the user namespace will be used to map the ids of vfs objects when they are accessed through that mount. By default all vfsmounts are marked with the initial user namespace. The initial user namespace is used to indicate that a mount is not idmapped. All operations behave as before and this is verified in the testsuite. Based on prior discussions we want to attach the whole user namespace and not just a dedicated idmapping struct. This allows us to reuse all the helpers that already exist for dealing with idmappings instead of introducing a whole new range of helpers. In addition, if we decide in the future that we are confident enough to enable unprivileged users to setup idmapped mounts the permission checking can take into account whether the caller is privileged in the user namespace the mount is currently marked with. The user namespace the mount will be marked with can be specified by passing a file descriptor refering to the user namespace as an argument to the new mount_setattr() syscall together with the new MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern of extensibility. The following conditions must be met in order to create an idmapped mount: - The caller must currently have the CAP_SYS_ADMIN capability in the user namespace the underlying filesystem has been mounted in. - The underlying filesystem must support idmapped mounts. - The mount must not already be idmapped. This also implies that the idmapping of a mount cannot be altered once it has been idmapped. - The mount must be a detached/anonymous mount, i.e. it must have been created by calling open_tree() with the OPEN_TREE_CLONE flag and it must not already have been visible in the filesystem. The last two points guarantee easier semantics for userspace and the kernel and make the implementation significantly simpler. By default vfsmounts are marked with the initial user namespace and no behavioral or performance changes are observed. The manpage with a detailed description can be found here: https://git.kernel.org/brauner/man-pages/c/1d7b902e2875a1ff342e036a9f866a995640aea8 In order to support idmapped mounts, filesystems need to be changed and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The patches to convert individual filesystem are not very large or complicated overall as can be seen from the included fat, ext4, and xfs ports. Patches for other filesystems are actively worked on and will be sent out separately. The xfstestsuite can be used to verify that port has been done correctly. The mount_setattr() syscall is motivated independent of the idmapped mounts patches and it's been around since July 2019. One of the most valuable features of the new mount api is the ability to perform mounts based on file descriptors only. Together with the lookup restrictions available in the openat2() RESOLVE_* flag namespace which we added in v5.6 this is the first time we are close to hardened and race-free (e.g. symlinks) mounting and path resolution. While userspace has started porting to the new mount api to mount proper filesystems and create new bind-mounts it is currently not possible to change mount options of an already existing bind mount in the new mount api since the mount_setattr() syscall is missing. With the addition of the mount_setattr() syscall we remove this last restriction and userspace can now fully port to the new mount api, covering every use-case the old mount api could. We also add the crucial ability to recursively change mount options for a whole mount tree, both removing and adding mount options at the same time. This syscall has been requested multiple times by various people and projects. There is a simple tool available at https://github.com/brauner/mount-idmapped that allows to create idmapped mounts so people can play with this patch series. I'll add support for the regular mount binary should you decide to pull this in the following weeks: Here's an example to a simple idmapped mount of another user's home directory: u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt u1001@f2-vm:/$ ls -al /home/ubuntu/ total 28 drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 . drwxr-xr-x 4 root root 4096 Oct 28 04:00 .. -rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile -rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ ls -al /mnt/ total 28 drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 . drwxr-xr-x 29 root root 4096 Oct 28 22:01 .. -rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile -rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ touch /mnt/my-file u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file u1001@f2-vm:/$ ls -al /mnt/my-file -rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file u1001@f2-vm:/$ ls -al /home/ubuntu/my-file -rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file u1001@f2-vm:/$ getfacl /mnt/my-file getfacl: Removing leading '/' from absolute path names # file: mnt/my-file # owner: u1001 # group: u1001 user::rw- user:u1001:rwx group::rw- mask::rwx other::r-- u1001@f2-vm:/$ getfacl /home/ubuntu/my-file getfacl: Removing leading '/' from absolute path names # file: home/ubuntu/my-file # owner: ubuntu # group: ubuntu user::rw- user:ubuntu:rwx group::rw- mask::rwx other::r--" * tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits) xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl xfs: support idmapped mounts ext4: support idmapped mounts fat: handle idmapped mounts tests: add mount_setattr() selftests fs: introduce MOUNT_ATTR_IDMAP fs: add mount_setattr() fs: add attr_flags_to_mnt_flags helper fs: split out functions to hold writers namespace: only take read lock in do_reconfigure_mnt() mount: make {lock,unlock}_mount_hash() static namespace: take lock_mount_hash() directly when changing flags nfs: do not export idmapped mounts overlayfs: do not mount on top of idmapped mounts ecryptfs: do not mount on top of idmapped mounts ima: handle idmapped mounts apparmor: handle idmapped mounts fs: make helpers idmap mount aware exec: handle idmapped mounts would_dump: handle idmapped mounts ...
2021-02-23tracing: add support for objtool mcountSami Tolvanen
This change adds build support for using objtool to generate __mcount_loc sections. Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
2021-02-23Merge branches 'pm-cpufreq' and 'pm-opp'Rafael J. Wysocki
* pm-cpufreq: cpufreq: Fix typo in kerneldoc comment cpufreq: schedutil: Remove update_lock comment from struct sugov_policy definition cpufreq: schedutil: Remove needless sg_policy parameter from ignore_dl_rate_limit() cpufreq: ACPI: Set cpuinfo.max_freq directly if max boost is known cpufreq: qcom-hw: drop devm_xxx() calls from init/exit hooks * pm-opp: opp: Don't skip freq update for different frequency
2021-02-23Merge tag 'modules-for-v5.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux Pull module updates from Jessica Yu: - Retire EXPORT_UNUSED_SYMBOL() and EXPORT_SYMBOL_GPL_FUTURE(). These export types were introduced between 2006 - 2008. All the of the unused symbols have been long removed and gpl future symbols were converted to gpl quite a long time ago, and I don't believe these export types have been used ever since. So, I think it should be safe to retire those export types now (Christoph Hellwig) - Refactor and clean up some aged code cruft in the module loader (Christoph Hellwig) - Build {,module_}kallsyms_on_each_symbol only when livepatching is enabled, as it is the only caller (Christoph Hellwig) - Unexport find_module() and module_mutex and fix the last module callers to not rely on these anymore. Make module_mutex internal to the module loader (Christoph Hellwig) - Harden ELF checks on module load and validate ELF structures before checking the module signature (Frank van der Linden) - Fix undefined symbol warning for clang (Fangrui Song) - Fix smatch warning (Dan Carpenter) * tag 'modules-for-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: module: potential uninitialized return in module_kallsyms_on_each_symbol() module: remove EXPORT_UNUSED_SYMBOL* module: remove EXPORT_SYMBOL_GPL_FUTURE module: move struct symsearch to module.c module: pass struct find_symbol_args to find_symbol module: merge each_symbol_section into find_symbol module: remove each_symbol_in_section module: mark module_mutex static kallsyms: only build {,module_}kallsyms_on_each_symbol when required kallsyms: refactor {,module_}kallsyms_on_each_symbol module: use RCU to synchronize find_module module: unexport find_module and module_mutex drm: remove drm_fb_helper_modinit powerpc/powernv: remove get_cxl_module module: harden ELF info handling module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for undefined symbols
2021-02-23Merge tag 'clang-lto-v5.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull clang LTO updates from Kees Cook: "Clang Link Time Optimization. This is built on the work done preparing for LTO by arm64 folks, tracing folks, etc. This includes the core changes as well as the remaining pieces for arm64 (LTO has been the default build method on Android for about 3 years now, as it is the prerequisite for the Control Flow Integrity protections). While x86 LTO enablement is done, it depends on some pending objtool clean-ups. It's possible that I'll send a "part 2" pull request for LTO that includes x86 support. For merge log posterity, and as detailed in commit dc5723b02e52 ("kbuild: add support for Clang LTO"), here is the lt;dr to do an LTO build: make LLVM=1 LLVM_IAS=1 defconfig scripts/config -e LTO_CLANG_THIN make LLVM=1 LLVM_IAS=1 (To do a cross-compile of arm64, add "CROSS_COMPILE=aarch64-linux-gnu-" and "ARCH=arm64" to the "make" command lines.) Summary: - Clang LTO build infrastructure and arm64-specific enablement (Sami Tolvanen) - Recursive build CC_FLAGS_LTO fix (Alexander Lobakin)" * tag 'clang-lto-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: kbuild: prevent CC_FLAGS_LTO self-bloating on recursive rebuilds arm64: allow LTO to be selected arm64: disable recordmcount with DYNAMIC_FTRACE_WITH_REGS arm64: vdso: disable LTO drivers/misc/lkdtm: disable LTO for rodata.o efi/libstub: disable LTO scripts/mod: disable LTO for empty.c modpost: lto: strip .lto from module names PCI: Fix PREL32 relocations for LTO init: lto: fix PREL32 relocations init: lto: ensure initcall ordering kbuild: lto: add a default list of used symbols kbuild: lto: merge module sections kbuild: lto: limit inlining kbuild: lto: fix module versioning kbuild: add support for Clang LTO tracing: move function tracer options to Kconfig
2021-02-22Merge tag 'topic/iomem-mmap-vs-gup-2021-02-22' of ↵Linus Torvalds
git://anongit.freedesktop.org/drm/drm Pull follow_pfn() updates from Daniel Vetter: "Fixes around VM_FPNMAP and follow_pfn: - replace mm/frame_vector.c by get_user_pages in misc/habana and drm/exynos drivers, then move that into media as it's sole user - close race in generic_access_phys - s390 pci ioctl fix of this series landed in 5.11 already - properly revoke iomem mappings (/dev/mem, pci files)" * tag 'topic/iomem-mmap-vs-gup-2021-02-22' of git://anongit.freedesktop.org/drm/drm: PCI: Revoke mappings like devmem PCI: Also set up legacy files only after sysfs init sysfs: Support zapping of binary attr mmaps resource: Move devmem revoke code to resource framework /dev/mem: Only set filp->f_mapping PCI: Obey iomem restrictions for procfs mmap mm: Close race in generic_access_phys media: videobuf2: Move frame_vector into media subsystem mm/frame-vector: Use FOLL_LONGTERM misc/habana: Use FOLL_LONGTERM for userptr misc/habana: Stop using frame_vector helpers drm/exynos: Use FOLL_LONGTERM for g2d cmdlists drm/exynos: Stop using frame_vector helpers
2021-02-22Merge tag 'topic/kcmp-kconfig-2021-02-22' of ↵Linus Torvalds
git://anongit.freedesktop.org/drm/drm Pull kcmp kconfig update from Daniel Vetter: "Make the kcmp syscall available independently of checkpoint/restore. drm userspaces uses this, systemd uses this, so makes sense to pull it out from the checkpoint-restore bundle. Kees reviewed this from security pov and is happy with the final version" Link: https://lwn.net/Articles/845448/ * tag 'topic/kcmp-kconfig-2021-02-22' of git://anongit.freedesktop.org/drm/drm: kcmp: Support selection of SYS_kcmp without CHECKPOINT_RESTORE
2021-02-22Merge branch 'for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds
Pull qorkqueue updates from Tejun Heo: "Tracepoint and comment updates only" * 'for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Use %s instead of function name workqueue: tracing the name of the workqueue instead of it's address workqueue: fix annotation for WQ_SYSFS
2021-02-22Merge branch 'for-5.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "Nothing interesting. Just two minor patches" * 'for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cpuset: fix typos in comments cgroup: cgroup.{procs,threads} factor out common parts
2021-02-22Merge tag 'trace-v5.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: - Update to the way irqs and preemption is tracked via the trace event PC field - Fix handling of unregistering event failing due to allocate memory. This is only triggered by failure injection, as it is pretty much guaranteed to have less than a page allocation succeed. - Do not show the useless "filter" or "enable" files for the "ftrace" trace system, as they have no effect on doing anything. - Add a warning if kprobes are registered more than once. - Synthetic events now have their fields parsed by semicolons. Old formats without semicolons will still work, but new features will require them. - New option to allow trace events to show %p without hashing in trace file. The trace file can only be read by root, and reading the raw event buffer did not have any pointers hashed, so this does not expose anything new. - New directory in tools called tools/tracing, where a new tool that reads sequential latency reports from the ftrace latency tracers. - Other minor fixes and cleanups. * tag 'trace-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits) kprobes: Fix to delay the kprobes jump optimization tracing/tools: Add the latency-collector to tools directory tracing: Make hash-ptr option default tracing: Add ptr-hash option to show the hashed pointer value tracing: Update the stage 3 of trace event macro comment tracing: Show real address for trace event arguments selftests/ftrace: Add '!event' synthetic event syntax check selftests/ftrace: Update synthetic event syntax errors tracing: Add a backward-compatibility check for synthetic event creation tracing: Update synth command errors tracing: Rework synthetic event command parsing tracing/dynevent: Delegate parsing to create function kprobes: Warn if the kprobe is reregistered ftrace: Remove unused ftrace_force_update() tracepoints: Code clean up tracepoints: Do not punish non static call users tracepoints: Remove unnecessary "data_args" macro parameter tracing: Do not create "enable" or "filter" files for ftrace event subsystem kernel: trace: preemptirq_delay_test: add cpu affinity tracepoint: Do not fail unregistering a probe due to memory failure ...
2021-02-22Merge tag 'kgdb-5.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux Pull kgdb updates from Daniel Thompson: "Another fairly small set of changes of changes this cycle. The most significant functional change is a fix to better manage the flags when allocating memory. Additionally there is the removal of some unused code (which is slightly more dramatic than it sounds given it means there are now no tasklets in kgdb) together with a tidy up of the debug prints and some spelling corrections for the documentation" * tag 'kgdb-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux: kgdb: Remove kgdb_schedule_breakpoint() kdb: Make memory allocations more robust kdb: kdb_support: Fix debugging information problem kernel: debug: fix typo issue kgdb: rectify kernel-doc for kgdb_unregister_io_module()
2021-02-22Merge tag 'printk-for-5.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - New "no_hash_pointers" kernel parameter causes that %p shows raw pointer values instead of hashed ones. It is intended only for debugging purposes. Misuse is prevented by a fat warning message that is inspired by trace_printk(). - Prevent a possible deadlock when flushing printk_safe buffers during panic(). - Fix performance regression caused by the lockless printk ringbuffer. It was visible with huge log buffer and long messages. - Documentation fix-up. * tag 'printk-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: lib/vsprintf: no_hash_pointers prints all addresses as unhashed kselftest: add support for skipped tests lib: use KSTM_MODULE_GLOBALS macro in kselftest drivers printk: avoid prb_first_valid_seq() where possible printk: fix deadlock when kernel panic printk: rectify kernel-doc for prb_rec_init_wr()
2021-02-22Merge branch 'printk-rework' into for-linusPetr Mladek
2021-02-21Merge tag 'seccomp-v5.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp updates from Kees Cook: "Two small seccomp updates. This contains a fix for a build failure that went unnoticed for many years, and a memory barrier correction: - Fix a non-FILTER build failure for some architectures (Paul Cercueil) - Improve performance with correct memory barrier (wanghongzhe)" * tag 'seccomp-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: seccomp: Improve performace by optimizing rmb() seccomp: Add missing return in non-void function
2021-02-21Merge tag 'integrity-v5.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull IMA updates from Mimi Zohar: "New is IMA support for measuring kernel critical data, as per usual based on policy. The first example measures the in memory SELinux policy. The second example measures the kernel version. In addition are four bug fixes to address memory leaks and a missing 'static' function declaration" * tag 'integrity-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: integrity: Make function integrity_add_key() static ima: Free IMA measurement buffer after kexec syscall ima: Free IMA measurement buffer on error IMA: Measure kernel version in early boot selinux: include a consumer of the new IMA critical data hook IMA: define a builtin critical data measurement policy IMA: extend critical data hook to limit the measurement based on a label IMA: limit critical data measurement based on a label IMA: add policy rule to measure critical data IMA: define a hook to measure kernel integrity critical data IMA: add support to measure buffer data hash IMA: generalize keyring specific measurement constructs evm: Fix memleak in init_desc
2021-02-21Merge tag 'audit-pr-20210215' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: "Three very trivial patches for audit this time" * tag 'audit-pr-20210215' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: Make audit_filter_syscall() return void audit: Remove leftover reference to the audit_tasklet kernel/audit: convert comma to semicolon
2021-02-21Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "x86: - Support for userspace to emulate Xen hypercalls - Raise the maximum number of user memslots - Scalability improvements for the new MMU. Instead of the complex "fast page fault" logic that is used in mmu.c, tdp_mmu.c uses an rwlock so that page faults are concurrent, but the code that can run against page faults is limited. Right now only page faults take the lock for reading; in the future this will be extended to some cases of page table destruction. I hope to switch the default MMU around 5.12-rc3 (some testing was delayed due to Chinese New Year). - Cleanups for MAXPHYADDR checks - Use static calls for vendor-specific callbacks - On AMD, use VMLOAD/VMSAVE to save and restore host state - Stop using deprecated jump label APIs - Workaround for AMD erratum that made nested virtualization unreliable - Support for LBR emulation in the guest - Support for communicating bus lock vmexits to userspace - Add support for SEV attestation command - Miscellaneous cleanups PPC: - Support for second data watchpoint on POWER10 - Remove some complex workarounds for buggy early versions of POWER9 - Guest entry/exit fixes ARM64: - Make the nVHE EL2 object relocatable - Cleanups for concurrent translation faults hitting the same page - Support for the standard TRNG hypervisor call - A bunch of small PMU/Debug fixes - Simplification of the early init hypercall handling Non-KVM changes (with acks): - Detection of contended rwlocks (implemented only for qrwlocks, because KVM only needs it for x86) - Allow __DISABLE_EXPORTS from assembly code - Provide a saner follow_pfn replacements for modules" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (192 commits) KVM: x86/xen: Explicitly pad struct compat_vcpu_info to 64 bytes KVM: selftests: Don't bother mapping GVA for Xen shinfo test KVM: selftests: Fix hex vs. decimal snafu in Xen test KVM: selftests: Fix size of memslots created by Xen tests KVM: selftests: Ignore recently added Xen tests' build output KVM: selftests: Add missing header file needed by xAPIC IPI tests KVM: selftests: Add operand to vmsave/vmload/vmrun in svm.c KVM: SVM: Make symbol 'svm_gp_erratum_intercept' static locking/arch: Move qrwlock.h include after qspinlock.h KVM: PPC: Book3S HV: Fix host radix SLB optimisation with hash guests KVM: PPC: Book3S HV: Ensure radix guest has no SLB entries KVM: PPC: Don't always report hash MMU capability for P9 < DD2.2 KVM: PPC: Book3S HV: Save and restore FSCR in the P9 path KVM: PPC: remove unneeded semicolon KVM: PPC: Book3S HV: Use POWER9 SLBIA IH=6 variant to clear SLB KVM: PPC: Book3S HV: No need to clear radix host SLB before loading HPT guest KVM: PPC: Book3S HV: Fix radix guest SLB side channel KVM: PPC: Book3S HV: Remove support for running HPT guest on RPT host without mixed mode support KVM: PPC: Book3S HV: Introduce new capability for 2nd DAWR KVM: PPC: Book3S HV: Add infrastructure to support 2nd DAWR ...
2021-02-21Merge tag 'mips_5.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS updates from Thomas Bogendoerfer: - added support for Nintendo N64 - added support for Realtek RTL83XX SoCs - kaslr support for Loongson64 - first steps to get rid of set_fs() - DMA runtime coherent/non-coherent selection cleanup - cleanups and fixes * tag 'mips_5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (98 commits) Revert "MIPS: Add basic support for ptrace single step" vmlinux.lds.h: catch more UBSAN symbols into .data MIPS: kernel: Drop kgdb_call_nmi_hook MAINTAINERS: Add git tree for KVM/mips MIPS: Use common way to parse elfcorehdr MIPS: Simplify EVA cache handling Revert "MIPS: kernel: {ftrace,kgdb}: Set correct address limit for cache flushes" MIPS: remove CONFIG_DMA_PERDEV_COHERENT MIPS: remove CONFIG_DMA_MAYBE_COHERENT driver core: lift dma_default_coherent into common code MIPS: refactor the runtime coherent vs noncoherent DMA indicators MIPS/alchemy: factor out the DMA coherent setup MIPS/malta: simplify plat_setup_iocoherency MIPS: Add basic support for ptrace single step MAINTAINERS: replace non-matching patterns for loongson{2,3} MIPS: Make check condition for SDBBP consistent with EJTAG spec mips: Replace lkml.org links with lore Revert "MIPS: microMIPS: Fix the judgment of mm_jr16_op and mm_jalr_op" MIPS: crash_dump.c: Simplify copy_oldmem_page() Revert "mips: Manually call fdt_init_reserved_mem() method" ...
2021-02-21Merge tag 'perf-core-2021-02-17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull performance event updates from Ingo Molnar: - Add CPU-PMU support for Intel Sapphire Rapids CPUs - Extend the perf ABI with PERF_SAMPLE_WEIGHT_STRUCT, to offer two-parameter sampling event feedback. Not used yet, but is intended for Golden Cove CPU-PMU, which can provide both the instruction latency and the cache latency information for memory profiling events. - Remove experimental, default-disabled perfmon-v4 counter_freezing support that could only be enabled via a boot option. The hardware is hopelessly broken, we'd like to make sure nobody starts relying on this, as it would only end in tears. - Fix energy/power events on Intel SPR platforms - Simplify the uprobes resume_execution() logic - Misc smaller fixes. * tag 'perf-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/rapl: Fix psys-energy event on Intel SPR platform perf/x86/rapl: Only check lower 32bits for RAPL energy counters perf/x86/rapl: Add msr mask support perf/x86/kvm: Add Cascade Lake Xeon steppings to isolation_ucodes[] perf/x86/intel: Support CPUID 10.ECX to disable fixed counters perf/x86/intel: Add perf core PMU support for Sapphire Rapids perf/x86/intel: Filter unsupported Topdown metrics event perf/x86/intel: Factor out intel_update_topdown_event() perf/core: Add PERF_SAMPLE_WEIGHT_STRUCT perf/intel: Remove Perfmon-v4 counter_freezing support x86/perf: Use static_call for x86_pmu.guest_get_msrs perf/x86/intel/uncore: With > 8 nodes, get pci bus die id from NUMA info perf/x86/intel/uncore: Store the logical die id instead of the physical die id. x86/kprobes: Do not decode opcode in resume_execution()
2021-02-21Merge tag 'sched-core-2021-02-17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Core scheduler updates: - Add CONFIG_PREEMPT_DYNAMIC: this in its current form adds the preempt=none/voluntary/full boot options (default: full), to allow distros to build a PREEMPT kernel but fall back to close to PREEMPT_VOLUNTARY (or PREEMPT_NONE) runtime scheduling behavior via a boot time selection. There's also the /debug/sched_debug switch to do this runtime. This feature is implemented via runtime patching (a new variant of static calls). The scope of the runtime patching can be best reviewed by looking at the sched_dynamic_update() function in kernel/sched/core.c. ( Note that the dynamic none/voluntary mode isn't 100% identical, for example preempt-RCU is available in all cases, plus the preempt count is maintained in all models, which has runtime overhead even with the code patching. ) The PREEMPT_VOLUNTARY/PREEMPT_NONE models, used by the vast majority of distributions, are supposed to be unaffected. - Fix ignored rescheduling after rcu_eqs_enter(). This is a bug that was found via rcutorture triggering a hang. The bug is that rcu_idle_enter() may wake up a NOCB kthread, but this happens after the last generic need_resched() check. Some cpuidle drivers fix it by chance but many others don't. In true 2020 fashion the original bug fix has grown into a 5-patch scheduler/RCU fix series plus another 16 RCU patches to address the underlying issue of missed preemption events. These are the initial fixes that should fix current incarnations of the bug. - Clean up rbtree usage in the scheduler, by providing & using the following consistent set of rbtree APIs: partial-order; less() based: - rb_add(): add a new entry to the rbtree - rb_add_cached(): like rb_add(), but for a rb_root_cached total-order; cmp() based: - rb_find(): find an entry in an rbtree - rb_find_add(): find an entry, and add if not found - rb_find_first(): find the first (leftmost) matching entry - rb_next_match(): continue from rb_find_first() - rb_for_each(): iterate a sub-tree using the previous two - Improve the SMP/NUMA load-balancer: scan for an idle sibling in a single pass. This is a 4-commit series where each commit improves one aspect of the idle sibling scan logic. - Improve the cpufreq cooling driver by getting the effective CPU utilization metrics from the scheduler - Improve the fair scheduler's active load-balancing logic by reducing the number of active LB attempts & lengthen the load-balancing interval. This improves stress-ng mmapfork performance. - Fix CFS's estimated utilization (util_est) calculation bug that can result in too high utilization values Misc updates & fixes: - Fix the HRTICK reprogramming & optimization feature - Fix SCHED_SOFTIRQ raising race & warning in the CPU offlining code - Reduce dl_add_task_root_domain() overhead - Fix uprobes refcount bug - Process pending softirqs in flush_smp_call_function_from_idle() - Clean up task priority related defines, remove *USER_*PRIO and USER_PRIO() - Simplify the sched_init_numa() deduplication sort - Documentation updates - Fix EAS bug in update_misfit_status(), which degraded the quality of energy-balancing - Smaller cleanups" * tag 'sched-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits) sched,x86: Allow !PREEMPT_DYNAMIC entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling point entry: Explicitly flush pending rcuog wakeup before last rescheduling point rcu/nocb: Trigger self-IPI on late deferred wake up before user resume rcu/nocb: Perform deferred wake up before last idle's need_resched() check rcu: Pull deferred rcuog wake up to rcu_eqs_enter() callers sched/features: Distinguish between NORMAL and DEADLINE hrtick sched/features: Fix hrtick reprogramming sched/deadline: Reduce rq lock contention in dl_add_task_root_domain() uprobes: (Re)add missing get_uprobe() in __find_uprobe() smp: Process pending softirqs in flush_smp_call_function_from_idle() sched: Harden PREEMPT_DYNAMIC static_call: Allow module use without exposing static_call_key sched: Add /debug/sched_preempt preempt/dynamic: Support dynamic preempt with preempt= boot option preempt/dynamic: Provide irqentry_exit_cond_resched() static call preempt/dynamic: Provide preempt_schedule[_notrace]() static calls preempt/dynamic: Provide cond_resched() and might_resched() static calls preempt: Introduce CONFIG_PREEMPT_DYNAMIC static_call: Provide DEFINE_STATIC_CALL_RET0() ...
2021-02-21Merge tag 'locking-core-2021-02-17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "Core locking primitives updates: - Remove mutex_trylock_recursive() from the API - no users left - Simplify + constify the futex code a bit Lockdep updates: - Teach lockdep about local_lock_t - Add CONFIG_DEBUG_IRQFLAGS=y debug config option to check for potentially unsafe IRQ mask restoration patterns. (I.e. calling raw_local_irq_restore() with IRQs enabled.) - Add wait context self-tests - Fix graph lock corner case corrupting internal data structures - Fix noinstr annotations LKMM updates: - Simplify the litmus tests - Documentation fixes KCSAN updates: - Re-enable KCSAN instrumentation in lib/random32.c Misc fixes: - Don't branch-trace static label APIs - DocBook fix - Remove stale leftover empty file" * tag 'locking-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) checkpatch: Don't check for mutex_trylock_recursive() locking/mutex: Kill mutex_trylock_recursive() s390: Use arch_local_irq_{save,restore}() in early boot code lockdep: Noinstr annotate warn_bogus_irq_restore() locking/lockdep: Avoid unmatched unlock locking/rwsem: Remove empty rwsem.h locking/rtmutex: Add missing kernel-doc markup futex: Remove unneeded gotos futex: Change utime parameter to be 'const ... *' lockdep: report broken irq restoration jump_label: Do not profile branch annotations locking: Add Reviewers locking/selftests: Add local_lock inversion tests locking/lockdep: Exclude local_lock_t from IRQ inversions locking/lockdep: Clean up check_redundant() a bit locking/lockdep: Add a skip() function to __bfs() locking/lockdep: Mark local_lock_t locking/selftests: More granular debug_locks_verbose lockdep/selftest: Add wait context selftests tools/memory-model: Fix typo in klitmus7 compatibility table ...
2021-02-21Merge tag 'core-rcu-2021-02-17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "These are the latest RCU updates for v5.12: - Documentation updates. - Miscellaneous fixes. - kfree_rcu() updates: Addition of mem_dump_obj() to provide allocator return addresses to more easily locate bugs. This has a couple of RCU-related commits, but is mostly MM. Was pulled in with akpm's agreement. - Per-callback-batch tracking of numbers of callbacks, which enables better debugging information and smarter reactions to large numbers of callbacks. - The first round of changes to allow CPUs to be runtime switched from and to callback-offloaded state. - CONFIG_PREEMPT_RT-related changes. - RCU CPU stall warning updates. - Addition of polling grace-period APIs for SRCU. - Torture-test and torture-test scripting updates, including a "torture everything" script that runs rcutorture, locktorture, scftorture, rcuscale, and refscale. Plus does an allmodconfig build. - nolibc fixes for the torture tests" * tag 'core-rcu-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (130 commits) percpu_ref: Dump mem_dump_obj() info upon reference-count underflow rcu: Make call_rcu() print mem_dump_obj() info for double-freed callback mm: Make mem_obj_dump() vmalloc() dumps include start and length mm: Make mem_dump_obj() handle vmalloc() memory mm: Make mem_dump_obj() handle NULL and zero-sized pointers mm: Add mem_dump_obj() to print source of memory block tools/rcutorture: Fix position of -lgcc in mkinitrd.sh tools/nolibc: Fix position of -lgcc in the documented example tools/nolibc: Emit detailed error for missing alternate syscall number definitions tools/nolibc: Remove incorrect definitions of __ARCH_WANT_* tools/nolibc: Get timeval, timespec and timezone from linux/time.h tools/nolibc: Implement poll() based on ppoll() tools/nolibc: Implement fork() based on clone() tools/nolibc: Make getpgrp() fall back to getpgid(0) tools/nolibc: Make dup2() rely on dup3() when available tools/nolibc: Add the definition for dup() rcutorture: Add rcutree.use_softirq=0 to RUDE01 and TASKS01 torture: Maintain torture-specific set of CPUs-online books torture: Clean up after torture-test CPU hotplugging rcutorture: Make object_debug also double call_rcu() heap object ...
2021-02-21Merge tag 'timers-core-2021-02-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Time and timer updates: - Instead of new drivers remove tango, sirf, u300 and atlas drivers - Add suspend/resume support for microchip pit64b - The usual fixes, improvements and cleanups here and there" * tag 'timers-core-2021-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timens: Delete no-op time_ns_init() alarmtimer: Update kerneldoc clocksource/drivers/timer-microchip-pit64b: Add clocksource suspend/resume clocksource/drivers/prima: Remove sirf prima driver clocksource/drivers/atlas: Remove sirf atlas driver clocksource/drivers/tango: Remove tango driver clocksource/drivers/u300: Remove the u300 driver dt-bindings: timer: nuvoton: Clarify that interrupt of timer 0 should be specified clocksource/drivers/davinci: Move pr_fmt() before the includes clocksource/drivers/efm32: Drop unused timer code
2021-02-21Merge tag 'irq-core-2021-02-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates for the irq subsystem: - The usual new irq chip driver (Realtek RTL83xx) - Removal of sirfsoc and tango irq chip drivers - Conversion of the sun6i chip support to hierarchical irq domains - The usual fixes, improvements and cleanups all over the place" * tag 'irq-core-2021-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/imx: IMX_INTMUX should not default to y, unconditionally irqchip/loongson-pch-msi: Use bitmap_zalloc() to allocate bitmap irqchip/csky-mpintc: Prevent selection on unsupported platforms irqchip: Add support for Realtek RTL838x/RTL839x interrupt controller dt-bindings: interrupt-controller: Add Realtek RTL838x/RTL839x support irqchip/ls-extirq: add IRQCHIP_SKIP_SET_WAKE to the irqchip flags genirq: Use new tasklet API for resend_tasklet dt-bindings: qcom,pdc: Add compatible for SM8350 dt-bindings: qcom,pdc: Add compatible for SM8250 irqchip/sun6i-r: Add wakeup support irqchip/sun6i-r: Use a stacked irqchip driver dt-bindings: irq: sun6i-r: Add a compatible for the H3 dt-bindings: irq: sun6i-r: Split the binding from sun7i-nmi irqchip/gic-v3: Fix typos in PMR/RPR SCR_EL3.FIQ handling explanation irqchip: Remove sirfsoc driver irqchip: Remove sigma tango driver
2021-02-21Merge tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull core block updates from Jens Axboe: "Another nice round of removing more code than what is added, mostly due to Christoph's relentless pursuit of tech debt removal/cleanups. This pull request contains: - Two series of BFQ improvements (Paolo, Jan, Jia) - Block iov_iter improvements (Pavel) - bsg error path fix (Pan) - blk-mq scheduler improvements (Jan) - -EBUSY discard fix (Jan) - bvec allocation improvements (Ming, Christoph) - bio allocation and init improvements (Christoph) - Store bdev pointer in bio instead of gendisk + partno (Christoph) - Block trace point cleanups (Christoph) - hard read-only vs read-only split (Christoph) - Block based swap cleanups (Christoph) - Zoned write granularity support (Damien) - Various fixes/tweaks (Chunguang, Guoqing, Lei, Lukas, Huhai)" * tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-block: (104 commits) mm: simplify swapdev_block sd_zbc: clear zone resources for non-zoned case block: introduce blk_queue_clear_zone_settings() zonefs: use zone write granularity as block size block: introduce zone_write_granularity limit block: use blk_queue_set_zoned in add_partition() nullb: use blk_queue_set_zoned() to setup zoned devices nvme: cleanup zone information initialization block: document zone_append_max_bytes attribute block: use bi_max_vecs to find the bvec pool md/raid10: remove dead code in reshape_request block: mark the bio as cloned in bio_iov_bvec_set block: set BIO_NO_PAGE_REF in bio_iov_bvec_set block: remove a layer of indentation in bio_iov_iter_get_pages block: turn the nr_iovecs argument to bio_alloc* into an unsigned short block: remove the 1 and 4 vec bvec_slabs entries block: streamline bvec_alloc block: factor out a bvec_alloc_gfp helper block: move struct biovec_slab to bio.c block: reuse BIO_INLINE_VECS for integrity bvecs ...
2021-02-21Merge tag 'oprofile-removal-5.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux Pull oprofile and dcookies removal from Viresh Kumar: "Remove oprofile and dcookies support The 'oprofile' user-space tools don't use the kernel OPROFILE support any more, and haven't in a long time. User-space has been converted to the perf interfaces. The dcookies stuff is only used by the oprofile code. Now that oprofile's support is getting removed from the kernel, there is no need for dcookies as well. Remove kernel's old oprofile and dcookies support" * tag 'oprofile-removal-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux: fs: Remove dcookies support drivers: Remove CONFIG_OPROFILE support arch: xtensa: Remove CONFIG_OPROFILE support arch: x86: Remove CONFIG_OPROFILE support arch: sparc: Remove CONFIG_OPROFILE support arch: sh: Remove CONFIG_OPROFILE support arch: s390: Remove CONFIG_OPROFILE support arch: powerpc: Remove oprofile arch: powerpc: Stop building and using oprofile arch: parisc: Remove CONFIG_OPROFILE support arch: mips: Remove CONFIG_OPROFILE support arch: microblaze: Remove CONFIG_OPROFILE support arch: ia64: Remove rest of perfmon support arch: ia64: Remove CONFIG_OPROFILE support arch: hexagon: Don't select HAVE_OPROFILE arch: arc: Remove CONFIG_OPROFILE support arch: arm: Remove CONFIG_OPROFILE support arch: alpha: Remove CONFIG_OPROFILE support
2021-02-21Merge branch 'work.elf-compat' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull ELF compat updates from Al Viro: "Sanitizing ELF compat support, especially for triarch architectures: - X32 handling cleaned up - MIPS64 uses compat_binfmt_elf.c both for O32 and N32 now - Kconfig side of things regularized Eventually I hope to have compat_binfmt_elf.c killed, with both native and compat built from fs/binfmt_elf.c, with -DELF_BITS={64,32} passed by kbuild, but that's a separate story - not included here" * 'work.elf-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: get rid of COMPAT_ELF_EXEC_PAGESIZE compat_binfmt_elf: don't bother with undef of ELF_ARCH Kconfig: regularize selection of CONFIG_BINFMT_ELF mips compat: switch to compat_binfmt_elf.c mips: don't bother with ELF_CORE_EFLAGS mips compat: don't bother with ELF_ET_DYN_BASE mips: KVM_GUEST makes no sense for 64bit builds... mips: kill unused definitions in binfmt_elf[on]32.c mips binfmt_elf*32.c: use elfcore-compat.h x32: make X32, !IA32_EMULATION setups able to execute x32 binaries [amd64] clean PRSTATUS_SIZE/SET_PR_FPVALID up properly elf_prstatus: collect the common part (everything before pr_reg) into a struct binfmt_elf: partially sanitize PRSTATUS_SIZE and SET_PR_FPVALID
2021-02-20Merge tag 'pm-5.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These add a new power capping facility allowing aggregate power constraints to be applied to sets of devices in a distributed manner, add a new CPU ID to the RAPL power capping driver and improve it, drop a cpufreq driver belonging to a platform that is not supported any more, drop two redundant cpufreq driver flags, update cpufreq drivers (intel_pstate, brcmstb-avs, qcom-hw), update the operating performance points (OPP) framework (code cleanups, new helpers, devfreq-related modifications), clean up devfreq, extend the PM clock layer, update the cpupower utility and make assorted janitorial changes. Specifics: - Add new power capping facility called DTPM (Dynamic Thermal Power Management), based on the existing power capping framework, to allow aggregate power constraints to be applied to sets of devices in a distributed manner, along with a CPU backend driver based on the Energy Model (Daniel Lezcano, Dan Carpenter, Colin Ian King). - Add AlderLake Mobile support to the Intel RAPL power capping driver and make it use the topology interface when laying out the system topology (Zhang Rui, Yunfeng Ye). - Drop the cpufreq tango driver belonging to a platform that is not supported any more (Arnd Bergmann). - Drop the redundant CPUFREQ_STICKY and CPUFREQ_PM_NO_WARN cpufreq driver flags (Viresh Kumar). - Update cpufreq drivers: * Fix max CPU frequency discovery in the intel_pstate driver and make janitorial changes in it (Chen Yu, Rafael Wysocki, Nigel Christian). * Fix resource leaks in the brcmstb-avs-cpufreq driver (Christophe JAILLET). * Make the tegra20 driver use the resource-managed API (Dmitry Osipenko). * Enable boost support in the qcom-hw driver (Shawn Guo). - Update the operating performance points (OPP) framework: * Clean up the OPP core (Dmitry Osipenko, Viresh Kumar). * Extend the OPP API by adding new helpers to it (Dmitry Osipenko, Viresh Kumar). * Allow required OPPs to be used for devfreq devices and update the devfreq governor code accordingly (Saravana Kannan). * Prepare the framework for introducing new dev_pm_opp_set_opp() helper (Viresh Kumar). * Drop dev_pm_opp_set_bw() and update related drivers (Viresh Kumar). * Allow lazy linking of required-OPPs (Viresh Kumar). - Simplify and clean up devfreq somewhat (Lukasz Luba, Yang Li, Pierre Kuo). - Update the generic power domains (genpd) framework: * Use device's next wakeup to determine domain idle state (Lina Iyer). * Improve initialization and debug (Dmitry Osipenko). * Simplify computations (Abaci Team). - Make janitorial changes in the core code handling system sleep and PM-runtime (Bhaskar Chowdhury, Bjorn Helgaas, Rikard Falkeborn, Zqiang). - Update the MAINTAINERS entry for the exynos cpuidle driver and drop DEBUG definition from intel_idle (Krzysztof Kozlowski, Tom Rix). - Extend the PM clock layer to cover clocks that must sleep (Nicolas Pitre). - Update the cpupower utility: * Update cpupower command, add support for AMD family 0x19 and clean up the code to remove many of the family checks to make future family updates easier (Nathan Fontenot, Robert Richter). * Add Makefile dependencies for install targets to allow building cpupower in parallel rather than serially (Ivan Babrou). - Make janitorial changes in power management Kconfig (Lukasz Luba)" * tag 'pm-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (89 commits) MAINTAINERS: cpuidle: exynos: include header in file pattern powercap: intel_rapl: Use topology interface in rapl_init_domains() powercap: intel_rapl: Use topology interface in rapl_add_package() PM: sleep: Constify static struct attribute_group PM: Kconfig: remove unneeded "default n" options PM: EM: update Kconfig description and drop "default n" option cpufreq: Remove unused flag CPUFREQ_PM_NO_WARN cpufreq: Remove CPUFREQ_STICKY flag PM / devfreq: Add required OPPs support to passive governor PM / devfreq: Cache OPP table reference in devfreq OPP: Add function to look up required OPP's for a given OPP PM / devfreq: rk3399_dmc: Remove unneeded semicolon opp: Replace ENOTSUPP with EOPNOTSUPP opp: Fix "foo * bar" should be "foo *bar" opp: Don't ignore clk_get() errors other than -ENOENT opp: Update bandwidth requirements based on scaling up/down opp: Allow lazy-linking of required-opps opp: Remove dev_pm_opp_set_bw() devfreq: tegra30: Migrate to dev_pm_opp_set_opp() drm: msm: Migrate to dev_pm_opp_set_opp() ...
2021-02-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds
Pull networking updates from David Miller: "Here is what we have this merge window: 1) Support SW steering for mlx5 Connect-X6Dx, from Yevgeny Kliteynik. 2) Add RSS multi group support to octeontx2-pf driver, from Geetha Sowjanya. 3) Add support for KS8851 PHY. From Marek Vasut. 4) Add support for GarfieldPeak bluetooth controller from Kiran K. 5) Add support for half-duplex tcan4x5x can controllers. 6) Add batch skb rx processing to bcrm63xx_enet, from Sieng Piaw Liew. 7) Rework RX port offload infrastructure, particularly wrt, UDP tunneling, from Jakub Kicinski. 8) Add BCM72116 PHY support, from Florian Fainelli. 9) Remove Dsa specific notifiers, they are unnecessary. From Vladimir Oltean. 10) Add support for picosecond rx delay in dwmac-meson8b chips. From Martin Blumenstingl. 11) Support TSO on xfrm interfaces from Eyal Birger. 12) Add support for MP_PRIO to mptcp stack, from Geliang Tang. 13) Support BCM4908 integrated switch, from Rafał Miłecki. 14) Support for directly accessing kernel module variables via module BTF info, from Andrii Naryiko. 15) Add DASH (esktop and mobile Architecture for System Hardware) support to r8169 driver, from Heiner Kallweit. 16) Add rx vlan filtering to dpaa2-eth, from Ionut-robert Aron. 17) Add support for 100 base0x SFP devices, from Bjarni Jonasson. 18) Support link aggregation in DSA, from Tobias Waldekranz. 19) Support for bitwidse atomics in bpf, from Brendan Jackman. 20) SmartEEE support in at803x driver, from Russell King. 21) Add support for flow based tunneling to GTP, from Pravin B Shelar. 22) Allow arbitrary number of interconnrcts in ipa, from Alex Elder. 23) TLS RX offload for bonding, from Tariq Toukan. 24) RX decap offklload support in mac80211, from Felix Fietkou. 25) devlink health saupport in octeontx2-af, from George Cherian. 26) Add TTL attr to SCM_TIMESTAMP_OPT_STATS, from Yousuk Seung 27) Delegated actionss support in mptcp, from Paolo Abeni. 28) Support receive timestamping when doin zerocopy tcp receive. From Arjun Ray. 29) HTB offload support for mlx5, from Maxim Mikityanskiy. 30) UDP GRO forwarding, from Maxim Mikityanskiy. 31) TAPRIO offloading in dsa hellcreek driver, from Kurt Kanzenbach. 32) Weighted random twos choice algorithm for ipvs, from Darby Payne. 33) Fix netdev registration deadlock, from Johannes Berg. 34) Various conversions to new tasklet api, from EmilRenner Berthing. 35) Bulk skb allocations in veth, from Lorenzo Bianconi. 36) New ethtool interface for lane setting, from Danielle Ratson. 37) Offload failiure notifications for routes, from Amit Cohen. 38) BCM4908 support, from Rafał Miłecki. 39) Support several new iwlwifi chips, from Ihab Zhaika. 40) Flow drector support for ipv6 in i40e, from Przemyslaw Patynowski. 41) Support for mhi prrotocols, from Loic Poulain. 42) Optimize bpf program stats. 43) Implement RFC6056, for better port randomization, from Eric Dumazet. 44) hsr tag offloading support from George McCollister. 45) Netpoll support in qede, from Bhaskar Upadhaya. 46) 2005/400g speed support in bonding 3ad mode, from Nikolay Aleksandrov. 47) Netlink event support in mptcp, from Florian Westphal. 48) Better skbuff caching, from Alexander Lobakin. 49) MRP (Media Redundancy Protocol) offloading in DSA and a few drivers, from Horatiu Vultur. 50) mqprio saupport in mvneta, from Maxime Chevallier. 51) Remove of_phy_attach, no longer needed, from Florian Fainelli" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1766 commits) octeontx2-pf: Fix otx2_get_fecparam() cteontx2-pf: cn10k: Prevent harmless double shift bugs net: stmmac: Add PCI bus info to ethtool driver query output ptp: ptp_clockmatrix: clean-up - parenthesis around a == b are unnecessary ptp: ptp_clockmatrix: Simplify code - remove unnecessary `err` variable. ptp: ptp_clockmatrix: Coding style - tighten vertical spacing. ptp: ptp_clockmatrix: Clean-up dev_*() messages. ptp: ptp_clockmatrix: Remove unused header declarations. ptp: ptp_clockmatrix: Add alignment of 1 PPS to idtcm_perout_enable. ptp: ptp_clockmatrix: Add wait_for_sys_apll_dpll_lock. net: stmmac: dwmac-sun8i: Add a shutdown callback net: stmmac: dwmac-sun8i: Minor probe function cleanup net: stmmac: dwmac-sun8i: Use reset_control_reset net: stmmac: dwmac-sun8i: Remove unnecessary PHY power check net: stmmac: dwmac-sun8i: Return void from PHY unpower r8169: use macro pm_ptr net: mdio: Remove of_phy_attach() net: mscc: ocelot: select PACKING in the Kconfig net: re-solve some conflicts after net -> net-next merge net: dsa: tag_rtl4_a: Support also egress tags ...
2021-02-19kprobes: Fix to delay the kprobes jump optimizationMasami Hiramatsu
Commit 36dadef23fcc ("kprobes: Init kprobes in early_initcall") moved the kprobe setup in early_initcall(), which includes kprobe jump optimization. The kprobes jump optimizer involves synchronize_rcu_tasks() which depends on the ksoftirqd and rcu_spawn_tasks_*(). However, since those are setup in core_initcall(), kprobes jump optimizer can not run at the early_initcall(). To avoid this issue, make the kprobe optimization disabled in the early_initcall() and enables it in subsys_initcall(). Note that non-optimized kprobes is still available after early_initcall(). Only jump optimization is delayed. Link: https://lkml.kernel.org/r/161365856280.719838.12423085451287256713.stgit@devnote2 Fixes: 36dadef23fcc ("kprobes: Init kprobes in early_initcall") Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: RCU <rcu@vger.kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Daniel Axtens <dja@axtens.net> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "Theodore Y . Ts'o" <tytso@mit.edu> Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com> Cc: stable@vger.kernel.org Reported-by: Paul E. McKenney <paulmck@kernel.org> Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reported-by: Uladzislau Rezki <urezki@gmail.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-02-19cpufreq: schedutil: Remove update_lock comment from struct sugov_policy ↵Yue Hu
definition Currently, update_lock is also used in sugov_update_single_freq(). The comment is not helpful anymore. Signed-off-by: Yue Hu <huyue2@yulong.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-02-19cpufreq: schedutil: Remove needless sg_policy parameter from ↵Yue Hu
ignore_dl_rate_limit() Since sg_policy is a member of struct sugov_cpu. Also remove the local variable in sugov_update_single_common() to make the code more clean. Signed-off-by: Yue Hu <huyue2@yulong.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Minor subject edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-02-17entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling pointFrederic Weisbecker
Following the idle loop model, cleanly check for pending rcuog wakeup before the last rescheduling point upon resuming to guest mode. This way we can avoid to do it from rcu_user_enter() with the last resort self-IPI hack that enforces rescheduling. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20210131230548.32970-6-frederic@kernel.org
2021-02-17entry: Explicitly flush pending rcuog wakeup before last rescheduling pointFrederic Weisbecker
Following the idle loop model, cleanly check for pending rcuog wakeup before the last rescheduling point on resuming to user mode. This way we can avoid to do it from rcu_user_enter() with the last resort self-IPI hack that enforces rescheduling. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20210131230548.32970-5-frederic@kernel.org
2021-02-17rcu/nocb: Trigger self-IPI on late deferred wake up before user resumeFrederic Weisbecker
Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP kthread (rcuog) to be serviced. Unfortunately the call to rcu_user_enter() is already past the last rescheduling opportunity before we resume to userspace or to guest mode. We may escape there with the woken task ignored. The ultimate resort to fix every callsites is to trigger a self-IPI (nohz_full depends on arch to implement arch_irq_work_raise()) that will trigger a reschedule on IRQ tail or guest exit. Eventually every site that want a saner treatment will need to carefully place a call to rcu_nocb_flush_deferred_wakeup() before the last explicit need_resched() check upon resume. Fixes: 96d3fd0d315a (rcu: Break call_rcu() deadlock involving scheduler and perf) Reported-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20210131230548.32970-4-frederic@kernel.org
2021-02-17rcu/nocb: Perform deferred wake up before last idle's need_resched() checkFrederic Weisbecker
Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP kthread (rcuog) to be serviced. Usually a local wake up happening while running the idle task is handled in one of the need_resched() checks carefully placed within the idle loop that can break to the scheduler. Unfortunately the call to rcu_idle_enter() is already beyond the last generic need_resched() check and we may halt the CPU with a resched request unhandled, leaving the task hanging. Fix this with splitting the rcuog wakeup handling from rcu_idle_enter() and place it before the last generic need_resched() check in the idle loop. It is then assumed that no call to call_rcu() will be performed after that in the idle loop until the CPU is put in low power mode. Fixes: 96d3fd0d315a (rcu: Break call_rcu() deadlock involving scheduler and perf) Reported-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20210131230548.32970-3-frederic@kernel.org
2021-02-17rcu: Pull deferred rcuog wake up to rcu_eqs_enter() callersFrederic Weisbecker
Deferred wakeup of rcuog kthreads upon RCU idle mode entry is going to be handled differently whether initiated by idle, user or guest. Prepare with pulling that control up to rcu_eqs_enter() callers. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20210131230548.32970-2-frederic@kernel.org
2021-02-17sched/features: Distinguish between NORMAL and DEADLINE hrtickJuri Lelli
The HRTICK feature has traditionally been servicing configurations that need precise preemptions point for NORMAL tasks. More recently, the feature has been extended to also service DEADLINE tasks with stringent runtime enforcement needs (e.g., runtime < 1ms with HZ=1000). Enabling HRTICK sched feature currently enables the additional timer and task tick for both classes, which might introduced undesired overhead for no additional benefit if one needed it only for one of the cases. Separate HRTICK sched feature in two (and leave the traditional case name unmodified) so that it can be selectively enabled when needed. With: $ echo HRTICK > /sys/kernel/debug/sched_features the NORMAL/fair hrtick gets enabled. With: $ echo HRTICK_DL > /sys/kernel/debug/sched_features the DEADLINE hrtick gets enabled. Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20210208073554.14629-3-juri.lelli@redhat.com
2021-02-17sched/features: Fix hrtick reprogrammingJuri Lelli
Hung tasks and RCU stall cases were reported on systems which were not 100% busy. Investigation of such unexpected cases (no sign of potential starvation caused by tasks hogging the system) pointed out that the periodic sched tick timer wasn't serviced anymore after a certain point and that caused all machinery that depends on it (timers, RCU, etc.) to stop working as well. This issues was however only reproducible if HRTICK was enabled. Looking at core dumps it was found that the rbtree of the hrtimer base used also for the hrtick was corrupted (i.e. next as seen from the base root and actual leftmost obtained by traversing the tree are different). Same base is also used for periodic tick hrtimer, which might get "lost" if the rbtree gets corrupted. Much alike what described in commit 1f71addd34f4c ("tick/sched: Do not mess with an enqueued hrtimer") there is a race window between hrtimer_set_expires() in hrtick_start and hrtimer_start_expires() in __hrtick_restart() in which the former might be operating on an already queued hrtick hrtimer, which might lead to corruption of the base. Use hrtick_start() (which removes the timer before enqueuing it back) to ensure hrtick hrtimer reprogramming is entirely guarded by the base lock, so that no race conditions can occur. Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20210208073554.14629-2-juri.lelli@redhat.com