summaryrefslogtreecommitdiff
path: root/lib
AgeCommit message (Collapse)Author
2024-05-08codetag: rcu-ify idx to codetag lookupmemalloc-prof-v7Kent Overstreet
This substantially reworks how we store and iterate over codetags in multiple modules. We now provide a stable index for each codetag, which doesn't change as modules are loaded and unloaded - meaning on module load, we find a free range of indices for the new module. Index-to-codetag lookup uses an eytzinger tree, and we use RCU for the eytzinger tree for lockless lookup. The eytzinger tree has one entry per module, where entries store the starting index for each module's range. After eytzinger lookup we use eytzinger_to_inorder(), as the codetag_modules are still a flat array. With the new stable indices, iteration becomes simpler; the iterator just stores the current index and peek() finds the first codetag >= the current index. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08darray: lift from bcachefsKent Overstreet
dynamic arrays - inspired from CCAN darrays, so we don't have to open code them. darray_init() darray_exit() darray_push() darray_make_room() darray_for_each() etc... Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-07eytzinger: Promote to include/linux/Kent Overstreet
eytzinger trees are a faster alternative to binary search. They're a bit more expensive to setup, but lookups perform much better assuming the tree isn't entirely in cache. Binary search is a worst case scenario for branch prediction and prefetching, but eytzinger trees have children adjacent in memory and thus we can prefetch before knowing the result of a comparison. An eytzinger tree is a binary tree laid out in an array, with the same geometry as the usual binary heap construction, but used as a search tree instead. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-07stackdepot: use gfp_nested_mask() instead of open coded maskingDave Chinner
The stackdepot code is used by KASAN and lockdep for recoding stack traces. Both of these track allocation context information, and so their internal allocations must obey the caller allocation contexts to avoid generating their own false positive warnings that have nothing to do with the code they are instrumenting/tracking. We also don't want recording stack traces to deplete emergency memory reserves - debug code is useless if it creates new issues that can't be replicated when the debug code is disabled. Switch the stackdepot allocation masking to use gfp_nested_mask() to address these issues. gfp_nested_mask() also strips GFP_ZONEMASK naturally, so that greatly simplifies this code. Link: https://lkml.kernel.org/r/20240430054604.4169568-3-david@fromorbit.com Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Marco Elver <elver@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Andrey Konovalov <andreyknvl@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-07fooAndrew Morton
2024-05-05xarray: inline xas_descend to improve performanceLong Li
The commit 63b1898fffcd ("XArray: Disallow sibling entries of nodes") modified the xas_descend function in such a way that it was no longer being compiled as an inline function, because it increased the size of xas_descend(), and the compiler no longer optimizes it as inline. This had a negative impact on performance, xas_descend is called frequently to traverse downwards in the xarray tree, making it a hot function. Inlining xas_descend has been shown to significantly improve performance by approximately 4.95% in the iozone write test. Machine: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz #iozone i 0 -i 1 -s 64g -r 16m -f /test/tmptest Before this patch: kB reclen write rewrite read reread 67108864 16384 2230080 3637689 6315197 5496027 After this patch: kB reclen write rewrite read reread 67108864 16384 2340360 3666175 6272401 5460782 Percentage change: 4.95% 0.78% -0.68% -0.64% This patch introduces inlining to the xas_descend function. While this change increases the size of lib/xarray.o, the performance gains in critical workloads make this an acceptable trade-off. Size comparison before and after patch: .text .data .bss file 0x3502 0 0 lib/xarray.o.before 0x3602 0 0 lib/xarray.o.after Link: https://lkml.kernel.org/r/20240416061628.3768901-1-leo.lilong@huawei.com Signed-off-by: Long Li <leo.lilong@huawei.com> Cc: Hou Tao <houtao1@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: yangerkun <yangerkun@huawei.com> Cc: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05XArray: set the marks correctly when splitting an entryMatthew Wilcox (Oracle)
If we created a new node to replace an entry which had search marks set, we were setting the search mark on every entry in that node. That works fine when we're splitting to order 0, but when splitting to a larger order, we must not set the search marks on the sibling entries. Link: https://lkml.kernel.org/r/20240501153120.4094530-1-willy@infradead.org Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reported-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/ZjFGCOYk3FK_zVy3@bombadil.infradead.org Tested-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05lib/test_xarray.c: fix error assumptions on check_xa_multi_store_adv_add()Luis Chamberlain
While testing lib/test_xarray in userspace I've noticed we can fail with: make -C tools/testing/radix-tree ./tools/testing/radix-tree/xarray BUG at check_xa_multi_store_adv_add:749 xarray: 0x55905fb21a00x head 0x55905fa1d8e0x flags 0 marks 0 0 0 0: 0x55905fa1d8e0x xarray: ../../../lib/test_xarray.c:749: check_xa_multi_store_adv_add: Assertion `0' failed. Aborted We get a failure with a BUG_ON(), and that is because we actually can fail due to -ENOMEM, the check in xas_nomem() will fix this for us so it makes no sense to expect no failure inside the loop. So modify the check and since this is also useful for instructional purposes clarify the situation. The check for XA_BUG_ON(xa, xa_load(xa, index) != p) is already done at the end of the loop so just remove the bogus on inside the loop. With this we now pass the test in both kernel and userspace: In userspace: ./tools/testing/radix-tree/xarray XArray: 149092856 of 149092856 tests passed In kernel space: XArray: 148257077 of 148257077 tests passed Link: https://lkml.kernel.org/r/20240423192221.301095-3-mcgrof@kernel.org Fixes: a60cc288a1a2 ("test_xarray: add tests for advanced multi-index use") Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05maple_tree: fix mas_empty_area_rev() null pointer dereferenceLiam R. Howlett
Currently the code calls mas_start() followed by mas_data_end() if the maple state is MA_START, but mas_start() may return with the maple state node == NULL. This will lead to a null pointer dereference when checking information in the NULL node, which is done in mas_data_end(). Avoid setting the offset if there is no node by waiting until after the maple state is checked for an empty or single entry state. A user could trigger the events to cause a kernel oops by unmapping all vmas to produce an empty maple tree, then mapping a vma that would cause the scenario described above. Link: https://lkml.kernel.org/r/20240422203349.2418465-1-Liam.Howlett@oracle.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reported-by: Marius Fleischer <fleischermarius@gmail.com> Closes: https://lore.kernel.org/lkml/CAJg=8jyuSxDL6XvqEXY_66M20psRK2J53oBTP+fjV5xpW2-R6w@mail.gmail.com/ Link: https://lore.kernel.org/lkml/CAJg=8jyuSxDL6XvqEXY_66M20psRK2J53oBTP+fjV5xpW2-R6w@mail.gmail.com/ Tested-by: Marius Fleischer <fleischermarius@gmail.com> Tested-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25mm/filemap: optimize filemap folio addingKairui Song
Instead of doing multiple tree walks, do one optimism range check with lock hold, and exit if raced with another insertion. If a shadow exists, check it with a new xas_get_order helper before releasing the lock to avoid redundant tree walks for getting its order. Drop the lock and do the allocation only if a split is needed. In the best case, it only need to walk the tree once. If it needs to alloc and split, 3 walks are issued (One for first ranged conflict check and order retrieving, one for the second check after allocation, one for the insert after split). Testing with 4K pages, in an 8G cgroup, with 16G brd as block device: echo 3 > /proc/sys/vm/drop_caches fio -name=cached --numjobs=16 --filename=/mnt/test.img \ --buffered=1 --ioengine=mmap --rw=randread --time_based \ --ramp_time=30s --runtime=5m --group_reporting Before: bw ( MiB/s): min= 1027, max= 3520, per=100.00%, avg=2445.02, stdev=18.90, samples=8691 iops : min=263001, max=901288, avg=625924.36, stdev=4837.28, samples=8691 After (+7.3%): bw ( MiB/s): min= 493, max= 3947, per=100.00%, avg=2625.56, stdev=25.74, samples=8651 iops : min=126454, max=1010681, avg=672142.61, stdev=6590.48, samples=8651 Test result with THP (do a THP randread then switch to 4K page in hope it issues a lot of splitting): echo 3 > /proc/sys/vm/drop_caches fio -name=cached --numjobs=16 --filename=/mnt/test.img \ --buffered=1 --ioengine=mmap -thp=1 --readonly \ --rw=randread --time_based --ramp_time=30s --runtime=10m \ --group_reporting fio -name=cached --numjobs=16 --filename=/mnt/test.img \ --buffered=1 --ioengine=mmap \ --rw=randread --time_based --runtime=5s --group_reporting Before: bw ( KiB/s): min= 4141, max=14202, per=100.00%, avg=7935.51, stdev=96.85, samples=18976 iops : min= 1029, max= 3548, avg=1979.52, stdev=24.23, samples=18976· READ: bw=4545B/s (4545B/s), 4545B/s-4545B/s (4545B/s-4545B/s), io=64.0KiB (65.5kB), run=14419-14419msec After (+12.5%): bw ( KiB/s): min= 4611, max=15370, per=100.00%, avg=8928.74, stdev=105.17, samples=19146 iops : min= 1151, max= 3842, avg=2231.27, stdev=26.29, samples=19146 READ: bw=4635B/s (4635B/s), 4635B/s-4635B/s (4635B/s-4635B/s), io=64.0KiB (65.5kB), run=14137-14137msec The performance is better for both 4K (+7.5%) and THP (+12.5%) cached read. Link: https://lkml.kernel.org/r/20240415171857.19244-5-ryncsn@gmail.com Signed-off-by: Kairui Song <kasong@tencent.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25lib/xarray: introduce a new helper xas_get_orderKairui Song
It can be used after xas_load to check the order of loaded entries. Compared to xa_get_order, it saves an XA_STATE and avoid a rewalk. Added new test for xas_get_order, to make the test work, we have to export xas_get_order with EXPORT_SYMBOL_GPL. Also fix a sparse warning by checking the slot value with xa_entry instead of accessing it directly, as suggested by Matthew Wilcox. [kasong@tencent.com: simplify comment, sparse warning fix, per Matthew Wilcox] Link: https://lkml.kernel.org/r/20240416071722.45997-4-ryncsn@gmail.com Link: https://lkml.kernel.org/r/20240415171857.19244-4-ryncsn@gmail.com Signed-off-by: Kairui Song <kasong@tencent.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25alloc_tag: Tighten file permissions on /proc/allocinfoKees Cook
The /proc/allocinfo file exposes a tremendous about of information about kernel build details, memory allocations (obviously), and potentially even image layout (due to ordering). As this is intended to be consumed by system owners (like /proc/slabinfo), use the same file permissions as there: 0400. Link: https://lkml.kernel.org/r/20240425200844.work.184-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25lib: add memory allocations report in show_mem()Suren Baghdasaryan
Include allocations in show_mem reports. Link: https://lkml.kernel.org/r/20240321163705.3067592-33-surenb@google.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25rhashtable: plumb through alloc tagKent Overstreet
This gives better memory allocation profiling results; rhashtable allocations will be accounted to the code that initialized the rhashtable. [surenb@google.com: undo _noprof additions in the documentation] Link: https://lkml.kernel.org/r/20240326231453.1206227-1-surenb@google.com Link: https://lkml.kernel.org/r/20240321163705.3067592-32-surenb@google.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25lib: add codetag reference into slabobj_extSuren Baghdasaryan
To store code tag for every slab object, a codetag reference is embedded into slabobj_ext when CONFIG_MEM_ALLOC_PROFILING=y. Link: https://lkml.kernel.org/r/20240321163705.3067592-23-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25lib: introduce early boot parameter to avoid page_ext memory overheadSuren Baghdasaryan
The highest memory overhead from memory allocation profiling comes from page_ext objects. This overhead exists even if the feature is disabled but compiled-in. To avoid it, introduce an early boot parameter that prevents page_ext object creation. The new boot parameter is a tri-state with possible values of 0|1|never. When it is set to "never" the memory allocation profiling support is disabled, and overhead is minimized (currently no page_ext objects are allocated, in the future more overhead might be eliminated). As a result we also lose ability to enable memory allocation profiling at runtime (because there is no space to store alloctag references). Runtime sysctrl becomes read-only if the early boot parameter was set to "never". Note that the default value of this boot parameter depends on the CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT configuration. When CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n the boot parameter is set to "never", therefore eliminating any overhead. CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y results in boot parameter being set to 1 (enabled). This allows distributions to avoid any overhead by setting CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n config and with no changes to the kernel command line. We reuse sysctl.vm.mem_profiling boot parameter name in order to avoid introducing yet another control. This change turns it into a tri-state early boot parameter. Link: https://lkml.kernel.org/r/20240321163705.3067592-16-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25lib: introduce support for page allocation taggingSuren Baghdasaryan
Introduce helper functions to easily instrument page allocators by storing a pointer to the allocation tag associated with the code that allocated the page in a page_ext field. Link: https://lkml.kernel.org/r/20240321163705.3067592-15-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25lib: add allocation tagging support for memory allocation profilingSuren Baghdasaryan
Introduce CONFIG_MEM_ALLOC_PROFILING which provides definitions to easily instrument memory allocators. It registers an "alloc_tags" codetag type with /proc/allocinfo interface to output allocation tag information when the feature is enabled. CONFIG_MEM_ALLOC_PROFILING_DEBUG is provided for debugging the memory allocation profiling instrumentation. Memory allocation profiling can be enabled or disabled at runtime using /proc/sys/vm/mem_profiling sysctl when CONFIG_MEM_ALLOC_PROFILING_DEBUG=n. CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT enables memory allocation profiling by default. [surenb@google.com: Documentation/filesystems/proc.rst: fix allocinfo title] Link: https://lkml.kernel.org/r/20240326073813.727090-1-surenb@google.com [surenb@google.com: do limited memory accounting for modules with ARCH_NEEDS_WEAK_PER_CPU] Link: https://lkml.kernel.org/r/20240402180933.1663992-2-surenb@google.com [klarasmodin@gmail.com: explicitly include irqflags.h in alloc_tag.h] Link: https://lkml.kernel.org/r/20240407133252.173636-1-klarasmodin@gmail.com [surenb@google.com: fix alloc_tag_init() to prevent passing NULL to PTR_ERR()] Link: https://lkml.kernel.org/r/20240417003349.2520094-1-surenb@google.com Link: https://lkml.kernel.org/r/20240321163705.3067592-14-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Klara Modin <klarasmodin@gmail.com> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25lib: prevent module unloading if memory is not freedSuren Baghdasaryan
Skip freeing module's data section if there are non-zero allocation tags because otherwise, once these allocations are freed, the access to their code tag would cause UAF. Link: https://lkml.kernel.org/r/20240321163705.3067592-13-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25lib: code tagging module supportSuren Baghdasaryan
Add support for code tagging from dynamically loaded modules. Link: https://lkml.kernel.org/r/20240321163705.3067592-12-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25lib: code tagging frameworkSuren Baghdasaryan
Add basic infrastructure to support code tagging which stores tag common information consisting of the module name, function, file name and line number. Provide functions to register a new code tag type and navigate between code tags. Link: https://lkml.kernel.org/r/20240321163705.3067592-11-surenb@google.com Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25lib/test_hmm.c: handle src_pfns and dst_pfns allocation failureDuoming Zhou
The kcalloc() in dmirror_device_evict_chunk() will return null if the physical memory has run out. As a result, if src_pfns or dst_pfns is dereferenced, the null pointer dereference bug will happen. Moreover, the device is going away. If the kcalloc() fails, the pages mapping a chunk could not be evicted. So add a __GFP_NOFAIL flag in kcalloc(). Finally, as there is no need to have physically contiguous memory, Switch kcalloc() to kvcalloc() in order to avoid failing allocations. Link: https://lkml.kernel.org/r/20240312005905.9939-1-duoming@zju.edu.cn Fixes: b2ef9f5a5cb3 ("mm/hmm/test: add selftest driver for HMM") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Cc: Jérôme Glisse <jglisse@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-24stackdepot: respect __GFP_NOLOCKDEP allocation flagAndrey Ryabinin
If stack_depot_save_flags() allocates memory it always drops __GFP_NOLOCKDEP flag. So when KASAN tries to track __GFP_NOLOCKDEP allocation we may end up with lockdep splat like bellow: ====================================================== WARNING: possible circular locking dependency detected 6.9.0-rc3+ #49 Not tainted ------------------------------------------------------ kswapd0/149 is trying to acquire lock: ffff88811346a920 (&xfs_nondir_ilock_class){++++}-{4:4}, at: xfs_reclaim_inode+0x3ac/0x590 [xfs] but task is already holding lock: ffffffff8bb33100 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x5d9/0xad0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (fs_reclaim){+.+.}-{0:0}: __lock_acquire+0x7da/0x1030 lock_acquire+0x15d/0x400 fs_reclaim_acquire+0xb5/0x100 prepare_alloc_pages.constprop.0+0xc5/0x230 __alloc_pages+0x12a/0x3f0 alloc_pages_mpol+0x175/0x340 stack_depot_save_flags+0x4c5/0x510 kasan_save_stack+0x30/0x40 kasan_save_track+0x10/0x30 __kasan_slab_alloc+0x83/0x90 kmem_cache_alloc+0x15e/0x4a0 __alloc_object+0x35/0x370 __create_object+0x22/0x90 __kmalloc_node_track_caller+0x477/0x5b0 krealloc+0x5f/0x110 xfs_iext_insert_raw+0x4b2/0x6e0 [xfs] xfs_iext_insert+0x2e/0x130 [xfs] xfs_iread_bmbt_block+0x1a9/0x4d0 [xfs] xfs_btree_visit_block+0xfb/0x290 [xfs] xfs_btree_visit_blocks+0x215/0x2c0 [xfs] xfs_iread_extents+0x1a2/0x2e0 [xfs] xfs_buffered_write_iomap_begin+0x376/0x10a0 [xfs] iomap_iter+0x1d1/0x2d0 iomap_file_buffered_write+0x120/0x1a0 xfs_file_buffered_write+0x128/0x4b0 [xfs] vfs_write+0x675/0x890 ksys_write+0xc3/0x160 do_syscall_64+0x94/0x170 entry_SYSCALL_64_after_hwframe+0x71/0x79 Always preserve __GFP_NOLOCKDEP to fix this. Link: https://lkml.kernel.org/r/20240418141133.22950-1-ryabinin.a.a@gmail.com Fixes: cd11016e5f52 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB") Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Reported-by: Xiubo Li <xiubli@redhat.com> Closes: https://lore.kernel.org/all/a0caa289-ca02-48eb-9bf2-d86fd47b71f4@redhat.com/ Reported-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Closes: https://lore.kernel.org/all/f9ff999a-e170-b66b-7caf-293f2b147ac2@opensource.wdc.com/ Suggested-by: Dave Chinner <david@fromorbit.com> Tested-by: Xiubo Li <xiubli@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Alexander Potapenko <glider@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-11Merge tag 'net-6.9-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bluetooth. Current release - new code bugs: - netfilter: complete validation of user input - mlx5: disallow SRIOV switchdev mode when in multi-PF netdev Previous releases - regressions: - core: fix u64_stats_init() for lockdep when used repeatedly in one file - ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr - bluetooth: fix memory leak in hci_req_sync_complete() - batman-adv: avoid infinite loop trying to resize local TT - drv: geneve: fix header validation in geneve[6]_xmit_skb - drv: bnxt_en: fix possible memory leak in bnxt_rdma_aux_device_init() - drv: mlx5: offset comp irq index in name by one - drv: ena: avoid double-free clearing stale tx_info->xdpf value - drv: pds_core: fix pdsc_check_pci_health deadlock Previous releases - always broken: - xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING - bluetooth: fix setsockopt not validating user input - af_unix: clear stale u->oob_skb. - nfc: llcp: fix nfc_llcp_setsockopt() unsafe copies - drv: virtio_net: fix guest hangup on invalid RSS update - drv: mlx5e: Fix mlx5e_priv_init() cleanup flow - dsa: mt7530: trap link-local frames regardless of ST Port State" * tag 'net-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (59 commits) net: ena: Set tx_info->xdpf value to NULL net: ena: Fix incorrect descriptor free behavior net: ena: Wrong missing IO completions check order net: ena: Fix potential sign extension issue af_unix: Fix garbage collector racing against connect() net: dsa: mt7530: trap link-local frames regardless of ST Port State Revert "s390/ism: fix receive message buffer allocation" net: sparx5: fix wrong config being used when reconfiguring PCS net/mlx5: fix possible stack overflows net/mlx5: Disallow SRIOV switchdev mode when in multi-PF netdev net/mlx5e: RSS, Block XOR hash with over 128 channels net/mlx5e: Do not produce metadata freelist entries in Tx port ts WQE xmit net/mlx5e: HTB, Fix inconsistencies with QoS SQs number net/mlx5e: Fix mlx5e_priv_init() cleanup flow net/mlx5e: RSS, Block changing channels number when RXFH is configured net/mlx5: Correctly compare pkt reformat ids net/mlx5: Properly link new fs rules into the tree net/mlx5: offset comp irq index in name by one net/mlx5: Register devlink first under devlink lock net/mlx5: E-switch, store eswitch pointer before registering devlink_param ...
2024-04-10Merge tag 'hardening-v6.9-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening fixes from Kees Cook: - gcc-plugins/stackleak: Avoid .head.text section (Ard Biesheuvel) - ubsan: fix unused variable warning in test module (Arnd Bergmann) - Improve entropy diffusion in randomize_kstack * tag 'hardening-v6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: randomize_kstack: Improve entropy diffusion ubsan: fix unused variable warning in test module gcc-plugins/stackleak: Avoid .head.text section
2024-04-08lib: checksum: hide unused expected_csum_ipv6_magic[]Arnd Bergmann
When CONFIG_NET is disabled, an extra warning shows up for this unused variable: lib/checksum_kunit.c:218:18: error: 'expected_csum_ipv6_magic' defined but not used [-Werror=unused-const-variable=] Replace the #ifdef with an IS_ENABLED() check that makes the compiler's dead-code-elimination take care of the link failure. Fixes: f24a70106dc1 ("lib: checksum: Fix build with CONFIG_NET=n") Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Simon Horman <horms@kernel.org> # build-tested Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-05stackdepot: rename pool_index to pool_index_plus_1Peter Collingbourne
Commit 3ee34eabac2a ("lib/stackdepot: fix first entry having a 0-handle") changed the meaning of the pool_index field to mean "the pool index plus 1". This made the code accessing this field less self-documenting, as well as causing debuggers such as drgn to not be able to easily remain compatible with both old and new kernels, because they typically do that by testing for presence of the new field. Because stackdepot is a debugging tool, we should make sure that it is debugger friendly. Therefore, give the field a different name to improve readability as well as enabling debugger backwards compatibility. This is needed in 6.9, which would otherwise become an odd release with the new semantics and old name so debuggers wouldn't recognize the new semantics there. Fixes: 3ee34eabac2a ("lib/stackdepot: fix first entry having a 0-handle") Link: https://lkml.kernel.org/r/20240402001500.53533-1-pcc@google.com Link: https://linux-review.googlesource.com/id/Ib3e70c36c1d230dd0a118dc22649b33e768b9f88 Signed-off-by: Peter Collingbourne <pcc@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Alexander Potapenko <glider@google.com> Acked-by: Marco Elver <elver@google.com> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Omar Sandoval <osandov@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-03ubsan: fix unused variable warning in test moduleArnd Bergmann
This is one of the drivers with an unused variable that is marked 'const'. Adding a __used annotation here avoids the warning and lets us enable the option by default: lib/test_ubsan.c:137:28: error: unused variable 'skip_ubsan_array' [-Werror,-Wunused-const-variable] Fixes: 4a26f49b7b3d ("ubsan: expand tests and reporting") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20240403080702.3509288-3-arnd@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-23Merge tag 'hardening-v6.9-rc1-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull more hardening updates from Kees Cook: - CONFIG_MEMCPY_SLOW_KUNIT_TEST is no longer needed (Guenter Roeck) - Fix needless UTF-8 character in arch/Kconfig (Liu Song) - Improve __counted_by warning message in LKDTM (Nathan Chancellor) - Refactor DEFINE_FLEX() for default use of __counted_by - Disable signed integer overflow sanitizer on GCC < 8 * tag 'hardening-v6.9-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: lkdtm/bugs: Improve warning message for compilers without counted_by support overflow: Change DEFINE_FLEX to take __counted_by member Revert "kunit: memcpy: Split slow memcpy tests into MEMCPY_SLOW_KUNIT_TEST" arch/Kconfig: eliminate needless UTF-8 character in Kconfig help ubsan: Disable signed integer overflow sanitizer on GCC < 8
2024-03-22overflow: Change DEFINE_FLEX to take __counted_by memberKees Cook
The norm should be flexible array structures with __counted_by annotations, so DEFINE_FLEX() is updated to expect that. Rename the non-annotated version to DEFINE_RAW_FLEX(), and update the few existing users. Additionally add selftests for the macros. Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20240306235128.it.933-kees@kernel.org Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-22Merge tag 'fbdev-for-6.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev Pull fbdev updates from Helge Deller: - Allow console fonts up to 64x128 pixels (Samuel Thibault) - Prevent division-by-zero in fb monitor code (Roman Smirnov) - Drop Renesas ARM platforms from Mobile LCDC framebuffer driver (Geert Uytterhoeven) - Various code cleanups in viafb, uveafb and mb862xxfb drivers by Aleksandr Burakov, Li Zhijian and Michael Ellerman * tag 'fbdev-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev: fbdev: panel-tpo-td043mtea1: Convert sprintf() to sysfs_emit() fbmon: prevent division by zero in fb_videomode_from_videomode() fbcon: Increase maximum font width x height to 64 x 128 fbdev: viafb: fix typo in hw_bitblt_1 and hw_bitblt_2 fbdev: mb862xxfb: Fix defined but not used error fbdev: uvesafb: Convert sprintf/snprintf to sysfs_emit fbdev: Restrict FB_SH_MOBILE_LCDC to SuperH
2024-03-21Merge tag 'kbuild-v6.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Generate a list of built DTB files (arch/*/boot/dts/dtbs-list) - Use more threads when building Debian packages in parallel - Fix warnings shown during the RPM kernel package uninstallation - Change OBJECT_FILES_NON_STANDARD_*.o etc. to take a relative path to Makefile - Support GCC's -fmin-function-alignment flag - Fix a null pointer dereference bug in modpost - Add the DTB support to the RPM package - Various fixes and cleanups in Kconfig * tag 'kbuild-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (67 commits) kconfig: tests: test dependency after shuffling choices kconfig: tests: add a test for randconfig with dependent choices kconfig: tests: support KCONFIG_SEED for the randconfig runner kbuild: rpm-pkg: add dtb files in kernel rpm kconfig: remove unneeded menu_is_visible() call in conf_write_defconfig() kconfig: check prompt for choice while parsing kconfig: lxdialog: remove unused dialog colors kconfig: lxdialog: fix button color for blackbg theme modpost: fix null pointer dereference kbuild: remove GCC's default -Wpacked-bitfield-compat flag kbuild: unexport abs_srctree and abs_objtree kbuild: Move -Wenum-{compare-conditional,enum-conversion} into W=1 kconfig: remove named choice support kconfig: use linked list in get_symbol_str() to iterate over menus kconfig: link menus to a symbol kbuild: fix inconsistent indentation in top Makefile kbuild: Use -fmin-function-alignment when available alpha: merge two entries for CONFIG_ALPHA_GAMMA alpha: merge two entries for CONFIG_ALPHA_EV4 kbuild: change DTC_FLAGS_<basetarget>.o to take the path relative to $(obj) ...
2024-03-21Merge tag 'driver-core-6.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the "big" set of driver core and kernfs changes for 6.9-rc1. Nothing all that crazy here, just some good updates that include: - automatic attribute group hiding from Dan Williams (he fixed up my horrible attempt at doing this.) - kobject lock contention fixes from Eric Dumazet - driver core cleanups from Andy - kernfs rcu work from Tejun - fw_devlink changes to resolve some reported issues - other minor changes, all details in the shortlog All of these have been in linux-next for a long time with no reported issues" * tag 'driver-core-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (28 commits) device: core: Log warning for devices pending deferred probe on timeout driver: core: Use dev_* instead of pr_* so device metadata is added driver: core: Log probe failure as error and with device metadata of: property: fw_devlink: Add support for "post-init-providers" property driver core: Add FWLINK_FLAG_IGNORE to completely ignore a fwnode link driver core: Adds flags param to fwnode_link_add() debugfs: fix wait/cancellation handling during remove device property: Don't use "proxy" headers device property: Move enum dev_dma_attr to fwnode.h driver core: Move fw_devlink stuff to where it belongs driver core: Drop unneeded 'extern' keyword in fwnode.h firmware_loader: Suppress warning on FW_OPT_NO_WARN flag sysfs:Addresses documentation in sysfs_merge_group and sysfs_unmerge_group. firmware_loader: introduce __free() cleanup hanler platform-msi: Remove usage of the deprecated ida_simple_xx() API sysfs: Introduce DEFINE_SIMPLE_SYSFS_GROUP_VISIBLE() sysfs: Document new "group visible" helpers sysfs: Fix crash on empty group attributes array sysfs: Introduce a mechanism to hide static attribute_groups sysfs: Introduce a mechanism to hide static attribute_groups ...
2024-03-21Merge tag 'tty-6.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty / serial driver updates from Greg KH: "Here is the big set of TTY/Serial driver updates and cleanups for 6.9-rc1. Included in here are: - more tty cleanups from Jiri - loads of 8250 driver cleanups from Andy - max310x driver updates - samsung serial driver updates - uart_prepare_sysrq_char() updates for many drivers - platform driver remove callback void cleanups - stm32 driver updates - other small tty/serial driver updates All of these have been in linux-next for a long time with no reported issues" * tag 'tty-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (199 commits) dt-bindings: serial: stm32: add power-domains property serial: 8250_dw: Replace ACPI device check by a quirk serial: Lock console when calling into driver before registration serial: 8250_uniphier: Switch to use uart_read_port_properties() serial: 8250_tegra: Switch to use uart_read_port_properties() serial: 8250_pxa: Switch to use uart_read_port_properties() serial: 8250_omap: Switch to use uart_read_port_properties() serial: 8250_of: Switch to use uart_read_port_properties() serial: 8250_lpc18xx: Switch to use uart_read_port_properties() serial: 8250_ingenic: Switch to use uart_read_port_properties() serial: 8250_dw: Switch to use uart_read_port_properties() serial: 8250_bcm7271: Switch to use uart_read_port_properties() serial: 8250_bcm2835aux: Switch to use uart_read_port_properties() serial: 8250_aspeed_vuart: Switch to use uart_read_port_properties() serial: port: Introduce a common helper to read properties serial: core: Add UPIO_UNKNOWN constant for unknown port type serial: core: Move struct uart_port::quirks closer to possible values serial: sh-sci: Call sci_serial_{in,out}() directly serial: core: only stop transmit when HW fifo is empty serial: pch: Use uart_prepare_sysrq_char(). ...
2024-03-18Revert "kunit: memcpy: Split slow memcpy tests into MEMCPY_SLOW_KUNIT_TEST"Guenter Roeck
This reverts commit 4acf1de35f41549e60c3c02a8defa7cb95eabdf2. Commit d055c6a2cc16 ("kunit: memcpy: Mark tests as slow using test attributes") marks slow memcpy unit tests as slow. Since this commit, the tests can be disabled with a module parameter, and the configuration option to skip the slow tests is no longer needed. Revert the patch introducing it. Cc: David Gow <davidgow@google.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20240314151200.2285314-1-linux@roeck-us.net Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-18ubsan: Disable signed integer overflow sanitizer on GCC < 8Kees Cook
For opting functions out of sanitizer coverage, the "no_sanitize" attribute is used, but in GCC this wasn't introduced until GCC 8. Disable the sanitizer unless we're not using GCC, or it is GCC version 8 or higher. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202403110643.27JXEVCI-lkp@intel.com/ Reviewed-by: Marco Elver <elver@google.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-16Merge tag 'cxl-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxlLinus Torvalds
Pull CXL updates from Dan Williams: "CXL has mechanisms to enumerate the performance characteristics of memory devices. Those mechanisms allow Linux to build the equivalent of ACPI SRAT, SLIT, and HMAT tables dynamically at runtime. That capability is necessary because static ACPI can not represent dynamic CXL configurations (and reconfigurations). So, building on the v6.8 work to add "Quality of Service" enumeration, this update plumbs CXL "access coordinates" (read/write access latency and bandwidth) in all the same places that ACPI HMAT feeds similar data. Follow-on patches from the -mm side can then use that data to feed mechanisms like mm/memory-tiers.c. Greg has acked the touch to drivers/base/. The other feature update this cycle is support for CXL error injection via the ACPI EINJ module. That facility enables injection of bus protocol errors provided the user knows the magic address values to insert in the interface. To hide that magic, and make this easier to use, new error injection attributes were added to CXL debugfs. That interface injects the errors relative to a CXL object rather than require user tooling to know how to lookup and inject RCRB (Root Complex Register Block) addresses into the raw EINJ debugfs interface. It received some helpful review comments from Tony, but no explicit acks from the ACPI side. The primary user visible change for existing EINJ users is that they may find that einj.ko was already loaded by cxl_core.ko. Previously, einj.ko was only loaded on demand. The usual collection of miscellaneous cleanups are also present this cycle. Summary: - Supplement ACPI HMAT reported memory performance with native CXL memory performance enumeration - Add support for CXL error injection via the ACPI EINJ mechanism - Cleanup CXL DOE and CDAT integration - Miscellaneous cleanups and fixes" * tag 'cxl-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (21 commits) Documentation/ABI/testing/debugfs-cxl: Fix "Unexpected indentation" lib/firmware_table: Provide buffer length argument to cdat_table_parse() cxl/pci: Get rid of pointer arithmetic reading CDAT table cxl/pci: Rename DOE mailbox handle to doe_mb cxl: Fix the incorrect assignment of SSLBIS entry pointer initial location cxl/core: Add CXL EINJ debugfs files EINJ, Documentation: Update EINJ kernel doc EINJ: Add CXL error type support EINJ: Migrate to a platform driver cxl/region: Deal with numa nodes not enumerated by SRAT cxl/region: Add memory hotplug notifier for cxl region cxl/region: Add sysfs attribute for locality attributes of CXL regions cxl/region: Calculate performance data for a region cxl: Set cxlmd->endpoint before adding port device cxl: Move QoS class to be calculated from the nearest CPU cxl: Split out host bridge access coordinates cxl: Split out combine_coordinates() for common shared usage ACPI: HMAT / cxl: Add retrieval of generic port coordinates for both access classes ACPI: HMAT: Introduce 2 levels of generic port access class base/node / ACPI: Enumerate node access class for 'struct access_coordinate' ...
2024-03-16fbcon: Increase maximum font width x height to 64 x 128Samuel Thibault
By using bitmaps we actually support whatever size we would want, but the console currently limits fonts to 64x128 (which gives 60x16 text on 4k screens), so we don't need more for now, and we can easily increase later. Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Signed-off-by: Helge Deller <deller@gmx.de>
2024-03-15Merge tag 'sparc-for-6.9-tag1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/alarsson/linux-sparc Pull sparc updates from Andreas Larsson: - Fix missing prototype warnings in various places, including switching to using generic cmpdi2/ucmpdi2 and parport.h and stop selecting unneeded GENERIC_ISA_DMA. - Reduce duplicate code by using shared font data, with dependency fixup in separate commit touching lib/fonts. - Convert sbus drives to use remove callbacks returning void - Fix return values of __setup handlers - Section mismatch fix for grpci pci drivers - Make the vio bus type constant - Kconfig cleanups and fixes - Typo fixes * tag 'sparc-for-6.9-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/alarsson/linux-sparc: lib/fonts: Allow Sparc console 8x16 font for sparc64 early boot text console sbus: uctrl: Convert to platform remove callback returning void sbus: flash: Convert to platform remove callback returning void sbus: envctrl: Convert to platform remove callback returning void sbus: display7seg: Convert to platform remove callback returning void sbus: bbc_i2c: Convert to platform remove callback returning void sbus: Add prototype for bbc_envctrl_init and bbc_envctrl_cleanup to header sparc32: Fix section mismatch in leon_pci_grpci sparc32: Fix parport build with sparc32 sparc32: Do not select GENERIC_ISA_DMA mtd: maps: sun_uflash: Declare uflash_devinit static sparc32: Fix build with trapbase sparc32: Use generic cmpdi2/ucmpdi2 variants sparc: select FRAME_POINTER instead of redefining it sparc: vDSO: fix return value of __setup handler sparc64: NMI watchdog: fix return value of __setup handler sparc: vio: make vio_bus_type const sparc: Fix typos sparc: Use shared font data sparc: remove obsolete config ARCH_ATU
2024-03-15Merge tag 'bcachefs-2024-03-13' of https://evilpiepirate.org/git/bcachefsLinus Torvalds
Pull bcachefs updates from Kent Overstreet: - Subvolume children btree; this is needed for providing a userspace interface for walking subvolumes, which will come later - Lots of improvements to directory structure checking - Improved journal pipelining, significantly improving performance on high iodepth write workloads - Discard path improvements: the discard path is more efficient, and no longer flushes the journal unnecessarily - Buffered write path can now avoid taking the inode lock - new mm helper: memalloc_flags_{save|restore} - mempool now does kvmalloc mempools * tag 'bcachefs-2024-03-13' of https://evilpiepirate.org/git/bcachefs: (128 commits) bcachefs: time_stats: shrink time_stat_buffer for better alignment bcachefs: time_stats: split stats-with-quantiles into a separate structure bcachefs: mean_and_variance: put struct mean_and_variance_weighted on a diet bcachefs: time_stats: add larger units bcachefs: pull out time_stats.[ch] bcachefs: reconstruct_alloc cleanup bcachefs: fix bch_folio_sector padding bcachefs: Fix btree key cache coherency during replay bcachefs: Always flush write buffer in delete_dead_inodes() bcachefs: Fix order of gc_done passes bcachefs: fix deletion of indirect extents in btree_gc bcachefs: Prefer struct_size over open coded arithmetic bcachefs: Kill unused flags argument to btree_split() bcachefs: Check for writing superblocks with nonsense member seq fields bcachefs: fix bch2_journal_buf_to_text() lib/generic-radix-tree.c: Make nodes more reasonably sized bcachefs: copy_(to|from)_user_errcode() bcachefs: Split out bkey_types.h bcachefs: fix lost journal buf wakeup due to improved pipelining bcachefs: intercept mountoption value for bool type ...
2024-03-14Merge tag 'mm-nonmm-stable-2024-03-14-09-36' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - Kuan-Wei Chiu has developed the well-named series "lib min_heap: Min heap optimizations". - Kuan-Wei Chiu has also sped up the library sorting code in the series "lib/sort: Optimize the number of swaps and comparisons". - Alexey Gladkov has added the ability for code running within an IPC namespace to alter its IPC and MQ limits. The series is "Allow to change ipc/mq sysctls inside ipc namespace". - Geert Uytterhoeven has contributed some dhrystone maintenance work in the series "lib: dhry: miscellaneous cleanups". - Ryusuke Konishi continues nilfs2 maintenance work in the series "nilfs2: eliminate kmap and kmap_atomic calls" "nilfs2: fix kernel bug at submit_bh_wbc()" - Nathan Chancellor has updated our build tools requirements in the series "Bump the minimum supported version of LLVM to 13.0.1". - Muhammad Usama Anjum continues with the selftests maintenance work in the series "selftests/mm: Improve run_vmtests.sh". - Oleg Nesterov has done some maintenance work against the signal code in the series "get_signal: minor cleanups and fix". Plus the usual shower of singleton patches in various parts of the tree. Please see the individual changelogs for details. * tag 'mm-nonmm-stable-2024-03-14-09-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits) nilfs2: prevent kernel bug at submit_bh_wbc() nilfs2: fix failure to detect DAT corruption in btree and direct mappings ocfs2: enable ocfs2_listxattr for special files ocfs2: remove SLAB_MEM_SPREAD flag usage assoc_array: fix the return value in assoc_array_insert_mid_shortcut() buildid: use kmap_local_page() watchdog/core: remove sysctl handlers from public header nilfs2: use div64_ul() instead of do_div() mul_u64_u64_div_u64: increase precision by conditionally swapping a and b kexec: copy only happens before uchunk goes to zero get_signal: don't initialize ksig->info if SIGNAL_GROUP_EXIT/group_exec_task get_signal: hide_si_addr_tag_bits: fix the usage of uninitialized ksig get_signal: don't abuse ksig->info.si_signo and ksig->sig const_structs.checkpatch: add device_type Normalise "name (ad@dr)" MODULE_AUTHORs to "name <ad@dr>" dyndbg: replace kstrdup() + strchr() with kstrdup_and_replace() list: leverage list_is_head() for list_entry_is_head() nilfs2: MAINTAINERS: drop unreachable project mirror site smp: make __smp_processor_id() 0-argument macro fat: fix uninitialized field in nostale filehandles ...
2024-03-14Merge tag 'mm-stable-2024-03-13-20-04' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames from hotplugged memory rather than only from main memory. Series "implement "memmap on memory" feature on s390". - More folio conversions from Matthew Wilcox in the series "Convert memcontrol charge moving to use folios" "mm: convert mm counter to take a folio" - Chengming Zhou has optimized zswap's rbtree locking, providing significant reductions in system time and modest but measurable reductions in overall runtimes. The series is "mm/zswap: optimize the scalability of zswap rb-tree". - Chengming Zhou has also provided the series "mm/zswap: optimize zswap lru list" which provides measurable runtime benefits in some swap-intensive situations. - And Chengming Zhou further optimizes zswap in the series "mm/zswap: optimize for dynamic zswap_pools". Measured improvements are modest. - zswap cleanups and simplifications from Yosry Ahmed in the series "mm: zswap: simplify zswap_swapoff()". - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has contributed several DAX cleanups as well as adding a sysfs tunable to control the memmap_on_memory setting when the dax device is hotplugged as system memory. - Johannes Weiner has added the large series "mm: zswap: cleanups", which does that. - More DAMON work from SeongJae Park in the series "mm/damon: make DAMON debugfs interface deprecation unignorable" "selftests/damon: add more tests for core functionalities and corner cases" "Docs/mm/damon: misc readability improvements" "mm/damon: let DAMOS feeds and tame/auto-tune itself" - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs extension" Rakie Kim has developed a new mempolicy interleaving policy wherein we allocate memory across nodes in a weighted fashion rather than uniformly. This is beneficial in heterogeneous memory environments appearing with CXL. - Christophe Leroy has contributed some cleanup and consolidation work against the ARM pagetable dumping code in the series "mm: ptdump: Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute". - Luis Chamberlain has added some additional xarray selftesting in the series "test_xarray: advanced API multi-index tests". - Muhammad Usama Anjum has reworked the selftest code to make its human-readable output conform to the TAP ("Test Anything Protocol") format. Amongst other things, this opens up the use of third-party tools to parse and process out selftesting results. - Ryan Roberts has added fork()-time PTE batching of THP ptes in the series "mm/memory: optimize fork() with PTE-mapped THP". Mainly targeted at arm64, this significantly speeds up fork() when the process has a large number of pte-mapped folios. - David Hildenbrand also gets in on the THP pte batching game in his series "mm/memory: optimize unmap/zap with PTE-mapped THP". It implements batching during munmap() and other pte teardown situations. The microbenchmark improvements are nice. - And in the series "Transparent Contiguous PTEs for User Mappings" Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte mappings"). Kernel build times on arm64 improved nicely. Ryan's series "Address some contpte nits" provides some followup work. - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has fixed an obscure hugetlb race which was causing unnecessary page faults. He has also added a reproducer under the selftest code. - In the series "selftests/mm: Output cleanups for the compaction test", Mark Brown did what the title claims. - Kinsey Ho has added the series "mm/mglru: code cleanup and refactoring". - Even more zswap material from Nhat Pham. The series "fix and extend zswap kselftests" does as claimed. - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX regression" Mathieu Desnoyers has cleaned up and fixed rather a mess in our handling of DAX on archiecctures which have virtually aliasing data caches. The arm architecture is the main beneficiary. - Lokesh Gidra's series "per-vma locks in userfaultfd" provides dramatic improvements in worst-case mmap_lock hold times during certain userfaultfd operations. - Some page_owner enhancements and maintenance work from Oscar Salvador in his series "page_owner: print stacks and their outstanding allocations" "page_owner: Fixup and cleanup" - Uladzislau Rezki has contributed some vmalloc scalability improvements in his series "Mitigate a vmap lock contention". It realizes a 12x improvement for a certain microbenchmark. - Some kexec/crash cleanup work from Baoquan He in the series "Split crash out from kexec and clean up related config items". - Some zsmalloc maintenance work from Chengming Zhou in the series "mm/zsmalloc: fix and optimize objects/page migration" "mm/zsmalloc: some cleanup for get/set_zspage_mapping()" - Zi Yan has taught the MM to perform compaction on folios larger than order=0. This a step along the path to implementaton of the merging of large anonymous folios. The series is named "Enable >0 order folio memory compaction". - Christoph Hellwig has done quite a lot of cleanup work in the pagecache writeback code in his series "convert write_cache_pages() to an iterator". - Some modest hugetlb cleanups and speedups in Vishal Moola's series "Handle hugetlb faults under the VMA lock". - Zi Yan has changed the page splitting code so we can split huge pages into sizes other than order-0 to better utilize large folios. The series is named "Split a folio to any lower order folios". - David Hildenbrand has contributed the series "mm: remove total_mapcount()", a cleanup. - Matthew Wilcox has sought to improve the performance of bulk memory freeing in his series "Rearrange batched folio freeing". - Gang Li's series "hugetlb: parallelize hugetlb page init on boot" provides large improvements in bootup times on large machines which are configured to use large numbers of hugetlb pages. - Matthew Wilcox's series "PageFlags cleanups" does that. - Qi Zheng's series "minor fixes and supplement for ptdesc" does that also. S390 is affected. - Cleanups to our pagemap utility functions from Peter Xu in his series "mm/treewide: Replace pXd_large() with pXd_leaf()". - Nico Pache has fixed a few things with our hugepage selftests in his series "selftests/mm: Improve Hugepage Test Handling in MM Selftests". - Also, of course, many singleton patches to many things. Please see the individual changelogs for details. * tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits) mm/zswap: remove the memcpy if acomp is not sleepable crypto: introduce: acomp_is_async to expose if comp drivers might sleep memtest: use {READ,WRITE}_ONCE in memory scanning mm: prohibit the last subpage from reusing the entire large folio mm: recover pud_leaf() definitions in nopmd case selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements selftests/mm: skip uffd hugetlb tests with insufficient hugepages selftests/mm: dont fail testsuite due to a lack of hugepages mm/huge_memory: skip invalid debugfs new_order input for folio split mm/huge_memory: check new folio order when split a folio mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure mm: add an explicit smp_wmb() to UFFDIO_CONTINUE mm: fix list corruption in put_pages_list mm: remove folio from deferred split list before uncharging it filemap: avoid unnecessary major faults in filemap_fault() mm,page_owner: drop unnecessary check mm,page_owner: check for null stack_record before bumping its refcount mm: swap: fix race between free_swap_and_cache() and swapoff() mm/treewide: align up pXd_leaf() retval across archs mm/treewide: drop pXd_large() ...
2024-03-14Merge tag 'pci-v6.9-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Consolidate interrupt related code in irq.c (Ilpo Järvinen) - Reduce kernel size by replacing sysfs resource macros with functions (Ilpo Järvinen) - Reduce kernel size by compiling sysfs support only when CONFIG_SYSFS=y (Lukas Wunner) - Avoid using Extended Tags on 3ware-9650SE Root Port to work around an apparent hardware defect (Jörg Wedekind) Resource management: - Fix an MMIO mapping leak in pci_iounmap() (Philipp Stanner) - Move pci_iomap.c and other PCI-specific devres code to drivers/pci (Philipp Stanner) - Consolidate PCI devres code in devres.c (Philipp Stanner) Power management: - Avoid D3cold on Asus B1400 PCI-NVMe bridge, where firmware doesn't know how to return correctly to D0, and remove previous quirk that wasn't as specific (Daniel Drake) - Allow runtime PM when the driver enables it but doesn't need any runtime PM callbacks (Raag Jadav) - Drain runtime-idle callbacks before driver removal to avoid races between .remove() and .runtime_idle(), which caused intermittent page faults when the rtsx .runtime_idle() accessed registers that its .remove() had already unmapped (Rafael J. Wysocki) Virtualization: - Avoid Secondary Bus Reset on LSI FW643 so it can be assigned to VMs with VFIO, e.g., for professional audio software on many Apple machines, at the cost of leaking state between VMs (Edmund Raile) Error handling: - Print all logged TLP Prefixes, not just the first, after AER or DPC errors (Ilpo Järvinen) - Quirk the DPC PIO log size for Intel Raptor Lake Root Ports, which still don't advertise a legal size (Paul Menzel) - Ignore expected DPC Surprise Down errors on hot removal (Smita Koralahalli) - Block runtime suspend while handling AER errors to avoid races that prevent the device form being resumed from D3hot (Stanislaw Gruszka) Peer-to-peer DMA: - Use atomic XA allocation in RCU read section (Christophe JAILLET) ASPM: - Collect bits of ASPM-related code that we need even without CONFIG_PCIEASPM into aspm.c (David E. Box) - Save/restore L1 PM Substates config for suspend/resume (David E. Box) - Update save_save when ASPM config is changed, so a .slot_reset() during error recovery restores the changed config, not the .probe()-time config (Vidya Sagar) Endpoint framework: - Refactor and improve pci_epf_alloc_space() API (Niklas Cassel) - Clean up endpoint BAR descriptions (Niklas Cassel) - Fix ntb_register_device() name leak in error path (Yang Yingliang) - Return actual error code for pci_vntb_probe() failure (Yang Yingliang) Broadcom STB PCIe controller driver: - Fix MDIO write polling, which previously never waited for completion (Jonathan Bell) Cadence PCIe endpoint driver: - Clear the ARI "Next Function Number" of last function (Jasko-EXT Wojciech) Freescale i.MX6 PCIe controller driver: - Simplify by replacing switch statements with function pointers for different hardware variants (Frank Li) - Simplify by using clk_bulk*() API (Frank Li) - Remove redundant DT clock and reg/reg-name details (Frank Li) - Add i.MX95 DT and driver support for both Root Complex and Endpoint mode (Frank Li) Microsoft Hyper-V host bridge driver: - Reduce memory usage by limiting ring buffer size to 16KB instead of 4 pages (Michael Kelley) Qualcomm PCIe controller driver: - Add X1E80100 DT and driver support (Abel Vesa) - Add DT 'required-opps' for SoCs that require a minimum performance level (Johan Hovold) - Make DT 'msi-map-mask' optional, depending on how MSI interrupts are mapped (Johan Hovold) - Disable ASPM L0s for sc8280xp, sa8540p and sa8295p because the PHY configuration isn't tuned correctly for L0s (Johan Hovold) - Split dt-binding qcom,pcie.yaml into qcom,pcie-common.yaml and separate files for SA8775p, SC7280, SC8180X, SC8280XP, SM8150, SM8250, SM8350, SM8450, SM8550 for easier reviewing (Krzysztof Kozlowski) - Enable BDF to SID translation by disabling bypass mode (Manivannan Sadhasivam) - Add endpoint MHI support for Snapdragon SA8775P SoC (Mrinmay Sarkar) Synopsys DesignWare PCIe controller driver: - Allocate 64-bit MSI address if no 32-bit address is available (Ajay Agarwal) - Fix endpoint Resizable BAR to actually advertise the required 1MB size (Niklas Cassel) MicroSemi Switchtec management driver: - Release resources if the .probe() fails (Christophe JAILLET) Miscellaneous: - Make pcie_port_bus_type const (Ricardo B. Marliere)" * tag 'pci-v6.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (77 commits) PCI/ASPM: Update save_state when configuration changes PCI/ASPM: Disable L1 before configuring L1 Substates PCI/ASPM: Call pci_save_ltr_state() from pci_save_pcie_state() PCI/ASPM: Save L1 PM Substates Capability for suspend/resume PCI: hv: Fix ring buffer size calculation PCI: dwc: endpoint: Fix advertised resizable BAR size PCI: cadence: Clear the ARI Capability Next Function Number of the last function PCI: dwc: Strengthen the MSI address allocation logic PCI: brcmstb: Fix broken brcm_pcie_mdio_write() polling PCI: qcom: Add X1E80100 PCIe support dt-bindings: PCI: qcom: Document the X1E80100 PCIe Controller PCI: qcom: Enable BDF to SID translation properly PCI/AER: Generalize TLP Header Log reading PCI/AER: Use explicit register size for PCI_ERR_CAP PCI: qcom: Disable ASPM L0s for sc8280xp, sa8540p and sa8295p dt-bindings: PCI: qcom: Do not require 'msi-map-mask' dt-bindings: PCI: qcom: Allow 'required-opps' PCI/AER: Block runtime suspend when handling errors PCI/ASPM: Move pci_save_ltr_state() to aspm.c PCI/ASPM: Always build aspm.c ...
2024-03-13lib/generic-radix-tree.c: Make nodes more reasonably sizedKent Overstreet
this code originally used the page allocator directly, but most code shouldn't do that - PAGE_SIZE varies with architecture, and slab is faster. 4k is also on the large side for typical usage, 512 bytes is a better choice for typical usage that might be somewhat sparse. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13Merge tag 'modules-6.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull modules updates from Luis Chamberlain: "Christophe Leroy did most of the work on this release, first with a few cleanups on CONFIG_STRICT_KERNEL_RWX and ending with error handling for when set_memory_XX() can fail. This is part of a larger effort to clean up all these callers which can fail, modules is just part of it" * tag 'modules-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: module: Don't ignore errors from set_memory_XX() lib/test_kmod: fix kernel-doc warnings powerpc: Simplify strict_kernel_rwx_enabled() modules: Remove #ifdef CONFIG_STRICT_MODULE_RWX around rodata_enabled init: Declare rodata_enabled and mark_rodata_ro() at all time module: Change module_enable_{nx/x/ro}() to more explicit names module: Use set_memory_rox()
2024-03-13lib/firmware_table: Provide buffer length argument to cdat_table_parse()Robert Richter
There exist card implementations with a CDAT table using a fixed size buffer, but with entries filled in that do not fill the whole table length size. Then, the last entry in the CDAT table may not mark the end of the CDAT table buffer specified by the length field in the CDAT header. It can be shorter with trailing unused (zero'ed) data. The actual table length is determined while reading all CDAT entries of the table with DOE. If the table is greater than expected (containing zero'ed trailing data), the CDAT parser fails with: [ 48.691717] Malformed DSMAS table length: (24:0) [ 48.702084] [CDAT:0x00] Invalid zero length [ 48.711460] cxl_port endpoint1: Failed to parse CDAT: -22 In addition, a check of the table buffer length is missing to prevent an out-of-bound access then parsing the CDAT table. Hardening code against device returning borked table. Fix that by providing an optional buffer length argument to acpi_parse_entries_array() that can be used by cdat_table_parse() to propagate the buffer size down to its users to check the buffer length. This also prevents a possible out-of-bound access mentioned. Add a check to warn about a malformed CDAT table length. Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Len Brown <lenb@kernel.org> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/ZdEnopFO0Tl3t2O1@rric.localdomain Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12Merge tag 'printk-for-6.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: "Improve the behavior during panic. The issues were found when testing the ongoing changes introducing atomic consoles and printk kthreads: - pr_flush() has to wait for the last reserved record instead of the last finalized one. Note that records are finalized in random order when generated by more CPUs in parallel. - Ignore non-finalized records during panic(). Messages printed on panic-CPU are always finalized. Messages printed by other CPUs might never be finalized when the CPUs get stopped. - Block new printk() calls on non-panic CPUs completely. Backtraces are printed before entering the panic mode. Later messages would just mess information printed by the panic CPU. - Do not take console_lock in console_flush_on_panic() at all. The original code did try_lock()/console_unlock(). The unlock part might cause a deadlock when panic() happened in a scheduler code. - Fix conversion of 64-bit sequence number for 32-bit atomic operations" * tag 'printk-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: dump_stack: Do not get cpu_sync for panic CPU panic: Flush kernel log buffer at the end printk: Avoid non-panic CPUs writing to ringbuffer printk: Disable passing console lock owner completely during panic() printk: ringbuffer: Skip non-finalized records in panic printk: Wait for all reserved records with pr_flush() printk: ringbuffer: Cleanup reader terminology printk: Add this_cpu_in_panic() printk: For @suppress_panic_printk check for other CPU in panic printk: ringbuffer: Clarify special lpos values printk: ringbuffer: Do not skip non-finalized records with prb_next_seq() printk: Use prb_first_seq() as base for 32bit seq macros printk: Adjust mapping for 32bit seq macros printk: nbcon: Relocate 32bit seq macros
2024-03-12Merge tag 'net-next-6.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Large effort by Eric to lower rtnl_lock pressure and remove locks: - Make commonly used parts of rtnetlink (address, route dumps etc) lockless, protected by RCU instead of rtnl_lock. - Add a netns exit callback which already holds rtnl_lock, allowing netns exit to take rtnl_lock once in the core instead of once for each driver / callback. - Remove locks / serialization in the socket diag interface. - Remove 6 calls to synchronize_rcu() while holding rtnl_lock. - Remove the dev_base_lock, depend on RCU where necessary. - Support busy polling on a per-epoll context basis. Poll length and budget parameters can be set independently of system defaults. - Introduce struct net_hotdata, to make sure read-mostly global config variables fit in as few cache lines as possible. - Add optional per-nexthop statistics to ease monitoring / debug of ECMP imbalance problems. - Support TCP_NOTSENT_LOWAT in MPTCP. - Ensure that IPv6 temporary addresses' preferred lifetimes are long enough, compared to other configured lifetimes, and at least 2 sec. - Support forwarding of ICMP Error messages in IPSec, per RFC 4301. - Add support for the independent control state machine for bonding per IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled control state machine. - Add "network ID" to MCTP socket APIs to support hosts with multiple disjoint MCTP networks. - Re-use the mono_delivery_time skbuff bit for packets which user space wants to be sent at a specified time. Maintain the timing information while traversing veth links, bridge etc. - Take advantage of MSG_SPLICE_PAGES for RxRPC DATA and ACK packets. - Simplify many places iterating over netdevs by using an xarray instead of a hash table walk (hash table remains in place, for use on fastpaths). - Speed up scanning for expired routes by keeping a dedicated list. - Speed up "generic" XDP by trying harder to avoid large allocations. - Support attaching arbitrary metadata to netconsole messages. Things we sprinkled into general kernel code: - Enforce VM_IOREMAP flag and range in ioremap_page_range and introduce VM_SPARSE kind and vm_area_[un]map_pages (used by bpf_arena). - Rework selftest harness to enable the use of the full range of ksft exit code (pass, fail, skip, xfail, xpass). Netfilter: - Allow userspace to define a table that is exclusively owned by a daemon (via netlink socket aliveness) without auto-removing this table when the userspace program exits. Such table gets marked as orphaned and a restarting management daemon can re-attach/regain ownership. - Speed up element insertions to nftables' concatenated-ranges set type. Compact a few related data structures. BPF: - Add BPF token support for delegating a subset of BPF subsystem functionality from privileged system-wide daemons such as systemd through special mount options for userns-bound BPF fs to a trusted & unprivileged application. - Introduce bpf_arena which is sparse shared memory region between BPF program and user space where structures inside the arena can have pointers to other areas of the arena, and pointers work seamlessly for both user-space programs and BPF programs. - Introduce may_goto instruction that is a contract between the verifier and the program. The verifier allows the program to loop assuming it's behaving well, but reserves the right to terminate it. - Extend the BPF verifier to enable static subprog calls in spin lock critical sections. - Support registration of struct_ops types from modules which helps projects like fuse-bpf that seeks to implement a new struct_ops type. - Add support for retrieval of cookies for perf/kprobe multi links. - Support arbitrary TCP SYN cookie generation / validation in the TC layer with BPF to allow creating SYN flood handling in BPF firewalls. - Add code generation to inline the bpf_kptr_xchg() helper which improves performance when stashing/popping the allocated BPF objects. Wireless: - Add SPP (signaling and payload protected) AMSDU support. - Support wider bandwidth OFDMA, as required for EHT operation. Driver API: - Major overhaul of the Energy Efficient Ethernet internals to support new link modes (2.5GE, 5GE), share more code between drivers (especially those using phylib), and encourage more uniform behavior. Convert and clean up drivers. - Define an API for querying per netdev queue statistics from drivers. - IPSec: account in global stats for fully offloaded sessions. - Create a concept of Ethernet PHY Packages at the Device Tree level, to allow parameterizing the existing PHY package code. - Enable Rx hashing (RSS) on GTP protocol fields. Misc: - Improvements and refactoring all over networking selftests. - Create uniform module aliases for TC classifiers, actions, and packet schedulers to simplify creating modprobe policies. - Address all missing MODULE_DESCRIPTION() warnings in networking. - Extend the Netlink descriptions in YAML to cover message encapsulation or "Netlink polymorphism", where interpretation of nested attributes depends on link type, classifier type or some other "class type". Drivers: - Ethernet high-speed NICs: - Add a new driver for Marvell's Octeon PCI Endpoint NIC VF. - Intel (100G, ice, idpf): - support E825-C devices - nVidia/Mellanox: - support devices with one port and multiple PCIe links - Broadcom (bnxt): - support n-tuple filters - support configuring the RSS key - Wangxun (ngbe/txgbe): - implement irq_domain for TXGBE's sub-interrupts - Pensando/AMD: - support XDP - optimize queue submission and wakeup handling (+17% bps) - optimize struct layout, saving 28% of memory on queues - Ethernet NICs embedded and virtual: - Google cloud vNIC: - refactor driver to perform memory allocations for new queue config before stopping and freeing the old queue memory - Synopsys (stmmac): - obey queueMaxSDU and implement counters required by 802.1Qbv - Renesas (ravb): - support packet checksum offload - suspend to RAM and runtime PM support - Ethernet switches: - nVidia/Mellanox: - support for nexthop group statistics - Microchip: - ksz8: implement PHY loopback - add support for KSZ8567, a 7-port 10/100Mbps switch - PTP: - New driver for RENESAS FemtoClock3 Wireless clock generator. - Support OCP PTP cards designed and built by Adva. - CAN: - Support recvmsg() flags for own, local and remote traffic on CAN BCM sockets. - Support for esd GmbH PCIe/402 CAN device family. - m_can: - Rx/Tx submission coalescing - wake on frame Rx - WiFi: - Intel (iwlwifi): - enable signaling and payload protected A-MSDUs - support wider-bandwidth OFDMA - support for new devices - bump FW API to 89 for AX devices; 90 for BZ/SC devices - MediaTek (mt76): - mt7915: newer ADIE version support - mt7925: radio temperature sensor support - Qualcomm (ath11k): - support 6 GHz station power modes: Low Power Indoor (LPI), Standard Power) SP and Very Low Power (VLP) - QCA6390 & WCN6855: support 2 concurrent station interfaces - QCA2066 support - Qualcomm (ath12k): - refactoring in preparation for Multi-Link Operation (MLO) support - 1024 Block Ack window size support - firmware-2.bin support - support having multiple identical PCI devices (firmware needs to have ATH12K_FW_FEATURE_MULTI_QRTR_ID) - QCN9274: support split-PHY devices - WCN7850: enable Power Save Mode in station mode - WCN7850: P2P support - RealTek: - rtw88: support for more rtw8811cu and rtw8821cu devices - rtw89: support SCAN_RANDOM_SN and SET_SCAN_DWELL - rtlwifi: speed up USB firmware initialization - rtwl8xxxu: - RTL8188F: concurrent interface support - Channel Switch Announcement (CSA) support in AP mode - Broadcom (brcmfmac): - per-vendor feature support - per-vendor SAE password setup - DMI nvram filename quirk for ACEPC W5 Pro" * tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2255 commits) nexthop: Fix splat with CONFIG_DEBUG_PREEMPT=y nexthop: Fix out-of-bounds access during attribute validation nexthop: Only parse NHA_OP_FLAGS for dump messages that require it nexthop: Only parse NHA_OP_FLAGS for get messages that require it bpf: move sleepable flag from bpf_prog_aux to bpf_prog bpf: hardcode BPF_PROG_PACK_SIZE to 2MB * num_possible_nodes() selftests/bpf: Add kprobe multi triggering benchmarks ptp: Move from simple ida to xarray vxlan: Remove generic .ndo_get_stats64 vxlan: Do not alloc tstats manually devlink: Add comments to use netlink gen tool nfp: flower: handle acti_netdevs allocation failure net/packet: Add getsockopt support for PACKET_COPY_THRESH net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID selftests/bpf: Add bpf_arena_htab test. selftests/bpf: Add bpf_arena_list test. selftests/bpf: Add unit tests for bpf_arena_alloc/free_pages bpf: Add helper macro bpf_addr_space_cast() libbpf: Recognize __arena global variables. bpftool: Recognize arena map type ...
2024-03-12Merge tag 'hardening-v6.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: "As is pretty normal for this tree, there are changes all over the place, especially for small fixes, selftest improvements, and improved macro usability. Some header changes ended up landing via this tree as they depended on the string header cleanups. Also, a notable set of changes is the work for the reintroduction of the UBSAN signed integer overflow sanitizer so that we can continue to make improvements on the compiler side to make this sanitizer a more viable future security hardening option. Summary: - string.h and related header cleanups (Tanzir Hasan, Andy Shevchenko) - VMCI memcpy() usage and struct_size() cleanups (Vasiliy Kovalev, Harshit Mogalapalli) - selftests/powerpc: Fix load_unaligned_zeropad build failure (Michael Ellerman) - hardened Kconfig fragment updates (Marco Elver, Lukas Bulwahn) - Handle tail call optimization better in LKDTM (Douglas Anderson) - Use long form types in overflow.h (Andy Shevchenko) - Add flags param to string_get_size() (Andy Shevchenko) - Add Coccinelle script for potential struct_size() use (Jacob Keller) - Fix objtool corner case under KCFI (Josh Poimboeuf) - Drop 13 year old backward compat CAP_SYS_ADMIN check (Jingzi Meng) - Add str_plural() helper (Michal Wajdeczko, Kees Cook) - Ignore relocations in .notes section - Add comments to explain how __is_constexpr() works - Fix m68k stack alignment expectations in stackinit Kunit test - Convert string selftests to KUnit - Add KUnit tests for fortified string functions - Improve reporting during fortified string warnings - Allow non-type arg to type_max() and type_min() - Allow strscpy() to be called with only 2 arguments - Add binary mode to leaking_addresses scanner - Various small cleanups to leaking_addresses scanner - Adding wrapping_*() arithmetic helper - Annotate initial signed integer wrap-around in refcount_t - Add explicit UBSAN section to MAINTAINERS - Fix UBSAN self-test warnings - Simplify UBSAN build via removal of CONFIG_UBSAN_SANITIZE_ALL - Reintroduce UBSAN's signed overflow sanitizer" * tag 'hardening-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (51 commits) selftests/powerpc: Fix load_unaligned_zeropad build failure string: Convert helpers selftest to KUnit string: Convert selftest to KUnit sh: Fix build with CONFIG_UBSAN=y compiler.h: Explain how __is_constexpr() works overflow: Allow non-type arg to type_max() and type_min() VMCI: Fix possible memcpy() run-time warning in vmci_datagram_invoke_guest_handler() lib/string_helpers: Add flags param to string_get_size() x86, relocs: Ignore relocations in .notes section objtool: Fix UNWIND_HINT_{SAVE,RESTORE} across basic blocks overflow: Use POD in check_shl_overflow() lib: stackinit: Adjust target string to 8 bytes for m68k sparc: vdso: Disable UBSAN instrumentation kernel.h: Move lib/cmdline.c prototypes to string.h leaking_addresses: Provide mechanism to scan binary files leaking_addresses: Ignore input device status lines leaking_addresses: Use File::Temp for /tmp files MAINTAINERS: Update LEAKING_ADDRESSES details fortify: Improve buffer overflow reporting fortify: Add KUnit tests for runtime overflows ...
2024-03-12assoc_array: fix the return value in assoc_array_insert_mid_shortcut()Roman Smirnov
Returning the edit variable is redundant because it is dereferenced right before it is returned. It would be better to return true. Found by Linux Verification Center (linuxtesting.org) with Svace. Link: https://lkml.kernel.org/r/20240307071717.5318-1-r.smirnov@omp.ru Signed-off-by: Roman Smirnov <r.smirnov@omp.ru> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>