bcachefs.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author
2021-05-26	vt: Fix character height handling with VT_RESIZEX	Maciej W. Rozycki
	commit 860dafa902595fb5f1d23bbcce1215188c3341e6 upstream. Restore the original intent of the VT_RESIZEX ioctl's `v_clin' parameter which is the number of pixel rows per character (cell) rather than the height of the font used. For framebuffer devices the two values are always the same, because the former is inferred from the latter one. For VGA used as a true text mode device these two parameters are independent from each other: the number of pixel rows per character is set in the CRT controller, while font height is in fact hardwired to 32 pixel rows and fonts of heights below that value are handled by padding their data with blanks when loaded to hardware for use by the character generator. One can change the setting in the CRT controller and it will update the screen contents accordingly regardless of the font loaded. The `v_clin' parameter is used by the `vgacon' driver to set the height of the character cell and then the cursor position within. Make the parameter explicit then, by defining a new `vc_cell_height' struct member of `vc_data', set it instead of `vc_font.height' from `v_clin' in the VT_RESIZEX ioctl, and then use it throughout the `vgacon' driver except where actual font data is accessed which as noted above is independent from the CRTC setting. This way the framebuffer console driver is free to ignore the `v_clin' parameter as irrelevant, as it always should have, avoiding any issues attempts to give the parameter a meaning there could have caused, such as one that has led to commit 988d0763361b ("vt_ioctl: make VT_RESIZEX behave like VT_RESIZE"): "syzbot is reporting UAF/OOB read at bit_putcs()/soft_cursor() [1][2], for vt_resizex() from ioctl(VT_RESIZEX) allows setting font height larger than actual font height calculated by con_font_set() from ioctl(PIO_FONT). Since fbcon_set_font() from con_font_set() allocates minimal amount of memory based on actual font height calculated by con_font_set(), use of vt_resizex() can cause UAF/OOB read for font data." The problem first appeared around Linux 2.5.66 which predates our repo history, but the origin could be identified with the old MIPS/Linux repo also at: <git://git.kernel.org/pub/scm/linux/kernel/git/ralf/linux.git> as commit 9736a3546de7 ("Merge with Linux 2.5.66."), where VT_RESIZEX code in `vt_ioctl' was updated as follows: if (clin) - video_font_height = clin; + vc->vc_font.height = clin; making the parameter apply to framebuffer devices as well, perhaps due to the use of "font" in the name of the original `video_font_height' variable. Use "cell" in the new struct member then to avoid ambiguity. References: [1] https://syzkaller.appspot.com/bug?id=32577e96d88447ded2d3b76d71254fb855245837 [2] https://syzkaller.appspot.com/bug?id=6b8355d27b2b94fb5cedf4655e3a59162d9e48e3 Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org # v2.6.12+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-19	clocksource/drivers/timer-ti-dm: Handle dra7 timer wrap errata i940	Tony Lindgren
	commit 25de4ce5ed02994aea8bc111d133308f6fd62566 upstream. There is a timer wrap issue on dra7 for the ARM architected timer. In a typical clock configuration the timer fails to wrap after 388 days. To work around the issue, we need to use timer-ti-dm percpu timers instead. Let's configure dmtimer3 and 4 as percpu timers by default, and warn about the issue if the dtb is not configured properly. Let's do this as a single patch so it can be backported to v5.8 and later kernels easily. Note that this patch depends on earlier timer-ti-dm systimer posted mode fixes, and a preparatory clockevent patch "clocksource/drivers/timer-ti-dm: Prepare to handle dra7 timer wrap issue". For more information, please see the errata for "AM572x Sitara Processors Silicon Revisions 1.1, 2.0": https://www.ti.com/lit/er/sprz429m/sprz429m.pdf The concept is based on earlier reference patches done by Tero Kristo and Keerthy. Cc: Keerthy <j-keerthy@ti.com> Cc: Tero Kristo <kristo@kernel.org> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210323074326.28302-3-tony@atomide.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-19	mm: fix struct page layout on 32-bit systems	Matthew Wilcox (Oracle)
	commit 9ddb3c14afba8bc5950ed297f02d4ae05ff35cd1 upstream. 32-bit architectures which expect 8-byte alignment for 8-byte integers and need 64-bit DMA addresses (arm, mips, ppc) had their struct page inadvertently expanded in 2019. When the dma_addr_t was added, it forced the alignment of the union to 8 bytes, which inserted a 4 byte gap between 'flags' and the union. Fix this by storing the dma_addr_t in one or two adjacent unsigned longs. This restores the alignment to that of an unsigned long. We always store the low bits in the first word to prevent the PageTail bit from being inadvertently set on a big endian platform. If that happened, get_user_pages_fast() racing against a page which was freed and reallocated to the page_pool could dereference a bogus compound_head(), which would be hard to trace back to this cause. Link: https://lkml.kernel.org/r/20210510153211.1504886-1-willy@infradead.org Fixes: c25fff7171be ("mm: add dma_addr_t to struct page") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Matteo Croce <mcroce@linux.microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-19	kyber: fix out of bounds access when preempted	Omar Sandoval
	[ Upstream commit efed9a3337e341bd0989161b97453b52567bc59d ] __blk_mq_sched_bio_merge() gets the ctx and hctx for the current CPU and passes the hctx to ->bio_merge(). kyber_bio_merge() then gets the ctx for the current CPU again and uses that to get the corresponding Kyber context in the passed hctx. However, the thread may be preempted between the two calls to blk_mq_get_ctx(), and the ctx returned the second time may no longer correspond to the passed hctx. This "works" accidentally most of the time, but it can cause us to read garbage if the second ctx came from an hctx with more ctx's than the first one (i.e., if ctx->index_hw[hctx->type] > hctx->nr_ctx). This manifested as this UBSAN array index out of bounds error reported by Jakub: UBSAN: array-index-out-of-bounds in ../kernel/locking/qspinlock.c:130:9 index 13106 is out of range for type 'long unsigned int [128]' Call Trace: dump_stack+0xa4/0xe5 ubsan_epilogue+0x5/0x40 __ubsan_handle_out_of_bounds.cold.13+0x2a/0x34 queued_spin_lock_slowpath+0x476/0x480 do_raw_spin_lock+0x1c2/0x1d0 kyber_bio_merge+0x112/0x180 blk_mq_submit_bio+0x1f5/0x1100 submit_bio_noacct+0x7b0/0x870 submit_bio+0xc2/0x3a0 btrfs_map_bio+0x4f0/0x9d0 btrfs_submit_data_bio+0x24e/0x310 submit_one_bio+0x7f/0xb0 submit_extent_page+0xc4/0x440 __extent_writepage_io+0x2b8/0x5e0 __extent_writepage+0x28d/0x6e0 extent_write_cache_pages+0x4d7/0x7a0 extent_writepages+0xa2/0x110 do_writepages+0x8f/0x180 __writeback_single_inode+0x99/0x7f0 writeback_sb_inodes+0x34e/0x790 __writeback_inodes_wb+0x9e/0x120 wb_writeback+0x4d2/0x660 wb_workfn+0x64d/0xa10 process_one_work+0x53a/0xa80 worker_thread+0x69/0x5b0 kthread+0x20b/0x240 ret_from_fork+0x1f/0x30 Only Kyber uses the hctx, so fix it by passing the request_queue to ->bio_merge() instead. BFQ and mq-deadline just use that, and Kyber can map the queues itself to avoid the mismatch. Fixes: a6088845c2bf ("block: kyber: make kyber more friendly with merging") Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Omar Sandoval <osandov@fb.com> Link: https://lore.kernel.org/r/c7598605401a48d5cfeadebb678abd10af22b83f.1620691329.git.osandov@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19	mm/hugetlb: fix F_SEAL_FUTURE_WRITE	Peter Xu
	commit 22247efd822e6d263f3c8bd327f3f769aea9b1d9 upstream. Patch series "mm/hugetlb: Fix issues on file sealing and fork", v2. Hugh reported issue with F_SEAL_FUTURE_WRITE not applied correctly to hugetlbfs, which I can easily verify using the memfd_test program, which seems that the program is hardly run with hugetlbfs pages (as by default shmem). Meanwhile I found another probably even more severe issue on that hugetlb fork won't wr-protect child cow pages, so child can potentially write to parent private pages. Patch 2 addresses that. After this series applied, "memfd_test hugetlbfs" should start to pass. This patch (of 2): F_SEAL_FUTURE_WRITE is missing for hugetlb starting from the first day. There is a test program for that and it fails constantly. $ ./memfd_test hugetlbfs memfd-hugetlb: CREATE memfd-hugetlb: BASIC memfd-hugetlb: SEAL-WRITE memfd-hugetlb: SEAL-FUTURE-WRITE mmap() didn't fail as expected Aborted (core dumped) I think it's probably because no one is really running the hugetlbfs test. Fix it by checking FUTURE_WRITE also in hugetlbfs_file_mmap() as what we do in shmem_mmap(). Generalize a helper for that. Link: https://lkml.kernel.org/r/20210503234356.9097-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20210503234356.9097-2-peterx@redhat.com Fixes: ab3948f58ff84 ("mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd") Signed-off-by: Peter Xu <peterx@redhat.com> Reported-by: Hugh Dickins <hughd@google.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-19	netfilter: xt_SECMARK: add new revision to fix structure layout	Pablo Neira Ayuso
	[ Upstream commit c7d13358b6a2f49f81a34aa323a2d0878a0532a2 ] This extension breaks when trying to delete rules, add a new revision to fix this. Fixes: 5e6874cdb8de ("[SECMARK]: Add xtables SECMARK target") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19	SUNRPC: Remove trace_xprt_transmit_queued	Chuck Lever
	[ Upstream commit 6cf23783f750634e10daeede48b0f5f5d64ebf3a ] This tracepoint can crash when dereferencing snd_task because when some transports connect, they put a cookie in that field instead of a pointer to an rpc_task. BUG: KASAN: use-after-free in trace_event_raw_event_xprt_writelock_event+0x141/0x18e [sunrpc] Read of size 2 at addr ffff8881a83bd3a0 by task git/331872 CPU: 11 PID: 331872 Comm: git Tainted: G S 5.12.0-rc2-00007-g3ab6e585a7f9 #1453 Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015 Call Trace: dump_stack+0x9c/0xcf print_address_description.constprop.0+0x18/0x239 kasan_report+0x174/0x1b0 trace_event_raw_event_xprt_writelock_event+0x141/0x18e [sunrpc] xprt_prepare_transmit+0x8e/0xc1 [sunrpc] call_transmit+0x4d/0xc6 [sunrpc] Fixes: 9ce07ae5eb1d ("SUNRPC: Replace dprintk() call site in xprt_prepare_transmit") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19	NFS: nfs4_bitmask_adjust() must not change the server global bitmasks	Trond Myklebust
	[ Upstream commit 332d1a0373be32a3a3c152756bca45ff4f4e11b5 ] As currently set, the calls to nfs4_bitmask_adjust() will end up overwriting the contents of the nfs_server cache_consistency_bitmask field. The intention here should be to modify a private copy of that mask in the close/delegreturn/write arguments. Fixes: 76bd5c016ef4 ("NFSv4: make cache consistency bitmask dynamic") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19	net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM	Heiner Kallweit
	[ Upstream commit fba863b816049b03f3fbb07b10ebdcfe5c4141f7 ] Resume callback of the PHY driver is called after the one for the MAC driver. The PHY driver resume callback calls phy_init_hw(), and this is potentially problematic if the MAC driver calls phy_start() in its resume callback. One issue was reported with the fec driver and a KSZ8081 PHY which seems to become unstable if a soft reset is triggered during aneg. The new flag allows MAC drivers to indicate that they take care of suspending/resuming the PHY. Then the MAC PM callbacks can handle any dependency between MAC and PHY PM. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19	i2c: Add I2C_AQ_NO_REP_START adapter quirk	Bence Csókás
	[ Upstream commit aca01415e076aa96cca0f801f4420ee5c10c660d ] This quirk signifies that the adapter cannot do a repeated START, it always issues a STOP condition after transfers. Suggested-by: Wolfram Sang <wsa@kernel.org> Signed-off-by: Bence Csókás <bence98@sch.bme.hu> Signed-off-by: Wolfram Sang <wsa@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19	PM: runtime: Fix unpaired parent child_count for force_resume	Tony Lindgren
	commit c745253e2a691a40c66790defe85c104a887e14a upstream. As pm_runtime_need_not_resume() relies also on usage_count, it can return a different value in pm_runtime_force_suspend() compared to when called in pm_runtime_force_resume(). Different return values can happen if anything calls PM runtime functions in between, and causes the parent child_count to increase on every resume. So far I've seen the issue only for omapdrm that does complicated things with PM runtime calls during system suspend for legacy reasons: omap_atomic_commit_tail() for omapdrm.0 dispc_runtime_get() wakes up 58000000.dss as it's the dispc parent dispc_runtime_resume() rpm_resume() increases parent child_count dispc_runtime_put() won't idle, PM runtime suspend blocked pm_runtime_force_suspend() for 58000000.dss, !pm_runtime_need_not_resume() __update_runtime_status() system suspended pm_runtime_force_resume() for 58000000.dss, pm_runtime_need_not_resume() pm_runtime_enable() only called because of pm_runtime_need_not_resume() omap_atomic_commit_tail() for omapdrm.0 dispc_runtime_get() wakes up 58000000.dss as it's the dispc parent dispc_runtime_resume() rpm_resume() increases parent child_count dispc_runtime_put() won't idle, PM runtime suspend blocked ... rpm_suspend for 58000000.dss but parent child_count is now unbalanced Let's fix the issue by adding a flag for needs_force_resume and use it in pm_runtime_force_resume() instead of pm_runtime_need_not_resume(). Additionally omapdrm system suspend could be simplified later on to avoid lots of unnecessary PM runtime calls and the complexity it adds. The driver can just use internal functions that are shared between the PM runtime and system suspend related functions. Fixes: 4918e1f87c5f ("PM / runtime: Rework pm_runtime_force_suspend/resume()") Signed-off-by: Tony Lindgren <tony@atomide.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Tested-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Cc: 4.16+ <stable@vger.kernel.org> # 4.16+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-14	smp: Fix smp_call_function_single_async prototype	Arnd Bergmann
	commit 1139aeb1c521eb4a050920ce6c64c36c4f2a3ab7 upstream. As of commit 966a967116e6 ("smp: Avoid using two cache lines for struct call_single_data"), the smp code prefers 32-byte aligned call_single_data objects for performance reasons, but the block layer includes an instance of this structure in the main 'struct request' that is more senstive to size than to performance here, see 4ccafe032005 ("block: unalign call_single_data in struct request"). The result is a violation of the calling conventions that clang correctly points out: block/blk-mq.c:630:39: warning: passing 8-byte aligned argument to 32-byte aligned parameter 2 of 'smp_call_function_single_async' may result in an unaligned pointer access [-Walign-mismatch] smp_call_function_single_async(cpu, &rq->csd); It does seem that the usage of the call_single_data without cache line alignment should still be allowed by the smp code, so just change the function prototype so it accepts both, but leave the default alignment unchanged for the other users. This seems better to me than adding a local hack to shut up an otherwise correct warning in the caller. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Jens Axboe <axboe@kernel.dk> Link: https://lkml.kernel.org/r/20210505211300.3174456-1-arnd@kernel.org [nc: Fix conflicts] Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-14	net: bridge: mcast: fix broken length + header check for MRDv6 Adv.	Linus Lüssing
	[ Upstream commit 99014088156cd78867d19514a0bc771c4b86b93b ] The IPv6 Multicast Router Advertisements parsing has the following two issues: For one thing, ICMPv6 MRD Advertisements are smaller than ICMPv6 MLD messages (ICMPv6 MRD Adv.: 8 bytes vs. ICMPv6 MLDv1/2: >= 24 bytes, assuming MLDv2 Reports with at least one multicast address entry). When ipv6_mc_check_mld_msg() tries to parse an Multicast Router Advertisement its MLD length check will fail - and it will wrongly return -EINVAL, even if we have a valid MRD Advertisement. With the returned -EINVAL the bridge code will assume a broken packet and will wrongly discard it, potentially leading to multicast packet loss towards multicast routers. The second issue is the MRD header parsing in br_ip6_multicast_mrd_rcv(): It wrongly checks for an ICMPv6 header immediately after the IPv6 header (IPv6 next header type). However according to RFC4286, section 2 all MRD messages contain a Router Alert option (just like MLD). So instead there is an IPv6 Hop-by-Hop option for the Router Alert between the IPv6 and ICMPv6 header, again leading to the bridge wrongly discarding Multicast Router Advertisements. To fix these two issues, introduce a new return value -ENODATA to ipv6_mc_check_mld() to indicate a valid ICMPv6 packet with a hop-by-hop option which is not an MLD but potentially an MRD packet. This also simplifies further parsing in the bridge code, as ipv6_mc_check_mld() already fully checks the ICMPv6 header and hop-by-hop option. These issues were found and fixed with the help of the mrdisc tool (https://github.com/troglobit/mrdisc). Fixes: 4b3087c7e37f ("bridge: Snoop Multicast Router Advertisements") Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14	netfilter: nftables_offload: VLAN id needs host byteorder in flow dissector	Pablo Neira Ayuso
	[ Upstream commit ff4d90a89d3d4d9814e0a2696509a7d495be4163 ] The flow dissector representation expects the VLAN id in host byteorder. Add the NFT_OFFLOAD_F_NETWORK2HOST flag to swap the bytes from nft_cmp. Fixes: a82055af5959 ("netfilter: nft_payload: add VLAN offload support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14	netfilter: nft_payload: fix C-VLAN offload support	Pablo Neira Ayuso
	[ Upstream commit 14c20643ef9457679cc6934d77adc24296505214 ] - add another struct flow_dissector_key_vlan for C-VLAN - update layer 3 dependency to allow to match on IPv4/IPv6 Fixes: 89d8fd44abfb ("netfilter: nft_payload: add C-VLAN offload support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14	iommu/vt-d: Invalidate PASID cache when root/context entry changed	Lu Baolu
	[ Upstream commit c0474a606ecb9326227b4d68059942f9db88a897 ] When the Intel IOMMU is operating in the scalable mode, some information from the root and context table may be used to tag entries in the PASID cache. Software should invalidate the PASID-cache when changing root or context table entries. Suggested-by: Ashok Raj <ashok.raj@intel.com> Fixes: 7373a8cc38197 ("iommu/vt-d: Setup context and enable RID2PASID support") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210320025415.641201-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14	iommu: Fix a boundary issue to avoid performance drop	Xiang Chen
	[ Upstream commit 3431c3f660a39f6ced954548a59dba6541ce3eb1 ] After the change of patch ("iommu: Switch gather->end to the inclusive end"), the performace drops from 1600+K IOPS to 1200K in our kunpeng ARM64 platform. We find that the range [start1, end1) actually is joint from the range [end1, end2), but it is considered as disjoint after the change, so it needs more times of TLB sync, and spends more time on it. So fix the boundary issue to avoid performance drop. Fixes: 862c3715de8f ("iommu: Switch gather->end to the inclusive end") Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/1616643504-120688-1-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14	udp: never accept GSO_FRAGLIST packets	Paolo Abeni
	[ Upstream commit 78352f73dc5047f3f744764cc45912498c52f3c9 ] Currently the UDP protocol delivers GSO_FRAGLIST packets to the sockets without the expected segmentation. This change addresses the issue introducing and maintaining a couple of new fields to explicitly accept SKB_GSO_UDP_L4 or GSO_FRAGLIST packets. Additionally updates udp_unexpected_gso() accordingly. UDP sockets enabling UDP_GRO stil keep accept_udp_fraglist zeroed. v1 -> v2: - use 2 bits instead of a whole GSO bitmask (Willem) Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14	gpio: guard gpiochip_irqchip_add_domain() with GPIOLIB_IRQCHIP	Álvaro Fernández Rojas
	[ Upstream commit 9c7d24693d864f90b27aad5d15fbfe226c02898b ] The current code doesn't check if GPIOLIB_IRQCHIP is enabled, which results in a compilation error when trying to build gpio-regmap if CONFIG_GPIOLIB_IRQCHIP isn't enabled. Fixes: 6a45b0e2589f ("gpiolib: Introduce gpiochip_irqchip_add_domain()") Suggested-by: Michael Walle <michael@walle.cc> Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Michael Walle <michael@walle.cc> Acked-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Link: https://lore.kernel.org/r/20210324081923.20379-2-noltari@gmail.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14	iommu/dma: Resurrect the "forcedac" option	Robin Murphy
	[ Upstream commit 3542dcb15cef66c0b9e6c3b33168eb657e0d9520 ] In converting intel-iommu over to the common IOMMU DMA ops, it quietly lost the functionality of its "forcedac" option. Since this is a handy thing both for testing and for performance optimisation on certain platforms, reimplement it under the common IOMMU parameter namespace. For the sake of fixing the inadvertent breakage of the Intel-specific parameter, remove the dmar_forcedac remnants and hook it up as an alias while documenting the transition to the new common parameter. Fixes: c588072bba6b ("iommu/vt-d: Convert intel iommu driver to the iommu ops") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/7eece8e0ea7bfbe2cd0e30789e0d46df573af9b0.1614961776.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14	RDMA/mlx5: Fix query RoCE port	Maor Gottlieb
	[ Upstream commit 7852546f524595245382a919e752468f73421451 ] mlx5_is_roce_enabled returns the devlink RoCE init value, therefore it should be used only when driver is loaded. Instead we just need to read the roce_en field. In addition, rename mlx5_is_roce_enabled to mlx5_is_roce_init_enabled. Fixes: 7a58779edd75 ("IB/mlx5: Improve query port for representor port") Link: https://lore.kernel.org/r/20210304124517.1100608-2-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14	HID: plantronics: Workaround for double volume key presses	Maxim Mikityanskiy
	[ Upstream commit f567d6ef8606fb427636e824c867229ecb5aefab ] Plantronics Blackwire 3220 Series (047f:c056) sends HID reports twice for each volume key press. This patch adds a quirk to hid-plantronics for this product ID, which will ignore the second volume key press if it happens within 5 ms from the last one that was handled. The patch was tested on the mentioned model only, it shouldn't affect other models, however, this quirk might be needed for them too. Auto-repeat (when a key is held pressed) is not affected, because the rate is about 3 times per second, which is far less frequent than once in 5 ms. Fixes: 81bb773faed7 ("HID: plantronics: Update to map volume up/down controls") Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14	usb: typec: tcpm: Honour pSnkStdby requirement during negotiation	Badhri Jagan Sridharan
	[ Upstream commit 123086843372bc93d26f52edfb71dbf951cd2f17 ] >From PD Spec: The Sink Shall transition to Sink Standby before a positive or negative voltage transition of VBUS. During Sink Standby the Sink Shall reduce its power draw to pSnkStdby. This allows the Source to manage the voltage transition as well as supply sufficient operating current to the Sink to maintain PD operation during the transition. The Sink Shall complete this transition to Sink Standby within tSnkStdby after evaluating the Accept Message from the Source. The transition when returning to Sink operation from Sink Standby Shall be completed within tSnkNewPower. The pSnkStdby requirement Shall only apply if the Sink power draw is higher than this level. The above requirement needs to be met to prevent hard resets from port partner. Without the patch: (5V/3A during SNK_DISCOVERY all the way through explicit contract) [ 95.711984] CC1: 0 -> 0, CC2: 0 -> 5 [state TOGGLING, polarity 0, connected] [ 95.712007] state change TOGGLING -> SNK_ATTACH_WAIT [rev3 NONE_AMS] [ 95.712017] pending state change SNK_ATTACH_WAIT -> SNK_DEBOUNCED @ 170 ms [rev3 NONE_AMS] [ 95.837190] VBUS on [ 95.882075] state change SNK_ATTACH_WAIT -> SNK_DEBOUNCED [delayed 170 ms] [ 95.882082] state change SNK_DEBOUNCED -> SNK_ATTACHED [rev3 NONE_AMS] [ 95.882086] polarity 1 [ 95.883151] set_auto_vbus_discharge_threshold mode:0 pps_active:n vbus:5000 ret:0 [ 95.883441] enable vbus discharge ret:0 [ 95.883445] Requesting mux state 1, usb-role 2, orientation 2 [ 95.883776] state change SNK_ATTACHED -> SNK_STARTUP [rev3 NONE_AMS] [ 95.883879] pending state change SNK_STARTUP -> SNK_DISCOVERY @ 500 ms [rev3 NONE_AMS] [ 96.038960] VBUS on [ 96.383939] state change SNK_STARTUP -> SNK_DISCOVERY [delayed 500 ms] [ 96.383946] Setting voltage/current limit 5000 mV 3000 mA [ 96.383961] vbus=0 charge:=1 [ 96.386044] state change SNK_DISCOVERY -> SNK_WAIT_CAPABILITIES [rev3 NONE_AMS] [ 96.386309] pending state change SNK_WAIT_CAPABILITIES -> HARD_RESET_SEND @ 450 ms [rev3 NONE_AMS] [ 96.394404] PD RX, header: 0x2161 [1] [ 96.394408] PDO 0: type 0, 5000 mV, 3000 mA [E] [ 96.394410] PDO 1: type 0, 9000 mV, 2000 mA [] [ 96.394412] state change SNK_WAIT_CAPABILITIES -> SNK_NEGOTIATE_CAPABILITIES [rev2 POWER_NEGOTIATION] [ 96.394416] Setting usb_comm capable false [ 96.395083] cc=0 cc1=0 cc2=5 vbus=0 vconn=sink polarity=1 [ 96.395089] Requesting PDO 1: 9000 mV, 2000 mA [ 96.395093] PD TX, header: 0x1042 [ 96.397404] PD TX complete, status: 0 [ 96.397424] pending state change SNK_NEGOTIATE_CAPABILITIES -> HARD_RESET_SEND @ 60 ms [rev2 POWER_NEGOTIATION] [ 96.400826] PD RX, header: 0x363 [1] [ 96.400829] state change SNK_NEGOTIATE_CAPABILITIES -> SNK_TRANSITION_SINK [rev2 POWER_NEGOTIATION] [ 96.400832] pending state change SNK_TRANSITION_SINK -> HARD_RESET_SEND @ 500 ms [rev2 POWER_NEGOTIATION] [ 96.577315] PD RX, header: 0x566 [1] [ 96.577321] Setting voltage/current limit 9000 mV 2000 mA [ 96.578363] set_auto_vbus_discharge_threshold mode:3 pps_active:n vbus:9000 ret:0 [ 96.578370] state change SNK_TRANSITION_SINK -> SNK_READY [rev2 POWER_NEGOTIATION] With the patch: [ 168.398573] CC1: 0 -> 0, CC2: 0 -> 5 [state TOGGLING, polarity 0, connected] [ 168.398605] state change TOGGLING -> SNK_ATTACH_WAIT [rev3 NONE_AMS] [ 168.398619] pending state change SNK_ATTACH_WAIT -> SNK_DEBOUNCED @ 170 ms [rev3 NONE_AMS] [ 168.522348] VBUS on [ 168.568676] state change SNK_ATTACH_WAIT -> SNK_DEBOUNCED [delayed 170 ms] [ 168.568684] state change SNK_DEBOUNCED -> SNK_ATTACHED [rev3 NONE_AMS] [ 168.568688] polarity 1 [ 168.569867] set_auto_vbus_discharge_threshold mode:0 pps_active:n vbus:5000 ret:0 [ 168.570158] enable vbus discharge ret:0 [ 168.570161] Requesting mux state 1, usb-role 2, orientation 2 [ 168.570504] state change SNK_ATTACHED -> SNK_STARTUP [rev3 NONE_AMS] [ 168.570634] pending state change SNK_STARTUP -> SNK_DISCOVERY @ 500 ms [rev3 NONE_AMS] [ 169.070689] state change SNK_STARTUP -> SNK_DISCOVERY [delayed 500 ms] [ 169.070695] Setting voltage/current limit 5000 mV 3000 mA [ 169.070702] vbus=0 charge:=1 [ 169.072719] state change SNK_DISCOVERY -> SNK_WAIT_CAPABILITIES [rev3 NONE_AMS] [ 169.073145] pending state change SNK_WAIT_CAPABILITIES -> HARD_RESET_SEND @ 450 ms [rev3 NONE_AMS] [ 169.077162] PD RX, header: 0x2161 [1] [ 169.077172] PDO 0: type 0, 5000 mV, 3000 mA [E] [ 169.077178] PDO 1: type 0, 9000 mV, 2000 mA [] [ 169.077183] state change SNK_WAIT_CAPABILITIES -> SNK_NEGOTIATE_CAPABILITIES [rev2 POWER_NEGOTIATION] [ 169.077191] Setting usb_comm capable false [ 169.077753] cc=0 cc1=0 cc2=5 vbus=0 vconn=sink polarity=1 [ 169.077759] Requesting PDO 1: 9000 mV, 2000 mA [ 169.077762] PD TX, header: 0x1042 [ 169.079990] PD TX complete, status: 0 [ 169.080013] pending state change SNK_NEGOTIATE_CAPABILITIES -> HARD_RESET_SEND @ 60 ms [rev2 POWER_NEGOTIATION] [ 169.083183] VBUS on [ 169.084195] PD RX, header: 0x363 [1] [ 169.084200] state change SNK_NEGOTIATE_CAPABILITIES -> SNK_TRANSITION_SINK [rev2 POWER_NEGOTIATION] [ 169.084206] Setting standby current 5000 mV @ 500 mA [ 169.084209] Setting voltage/current limit 5000 mV 500 mA [ 169.084220] pending state change SNK_TRANSITION_SINK -> HARD_RESET_SEND @ 500 ms [rev2 POWER_NEGOTIATION] [ 169.260222] PD RX, header: 0x566 [1] [ 169.260227] Setting voltage/current limit 9000 mV 2000 mA [ 169.261315] set_auto_vbus_discharge_threshold mode:3 pps_active:n vbus:9000 ret:0 [ 169.261321] state change SNK_TRANSITION_SINK -> SNK_READY [rev2 POWER_NEGOTIATION] [ 169.261570] AMS POWER_NEGOTIATION finished Fixes: f0690a25a140b ("staging: typec: USB Type-C Port Manager (tcpm)") Reviewed-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Badhri Jagan Sridharan <badhri@google.com> Link: https://lore.kernel.org/r/20210414024000.4175263-1-badhri@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14	security: keys: trusted: fix TPM2 authorizations	James Bottomley
	[ Upstream commit de66514d934d70ce73c302ce0644b54970fc7196 ] In TPM 1.2 an authorization was a 20 byte number. The spec actually recommended you to hash variable length passwords and use the sha1 hash as the authorization. Because the spec doesn't require this hashing, the current authorization for trusted keys is a 40 digit hex number. For TPM 2.0 the spec allows the passing in of variable length passwords and passphrases directly, so we should allow that in trusted keys for ease of use. Update the 'blobauth' parameter to take this into account, so we can now use plain text passwords for the keys. so before keyctl add trusted kmk "new 32 blobauth=f572d396fae9206628714fb2ce00f72e94f2258fkeyhandle=81000001" @u after we will accept both the old hex sha1 form as well as a new directly supplied password: keyctl add trusted kmk "new 32 blobauth=hello keyhandle=81000001" @u Since a sha1 hex code must be exactly 40 bytes long and a direct password must be 20 or less, we use the length as the discriminator for which form is input. Note this is both and enhancement and a potential bug fix. The TPM 2.0 spec requires us to strip leading zeros, meaning empyty authorization is a zero length HMAC whereas we're currently passing in 20 bytes of zeros. A lot of TPMs simply accept this as OK, but the Microsoft TPM emulator rejects it with TPM_RC_BAD_AUTH, so this patch makes the Microsoft TPM emulator work with trusted keys. Fixes: 0fe5480303a1 ("keys, trusted: seal/unseal with TPM 2.0 chips") Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14	tty: fix return value for unsupported ioctls	Johan Hovold
	[ Upstream commit 1b8b20868a6d64cfe8174a21b25b74367bdf0560 ] Drivers should return -ENOTTY ("Inappropriate I/O control operation") when an ioctl isn't supported, while -EINVAL is used for invalid arguments. Fix up the TIOCMGET, TIOCMSET and TIOCGICOUNT helpers which returned -EINVAL when a tty driver did not implement the corresponding operations. Note that the TIOCMGET and TIOCMSET helpers predate git and do not get a corresponding Fixes tag below. Fixes: d281da7ff6f7 ("tty: Make tiocgicount a handler") Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/r/20210407095208.31838-3-johan@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14	tty: actually undefine superseded ASYNC flags	Johan Hovold
	[ Upstream commit d09845e98a05850a8094ea8fd6dd09a8e6824fff ] Some kernel-internal ASYNC flags have been superseded by tty-port flags and should no longer be used by kernel drivers. Fix the misspelled "__KERNEL__" compile guards which failed their sole purpose to break out-of-tree drivers that have not yet been updated. Fixes: 5c0517fefc92 ("tty: core: Undefine ASYNC_* flags superceded by TTY_PORT* flags") Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/r/20210407095208.31838-2-johan@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14	PM: runtime: Replace inline function pm_runtime_callbacks_present()	YueHaibing
	[ Upstream commit 953c1fd96b1a70bcbbfb10973c2126eba8d891c7 ] Commit 9a7875461fd0 ("PM: runtime: Replace pm_runtime_callbacks_present()") forgot to change the inline version. Fixes: 9a7875461fd0 ("PM: runtime: Replace pm_runtime_callbacks_present()") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14	spi: Fix use-after-free with devm_spi_alloc_*	William A. Kennington III
	[ Upstream commit 794aaf01444d4e765e2b067cba01cc69c1c68ed9 ] We can't rely on the contents of the devres list during spi_unregister_controller(), as the list is already torn down at the time we perform devres_find() for devm_spi_release_controller. This causes devices registered with devm_spi_alloc_{master,slave}() to be mistakenly identified as legacy, non-devm managed devices and have their reference counters decremented below 0. ------------[ cut here ]------------ WARNING: CPU: 1 PID: 660 at lib/refcount.c:28 refcount_warn_saturate+0x108/0x174 [<b0396f04>] (refcount_warn_saturate) from [<b03c56a4>] (kobject_put+0x90/0x98) [<b03c5614>] (kobject_put) from [<b0447b4c>] (put_device+0x20/0x24) r4:b6700140 [<b0447b2c>] (put_device) from [<b07515e8>] (devm_spi_release_controller+0x3c/0x40) [<b07515ac>] (devm_spi_release_controller) from [<b045343c>] (release_nodes+0x84/0xc4) r5:b6700180 r4:b6700100 [<b04533b8>] (release_nodes) from [<b0454160>] (devres_release_all+0x5c/0x60) r8:b1638c54 r7:b117ad94 r6:b1638c10 r5:b117ad94 r4:b163dc10 [<b0454104>] (devres_release_all) from [<b044e41c>] (__device_release_driver+0x144/0x1ec) r5:b117ad94 r4:b163dc10 [<b044e2d8>] (__device_release_driver) from [<b044f70c>] (device_driver_detach+0x84/0xa0) r9:00000000 r8:00000000 r7:b117ad94 r6:b163dc54 r5:b1638c10 r4:b163dc10 [<b044f688>] (device_driver_detach) from [<b044d274>] (unbind_store+0xe4/0xf8) Instead, determine the devm allocation state as a flag on the controller which is guaranteed to be stable during cleanup. Fixes: 5e844cc37a5c ("spi: Introduce device-managed SPI controller allocation") Signed-off-by: William A. Kennington III <wak@google.com> Link: https://lore.kernel.org/r/20210407095527.2771582-1-wak@google.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14	driver core: platform: Declare early_platform_cleanup() prototype	Andy Shevchenko
	[ Upstream commit 1768289b44bae847612751d418fc5c5e680b5e5c ] Compiler is not happy: CC drivers/base/platform.o drivers/base/platform.c:1557:20: warning: no previous prototype for ‘early_platform_cleanup’ [-Wmissing-prototypes] 1557 \| void __weak __init early_platform_cleanup(void) { } \| ^~~~~~~~~~~~~~~~~~~~~~ Declare early_platform_cleanup() prototype in the header to make everyone happy. Fixes: eecd37e105f0 ("drivers: Fix boot problem on SuperH") Cc: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20210331150525.59223-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14	crypto: poly1305 - fix poly1305_core_setkey() declaration	Arnd Bergmann
	[ Upstream commit 8d195e7a8ada68928f2aedb2c18302a4518fe68e ] gcc-11 points out a mismatch between the declaration and the definition of poly1305_core_setkey(): lib/crypto/poly1305-donna32.c:13:67: error: argument 2 of type ‘const u8[16]’ {aka ‘const unsigned char[16]’} with mismatched bound [-Werror=array-parameter=] 13 \| void poly1305_core_setkey(struct poly1305_core_key key, const u8 raw_key[16]) \| ~~~~~~~~~^~~~~~~~~~~ In file included from lib/crypto/poly1305-donna32.c:11: include/crypto/internal/poly1305.h:21:68: note: previously declared as ‘const u8 ’ {aka ‘const unsigned char ’} 21 \| void poly1305_core_setkey(struct poly1305_core_key key, const u8 *raw_key); This is harmless in principle, as the calling conventions are the same, but the more specific prototype allows better type checking in the caller. Change the declaration to match the actual function definition. The poly1305_simd_init() is a bit suspicious here, as it previously had a 32-byte argument type, but looks like it needs to take the 16-byte POLY1305_BLOCK_SIZE array instead. Fixes: 1c08a104360f ("crypto: poly1305 - add new 32 and 64-bit generic versions") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14	firmware: xilinx: Remove zynqmp_pm_get_eemi_ops() in ↵	Nobuhiro Iwamatsu
	IS_REACHABLE(CONFIG_ZYNQMP_FIRMWARE) [ Upstream commit 79bfe480a0a0b259ab9fddcd2fe52c03542b1196 ] zynqmp_pm_get_eemi_ops() was removed in commit 4db8180ffe7c: "Firmware: xilinx: Remove eemi ops for fpga related APIs", but not in IS_REACHABLE(CONFIG_ZYNQMP_FIRMWARE). Any driver who want to communicate with PMC using EEMI APIs use the functions provided for each function This removed zynqmp_pm_get_eemi_ops() in IS_REACHABLE(CONFIG_ZYNQMP_FIRMWARE), and also modify the documentation for this driver. Fixes: 4db8180ffe7c ("firmware: xilinx: Remove eemi ops for fpga related APIs") Signed-off-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org> Link: https://lore.kernel.org/r/20210215155849.2425846-1-iwamatsu@nigauri.org Signed-off-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14	KVM: Stop looking for coalesced MMIO zones if the bus is destroyed	Sean Christopherson
	commit 5d3c4c79384af06e3c8e25b7770b6247496b4417 upstream. Abort the walk of coalesced MMIO zones if kvm_io_bus_unregister_dev() fails to allocate memory for the new instance of the bus. If it can't instantiate a new bus, unregister_dev() destroys all devices _except_ the target device. But, it doesn't tell the caller that it obliterated the bus and invoked the destructor for all devices that were on the bus. In the coalesced MMIO case, this can result in a deleted list entry dereference due to attempting to continue iterating on coalesced_zones after future entries (in the walk) have been deleted. Opportunistically add curly braces to the for-loop, which encompasses many lines but sneaks by without braces due to the guts being a single if statement. Fixes: f65886606c2d ("KVM: fix memory leak in kvm_io_bus_unregister_dev()") Cc: stable@vger.kernel.org Reported-by: Hao Sun <sunhao.th@gmail.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210412222050.876100-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-14	Bluetooth: verify AMP hci_chan before amp_destroy	Archie Pusaka
	commit 5c4c8c9544099bb9043a10a5318130a943e32fc3 upstream. hci_chan can be created in 2 places: hci_loglink_complete_evt() if it is an AMP hci_chan, or l2cap_conn_add() otherwise. In theory, Only AMP hci_chan should be removed by a call to hci_disconn_loglink_complete_evt(). However, the controller might mess up, call that function, and destroy an hci_chan which is not initiated by hci_loglink_complete_evt(). This patch adds a verification that the destroyed hci_chan must have been init'd by hci_loglink_complete_evt(). Example crash call trace: Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xe3/0x144 lib/dump_stack.c:118 print_address_description+0x67/0x22a mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report mm/kasan/report.c:412 [inline] kasan_report+0x251/0x28f mm/kasan/report.c:396 hci_send_acl+0x3b/0x56e net/bluetooth/hci_core.c:4072 l2cap_send_cmd+0x5af/0x5c2 net/bluetooth/l2cap_core.c:877 l2cap_send_move_chan_cfm_icid+0x8e/0xb1 net/bluetooth/l2cap_core.c:4661 l2cap_move_fail net/bluetooth/l2cap_core.c:5146 [inline] l2cap_move_channel_rsp net/bluetooth/l2cap_core.c:5185 [inline] l2cap_bredr_sig_cmd net/bluetooth/l2cap_core.c:5464 [inline] l2cap_sig_channel net/bluetooth/l2cap_core.c:5799 [inline] l2cap_recv_frame+0x1d12/0x51aa net/bluetooth/l2cap_core.c:7023 l2cap_recv_acldata+0x2ea/0x693 net/bluetooth/l2cap_core.c:7596 hci_acldata_packet net/bluetooth/hci_core.c:4606 [inline] hci_rx_work+0x2bd/0x45e net/bluetooth/hci_core.c:4796 process_one_work+0x6f8/0xb50 kernel/workqueue.c:2175 worker_thread+0x4fc/0x670 kernel/workqueue.c:2321 kthread+0x2f0/0x304 kernel/kthread.c:253 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415 Allocated by task 38: set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0x8d/0x9a mm/kasan/kasan.c:553 kmem_cache_alloc_trace+0x102/0x129 mm/slub.c:2787 kmalloc include/linux/slab.h:515 [inline] kzalloc include/linux/slab.h:709 [inline] hci_chan_create+0x86/0x26d net/bluetooth/hci_conn.c:1674 l2cap_conn_add.part.0+0x1c/0x814 net/bluetooth/l2cap_core.c:7062 l2cap_conn_add net/bluetooth/l2cap_core.c:7059 [inline] l2cap_connect_cfm+0x134/0x852 net/bluetooth/l2cap_core.c:7381 hci_connect_cfm+0x9d/0x122 include/net/bluetooth/hci_core.h:1404 hci_remote_ext_features_evt net/bluetooth/hci_event.c:4161 [inline] hci_event_packet+0x463f/0x72fa net/bluetooth/hci_event.c:5981 hci_rx_work+0x197/0x45e net/bluetooth/hci_core.c:4791 process_one_work+0x6f8/0xb50 kernel/workqueue.c:2175 worker_thread+0x4fc/0x670 kernel/workqueue.c:2321 kthread+0x2f0/0x304 kernel/kthread.c:253 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415 Freed by task 1732: set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free mm/kasan/kasan.c:521 [inline] __kasan_slab_free+0x106/0x128 mm/kasan/kasan.c:493 slab_free_hook mm/slub.c:1409 [inline] slab_free_freelist_hook+0xaa/0xf6 mm/slub.c:1436 slab_free mm/slub.c:3009 [inline] kfree+0x182/0x21e mm/slub.c:3972 hci_disconn_loglink_complete_evt net/bluetooth/hci_event.c:4891 [inline] hci_event_packet+0x6a1c/0x72fa net/bluetooth/hci_event.c:6050 hci_rx_work+0x197/0x45e net/bluetooth/hci_core.c:4791 process_one_work+0x6f8/0xb50 kernel/workqueue.c:2175 worker_thread+0x4fc/0x670 kernel/workqueue.c:2321 kthread+0x2f0/0x304 kernel/kthread.c:253 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415 The buggy address belongs to the object at ffff8881d7af9180 which belongs to the cache kmalloc-128 of size 128 The buggy address is located 24 bytes inside of 128-byte region [ffff8881d7af9180, ffff8881d7af9200) The buggy address belongs to the page: page:ffffea00075ebe40 count:1 mapcount:0 mapping:ffff8881da403200 index:0x0 flags: 0x8000000000000200(slab) raw: 8000000000000200 dead000000000100 dead000000000200 ffff8881da403200 raw: 0000000000000000 0000000080150015 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8881d7af9080: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb ffff8881d7af9100: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc >ffff8881d7af9180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8881d7af9200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8881d7af9280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc Signed-off-by: Archie Pusaka <apusaka@chromium.org> Reported-by: syzbot+98228e7407314d2d4ba2@syzkaller.appspotmail.com Reviewed-by: Alain Michaud <alainm@chromium.org> Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: George Kennedy <george.kennedy@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-12	media: v4l2-ctrls: fix reference to freed memory	Hans Verkuil
	commit ac34b79da14d67a9b494f6125186becbd067e225 upstream. When controls are used together with the Request API, then for each request a v4l2_ctrl_handler struct is allocated. This contains the controls that can be set in a request. If a control is not set in the request, then the value used in the most recent previous request must be used, or the current value if it is not found in any outstanding requests. The framework tried to find such a previous request and it would set the 'req' pointer in struct v4l2_ctrl_ref to the v4l2_ctrl_ref of the control in such a previous request. So far, so good. However, when that previous request was applied to the hardware, returned to userspace, and then userspace would re-init or free that request, any 'ref' pointer in still-queued requests would suddenly point to freed memory. This was not noticed before since the drivers that use this expected that each request would always have the controls set, so there was never any need to find a control in older requests. This requirement was relaxed, and now this bug surfaced. It was also made worse by changeset 2fae4d6aabc8 ("media: v4l2-ctrls: v4l2_ctrl_request_complete() should always set ref->req") which increased the chance of this happening. The use of the 'req' pointer in v4l2_ctrl_ref was very fragile, so drop this entirely. Instead add a valid_p_req bool to indicate that p_req contains a valid value for this control. And if it is false, then just use the current value of the control. Note that VIDIOC_G_EXT_CTRLS will always return -EACCES when attempting to get a control from a request until the request is completed. And in that case, all controls in the request will have the control value set (i.e. valid_p_req is true). This means that the whole 'find the most recent previous request containing a control' idea is pointless, and the code can be simplified considerably. The v4l2_g_ext_ctrls_common() function was refactored a bit to make it more understandable. It also avoids updating volatile controls in a completed request since that was already done when the request was completed. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Fixes: 2fae4d6aabc8 ("media: v4l2-ctrls: v4l2_ctrl_request_complete() should always set ref->req") Fixes: 6fa6f831f095 ("media: v4l2-ctrls: add core request support") Cc: <stable@vger.kernel.org> # for v5.9 and up Tested-by: Alexandre Courbot <acourbot@chromium.org> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-12	Fix misc new gcc warnings	Linus Torvalds
	commit e7c6e405e171fb33990a12ecfd14e6500d9e5cf2 upstream. It seems like Fedora 34 ends up enabling a few new gcc warnings, notably "-Wstringop-overread" and "-Warray-parameter". Both of them cause what seem to be valid warnings in the kernel, where we have array size mismatches in function arguments (that are no longer just silently converted to a pointer to element, but actually checked). This fixes most of the trivial ones, by making the function declaration match the function definition, and in the case of intel_pm.c, removing the over-specified array size from the argument declaration. At least one 'stringop-overread' warning remains in the i915 driver, but that one doesn't have the same obvious trivial fix, and may or may not actually be indicative of a bug. [ It was a mistake to upgrade one of my machines to Fedora 34 while being busy with the merge window, but if this is the extent of the compiler upgrade problems, things are better than usual - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrey Zhizhikin <andrey.z@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-12	perf: Rework perf_event_exit_event()	Peter Zijlstra
	[ Upstream commit ef54c1a476aef7eef26fe13ea10dc090952c00f8 ] Make perf_event_exit_event() more robust, such that we can use it from other contexts. Specifically the up and coming remove_on_exec. For this to work we need to address a few issues. Remove_on_exec will not destroy the entire context, so we cannot rely on TASK_TOMBSTONE to disable event_function_call() and we thus have to use perf_remove_from_context(). When using perf_remove_from_context(), there's two races to consider. The first is against close(), where we can have concurrent tear-down of the event. The second is against child_list iteration, which should not find a half baked event. To address this, teach perf_remove_from_context() to special case !ctx->is_active and about DETACH_CHILD. [ elver@google.com: fix racing parent/child exit in sync_child_event(). ] Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210408103605.1676875-2-elver@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	mfd: da9063: Support SMBus and I2C mode	Hubert Streidl
	[ Upstream commit 586478bfc9f7e16504d6f64cf18bcbdf6fd0cbc9 ] By default the PMIC DA9063 2-wire interface is SMBus compliant. This means the PMIC will automatically reset the interface when the clock signal ceases for more than the SMBus timeout of 35 ms. If the I2C driver / device is not capable of creating atomic I2C transactions, a context change can cause a ceasing of the clock signal. This can happen if for example a real-time thread is scheduled. Then the DA9063 in SMBus mode will reset the 2-wire interface. Subsequently a write message could end up in the wrong register. This could cause unpredictable system behavior. The DA9063 PMIC also supports an I2C compliant mode for the 2-wire interface. This mode does not reset the interface when the clock signal ceases. Thus the problem depicted above does not occur. This patch tests for the bus functionality "I2C_FUNC_I2C". It can reasonably be assumed that the bus cannot obey SMBus timings if this functionality is set. SMBus commands most probably are emulated in this case which is prone to the latency issue described above. This patch enables the I2C bus mode if I2C_FUNC_I2C is set or otherwise keeps the default SMBus mode. Signed-off-by: Hubert Streidl <hubert.streidl@de.bosch.com> Signed-off-by: Mark Jonas <mark.jonas@de.bosch.com> Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	mfd: intel-m10-bmc: Fix the register access range	Xu Yilun
	[ Upstream commit d9b326b2c3673f939941806146aee38e5c635fd0 ] This patch fixes the max register address of MAX 10 BMC. The range 0x20000000 ~ 0x200000fc are for control registers of the QSPI flash controller, which are not accessible to host. Signed-off-by: Xu Yilun <yilun.xu@intel.com> Reviewed-by: Tom Rix <trix@redhat.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	power: supply: bq27xxx: fix power_avg for newer ICs	Matthias Schiffer
	[ Upstream commit c4d57c22ac65bd503716062a06fad55a01569cac ] On all newer bq27xxx ICs, the AveragePower register contains a signed value; in addition to handling the raw value as unsigned, the driver code also didn't convert it to µW as expected. At least for the BQ28Z610, the reference manual incorrectly states that the value is in units of 1mW and not 10mW. I have no way of knowing whether the manuals of other supported ICs contain the same error, or if there are models that actually use 1mW. At least, the new code shouldn't be less correct than the old version for any device. power_avg is removed from the cache structure, se we don't have to extend it to store both a signed value and an error code. Always getting an up-to-date value may be desirable anyways, as it avoids inconsistent current and power readings when switching between charging and discharging. Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	resource: Prevent irqresource_disabled() from erasing flags	Angela Czubak
	[ Upstream commit d08a745729646f407277e904b02991458f20d261 ] Some Chromebooks use hard-coded interrupts in their ACPI tables. This is an excerpt as dumped on Relm: ... Name (_HID, "ELAN0001") // _HID: Hardware ID Name (_DDN, "Elan Touchscreen ") // _DDN: DOS Device Name Name (_UID, 0x05) // _UID: Unique ID Name (ISTP, Zero) Method (_CRS, 0, NotSerialized) // _CRS: Current Resource Settings { Name (BUF0, ResourceTemplate () { I2cSerialBusV2 (0x0010, ControllerInitiated, 0x00061A80, AddressingMode7Bit, "\\_SB.I2C1", 0x00, ResourceConsumer, , Exclusive, ) Interrupt (ResourceConsumer, Edge, ActiveLow, Exclusive, ,, ) { 0x000000B8, } }) Return (BUF0) /* \_SB_.I2C1.ETSA._CRS.BUF0 */ } ... This interrupt is hard-coded to 0xB8 = 184 which is too high to be mapped to IO-APIC, so no triggering information is propagated as acpi_register_gsi() fails and irqresource_disabled() is issued, which leads to erasing triggering and polarity information. Do not overwrite flags as it leads to erasing triggering and polarity information which might be useful in case of hard-coded interrupts. This way the information can be read later on even though mapping to APIC domain failed. Signed-off-by: Angela Czubak <acz@semihalf.com> [ rjw: Changelog rearrangement ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	random: initialize ChaCha20 constants with correct endianness	Eric Biggers
	[ Upstream commit a181e0fdb2164268274453b5b291589edbb9b22d ] On big endian CPUs, the ChaCha20-based CRNG is using the wrong endianness for the ChaCha20 constants. This doesn't matter cryptographically, but technically it means it's not ChaCha20 anymore. Fix it to always use the standard constants. Cc: linux-crypto@vger.kernel.org Cc: Andy Lutomirski <luto@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Theodore Ts'o <tytso@mit.edu> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	usb: webcam: Invalid size of Processing Unit Descriptor	Pawel Laszczak
	[ Upstream commit 6a154ec9ef6762c774cd2b50215c7a8f0f08a862 ] According with USB Device Class Definition for Video Device the Processing Unit Descriptor bLength should be 12 (10 + bmControlSize), but it has 11. Invalid length caused that Processing Unit Descriptor Test Video form CV tool failed. To fix this issue patch adds bmVideoStandards into uvc_processing_unit_descriptor structure. The bmVideoStandards field was added in UVC 1.1 and it wasn't part of UVC 1.0a. Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Pawel Laszczak <pawell@cadence.com> Reviewed-by: Peter Chen <peter.chen@kernel.org> Link: https://lore.kernel.org/r/20210315071748.29706-1-pawell@gli-login.cadence.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	crypto: api - check for ERR pointers in crypto_destroy_tfm()	Ard Biesheuvel
	[ Upstream commit 83681f2bebb34dbb3f03fecd8f570308ab8b7c2c ] Given that crypto_alloc_tfm() may return ERR pointers, and to avoid crashes on obscure error paths where such pointers are presented to crypto_destroy_tfm() (such as [0]), add an ERR_PTR check there before dereferencing the second argument as a struct crypto_tfm pointer. [0] https://lore.kernel.org/linux-crypto/000000000000de949705bc59e0f6@google.com/ Reported-by: syzbot+12cf5fbfdeba210a89dd@syzkaller.appspotmail.com Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-12	mmc: core: Fix hanging on I/O during system suspend for removable cards	Ulf Hansson
	commit 17a17bf50612e6048a9975450cf1bd30f93815b5 upstream. The mmc core uses a PM notifier to temporarily during system suspend, turn off the card detection mechanism for removal/insertion of (e)MMC/SD/SDIO cards. Additionally, the notifier may be used to remove an SDIO card entirely, if a corresponding SDIO functional driver don't have the system suspend/resume callbacks assigned. This behaviour has been around for a very long time. However, a recent bug report tells us there are problems with this approach. More precisely, when receiving the PM_SUSPEND_PREPARE notification, we may end up hanging on I/O to be completed, thus also preventing the system from getting suspended. In the end what happens, is that the cancel_delayed_work_sync() in mmc_pm_notify() ends up waiting for mmc_rescan() to complete - and since mmc_rescan() wants to claim the host, it needs to wait for the I/O to be completed first. Typically, this problem is triggered in Android, if there is ongoing I/O while the user decides to suspend, resume and then suspend the system again. This due to that after the resume, an mmc_rescan() work gets punted to the workqueue, which job is to verify that the card remains inserted after the system has resumed. To fix this problem, userspace needs to become frozen to suspend the I/O, prior to turning off the card detection mechanism. Therefore, let's drop the PM notifiers for mmc subsystem altogether and rely on the card detection to be turned off/on as a part of the system_freezable_wq, that we are already using. Moreover, to allow and SDIO card to be removed during system suspend, let's manage this from a ->prepare() callback, assigned at the mmc_host_class level. In this way, we can use the parent device (the mmc_host_class device), to remove the card device that is the child, in the device_prepare() phase. Reported-by: Kiwoong Kim <kwmad.kim@samsung.com> Cc: stable@vger.kernel.org # v4.5+ Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20210310152900.149380-1-ulf.hansson@linaro.org Reviewed-by: Kiwoong Kim <kwmad.kim@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-12	reset: add missing empty function reset_control_rearm()	Jim Quinlan
	commit 48582b2e3b87b794a9845d488af2c76ce055502b upstream. All other functions are defined for when CONFIG_RESET_CONTROLLER is not set. Fixes: 557acb3d2cd9 ("reset: make shared pulsed reset controls re-triggerable") Link: https://lore.kernel.org/r/20210430152156.21162-2-jim2101024@gmail.com Signed-off-by: Jim Quinlan <jim2101024@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org # v5.11+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-07	bpf: Fix leakage of uninitialized bpf stack under speculation	Daniel Borkmann
	commit 801c6058d14a82179a7ee17a4b532cac6fad067f upstream. The current implemented mechanisms to mitigate data disclosure under speculation mainly address stack and map value oob access from the speculative domain. However, Piotr discovered that uninitialized BPF stack is not protected yet, and thus old data from the kernel stack, potentially including addresses of kernel structures, could still be extracted from that 512 bytes large window. The BPF stack is special compared to map values since it's not zero initialized for every program invocation, whereas map values /are/ zero initialized upon their initial allocation and thus cannot leak any prior data in either domain. In the non-speculative domain, the verifier ensures that every stack slot read must have a prior stack slot write by the BPF program to avoid such data leaking issue. However, this is not enough: for example, when the pointer arithmetic operation moves the stack pointer from the last valid stack offset to the first valid offset, the sanitation logic allows for any intermediate offsets during speculative execution, which could then be used to extract any restricted stack content via side-channel. Given for unprivileged stack pointer arithmetic the use of unknown but bounded scalars is generally forbidden, we can simply turn the register-based arithmetic operation into an immediate-based arithmetic operation without the need for masking. This also gives the benefit of reducing the needed instructions for the operation. Given after the work in 7fedb63a8307 ("bpf: Tighten speculative pointer arithmetic mask"), the aux->alu_limit already holds the final immediate value for the offset register with the known scalar. Thus, a simple mov of the immediate to AX register with using AX as the source for the original instruction is sufficient and possible now in this case. Reported-by: Piotr Krysiuk <piotras@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Piotr Krysiuk <piotras@gmail.com> Reviewed-by: Piotr Krysiuk <piotras@gmail.com> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-23	Merge tag 'gpio-fixes-for-v5.12' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fix from Bartosz Golaszewski: "Save and restore the sysconfig register in gpio-omap to fix a power-management issue" * tag 'gpio-fixes-for-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpio: omap: Save and restore sysconfig
2021-04-21	gpio: omap: Save and restore sysconfig	Tony Lindgren
	As we are using cpu_pm to save and restore context, we must also save and restore the GPIO sysconfig register. This is needed because we are not calling PM runtime functions at all with cpu_pm. We need to save the sysconfig on idle as it's value can get reconfigured by PM runtime and can be different from the init time value. Device specific flags like "ti,no-idle-on-init" can affect the init value. Fixes: b764a5863fd8 ("gpio: omap: Remove custom PM calls and use cpu_pm instead") Cc: Aaro Koskinen <aaro.koskinen@iki.fi> Cc: Adam Ford <aford173@gmail.com> Cc: Andreas Kemnade <andreas@kemnade.info> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Cc: Peter Ujfalusi <peter.ujfalusi@gmail.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Acked-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
2021-04-20	capabilities: require CAP_SETFCAP to map uid 0	Serge E. Hallyn
	cap_setfcap is required to create file capabilities. Since commit 8db6c34f1dbc ("Introduce v3 namespaced file capabilities"), a process running as uid 0 but without cap_setfcap is able to work around this as follows: unshare a new user namespace which maps parent uid 0 into the child namespace. While this task will not have new capabilities against the parent namespace, there is a loophole due to the way namespaced file capabilities are represented as xattrs. File capabilities valid in userns 1 are distinguished from file capabilities valid in userns 2 by the kuid which underlies uid 0. Therefore the restricted root process can unshare a new self-mapping namespace, add a namespaced file capability onto a file, then use that file capability in the parent namespace. To prevent that, do not allow mapping parent uid 0 if the process which opened the uid_map file does not have CAP_SETFCAP, which is the capability for setting file capabilities. As a further wrinkle: a task can unshare its user namespace, then open its uid_map file itself, and map (only) its own uid. In this case we do not have the credential from before unshare, which was potentially more restricted. So, when creating a user namespace, we record whether the creator had CAP_SETFCAP. Then we can use that during map_write(). With this patch: 1. Unprivileged user can still unshare -Ur ubuntu@caps:~$ unshare -Ur root@caps:~# logout 2. Root user can still unshare -Ur ubuntu@caps:~$ sudo bash root@caps:/home/ubuntu# unshare -Ur root@caps:/home/ubuntu# logout 3. Root user without CAP_SETFCAP cannot unshare -Ur: root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap -- root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap unable to set CAP_SETFCAP effective capability: Operation not permitted root@caps:/home/ubuntu# unshare -Ur unshare: write failed /proc/self/uid_map: Operation not permitted Note: an alternative solution would be to allow uid 0 mappings by processes without CAP_SETFCAP, but to prevent such a namespace from writing any file capabilities. This approach can be seen at [1]. Background history: commit 95ebabde382 ("capabilities: Don't allow writing ambiguous v3 file capabilities") tried to fix the issue by preventing v3 fscaps to be written to disk when the root uid would map to the same uid in nested user namespaces. This led to regressions for various workloads. For example, see [2]. Ultimately this is a valid use-case we have to support meaning we had to revert this change in 3b0c2d3eaa83 ("Revert 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file capabilities")"). Link: https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4 [1] Link: https://github.com/containers/buildah/issues/3071 [2] Signed-off-by: Serge Hallyn <serge@hallyn.com> Reviewed-by: Andrew G. Morgan <morgan@kernel.org> Tested-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-17	Merge tag 'net-5.12-rc8' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.12-rc8, including fixes from netfilter, and bpf. BPF verifier changes stand out, otherwise things have slowed down. Current release - regressions: - gro: ensure frag0 meets IP header alignment - Revert "net: stmmac: re-init rx buffers when mac resume back" - ethernet: macb: fix the restore of cmp registers Previous releases - regressions: - ixgbe: Fix NULL pointer dereference in ethtool loopback test - ixgbe: fix unbalanced device enable/disable in suspend/resume - phy: marvell: fix detection of PHY on Topaz switches - make tcp_allowed_congestion_control readonly in non-init netns - xen-netback: Check for hotplug-status existence before watching Previous releases - always broken: - bpf: mitigate a speculative oob read of up to map value size by tightening the masking window - sctp: fix race condition in sctp_destroy_sock - sit, ip6_tunnel: Unregister catch-all devices - netfilter: nftables: clone set element expression template - netfilter: flowtable: fix NAT IPv6 offload mangling - net: geneve: check skb is large enough for IPv4/IPv6 header - netlink: don't call ->netlink_bind with table lock held" * tag 'net-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits) netlink: don't call ->netlink_bind with table lock held MAINTAINERS: update my email bpf: Update selftests to reflect new error states bpf: Tighten speculative pointer arithmetic mask bpf: Move sanitize_val_alu out of op switch bpf: Refactor and streamline bounds check into helper bpf: Improve verifier error messages for users bpf: Rework ptr_limit into alu_limit and add common error path bpf: Ensure off_reg has no mixed signed bounds for all types bpf: Move off_reg into sanitize_ptr_alu bpf: Use correct permission flag for mixed signed bounds arithmetic ch_ktls: do not send snd_una update to TCB in middle ch_ktls: tcb close causes tls connection failure ch_ktls: fix device connection close ch_ktls: Fix kernel panic i40e: fix the panic when running bpf in xdpdrv mode net/mlx5e: fix ingress_ifindex check in mlx5e_flower_parse_meta net/mlx5e: Fix setting of RS FEC mode net/mlx5: Fix setting of devlink traps in switchdev mode Revert "net: stmmac: re-init rx buffers when mac resume back" ...