summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2022-09-27nvme: improve the NVME_CONNECT_AUTHREQ* definitionsChristoph Hellwig
Mark them as unsigned so that we don't need extra casts, and define them relative to cdword0 instead of requiring extra shifts. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.de>
2022-09-26blk-cgroup: pass a gendisk to blkcg_schedule_throttleChristoph Hellwig
Pass the gendisk to blkcg_schedule_throttle as part of moving the blk-cgroup infrastructure to be gendisk based. Remove the unused !BLK_CGROUP stub while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Herrmann <aherrmann@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220921180501.1539876-17-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-23ublk_drv: add START_USER_RECOVERY and END_USER_RECOVERY supportZiyangZhang
START_USER_RECOVERY and END_USER_RECOVERY are two new control commands to support user recovery feature. After a crash, user should send START_USER_RECOVERY, it will: (1) check if (a)current ublk_device is UBLK_S_DEV_QUIESCED which was set by quiesce_work and (b)chardev is released (2) reinit all ubqs, including: (a) put the task_struct and reset ->ubq_daemon to NULL. (b) reset all ublk_io. (3) reset ub->mm to NULL. Then, user should start a new process and send FETCH_REQ on each ubq_daemon. Finally, user should send END_USER_RECOVERY, it will: (1) wait for all new ubq_daemons getting ready. (2) update ublksrv_pid (3) unquiesce the request queue and expect incoming ublk_queue_rq() (4) convert ub's state to UBLK_S_DEV_LIVE Note: we can handle STOP_DEV between START_USER_RECOVERY and END_USER_RECOVERY. This is helpful to users who cannot start new process after sending START_USER_RECOVERY ctrl-cmd. Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220923153919.44078-7-ZiyangZhang@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-23ublk_drv: support UBLK_F_USER_RECOVERY_REISSUEZiyangZhang
UBLK_F_USER_RECOVERY_REISSUE implies that: With a dying ubq_daemon, ublk_drv let monitor_work requeues rq issued to userspace(ublksrv) before the ubq_daemon is dying. UBLK_F_USER_RECOVERY_REISSUE is designed for backends which: (1) tolerate double-write since ublk_drv may issue the same rq twice. (2) does not let frontend users get I/O error, such as read-only FS and VM backend. Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220923153919.44078-6-ZiyangZhang@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-23ublk_drv: define macros for recovery feature and check themZiyangZhang
Define some macros for recovery feature. UBLK_S_DEV_QUIESCED implies that ublk_device is quiesced and is ready for recovery. This state can be observed by userspace. UBLK_F_USER_RECOVERY implies that: (1) ublk_drv enables recovery feature. It won't let monitor_work to automatically abort rqs and release the device. (2) With a dying ubq_daemon, ublk_drv ends(aborts) rqs issued to userspace(ublksrv) before crash. (3) With a dying ubq_daemon, in task work and ublk_queue_rq(), ublk_drv requeues rqs. Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220923153919.44078-3-ZiyangZhang@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-21block: Fix the enum blk_eh_timer_return documentationBart Van Assche
The documentation of the blk_eh_timer_return enumeration values does not reflect correctly how e.g. the SCSI core uses these values. Fix the documentation. Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Fixes: 88b0cfad2888 ("block: document the blk_eh_timer_return values") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20220920200626.3422296-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-20block: remove PSI accounting from the bio layerChristoph Hellwig
PSI accounting is now done by the VM code, where it should have been since the beginning. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: https://lore.kernel.org/r/20220915094200.139713-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-20mm: add PSI accounting around ->read_folio and ->readahead callsChristoph Hellwig
PSI tries to account for the cost of bringing back in pages discarded by the MM LRU management. Currently the prime place for that is hooked into the bio submission path, which is a rather bad place: - it does not actually account I/O for non-block file systems, of which we have many - it adds overhead and a layering violation to the block layer Add the accounting into the two places in the core MM code that read pages into an address space by calling into ->read_folio and ->readahead so that the entire file system operations are covered, to broaden the coverage and allow removing the accounting in the block layer going forward. As psi_memstall_enter can deal with nested calls this will not lead to double accounting even while the bio annotations are still present. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: https://lore.kernel.org/r/20220915094200.139713-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-12blk-throttle: fix that io throttle can only work for single bioYu Kuai
Test scripts: cd /sys/fs/cgroup/blkio/ echo "8:0 1024" > blkio.throttle.write_bps_device echo $$ > cgroup.procs dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct & dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct & Test result: 10240 bytes (10 kB, 10 KiB) copied, 10.0134 s, 1.0 kB/s 10240 bytes (10 kB, 10 KiB) copied, 10.0135 s, 1.0 kB/s The problem is that the second bio is finished after 10s instead of 20s. Root cause: 1) second bio will be flagged: __blk_throtl_bio while (true) { ... if (sq->nr_queued[rw]) -> some bio is throttled already break }; bio_set_flag(bio, BIO_THROTTLED); -> flag the bio 2) flagged bio will be dispatched without waiting: throtl_dispatch_tg tg_may_dispatch tg_with_in_bps_limit if (bps_limit == U64_MAX || bio_flagged(bio, BIO_THROTTLED)) *wait = 0; -> wait time is zero return true; commit 9f5ede3c01f9 ("block: throttle split bio in case of iops limit") support to count split bios for iops limit, thus it adds flagged bio checking in tg_with_in_bps_limit() so that split bios will only count once for bps limit, however, it introduce a new problem that io throttle won't work if multiple bios are throttled. In order to fix the problem, handle iops/bps limit in different ways: 1) for iops limit, there is no flag to record if the bio is throttled, and iops is always applied. 2) for bps limit, original bio will be flagged with BIO_BPS_THROTTLED, and io throttle will ignore bio with the flag. Noted this patch also remove the code to set flag in __bio_clone(), it's introduced in commit 111be8839817 ("block-throttle: avoid double charge"), and author thinks split bio can be resubmited and throttled again, which is wrong because split bio will continue to dispatch from caller. Fixes: 9f5ede3c01f9 ("block: throttle split bio in case of iops limit") Cc: <stable@vger.kernel.org> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220829022240.3348319-2-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-12sbitmap: fix batched wait_cnt accountingKeith Busch
Batched completions can clear multiple bits, but we're only decrementing the wait_cnt by one each time. This can cause waiters to never be woken, stalling IO. Use the batched count instead. Link: https://bugzilla.kernel.org/show_bug.cgi?id=215679 Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20220909184022.1709476-1-kbusch@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-04Revert "sbitmap: fix batched wait_cnt accounting"Jens Axboe
This reverts commit 16ede66973c84f890c03584f79158dd5b2d725f5. This is causing issues with CPU stalls on my test box, revert it for now until we understand what is going on. It looks like infinite looping off sbitmap_queue_wake_up(), but hard to tell with a lot of CPUs hitting this issue and the console scrolling infinitely. Link: https://lore.kernel.org/linux-block/e742813b-ce5c-0d58-205b-1626f639b1bd@kernel.dk/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-01sbitmap: fix batched wait_cnt accountingKeith Busch
Batched completions can clear multiple bits, but we're only decrementing the wait_cnt by one each time. This can cause waiters to never be woken, stalling IO. Use the batched count instead. Link: https://bugzilla.kernel.org/show_bug.cgi?id=215679 Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20220825145312.1217900-1-kbusch@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-22block: shrink rq_map_data a bitJens Axboe
We don't need full ints for several of these members. Change the page_order and nr_entries to unsigned shorts, and the true/false from_user and null_mapped to booleans. This shrinks the struct from 32 to 24 bytes on 64-bit archs. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-22block: Change the return type of blk_mq_map_queues() into voidBart Van Assche
Since blk_mq_map_queues() and the .map_queues() callbacks always return 0, change their return type into void. Most callers ignore the returned value anyway. Cc: Christoph Hellwig <hch@lst.de> Cc: Jason Wang <jasowang@redhat.com> Cc: Keith Busch <kbusch@kernel.org> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Doug Gilbert <dgilbert@interlog.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: John Garry <john.garry@huawei.com> Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Link: https://lore.kernel.org/r/20220815170043.19489-3-bvanassche@acm.org [axboe: fold in fix from Bart] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-22block: sed-opal: Add ioctl to return device statusdougmill@linux.vnet.ibm.com
Provide a mechanism to retrieve basic status information about the device, including the "supported" flag indicating whether SED-OPAL is supported. The information returned is from the various feature descriptors received during the discovery0 step, and so this ioctl does nothing more than perform the discovery0 step and then save the information received. See "struct opal_status" and OPAL_FL_* bits for the status information currently returned. This is necessary to be able to check whether a device is OPAL enabled, set up, locked or unlocked from userspace programs like systemd-cryptsetup and libcryptsetup. Right now we just have to assume the user 'knows' or blindly attempt setup/lock/unlock operations. Signed-off-by: Douglas Miller <dougmill@linux.vnet.ibm.com> Tested-by: Luca Boccassi <bluca@debian.org> Reviewed-by: Scott Bauer <sbauer@plzdonthack.me> Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org> Link: https://lore.kernel.org/r/20220816140713.84893-1-luca.boccassi@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-20Merge tag 'block-6.0-2022-08-19' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "A few fixes that should go into this release: - Small series of patches for ublk (ZiyangZhang) - Remove dead function (Yu) - Fix for running a block queue in case of resource starvation (Yufen)" * tag 'block-6.0-2022-08-19' of git://git.kernel.dk/linux-block: blk-mq: run queue no matter whether the request is the last request blk-mq: remove unused function blk_mq_queue_stopped() ublk_drv: do not add a re-issued request aborted previously to ioucmd's task_work ublk_drv: update comment for __ublk_fail_req() ublk_drv: check ubq_daemon_is_dying() in __ublk_rq_task_work() ublk_drv: update iod->addr for UBLK_IO_NEED_GET_DATA
2022-08-20Merge tag 'ata-6.0-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata Pull ATA fixes from Damien Le Moal: - Add a missing command name definition for ata_get_cmd_name(), from me. - A fix to address a performance regression due to the default max_sectors queue limit for ATA devices connected to AHCI adapters being too small, from John. * tag 'ata-6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: ata: libata: Set __ATA_BASE_SHT max_sectors ata: libata-eh: Add missing command name
2022-08-21ata: libata: Set __ATA_BASE_SHT max_sectorsJohn Garry
Commit 0568e6122574 ("ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors") inadvertently capped the max_sectors value for some SATA disks to a value which is lower than we would want. For a device which supports LBA48, we would previously have request queue max_sectors_kb and max_hw_sectors_kb values of 1280 and 32767 respectively. For AHCI controllers, the value chosen for shost max sectors comes from the minimum of the SCSI host default max sectors in SCSI_DEFAULT_MAX_SECTORS (1024) and the shost DMA device mapping limit. This means that we would now set the max_sectors_kb and max_hw_sectors_kb values for a disk which supports LBA48 at 512, ignoring DMA mapping limit. As report by Oliver at [0], this caused a performance regression. Fix by picking a large enough max sectors value for ATA host controllers such that we don't needlessly reduce max_sectors_kb for LBA48 disks. [0] https://lore.kernel.org/linux-ide/YvsGbidf3na5FpGb@xsang-OptiPlex-9020/T/#m22d9fc5ad15af66066dd9fecf3d50f1b1ef11da3 Fixes: 0568e6122574 ("ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors") Reported-by: Oliver Sang <oliver.sang@intel.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-08-19Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM: - Fix unexpected sign extension of KVM_ARM_DEVICE_ID_MASK - Tidy-up handling of AArch32 on asymmetric systems x86: - Fix 'missing ENDBR' BUG for fastop functions Generic: - Some cleanup and static analyzer patches - More fixes to KVM_CREATE_VM unwind paths" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: Drop unnecessary initialization of "ops" in kvm_ioctl_create_device() KVM: Drop unnecessary initialization of "npages" in hva_to_pfn_slow() x86/kvm: Fix "missing ENDBR" BUG for fastop functions x86/kvm: Simplify FOP_SETCC() x86/ibt, objtool: Add IBT_NOSEAL() KVM: Rename mmu_notifier_* to mmu_invalidate_* KVM: Rename KVM_PRIVATE_MEM_SLOTS to KVM_INTERNAL_MEM_SLOTS KVM: MIPS: remove unnecessary definition of KVM_PRIVATE_MEM_SLOTS KVM: Move coalesced MMIO initialization (back) into kvm_create_vm() KVM: Unconditionally get a ref to /dev/kvm module when creating a VM KVM: Properly unwind VM creation if creating debugfs fails KVM: arm64: Reject 32bit user PSTATE on asymmetric systems KVM: arm64: Treat PMCR_EL1.LC as RES1 on asymmetric systems KVM: arm64: Fix compile error due to sign extension
2022-08-19Merge tag 'bitmap-6.0-rc2' of https://github.com/norov/linuxLinus Torvalds
Pull bitmap updates from Yury Norov: "cpumask: UP optimisation fixes follow-up As an older version of the UP optimisation fixes was merged, not all review feedback has been implemented. This implements the feedback received on the merged version [1], and the respin [2], for changes related to <linux/cpumask.h> and lib/cpumask.c" Link: https://lore.kernel.org/lkml/cover.1656777646.git.sander@svanheule.net/ [1] Link: https://lore.kernel.org/lkml/cover.1659077534.git.sander@svanheule.net/ [2] It spent for more than a week with no issues. * tag 'bitmap-6.0-rc2' of https://github.com/norov/linux: lib/cpumask: drop always-true preprocessor guard lib/cpumask: add inline cpumask_next_wrap() for UP cpumask: align signatures of UP implementations
2022-08-19KVM: Rename mmu_notifier_* to mmu_invalidate_*Chao Peng
The motivation of this renaming is to make these variables and related helper functions less mmu_notifier bound and can also be used for non mmu_notifier based page invalidation. mmu_invalidate_* was chosen to better describe the purpose of 'invalidating' a page that those variables are used for. - mmu_notifier_seq/range_start/range_end are renamed to mmu_invalidate_seq/range_start/range_end. - mmu_notifier_retry{_hva} helper functions are renamed to mmu_invalidate_retry{_hva}. - mmu_notifier_count is renamed to mmu_invalidate_in_progress to avoid confusion with mn_active_invalidate_count. - While here, also update kvm_inc/dec_notifier_count() to kvm_mmu_invalidate_begin/end() to match the change for mmu_notifier_count. No functional change intended. Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Message-Id: <20220816125322.1110439-3-chao.p.peng@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-19KVM: Rename KVM_PRIVATE_MEM_SLOTS to KVM_INTERNAL_MEM_SLOTSChao Peng
KVM_INTERNAL_MEM_SLOTS better reflects the fact those slots are KVM internally used (invisible to userspace) and avoids confusion to future private slots that can have different meaning. Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Message-Id: <20220816125322.1110439-2-chao.p.peng@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-18Merge tag 'net-6.0-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter. Current release - regressions: - tcp: fix cleanup and leaks in tcp_read_skb() (the new way BPF socket maps get data out of the TCP stack) - tls: rx: react to strparser initialization errors - netfilter: nf_tables: fix scheduling-while-atomic splat - net: fix suspicious RCU usage in bpf_sk_reuseport_detach() Current release - new code bugs: - mlxsw: ptp: fix a couple of races, static checker warnings and error handling Previous releases - regressions: - netfilter: - nf_tables: fix possible module reference underflow in error path - make conntrack helpers deal with BIG TCP (skbs > 64kB) - nfnetlink: re-enable conntrack expectation events - net: fix potential refcount leak in ndisc_router_discovery() Previous releases - always broken: - sched: cls_route: disallow handle of 0 - neigh: fix possible local DoS due to net iface start/stop loop - rtnetlink: fix module refcount leak in rtnetlink_rcv_msg - sched: fix adding qlen to qcpu->backlog in gnet_stats_add_queue_cpu - virtio_net: fix endian-ness for RSS - dsa: mv88e6060: prevent crash on an unused port - fec: fix timer capture timing in `fec_ptp_enable_pps()` - ocelot: stats: fix races, integer wrapping and reading incorrect registers (the change of register definitions here accounts for bulk of the changed LoC in this PR)" * tag 'net-6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits) net: moxa: MAC address reading, generating, validity checking tcp: handle pure FIN case correctly tcp: refactor tcp_read_skb() a bit tcp: fix tcp_cleanup_rbuf() for tcp_read_skb() tcp: fix sock skb accounting in tcp_read_skb() igb: Add lock to avoid data race dt-bindings: Fix incorrect "the the" corrections net: genl: fix error path memory leak in policy dumping stmmac: intel: Add a missing clk_disable_unprepare() call in intel_eth_pci_remove() net: ethernet: mtk_eth_soc: fix possible NULL pointer dereference in mtk_xdp_run net/mlx5e: Allocate flow steering storage during uplink initialization net: mscc: ocelot: report ndo_get_stats64 from the wraparound-resistant ocelot->stats net: mscc: ocelot: keep ocelot_stat_layout by reg address, not offset net: mscc: ocelot: make struct ocelot_stat_layout array indexable net: mscc: ocelot: fix race between ndo_get_stats64 and ocelot_check_stats_work net: mscc: ocelot: turn stats_lock into a spinlock net: mscc: ocelot: fix address of SYS_COUNT_TX_AGING counter net: mscc: ocelot: fix incorrect ndo_get_stats64 packet counters net: dsa: felix: fix ethtool 256-511 and 512-1023 TX packet counters net: dsa: don't warn in dsa_port_set_state_now() when driver doesn't support it ...
2022-08-18blk-mq: remove unused function blk_mq_queue_stopped()Yu Kuai
blk_mq_queue_stopped() doesn't have any caller, which was found by code coverage test, thus remove it. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20220818063555.3741222-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-17net: mscc: ocelot: keep ocelot_stat_layout by reg address, not offsetVladimir Oltean
With so many counter addresses recently discovered as being wrong, it is desirable to at least have a central database of information, rather than two: one through the SYS_COUNT_* registers (used for ndo_get_stats64), and the other through the offset field of struct ocelot_stat_layout elements (used for ethtool -S). The strategy will be to keep the SYS_COUNT_* definitions as the single source of truth, but for that we need to expand our current definitions to cover all registers. Then we need to convert the ocelot region creation logic, and stats worker, to the read semantics imposed by going through SYS_COUNT_* absolute register addresses, rather than offsets of 32-bit words relative to SYS_COUNT_RX_OCTETS (which should have been SYS_CNT, by the way). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: mscc: ocelot: make struct ocelot_stat_layout array indexableVladimir Oltean
The ocelot counters are 32-bit and require periodic reading, every 2 seconds, by ocelot_port_update_stats(), so that wraparounds are detected. Currently, the counters reported by ocelot_get_stats64() come from the 32-bit hardware counters directly, rather than from the 64-bit accumulated ocelot->stats, and this is a problem for their integrity. The strategy is to make ocelot_get_stats64() able to cherry-pick individual stats from ocelot->stats the way in which it currently reads them out from SYS_COUNT_* registers. But currently it can't, because ocelot->stats is an opaque u64 array that's used only to feed data into ethtool -S. To solve that problem, we need to make ocelot->stats indexable, and associate each element with an element of struct ocelot_stat_layout used by ethtool -S. This makes ocelot_stat_layout a fat (and possibly sparse) array, so we need to change the way in which we access it. We no longer need OCELOT_STAT_END as a sentinel, because we know the array's size (OCELOT_NUM_STATS). We just need to skip the array elements that were left unpopulated for the switch revision (ocelot, felix, seville). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: mscc: ocelot: turn stats_lock into a spinlockVladimir Oltean
ocelot_get_stats64() currently runs unlocked and therefore may collide with ocelot_port_update_stats() which indirectly accesses the same counters. However, ocelot_get_stats64() runs in atomic context, and we cannot simply take the sleepable ocelot->stats_lock mutex. We need to convert it to an atomic spinlock first. Do that as a preparatory change. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: mscc: ocelot: fix incorrect ndo_get_stats64 packet countersVladimir Oltean
Reading stats using the SYS_COUNT_* register definitions is only used by ocelot_get_stats64() from the ocelot switchdev driver, however, currently the bucket definitions are incorrect. Separately, on both RX and TX, we have the following problems: - a 256-1023 bucket which actually tracks the 256-511 packets - the 1024-1526 bucket actually tracks the 512-1023 packets - the 1527-max bucket actually tracks the 1024-1526 packets => nobody tracks the packets from the real 1527-max bucket Additionally, the RX_PAUSE, RX_CONTROL, RX_LONGS and RX_CLASSIFIED_DROPS all track the wrong thing. However this doesn't seem to have any consequence, since ocelot_get_stats64() doesn't use these. Even though this problem only manifests itself for the switchdev driver, we cannot split the fix for ocelot and for DSA, since it requires fixing the bucket definitions from enum ocelot_reg, which makes us necessarily adapt the structures from felix and seville as well. Fixes: 84705fc16552 ("net: dsa: felix: introduce support for Seville VSC9953 switch") Fixes: 56051948773e ("net: dsa: ocelot: add driver for Felix switch family") Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski
Florian Westphal says: ==================== netfilter: conntrack and nf_tables bug fixes The following patchset contains netfilter fixes for net. Broken since 5.19: A few ancient connection tracking helpers assume TCP packets cannot exceed 64kb in size, but this isn't the case anymore with 5.19 when BIG TCP got merged, from myself. Regressions since 5.19: 1. 'conntrack -E expect' won't display anything because nfnetlink failed to enable events for expectations, only for normal conntrack events. 2. partially revert change that added resched calls to a function that can be in atomic context. Both broken and fixed up by myself. Broken for several releases (up to original merge of nf_tables): Several fixes for nf_tables control plane, from Pablo. This fixes up resource leaks in error paths and adds more sanity checks for mutually exclusive attributes/flags. Kconfig: NF_CONNTRACK_PROCFS is very old and doesn't provide all info provided via ctnetlink, so it should not default to y. From Geert Uytterhoeven. Selftests: rework nft_flowtable.sh: it frequently indicated failure; the way it tried to detect an offload failure did not work reliably. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: testing: selftests: nft_flowtable.sh: rework test to detect offload failure testing: selftests: nft_flowtable.sh: use random netns names netfilter: conntrack: NF_CONNTRACK_PROCFS should no longer default to y netfilter: nf_tables: check NFT_SET_CONCAT flag if field_count is specified netfilter: nf_tables: disallow NFT_SET_ELEM_CATCHALL and NFT_SET_ELEM_INTERVAL_END netfilter: nf_tables: NFTA_SET_ELEM_KEY_END requires concat and interval flags netfilter: nf_tables: validate NFTA_SET_ELEM_OBJREF based on NFT_SET_OBJECT flag netfilter: nf_tables: really skip inactive sets when allocating name netfilter: nfnetlink: re-enable conntrack expectation events netfilter: nf_tables: fix scheduling-while-atomic splat netfilter: nf_ct_irc: cap packet search space to 4k netfilter: nf_ct_ftp: prefer skb_linearize netfilter: nf_ct_h323: cap packet size at 64k netfilter: nf_ct_sane: remove pseudo skb linearization netfilter: nf_tables: possible module reference underflow in error path netfilter: nf_tables: disallow NFTA_SET_ELEM_KEY_END with NFT_SET_ELEM_INTERVAL_END flag netfilter: nf_tables: use READ_ONCE and WRITE_ONCE for shared generation id access ==================== Link: https://lore.kernel.org/r/20220817140015.25843-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: Fix suspicious RCU usage in bpf_sk_reuseport_detach()David Howells
bpf_sk_reuseport_detach() calls __rcu_dereference_sk_user_data_with_flags() to obtain the value of sk->sk_user_data, but that function is only usable if the RCU read lock is held, and neither that function nor any of its callers hold it. Fix this by adding a new helper, __locked_read_sk_user_data_with_flags() that checks to see if sk->sk_callback_lock() is held and use that here instead. Alternatively, making __rcu_dereference_sk_user_data_with_flags() use rcu_dereference_checked() might suffice. Without this, the following warning can be occasionally observed: ============================= WARNING: suspicious RCU usage 6.0.0-rc1-build2+ #563 Not tainted ----------------------------- include/net/sock.h:592 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 5 locks held by locktest/29873: #0: ffff88812734b550 (&sb->s_type->i_mutex_key#9){+.+.}-{3:3}, at: __sock_release+0x77/0x121 #1: ffff88812f5621b0 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_close+0x1c/0x70 #2: ffff88810312f5c8 (&h->lhash2[i].lock){+.+.}-{2:2}, at: inet_unhash+0x76/0x1c0 #3: ffffffff83768bb8 (reuseport_lock){+...}-{2:2}, at: reuseport_detach_sock+0x18/0xdd #4: ffff88812f562438 (clock-AF_INET){++..}-{2:2}, at: bpf_sk_reuseport_detach+0x24/0xa4 stack backtrace: CPU: 1 PID: 29873 Comm: locktest Not tainted 6.0.0-rc1-build2+ #563 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Call Trace: <TASK> dump_stack_lvl+0x4c/0x5f bpf_sk_reuseport_detach+0x6d/0xa4 reuseport_detach_sock+0x75/0xdd inet_unhash+0xa5/0x1c0 tcp_set_state+0x169/0x20f ? lockdep_sock_is_held+0x3a/0x3a ? __lock_release.isra.0+0x13e/0x220 ? reacquire_held_locks+0x1bb/0x1bb ? hlock_class+0x31/0x96 ? mark_lock+0x9e/0x1af __tcp_close+0x50/0x4b6 tcp_close+0x28/0x70 inet_release+0x8e/0xa7 __sock_release+0x95/0x121 sock_close+0x14/0x17 __fput+0x20f/0x36a task_work_run+0xa3/0xcc exit_to_user_mode_prepare+0x9c/0x14d syscall_exit_to_user_mode+0x18/0x44 entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: cf8c1e967224 ("net: refactor bpf_sk_reuseport_detach()") Signed-off-by: David Howells <dhowells@redhat.com> cc: Hawkins Jiawei <yin31149@gmail.com> Link: https://lore.kernel.org/r/166064248071.3502205.10036394558814861778.stgit@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio fixes from Michael Tsirkin: "Most notably this drops the commits that trip up google cloud (turns out, any legacy device). Plus a kerneldoc patch" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio: kerneldocs fixes and enhancements virtio: Revert "virtio: find_vqs() add arg sizes" virtio_vdpa: Revert "virtio_vdpa: support the arg sizes of find_vqs()" virtio_pci: Revert "virtio_pci: support the arg sizes of find_vqs()" virtio-mmio: Revert "virtio_mmio: support the arg sizes of find_vqs()" virtio: Revert "virtio: add helper virtio_find_vqs_ctx_size()" virtio_net: Revert "virtio_net: set the default max ring size by find_vqs()"
2022-08-16locking/atomic: Make test_and_*_bit() ordered on failureHector Martin
These operations are documented as always ordered in include/asm-generic/bitops/instrumented-atomic.h, and producer-consumer type use cases where one side needs to ensure a flag is left pending after some shared data was updated rely on this ordering, even in the failure case. This is the case with the workqueue code, which currently suffers from a reproducible ordering violation on Apple M1 platforms (which are notoriously out-of-order) that ends up causing the TTY layer to fail to deliver data to userspace properly under the right conditions. This change fixes that bug. Change the documentation to restrict the "no order on failure" story to the _lock() variant (for which it makes sense), and remove the early-exit from the generic implementation, which is what causes the missing barrier semantics in that case. Without this, the remaining atomic op is fully ordered (including on ARM64 LSE, as of recent versions of the architecture spec). Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable@vger.kernel.org Fixes: e986a0d6cb36 ("locking/atomics, asm-generic/bitops/atomic.h: Rewrite using atomic_*() APIs") Fixes: 61e02392d3c7 ("locking/atomic/bitops: Document and clarify ordering semantics for failed test_and_{}_bit()") Signed-off-by: Hector Martin <marcan@marcan.st> Acked-by: Will Deacon <will@kernel.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-08-16virtio: kerneldocs fixes and enhancementsRicardo Cañuelo
Fix variable names in some kerneldocs, naming in others. Add kerneldocs for struct vring_desc and vring_interrupt. Signed-off-by: Ricardo Cañuelo <ricardo.canuelo@collabora.com> Message-Id: <20220810094004.1250-2-ricardo.canuelo@collabora.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
2022-08-16virtio: Revert "virtio: find_vqs() add arg sizes"Michael S. Tsirkin
This reverts commit a10fba0377145fccefea4dc4dd5915b7ed87e546: the proposed API isn't supported on all transports but no effort was made to address this. It might not be hard to fix if we want to: maybe just rename size to size_hint and make sure legacy transports ignore the hint. But it's not sure what the benefit is in any case, so let's drop it. Fixes: a10fba037714 ("virtio: find_vqs() add arg sizes") Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20220816053602.173815-8-mst@redhat.com>
2022-08-16virtio: Revert "virtio: add helper virtio_find_vqs_ctx_size()"Michael S. Tsirkin
This reverts commit fe3dc04e31aa51f91dc7f741a5f76cc4817eb5b4: the API is now unused and in fact can't be implemented on top of a legacy device. Fixes: fe3dc04e31aa ("virtio: add helper virtio_find_vqs_ctx_size()") Cc: "Xuan Zhuo" <xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20220816053602.173815-3-mst@redhat.com>
2022-08-15lib/cpumask: add inline cpumask_next_wrap() for UPSander Vanheule
In the uniprocessor case, cpumask_next_wrap() can be simplified, as the number of valid argument combinations is limited: - 'start' can only be 0 - 'n' can only be -1 or 0 The only valid CPU that can then be returned, if any, will be the first one set in the provided 'mask'. For NR_CPUS == 1, include/linux/cpumask.h now provides an inline definition of cpumask_next_wrap(), which will conflict with the one provided by lib/cpumask.c. Make building of lib/cpumask.o again depend on CONFIG_SMP=y (i.e. NR_CPUS > 1) to avoid the re-definition. Suggested-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Sander Vanheule <sander@svanheule.net> Signed-off-by: Yury Norov <yury.norov@gmail.com>
2022-08-15cpumask: align signatures of UP implementationsSander Vanheule
Between the generic version, and their uniprocessor optimised implementations, the return types of cpumask_any_and_distribute() and cpumask_any_distribute() are not identical. Change the UP versions to 'unsigned int', to match the generic versions. Suggested-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Sander Vanheule <sander@svanheule.net> Signed-off-by: Yury Norov <yury.norov@gmail.com>
2022-08-15neighbour: make proxy_queue.qlen limit per-deviceAlexander Mikhalitsyn
Right now we have a neigh_param PROXY_QLEN which specifies maximum length of neigh_table->proxy_queue. But in fact, this limitation doesn't work well because check condition looks like: tbl->proxy_queue.qlen > NEIGH_VAR(p, PROXY_QLEN) The problem is that p (struct neigh_parms) is a per-device thing, but tbl (struct neigh_table) is a system-wide global thing. It seems reasonable to make proxy_queue limit per-device based. v2: - nothing changed in this patch v3: - rebase to net tree Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David Ahern <dsahern@kernel.org> Cc: Yajun Deng <yajun.deng@linux.dev> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Christian Brauner <brauner@kernel.org> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com> Cc: Konstantin Khorenko <khorenko@virtuozzo.com> Cc: kernel@openvz.org Cc: devel@openvz.org Suggested-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-14radix-tree: replace gfp.h inclusion with gfp_types.hYury Norov
Radix tree header includes gfp.h for __GFP_BITS_SHIFT only. Now we have gfp_types.h for this. Fixes powerpc allmodconfig build: In file included from include/linux/nodemask.h:97, from include/linux/mmzone.h:17, from include/linux/gfp.h:7, from include/linux/radix-tree.h:12, from include/linux/idr.h:15, from include/linux/kernfs.h:12, from include/linux/sysfs.h:16, from include/linux/kobject.h:20, from include/linux/pci.h:35, from arch/powerpc/kernel/prom_init.c:24: include/linux/random.h: In function 'add_latent_entropy': >> include/linux/random.h:25:46: error: 'latent_entropy' undeclared (first use in this function); did you mean 'add_latent_entropy'? 25 | add_device_randomness((const void *)&latent_entropy, sizeof(latent_entropy)); | ^~~~~~~~~~~~~~ | add_latent_entropy include/linux/random.h:25:46: note: each undeclared identifier is reported only once for each function it appears in Reported-by: kernel test robot <lkp@intel.com> CC: Andy Shevchenko <andriy.shevchenko@linux.intel.com> CC: Andrew Morton <akpm@linux-foundation.org> CC: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-08-14Merge tag 'for-linus-6.0-rc1b-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull more xen updates from Juergen Gross: - fix the handling of the "persistent grants" feature negotiation between Xen blkfront and Xen blkback drivers - a cleanup of xen.config and adding xen.config to Xen section in MAINTAINERS - support HVMOP_set_evtchn_upcall_vector, which is more compliant to "normal" interrupt handling than the global callback used up to now - further small cleanups * tag 'for-linus-6.0-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: MAINTAINERS: add xen config fragments to XEN HYPERVISOR sections xen: remove XEN_SCRUB_PAGES in xen.config xen/pciback: Fix comment typo xen/xenbus: fix return type in xenbus_file_read() xen-blkfront: Apply 'feature_persistent' parameter when connect xen-blkback: Apply 'feature_persistent' parameter when connect xen-blkback: fix persistent grants negotiation x86/xen: Add support for HVMOP_set_evtchn_upcall_vector
2022-08-13Merge tag 'timers-urgent-2022-08-13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Ingo Molnar: "Misc timer fixes: - fix a potential use-after-free bug in posix timers - correct a prototype - address a build warning" * tag 'timers-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: posix-cpu-timers: Cleanup CPU timers before freeing them during exec time: Correct the prototype of ns_to_kernel_old_timeval and ns_to_timespec64 posix-timers: Make do_clock_gettime() static
2022-08-13Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull more SCSI updates from James Bottomley: "Mostly small bug fixes and trivial updates. The major new core update is a change to the way device, target and host reference counting is done to try to make it more robust (this change has soaked for a while to try to winkle out any bugs)" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: pm8001: Fix typo 'the the' in comment scsi: megaraid_sas: Remove redundant variable cmd_type scsi: FlashPoint: Remove redundant variable bm_int_st scsi: zfcp: Fix missing auto port scan and thus missing target ports scsi: core: Call blk_mq_free_tag_set() earlier scsi: core: Simplify LLD module reference counting scsi: core: Make sure that hosts outlive targets scsi: core: Make sure that targets outlive devices scsi: ufs: ufs-pci: Correct check for RESET DSM scsi: target: core: De-RCU of se_lun and se_lun acl scsi: target: core: Fix race during ACL removal scsi: ufs: core: Correct ufshcd_shutdown() flow scsi: ufs: core: Increase the maximum data buffer size scsi: lpfc: Check the return value of alloc_workqueue()
2022-08-13Merge tag 'io_uring-6.0-2022-08-13' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: - Regression fix for this merge window, fixing a wrong order of arguments for io_req_set_res() for passthru (Dylan) - Fix for the audit code leaking context memory (Peilin) - Ensure that provided buffers are memcg accounted (Pavel) - Correctly handle short zero-copy sends (Pavel) - Sparse warning fixes for the recvmsg multishot command (Dylan) - Error handling fix for passthru (Anuj) - Remove randomization of struct kiocb fields, to avoid it growing in size if re-arranged in such a fashion that it grows more holes or padding (Keith, Linus) - Small series improving type safety of the sqe fields (Stefan) * tag 'io_uring-6.0-2022-08-13' of git://git.kernel.dk/linux-block: io_uring: add missing BUILD_BUG_ON() checks for new io_uring_sqe fields io_uring: make io_kiocb_to_cmd() typesafe fs: don't randomize struct kiocb fields io_uring: consistently make use of io_notif_to_data() io_uring: fix error handling for io_uring_cmd io_uring: fix io_recvmsg_prep_multishot sparse warnings io_uring/net: send retry for zerocopy io_uring: mem-account pbuf buckets audit, io_uring, io-wq: Fix memory leak in io_sq_thread() and io_wqe_worker() io_uring: pass correct parameters to io_req_set_res
2022-08-12io_uring: make io_kiocb_to_cmd() typesafeStefan Metzmacher
We need to make sure (at build time) that struct io_cmd_data is not casted to a structure that's larger. Signed-off-by: Stefan Metzmacher <metze@samba.org> Link: https://lore.kernel.org/r/c024cdf25ae19fc0319d4180e2298bade8ed17b8.1660201408.git.metze@samba.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-12fs: don't randomize struct kiocb fieldsKeith Busch
This is a size sensitive structure and randomizing can introduce extra padding that breaks io_uring's fixed size expectations. There are few fields here as it is, half of which need a fixed order to optimally pack, so the randomization isn't providing much. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/io-uring/b6f508ca-b1b2-5f40-7998-e4cff1cf7212@kernel.dk/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-12Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: - A huge patchset supporting vq resize using the new vq reset capability - Features, fixes, and cleanups all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (88 commits) vdpa/mlx5: Fix possible uninitialized return value vdpa_sim_blk: add support for discard and write-zeroes vdpa_sim_blk: add support for VIRTIO_BLK_T_FLUSH vdpa_sim_blk: make vdpasim_blk_check_range usable by other requests vdpa_sim_blk: check if sector is 0 for commands other than read or write vdpa_sim: Implement suspend vdpa op vhost-vdpa: uAPI to suspend the device vhost-vdpa: introduce SUSPEND backend feature bit vdpa: Add suspend operation virtio-blk: Avoid use-after-free on suspend/resume virtio_vdpa: support the arg sizes of find_vqs() vhost-vdpa: Call ida_simple_remove() when failed vDPA: fix 'cast to restricted le16' warnings in vdpa.c vDPA: !FEATURES_OK should not block querying device config space vDPA/ifcvf: support userspace to query features and MQ of a management device vDPA/ifcvf: get_config_size should return a value no greater than dev implementation vhost scsi: Allow user to control num virtqueues vhost-scsi: Fix max number of virtqueues vdpa/mlx5: Support different address spaces for control and data vdpa/mlx5: Implement susupend virtqueue callback ...
2022-08-12Merge tag 'for-v6.0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply Pull power supply and reset updates from Sebastian Reichel: "No core patches, only driver updates: - pwr-mlxbf: new reset driver for Mellanox BlueField - at91-reset: SAMA7G5 support - ab8500: continue refurbishing - misc minor fixes" * tag 'for-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (29 commits) power: supply: olpc_battery: Hold the reference returned by of_find_compatible_node power: supply: ab8500: add missing destroy_workqueue in ab8500_charger_bind power: supply: ab8500: Remove flush_scheduled_work() call. power: supply: ab8500_fg: drop duplicated 'is' in comment power: supply: ab8500: Drop external charger leftovers power: supply: ab8500: Add MAINTAINERS entry dt-bindings: power: reset: qcom,pshold: convert to dtschema power: supply: Fix typo in power_supply_check_supplies power: reset: pwr-mlxbf: change rst_pwr_hid and low_pwr_hid from global to local variables power: reset: pwr-mlxbf: add missing include power: reset: at91-reset: add support for SAMA7G5 power: reset: at91-reset: add reset_controller_dev support power: reset: at91-reset: add at91_reset_data power: reset: at91-reset: document structures and enums dt-bindings: reset: add sama7g5 definitions dt-bindings: reset: atmel,at91sam9260-reset: add sama7g5 bindings dt-bindings: reset: convert Atmel/Microchip reset controller to YAML power: reset: pwr-mlxbf: add BlueField SoC power control driver power: supply: ab8500: Exit maintenance if too low voltage power: supply: ab8500: Respect charge_restart_voltage_uv ...
2022-08-12x86/xen: Add support for HVMOP_set_evtchn_upcall_vectorJane Malalane
Implement support for the HVMOP_set_evtchn_upcall_vector hypercall in order to set the per-vCPU event channel vector callback on Linux and use it in preference of HVM_PARAM_CALLBACK_IRQ. If the per-VCPU vector setup is successful on BSP, use this method for the APs. If not, fallback to the global vector-type callback. Also register callback_irq at per-vCPU event channel setup to trick toolstack to think the domain is enlightened. Suggested-by: "Roger Pau Monné" <roger.pau@citrix.com> Signed-off-by: Jane Malalane <jane.malalane@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20220729070416.23306-1-jane.malalane@citrix.com Signed-off-by: Juergen Gross <jgross@suse.com>
2022-08-11Merge tag 'net-6.0-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, bpf, can and netfilter. A little larger than usual but it's all fixes, no late features. It's large partially because of timing, and partially because of follow ups to stuff that got merged a week or so before the merge window and wasn't as widely tested. Maybe the Bluetooth fixes are a little alarming so we'll address that, but the rest seems okay and not scary. Notably we're including a fix for the netfilter Kconfig [1], your WiFi warning [2] and a bluetooth fix which should unblock syzbot [3]. Current release - regressions: - Bluetooth: - don't try to cancel uninitialized works [3] - L2CAP: fix use-after-free caused by l2cap_chan_put - tls: rx: fix device offload after recent rework - devlink: fix UAF on failed reload and leftover locks in mlxsw Current release - new code bugs: - netfilter: - flowtable: fix incorrect Kconfig dependencies [1] - nf_tables: fix crash when nf_trace is enabled - bpf: - use proper target btf when exporting attach_btf_obj_id - arm64: fixes for bpf trampoline support - Bluetooth: - ISO: unlock on error path in iso_sock_setsockopt() - ISO: fix info leak in iso_sock_getsockopt() - ISO: fix iso_sock_getsockopt for BT_DEFER_SETUP - ISO: fix memory corruption on iso_pinfo.base - ISO: fix not using the correct QoS - hci_conn: fix updating ISO QoS PHY - phy: dp83867: fix get nvmem cell fail Previous releases - regressions: - wifi: cfg80211: fix validating BSS pointers in __cfg80211_connect_result [2] - atm: bring back zatm uAPI after ATM had been removed - properly fix old bug making bonding ARP monitor mode not being able to work with software devices with lockless Tx - tap: fix null-deref on skb->dev in dev_parse_header_protocol - revert "net: usb: ax88179_178a needs FLAG_SEND_ZLP" it helps some devices and breaks others - netfilter: - nf_tables: many fixes rejecting cross-object linking which may lead to UAFs - nf_tables: fix null deref due to zeroed list head - nf_tables: validate variable length element extension - bgmac: fix a BUG triggered by wrong bytes_compl - bcmgenet: indicate MAC is in charge of PHY PM Previous releases - always broken: - bpf: - fix bad pointer deref in bpf_sys_bpf() injected via test infra - disallow non-builtin bpf programs calling the prog_run command - don't reinit map value in prealloc_lru_pop - fix UAFs during the read of map iterator fd - fix invalidity check for values in sk local storage map - reject sleepable program for non-resched map iterator - mptcp: - move subflow cleanup in mptcp_destroy_common() - do not queue data on closed subflows - virtio_net: fix memory leak inside XDP_TX with mergeable - vsock: fix memory leak when multiple threads try to connect() - rework sk_user_data sharing to prevent psock leaks - geneve: fix TOS inheriting for ipv4 - tunnels & drivers: do not use RT_TOS for IPv6 flowlabel - phy: c45 baset1: do not skip aneg configuration if clock role is not specified - rose: avoid overflow when /proc displays timer information - x25: fix call timeouts in blocking connects - can: mcp251x: fix race condition on receive interrupt - can: j1939: - replace user-reachable WARN_ON_ONCE() with netdev_warn_once() - fix memory leak of skbs in j1939_session_destroy() Misc: - docs: bpf: clarify that many things are not uAPI - seg6: initialize induction variable to first valid array index (to silence clang vs objtool warning) - can: ems_usb: fix clang 14's -Wunaligned-access warning" * tag 'net-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (117 commits) net: atm: bring back zatm uAPI dpaa2-eth: trace the allocated address instead of page struct net: add missing kdoc for struct genl_multicast_group::flags nfp: fix use-after-free in area_cache_get() MAINTAINERS: use my korg address for mt7601u mlxsw: minimal: Fix deadlock in ports creation bonding: fix reference count leak in balance-alb mode net: usb: qmi_wwan: Add support for Cinterion MV32 bpf: Shut up kern_sys_bpf warning. net/tls: Use RCU API to access tls_ctx->netdev tls: rx: device: don't try to copy too much on detach tls: rx: device: bound the frag walk net_sched: cls_route: remove from list when handle is 0 selftests: forwarding: Fix failing tests with old libnet net: refactor bpf_sk_reuseport_detach() net: fix refcount bug in sk_psock_get (2) selftests/bpf: Ensure sleepable program is rejected by hash map iter selftests/bpf: Add write tests for sk local storage map iterator selftests/bpf: Add tests for reading a dangling map iter fd bpf: Only allow sleepable program for resched-able iterator ...
2022-08-11Merge tag 'acpi-5.20-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more ACPI updates from Rafael Wysocki: "These fix up direct references to the fwnode field in struct device and extend ACPI device properties support. Specifics: - Replace direct references to the fwnode field in struct device with dev_fwnode() and device_match_fwnode() (Andy Shevchenko) - Make the ACPI code handling device properties support properties with buffer values (Sakari Ailus)" * tag 'acpi-5.20-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: property: Fix error handling in acpi_init_properties() ACPI: VIOT: Do not dereference fwnode in struct device ACPI: property: Read buffer properties as integers ACPI: property: Add support for parsing buffer property UUID ACPI: property: Unify integer value reading functions ACPI: property: Switch node property referencing from ifs to a switch ACPI: property: Move property ref argument parsing into a new function ACPI: property: Use acpi_object_type consistently in property ref parsing ACPI: property: Tie data nodes to acpi handles ACPI: property: Return type of acpi_add_nondev_subnodes() should be bool