summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-02-28blktrace: fix use after free for struct blk_traceblock-5.17-2022-03-04Yu Kuai
When tracing the whole disk, 'dropped' and 'msg' will be created under 'q->debugfs_dir' and 'bt->dir' is NULL, thus blk_trace_free() won't remove those files. What's worse, the following UAF can be triggered because of accessing stale 'dropped' and 'msg': ================================================================== BUG: KASAN: use-after-free in blk_dropped_read+0x89/0x100 Read of size 4 at addr ffff88816912f3d8 by task blktrace/1188 CPU: 27 PID: 1188 Comm: blktrace Not tainted 5.17.0-rc4-next-20220217+ #469 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-4 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_address_description.constprop.0.cold+0xab/0x381 ? blk_dropped_read+0x89/0x100 ? blk_dropped_read+0x89/0x100 kasan_report.cold+0x83/0xdf ? blk_dropped_read+0x89/0x100 kasan_check_range+0x140/0x1b0 blk_dropped_read+0x89/0x100 ? blk_create_buf_file_callback+0x20/0x20 ? kmem_cache_free+0xa1/0x500 ? do_sys_openat2+0x258/0x460 full_proxy_read+0x8f/0xc0 vfs_read+0xc6/0x260 ksys_read+0xb9/0x150 ? vfs_write+0x3d0/0x3d0 ? fpregs_assert_state_consistent+0x55/0x60 ? exit_to_user_mode_prepare+0x39/0x1e0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fbc080d92fd Code: ce 20 00 00 75 10 b8 00 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 1 RSP: 002b:00007fbb95ff9cb0 EFLAGS: 00000293 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: 00007fbb95ff9dc0 RCX: 00007fbc080d92fd RDX: 0000000000000100 RSI: 00007fbb95ff9cc0 RDI: 0000000000000045 RBP: 0000000000000045 R08: 0000000000406299 R09: 00000000fffffffd R10: 000000000153afa0 R11: 0000000000000293 R12: 00007fbb780008c0 R13: 00007fbb78000938 R14: 0000000000608b30 R15: 00007fbb780029c8 </TASK> Allocated by task 1050: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x81/0xa0 do_blk_trace_setup+0xcb/0x410 __blk_trace_setup+0xac/0x130 blk_trace_ioctl+0xe9/0x1c0 blkdev_ioctl+0xf1/0x390 __x64_sys_ioctl+0xa5/0xe0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Freed by task 1050: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_set_free_info+0x20/0x30 __kasan_slab_free+0x103/0x180 kfree+0x9a/0x4c0 __blk_trace_remove+0x53/0x70 blk_trace_ioctl+0x199/0x1c0 blkdev_common_ioctl+0x5e9/0xb30 blkdev_ioctl+0x1a5/0x390 __x64_sys_ioctl+0xa5/0xe0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae The buggy address belongs to the object at ffff88816912f380 which belongs to the cache kmalloc-96 of size 96 The buggy address is located 88 bytes inside of 96-byte region [ffff88816912f380, ffff88816912f3e0) The buggy address belongs to the page: page:000000009a1b4e7c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0f flags: 0x17ffffc0000200(slab|node=0|zone=2|lastcpupid=0x1fffff) raw: 0017ffffc0000200 ffffea00044f1100 dead000000000002 ffff88810004c780 raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88816912f280: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc ffff88816912f300: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc >ffff88816912f380: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc ^ ffff88816912f400: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc ffff88816912f480: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc ================================================================== Fixes: c0ea57608b69 ("blktrace: remove debugfs file dentries from struct blk_trace") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20220228034354.4047385-1-yukuai3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-24Merge tag 'nvme-5.17-2022-02-24' of git://git.infradead.org/nvme into block-5.17block-5.17-2022-02-24Jens Axboe
Pull NVMe fixes from Christoph: "nvme fixes for Linux 5.17 - send H2CData PDUs based on MAXH2CDATA (Varun Prakash) - fix passthrough to namespaces with unsupported features (me)" * tag 'nvme-5.17-2022-02-24' of git://git.infradead.org/nvme: nvme-tcp: send H2CData PDUs based on MAXH2CDATA nvme: also mark passthrough-only namespaces ready in nvme_update_ns_info nvme: don't return an error from nvme_configure_metadata
2022-02-23nvme-tcp: send H2CData PDUs based on MAXH2CDATAVarun Prakash
As per NVMe/TCP specification (revision 1.0a, section 3.6.2.3) Maximum Host to Controller Data length (MAXH2CDATA): Specifies the maximum number of PDU-Data bytes per H2CData PDU in bytes. This value is a multiple of dwords and should be no less than 4,096. Current code sets H2CData PDU data_length to r2t_length, it does not check MAXH2CDATA value. Fix this by setting H2CData PDU data_length to min(req->h2cdata_left, queue->maxh2cdata). Also validate MAXH2CDATA value returned by target in ICResp PDU, if it is not a multiple of dword or if it is less than 4096 return -EINVAL from nvme_tcp_init_connection(). Signed-off-by: Varun Prakash <varun@chelsio.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-23nvme: also mark passthrough-only namespaces ready in nvme_update_ns_infoChristoph Hellwig
Commit e7d65803e2bb ("nvme-multipath: revalidate paths during rescan") introduced the NVME_NS_READY flag, which nvme_path_is_disabled() uses to check if a path can be used or not. We also need to set this flag for devices that fail the ZNS feature validation and which are available through passthrough devices only to that they can be used in multipathing setups. Fixes: e7d65803e2bb ("nvme-multipath: revalidate paths during rescan") Reported-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Daniel Wagner <dwagner@suse.de> Tested-by: Kanchan Joshi <joshi.k@samsung.com>
2022-02-23nvme: don't return an error from nvme_configure_metadataChristoph Hellwig
When a fabrics controller claims to support an invalidate metadata configuration we already warn and disable metadata support. No need to also return an error during revalidation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Daniel Wagner <dwagner@suse.de> Tested-by: Kanchan Joshi <joshi.k@samsung.com>
2022-02-22block: clear iocb->private in blkdev_bio_end_io_async()block-5.17-2022-02-23Stefano Garzarella
iocb_bio_iopoll() expects iocb->private to be cleared before releasing the bio. We already do this in blkdev_bio_end_io(), but we forgot in the recently added blkdev_bio_end_io_async(). Fixes: 54a88eb838d3 ("block: add single bio async direct IO helper") Cc: asml.silence@gmail.com Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220211090136.44471-1-sgarzare@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-17block/wbt: fix negative inflight counter when remove scsi deviceblock-5.17-2022-02-17Laibin Qiu
Now that we disable wbt by set WBT_STATE_OFF_DEFAULT in wbt_disable_default() when switch elevator to bfq. And when we remove scsi device, wbt will be enabled by wbt_enable_default. If it become false positive between wbt_wait() and wbt_track() when submit write request. The following is the scenario that triggered the problem. T1 T2 T3 elevator_switch_mq bfq_init_queue wbt_disable_default <= Set rwb->enable_state (OFF) Submit_bio blk_mq_make_request rq_qos_throttle <= rwb->enable_state (OFF) scsi_remove_device sd_remove del_gendisk blk_unregister_queue elv_unregister_queue wbt_enable_default <= Set rwb->enable_state (ON) q_qos_track <= rwb->enable_state (ON) ^^^^^^ this request will mark WBT_TRACKED without inflight add and will lead to drop rqw->inflight to -1 in wbt_done() which will trigger IO hung. Fix this by move wbt_enable_default() from elv_unregister to bfq_exit_queue(). Only re-enable wbt when bfq exit. Fixes: 76a8040817b4b ("blk-wbt: make sure throttle is enabled properly") Remove oneline stale comment, and kill one oneshot local variable. Signed-off-by: Ming Lei <ming.lei@rehdat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/linux-block/20211214133103.551813-1-qiulaibin@huawei.com/ Signed-off-by: Laibin Qiu <qiulaibin@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-17block: fix surprise removal for drivers calling blk_set_queue_dyingChristoph Hellwig
Various block drivers call blk_set_queue_dying to mark a disk as dead due to surprise removal events, but since commit 8e141f9eb803 that doesn't work given that the GD_DEAD flag needs to be set to stop I/O. Replace the driver calls to blk_set_queue_dying with a new (and properly documented) blk_mark_disk_dead API, and fold blk_set_queue_dying into the only remaining caller. Fixes: 8e141f9eb803 ("block: drain file system I/O on del_gendisk") Reported-by: Markus Blöchl <markus.bloechl@ipetronik.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Link: https://lore.kernel.org/r/20220217075231.1140-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-17block-map: add __GFP_ZERO flag for alloc_page in function bio_copy_kernHaimin Zhang
Add __GFP_ZERO flag for alloc_page in function bio_copy_kern to initialize the buffer of a bio. Signed-off-by: Haimin Zhang <tcs.kernel@gmail.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220216084038.15635-1-tcs.kernel@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-11block: loop:use kstatfs.f_bsize of backing file to set discard granularityMing Lei
If backing file's filesystem has implemented ->fallocate(), we think the loop device can support discard, then pass sb->s_blocksize as discard_granularity. However, some underlying FS, such as overlayfs, doesn't set sb->s_blocksize, and causes discard_granularity to be set as zero, then the warning in __blkdev_issue_discard() is triggered. Christoph suggested to pass kstatfs.f_bsize as discard granularity, and this way is fine because kstatfs.f_bsize means 'Optimal transfer block size', which still matches with definition of discard granularity. So fix the issue by setting discard_granularity as kstatfs.f_bsize if it is available, otherwise claims discard isn't supported. Cc: Christoph Hellwig <hch@lst.de> Cc: Vivek Goyal <vgoyal@redhat.com> Reported-by: Pei Zhang <pezhang@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220126035830.296465-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-11block: Add handling for zone append command in blk_complete_requestPankaj Raghav
Zone append command needs special handling to update the bi_sector field in the bio struct with the actual position of the data in the device. It is stored in __sector field of the request struct. Fixes: 5581a5ddfe8d ("block: add completion handler for fast path") Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Adam Manzanares <a.manzanares@samsung.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220211093425.43262-2-p.raghav@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-11loop: revert "make autoclear operation asynchronous"block-5.17-2022-02-11Tetsuo Handa
The kernel test robot is reporting that xfstest which does umount ext2 on xfs umount xfs sequence started failing, for commit 322c4293ecc58110 ("loop: make autoclear operation asynchronous") removed a guarantee that fput() of backing file is processed before lo_release() from close() returns to user mode. And syzbot is reporting that deferring destroy_workqueue() from __loop_clr_fd() to a WQ context did not help [1]. Revert that commit. Link: https://syzkaller.appspot.com/bug?extid=831661966588c802aae9 [1] Reported-by: kernel test robot <oliver.sang@intel.com> Acked-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Reported-by: syzbot <syzbot+831661966588c802aae9@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: https://lore.kernel.org/r/20220211071554.3424-1-penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-10Merge tag 'nvme-5.17-2022-02-10' of git://git.infradead.org/nvme into block-5.17Jens Axboe
Pull NVMe fixes from Christoph: "nvme fixes for Linux 5.17 - nvme-tcp: fix bogus request completion when failing to send AER (Sagi Grimberg) - add the missing nvme_complete_req tracepoint for batched completion (Bean Huo)" * tag 'nvme-5.17-2022-02-10' of git://git.infradead.org/nvme: nvme-tcp: fix bogus request completion when failing to send AER nvme: add nvme_complete_req tracepoint for batched completion
2022-02-09nvme-tcp: fix bogus request completion when failing to send AERSagi Grimberg
AER is not backed by a real request, hence we should not incorrectly assume that when failing to send a nvme command, it is a normal request but rather check if this is an aer and if so complete the aer (similar to the normal completion path). Cc: stable@vger.kernel.org Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-09nvme: add nvme_complete_req tracepoint for batched completionBean Huo
Add NVMe request completion trace in nvme_complete_batch_req() because nvme:nvme_complete_req tracepoint is missing in case of request batched completion. Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-03block: bio-integrity: Advance seed correctly for larger interval sizesblock-5.17-2022-02-04Martin K. Petersen
Commit 309a62fa3a9e ("bio-integrity: bio_integrity_advance must update integrity seed") added code to update the integrity seed value when advancing a bio. However, it failed to take into account that the integrity interval might be larger than the 512-byte block layer sector size. This broke bio splitting on PI devices with 4KB logical blocks. The seed value should be advanced by bio_integrity_intervals() and not the number of sectors. Cc: Dmitry Monakhov <dmonakhov@openvz.org> Cc: stable@vger.kernel.org Fixes: 309a62fa3a9e ("bio-integrity: bio_integrity_advance must update integrity seed") Tested-by: Dmitry Ivanov <dmitry.ivanov2@hpe.com> Reported-by: Alexey Lyashkov <alexey.lyashkov@hpe.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220204034209.4193-1-martin.petersen@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-03Merge tag 'nvme-5.17-2022-02-03' of git://git.infradead.org/nvme into block-5.17Jens Axboe
Pull NVMe fixes from Christoph: "nvme fixes for Linux 5.17 - fix a use-after-free in rdm and tcp controller reset (Sagi Grimberg) - fix the state check in nvmf_ctlr_matches_baseopts (Uday Shankar)" * tag 'nvme-5.17-2022-02-03' of git://git.infradead.org/nvme: nvme-fabrics: fix state check in nvmf_ctlr_matches_baseopts() nvme-rdma: fix possible use-after-free in transport error_recovery work nvme-tcp: fix possible use-after-free in transport error_recovery work nvme: fix a possible use-after-free in controller reset during load
2022-02-03Merge branch 'md-fixes' of ↵Jens Axboe
https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-5.17 Pull MD fix from Song: "Please consider pulling the following fix on top of your block-5.17 branch. It fixes a NULL ptr deref case with nowait." * 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md: fix NULL pointer deref with nowait but no mddev->queue
2022-02-03nvme-fabrics: fix state check in nvmf_ctlr_matches_baseopts()Uday Shankar
Controller deletion/reset, immediately followed by or concurrent with a reconnect, is hard failing the connect attempt resulting in a complete loss of connectivity to the controller. In the connect request, fabrics looks for an existing controller with the same address components and aborts the connect if a controller already exists and the duplicate connect option isn't set. The match routine filters out controllers that are dead or dying, so they don't interfere with the new connect request. When NVME_CTRL_DELETING_NOIO was added, it missed updating the state filters in the nvmf_ctlr_matches_baseopts() routine. Thus, when in this new state, it's seen as a live controller and fails the connect request. Correct by adding the DELETING_NIO state to the match checks. Fixes: ecca390e8056 ("nvme: fix deadlock in disconnect during scan_work and/or ana_work") Cc: <stable@vger.kernel.org> # v5.7+ Signed-off-by: Uday Shankar <ushankar@purestorage.com> Reviewed-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-02md: fix NULL pointer deref with nowait but no mddev->queueSong Liu
Leon reported NULL pointer deref with nowait support: [ 15.123761] device-mapper: raid: Loading target version 1.15.1 [ 15.124185] device-mapper: raid: Ignoring chunk size parameter for RAID 1 [ 15.124192] device-mapper: raid: Choosing default region size of 4MiB [ 15.129524] BUG: kernel NULL pointer dereference, address: 0000000000000060 [ 15.129530] #PF: supervisor write access in kernel mode [ 15.129533] #PF: error_code(0x0002) - not-present page [ 15.129535] PGD 0 P4D 0 [ 15.129538] Oops: 0002 [#1] PREEMPT SMP NOPTI [ 15.129541] CPU: 5 PID: 494 Comm: ldmtool Not tainted 5.17.0-rc2-1-mainline #1 9fe89d43dfcb215d2731e6f8851740520778615e [ 15.129546] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ELITE/X570 AORUS ELITE, BIOS F36e 10/14/2021 [ 15.129549] RIP: 0010:blk_queue_flag_set+0x7/0x20 [ 15.129555] Code: 00 00 00 0f 1f 44 00 00 48 8b 35 e4 e0 04 02 48 8d 57 28 bf 40 01 \ 00 00 e9 16 c1 be ff 66 0f 1f 44 00 00 0f 1f 44 00 00 89 ff <f0> 48 0f ab 7e 60 \ 31 f6 89 f7 c3 66 66 2e 0f 1f 84 00 00 00 00 00 [ 15.129559] RSP: 0018:ffff966b81987a88 EFLAGS: 00010202 [ 15.129562] RAX: ffff8b11c363a0d0 RBX: ffff8b11e294b070 RCX: 0000000000000000 [ 15.129564] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000001d [ 15.129566] RBP: ffff8b11e294b058 R08: 0000000000000000 R09: 0000000000000000 [ 15.129568] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b11e294b070 [ 15.129570] R13: 0000000000000000 R14: ffff8b11e294b000 R15: 0000000000000001 [ 15.129572] FS: 00007fa96e826780(0000) GS:ffff8b18deb40000(0000) knlGS:0000000000000000 [ 15.129575] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.129577] CR2: 0000000000000060 CR3: 000000010b8ce000 CR4: 00000000003506e0 [ 15.129580] Call Trace: [ 15.129582] <TASK> [ 15.129584] md_run+0x67c/0xc70 [md_mod 1e470c1b6bcf1114198109f42682f5a2740e9531] [ 15.129597] raid_ctr+0x134a/0x28ea [dm_raid 6a645dd7519e72834bd7e98c23497eeade14cd63] [ 15.129604] ? dm_split_args+0x63/0x150 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e] [ 15.129615] dm_table_add_target+0x188/0x380 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e] [ 15.129625] table_load+0x13b/0x370 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e] [ 15.129635] ? dev_suspend+0x2d0/0x2d0 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e] [ 15.129644] ctl_ioctl+0x1bd/0x460 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e] [ 15.129655] dm_ctl_ioctl+0xa/0x20 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e] [ 15.129663] __x64_sys_ioctl+0x8e/0xd0 [ 15.129667] do_syscall_64+0x5c/0x90 [ 15.129672] ? syscall_exit_to_user_mode+0x23/0x50 [ 15.129675] ? do_syscall_64+0x69/0x90 [ 15.129677] ? do_syscall_64+0x69/0x90 [ 15.129679] ? syscall_exit_to_user_mode+0x23/0x50 [ 15.129682] ? do_syscall_64+0x69/0x90 [ 15.129684] ? do_syscall_64+0x69/0x90 [ 15.129686] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 15.129689] RIP: 0033:0x7fa96ecd559b [ 15.129692] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c \ c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff \ ff 73 01 c3 48 8b 0d a5 a8 0c 00 f7 d8 64 89 01 48 [ 15.129696] RSP: 002b:00007ffcaf85c258 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 [ 15.129699] RAX: ffffffffffffffda RBX: 00007fa96f1b48f0 RCX: 00007fa96ecd559b [ 15.129701] RDX: 00007fa97017e610 RSI: 00000000c138fd09 RDI: 0000000000000003 [ 15.129702] RBP: 00007fa96ebab583 R08: 00007fa97017c9e0 R09: 00007ffcaf85bf27 [ 15.129704] R10: 0000000000000001 R11: 0000000000000206 R12: 00007fa97017e610 [ 15.129706] R13: 00007fa97017e640 R14: 00007fa97017e6c0 R15: 00007fa97017e530 [ 15.129709] </TASK> This is caused by missing mddev->queue check for setting QUEUE_FLAG_NOWAIT Fix this by moving the QUEUE_FLAG_NOWAIT logic to under mddev->queue check. Fixes: f51d46d0e7cb ("md: add support for REQ_NOWAIT") Reported-by: Leon Möller <jkhsjdhjs@totally.rip> Tested-by: Leon Möller <jkhsjdhjs@totally.rip> Cc: Vishal Verma <vverma@digitalocean.com> Signed-off-by: Song Liu <song@kernel.org>
2022-02-02block: fix DIO handling regressions in blkdev_read_iter()Ilya Dryomov
Commit ceaa762527f4 ("block: move direct_IO into our own read_iter handler") introduced several regressions for bdev DIO: 1. read spanning EOF always returns 0 instead of the number of bytes read. This is because "count" is assigned early and isn't updated when the iterator is truncated: $ lsblk -o name,size /dev/vdb NAME SIZE vdb 1G $ xfs_io -d -c 'pread -b 4M 1021M 4M' /dev/vdb read 0/4194304 bytes at offset 1070596096 0.000000 bytes, 0 ops; 0.0007 sec (0.000000 bytes/sec and 0.0000 ops/sec) instead of $ xfs_io -d -c 'pread -b 4M 1021M 4M' /dev/vdb read 3145728/4194304 bytes at offset 1070596096 3 MiB, 1 ops; 0.0007 sec (3.865 GiB/sec and 1319.2612 ops/sec) 2. truncated iterator isn't reexpanded 3. iterator isn't reverted on blkdev_direct_IO() error 4. zero size read no longer skips atime update Fixes: ceaa762527f4 ("block: move direct_IO into our own read_iter handler") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220201100420.25875-1-idryomov@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-02nvme-rdma: fix possible use-after-free in transport error_recovery workSagi Grimberg
While nvme_rdma_submit_async_event_work is checking the ctrl and queue state before preparing the AER command and scheduling io_work, in order to fully prevent a race where this check is not reliable the error recovery work must flush async_event_work before continuing to destroy the admin queue after setting the ctrl state to RESETTING such that there is no race .submit_async_event and the error recovery handler itself changing the ctrl state. Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2022-02-02nvme-tcp: fix possible use-after-free in transport error_recovery workSagi Grimberg
While nvme_tcp_submit_async_event_work is checking the ctrl and queue state before preparing the AER command and scheduling io_work, in order to fully prevent a race where this check is not reliable the error recovery work must flush async_event_work before continuing to destroy the admin queue after setting the ctrl state to RESETTING such that there is no race .submit_async_event and the error recovery handler itself changing the ctrl state. Tested-by: Chris Leech <cleech@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2022-02-02nvme: fix a possible use-after-free in controller reset during loadSagi Grimberg
Unlike .queue_rq, in .submit_async_event drivers may not check the ctrl readiness for AER submission. This may lead to a use-after-free condition that was observed with nvme-tcp. The race condition may happen in the following scenario: 1. driver executes its reset_ctrl_work 2. -> nvme_stop_ctrl - flushes ctrl async_event_work 3. ctrl sends AEN which is received by the host, which in turn schedules AEN handling 4. teardown admin queue (which releases the queue socket) 5. AEN processed, submits another AER, calling the driver to submit 6. driver attempts to send the cmd ==> use-after-free In order to fix that, add ctrl state check to validate the ctrl is actually able to accept the AER submission. This addresses the above race in controller resets because the driver during teardown should: 1. change ctrl state to RESETTING 2. flush async_event_work (as well as other async work elements) So after 1,2, any other AER command will find the ctrl state to be RESETTING and bail out without submitting the AER. Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2022-01-28dm: properly fix redundant bio-based IO accountingblock-5.17-2022-01-28Mike Snitzer
Record the start_time for a bio but defer the starting block core's IO accounting until after IO is submitted using bio_start_io_acct_time(). This approach avoids the need to mess around with any of the individual IO stats in response to a bio_split() that follows bio submission. Reported-by: Bud Brown <bubrown@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Depends-on: e45c47d1f94e ("block: add bio_start_io_acct_time() to control start_time") Signed-off-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220128155841.39644-4-snitzer@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-28dm: revert partial fix for redundant bio-based IO accountingMike Snitzer
Reverts a1e1cb72d9649 ("dm: fix redundant IO accounting for bios that need splitting") because it was too narrow in scope (only addressed redundant 'sectors[]' accounting and not ios, nsecs[], etc). Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220128155841.39644-3-snitzer@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-28block: add bio_start_io_acct_time() to control start_timeMike Snitzer
bio_start_io_acct_time() interface is like bio_start_io_acct() that allows start_time to be passed in. This gives drivers the ability to defer starting accounting until after IO is issued (but possibily not entirely due to bio splitting). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220128155841.39644-2-snitzer@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-27blk-mq: Fix wrong wakeup batch configuration which will cause hangLaibin Qiu
Commit 180dccb0dba4f ("blk-mq: fix tag_get wait task can't be awakened") will recalculate wake_batch when incrementing or decrementing active_queues to avoid wake_batch > hctx_max_depth. At the same time, in order to not affect performance as much as possible, the minimum wakeup batch is set to 4. But when the QD is small (such as QD=1), if inc or dec active_queues increases wakeup batch, that can lead to a hang: Fix this problem with the following strategies: QD : >= 32 | < 32 --------------------------------- wakeup batch: 8~4 | 3~1 Fixes: 180dccb0dba4f ("blk-mq: fix tag_get wait task can't be awakened") Link: https://lore.kernel.org/linux-block/78cafe94-a787-e006-8851-69906f0c2128@huawei.com/T/#t Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca> Signed-off-by: Laibin Qiu <qiulaibin@huawei.com> Tested-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca> Link: https://lore.kernel.org/r/20220127100047.1763746-1-qiulaibin@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-27Merge tag 'nvme-5.17-2022-01-27' of git://git.infradead.org/nvme into block-5.17Jens Axboe
Pull NVMe fixes from Christoph: "nvme fixes for Linux 5.17 - add the IGNORE_DEV_SUBNQN quirk for Intel P4500/P4600 SSDs (Wu Zheng) - remove the unneeded ret variable in nvmf_dev_show (Changcheng Deng)" * tag 'nvme-5.17-2022-01-27' of git://git.infradead.org/nvme: nvme-fabrics: remove the unneeded ret variable in nvmf_dev_show nvme-pci: add the IGNORE_DEV_SUBNQN quirk for Intel P4500/P4600 SSDs
2022-01-27nvme-fabrics: remove the unneeded ret variable in nvmf_dev_showChangcheng Deng
Remove unneeded variable and directly return 0. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Changcheng Deng <deng.changcheng@zte.com.cn> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-01-27nvme-pci: add the IGNORE_DEV_SUBNQN quirk for Intel P4500/P4600 SSDsWu Zheng
The Intel P4500/P4600 SSDs do not report a subsystem NQN despite claiming compliance to a standards version where reporting one is required. Add the IGNORE_DEV_SUBNQN quirk to not fail the initialization of a second such SSDs in a system. Signed-off-by: Zheng Wu <wu.zheng@intel.com> Signed-off-by: Ye Jinhe <jinhe.ye@intel.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-01-26blk-mq: fix missing blk_account_io_done() in error pathYu Kuai
If blk_mq_request_issue_directly() failed from blk_insert_cloned_request(), the request will be accounted start. Currently, blk_insert_cloned_request() is only called by dm, and such request won't be accounted done by dm. In normal path, io will be accounted start from blk_mq_bio_to_request(), when the request is allocated, and such io will be accounted done from __blk_mq_end_request_acct() whether it succeeded or failed. Thus add blk_account_io_done() to fix the problem. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220126012132.3111551-1-yukuai3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-23block: fix memory leak in disk_register_independent_access_rangesMiaoqian Lin
kobject_init_and_add() takes reference even when it fails. According to the doc of kobject_init_and_add() If this function returns an error, kobject_put() must be called to properly clean up the memory associated with the object. Fix this issue by adding kobject_put(). Callback function blk_ia_ranges_sysfs_release() in kobject_put() can handle the pointer "iars" properly. Fixes: a2247f19ee1c ("block: Add independent access ranges support") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20220120101025.22411-1-linmq006@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-23Merge tag 'powerpc-5.17-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - A series of bpf fixes, including an oops fix and some codegen fixes. - Fix a regression in syscall_get_arch() for compat processes. - Fix boot failure on some 32-bit systems with KASAN enabled. - A couple of other build/minor fixes. Thanks to Athira Rajeev, Christophe Leroy, Dmitry V. Levin, Jiri Olsa, Johan Almbladh, Maxime Bizon, Naveen N. Rao, and Nicholas Piggin. * tag 'powerpc-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: Mask SRR0 before checking against the masked NIP powerpc/perf: Only define power_pmu_wants_prompt_pmi() for CONFIG_PPC64 powerpc/32s: Fix kasan_init_region() for KASAN powerpc/time: Fix build failure due to do_hard_irq_enable() on PPC32 powerpc/audit: Fix syscall_get_arch() powerpc64/bpf: Limit 'ldbrx' to processors compliant with ISA v2.06 tools/bpf: Rename 'struct event' to avoid naming conflict powerpc/bpf: Update ldimm64 instructions during extra pass powerpc32/bpf: Fix codegen for bpf-to-bpf calls bpf: Guard against accessing NULL pt_regs in bpf_get_task_stack()
2022-01-23Merge tag 'irq_urgent_for_v5.17_rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Borislav Petkov: "A single use-after-free fix in the PCI MSI irq domain allocation path" * tag 'irq_urgent_for_v5.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: PCI/MSI: Prevent UAF in error path
2022-01-23Merge tag 'sched_urgent_for_v5.17_rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Borislav Petkov: "A bunch of fixes: forced idle time accounting, utilization values propagation in the sched hierarchies and other minor cleanups and improvements" * tag 'sched_urgent_for_v5.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kernel/sched: Remove dl_boosted flag comment sched: Avoid double preemption in __cond_resched_*lock*() sched/fair: Fix all kernel-doc warnings sched/core: Accounting forceidle time for all tasks except idle task sched/pelt: Relax the sync of load_sum with load_avg sched/pelt: Relax the sync of runnable_sum with runnable_avg sched/pelt: Continue to relax the sync of util_sum with util_avg sched/pelt: Relax the sync of util_sum with util_avg psi: Fix uaf issue when psi trigger is destroyed while being polled
2022-01-23Merge tag 'perf_urgent_for_v5.17_rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Borislav Petkov: - Add support for accessing the general purpose counters on Alder Lake via MMIO - Add new LBR format v7 support which is v5 modulo TSX - Fix counter enumeration on Alder Lake hybrids - Overhaul how context time updates are done and get rid of perf_event::shadow_ctx_time. - The usual amount of fixes: event mask correction, supported event types reporting, etc. * tag 'perf_urgent_for_v5.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/perf: Avoid warning for Arch LBR without XSAVE perf/x86/intel/uncore: Add IMC uncore support for ADL perf/x86/intel/lbr: Add static_branch for LBR INFO flags perf/x86/intel/lbr: Support LBR format V7 perf/x86/rapl: fix AMD event handling perf/x86/intel/uncore: Fix CAS_COUNT_WRITE issue for ICX perf/x86/intel: Add a quirk for the calculation of the number of counters on Alder Lake perf: Fix perf_event_read_local() time
2022-01-23Linux 5.17-rc1v5.17-rc1Linus Torvalds
2022-01-23Merge tag 'perf-tools-for-v5.17-2022-01-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull more perf tools updates from Arnaldo Carvalho de Melo: - Fix printing 'phys_addr' in 'perf script'. - Fix failure to add events with 'perf probe' in ppc64 due to not removing leading dot (ppc64 ABIv1). - Fix cpu_map__item() python binding building. - Support event alias in form foo-bar-baz, add pmu-events and parse-event tests for it. - No need to setup affinities when starting a workload or attaching to a pid. - Use path__join() to compose a path instead of ad-hoc snprintf() equivalent. - Override attr->sample_period for non-libpfm4 events. - Use libperf cpumap APIs instead of accessing the internal state directly. - Sync x86 arch prctl headers and files changed by the new set_mempolicy_home_node syscall with the kernel sources. - Remove duplicate include in cpumap.h. - Remove redundant err variable. * tag 'perf-tools-for-v5.17-2022-01-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: perf tools: Remove redundant err variable perf test: Add parse-events test for aliases with hyphens perf test: Add pmu-events test for aliases with hyphens perf parse-events: Support event alias in form foo-bar-baz perf evsel: Override attr->sample_period for non-libpfm4 events perf cpumap: Remove duplicate include in cpumap.h perf cpumap: Migrate to libperf cpumap api perf python: Fix cpu_map__item() building perf script: Fix printing 'phys_addr' failure issue tools headers UAPI: Sync files changed by new set_mempolicy_home_node syscall tools headers UAPI: Sync x86 arch prctl headers with the kernel sources perf machine: Use path__join() to compose a path instead of snprintf(dir, '/', filename) perf evlist: No need to setup affinities when disabling events for pid targets perf evlist: No need to setup affinities when enabling events for pid targets perf stat: No need to setup affinities when starting a workload perf affinity: Allow passing a NULL arg to affinity__cleanup() perf probe: Fix ppc64 'perf probe add events failed' case
2022-01-23Merge tag 'trace-v5.17-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull ftrace fix from Steven Rostedt: "Fix s390 breakage from sorting mcount tables. The latest merge of the tracing tree sorts the mcount table at build time. But s390 appears to do things differently (like always) and replaces the sorted table back to the original unsorted one. As the ftrace algorithm depends on it being sorted, bad things happen when it is not, and s390 experienced those bad things. Add a new config to tell the boot if the mcount table is sorted or not, and allow s390 to opt out of it" * tag 'trace-v5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Fix assuming build time sort works for s390
2022-01-23ftrace: Fix assuming build time sort works for s390Steven Rostedt (Google)
To speed up the boot process, as mcount_loc needs to be sorted for ftrace to work properly, sorting it at build time is more efficient than boot up and can save milliseconds of time. Unfortunately, this change broke s390 as it will modify the mcount_loc location after the sorting takes place and will put back the unsorted locations. Since the sorting is skipped at boot up if it is believed that it was sorted at run time, ftrace can crash as its algorithms are dependent on the list being sorted. Add a new config BUILDTIME_MCOUNT_SORT that is set when BUILDTIME_TABLE_SORT but not if S390 is set. Use this config to determine if sorting should take place at boot up. Link: https://lore.kernel.org/all/yt9dee51ctfn.fsf@linux.ibm.com/ Fixes: 72b3942a173c ("scripts: ftrace - move the sort-processing in ftrace_init") Reported-by: Sven Schnelle <svens@linux.ibm.com> Tested-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-01-23Merge tag 'kbuild-fixes-v5.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Bring include/uapi/linux/nfc.h into the UAPI compile-test coverage - Revert the workaround of CONFIG_CC_IMPLICIT_FALLTHROUGH - Fix build errors in certs/Makefile * tag 'kbuild-fixes-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: certs: Fix build error when CONFIG_MODULE_SIG_KEY is empty certs: Fix build error when CONFIG_MODULE_SIG_KEY is PKCS#11 URI Revert "Makefile: Do not quote value for CONFIG_CC_IMPLICIT_FALLTHROUGH" usr/include/Makefile: add linux/nfc.h to the compile-test coverage
2022-01-23Merge tag 'bitmap-5.17-rc1' of git://github.com/norov/linuxLinus Torvalds
Pull bitmap updates from Yury Norov: - introduce for_each_set_bitrange() - use find_first_*_bit() instead of find_next_*_bit() where possible - unify for_each_bit() macros * tag 'bitmap-5.17-rc1' of git://github.com/norov/linux: vsprintf: rework bitmap_list_string lib: bitmap: add performance test for bitmap_print_to_pagebuf bitmap: unify find_bit operations mm/percpu: micro-optimize pcpu_is_populated() Replace for_each_*_bit_from() with for_each_*_bit() where appropriate find: micro-optimize for_each_{set,clear}_bit() include/linux: move for_each_bit() macros from bitops.h to find.h cpumask: replace cpumask_next_* with cpumask_first_* where appropriate tools: sync tools/bitmap with mother linux all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate cpumask: use find_first_and_bit() lib: add find_first_and_bit() arch: remove GENERIC_FIND_FIRST_BIT entirely include: move find.h from asm_generic to linux bitops: move find_bit_*_le functions from le.h to find.h bitops: protect find_first_{,zero}_bit properly
2022-01-22perf tools: Remove redundant err variableMinghao Chi
Return value from perf_event__process_tracing_data() directly instead of taking this in another redundant variable. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20220112080109.666800-1-chi.minghao@zte.com.cn Signed-off-by: CGEL ZTE <cgel.zte@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-22perf test: Add parse-events test for aliases with hyphensJohn Garry
Add a test which allows us to test parsing an event alias with hyphens. Since these events typically do not exist on most host systems, add the alias to the fake pmu. Function perf_pmu__test_parse_init() has terms added to match known test aliases. Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Liu <liuqi115@huawei.com> Cc: Shaokun Zhang <zhangshaokun@hisilicon.com> Cc: linuxarm@huawei.com Link: https://lore.kernel.org/r/1642432215-234089-4-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-22perf test: Add pmu-events test for aliases with hyphensJohn Garry
Add a test for aliases with hyphens in the name to ensure that the pmu-events tables are as expects. There should be no reason why these sort of aliases would be treated differently, but no harm in checking. Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Liu <liuqi115@huawei.com> Cc: Shaokun Zhang <zhangshaokun@hisilicon.com> Cc: linuxarm@huawei.com Link: https://lore.kernel.org/r/1642432215-234089-3-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-22perf parse-events: Support event alias in form foo-bar-bazJohn Garry
Event aliasing for events whose name in the form foo-bar-baz is not supported, while foo-bar, foo_bar_baz, and other combinations are, i.e. two hyphens are not supported. The HiSilicon D06 platform has events in such form: $ ./perf list sdir-home-migrate List of pre-defined events (to be used in -e): uncore hha: sdir-home-migrate [Unit: hisi_sccl,hha] $ sudo ./perf stat -e sdir-home-migrate event syntax error: 'sdir-home-migrate' \___ parser error Run 'perf list' for a list of valid events Usage: perf stat [<options>] [<command>] -e, --event <event>event selector. use 'perf list' to list available events To support, add an extra PMU event symbol type for "baz", and add a new rule in the bison file. Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Liu <liuqi115@huawei.com> Cc: Shaokun Zhang <zhangshaokun@hisilicon.com> Cc: linuxarm@huawei.com Link: https://lore.kernel.org/r/1642432215-234089-2-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-22perf evsel: Override attr->sample_period for non-libpfm4 eventsGerman Gomez
A previous patch preventing "attr->sample_period" values from being overridden in pfm events changed a related behaviour in arm-spe. Before said patch: perf record -c 10000 -e arm_spe_0// -- sleep 1 Would yield an SPE event with period=10000. After the patch, the period in "-c 10000" was being ignored because the arm-spe code initializes sample_period to a non-zero value. This patch restores the previous behaviour for non-libpfm4 events. Fixes: ae5dcc8abe31 (“perf record: Prevent override of attr->sample_period for libpfm4 events”) Reported-by: Chase Conklin <chase.conklin@arm.com> Signed-off-by: German Gomez <german.gomez@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20220118144054.2541-1-german.gomez@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-22perf cpumap: Remove duplicate include in cpumap.hLv Ruyi
Remove all but the first include of stdbool.h from cpumap.h. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20220117083730.863200-1-lv.ruyi@zte.com.cn Signed-off-by: CGEL ZTE <cgel.zte@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-22perf cpumap: Migrate to libperf cpumap apiIan Rogers
Switch from directly accessing the perf_cpu_map to using the appropriate libperf API when possible. Using the API simplifies the job of refactoring use of perf_cpu_map. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: André Almeida <andrealmeid@collabora.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miaoqian Lin <linmq006@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yury Norov <yury.norov@gmail.com> Link: http://lore.kernel.org/lkml/20220122045811.3402706-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>