summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)Author
2020-07-02nvme: fix a crash in nvme_mpath_add_diskChristoph Hellwig
For private namespaces ns->head_disk is NULL, so add a NULL check before updating the BDI capabilities. Fixes: b2ce4d90690b ("nvme-multipath: set bdi capabilities once") Reported-by: Avinash M N <Avinash.M.N@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
2020-07-02nvme: fix identify error status silent ignoreSagi Grimberg
Commit 59c7c3caaaf8 intended to only silently ignore non retry-able errors (DNR bit set) such that we can still identify misbehaving controllers, and in the other hand propagate retry-able errors (DNR bit cleared) so we don't wrongly abandon a namespace just because it happens to be temporarily inaccessible. The goal remains the same as the original commit where this was introduced but unfortunately had the logic backwards. Fixes: 59c7c3caaaf8 ("nvme: fix possible hang when ns scanning fails during error recovery") Reported-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-30virtio-blk: free vblk-vqs in error path of virtblk_probe()block-5.8-2020-07-01Hou Tao
Else there will be memory leak if alloc_disk() fails. Fixes: 6a27b656fc02 ("block: virtio-blk: support multi virt queues per virtio-blk device") Signed-off-by: Hou Tao <houtao1@huawei.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24nvme-multipath: fix bogus request queue reference putSagi Grimberg
The mpath disk node takes a reference on the request mpath request queue when adding live path to the mpath gendisk. However if we connected to an inaccessible path device_add_disk is not called, so if we disconnect and remove the mpath gendisk we endup putting an reference on the request queue that was never taken [1]. Fix that to check if we ever added a live path (using NVME_NS_HEAD_HAS_DISK flag) and if not, clear the disk->queue reference. [1]: ------------[ cut here ]------------ refcount_t: underflow; use-after-free. WARNING: CPU: 1 PID: 1372 at lib/refcount.c:28 refcount_warn_saturate+0xa6/0xf0 CPU: 1 PID: 1372 Comm: nvme Tainted: G O 5.7.0-rc2+ #3 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1 04/01/2014 RIP: 0010:refcount_warn_saturate+0xa6/0xf0 RSP: 0018:ffffb29e8053bdc0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff8b7a2f4fc060 RCX: 0000000000000007 RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8b7a3ec99980 RBP: ffff8b7a2f4fc000 R08: 00000000000002e1 R09: 0000000000000004 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: fffffffffffffff2 R14: ffffb29e8053bf08 R15: ffff8b7a320e2da0 FS: 00007f135d4ca800(0000) GS:ffff8b7a3ec80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005651178c0c30 CR3: 000000003b650005 CR4: 0000000000360ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: disk_release+0xa2/0xc0 device_release+0x28/0x80 kobject_put+0xa5/0x1b0 nvme_put_ns_head+0x26/0x70 [nvme_core] nvme_put_ns+0x30/0x60 [nvme_core] nvme_remove_namespaces+0x9b/0xe0 [nvme_core] nvme_do_delete_ctrl+0x43/0x5c [nvme_core] nvme_sysfs_delete.cold+0x8/0xd [nvme_core] kernfs_fop_write+0xc1/0x1a0 vfs_write+0xb6/0x1a0 ksys_write+0x5f/0xe0 do_syscall_64+0x52/0x1a0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Reported-by: Anton Eidelman <anton@lightbitslabs.com> Tested-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-24nvme-multipath: fix deadlock due to head->lockAnton Eidelman
In the following scenario scan_work and ana_work will deadlock: When scan_work calls nvme_mpath_add_disk() this holds ana_lock and invokes nvme_parse_ana_log(), which may issue IO in device_add_disk() and hang waiting for an accessible path. While nvme_mpath_set_live() only called when nvme_state_is_live(), a transition may cause NVME_SC_ANA_TRANSITION and requeue the IO. Since nvme_mpath_set_live() holds ns->head->lock, an ana_work on ANY ctrl will not be able to complete nvme_mpath_set_live() on the same ns->head, which is required in order to update the new accessible path and remove NVME_NS_ANA_PENDING.. Therefore IO never completes: deadlock [1]. Fix: Move device_add_disk out of the head->lock and protect it with an atomic test_and_set for a new NVME_NS_HEAD_HAS_DISK bit. [1]: kernel: INFO: task kworker/u8:2:160 blocked for more than 120 seconds. kernel: Tainted: G OE 5.3.5-050305-generic #201910071830 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: kworker/u8:2 D 0 160 2 0x80004000 kernel: Workqueue: nvme-wq nvme_ana_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: schedule_preempt_disabled+0xe/0x10 kernel: __mutex_lock.isra.0+0x182/0x4f0 kernel: __mutex_lock_slowpath+0x13/0x20 kernel: mutex_lock+0x2e/0x40 kernel: nvme_update_ns_ana_state+0x22/0x60 [nvme_core] kernel: nvme_update_ana_state+0xca/0xe0 [nvme_core] kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core] kernel: nvme_read_ana_log+0x76/0x100 [nvme_core] kernel: nvme_ana_work+0x15/0x20 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x4d/0x400 kernel: kthread+0x104/0x140 kernel: ret_from_fork+0x35/0x40 kernel: INFO: task kworker/u8:4:439 blocked for more than 120 seconds. kernel: Tainted: G OE 5.3.5-050305-generic #201910071830 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: kworker/u8:4 D 0 439 2 0x80004000 kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: io_schedule+0x16/0x40 kernel: do_read_cache_page+0x438/0x830 kernel: read_cache_page+0x12/0x20 kernel: read_dev_sector+0x27/0xc0 kernel: read_lba+0xc1/0x220 kernel: efi_partition+0x1e6/0x708 kernel: check_partition+0x154/0x244 kernel: rescan_partitions+0xae/0x280 kernel: __blkdev_get+0x40f/0x560 kernel: blkdev_get+0x3d/0x140 kernel: __device_add_disk+0x388/0x480 kernel: device_add_disk+0x13/0x20 kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core] kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core] kernel: nvme_mpath_add_disk+0xbe/0x100 [nvme_core] kernel: nvme_validate_ns+0x396/0x940 [nvme_core] kernel: nvme_scan_work+0x256/0x390 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x4d/0x400 kernel: kthread+0x104/0x140 kernel: ret_from_fork+0x35/0x40 Fixes: 0d0b660f214d ("nvme: add ANA support") Signed-off-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-24nvme: don't protect ns mutation with ns->head->lockSagi Grimberg
Right now ns->head->lock is protecting namespace mutation which is wrong and unneeded. Move it to only protect against head mutations. While we're at it, remove unnecessary ns->head reference as we already have head pointer. The problem with this is that the head->lock spans mpath disk node I/O that may block under some conditions (if for example the controller is disconnecting or the path became inaccessible), The locking scheme does not allow any other path to enable itself, preventing blocked I/O to complete and forward-progress from there. This is a preparation patch for the fix in a subsequent patch where the disk I/O will also be done outside the head->lock. Fixes: 0d0b660f214d ("nvme: add ANA support") Signed-off-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-24nvme-multipath: fix deadlock between ana_work and scan_workAnton Eidelman
When scan_work calls nvme_mpath_add_disk() this holds ana_lock and invokes nvme_parse_ana_log(), which may issue IO in device_add_disk() and hang waiting for an accessible path. While nvme_mpath_set_live() only called when nvme_state_is_live(), a transition may cause NVME_SC_ANA_TRANSITION and requeue the IO. In order to recover and complete the IO ana_work on the same ctrl should be able to update the path state and remove NVME_NS_ANA_PENDING. The deadlock occurs because scan_work keeps holding ana_lock, so ana_work hangs [1]. Fix: Now nvme_mpath_add_disk() uses nvme_parse_ana_log() to obtain a copy of the ANA group desc, and then calls nvme_update_ns_ana_state() without holding ana_lock. [1]: kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: io_schedule+0x16/0x40 kernel: do_read_cache_page+0x438/0x830 kernel: read_cache_page+0x12/0x20 kernel: read_dev_sector+0x27/0xc0 kernel: read_lba+0xc1/0x220 kernel: efi_partition+0x1e6/0x708 kernel: check_partition+0x154/0x244 kernel: rescan_partitions+0xae/0x280 kernel: __blkdev_get+0x40f/0x560 kernel: blkdev_get+0x3d/0x140 kernel: __device_add_disk+0x388/0x480 kernel: device_add_disk+0x13/0x20 kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core] kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core] kernel: nvme_set_ns_ana_state+0x1e/0x30 [nvme_core] kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core] kernel: nvme_mpath_add_disk+0x47/0x90 [nvme_core] kernel: nvme_validate_ns+0x396/0x940 [nvme_core] kernel: nvme_scan_work+0x24f/0x380 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x249/0x400 kernel: kthread+0x104/0x140 kernel: Workqueue: nvme-wq nvme_ana_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: schedule_preempt_disabled+0xe/0x10 kernel: __mutex_lock.isra.0+0x182/0x4f0 kernel: ? __switch_to_asm+0x34/0x70 kernel: ? select_task_rq_fair+0x1aa/0x5c0 kernel: ? kvm_sched_clock_read+0x11/0x20 kernel: ? sched_clock+0x9/0x10 kernel: __mutex_lock_slowpath+0x13/0x20 kernel: mutex_lock+0x2e/0x40 kernel: nvme_read_ana_log+0x3a/0x100 [nvme_core] kernel: nvme_ana_work+0x15/0x20 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x4d/0x400 kernel: kthread+0x104/0x140 kernel: ? process_one_work+0x380/0x380 kernel: ? kthread_park+0x80/0x80 kernel: ret_from_fork+0x35/0x40 Fixes: 0d0b660f214d ("nvme: add ANA support") Signed-off-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-24nvme: fix possible deadlock when I/O is blockedSagi Grimberg
Revert fab7772bfbcf ("nvme-multipath: revalidate nvme_ns_head gendisk in nvme_validate_ns") When adding a new namespace to the head disk (via nvme_mpath_set_live) we will see partition scan which triggers I/O on the mpath device node. This process will usually be triggered from the scan_work which holds the scan_lock. If I/O blocks (if we got ana change currently have only available paths but none are accessible) this can deadlock on the head disk bd_mutex as both partition scan I/O takes it, and head disk revalidation takes it to check for resize (also triggered from scan_work on a different path). See trace [1]. The mpath disk revalidation was originally added to detect online disk size change, but this is no longer needed since commit cb224c3af4df ("nvme: Convert to use set_capacity_revalidate_and_notify") which already updates resize info without unnecessarily revalidating the disk (the mpath disk doesn't even implement .revalidate_disk fop). [1]: -- kernel: INFO: task kworker/u65:9:494 blocked for more than 241 seconds. kernel: Tainted: G OE 5.3.5-050305-generic #201910071830 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: kworker/u65:9 D 0 494 2 0x80004000 kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: schedule_preempt_disabled+0xe/0x10 kernel: __mutex_lock.isra.0+0x182/0x4f0 kernel: __mutex_lock_slowpath+0x13/0x20 kernel: mutex_lock+0x2e/0x40 kernel: revalidate_disk+0x63/0xa0 kernel: __nvme_revalidate_disk+0xfe/0x110 [nvme_core] kernel: nvme_revalidate_disk+0xa4/0x160 [nvme_core] kernel: ? evict+0x14c/0x1b0 kernel: revalidate_disk+0x2b/0xa0 kernel: nvme_validate_ns+0x49/0x940 [nvme_core] kernel: ? blk_mq_free_request+0xd2/0x100 kernel: ? __nvme_submit_sync_cmd+0xbe/0x1e0 [nvme_core] kernel: nvme_scan_work+0x24f/0x380 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x249/0x400 kernel: kthread+0x104/0x140 kernel: ? process_one_work+0x380/0x380 kernel: ? kthread_park+0x80/0x80 kernel: ret_from_fork+0x1f/0x40 ... kernel: INFO: task kworker/u65:1:2630 blocked for more than 241 seconds. kernel: Tainted: G OE 5.3.5-050305-generic #201910071830 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: kworker/u65:1 D 0 2630 2 0x80004000 kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: io_schedule+0x16/0x40 kernel: do_read_cache_page+0x438/0x830 kernel: ? __switch_to_asm+0x34/0x70 kernel: ? file_fdatawait_range+0x30/0x30 kernel: read_cache_page+0x12/0x20 kernel: read_dev_sector+0x27/0xc0 kernel: read_lba+0xc1/0x220 kernel: ? kmem_cache_alloc_trace+0x19c/0x230 kernel: efi_partition+0x1e6/0x708 kernel: ? vsnprintf+0x39e/0x4e0 kernel: ? snprintf+0x49/0x60 kernel: check_partition+0x154/0x244 kernel: rescan_partitions+0xae/0x280 kernel: __blkdev_get+0x40f/0x560 kernel: blkdev_get+0x3d/0x140 kernel: __device_add_disk+0x388/0x480 kernel: device_add_disk+0x13/0x20 kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core] kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core] kernel: nvme_set_ns_ana_state+0x1e/0x30 [nvme_core] kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core] kernel: ? nvme_update_ns_ana_state+0x60/0x60 [nvme_core] kernel: nvme_mpath_add_disk+0x47/0x90 [nvme_core] kernel: nvme_validate_ns+0x396/0x940 [nvme_core] kernel: ? blk_mq_free_request+0xd2/0x100 kernel: nvme_scan_work+0x24f/0x380 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x249/0x400 kernel: kthread+0x104/0x140 kernel: ? process_one_work+0x380/0x380 kernel: ? kthread_park+0x80/0x80 kernel: ret_from_fork+0x1f/0x40 -- Fixes: fab7772bfbcf ("nvme-multipath: revalidate nvme_ns_head gendisk in nvme_validate_ns") Signed-off-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-24nvme-rdma: assign completion vector correctlyMax Gurtovoy
The completion vector index that is given during CQ creation can't exceed the number of support vectors by the underlying RDMA device. This violation currently can accure, for example, in case one will try to connect with N regular read/write queues and M poll queues and the sum of N + M > num_supported_vectors. This will lead to failure in establish a connection to remote target. Instead, in that case, share a completion vector between queues. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-24nvme-loop: initialize tagset numa value to the value of the ctrlMax Gurtovoy
Both admin's and drive's tagsets should be set according the numa node of the controller. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-24nvme-tcp: initialize tagset numa value to the value of the ctrlMax Gurtovoy
Both admin's and drive's tagsets should be set according the numa node of the controller. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-24nvme-pci: initialize tagset numa value to the value of the ctrlMax Gurtovoy
Both admin's and drive's tagsets should be set according the numa node of the controller. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-24nvme-pci: override the value of the controller's numa nodeMax Gurtovoy
Set the node value according to the PCI device numa node. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-24nvme: set initial value for controller's numa nodeMax Gurtovoy
Initialize the node to NUMA_NO_NODE value. Transports that are aware of numa node affinity can override it (e.g. RDMA transport set the affinity according to the RDMA HCA). Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-18loop: replace kill_bdev with invalidate_bdevZheng Bin
When a filesystem is mounted on a loop device and on a loop ioctl LOOP_SET_STATUS64, because of kill_bdev, buffer_head mappings are getting destroyed. kill_bdev truncate_inode_pages truncate_inode_pages_range do_invalidatepage block_invalidatepage discard_buffer -->clear BH_Mapped flag sb_bread __bread_gfp bh = __getblk_gfp -->discard_buffer clear BH_Mapped flag __bread_slow submit_bh submit_bh_wbc BUG_ON(!buffer_mapped(bh)) --> hit this BUG_ON Fixes: 5db470e229e2 ("loop: drop caches if offset or block_size are changed") Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-14bcache: pr_info() format clean up in bcache_device_init()Coly Li
scripts/checkpatch.pl reports following warning for patch ("bcache: check and adjust logical block size for backing devices"), WARNING: quoted string split across lines #146: FILE: drivers/md/bcache/super.c:896: + pr_info("%s: sb/logical block size (%u) greater than page size " + "(%lu) falling back to device logical block size (%u)", There are two things to fix up, - The kernel message print should be in a single line. - pr_info() won't automatically add new line since v5.8, a '\n' should be added. This patch just does the above cleanup in bcache_device_init(). Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-14bcache: use delayed kworker fo asynchronous devices registrationColy Li
This patch changes the asynchronous registration kworker to a delayed kworker. There is probability queue_work() queues the async registration kworker to the same CPU (even though very little), then the process which writing sysfs interface to reigster bcache device may won't return immeidately. queue_delayed_work() in this patch will delay 10 jiffies before insert the kworker to run queue, which makes sure the registering process may always returns to user space in time. Fixes: 9e23ccf8f0a22 ("bcache: asynchronous devices registration") Signed-off-by: Coly Li <colyli@suse.de> Cc: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-14bcache: check and adjust logical block size for backing devicesMauricio Faria de Oliveira
It's possible for a block driver to set logical block size to a value greater than page size incorrectly; e.g. bcache takes the value from the superblock, set by the user w/ make-bcache. This causes a BUG/NULL pointer dereference in the path: __blkdev_get() -> set_init_blocksize() // set i_blkbits based on ... -> bdev_logical_block_size() -> queue_logical_block_size() // ... this value -> bdev_disk_changed() ... -> blkdev_readpage() -> block_read_full_page() -> create_page_buffers() // size = 1 << i_blkbits -> create_empty_buffers() // give size/take pointer -> alloc_page_buffers() // return NULL .. BUG! Because alloc_page_buffers() is called with size > PAGE_SIZE, thus it initializes head = NULL, skips the loop, return head; then create_empty_buffers() gets (and uses) the NULL pointer. This has been around longer than commit ad6bf88a6c19 ("block: fix an integer overflow in logical block size"); however, it increased the range of values that can trigger the issue. Previously only 8k/16k/32k (on x86/4k page size) would do it, as greater values overflow unsigned short to zero, and queue_ logical_block_size() would then use the default of 512. Now the range with unsigned int is much larger, and users w/ the 512k value, which happened to be zero'ed previously and work fine, started to hit this issue -- as the zero is gone, and queue_logical_block_size() does return 512k (>PAGE_SIZE.) Fix this by checking the bcache device's logical block size, and if it's greater than page size, fallback to the backing/ cached device's logical page size. This doesn't affect cache devices as those are still checked for block/page size in read_super(); only the backing/cached devices are not. Apparently it's a regression from commit 2903381fce71 ("bcache: Take data offset from the bdev superblock."), moving the check into BCACHE_SB_VERSION_CDEV only. Now that we have superblocks of backing devices out there with this larger value, we cannot refuse to load them (i.e., have a similar check in _BDEV.) Ideally perhaps bcache should use all values from the backing device (physical/logical/io_min block size)? But for now just fix the problematic case. Test-case: # IMG=/root/disk.img # dd if=/dev/zero of=$IMG bs=1 count=0 seek=1G # DEV=$(losetup --find --show $IMG) # make-bcache --bdev $DEV --block 8k < see dmesg > Before: # uname -r 5.7.0-rc7 [ 55.944046] BUG: kernel NULL pointer dereference, address: 0000000000000000 ... [ 55.949742] CPU: 3 PID: 610 Comm: bcache-register Not tainted 5.7.0-rc7 #4 ... [ 55.952281] RIP: 0010:create_empty_buffers+0x1a/0x100 ... [ 55.966434] Call Trace: [ 55.967021] create_page_buffers+0x48/0x50 [ 55.967834] block_read_full_page+0x49/0x380 [ 55.972181] do_read_cache_page+0x494/0x610 [ 55.974780] read_part_sector+0x2d/0xaa [ 55.975558] read_lba+0x10e/0x1e0 [ 55.977904] efi_partition+0x120/0x5a6 [ 55.980227] blk_add_partitions+0x161/0x390 [ 55.982177] bdev_disk_changed+0x61/0xd0 [ 55.982961] __blkdev_get+0x350/0x490 [ 55.983715] __device_add_disk+0x318/0x480 [ 55.984539] bch_cached_dev_run+0xc5/0x270 [ 55.986010] register_bcache.cold+0x122/0x179 [ 55.987628] kernfs_fop_write+0xbc/0x1a0 [ 55.988416] vfs_write+0xb1/0x1a0 [ 55.989134] ksys_write+0x5a/0xd0 [ 55.989825] do_syscall_64+0x43/0x140 [ 55.990563] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 55.991519] RIP: 0033:0x7f7d60ba3154 ... After: # uname -r 5.7.0.bcachelbspgsz [ 31.672460] bcache: bcache_device_init() bcache0: sb/logical block size (8192) greater than page size (4096) falling back to device logical block size (512) [ 31.675133] bcache: register_bdev() registered backing device loop0 # grep ^ /sys/block/bcache0/queue/*_block_size /sys/block/bcache0/queue/logical_block_size:512 /sys/block/bcache0/queue/physical_block_size:8192 Reported-by: Ryan Finnie <ryan@finnie.org> Reported-by: Sebastian Marsching <sebastian@marsching.com> Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-14bcache: fix potential deadlock problem in btree_gc_coalesceZhiqiang Liu
coccicheck reports: drivers/md//bcache/btree.c:1538:1-7: preceding lock on line 1417 In btree_gc_coalesce func, if the coalescing process fails, we will goto to out_nocoalesce tag directly without releasing new_nodes[i]->write_lock. Then, it will cause a deadlock when trying to acquire new_nodes[i]-> write_lock for freeing new_nodes[i] before return. btree_gc_coalesce func details as follows: if alloc new_nodes[i] fails: goto out_nocoalesce; // obtain new_nodes[i]->write_lock mutex_lock(&new_nodes[i]->write_lock) // main coalescing process for (i = nodes - 1; i > 0; --i) [snipped] if coalescing process fails: // Here, directly goto out_nocoalesce // tag will cause a deadlock goto out_nocoalesce; [snipped] // release new_nodes[i]->write_lock mutex_unlock(&new_nodes[i]->write_lock) // coalesing succ, return return; out_nocoalesce: btree_node_free(new_nodes[i]) // free new_nodes[i] // obtain new_nodes[i]->write_lock mutex_lock(&new_nodes[i]->write_lock); // set flag for reuse clear_bit(BTREE_NODE_dirty, &ew_nodes[i]->flags); // release new_nodes[i]->write_lock mutex_unlock(&new_nodes[i]->write_lock); To fix the problem, we add a new tag 'out_unlock_nocoalesce' for releasing new_nodes[i]->write_lock before out_nocoalesce tag. If coalescing process fails, we will go to out_unlock_nocoalesce tag for releasing new_nodes[i]->write_lock before free new_nodes[i] in out_nocoalesce tag. (Coly Li helps to clean up commit log format.) Fixes: 2a285686c109816 ("bcache: btree locking rework") Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix cfg80211 deadlock, from Johannes Berg. 2) RXRPC fails to send norigications, from David Howells. 3) MPTCP RM_ADDR parsing has an off by one pointer error, fix from Geliang Tang. 4) Fix crash when using MSG_PEEK with sockmap, from Anny Hu. 5) The ucc_geth driver needs __netdev_watchdog_up exported, from Valentin Longchamp. 6) Fix hashtable memory leak in dccp, from Wang Hai. 7) Fix how nexthops are marked as FDB nexthops, from David Ahern. 8) Fix mptcp races between shutdown and recvmsg, from Paolo Abeni. 9) Fix crashes in tipc_disc_rcv(), from Tuong Lien. 10) Fix link speed reporting in iavf driver, from Brett Creeley. 11) When a channel is used for XSK and then reused again later for XSK, we forget to clear out the relevant data structures in mlx5 which causes all kinds of problems. Fix from Maxim Mikityanskiy. 12) Fix memory leak in genetlink, from Cong Wang. 13) Disallow sockmap attachments to UDP sockets, it simply won't work. From Lorenz Bauer. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits) net: ethernet: ti: ale: fix allmulti for nu type ale net: ethernet: ti: am65-cpsw-nuss: fix ale parameters init net: atm: Remove the error message according to the atomic context bpf: Undo internal BPF_PROBE_MEM in BPF insns dump libbpf: Support pre-initializing .bss global variables tools/bpftool: Fix skeleton codegen bpf: Fix memlock accounting for sock_hash bpf: sockmap: Don't attach programs to UDP sockets bpf: tcp: Recv() should return 0 when the peer socket is closed ibmvnic: Flush existing work items before device removal genetlink: clean up family attributes allocations net: ipa: header pad field only valid for AP->modem endpoint net: ipa: program upper nibbles of sequencer type net: ipa: fix modem LAN RX endpoint id net: ipa: program metadata mask differently ionic: add pcie_print_link_status rxrpc: Fix race between incoming ACK parser and retransmitter net/mlx5: E-Switch, Fix some error pointer dereferences net/mlx5: Don't fail driver on failure to create debugfs net/mlx5e: CT: Fix ipv6 nat header rewrite actions ...
2020-06-13net: ethernet: ti: ale: fix allmulti for nu type aleGrygorii Strashko
On AM65xx MCU CPSW2G NUSS and 66AK2E/L NUSS allmulti setting does not allow unregistered mcast packets to pass. This happens, because ALE VLAN entries on these SoCs do not contain port masks for reg/unreg mcast packets, but instead store indexes of ALE_VLAN_MASK_MUXx_REG registers which intended for store port masks for reg/unreg mcast packets. This path was missed by commit 9d1f6447274f ("net: ethernet: ti: ale: fix seeing unreg mcast packets with promisc and allmulti disabled"). Hence, fix it by taking into account ALE type in cpsw_ale_set_allmulti(). Fixes: 9d1f6447274f ("net: ethernet: ti: ale: fix seeing unreg mcast packets with promisc and allmulti disabled") Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-13net: ethernet: ti: am65-cpsw-nuss: fix ale parameters initGrygorii Strashko
The ALE parameters structure is created on stack, so it has to be reset before passing to cpsw_ale_create() to avoid garbage values. Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver") Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-13Merge tag 'kbuild-v5.8-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull more Kbuild updates from Masahiro Yamada: - fix build rules in binderfs sample - fix build errors when Kbuild recurses to the top Makefile - covert '---help---' in Kconfig to 'help' * tag 'kbuild-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: treewide: replace '---help---' in Kconfig files with 'help' kbuild: fix broken builds because of GZIP,BZIP2,LZOP variables samples: binderfs: really compile this sample and fix build issues
2020-06-13Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull more SCSI updates from James Bottomley: "This is the set of changes collected since just before the merge window opened. It's mostly minor fixes in drivers. The one non-driver set is the three optical disk (sr) changes where two are error path fixes and one is a helper conversion. The big driver change is the hpsa compat_alloc_userspace rework by Al so he can kill the remaining user. This has been tested and acked by the maintainer" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (21 commits) scsi: acornscsi: Fix an error handling path in acornscsi_probe() scsi: storvsc: Remove memset before memory freeing in storvsc_suspend() scsi: cxlflash: Remove an unnecessary NULL check scsi: ibmvscsi: Don't send host info in adapter info MAD after LPM scsi: sr: Fix sr_probe() missing deallocate of device minor scsi: sr: Fix sr_probe() missing mutex_destroy scsi: st: Convert convert get_user_pages() --> pin_user_pages() scsi: target: Rename target_setup_cmd_from_cdb() to target_cmd_parse_cdb() scsi: target: Fix NULL pointer dereference scsi: target: Initialize LUN in transport_init_se_cmd() scsi: target: Factor out a new helper, target_cmd_init_cdb() scsi: hpsa: hpsa_ioctl(): Tidy up a bit scsi: hpsa: Get rid of compat_alloc_user_space() scsi: hpsa: Don't bother with vmalloc for BIG_IOCTL_Command_struct scsi: hpsa: Lift {BIG_,}IOCTL_Command_struct copy{in,out} into hpsa_ioctl() scsi: ufs: Remove redundant urgent_bkop_lvl initialization scsi: ufs: Don't update urgent bkops level when toggling auto bkops scsi: qedf: Remove redundant initialization of variable rc scsi: mpt3sas: Fix memset() in non-RDPQ mode scsi: iscsi: Fix reference count leak in iscsi_boot_create_kobj ...
2020-06-13Merge branch 'i2c/for-5.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c updates from Wolfram Sang: "I2C has quite some patches for you this time. I hope it is the move to per-driver-maintainers which is now showing results. We will see. The big news is two new drivers (Nuvoton NPCM and Qualcomm CCI), larger refactoring of the Designware, Tegra, and PXA drivers, the Cadence driver supports being a slave now, and there is support to instanciate SPD eeproms for well-known cases (which will be user-visible because the i801 driver supports it), and some devm_platform_ioremap_resource() conversions which blow up the diffstat. Note that I applied the Nuvoton driver quite late, so some minor fixup patches arrived during the merge window. I chose to apply them right away because they were trivial" * 'i2c/for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (109 commits) i2c: Drop stray comma in MODULE_AUTHOR statements i2c: npcm7xx: npcm_i2caddr[] can be static MAINTAINERS: npcm7xx: Add maintainer for Nuvoton NPCM BMC i2c: npcm7xx: Fix a couple of error codes in probe i2c: icy: Fix build with CONFIG_AMIGA_PCMCIA=n i2c: npcm7xx: Remove unnecessary parentheses i2c: npcm7xx: Add support for slave mode for Nuvoton i2c: npcm7xx: Add Nuvoton NPCM I2C controller driver dt-bindings: i2c: npcm7xx: add NPCM I2C controller i2c: pxa: don't error out if there's no pinctrl i2c: add 'single-master' property to generic bindings i2c: designware: Add Baikal-T1 System I2C support i2c: designware: Move reg-space remapping into a dedicated function i2c: designware: Retrieve quirk flags as early as possible i2c: designware: Convert driver to using regmap API i2c: designware: Discard Cherry Trail model flag i2c: designware: Add Baytrail sem config DW I2C platform dependency i2c: designware: slave: Set DW I2C core module dependency i2c: designware: Use `-y` to build multi-object modules dt-bindings: i2c: dw: Add Baikal-T1 SoC I2C controller ...
2020-06-13Merge tag 'media/v5.8-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull more media updates from Mauro Carvalho Chehab: - a set of atomisp patches. They remove several abstraction layers, and fixes clang and gcc warnings (that were hidden via some macros that were disabling 4 or 5 types of warnings there). There are also some important fixes and sensor auto-detection on newer BIOSes via ACPI _DCM tables. - some fixes * tag 'media/v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (95 commits) media: rkvdec: Fix H264 scaling list order media: v4l2-ctrls: Unset correct HEVC loop filter flag media: videobuf2-dma-contig: fix bad kfree in vb2_dma_contig_clear_max_seg_size media: v4l2-subdev.rst: correct information about v4l2 events media: s5p-mfc: Properly handle dma_parms for the allocated devices media: medium: cec: Make MEDIA_CEC_SUPPORT default to n if !MEDIA_SUPPORT media: cedrus: Implement runtime PM media: cedrus: Program output format during each run media: atomisp: improve ACPI/DMI detection logs media: Revert "media: atomisp: add Asus Transform T101HA ACPI vars" media: Revert "media: atomisp: Add some ACPI detection info" media: atomisp: improve sensor detection code to use _DSM table media: atomisp: get rid of an iomem abstraction layer media: atomisp: get rid of a string_support.h abstraction layer media: atomisp: use strscpy() instead of less secure variants media: atomisp: set DFS to MAX if sensor doesn't report fps media: atomisp: use different dfs failed messages media: atomisp: change the detection of ISP2401 at runtime media: atomisp: use macros from intel-family.h media: atomisp: don't set hpll_freq twice with different values ...
2020-06-13Merge tag 'libnvdimm-for-5.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "Small collection of cleanups to rework usage of ->queuedata and the GUID api" * tag 'libnvdimm-for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: nvdimm/pmem: stop using ->queuedata nvdimm/btt: stop using ->queuedata nvdimm/blk: stop using ->queuedata libnvdimm: Replace guid_copy() with import_guid() where it makes sense
2020-06-13Merge tag 'ras-core-2020-06-12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS updates from Thomas Gleixner: "RAS updates from Borislav Petkov: - Unmap a whole guest page if an MCE is encountered in it to avoid follow-on MCEs leading to the guest crashing, by Tony Luck. This change collided with the entry changes and the merge resolution would have been rather unpleasant. To avoid that the entry branch was merged in before applying this. The resulting code did not change over the rebase. - AMD MCE error thresholding machinery cleanup and hotplug sanitization, by Thomas Gleixner. - Change the MCE notifiers to denote whether they have handled the error and not break the chain early by returning NOTIFY_STOP, thus giving the opportunity for the later handlers in the chain to see it. By Tony Luck. - Add AMD family 0x17, models 0x60-6f support, by Alexander Monakov. - Last but not least, the usual round of fixes and improvements" * tag 'ras-core-2020-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86/mce/dev-mcelog: Fix -Wstringop-truncation warning about strncpy() x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned EDAC/amd64: Add AMD family 17h model 60h PCI IDs hwmon: (k10temp) Add AMD family 17h model 60h PCI match x86/amd_nb: Add AMD family 17h model 60h PCI IDs x86/mcelog: Add compat_ioctl for 32-bit mcelog support x86/mce: Drop bogus comment about mce.kflags x86/mce: Fixup exception only for the correct MCEs EDAC: Drop the EDAC report status checks x86/mce: Add mce=print_all option x86/mce: Change default MCE logger to check mce->kflags x86/mce: Fix all mce notifiers to update the mce->kflags bitmask x86/mce: Add a struct mce.kflags field x86/mce: Convert the CEC to use the MCE notifier x86/mce: Rename "first" function as "early" x86/mce/amd, edac: Remove report_gart_errors x86/mce/amd: Make threshold bank setting hotplug robust x86/mce/amd: Cleanup threshold device remove path x86/mce/amd: Straighten CPU hotplug path x86/mce/amd: Sanitize thresholding device creation hotplug path ...
2020-06-13Merge tag 'x86-entry-2020-06-12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 entry updates from Thomas Gleixner: "The x86 entry, exception and interrupt code rework This all started about 6 month ago with the attempt to move the Posix CPU timer heavy lifting out of the timer interrupt code and just have lockless quick checks in that code path. Trivial 5 patches. This unearthed an inconsistency in the KVM handling of task work and the review requested to move all of this into generic code so other architectures can share. Valid request and solved with another 25 patches but those unearthed inconsistencies vs. RCU and instrumentation. Digging into this made it obvious that there are quite some inconsistencies vs. instrumentation in general. The int3 text poke handling in particular was completely unprotected and with the batched update of trace events even more likely to expose to endless int3 recursion. In parallel the RCU implications of instrumenting fragile entry code came up in several discussions. The conclusion of the x86 maintainer team was to go all the way and make the protection against any form of instrumentation of fragile and dangerous code pathes enforcable and verifiable by tooling. A first batch of preparatory work hit mainline with commit d5f744f9a2ac ("Pull x86 entry code updates from Thomas Gleixner") That (almost) full solution introduced a new code section '.noinstr.text' into which all code which needs to be protected from instrumentation of all sorts goes into. Any call into instrumentable code out of this section has to be annotated. objtool has support to validate this. Kprobes now excludes this section fully which also prevents BPF from fiddling with it and all 'noinstr' annotated functions also keep ftrace off. The section, kprobes and objtool changes are already merged. The major changes coming with this are: - Preparatory cleanups - Annotating of relevant functions to move them into the noinstr.text section or enforcing inlining by marking them __always_inline so the compiler cannot misplace or instrument them. - Splitting and simplifying the idtentry macro maze so that it is now clearly separated into simple exception entries and the more interesting ones which use interrupt stacks and have the paranoid handling vs. CR3 and GS. - Move quite some of the low level ASM functionality into C code: - enter_from and exit to user space handling. The ASM code now calls into C after doing the really necessary ASM handling and the return path goes back out without bells and whistels in ASM. - exception entry/exit got the equivivalent treatment - move all IRQ tracepoints from ASM to C so they can be placed as appropriate which is especially important for the int3 recursion issue. - Consolidate the declaration and definition of entry points between 32 and 64 bit. They share a common header and macros now. - Remove the extra device interrupt entry maze and just use the regular exception entry code. - All ASM entry points except NMI are now generated from the shared header file and the corresponding macros in the 32 and 64 bit entry ASM. - The C code entry points are consolidated as well with the help of DEFINE_IDTENTRY*() macros. This allows to ensure at one central point that all corresponding entry points share the same semantics. The actual function body for most entry points is in an instrumentable and sane state. There are special macros for the more sensitive entry points, e.g. INT3 and of course the nasty paranoid #NMI, #MCE, #DB and #DF. They allow to put the whole entry instrumentation and RCU handling into safe places instead of the previous pray that it is correct approach. - The INT3 text poke handling is now completely isolated and the recursion issue banned. Aside of the entry rework this required other isolation work, e.g. the ability to force inline bsearch. - Prevent #DB on fragile entry code, entry relevant memory and disable it on NMI, #MC entry, which allowed to get rid of the nested #DB IST stack shifting hackery. - A few other cleanups and enhancements which have been made possible through this and already merged changes, e.g. consolidating and further restricting the IDT code so the IDT table becomes RO after init which removes yet another popular attack vector - About 680 lines of ASM maze are gone. There are a few open issues: - An escape out of the noinstr section in the MCE handler which needs some more thought but under the aspect that MCE is a complete trainwreck by design and the propability to survive it is low, this was not high on the priority list. - Paravirtualization When PV is enabled then objtool complains about a bunch of indirect calls out of the noinstr section. There are a few straight forward ways to fix this, but the other issues vs. general correctness were more pressing than parawitz. - KVM KVM is inconsistent as well. Patches have been posted, but they have not yet been commented on or picked up by the KVM folks. - IDLE Pretty much the same problems can be found in the low level idle code especially the parts where RCU stopped watching. This was beyond the scope of the more obvious and exposable problems and is on the todo list. The lesson learned from this brain melting exercise to morph the evolved code base into something which can be validated and understood is that once again the violation of the most important engineering principle "correctness first" has caused quite a few people to spend valuable time on problems which could have been avoided in the first place. The "features first" tinkering mindset really has to stop. With that I want to say thanks to everyone involved in contributing to this effort. Special thanks go to the following people (alphabetical order): Alexandre Chartre, Andy Lutomirski, Borislav Petkov, Brian Gerst, Frederic Weisbecker, Josh Poimboeuf, Juergen Gross, Lai Jiangshan, Macro Elver, Paolo Bonzin,i Paul McKenney, Peter Zijlstra, Vitaly Kuznetsov, and Will Deacon" * tag 'x86-entry-2020-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (142 commits) x86/entry: Force rcu_irq_enter() when in idle task x86/entry: Make NMI use IDTENTRY_RAW x86/entry: Treat BUG/WARN as NMI-like entries x86/entry: Unbreak __irqentry_text_start/end magic x86/entry: __always_inline CR2 for noinstr lockdep: __always_inline more for noinstr x86/entry: Re-order #DB handler to avoid *SAN instrumentation x86/entry: __always_inline arch_atomic_* for noinstr x86/entry: __always_inline irqflags for noinstr x86/entry: __always_inline debugreg for noinstr x86/idt: Consolidate idt functionality x86/idt: Cleanup trap_init() x86/idt: Use proper constants for table size x86/idt: Add comments about early #PF handling x86/idt: Mark init only functions __init x86/entry: Rename trace_hardirqs_off_prepare() x86/entry: Clarify irq_{enter,exit}_rcu() x86/entry: Remove DBn stacks x86/entry: Remove debug IDT frobbing x86/entry: Optimize local_db_save() for virt ...
2020-06-14treewide: replace '---help---' in Kconfig files with 'help'Masahiro Yamada
Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over '---help---'"), the number of '---help---' has been gradually decreasing, but there are still more than 2400 instances. This commit finishes the conversion. While I touched the lines, I also fixed the indentation. There are a variety of indentation styles found. a) 4 spaces + '---help---' b) 7 spaces + '---help---' c) 8 spaces + '---help---' d) 1 space + 1 tab + '---help---' e) 1 tab + '---help---' (correct indentation) f) 1 tab + 1 space + '---help---' g) 1 tab + 2 spaces + '---help---' In order to convert all of them to 1 tab + 'help', I ran the following commend: $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/' Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-12ibmvnic: Flush existing work items before device removalThomas Falcon
Ensure that all scheduled work items have completed before continuing with device removal and after further event scheduling has been halted. This patch fixes a bug where a scheduled driver reset event is processed following device removal. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-12Merge tag 'thermal-v5.8-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux Pull thermal updates from Daniel Lezcano: - Add the hwmon support on the i.MX SC (Anson Huang) - Thermal framework cleanups (self-encapsulation, pointless stubs, private structures) (Daniel Lezcano) - Use the PM QoS frequency changes for the devfreq cooling device (Matthias Kaehlcke) - Remove duplicate error messages from platform_get_irq() error handling (Markus Elfring) - Add support for the bandgap sensors (Keerthy) - Statically initialize .get_mode/.set_mode ops (Andrzej Pietrasiewicz) - Add Renesas R-Car maintainer entry (Niklas Söderlund) - Fix error checking after calling ti_bandgap_get_sensor_data() for the TI SoC thermal (Sudip Mukherjee) - Add latency constraint for the idle injection, the DT binding and the change the registering function (Daniel Lezcano) - Convert the thermal framework binding to the Yaml schema (Amit Kucheria) - Replace zero-length array with flexible-array on i.MX 8MM (Gustavo A. R. Silva) - Thermal framework cleanups (alphabetic order for heads, replace module.h by export.h, make file naming consistent) (Amit Kucheria) - Merge tsens-common into the tsens driver (Amit Kucheria) - Fix platform dependency for the Qoriq driver (Geert Uytterhoeven) - Clean up the rcar_thermal_update_temp() function in the rcar thermal driver (Niklas Söderlund) - Fix the TMSAR register for the TMUv2 on the Qoriq platform (Yuantian Tang) - Export GDDV, OEM vendor variables, and don't require IDSP for the int340x thermal driver - trivial conflicts fixed (Matthew Garrett) * tag 'thermal-v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (48 commits) thermal/int340x_thermal: Don't require IDSP to exist thermal/int340x_thermal: Export OEM vendor variables thermal/int340x_thermal: Export GDDV thermal: qoriq: Update the settings for TMUv2 thermal: rcar_thermal: Clean up rcar_thermal_update_temp() thermal: qoriq: Add platform dependencies drivers: thermal: tsens: Merge tsens-common.c into tsens.c thermal/of: Rename of-thermal.c thermal/governors: Prefix all source files with gov_ thermal/drivers/user_space: Sort headers alphabetically thermal/drivers/of-thermal: Sort headers alphabetically thermal/drivers/cpufreq_cooling: Replace module.h with export.h thermal/drivers/cpufreq_cooling: Sort headers alphabetically thermal/drivers/clock_cooling: Include export.h thermal/drivers/clock_cooling: Sort headers alphabetically thermal/drivers/thermal_hwmon: Include export.h thermal/drivers/thermal_hwmon: Sort headers alphabetically thermal/drivers/thermal_helpers: Include export.h thermal/drivers/thermal_helpers: Sort headers alphabetically thermal/core: Replace module.h with export.h ...
2020-06-12Merge tag 'pwm/for-5.8-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm Pull pwm updates from Thierry Reding: "Nothing too exciting for this cycle. A couple of fixes across the board, and Lee volunteered to help with patch review" * tag 'pwm/for-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: pwm: Add missing "CONFIG_" prefix MAINTAINERS: Add Lee Jones as reviewer for the PWM subsystem pwm: imx27: Fix rounding behavior pwm: rockchip: Simplify rockchip_pwm_get_state() pwm: img: Call pm_runtime_put() in pm_runtime_get_sync() failed case pwm: tegra: Support dynamic clock frequency configuration pwm: jz4740: Add support for the JZ4725B pwm: jz4740: Make PWM start with the active part pwm: jz4740: Enhance precision in calculation of duty cycle pwm: jz4740: Drop dependency on MACH_INGENIC pwm: lpss: Fix get_state runtime-pm reference handling pwm: sun4i: Support direct clock output on Allwinner A64 pwm: Add support for Azoteq IQS620A PWM generator dt-bindings: pwm: rcar: add r8a77961 support pwm: Add missing '\n' in log messages
2020-06-12Merge tag 'iommu-drivers-move-v5.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu driver directory structure cleanup from Joerg Roedel: "Move the Intel and AMD IOMMU drivers into their own subdirectory. Both drivers consist of several files by now and giving them their own directory unclutters the IOMMU top-level directory a bit" * tag 'iommu-drivers-move-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/vt-d: Move Intel IOMMU driver into subdirectory iommu/amd: Move AMD IOMMU driver into subdirectory
2020-06-12Merge tag 'for-linus-5.8b-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: - several smaller cleanups - a fix for a Xen guest regression with CPU offlining - a small fix in the xen pvcalls backend driver - an update of MAINTAINERS * tag 'for-linus-5.8b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: MAINTAINERS: Update PARAVIRT_OPS_INTERFACE and VMWARE_HYPERVISOR_INTERFACE xen/pci: Get rid of verbose_request and use dev_dbg() instead xenbus: Use dev_printk() when possible xen-pciback: Use dev_printk() when possible xen: enable BALLOON_MEMORY_HOTPLUG by default xen: expand BALLOON_MEMORY_HOTPLUG description xen/pvcalls: Make pvcalls_back_global static xen/cpuhotplug: Fix initial CPU offlining for PV(H) guests xen-platform: Constify dev_pm_ops xen/pvcalls-back: test for errors when calling backend_connect()
2020-06-11Merge tag 'locking-kcsan-2020-06-11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull the Kernel Concurrency Sanitizer from Thomas Gleixner: "The Kernel Concurrency Sanitizer (KCSAN) is a dynamic race detector, which relies on compile-time instrumentation, and uses a watchpoint-based sampling approach to detect races. The feature was under development for quite some time and has already found legitimate bugs. Unfortunately it comes with a limitation, which was only understood late in the development cycle: It requires an up to date CLANG-11 compiler CLANG-11 is not yet released (scheduled for June), but it's the only compiler today which handles the kernel requirements and especially the annotations of functions to exclude them from KCSAN instrumentation correctly. These annotations really need to work so that low level entry code and especially int3 text poke handling can be completely isolated. A detailed discussion of the requirements and compiler issues can be found here: https://lore.kernel.org/lkml/CANpmjNMTsY_8241bS7=XAfqvZHFLrVEkv_uM4aDUWE_kh3Rvbw@mail.gmail.com/ We came to the conclusion that trying to work around compiler limitations and bugs again would end up in a major trainwreck, so requiring a working compiler seemed to be the best choice. For Continous Integration purposes the compiler restriction is manageable and that's where most xxSAN reports come from. For a change this limitation might make GCC people actually look at their bugs. Some issues with CSAN in GCC are 7 years old and one has been 'fixed' 3 years ago with a half baken solution which 'solved' the reported issue but not the underlying problem. The KCSAN developers also ponder to use a GCC plugin to become independent, but that's not something which will show up in a few days. Blocking KCSAN until wide spread compiler support is available is not a really good alternative because the continuous growth of lockless optimizations in the kernel demands proper tooling support" * tag 'locking-kcsan-2020-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (76 commits) compiler_types.h, kasan: Use __SANITIZE_ADDRESS__ instead of CONFIG_KASAN to decide inlining compiler.h: Move function attributes to compiler_types.h compiler.h: Avoid nested statement expression in data_race() compiler.h: Remove data_race() and unnecessary checks from {READ,WRITE}_ONCE() kcsan: Update Documentation to change supported compilers kcsan: Remove 'noinline' from __no_kcsan_or_inline kcsan: Pass option tsan-instrument-read-before-write to Clang kcsan: Support distinguishing volatile accesses kcsan: Restrict supported compilers kcsan: Avoid inserting __tsan_func_entry/exit if possible ubsan, kcsan: Don't combine sanitizer with kcov on clang objtool, kcsan: Add kcsan_disable_current() and kcsan_enable_current_nowarn() kcsan: Add __kcsan_{enable,disable}_current() variants checkpatch: Warn about data_race() without comment kcsan: Use GFP_ATOMIC under spin lock Improve KCSAN documentation a bit kcsan: Make reporting aware of KCSAN tests kcsan: Fix function matching in report kcsan: Change data_race() to no longer require marking racing accesses kcsan: Move kcsan_{disable,enable}_current() to kcsan-checks.h ...
2020-06-11net: ipa: header pad field only valid for AP->modem endpointAlex Elder
Only QMAP endpoints should be configured to find a pad size field within packet headers. They are found in the first byte of the QMAP header (and the hardware fills only the 6 bits in that byte that constitute the pad_len field). The RMNet driver assumes the pad_len field is valid for received packets, so we want to ensure the pad_len field is filled in that case. That driver also assumes the length in the QMAP header includes the pad bytes. The RMNet driver does *not* pad the packets it sends, so the pad_len field can be ignored. Fix ipa_endpoint_init_hdr_ext() so it only marks the pad field offset valid for QMAP RX endpoints, and in that case indicates that the length field in the header includes the pad bytes. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-11net: ipa: program upper nibbles of sequencer typeAlex Elder
The upper two nibbles of the sequencer type were not used for SDM845, and were assumed to be 0. But for SC7180 they are used, and so they must be programmed by ipa_endpoint_init_seq(). Fix this bug. IPA_SEQ_PKT_PROCESS_NO_DEC_NO_UCP_DMAP doesn't have a descriptive comment, so add one. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-11net: ipa: fix modem LAN RX endpoint idAlex Elder
The endpoint id assigned to the modem LAN RX endpoint for the SC7180 SoC is incorrect. The erroneous value might have been copied from SDM845 and never updated. The correct endpoint id to use for this SoC is 11. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-11net: ipa: program metadata mask differentlyAlex Elder
The way the mask value is programmed for QMAP RX endpoints was based on some wrong assumptions about the way metadata containing the QMAP mux_id value is formatted. The metadata value supplied by the modem is *not* in QMAP format, and in fact contains the mux_id we want in its (big endian) low-order byte. That byte must be written by the IPA into offset 1 of the QMAP header it inserts before the received packet. QMAP TX endpoints *do* use a QMAP header as the metadata sent with each packet. The modem assumes this, and based on that assumes the mux_id is in the second byte. To match those assumptions we must program the modem TX (QMAP) endpoint HDR register to indicate the metadata will be found at offset 0 in the message header. The previous configuration managed to work, but it was not working correctly. This patch fixes a bug whose symptom was receipt of messages containing the wrong QMAP mux_id. In fixing this, get rid of ipa_rmnet_mux_id_metadata_mask(), which was more or less defined so there was a separate place to explain what was happening as we generated the mask value. Instead, put a longer description of how this works above ipa_endpoint_init_hdr(), and define the metadata mask to use as a simple constant. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-11ionic: add pcie_print_link_statusShannon Nelson
Print the PCIe link information for our device. Fixes: 77f972a7077d ("ionic: remove support for mgmt device") Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-11Merge branch '40GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2020-06-11 This series contains fixes to the iavf driver. Brett fixes the supported link speeds in the iavf driver, which was only able to report speeds that the i40e driver supported and was missing the speeds supported by the ice driver. In addition, fix how 2.5 and 5.0 GbE speeds are reported. Alek fixes a enum comparison that was comparing two different enums that may have different values, so update the comparison to use matching enums. Paul increases the time to complete a reset to allow for 128 VFs to complete a reset. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-11Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Pull updates from Andrew Morton: "A few fixes and stragglers. Subsystems affected by this patch series: mm/memory-failure, ocfs2, lib/lzo, misc" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: amdgpu: a NULL ->mm does not mean a thread is a kthread lib/lzo: fix ambiguous encoding bug in lzo-rle ocfs2: fix build failure when TCP/IP is disabled mm/memory-failure: send SIGBUS(BUS_MCEERR_AR) only to current thread mm/memory-failure: prioritize prctl(PR_MCE_KILL) over vm.memory_failure_early_kill
2020-06-11amdgpu: a NULL ->mm does not mean a thread is a kthreadChristoph Hellwig
Use the proper API instead. Fixes: 70539bd795002 ("drm/amd: Update MEC HQD loading code for KFD") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Cc: Felipe Balbi <balbi@kernel.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: http://lkml.kernel.org/r/20200404094101.672954-1-hch@lst.de Link: http://lkml.kernel.org/r/20200404094101.672954-2-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-11Merge tag 'block-5.8-2020-06-11' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "Some followup fixes for this merge window. In particular: - Seqcount write missing preemption disable for stats (Ahmed) - blktrace fixes (Chaitanya) - Redundant initializations (Colin) - Various small NVMe fixes (Chaitanya, Christoph, Daniel, Max, Niklas, Rikard) - loop flag bug regression fix (Martijn) - blk-mq tagging fixes (Christoph, Ming)" * tag 'block-5.8-2020-06-11' of git://git.kernel.dk/linux-block: umem: remove redundant initialization of variable ret pktcdvd: remove redundant initialization of variable ret nvmet: fail outstanding host posted AEN req nvme-pci: use simple suspend when a HMB is enabled nvme-fc: don't call nvme_cleanup_cmd() for AENs nvmet-tcp: constify nvmet_tcp_ops nvme-tcp: constify nvme_tcp_mq_ops and nvme_tcp_admin_mq_ops nvme: do not call del_gendisk() on a disk that was never added blk-mq: fix blk_mq_all_tag_iter blk-mq: split out a __blk_mq_get_driver_tag helper blktrace: fix endianness for blk_log_remap() blktrace: fix endianness in get_pdu_int() blktrace: use errno instead of bi_status block: nr_sects_write(): Disable preemption on seqcount write block: remove the error argument to the block_bio_complete tracepoint loop: Fix wrong masking of status flags block/bio-integrity: don't free 'buf' if bio_integrity_add_page() failed
2020-06-11net/mlx5: E-Switch, Fix some error pointer dereferencesDan Carpenter
We can't leave "counter" set to an error pointer. Otherwise either it will lead to an error pointer dereference later in the function or it leads to an error pointer dereference when we call mlx5_fc_destroy(). Fixes: 07bab9502641d ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-06-11net/mlx5: Don't fail driver on failure to create debugfsLeon Romanovsky
Clang warns: drivers/net/ethernet/mellanox/mlx5/core/main.c:1278:6: warning: variable 'err' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized] if (!priv->dbg_root) { ^~~~~~~~~~~~~~~ drivers/net/ethernet/mellanox/mlx5/core/main.c:1303:9: note: uninitialized use occurs here return err; ^~~ drivers/net/ethernet/mellanox/mlx5/core/main.c:1278:2: note: remove the 'if' if its condition is always false if (!priv->dbg_root) { ^~~~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/mellanox/mlx5/core/main.c:1259:9: note: initialize the variable 'err' to silence this warning int err; ^ = 0 1 warning generated. The check of returned value of debugfs_create_dir() is wrong because by the design debugfs failures should never fail the driver and the check itself was wrong too. The kernel compiled without CONFIG_DEBUG_FS will return ERR_PTR(-ENODEV) and not NULL as expected. Fixes: 11f3b84d7068 ("net/mlx5: Split mdev init and pci init") Link: https://github.com/ClangBuiltLinux/linux/issues/1042 Reported-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-06-11net/mlx5e: CT: Fix ipv6 nat header rewrite actionsOz Shlomo
Set the ipv6 word fields according to the hardware definitions. Fixes: ac991b48d43c ("net/mlx5e: CT: Offload established flows") Signed-off-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-06-11net/mlx5: Fix devlink objects and devlink device unregister sequenceParav Pandit
Current below problems exists. 1. devlink device is registered by mlx5_load_one(). But it is not unregistered by mlx5_unload_one(). This is incorrect. 2. Above issue leads to, When mlx5 PCI device is removed, currently devlink device is unregistered before devlink ports are unregistered in below ladder diagram. remove_one() mlx5_devlink_unregister() [..] devlink_unregister() <- ports are still registered! mlx5_unload_one() mlx5_unregister_device() mlx5_remove_device() mlx5e_remove() mlx5e_devlink_port_unregister() devlink_port_unregister() 3. Condition checking for registering and unregister device are not symmetric either in these routines. Hence, fix the sequence by having load and unload routines symmetric and in right order. i.e. (a) register devlink device followed by registering devlink ports (b) unregister devlink ports followed by devlink device Do this based on boot and cleanup flags instead of different conditions. Fixes: c6acd629eec7 ("net/mlx5e: Add support for devlink-port in non-representors mode") Fixes: f60f315d339e ("net/mlx5e: Register devlink ports for physical link, PCI PF, VFs") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-06-11net/mlx5: Disable reload while removing the deviceParav Pandit
While unregistration is in progress, user might be reloading the interface. This can race with unregistration in below flow which uses the resources which are getting disabled by reload flow. Hence, disable the devlink reloading first when removing the device. CPU0 CPU1 ---- ---- local_pci_remove() devlink_mutex remove_one() devlink_nl_cmd_reload() mlx5_unregister_device() devlink_reload() ops->reload_down() mlx5_unload_one() Fixes: 4383cfcc65e7 ("net/mlx5: Add devlink reload") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>