summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)Author
2021-06-24Merge tag 'nvme-5.14-2021-06-22' of git://git.infradead.org/nvme into ↵for-5.14/drivers-2021-06-29Jens Axboe
for-5.14/drivers Pull NVMe updates from Christoph "nvme updates for Linux 5.14: - move the ACPI StorageD3 code to drivers/acpi/ and add quirks for certain AMD CPUs (Mario Limonciello) - zoned device support for nvmet (Chaitanya Kulkarni) - fix the rules for changing the serial number in nvmet (Noam Gottlieb) - various small fixes and cleanups (Dan Carpenter, JK Kim, Chaitanya Kulkarni, Hannes Reinecke, Wesley Sheng, Geert Uytterhoeven, Daniel Wagner)" * tag 'nvme-5.14-2021-06-22' of git://git.infradead.org/nvme: (38 commits) nvmet: use NVMET_MAX_NAMESPACES to set nn value nvme.h: add missing nvme_lba_range_type endianness annotations nvme: remove zeroout memset call for struct nvme-pci: remove zeroout memset call for struct nvmet: remove zeroout memset call for struct nvmet: add ZBD over ZNS backend support nvmet: add Command Set Identifier support nvmet: add nvmet_req_bio put helper for backends nvmet: add req cns error complete helper block: export blk_next_bio() nvmet: remove local variable nvmet: use nvme status value directly nvmet: use u32 type for the local variable nsid nvmet: use u32 for nvmet_subsys max_nsid nvmet: use req->cmd directly in file-ns fast path nvmet: use req->cmd directly in bdev-ns fast path nvmet: make ver stable once connection established nvmet: allow mn change if subsys not discovered nvmet: make sn stable once connection was established nvmet: change sn size and check validity ...
2021-06-21nvmet: use NVMET_MAX_NAMESPACES to set nn valueChaitanya Kulkarni
For Spec regarding MNAN value:- If the controller supports Asymmetric Namespace Access Reporting, then this field shall be set to a non-zero value that is less than or equal to the NN value. Instead of using subsys->max_nsid that gets calculated dynamically, use NVMET_MAX_NAMESPACES value to report NN. This way we will maintain the MNAN value spec compliant with NN. Without this patch, code results in the following error :- [337976.409142] nvme nvme1: Invalid MNAN value 1024 Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-18loop: Fix missing discard support when using LOOP_CONFIGUREKristian Klausen
Without calling loop_config_discard() the discard flag and parameters aren't set/updated for the loop device and worst-case they could indicate discard support when it isn't the case (ex: if the LOOP_SET_STATUS ioctl was used with a different file prior to LOOP_CONFIGURE). Cc: <stable@vger.kernel.org> # 5.8.x- Fixes: 3448914e8cc5 ("loop: Add LOOP_CONFIGURE ioctl") Signed-off-by: Kristian Klausen <kristian@klausen.dk> Link: https://lore.kernel.org/r/20210618115157.31452-1-kristian@klausen.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-17nvme: remove zeroout memset call for structChaitanya Kulkarni
Declare and initialize structure variables to zero values so that we can remove zeroout memset calls in the host/core.c. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvme-pci: remove zeroout memset call for structChaitanya Kulkarni
Declare and initialize structure variables to zero values so that we can remove zeroout memset calls in the host/pci.c. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvmet: remove zeroout memset call for structChaitanya Kulkarni
Declare and initialize structure variables to zero values so that we can remove zeroout memset calls in the target/rdma.c. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvmet: add ZBD over ZNS backend supportChaitanya Kulkarni
NVMe TP 4053 – Zoned Namespaces (ZNS) allows host software to communicate with a non-volatile memory subsystem using zones for NVMe protocol-based controllers. NVMeOF already support the ZNS NVMe Protocol compliant devices on the target in the passthru mode. There are generic zoned block devices like Shingled Magnetic Recording (SMR) HDDs that are not based on the NVMe protocol. This patch adds ZNS backend support for non-ZNS zoned block devices as NVMeOF targets. This support includes implementing the new command set NVME_CSI_ZNS, adding different command handlers for ZNS command set such as NVMe Identify Controller, NVMe Identify Namespace, NVMe Zone Append, NVMe Zone Management Send and NVMe Zone Management Receive. With the new command set identifier, we also update the target command effects logs to reflect the ZNS compliant commands. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvmet: add Command Set Identifier supportChaitanya Kulkarni
NVMe TP 4056 allows controllers to support different command sets. NVMeoF target currently only supports namespaces that contain traditional logical blocks that may be randomly read and written. In some applications there is a value in exposing namespaces that contain logical blocks that have special access rules (e.g. sequentially write required namespace such as Zoned Namespace (ZNS)). In order to support the Zoned Block Devices (ZBD) backend, controllers need to have support for ZNS Command Set Identifier (CSI). In this preparation patch, we adjust the code such that it can now support the default command set identifier. We update the namespace data structure to store the CSI value which defaults to NVME_CSI_NVM that represents traditional logical blocks namespace type. The CSI support is required to implement the ZBD backend for NVMeOF with host side NVMe ZNS interface, since ZNS commands belong to the different command set than the default one. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvmet: add nvmet_req_bio put helper for backendsChaitanya Kulkarni
In current code there exists two backends which are using inline bio optimization, that adds a duplicate code for freeing the bio. For Zoned Block Device backend we also use the same optimzation and it will lead to having duplicate code in the three backends: generic bdev, passsthru, and generic zns. Add a helper function to avoid duplicate code and update the respective backends. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvmet: add req cns error complete helperChaitanya Kulkarni
We report error and complete the request when identify cns value is not handled in nvmet_execute_identify(). This error reporting is also needed for Zone Block Device backend for NVMeOF target. Add a helper nvmet_req_cns_error_compplete() to report an error and complete the request when idenitfy command cns not handled value. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvmet: remove local variableChaitanya Kulkarni
In function errno_to_nvme_status() we store the value of the NVMe status into the local variable and don't do anything useful with that but just return. Remove the local variable and return the value directly from switch. This also removed extra break statements. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvmet: use nvme status value directlyChaitanya Kulkarni
There is no point in keeping the status variable that is used only once in the function nvmet_async_events_failall(). Remove the variable and use the value directly. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvmet: use u32 type for the local variable nsidChaitanya Kulkarni
In function nvmet_max_nsid() we calculate the max nsid by iterating over the XArray and store it in the variable nsid that has type of unsigned long. Since the value of this function is stored into the subsys->max_nsid which is of type u32, change the local variable nsid type and the return type of the same function to u32. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvmet: use u32 for nvmet_subsys max_nsidChaitanya Kulkarni
Use u32 type for the nsid_max member of the nvmet_subsys structure. This avoids the type confusion when updating the subsys->nax_nsid from ns->nsid. This also matches the nvmet_ns->nsid member. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvmet: use req->cmd directly in file-ns fast pathChaitanya Kulkarni
The function nvmet_file_parse_io_cmd() is called from the fast path. The local variable to that function cmd is only used once. Remove the local variable and use req->cmd directly. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvmet: use req->cmd directly in bdev-ns fast pathChaitanya Kulkarni
The function nvmet_bdev_parse_io_cmd() is called from the fast path. The local variable to that function cmd is only used once. Remove the local variable and use req->cmd directly. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvmet: make ver stable once connection establishedNoam Gottlieb
Once some host has connected to the nvmf target, make sure that the version number is stable and cannot be changed. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Noam Gottlieb <ngottlieb@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvmet: allow mn change if subsys not discoveredNoam Gottlieb
Currently, once the subsystem's model_number is set for the first time there is no way to change it. However, as long as no connection was established to nvmf target, there is no reason for such restriction and we should allow to change the subsystem's model_number as many times as needed. In addition, in order to simplfy the changes and make the model number flow more similar to the rest of the attributes in the Identify Controller data structure, we set a default value for the model number at the initiation of the subsystem. Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Noam Gottlieb <ngottlieb@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvmet: make sn stable once connection was establishedNoam Gottlieb
Once some host has connected to the target, make sure that the serial number is stable and cannot be changed. Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Noam Gottlieb <ngottlieb@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvmet: change sn size and check validityNoam Gottlieb
According to the NVM specification, the serial_number should be 20 bytes (bytes 23:04 of the Identify Controller data structure), and should contain only ASCII characters. In accordance, the serial_number size is changed to 20 bytes and before any attempt to store a new value in serial_number we check that the input is valid - i.e. contains only ASCII characters, is not empty and does not exceed 20 bytes. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Noam Gottlieb <ngottlieb@nvidia.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvmet-fc: do not check for invalid target port in nvmet_fc_handle_fcp_rqst()Hannes Reinecke
When parsing a request in nvmet_fc_handle_fcp_rqst() we should not check for invalid target ports; if we do the command is aborted from the fcp layer, causing the host to assume a transport error. Rather we should still forward this request to the nvmet layer, which will then correctly fail the command with an appropriate error status. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvme-fabrics: remove memset in connect io qChaitanya Kulkarni
Declare and initialize structure variable to the zero values so that we can get rid of the zeroout memset call. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvme-fabrics: remove memset in connect admin qChaitanya Kulkarni
Declare and initialize structure variable to the zero values so that we can get rid of the zeroout memset call. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvme-fabrics: remove memset in nvmf_reg_write32()Chaitanya Kulkarni
Declare and initialize structure variable to the zero values so that we can get rid of the zeroout memset call. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvme-fabrics: remove memset in nvmf_reg_read64()Chaitanya Kulkarni
Declare and initialize structure variable to the zero values so that we can get rid of the zeroout memset call. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvme-tcp: use ctrl sgl check helperChaitanya Kulkarni
Use the helper to check NVMe controller's SGL support. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvme-pci: use ctrl sgl check helperChaitanya Kulkarni
Use the helper to check NVMe controller's SGL support. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvme-fc: use ctrl sgl check helperChaitanya Kulkarni
Use the helper to check NVMe controller's SGL support. Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvme: add a helper to check ctrl sgl supportChaitanya Kulkarni
For various transports such as fc/tcp/pci it is common to check if NVMe SGLs are supported or not by the controller. In this preparation patch we add a helper to avoid the open coding of such checks in the various transport. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvme-pci: remove trailing lines for helpersChaitanya Kulkarni
Remove the extra white line at the end of the functions. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvme-pci: fix var. type for increasing cq_headJK Kim
nvmeq->cq_head is compared with nvmeq->q_depth and changed the value and cq_phase for handling the next cq db. but, nvmeq->q_depth's type is u32 and max. value is 0x10000 when CQP.MSQE is 0xffff and io_queue_depth is 0x10000. current temp. variable for comparing with nvmeq->q_depth is overflowed when previous nvmeq->cq_head is 0xffff. in this case, nvmeq->cq_phase is not updated. so, fix data type for temp. variable to u32. Signed-off-by: JK Kim <jongkang.kim2@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-16nvme-tcp: fix error codes in nvme_tcp_setup_ctrl()Dan Carpenter
These error paths currently return success but they should return -EOPNOTSUPP. Fixes: 73ffcefcfca0 ("nvme-tcp: check sgl supported by target") Fixes: 3f2304f8c6d6 ("nvme-tcp: add NVMe over TCP host driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-16nvme: factor out a nvme_validate_passthru_nsid helperChaitanya Kulkarni
Add a helper nvme_validate_passthru_nsid() to validate the nsid that removes the nsid validation and error message print code from nvme_user_cmd() and nvme_user_cmd64(). Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-16nvme: fix grammar in the CONFIG_NVME_MULTIPATH kconfig help textGeert Uytterhoeven
Fix a singular/plural mismatch in the CONFIG_NVME_MULTIPATH help text. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-16nvme: remove superfluous bio_set_dev in nvme_requeue_workDaniel Wagner
Commit ce86dad222e9 ("nvme-multipath: reset bdev to ns head when failover") moved the reset code where the bio is added to the requeue_list for the failover path. But it left the original bio_set_dev in nvme_requeue_work. There is a second path to nvme_requee_work. It is via nvme_ns_head_submit_bio. Though we don't have to set bio->bi_bdev for this path either, as it points to the correct bdev already. Let's remove the bio_set_dev. It's updating the bio->bi_bdev with the same pointer and thus it's unnecessary. Signed-off-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-16nvme: verify MNAN value if ANA is enabledDaniel Wagner
The controller is required to have a non-zero MNAN value if it supports ANA: If the controller supports Asymmetric Namespace Access Reporting, then this field shall be set to a non-zero value that is less than or equal to the NN value. Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-16ACPI: Add quirks for AMD Renoir/Lucienne CPUs to force the D3 hintMario Limonciello
AMD systems from Renoir and Lucienne require that the NVME controller is put into D3 over a Modern Standby / suspend-to-idle cycle. This is "typically" accomplished using the `StorageD3Enable` property in the _DSD, but this property was introduced after many of these systems launched and most OEM systems don't have it in their BIOS. On AMD Renoir without these drives going into D3 over suspend-to-idle the resume will fail with the NVME controller being reset and a trace like this in the kernel logs: ``` [ 83.556118] nvme nvme0: I/O 161 QID 2 timeout, aborting [ 83.556178] nvme nvme0: I/O 162 QID 2 timeout, aborting [ 83.556187] nvme nvme0: I/O 163 QID 2 timeout, aborting [ 83.556196] nvme nvme0: I/O 164 QID 2 timeout, aborting [ 95.332114] nvme nvme0: I/O 25 QID 0 timeout, reset controller [ 95.332843] nvme nvme0: Abort status: 0x371 [ 95.332852] nvme nvme0: Abort status: 0x371 [ 95.332856] nvme nvme0: Abort status: 0x371 [ 95.332859] nvme nvme0: Abort status: 0x371 [ 95.332909] PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -16 [ 95.332936] nvme 0000:03:00.0: PM: failed to resume async: error -16 ``` The Microsoft documentation for StorageD3Enable mentioned that Windows has a hardcoded allowlist for D3 support, which was used for these platforms. Introduce quirks to hardcode them for Linux as well. As this property is now "standardized", OEM systems using AMD Cezanne and newer APU's have adopted this property, and quirks like this should not be necessary. CC: Shyam-sundar S-k <Shyam-sundar.S-k@amd.com> CC: Alexander Deucher <Alexander.Deucher@amd.com> CC: Prike Liang <prike.liang@amd.com> Link: https://docs.microsoft.com/en-us/windows-hardware/design/component-guidelines/power-management-for-storage-hardware-devices-intro Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Julian Sikorski <belegdol@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-16ACPI: Check StorageD3Enable _DSD property in ACPI codeMario Limonciello
Although first implemented for NVME, this check may be usable by other drivers as well. Microsoft's specification explicitly mentions that is may be usable by SATA and AHCI devices. Google also indicates that they have used this with SDHCI in a downstream kernel tree that a user can plug a storage device into. Link: https://docs.microsoft.com/en-us/windows-hardware/design/component-guidelines/power-management-for-storage-hardware-devices-intro Suggested-by: Keith Busch <kbusch@kernel.org> CC: Shyam-sundar S-k <Shyam-sundar.S-k@amd.com> CC: Alexander Deucher <Alexander.Deucher@amd.com> CC: Rafael J. Wysocki <rjw@rjwysocki.net> CC: Prike Liang <prike.liang@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-15Merge branch 'md-next' of ↵Jens Axboe
https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.14/drivers Pull MD changes from Song: "1) iostats rewrite by Guoqing Jiang; 2) raid5 lock contention optimization by Gal Ofri." * 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md/raid5: avoid device_lock in read_one_chunk() md: add comments in md_integrity_register md: check level before create and exit io_acct_set md: Constify attribute_group structs md: mark some personalities as deprecated md/raid10: enable io accounting md/raid1: enable io accounting md/raid1: rename print_msg with r1bio_existed md/raid5: avoid redundant bio clone in raid5_read_one_chunk md/raid5: move checking badblock before clone bio in raid5_read_one_chunk md: add io accounting for raid0 and raid5 md: revert io stats accounting
2021-06-15floppy: Fix fall-through warning for ClangGustavo A. R. Silva
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning by explicitly adding a break statement instead of letting the code fall through to the next case. Link: https://github.com/KSPP/linux/issues/115 Link: https://lore.kernel.org/linux-hardening/47bcd36a-6524-348b-e802-0691d1b3c429@kernel.dk/ Suggested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Denis Efremov <efremov@linux.com>
2021-06-15floppy: cleanup: remove redundant assignment to nr_sectorsJiapeng Chong
Variable nr_sectors is set to zero but this value is never read as it is overwritten later on, hence it is a redundant assignment and can be removed. Clean up the following clang-analyzer warning: drivers/block/floppy.c:2333:2: warning: Value stored to 'nr_sectors' is never read [clang-analyzer-deadcode.DeadStores]. Link: https://lore.kernel.org/r/1619774805-121562-1-git-send-email-jiapeng.chong@linux.alibaba.com Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Denis Efremov <efremov@linux.com>
2021-06-14md/raid5: avoid device_lock in read_one_chunk()Gal Ofri
There is a lock contention on device_lock in read_one_chunk(). device_lock is taken to sync conf->active_aligned_reads and conf->quiesce. read_one_chunk() takes the lock, then waits for quiesce=0 (resumed) before incrementing active_aligned_reads. raid5_quiesce() takes the lock, sets quiesce=2 (in-progress), then waits for active_aligned_reads to be zero before setting quiesce=1 (suspended). Introduce a fast (lockless) path in read_one_chunk(): activate aligned read without taking device_lock. In case quiesce starts while activating the aligned-read in fast path, deactivate it and revert to old behavior (take device_lock and wait for quiesce to finish). Add smp store/load in raid5_quiesce()/read_one_chunk() respectively to gaurantee that read_one_chunk() does not miss an ongoing quiesce. My setups: 1. 8 local nvme drives (each up to 250k iops). 2. 8 ram disks (brd). Each setup with raid6 (6+2), 1024 io threads on a 96 cpu-cores (48 per socket) system. Record both iops and cpu spent on this contention with rand-read-4k. Record bw with sequential-read-128k. Note: in most cases cpu is still busy but due to "new" bottlenecks. nvme: | iops | cpu | bw ----------------------------------------------- without patch | 1.6M | ~50% | 5.5GB/s with patch | 2M (throttled) | 0% | 16GB/s (throttled) ram (brd): | iops | cpu | bw ----------------------------------------------- without patch | 2M | ~80% | 24GB/s with patch | 4M | 0% | 55GB/s CC: Song Liu <song@kernel.org> CC: Neil Brown <neilb@suse.de> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Gal Ofri <gal.ofri@storing.io> Signed-off-by: Song Liu <song@kernel.org>
2021-06-14md: add comments in md_integrity_registerGuoqing Jiang
Given it is not obvious for the error handling, let's try to add some comments here to make it clear. Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: Song Liu <song@kernel.org>
2021-06-14md: check level before create and exit io_acct_setGuoqing Jiang
The bio_set (io_acct_set) is used by personalities to clone bio and trace the timestamp of bio. Some personalities such as raid1/10 don't need the bio_set, so add check to not create it unconditionally. Also update the comment for md_account_bio to make it more clear. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: Song Liu <song@kernel.org>
2021-06-14md: Constify attribute_group structsRikard Falkeborn
The attribute_group structs are never modified, they're only passed to sysfs_create_group() and sysfs_remove_group(). Make them const to allow the compiler to put them in read-only memory. Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com> Signed-off-by: Song Liu <song@kernel.org>
2021-06-14md: mark some personalities as deprecatedGuoqing Jiang
Mark the three personalities (linear, fault and multipath) as deprecated because: 1. people can use dm multipath or nvme multipath. 2. linear is already deprecated in MODULE_ALIAS. 3. no one actively using fault. Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: Song Liu <song@kernel.org>
2021-06-14md/raid10: enable io accountingGuoqing Jiang
For raid10, we record the start time between split bio and clone bio, and finish the accounting in the final endio. Also introduce start_time in r10bio accordingly. Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: Song Liu <song@kernel.org>
2021-06-14md/raid1: enable io accountingGuoqing Jiang
For raid1, we record the start time between split bio and clone bio, and finish the accounting in the final endio. Also introduce start_time in r1bio accordingly. Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: Song Liu <song@kernel.org>
2021-06-14md/raid1: rename print_msg with r1bio_existedGuoqing Jiang
The caller of raid1_read_request could pass NULL or a valid pointer for "struct r1bio *r1_bio", so it actually means whether r1_bio is existed or not. Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: Song Liu <song@kernel.org>
2021-06-14md/raid5: avoid redundant bio clone in raid5_read_one_chunkGuoqing Jiang
After enable io accounting, chunk read bio could be cloned twice which is not good. To avoid such inefficiency, let's clone align_bio from io_acct_set too, then we need only call md_account_bio in make_request unconditionally. Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: Song Liu <song@kernel.org>