summaryrefslogtreecommitdiff
path: root/drivers/nvdimm/pmem.c
AgeCommit message (Collapse)Author
2022-05-16pmem: implement pmem_recovery_write()Jane Chu
The recovery write thread started out as a normal pwrite thread and when the filesystem was told about potential media error in the range, filesystem turns the normal pwrite to a dax_recovery_write. The recovery write consists of clearing media poison, clearing page HWPoison bit, reenable page-wide read-write permission, flush the caches and finally write. A competing pread thread will be held off during the recovery process since data read back might not be valid, and this is achieved by clearing the badblock records after the recovery write is complete. Competing recovery write threads are already serialized by writer lock held by dax_iomap_rw(). Signed-off-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/165247997655.53156.8381418704988035976.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-05-16pmem: refactor pmem_clear_poison()Jane Chu
Refactor the pmem_clear_poison() function such that the common shared code between the typical write path and the recovery write path is factored out. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jane Chu <jane.chu@oracle.com> Link: https://lore.kernel.org/r/20220422224508.440670-7-jane.chu@oracle.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-05-16dax: add .recovery_write dax_operationJane Chu
Introduce dax_recovery_write() operation. The function is used to recover a dax range that contains poison. Typical use case is when a user process receives a SIGBUS with si_code BUS_MCEERR_AR indicating poison(s) in a dax range, in response, the user process issues a pwrite() to the page-aligned dax range, thus clears the poison and puts valid data in the range. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jane Chu <jane.chu@oracle.com> Link: https://lore.kernel.org/r/20220422224508.440670-6-jane.chu@oracle.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-05-16dax: introduce DAX_RECOVERY_WRITE dax access modeJane Chu
Up till now, dax_direct_access() is used implicitly for normal access, but for the purpose of recovery write, dax range with poison is requested. To make the interface clear, introduce enum dax_access_mode { DAX_ACCESS, DAX_RECOVERY_WRITE, } where DAX_ACCESS is used for normal dax access, and DAX_RECOVERY_WRITE is used for dax recovery write. Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Link: https://lore.kernel.org/r/165247982851.52965.11024212198889762949.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-05-16mce: fix set_mce_nospec to always unmap the whole pageJane Chu
The set_memory_uc() approach doesn't work well in all cases. As Dan pointed out when "The VMM unmapped the bad page from guest physical space and passed the machine check to the guest." "The guest gets virtual #MC on an access to that page. When the guest tries to do set_memory_uc() and instructs cpa_flush() to do clean caches that results in taking another fault / exception perhaps because the VMM unmapped the page from the guest." Since the driver has special knowledge to handle NP or UC, mark the poisoned page with NP and let driver handle it when it comes down to repair. Please refer to discussions here for more details. https://lore.kernel.org/all/CAPcyv4hrXPb1tASBZUg-GgdVs0OOFKXMXLiHmktg_kFi7YBMyQ@mail.gmail.com/ Now since poisoned page is marked as not-present, in order to avoid writing to a not-present page and trigger kernel Oops, also fix pmem_do_write(). Fixes: 284ce4011ba6 ("x86/memory_failure: Introduce {set, clear}_mce_nospec()") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jane Chu <jane.chu@oracle.com> Acked-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/165272615484.103830.2563950688772226611.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-18dax: remove the copy_from_iter and copy_to_iter methodsChristoph Hellwig
These methods indirect the actual DAX read/write path. In the end pmem uses magic flush and mc safe variants and fuse and dcssblk use plain ones while device mapper picks redirects to the underlying device. Add set_dax_nocache() and set_dax_nomc() APIs to control which copy routines are used to remove indirect call from the read/write fast path as well as a lot of boilerplate code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> [virtiofs] Link: https://lore.kernel.org/r/20211215084508.435401-5-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-18dax: remove the DAXDEV_F_SYNC flagChristoph Hellwig
Remove the DAXDEV_F_SYNC flag and thus the flags argument to alloc_dax and just let the drivers call set_dax_synchronous directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20211215084508.435401-4-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-18uio: remove copy_from_iter_flushcache() and copy_mc_to_iter()Christoph Hellwig
These two wrappers are never used. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211215084508.435401-2-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04dax: remove dax_capableChristoph Hellwig
Just open code the block size and dax_dev == NULL checks in the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> [erofs] Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-9-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04dax: simplify the dax_device <-> gendisk associationChristoph Hellwig
Replace the dax_host_hash with an xarray indexed by the pointer value of the gendisk, and require explicitly calls from the block drivers that want to associate their gendisk with a dax_device. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-5-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-11-10Merge tag 'libnvdimm-for-5.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm update from Dan Williams: "A single cleanup that precedes some deeper PMEM/DAX reworks that did not settle in time for v5.16: - Continue the cleanup of the dax api in preparation for a dax-device block-device divorce" * tag 'libnvdimm-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: nvdimm/pmem: move dax_attribute_group from dax to pmem
2021-11-09Merge tag 'for-5.16/drivers-2021-11-09' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull more block driver updates from Jens Axboe: - Last series adding error handling support for add_disk() in drivers. After this one, and once the SCSI side has been merged, we can finally annotate add_disk() as must_check. (Luis) - bcache fixes (Coly) - zram fixes (Ming) - ataflop locking fix (Tetsuo) - nbd fixes (Ye, Yu) - MD merge via Song - Cleanup (Yang) - sysfs fix (Guoqing) - Misc fixes (Geert, Wu, luo) * tag 'for-5.16/drivers-2021-11-09' of git://git.kernel.dk/linux-block: (34 commits) bcache: Revert "bcache: use bvec_virt" ataflop: Add missing semicolon to return statement floppy: address add_disk() error handling on probe ataflop: address add_disk() error handling on probe block: update __register_blkdev() probe documentation ataflop: remove ataflop_probe_lock mutex mtd/ubi/block: add error handling support for add_disk() block/sunvdc: add error handling support for add_disk() z2ram: add error handling support for add_disk() nvdimm/pmem: use add_disk() error handling nvdimm/pmem: cleanup the disk if pmem_release_disk() is yet assigned nvdimm/blk: add error handling support for add_disk() nvdimm/blk: avoid calling del_gendisk() on early failures nvdimm/btt: add error handling support for add_disk() nvdimm/btt: use goto error labels on btt_blk_init() loop: Remove duplicate assignments drbd: Fix double free problem in drbd_create_device nvdimm/btt: do not call del_gendisk() if not needed bcache: fix use-after-free problem in bcache_device_free() zram: replace fsync_bdev with sync_blockdev ...
2021-11-04nvdimm/pmem: use add_disk() error handlingLuis Chamberlain
Now that device_add_disk() supports returning an error, use that. We must unwind alloc_dax() on error. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20211103230437.1639990-7-mcgrof@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-04nvdimm/pmem: cleanup the disk if pmem_release_disk() is yet assignedLuis Chamberlain
Prior to devm being able to use pmem_release_disk() there are other failure which can occur for which we must account for and release the disk for. Address those few cases. Fixes: 3dd60fb9d95d ("nvdimm/pmem: stop using q_usage_count as external pgmap refcount") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20211103230437.1639990-6-mcgrof@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-01Merge tag 'for-5.16/block-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block updates from Jens Axboe: - mq-deadline accounting improvements (Bart) - blk-wbt timer fix (Andrea) - Untangle the block layer includes (Christoph) - Rework the poll support to be bio based, which will enable adding support for polling for bio based drivers (Christoph) - Block layer core support for multi-actuator drives (Damien) - blk-crypto improvements (Eric) - Batched tag allocation support (me) - Request completion batching support (me) - Plugging improvements (me) - Shared tag set improvements (John) - Concurrent queue quiesce support (Ming) - Cache bdev in ->private_data for block devices (Pavel) - bdev dio improvements (Pavel) - Block device invalidation and block size improvements (Xie) - Various cleanups, fixes, and improvements (Christoph, Jackie, Masahira, Tejun, Yu, Pavel, Zheng, me) * tag 'for-5.16/block-2021-10-29' of git://git.kernel.dk/linux-block: (174 commits) blk-mq-debugfs: Show active requests per queue for shared tags block: improve readability of blk_mq_end_request_batch() virtio-blk: Use blk_validate_block_size() to validate block size loop: Use blk_validate_block_size() to validate block size nbd: Use blk_validate_block_size() to validate block size block: Add a helper to validate the block size block: re-flow blk_mq_rq_ctx_init() block: prefetch request to be initialized block: pass in blk_mq_tags to blk_mq_rq_ctx_init() block: add rq_flags to struct blk_mq_alloc_data block: add async version of bio_set_polled block: kill DIO_MULTI_BIO block: kill unused polling bits in __blkdev_direct_IO() block: avoid extra iter advance with async iocb block: Add independent access ranges support blk-mq: don't issue request directly in case that current is to be blocked sbitmap: silence data race warning blk-cgroup: synchronize blkg creation against policy deactivation block: refactor bio_iov_bvec_set() block: add single bio async direct IO helper ...
2021-10-25nvdimm/pmem: stop using q_usage_count as external pgmap refcountChristoph Hellwig
Originally all DAX access when through block_device operations and thus needed a queue reference. But since commit cccbce671582 ("filesystem-dax: convert to dax_direct_access()") all this happens at the DAX device level which uses its own refcounting. Having the external refcount thus wasn't needed but has otherwise been harmless for long time. But now that "block: drain file system I/O on del_gendisk" waits for q_usage_count to reach 0 in del_gendisk this whole scheme can't work anymore (and pmem is the only driver abusing q_usage_count like that). So switch to the internal reference and remove the unbalanced blk_freeze_queue_start that is taken care of by del_gendisk. Fixes: 8e141f9eb803 ("block: drain file system I/O on del_gendisk") Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019073641.2323410-2-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-10-18block: switch polling to be bio basedChristoph Hellwig
Replace the blk_poll interface that requires the caller to keep a queue and cookie from the submissions with polling based on the bio. Polling for the bio itself leads to a few advantages: - the cookie construction can made entirely private in blk-mq.c - the caller does not need to remember the request_queue and cookie separately and thus sidesteps their lifetime issues - keeping the device and the cookie inside the bio allows to trivially support polling BIOs remapping by stacking drivers - a lot of code to propagate the cookie back up the submission path can be removed entirely. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-27nvdimm/pmem: fix creating the dax groupChristoph Hellwig
The recent block layer refactoring broke the way how the pmem driver abused device_add_disk. Fix this by properly passing the attribute groups to device_add_disk. Fixes: 52b85909f85d ("block: fold register_disk into device_add_disk") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20210922173431.2454024-2-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-09-27nvdimm/pmem: move dax_attribute_group from dax to pmemChristoph Hellwig
dax_attribute_group is only used by the pmem driver, and can avoid the completely pointless lookup by the disk name if moved there. This leaves just a single caller of dax_get_by_host, so move dax_get_by_host into the same ifdef block as that caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20210922173431.2454024-3-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-08-24libnvdimm/pmem: Fix crash triggered when I/O in-flight during unbindsumiyawang
There is a use after free crash when the pmem driver tears down its mapping while I/O is still inbound. This is triggered by driver unbind, "ndctl destroy-namespace", while I/O is in flight. Fix the sequence of blk_cleanup_queue() vs memunmap(). The crash signature is of the form: BUG: unable to handle page fault for address: ffffc90080200000 CPU: 36 PID: 9606 Comm: systemd-udevd Call Trace: ? pmem_do_bvec+0xf9/0x3a0 ? xas_alloc+0x55/0xd0 pmem_rw_page+0x4b/0x80 bdev_read_page+0x86/0xb0 do_mpage_readpage+0x5d4/0x7a0 ? lru_cache_add+0xe/0x10 mpage_readpages+0xf9/0x1c0 ? bd_link_disk_holder+0x1a0/0x1a0 blkdev_readpages+0x1d/0x20 read_pages+0x67/0x1a0 ndctl Call Trace in vmcore: PID: 23473 TASK: ffff88c4fbbe8000 CPU: 1 COMMAND: "ndctl" __schedule schedule blk_mq_freeze_queue_wait blk_freeze_queue blk_cleanup_queue pmem_release_queue devm_action_release release_nodes devres_release_all device_release_driver_internal device_driver_detach unbind_store Cc: <stable@vger.kernel.org> Signed-off-by: sumiyawang <sumiyawang@tencent.com> Reviewed-by: yongduan <yongduan@tencent.com> Link: https://lore.kernel.org/r/1629632949-14749-1-git-send-email-sumiyawang@tencent.com Fixes: 50f44ee7248a ("mm/devm_memremap_pages: fix final page put race") Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09libnvdimm/pmem: Fix blk_cleanup_disk() usageDan Williams
The queue_to_disk() helper can not be used after del_gendisk() communicate @disk via the pgmap->owner. Otherwise, queue_to_disk() returns NULL resulting in the splat below. Kernel attempted to read user page (330) - exploit attempt? (uid: 0) BUG: Kernel NULL pointer dereference on read at 0x00000330 Faulting instruction address: 0xc000000000906344 Oops: Kernel access of bad area, sig: 11 [#1] [..] NIP [c000000000906344] pmem_pagemap_cleanup+0x24/0x40 LR [c0000000004701d4] memunmap_pages+0x1b4/0x4b0 Call Trace: [c000000022cbb9c0] [c0000000009063c8] pmem_pagemap_kill+0x28/0x40 (unreliable) [c000000022cbb9e0] [c0000000004701d4] memunmap_pages+0x1b4/0x4b0 [c000000022cbba90] [c0000000008b28a0] devm_action_release+0x30/0x50 [c000000022cbbab0] [c0000000008b39c8] release_nodes+0x2f8/0x3e0 [c000000022cbbb60] [c0000000008ac440] device_release_driver_internal+0x190/0x2b0 [c000000022cbbba0] [c0000000008a8450] unbind_store+0x130/0x170 Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Fixes: 87eb73b2ca7c ("nvdimm-pmem: convert to blk_alloc_disk/blk_cleanup_disk") Link: http://lore.kernel.org/r/DFB75BA8-603F-4A35-880B-C5B23EF8FA7D@linux.vnet.ibm.com Cc: Christoph Hellwig <hch@lst.de> Cc: Ulf Hansson <ulf.hansson@linaro.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/162310994435.1571616.334551212901820961.stgit@dwillia2-desk3.amr.corp.intel.com [axboe: fold in compile warning fix] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-01nvme-multipath: convert to blk_alloc_disk/blk_cleanup_diskChristoph Hellwig
Convert the nvme-multipath driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20210521055116.1053587-19-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-01nvdimm-pmem: convert to blk_alloc_disk/blk_cleanup_diskChristoph Hellwig
Convert the nvdimm-pmem driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20210521055116.1053587-18-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-01block: automatically enable GENHD_FL_EXT_DEVTChristoph Hellwig
Automatically set the GENHD_FL_EXT_DEVT flag for all disks allocated without an explicit number of minors. This is what all new block drivers should do, so make sure it is the default without boilerplate code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20210521055116.1053587-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-06include: remove pagemap.h from blkdev.hMatthew Wilcox (Oracle)
My UEK-derived config has 1030 files depending on pagemap.h before this change. Afterwards, just 326 files need to be rebuilt when I touch pagemap.h. I think blkdev.h is probably included too widely, but untangling that dependency is harder and this solves my problem. x86 allmodconfig builds, but there may be implicit include problems on other architectures. Link: https://lkml.kernel.org/r/20210309195747.283796-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Dan Williams <dan.j.williams@intel.com> [nvdimm] Acked-by: Jens Axboe <axboe@kernel.dk> [block] Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: Martin K. Petersen <martin.petersen@oracle.com> [scsi] Reviewed-by: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-08libnvdimm: Notify disk drivers to revalidate region read-onlyDan Williams
Previous kernels allowed the BLKROSET to override the disk's read-only status. With that situation fixed the pmem driver needs to rely on notification events to reevaluate the disk read-only status after the host region has been marked read-write. Recall that when libnvdimm determines that the persistent memory has lost persistence (for example lack of energy to flush from DRAM to FLASH on an NVDIMM-N device) it marks the region read-only, but that state can be overridden by the user via: echo 0 > /sys/bus/nd/devices/regionX/read_only ...to date there is no notification that the region has restored persistence, so the user override is the only recovery. Fixes: 52f019d43c22 ("block: add a hard-readonly flag to struct gendisk") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Vishal Verma <vishal.l.verma@intel.com> Tested-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/161534060720.528671.2341213328968989192.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-02-24Merge tag 'libnvdimm-for-5.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm and device-dax updates from Dan Williams: - Fix the error code polarity for the device-dax/mapping attribute - For the device-dax and libnvdimm bus implementations stop implementing a useless return code for the remove() callback. - Miscellaneous cleanups * tag 'libnvdimm-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax-device: Make remove callback return void device-dax: Drop an empty .remove callback device-dax: Fix error path in dax_driver_register device-dax: Properly handle drivers without remove callback device-dax: Prevent registering drivers without probe callback libnvdimm: Make remove callback return void libnvdimm/dimm: Simplify nvdimm_remove() device-dax: Fix default return code of range_parse()
2021-02-21Merge tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull core block updates from Jens Axboe: "Another nice round of removing more code than what is added, mostly due to Christoph's relentless pursuit of tech debt removal/cleanups. This pull request contains: - Two series of BFQ improvements (Paolo, Jan, Jia) - Block iov_iter improvements (Pavel) - bsg error path fix (Pan) - blk-mq scheduler improvements (Jan) - -EBUSY discard fix (Jan) - bvec allocation improvements (Ming, Christoph) - bio allocation and init improvements (Christoph) - Store bdev pointer in bio instead of gendisk + partno (Christoph) - Block trace point cleanups (Christoph) - hard read-only vs read-only split (Christoph) - Block based swap cleanups (Christoph) - Zoned write granularity support (Damien) - Various fixes/tweaks (Chunguang, Guoqing, Lei, Lukas, Huhai)" * tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-block: (104 commits) mm: simplify swapdev_block sd_zbc: clear zone resources for non-zoned case block: introduce blk_queue_clear_zone_settings() zonefs: use zone write granularity as block size block: introduce zone_write_granularity limit block: use blk_queue_set_zoned in add_partition() nullb: use blk_queue_set_zoned() to setup zoned devices nvme: cleanup zone information initialization block: document zone_append_max_bytes attribute block: use bi_max_vecs to find the bvec pool md/raid10: remove dead code in reshape_request block: mark the bio as cloned in bio_iov_bvec_set block: set BIO_NO_PAGE_REF in bio_iov_bvec_set block: remove a layer of indentation in bio_iov_iter_get_pages block: turn the nr_iovecs argument to bio_alloc* into an unsigned short block: remove the 1 and 4 vec bvec_slabs entries block: streamline bvec_alloc block: factor out a bvec_alloc_gfp helper block: move struct biovec_slab to bio.c block: reuse BIO_INLINE_VECS for integrity bvecs ...
2021-02-16libnvdimm: Make remove callback return voidUwe Kleine-König
All drivers return 0 in their remove callback and the driver core ignores the return value of nvdimm_bus_remove() anyhow. So simplify by changing the driver remove callback to return void and return 0 unconditionally to the upper layer. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Link: https://lore.kernel.org/r/20210212171043.2136580-2-u.kleine-koenig@pengutronix.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-01-24block: store a block_device pointer in struct bioChristoph Hellwig
Replace the gendisk pointer in struct bio with a pointer to the newly improved struct block device. From that the gendisk can be trivially accessed with an extra indirection, but it also allows to directly look up all information related to partition remapping. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-11libnvdimm/pmem: Remove unused headerJianpeng Ma
'commit a8b456d01cd6 ("bdi: remove BDI_CAP_SYNCHRONOUS_IO")' forgot remove the related header file. Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20201229002635.42555-1-jianpeng.ma@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2020-10-13mm/memremap_pages: support multiple ranges per invocationDan Williams
In support of device-dax growing the ability to front physically dis-contiguous ranges of memory, update devm_memremap_pages() to track multiple ranges with a single reference counter and devm instance. Convert all [devm_]memremap_pages() users to specify the number of ranges they are mapping in their 'struct dev_pagemap' instance. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: "Jérôme Glisse" <jglisse@redhat.co Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: kernel test robot <lkp@intel.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/159643103789.4062302.18426128170217903785.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lkml.kernel.org/r/160106116293.30709.13350662794915396198.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13mm/memremap_pages: convert to 'struct range'Dan Williams
The 'struct resource' in 'struct dev_pagemap' is only used for holding resource span information. The other fields, 'name', 'flags', 'desc', 'parent', 'sibling', and 'child' are all unused wasted space. This is in preparation for introducing a multi-range extension of devm_memremap_pages(). The bulk of this change is unwinding all the places internal to libnvdimm that used 'struct resource' unnecessarily, and replacing instances of 'struct dev_pagemap'.res with 'struct dev_pagemap'.range. P2PDMA had a minor usage of the resource flags field, but only to report failures with "%pR". That is replaced with an open coded print of the range. [dan.carpenter@oracle.com: mm/hmm/test: use after free in dmirror_allocate_chunk()] Link: https://lkml.kernel.org/r/20200926121402.GA7467@kadam Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> [xen] Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: kernel test robot <lkp@intel.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/159643103173.4062302.768998885691711532.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lkml.kernel.org/r/160106115761.30709.13539840236873663620.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13Merge tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block updates from Jens Axboe: - Series of merge handling cleanups (Baolin, Christoph) - Series of blk-throttle fixes and cleanups (Baolin) - Series cleaning up BDI, seperating the block device from the backing_dev_info (Christoph) - Removal of bdget() as a generic API (Christoph) - Removal of blkdev_get() as a generic API (Christoph) - Cleanup of is-partition checks (Christoph) - Series reworking disk revalidation (Christoph) - Series cleaning up bio flags (Christoph) - bio crypt fixes (Eric) - IO stats inflight tweak (Gabriel) - blk-mq tags fixes (Hannes) - Buffer invalidation fixes (Jan) - Allow soft limits for zone append (Johannes) - Shared tag set improvements (John, Kashyap) - Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel) - DM no-wait support (Mike, Konstantin) - Request allocation improvements (Ming) - Allow md/dm/bcache to use IO stat helpers (Song) - Series improving blk-iocost (Tejun) - Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang, Xianting, Yang, Yufen, yangerkun) * tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits) block: fix uapi blkzoned.h comments blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue blk-mq: get rid of the dead flush handle code path block: get rid of unnecessary local variable block: fix comment and add lockdep assert blk-mq: use helper function to test hw stopped block: use helper function to test queue register block: remove redundant mq check block: invoke blk_mq_exit_sched no matter whether have .exit_sched percpu_ref: don't refer to ref->data if it isn't allocated block: ratelimit handle_bad_sector() message blk-throttle: Re-use the throtl_set_slice_end() blk-throttle: Open code __throtl_de/enqueue_tg() blk-throttle: Move service tree validation out of the throtl_rb_first() blk-throttle: Move the list operation after list validation blk-throttle: Fix IO hang for a corner case blk-throttle: Avoid tracking latency if low limit is invalid blk-throttle: Avoid getting the current time if tg->last_finish_time is 0 blk-throttle: Remove a meaningless parameter for throtl_downgrade_state() block: Remove redundant 'return' statement ...
2020-10-06x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()Dan Williams
In reaction to a proposal to introduce a memcpy_mcsafe_fast() implementation Linus points out that memcpy_mcsafe() is poorly named relative to communicating the scope of the interface. Specifically what addresses are valid to pass as source, destination, and what faults / exceptions are handled. Of particular concern is that even though x86 might be able to handle the semantics of copy_mc_to_user() with its common copy_user_generic() implementation other archs likely need / want an explicit path for this case: On Fri, May 1, 2020 at 11:28 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Thu, Apr 30, 2020 at 6:21 PM Dan Williams <dan.j.williams@intel.com> wrote: > > > > However now I see that copy_user_generic() works for the wrong reason. > > It works because the exception on the source address due to poison > > looks no different than a write fault on the user address to the > > caller, it's still just a short copy. So it makes copy_to_user() work > > for the wrong reason relative to the name. > > Right. > > And it won't work that way on other architectures. On x86, we have a > generic function that can take faults on either side, and we use it > for both cases (and for the "in_user" case too), but that's an > artifact of the architecture oddity. > > In fact, it's probably wrong even on x86 - because it can hide bugs - > but writing those things is painful enough that everybody prefers > having just one function. Replace a single top-level memcpy_mcsafe() with either copy_mc_to_user(), or copy_mc_to_kernel(). Introduce an x86 copy_mc_fragile() name as the rename for the low-level x86 implementation formerly named memcpy_mcsafe(). It is used as the slow / careful backend that is supplanted by a fast copy_mc_generic() in a follow-on patch. One side-effect of this reorganization is that separating copy_mc_64.S to its own file means that perf no longer needs to track dependencies for its memcpy_64.S benchmarks. [ bp: Massage a bit. ] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: <stable@vger.kernel.org> Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com
2020-09-24bdi: remove BDI_CAP_SYNCHRONOUS_IOChristoph Hellwig
BDI_CAP_SYNCHRONOUS_IO is only checked in the swap code, and used to decided if ->rw_page can be used on a block device. Just check up for the method instead. The only complication is that zram needs a second set of block_device_operations as it can switch between modes that actually support ->rw_page and those who don't. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-02nvdimm: simplify revalidate_disk handlingChristoph Hellwig
The nvdimm block driver abuse revalidate_disk in a strange way, and totally unrelated to what other drivers do. Simplify this by just calling nvdimm_revalidate_disk (which seems rather misnamed) from the probe routines, as the additional bdev size revalidation is pointless at this point, and remove the revalidate_disk methods given that it can only be triggered from add_disk, which is right before the manual calls. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-14mm: add thp_sizeMatthew Wilcox (Oracle)
This function returns the number of bytes in a THP. It is like page_size(), but compiles to just PAGE_SIZE if CONFIG_TRANSPARENT_HUGEPAGE is disabled. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: http://lkml.kernel.org/r/20200629151959.15779-5-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-01block: move ->make_request_fn to struct block_device_operationsChristoph Hellwig
The make_request_fn is a little weird in that it sits directly in struct request_queue instead of an operation vector. Replace it with a block_device_operations method called submit_bio (which describes much better what it does). Also remove the request_queue argument to it, as the queue can be derived pretty trivially from the bio. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-13Merge tag 'libnvdimm-for-5.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "Small collection of cleanups to rework usage of ->queuedata and the GUID api" * tag 'libnvdimm-for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: nvdimm/pmem: stop using ->queuedata nvdimm/btt: stop using ->queuedata nvdimm/blk: stop using ->queuedata libnvdimm: Replace guid_copy() with import_guid() where it makes sense
2020-06-08asm-generic: don't include <linux/mm.h> in cacheflush.hChristoph Hellwig
This seems to lead to some crazy include loops when using asm-generic/cacheflush.h on more architectures, so leave it to the arch header for now. [hch@lst.de: fix warning] Link: http://lkml.kernel.org/r/20200520173520.GA11199@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Will Deacon <will@kernel.org> Cc: Nick Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Link: http://lkml.kernel.org/r/20200515143646.3857579-7-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-27nvdimm: use bio_{start,end}_io_acctChristoph Hellwig
Switch dm to use the nicer bio accounting helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-13nvdimm/pmem: stop using ->queuedataChristoph Hellwig
In preparation for removing queuedata as an argument to make_request_fn() drop the dependency ->queuedata. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20200508161517.252308-16-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2020-04-08Merge tag 'libnvdimm-for-5.7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm and dax updates from Dan Williams: "There were multiple touches outside of drivers/nvdimm/ this round to add cross arch compatibility to the devm_memremap_pages() interface, enhance numa information for persistent memory ranges, and add a zero_page_range() dax operation. This cycle I switched from the patchwork api to Konstantin's b4 script for collecting tags (from x86, PowerPC, filesystem, and device-mapper folks), and everything looks to have gone ok there. This has all appeared in -next with no reported issues. Summary: - Add support for region alignment configuration and enforcement to fix compatibility across architectures and PowerPC page size configurations. - Introduce 'zero_page_range' as a dax operation. This facilitates filesystem-dax operation without a block-device. - Introduce phys_to_target_node() to facilitate drivers that want to know resulting numa node if a given reserved address range was onlined. - Advertise a persistence-domain for of_pmem and papr_scm. The persistence domain indicates where cpu-store cycles need to reach in the platform-memory subsystem before the platform will consider them power-fail protected. - Promote numa_map_to_online_node() to a cross-kernel generic facility. - Save x86 numa information to allow for node-id lookups for reserved memory ranges, deploy that capability for the e820-pmem driver. - Pick up some miscellaneous minor fixes, that missed v5.6-final, including a some smatch reports in the ioctl path and some unit test compilation fixups. - Fixup some flexible-array declarations" * tag 'libnvdimm-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (29 commits) dax: Move mandatory ->zero_page_range() check in alloc_dax() dax,iomap: Add helper dax_iomap_zero() to zero a range dax: Use new dax zero page method for zeroing a page dm,dax: Add dax zero_page_range operation s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver dax, pmem: Add a dax operation zero_page_range pmem: Add functions for reading/writing page to/from pmem libnvdimm: Update persistence domain value for of_pmem and papr_scm device tools/test/nvdimm: Fix out of tree build libnvdimm/region: Fix build error libnvdimm/region: Replace zero-length array with flexible-array member libnvdimm/label: Replace zero-length array with flexible-array member ACPI: NFIT: Replace zero-length array with flexible-array member libnvdimm/region: Introduce an 'align' attribute libnvdimm/region: Introduce NDD_LABELING libnvdimm/namespace: Enforce memremap_compat_align() libnvdimm/pfn: Prevent raw mode fallback if pfn-infoblock valid libnvdimm: Out of bounds read in __nd_ioctl() acpi/nfit: improve bounds checking for 'func' mm/memremap_pages: Introduce memremap_compat_align() ...
2020-04-02dax: Move mandatory ->zero_page_range() check in alloc_dax()Vivek Goyal
zero_page_range() dax operation is mandatory for dax devices. Right now that check happens in dax_zero_page_range() function. Dan thinks that's too late and its better to do the check earlier in alloc_dax(). I also modified alloc_dax() to return pointer with error code in it in case of failure. Right now it returns NULL and caller assumes failure happened due to -ENOMEM. But with this ->zero_page_range() check, I need to return -EINVAL instead. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Link: https://lore.kernel.org/r/20200401161125.GB9398@redhat.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2020-04-02dax, pmem: Add a dax operation zero_page_rangeVivek Goyal
Add a dax operation zero_page_range, to zero a page. This will also clear any known poison in the page being zeroed. As of now, zeroing of one page is allowed in a single call. There are no callers which are trying to zero more than a page in a single call. Once we grow the callers which zero more than a page in single call, we can add that support. Primary reason for not doing that yet is that this will add little complexity in dm implementation where a range might be spanning multiple underlying targets and one will have to split the range into multiple sub ranges and call zero_page_range() on individual targets. Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Link: https://lore.kernel.org/r/20200228163456.1587-3-vgoyal@redhat.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2020-04-02pmem: Add functions for reading/writing page to/from pmemVivek Goyal
This splits pmem_do_bvec() into pmem_do_read() and pmem_do_write(). pmem_do_write() will be used by pmem zero_page_range() as well. Hence sharing the same code. Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Link: https://lore.kernel.org/r/20200228163456.1587-2-vgoyal@redhat.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2020-03-27block: simplify queue allocationChristoph Hellwig
Current make_request based drivers use either blk_alloc_queue_node or blk_alloc_queue to allocate a queue, and then set up the make_request_fn function pointer and a few parameters using the blk_queue_make_request helper. Simplify this by passing the make_request pointer to blk_alloc_queue, and while at it merge the _node variant into the main helper by always passing a node_id, and remove the superfluous gfp_mask parameter. A lower-level __blk_alloc_queue is kept for the blk-mq case. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-31mm: Cleanup __put_devmap_managed_page() vs ->page_free()Dan Williams
After the removal of the device-public infrastructure there are only 2 ->page_free() call backs in the kernel. One of those is a device-private callback in the nouveau driver, the other is a generic wakeup needed in the DAX case. In the hopes that all ->page_free() callbacks can be migrated to common core kernel functionality, move the device-private specific actions in __put_devmap_managed_page() under the is_device_private_page() conditional, including the ->page_free() callback. For the other page types just open-code the generic wakeup. Yes, the wakeup is only needed in the MEMORY_DEVICE_FSDAX case, but it does no harm in the MEMORY_DEVICE_DEVDAX and MEMORY_DEVICE_PCI_P2PDMA case. Link: http://lkml.kernel.org/r/20200107224558.2362728-4-jhubbard@nvidia.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-17libnvdimm/pmem: Delete include of nd-core.hDan Williams
The entire point of nd-core.h is to hide functionality that no leaf driver should touch. In fact, the commit that added it had no need to include it. Fixes: 06e8ccdab15f ("acpi: nfit: Add support for detect platform...") Cc: Ira Weiny <ira.weiny@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>