Age | Commit message | Author |
|
Move this API to the canonical timer_*() namespace.
[ tglx: Redone against pre rc1 ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
|
|
Pull block updates from Jens Axboe:
- ublk updates:
- Add support for updating the size of a ublk instance
- Zero-copy improvements
- Auto-registering of buffers for zero-copy
- Series simplifying and improving GET_DATA and request lookup
- Series adding quiesce support
- Lots of selftests additions
- Various cleanups
- NVMe updates via Christoph:
- add per-node DMA pools and use them for PRP/SGL allocations
(Caleb Sander Mateos, Keith Busch)
- nvme-fcloop refcounting fixes (Daniel Wagner)
- support delayed removal of the multipath node and optionally
support the multipath node for private namespaces (Nilay Shroff)
- support shared CQs in the PCI endpoint target code (Wilfred
Mallawa)
- support admin-queue only authentication (Hannes Reinecke)
- use the crc32c library instead of the crypto API (Eric Biggers)
- misc cleanups (Christoph Hellwig, Marcelo Moreira, Hannes
Reinecke, Leon Romanovsky, Gustavo A. R. Silva)
- MD updates via Yu:
- Fix that normal IO can be starved by sync IO, found by mkfs on
newly created large raid5, with some clean up patches for bdev
inflight counters
- Clean up brd, getting rid of atomic kmaps and bvec poking
- Add loop driver specifically for zoned IO testing
- Eliminate blk-rq-qos calls with a static key, if not enabled
- Improve hctx locking for when a plug has IO for multiple queues
pending
- Remove block layer bouncing support, which in turn means we can
remove the per-node bounce stat as well
- Improve blk-throttle support
- Improve delay support for blk-throttle
- Improve brd discard support
- Unify IO scheduler switching. This should also fix a bunch of lockdep
warnings we've been seeing, after enabling lockdep support for queue
freezing/unfreezing
- Add support for block write streams via FDP (flexible data placement)
on NVMe
- Add a bunch of block helpers, facilitating the removal of a bunch of
duplicated boilerplate code
- Remove obsolete BLK_MQ pci and virtio Kconfig options
- Add atomic/untorn write support to blktrace
- Various little cleanups and fixes
* tag 'for-6.16/block-20250523' of git://git.kernel.dk/linux: (186 commits)
selftests: ublk: add test for UBLK_F_QUIESCE
ublk: add feature UBLK_F_QUIESCE
selftests: ublk: add test case for UBLK_U_CMD_UPDATE_SIZE
traceevent/block: Add REQ_ATOMIC flag to block trace events
ublk: run auto buf unregisgering in same io_ring_ctx with registering
io_uring: add helper io_uring_cmd_ctx_handle()
ublk: remove io argument from ublk_auto_buf_reg_fallback()
ublk: handle ublk_set_auto_buf_reg() failure correctly in ublk_fetch()
selftests: ublk: add test for covering UBLK_AUTO_BUF_REG_FALLBACK
selftests: ublk: support UBLK_F_AUTO_BUF_REG
ublk: support UBLK_AUTO_BUF_REG_FALLBACK
ublk: register buffer to local io_uring with provided buf index via UBLK_F_AUTO_BUF_REG
ublk: prepare for supporting to register request buffer automatically
ublk: convert to refcount_t
selftests: ublk: make IO & device removal test more stressful
nvme: rename nvme_mpath_shutdown_disk to nvme_mpath_remove_disk
nvme: introduce multipath_always_on module param
nvme-multipath: introduce delayed removal of the multipath head node
nvme-pci: derive and better document max segments limits
nvme-pci: use struct_size for allocation struct nvme_dev
...
|
|
Convert the __bio_add_page(..., virt_to_page(), ...) pattern to the
bio_add_virt_nofail helper implementing it, and do the same for the
similar pattern using bio_add_page for adding the first segment after
a bio allocation as that can't fail either.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20250507120451.4000627-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If we use the 'B' mode and we have an invalid table line,
cancel_delayed_work_sync would trigger a warning. This commit avoids the
warning.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
|
|
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.
Conversion was done with coccinelle plus manual fixups where necessary.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mikulas Patocka:
- dm-crypt: switch to using the crc32 library
- dm-verity, dm-integrity, dm-crypt: documentation improvement
- dm-vdo fixes
- dm-stripe: enable inline crypto passthrough
- dm-integrity: set ti->error on memory allocation failure
- dm-bufio: remove unused return value
- dm-verity: do forward error correction on metadata I/O errors
- dm: fix unconditional IO throttle caused by REQ_PREFLUSH
- dm cache: prevent BUG_ON by blocking retries on failed device resumes
- dm cache: support shrinking the origin device
- dm: restrict dm device size to 2^63-512 bytes
- dm-delay: support zoned devices
- dm-verity: support block number limits for different ioprio classes
- dm-integrity: fix non-constant-time tag verification (security bug)
- dm-verity, dm-ebs: fix prefetch-vs-suspend race
* tag 'for-6.15/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (27 commits)
dm-ebs: fix prefetch-vs-suspend race
dm-verity: fix prefetch-vs-suspend race
dm-integrity: fix non-constant-time tag verification
dm-verity: support block number limits for different ioprio classes
dm-delay: support zoned devices
dm: restrict dm device size to 2^63-512 bytes
dm cache: support shrinking the origin device
dm cache: prevent BUG_ON by blocking retries on failed device resumes
dm vdo indexer: reorder uds_request to reduce padding
dm: fix unconditional IO throttle caused by REQ_PREFLUSH
dm vdo: rework processing of loaded refcount byte arrays
dm vdo: remove remaining ring references
dm-verity: do forward error correction on metadata I/O errors
dm-bufio: remove unused return value
dm-integrity: set ti->error on memory allocation failure
dm: Enable inline crypto passthrough for striped target
dm vdo slab-depot: read refcount blocks in large chunks at load time
dm vdo vio-pool: allow variable-sized metadata vios
dm vdo vio-pool: support pools with multiple data blocks per vio
dm vdo vio-pool: add a pool pointer to pooled_vio
...
|
|
When using dm-integrity in standalone mode with a keyed hmac algorithm,
integrity tags are calculated and verified internally.
Using plain memcmp to compare the stored and computed tags may leak the
position of the first byte mismatch through side-channel analysis,
allowing an attacker to brute-force expected tags in linear time (e.g., by counting
single-stepping interrupts in confidential virtual machine environments).
Co-developed-by: Luca Wilke <work@luca-wilke.com>
Signed-off-by: Luca Wilke <work@luca-wilke.com>
Signed-off-by: Jo Van Bulck <jo.vanbulck@cs.kuleuven.be>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
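The timing concern above can be illustrated with a minimal userspace
sketch. tag_differs() is a hypothetical name, not the kernel's code; the
kernel has crypto_memneq() for this kind of comparison, and this sketch only
shows the general idea of examining every byte regardless of where the first
mismatch occurs. A production version also needs protection against compiler
optimizations, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns nonzero if the two buffers differ.
 * Every byte is examined, so the number of loop iterations does not
 * depend on the position of the first mismatch.
 */
static int tag_differs(const unsigned char *a, const unsigned char *b,
		       size_t len)
{
	unsigned char acc = 0;
	size_t i;

	assert(a && b);
	for (i = 0; i < len; i++)
		acc |= a[i] ^ b[i];	/* accumulate, never exit early */
	return acc != 0;
}
```

In contrast, memcmp() is free to return at the first differing byte, which
is what makes the mismatch position observable.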
|
|
Pull block updates from Jens Axboe:
- Fixes for integrity handling
- NVMe pull request via Keith:
- Secure concatenation for TCP transport (Hannes)
- Multipath sysfs visibility (Nilay)
- Various cleanups (Qasim, Baruch, Wang, Chen, Mike, Damien, Li)
- Correct use of 64-bit BARs for pci-epf target (Niklas)
- Socket fix for selinux when used in containers (Peijie)
- MD pull request via Yu:
- fix recovery can preempt resync (Li Nan)
- fix md-bitmap IO limit (Su Yue)
- fix raid10 discard with REQ_NOWAIT (Xiao Ni)
- fix raid1 memory leak (Zheng Qixing)
- fix mddev uaf (Yu Kuai)
- fix raid1,raid10 IO flags (Yu Kuai)
- some refactor and cleanup (Yu Kuai)
- Series cleaning up and fixing bugs in the bad block handling code
- Improve support for write failure simulation in null_blk
- Various lock ordering fixes
- Fixes for locking for debugfs attributes
- Various ublk related fixes and improvements
- Cleanups for blk-rq-qos wait handling
- blk-throttle fixes
- Fixes for loop dio and sync handling
- Fixes and cleanups for the auto-PI code
- Block side support for hardware encryption keys in blk-crypto
- Various cleanups and fixes
* tag 'for-6.15/block-20250322' of git://git.kernel.dk/linux: (105 commits)
nvmet: replace max(a, min(b, c)) by clamp(val, lo, hi)
nvme-tcp: fix selinux denied when calling sock_sendmsg
nvmet: pci-epf: Always configure BAR0 as 64-bit
nvmet: Remove duplicate uuid_copy
nvme: zns: Simplify nvme_zone_parse_entry()
nvmet: pci-epf: Remove redundant 'flush_workqueue()' calls
nvmet-fc: Remove unused functions
nvme-pci: remove stale comment
nvme-fc: Utilise min3() to simplify queue count calculation
nvme-multipath: Add visibility for queue-depth io-policy
nvme-multipath: Add visibility for numa io-policy
nvme-multipath: Add visibility for round-robin io-policy
nvmet: add tls_concat and tls_key debugfs entries
nvmet-tcp: support secure channel concatenation
nvmet: Add 'sq' argument to alloc_ctrl_args
nvme-fabrics: reset admin connection for secure concatenation
nvme-tcp: request secure channel concatenation
nvme-keyring: add nvme_tls_psk_refresh()
nvme: add nvme_auth_derive_tls_psk()
nvme: add nvme_auth_generate_digest()
...
|
|
Many of the fields in struct bio_integrity_payload are only needed for
the default integrity buffer in the block layer, and the variable
sized array at the end of the structure makes it very hard to embed
into caller allocated structures.
Reduce struct bio_integrity_payload to the minimal structure needed in
common code and create two separate containing structures for the
automatically generated payload and the caller allocated payload.
The latter is a simple wrapper for struct bio_integrity_payload and
the bvecs, while the former contains the additional fields moved out
of struct bio_integrity_payload.
Always use a dedicated mempool for automatic integrity metadata
instead of depending on bio_set that is submitter controlled and thus
often doesn't have the mempool initialized and stop using mempools for
the submitter buffers as they aren't in the NOIO I/O submission path
where we need to guarantee forward progress.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Tested-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20250225154449.422989-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The dm-integrity target didn't set the error string when memory
allocation failed. This patch fixes it.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
|
|
The Inline mode does not use a journal; it makes no sense to print
journal information in DM table. Print it only if the journal is used.
The same applies to interleave_sectors (unused for Inline mode).
Also, add comments for arg_count, as the current calculation
is quite obscure.
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
|
|
In Inline mode, the journal is unused, and journal_sectors is zero.
Calculating the journal watermark requires dividing by journal_sectors,
which should be done only if the journal is configured.
Otherwise, a simple table query (dmsetup table) can cause OOPS.
This bug did not show on some systems, perhaps only due to
compiler optimization.
On my 32-bit testing machine, this reliably crashes with the following:
: Oops: divide error: 0000 [#1] PREEMPT SMP
: CPU: 0 UID: 0 PID: 2450 Comm: dmsetup Not tainted 6.14.0-rc2+ #959
: EIP: dm_integrity_status+0x2f8/0xab0 [dm_integrity]
...
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: fb0987682c62 ("dm-integrity: introduce the Inline mode")
Cc: stable@vger.kernel.org # 6.11+
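A minimal sketch of the guard described above, assuming a hypothetical
helper watermark_percent() and simplified arithmetic (the driver's actual
calculation is not reproduced here). The point is only that the division is
skipped when no journal is configured.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: express a threshold as a percentage of the
 * journal size. Returns -1 when journal_sectors is zero (as in Inline
 * mode) so that the caller can skip printing the value instead of
 * dividing by zero.
 */
static int watermark_percent(uint64_t threshold_sectors,
			     uint64_t journal_sectors)
{
	if (journal_sectors == 0)
		return -1;

	/* keep the multiplication below from overflowing */
	assert(threshold_sectors <= UINT64_MAX / 100);
	return (int)(threshold_sectors * 100 / journal_sectors);
}
```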
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mikulas Patocka:
- Misc VDO fixes
- Remove unused declarations dm_get_rq_mapinfo() and dm_zone_map_bio()
- Dm-delay: Improve kernel documentation
- Dm-crypt: Allow to specify the integrity key size as an option
- Dm-bufio: Remove pointless NULL check
- Small code cleanups: Use ERR_CAST; remove unlikely() around IS_ERR;
use __assign_bit
- Dm-integrity: Fix gcc 5 warning; convert comma to semicolon; fix
smatch warning
- Dm-integrity: Support recalculation in the 'I' mode
- Revert "dm: requeue IO if mapping table not yet available"
- Dm-crypt: Small refactoring to make the code more readable
- Dm-cache: Remove pointless error check
- Dm: Fix spelling errors
- Dm-verity: Restart or panic on an I/O error if restart or panic was
requested
- Dm-verity: Fallback to platform keyring also if key in trusted
keyring is rejected
* tag 'for-6.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (26 commits)
dm verity: fallback to platform keyring also if key in trusted keyring is rejected
dm-verity: restart or panic on an I/O error
dm: fix spelling errors
dm-cache: remove pointless error check
dm vdo: handle unaligned discards correctly
dm vdo indexer: Convert comma to semicolon
dm-crypt: Use common error handling code in crypt_set_keyring_key()
dm-crypt: Use up_read() together with key_put() only once in crypt_set_keyring_key()
Revert "dm: requeue IO if mapping table not yet available"
dm-integrity: check mac_size against HASH_MAX_DIGESTSIZE in sb_mac()
dm-integrity: support recalculation in the 'I' mode
dm integrity: Convert comma to semicolon
dm integrity: fix gcc 5 warning
dm: Make use of __assign_bit() API
dm integrity: Remove extra unlikely helper
dm: Convert to use ERR_CAST()
dm bufio: Remove NULL check of list_entry()
dm-crypt: Allow to specify the integrity key size as option
dm: Remove unused declaration and empty definition "dm_zone_map_bio"
dm delay: enhance kernel documentation
...
|
|
sb_mac() verifies that the superblock + MAC don't exceed 512 bytes.
Because the superblock is currently 64 bytes, this really verifies
mac_size <= 448. This confuses smatch into thinking that mac_size may
be as large as 448, which is inconsistent with the later code that
assumes the MAC fits in a buffer of size HASH_MAX_DIGESTSIZE (64).
In fact mac_size <= HASH_MAX_DIGESTSIZE is guaranteed by the crypto API,
as that is the whole point of HASH_MAX_DIGESTSIZE. But, let's be
defensive and explicitly check for this. This suppresses the false
positive smatch warning. It does not fix an actual bug.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/202409061401.44rtN1bh-lkp@intel.com/
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
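The two bounds discussed above can be sketched as follows. mac_size_valid()
and the macro names SECTOR_BYTES and SUPERBLOCK_BYTES are hypothetical; the
numeric values are the ones quoted in the text, and the driver's real check
may be written differently.

```c
#include <assert.h>
#include <stddef.h>

#define SECTOR_BYTES		512	/* superblock + MAC must fit here */
#define SUPERBLOCK_BYTES	64	/* current superblock size */
#define HASH_MAX_DIGESTSIZE	64	/* value quoted in the text above */

/* Returns 1 if mac_size passes both checks, 0 otherwise. */
static int mac_size_valid(size_t mac_size)
{
	/* original check: effectively mac_size <= 448 */
	if (SUPERBLOCK_BYTES + mac_size > SECTOR_BYTES)
		return 0;
	/* explicit defensive check against the digest buffer size */
	if (mac_size > HASH_MAX_DIGESTSIZE)
		return 0;
	return 1;
}
```

With only the first check, a value such as 448 would be accepted even though
it is larger than the 64-byte digest buffer.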
|
|
There's a race condition when accessing the variable
ic->sb->recalc_sector. The function integrity_recalc writes to this
variable when it makes some progress and the function
dm_integrity_map_continue may read this variable concurrently.
One problem is that on 32-bit architectures the 64-bit variable is not
read and written atomically - it may be possible to read garbage if read
races with write.
Another problem is that memory accesses to this variable are not guarded
with memory barriers.
This commit fixes the race - it moves reading ic->sb->recalc_sector to an
earlier place where we hold &ic->endio_wait.lock.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
|
|
In the kernel 6.11, dm-integrity was enhanced with an inline ('I') mode.
This mode uses devices with a non-power-of-2 sector size. The extra
metadata after each sector is used to hold the integrity hash.
This commit enhances the inline mode, so that there is automatic
recalculation of the integrity hashes when the 'recalculate' parameter is
used. It allows us to activate the device instantly, and the
recalculation is done in the background.
If the device is deactivated while recalculation is in progress, it will
remember the point where it stopped and it will continue from this point
when activated again.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
|
|
Replace comma between expressions with semicolons.
Using a ',' in place of a ';' can have unintended side effects.
Although that is not the case here, it seems best to use ';'
unless ',' is intended.
Found by inspection.
No functional change intended.
Compile tested only.
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
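An illustration of the kind of unintended side effect mentioned above,
using two hypothetical functions that are unrelated to the driver code.
After an unbraced "if", a comma keeps the next expression inside the
conditional statement, whereas a semicolon ends it.

```c
#include <assert.h>

/* Comma: both assignments belong to the "if" statement. */
static int with_comma(int cond)
{
	int a = 0, b = 0;

	if (cond)
		a = 1,
		b = 2;
	return a + b;
}

/* Semicolon: only the first assignment is conditional. */
static int with_semicolon(int cond)
{
	int a = 0, b = 0;

	if (cond)
		a = 1;
	b = 2;
	return a + b;
}
```

The two functions agree when cond is nonzero and differ when it is zero.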
|
|
This commit fixes gcc 5 warning "logical not is only applied to the left
hand side of comparison"
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Fixes: fb0987682c62 ("dm-integrity: introduce the Inline mode")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
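The warning concerns operator precedence: '!' binds tighter than '=='. A
small sketch with hypothetical functions shows the two possible readings;
which one the patch selected is not stated in this message.

```c
#include <assert.h>

/* What the compiler sees for "!a == b": the negation applies to a only. */
static int not_then_compare(int a, int b)
{
	return (!a) == b;
}

/* The other reading: negate the result of the whole comparison. */
static int compare_then_not(int a, int b)
{
	return !(a == b);
}
```

The two forms give the same result when both operands are 0 or 1 and can
differ otherwise, which is why the compiler asks for explicit parentheses.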
|
|
In IS_ERR, the unlikely is used for the input parameter,
so there is no need to use it again outside.
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
|
|
Commit 3a7e02c040b1 ("minmax: avoid overly complicated constant
expressions in VM code") added the simpler MIN_T/MAX_T macros in order
to avoid some excessive expansion from the rather complicated regular
min/max macros.
The complexity of those macros stems from two issues:
(a) trying to use them in situations that require a C constant
expression (in static initializers and for array sizes)
(b) the type sanity checking
and MIN_T/MAX_T avoids both of these issues.
Now, in the whole (long) discussion about all this, it was pointed out
that the whole type sanity checking is entirely unnecessary for
min_t/max_t which get a fixed type that the comparison is done in.
But that still leaves min_t/max_t unnecessarily complicated due to
worries about the C constant expression case.
However, it turns out that there really aren't very many cases that use
min_t/max_t for this, and we can just force-convert those.
This does exactly that.
Which in turn will then allow for much simpler implementations of
min_t()/max_t(). All the usual "macros in all upper case will evaluate
the arguments multiple times" rules apply.
We should do all the same things for the regular min/max() vs MIN/MAX()
cases, but that has the added complexity of various drivers defining
their own local versions of MIN/MAX, so that needs another level of
fixes first.
Link: https://lore.kernel.org/all/b47fad1d0cf8449886ad148f8c013dae@AcuMS.aculab.com/
Cc: David Laight <David.Laight@aculab.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
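A simplified sketch of the idea, assuming stand-in definitions of MIN_T and
MAX_T that are not copied from the kernel headers. Both arguments are cast
to the given type and compared directly, so the result is a C constant
expression whenever the arguments are, at the cost of evaluating each
argument more than once. SMALL_BUF, LARGE_BUF, scratch and smallest are
hypothetical names.

```c
#include <assert.h>

/* Stand-in definitions (assumption: the kernel's versions differ). */
#define MIN_T(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))
#define MAX_T(type, a, b) ((type)(a) > (type)(b) ? (type)(a) : (type)(b))

#define SMALL_BUF 16
#define LARGE_BUF 64

/* Usable as an array size and in a static initializer. */
static unsigned char scratch[MAX_T(unsigned int, SMALL_BUF, LARGE_BUF)];
static const unsigned int smallest =
	MIN_T(unsigned int, SMALL_BUF, LARGE_BUF);
```

Because the arguments are expanded more than once, expressions with side
effects such as i++ should not be passed to these macros.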
|
|
This commit introduces a new 'I' mode for dm-integrity.
The 'I' mode may be selected if the underlying device has a non-power-of-2
sector size. In this mode, dm-integrity stores integrity data directly in
the device's sectors and does not use a journal.
This mode improves performance and reduces flash wear because there are
no journal writes.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
Remove use of the blk_limits_io_{min,opt} and assign the values directly
to the queue_limits structure. For the io_opt this is a completely
mechanical change, for io_min it removes flooring the limit to the
physical and logical block size in the particular caller. But as
blk_validate_limits will do the same later when actually applying the
limits, there still is no change in overall behavior.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
|
|
Move the integrity information into the queue limits so that it can be
set atomically with other queue limits, and that the sysfs changes to
the read_verify and write_generate flags are properly synchronized.
This also makes it possible to provide a more useful helper to stack the integrity
fields, although it still is separate from the main stacking function
as not all stackable devices want to inherit the integrity settings.
Even with that it greatly simplifies the code in md and dm.
Note that the integrity field is moved as-is into the queue limits.
While there are good arguments for removing the separate blk_integrity
structure, this would cause a lot of churn and might better be done at a
later time if desired. However the integrity field in the queue_limits
structure is now unconditional so that various ifdefs can be avoided or
replaced with IS_ENABLED(). Given the tiny size of it that seems like
a worthwhile trade off.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240613084839.1044015-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Use the block layer built-in nop profile instead of duplicating it.
Tested by:
$ dd if=/dev/urandom of=key.bin bs=512 count=1
$ cryptsetup luksFormat -q --type luks2 --integrity hmac-sha256 \
--integrity-no-wipe /dev/nvme0n1 key.bin
$ cryptsetup luksOpen /dev/nvme0n1 luks-integrity --key-file key.bin
and then doing mkfs.xfs and simple I/O on the mounted file system.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Milan Broz <gmazyland@gmail.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240613084839.1044015-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
dm-integrity could set discard_granularity lower than the logical block
size. This could result in failures when sending discard requests to
dm-integrity.
This fix is needed for kernels prior to 6.10.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reported-by: Eric Wheeler <linux-integrity@lists.ewheeler.net>
Cc: stable@vger.kernel.org # <= 6.9
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
Depending on the value of CONFIG_HZ, clang complains about a pointless
comparison:
drivers/md/dm-integrity.c:4085:12: error: result of comparison of
constant 42949672950 with expression of type
'unsigned int' is always false
[-Werror,-Wtautological-constant-out-of-range-compare]
if (val >= (uint64_t)UINT_MAX * 1000 / HZ) {
As the check remains useful for other configurations, shut up the
warning by adding a second type cast to uint64_t.
Fixes: 468dfca38b1a ("dm integrity: add a bitmap mode")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
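A userspace sketch of the comparison with both sides widened to 64 bits.
msecs_out_of_range() is a hypothetical wrapper, HZ is fixed at an example
value, and the exact placement of the added cast in the driver is an
assumption here.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define HZ 1000	/* assumption: example tick rate; CONFIG_HZ varies */

/*
 * Returns nonzero if 'val' is at or above the limit. Casting 'val' to
 * uint64_t makes the comparison 64-bit on both sides, so the compiler
 * no longer sees a 32-bit value compared against a constant that may
 * lie outside its range.
 */
static int msecs_out_of_range(unsigned int val)
{
	return (uint64_t)val >= (uint64_t)UINT_MAX * 1000 / HZ;
}
```

With HZ set to 100 the limit becomes 42949672950, the constant quoted in the
error above, and the function would return 0 for every input.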
|
|
It is possible to set up dm-integrity with a smaller sector size than
the logical sector size of the underlying device. In this situation,
dm-integrity guarantees that the outgoing bios have the same alignment as
incoming bios (so, if you create a filesystem with 4k block size,
dm-integrity would send 4k-aligned bios to the underlying device).
This guarantee was broken when integrity_recheck was implemented.
integrity_recheck sends a bio that is aligned to ic->sectors_per_block. So
if we set up integrity with 512-byte sector size on a device with logical
block size 4k, we would be sending an unaligned bio. This triggered a bug in
one of our internal tests.
This commit fixes it by determining the actual alignment of the
incoming bio and then makes sure that the outgoing bio in
integrity_recheck has the same alignment.
Fixes: c88f5e553fe3 ("dm-integrity: recheck the integrity tag after a failure")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
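One way to determine the alignment of a request is to OR its start sector
and length together and isolate the lowest set bit. The sketch below uses a
hypothetical helper, request_alignment(), with a caller-supplied upper
bound; it illustrates the technique and is not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: largest power-of-two number of sectors that
 * divides both the start sector and the length of a request, capped at
 * 'max_align' (which must itself be a power of two).
 */
static uint64_t request_alignment(uint64_t start_sector,
				  uint64_t nr_sectors, uint64_t max_align)
{
	uint64_t bits;

	assert(max_align != 0 && (max_align & (max_align - 1)) == 0);
	bits = start_sector | nr_sectors | max_align;
	return bits & (~bits + 1);	/* isolate the lowest set bit */
}
```

A follow-up read issued with the same alignment would then satisfy a device
whose logical block size is larger than the dm-integrity sector size.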
|
|
Memory for the "checksums" pointer will leak if the data is rechecked
after checksum failure (because the associated kfree won't happen due
to 'goto skip_io').
Fix this by freeing the checksums memory before recheck, and just use
the "checksum_onstack" memory for storing checksum during recheck.
Fixes: c88f5e553fe3 ("dm-integrity: recheck the integrity tag after a failure")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Fix DM core's IO submission (which includes dm-io and dm-bufio) such
that a bio's IO priority is propagated. Work focused on enabling both
DM crypt and verity targets to retain the appropriate IO priority
- Fix DM raid reshape logic to not allow an empty flush bio to be
requeued due to false concern about the bio, which doesn't have a
data payload, accessing beyond the end of the device
- Fix DM core's internal resume so that it properly calls both preresume
and resume methods, which fixes the potential for a postsuspend and
resume imbalance
- Update DM verity target to set DM_TARGET_SINGLETON flag because it
doesn't make sense to have a DM table with a mix of targets that
include dm-verity
- Small cleanups in DM crypt, thin, and integrity targets
- Fix references to dm-devel mailing list to use latest list address
* tag 'for-6.9/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: call the resume method on internal suspend
dm raid: fix false positive for requeue needed during reshape
dm-integrity: set max_integrity_segments in dm_integrity_io_hints
dm: update relevant MODULE_AUTHOR entries to latest dm-devel mailing list
dm ioctl: update DM_DRIVER_EMAIL to new dm-devel mailing list
dm verity: set DM_TARGET_SINGLETON feature flag
dm crypt: Fix IO priority lost when queuing write bios
dm verity: Fix IO priority lost when reading FEC and hash
dm bufio: Support IO priority
dm io: Support IO priority
dm crypt: remove redundant state settings after waking up
dm thin: add braces around conditional code that spans lines
|
|
Set max_integrity_segments with the other queue limits instead
of updating it later. This also uncovered that the driver is trying
to set the limit to UINT_MAX while max_integrity_segments is an
unsigned short, so fix it up to use USHRT_MAX instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
The newly added integrity_recheck() function has another larger stack
allocation, just like its caller integrity_metadata(). When it gets
inlined, the combination of the two exceeds the warning limit for 32-bit
architectures and possibly risks an overflow when this is called from
a deep call chain through a file system:
drivers/md/dm-integrity.c:1767:13: error: stack frame size (1048) exceeds limit (1024) in 'integrity_metadata' [-Werror,-Wframe-larger-than]
1767 | static void integrity_metadata(struct work_struct *w)
Since the caller at this point is done using its checksum buffer,
just reuse the same buffer in the new function to avoid the double
allocation.
[Mikulas: add "noinline" to integrity_recheck and verity_recheck.
These functions are only called on error, so they shouldn't bloat the
stack frame or code size of the caller.]
Fixes: c88f5e553fe3 ("dm-integrity: recheck the integrity tag after a failure")
Fixes: 9177f3c0dea6 ("dm-verity: recheck the hash after a failure")
Cc: stable@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
Some IO will dispatch from kworker with different io_context settings
than the submitting task, so we may need to specify a priority explicitly
to avoid losing it.
Add IO priority parameter to dm_io() and update all callers.
Co-developed-by: Yibin Ding <yibin.ding@unisoc.com>
Signed-off-by: Yibin Ding <yibin.ding@unisoc.com>
Signed-off-by: Hongyu Jin <hongyu.jin@unisoc.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
If a userspace process reads (with O_DIRECT) multiple blocks into the same
buffer, dm-integrity reports an error [1]. The error is reported in a log
and it may cause a RAID leg to be kicked out of the array.
This commit fixes dm-integrity, so that if integrity verification fails,
the data is read again into a kernel buffer (where userspace can't modify
it) and the integrity tag is rechecked. If the recheck succeeds, the
content of the kernel buffer is copied into the user buffer; if the
recheck fails, an integrity error is reported.
[1] https://people.redhat.com/~mpatocka/testcases/blk-auth-modify/read2.c
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
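The recheck flow described above can be sketched in userspace as follows.
Everything here is a stand-in: toy_tag() is a toy checksum rather than a
real integrity tag, the "device" is a plain array, and
verify_with_recheck() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16

/* Toy checksum standing in for the real integrity tag (assumption). */
static uint32_t toy_tag(const unsigned char *data)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < BLOCK_SIZE; i++)
		sum = sum * 31 + data[i];
	return sum;
}

/*
 * 'user_buf' holds data read earlier, but its owner may have changed it
 * since. On a mismatch, re-read into a private buffer, recheck, and copy
 * the verified data out. Returns 0 on success, -1 on a real error.
 */
static int verify_with_recheck(const unsigned char *device_block,
			       uint32_t stored_tag, unsigned char *user_buf)
{
	unsigned char private_buf[BLOCK_SIZE];

	if (toy_tag(user_buf) == stored_tag)
		return 0;

	memcpy(private_buf, device_block, BLOCK_SIZE);	/* simulated re-read */
	if (toy_tag(private_buf) != stored_tag)
		return -1;

	memcpy(user_buf, private_buf, BLOCK_SIZE);
	return 0;
}
```

Only a mismatch that persists in the private buffer is treated as an
integrity error, since that buffer cannot be modified concurrently.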
|
|
__bio_for_each_segment assumes that the first struct bio_vec argument
doesn't change - it calls "bio_advance_iter_single((bio), &(iter),
(bvl).bv_len)" to advance the iterator. Unfortunately, the dm-integrity
code changes the bio_vec with "bv.bv_len -= pos". When this code path
is taken, the iterator would be out of sync and dm-integrity would
report errors. This happens if the machine is out of memory and
"kmalloc" fails.
Fix this bug by making a copy of "bv" and changing the copy instead.
Fixes: 7eada909bfd7 ("dm: add integrity target")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
Simplify sb_mac() by using crypto_shash_digest() instead of an
init+update+final sequence. This should also improve performance.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
If the statement "recalc_tags = kvmalloc(recalc_tags_size, GFP_NOIO);"
fails, we call "vfree(recalc_buffer)" and we jump to the label "oom".
If the condition "recalc_sectors >= 1U << ic->sb->log2_sectors_per_block"
is false, we jump to the label "free_ret" and call "vfree(recalc_buffer)"
again, on an already released memory block.
Fix the bug by setting "recalc_buffer = NULL" after freeing it.
Fixes: da8b4fc1f63a ("dm integrity: only allocate recalculate buffer when needed")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
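A userspace analogue of the fix, using malloc()/free() in place of the
kernel allocators. recalc_step() and its parameters are hypothetical; only
the label name and the pattern of clearing the pointer after the early free
follow the description above.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Two exit paths share the cleanup label. Clearing the pointer after
 * the early free makes the second free() a no-op, because free(NULL)
 * is defined to do nothing.
 */
static int recalc_step(size_t buffer_size, int tags_alloc_fails)
{
	void *recalc_buffer = malloc(buffer_size);
	int ret = 0;

	if (!recalc_buffer)
		return -1;

	if (tags_alloc_fails) {
		free(recalc_buffer);
		recalc_buffer = NULL;	/* otherwise free_ret frees it twice */
		ret = -1;
		goto free_ret;
	}

	/* ... work with recalc_buffer ... */

free_ret:
	free(recalc_buffer);
	return ret;
}
```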
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Update DM crypt to allocate compound pages if possible
- Fix DM crypt target's crypt_ctr_cipher_new return value on invalid
AEAD cipher
- Fix DM flakey testing target's write bio corruption feature to
corrupt the data of a cloned bio instead of the original
- Add random_read_corrupt and random_write_corrupt features to DM
flakey target
- Fix ABBA deadlock in DM thin metadata by resetting associated bufio
client rather than destroying and recreating it
- A couple other small DM thinp cleanups
- Update DM core to support disabling block core IO stats accounting
and optimize away code that isn't needed if stats are disabled
- Other small DM core cleanups
- Improve DM integrity target to not require so much memory on 32 bit
systems. Also only allocate the recalculate buffer as needed (and
increasingly reduce its size on allocation failure)
- Update DM integrity to use %*ph for printing hexdump of a small
buffer. Also update DM integrity documentation
- Various DM core ioctl interface hardening. Now more careful about
alignment of structures and processing of input passed to the kernel
from userspace.
Also disallow the creation of DM devices named "control", "." or ".."
- Eliminate GFP_NOIO workarounds for __vmalloc and kvmalloc in DM
core's ioctl and bufio code
* tag 'for-6.5/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (28 commits)
dm: get rid of GFP_NOIO workarounds for __vmalloc and kvmalloc
dm integrity: scale down the recalculate buffer if memory allocation fails
dm integrity: only allocate recalculate buffer when needed
dm integrity: reduce vmalloc space footprint on 32-bit architectures
dm ioctl: Refuse to create device named "." or ".."
dm ioctl: Refuse to create device named "control"
dm ioctl: Avoid double-fetch of version
dm ioctl: structs and parameter strings must not overlap
dm ioctl: Avoid pointer arithmetic overflow
dm ioctl: Check dm_target_spec is sufficiently aligned
Documentation: dm-integrity: Document an example of how the tunables relate.
Documentation: dm-integrity: Document default values.
Documentation: dm-integrity: Document the meaning of "buffer".
Documentation: dm-integrity: Fix minor grammatical error.
dm integrity: Use %*ph for printing hexdump of a small buffer
dm thin: disable discards for thin-pool if no_discard_passdown
dm: remove stale/redundant dm_internal_{suspend,resume} prototypes in dm.h
dm: skip dm-stats work in alloc_io() unless needed
dm: avoid needless dm_io access if all IO accounting is disabled
dm: support turning off block-core's io stats accounting
...
|
|
If memory allocation fails, try to reduce the size of the recalculate
buffer and continue with that smaller buffer.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
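A rough userspace sketch of the "retry with a smaller buffer" idea follows. The assumptions are mine, not the commit's: the request is halved on each failure, there is a caller-supplied lower bound, and alloc_scaled(), alloc_fn and limited_alloc() are hypothetical names used only to make the failure path testable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator hook so the failure path can be exercised. */
typedef void *(*alloc_fn)(size_t size);

/*
 * Try to allocate *size bytes; on failure halve the request and retry,
 * never going below min_size. On success *size holds the size obtained.
 */
static void *alloc_scaled(alloc_fn alloc, size_t *size, size_t min_size)
{
    assert(alloc != NULL && size != NULL && min_size > 0);

    for (size_t want = *size; want >= min_size; want /= 2) {
        void *buf = alloc(want);
        if (buf != NULL) {
            *size = want;         /* tell the caller how much it really got */
            return buf;
        }
    }
    return NULL;  /* even the smallest acceptable buffer was unavailable */
}

/* Test allocator: pretends that anything above 256 KiB cannot be allocated. */
static void *limited_alloc(size_t size)
{
    return size > 256 * 1024 ? NULL : malloc(size);
}
```

The caller has to work with whatever size comes back in *size, which is why the sketch reports the obtained size instead of only returning the pointer.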
|
|
dm-integrity preallocated an 8MiB buffer for recalculating in the
constructor and freed it in the destructor. This wastes memory when
the user has many dm-integrity devices.
Fix dm-integrity so that the buffer is only allocated when
recalculation is in progress; allocate the buffer at the beginning of
integrity_recalc() and free it at the end.
Note that integrity_recalc() doesn't hold any locks when allocating
the buffer, so it shouldn't cause low-memory deadlock.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
It was reported that dm-integrity runs out of vmalloc space on 32-bit
architectures. On x86, there is only 128MiB vmalloc space and dm-integrity
consumes it quickly because it has a 64MiB journal and 8MiB recalculate
buffer.
Fix this by reducing the size of the journal to 4MiB and the size of
the recalculate buffer to 1MiB, so that multiple dm-integrity devices
can be created and activated on 32-bit architectures.
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
The kernel already has a helper to print a hexdump of a small
buffer via pointer extension. Use that instead of the open-coded
variant.
In the long term it helps to kill pr_cont() or at least narrow down
its use.
Note, the format is slightly changed, i.e. the trailing space is
always printed. Also the IV dump is limited to 64 bytes, which seems
fine.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
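The %*ph specifier only exists in the kernel's printk family, so it cannot be run directly in userspace. The sketch below imitates its general behaviour as I understand it from the printk format documentation (two lowercase hex digits per byte, single-space separators, at most 64 bytes); hex_small() and HEX_MAX_BYTES are hypothetical names, and this is not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define HEX_MAX_BYTES 64   /* small-buffer limit assumed for this sketch */

/*
 * Userspace imitation of "%*ph": two lowercase hex digits per byte,
 * separated by single spaces. Returns the number of bytes dumped.
 * out must hold at least 3 * min(len, 64) characters (1 if len is 0).
 */
static size_t hex_small(char *out, size_t outsz,
                        const unsigned char *buf, size_t len)
{
    assert(out != NULL && outsz > 0);
    if (len > HEX_MAX_BYTES)
        len = HEX_MAX_BYTES;      /* longer input is truncated */
    assert(outsz >= (len ? 3 * len : 1));

    out[0] = '\0';
    for (size_t i = 0; i < len; i++)
        snprintf(out + 3 * i, outsz - 3 * i, "%02x%s",
                 buf[i], i + 1 < len ? " " : "");
    return len;
}
```

With the four bytes 00 1f a0 ff as input, the output buffer holds the string "00 1f a0 ff". In kernel code the equivalent would be a single format call with the length passed as the field width, which is what makes the open-coded pr_cont() loop unnecessary.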
|
|
BACKGROUND
==========
When multiple work items are queued to a workqueue, their execution order
is not guaranteed to match the queueing order. They may get executed in any order and
simultaneously. When fully serialized execution - one by one in the queueing
order - is needed, an ordered workqueue should be used which can be created
with alloc_ordered_workqueue().
However, alloc_ordered_workqueue() was a later addition. Before it, an
ordered workqueue could be obtained by creating an UNBOUND workqueue with
@max_active==1. This originally was an implementation side-effect which was
broken by 4c16bd327c74 ("workqueue: implement NUMA affinity for unbound
workqueues"). Because there were users that depended on the ordered execution,
5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered")
made the workqueue allocation path implicitly promote UNBOUND workqueues w/
@max_active==1 to ordered workqueues.
While this has worked okay, overloading the UNBOUND allocation interface
this way creates other issues. It's difficult to tell whether a given
workqueue actually needs to be ordered and users that legitimately want a
min concurrency level wq unexpectedly get an ordered one instead. With
planned UNBOUND workqueue updates to improve execution locality and more
prevalence of chiplet designs which can benefit from such improvements, this
isn't a state we wanna be in forever.
This patch series audits all callsites that create an UNBOUND workqueue w/
@max_active==1 and converts them to alloc_ordered_workqueue() as necessary.
WHAT TO LOOK FOR
================
The conversions are from
alloc_workqueue(WQ_UNBOUND | flags, 1, args...)
to
alloc_ordered_workqueue(flags, args...)
which don't cause any functional changes. If you know that fully ordered
execution is not necessary, please let me know. I'll drop the conversion and
instead add a comment noting the fact to reduce confusion while conversion
is in progress.
If you aren't fully sure, it's completely fine to let the conversion
through. The behavior will stay exactly the same and we can always
reconsider later.
As there are follow-up workqueue core changes, I'd really appreciate it if the
patch can be routed through the workqueue tree w/ your acks. Thanks.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: dm-devel@redhat.com
Cc: linux-kernel@vger.kernel.org
|
|
Pointer variables of void * type do not require type cast.
Signed-off-by: Yu Zhe <yuzhe@nfschina.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
Simplifies each DM target's init method by making dm_register_target()
responsible for its error reporting (on behalf of targets).
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
Otherwise the journal_io_cache will leak if dm_register_target() fails.
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Fix DM cache target to free background tracker work items, otherwise
slab BUG will occur when kmem_cache_destroy() is called.
- Improve 2 of DM's shrinker names to reflect their use.
- Fix the DM flakey target to not corrupt the zero page. Fix dm-flakey
on 32-bit highmem systems by using bvec_kmap_local instead of
page_address. Also, fix logic used when imposing the
"corrupt_bio_byte" feature.
- Stop using WQ_UNBOUND for DM verity target's verify_wq because it
causes significant Android latencies on ARM64 (and doesn't show real
benefit on other architectures).
- Add negative check to catch simple case of a DM table referencing
itself. More complex scenarios that use intermediate devices to
self-reference still need to be avoided/handled in userspace.
- Fix DM core's resize to only send one uevent instead of two. This
fixes a race with udev that, if udev wins, causes udev to miss
uevents (which caused premature unmount attempts by systemd).
- Add cond_resched() to workqueue functions in DM core, dm-thin and
dm-cache so that their loops aren't the cause of unintended cpu
scheduling fairness issues.
- Fix all of DM's checkpatch errors and warnings (famous last words).
Various other small cleanups.
* tag 'for-6.3/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (62 commits)
dm: remove unnecessary (void*) conversion in event_callback()
dm ioctl: remove unnecessary check when using dm_get_mdptr()
dm ioctl: assert _hash_lock is held in __hash_remove
dm cache: add cond_resched() to various workqueue loops
dm thin: add cond_resched() to various workqueue loops
dm: add cond_resched() to dm_wq_requeue_work()
dm: add cond_resched() to dm_wq_work()
dm sysfs: make kobj_type structure constant
dm: update targets using system workqueues to use a local workqueue
dm: remove flush_scheduled_work() during local_exit()
dm clone: prefer kvmalloc_array()
dm: declare variables static when sensible
dm: fix suspect indent whitespace
dm ioctl: prefer strscpy() instead of strlcpy()
dm: avoid void function return statements
dm integrity: change macros min/max() -> min_t/max_t where appropriate
dm: fix use of sizeof() macro
dm: avoid 'do {} while(0)' loop in single statement macros
dm log: avoid multiple line dereference
dm log: avoid trailing semicolon in macro
...
|
|
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|