bcachefs: bch2_set_rebalance_needs_scan_device() - bcachefs.git

author	Kent Overstreet <kent.overstreet@linux.dev>	2025-08-23 20:05:08 -0400
committer	Kent Overstreet <kent.overstreet@linux.dev>	2025-09-25 17:12:47 -0400
commit	d1c50f4c95a1387baf0182f6f8619db1bdb4c4dd (patch)
tree	10803503d39e7dcd7910dc0b57f50d61d5310650 /tools/testing/selftests/uevent
parent	84a72c42183f6fb7de7ab5112bae6d5da081df7a (diff)

bcachefs: bch2_set_rebalance_needs_scan_device() (bcachefs-testing)

Rebalance can now evacuate devices in response to state changes.

This obsoletes BCH_DATA_OP_migrate; setting a device to BCH_MEMBER_STATE_failed (perhaps we should rename this) will cause it to be evacuated (and the evacuate will resume if e.g. we crash or shutdown and restart).

Additionally, we'll now be able to automatically evacuate failing devices. Currently we only set devices read-only in response to IO errors; we'll need to add configuration/policy/good heuristics (and clearly document them) for deciding when a device is failing and should be evacuated.

This works with rebalance scan cookies; these are currently used to respond to filesystem/inode option changes. Cookies in the range of 1-4095 now refer to devices; when rebalance sees one of those it will walk backpointers on that device and update bch_extent_rebalance, which will react to the new device state (or durability setting change).

Performance implications: with BCH_DATA_OP_migrate, we walk backpointers and do the data moves directly, meaning they happen in device LBA order. However, by walking backpointers to queue up rebalance work entries and then doing the work from the rebalance_work btree, we'll do the data moves in logical key order.

Pro: doing data moves in logical key order will help with fragmentation/data locality: extents from the same inode will be moved at the same time, we'll get a bit of defragmentation and do better at keeping related data together.

Con: reads from the device being evacuated will no longer be sequential, this will hurt performance on spinning rust. Perhaps add a mode where we kick off data moves from do_rebalance_scan_bp()? Would be pretty easy.

XXX: slurp backpointers into a darray and sort before processing extents in do_rebalance_scan_device: we recently saw a very slow evacuate that was mostly just dropping cached data, on a huge filesystem entirely on spinning rust with only 8GB of ram in the server - the backpointers -> extents lookups are fairly random, batching + sorting will greatly improve performance

XXX: add a superblock bit to make this transactional, if we crash between the write_super for the member state/durability change and creating the device scan cookie

XXX: new_needs_rb_allowed should check for device scan cookies

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
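
For illustration only, here is a minimal userspace sketch of two ideas from the message above: treating scan cookies in the 1-4095 range as device cookies, and sorting a batch of backpointer targets into logical key order before doing extent lookups. This is not the code from this commit. The names scan_cookie_is_device(), struct bp_target, bp_target_cmp() and bp_batch_sort() are hypothetical, the struct is a simplified stand-in for real bcachefs key positions, and qsort() stands in for whatever sort helper the kernel code would use.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed bounds, taken from the "1-4095" range in the commit message. */
#define SCAN_COOKIE_DEV_MIN 1u
#define SCAN_COOKIE_DEV_MAX 4095u

/* Hypothetical helper: does this rebalance scan cookie refer to a device? */
static bool scan_cookie_is_device(uint64_t cookie)
{
	return cookie >= SCAN_COOKIE_DEV_MIN && cookie <= SCAN_COOKIE_DEV_MAX;
}

/* Hypothetical, simplified stand-in for the extent a backpointer points at. */
struct bp_target {
	uint64_t inode;
	uint64_t offset;
};

/* Order by inode first, then by offset within the inode. */
static int bp_target_cmp(const void *l, const void *r)
{
	const struct bp_target *a = l, *b = r;

	if (a->inode != b->inode)
		return a->inode < b->inode ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/*
 * Sort a batch of targets so that the following extent lookups walk the
 * extents in logical key order instead of in effectively random order.
 */
static void bp_batch_sort(struct bp_target *batch, size_t nr)
{
	qsort(batch, nr, sizeof(*batch), bp_target_cmp);
}
```

In this sketch a caller would collect targets into an array while walking backpointers on the device, call bp_batch_sort() once the batch is full, and only then look up each extent. The batch size and the memory it is allowed to use are left open here, as they are in the XXX note.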

Diffstat (limited to 'tools/testing/selftests/uevent')

0 files changed, 0 insertions, 0 deletions

