bcachefs: mark active journal devices on journal replicas gc

A simple device evacuate, remove, add test loop with concurrent
shutdowns occasionally reproduces a problem where the filesystem fails
to mount. The mount failure occurs because the filesystem was uncleanly
shut down, yet no member device is marked for journal data in the
superblock. An fsck detects the problem, restores the mark and allows
the mount to proceed without further consistency issues.

The reason for the lack of journal data marks is the gc mechanism
invoked via bch2_journal_flush_device_pins() runs while the journal
happens to be empty. This results in garbage collection of all journal
replicas entries. Once the updated replicas table is written to the
superblock, the filesystem is put in a transiently unrecoverable state
until further journal data is written, because journal recovery expects
to find at least one marked journal device whenever the filesystem is
not otherwise marked clean (i.e. as on clean unmount).

To fix this problem, update the journal replicas gc algorithm to always
mark currently active journal replicas entries by writing to the
journal. This ensures that only entries for devices that are no longer
used for journaling are garbage collected, not just those that don't
happen to currently hold journal data. This preserves the journal
recovery invariant above and avoids putting the fs into a transiently
unrecoverable state.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
author: Brian Foster <bfoster@redhat.com> 2023-06-30 10:51:46 -0400
committer: Kent Overstreet <kent.overstreet@linux.dev> 2023-10-22 17:10:05 -0400
commit: d14bfd1010c4ce8bede5bd98d0b332e3b34b8bd5 (patch)
tree: fe8b02904be8146b6383bedba5860379aec09a32 /fs/bcachefs/journal_reclaim.c
parent: a02a0121b3de81f985d6c751f1557c7aea832b9a (diff)
1 files changed, 13 insertions, 1 deletions
diff --git a/fs/bcachefs/journal_reclaim.c b/fs/bcachefs/journal_reclaim.c
index 2c7f8aca9319..5174b9497721 100644
--- a/fs/bcachefs/journal_reclaim.c
+++ b/fs/bcachefs/journal_reclaim.c
@@ -837,8 +837,20 @@ int bch2_journal_flush_device_pins(struct journal *j, int dev_idx)
 	mutex_lock(&c->replicas_gc_lock);
 	bch2_replicas_gc_start(c, 1 << BCH_DATA_journal);
 
-	seq = 0;
+	/*
+	 * Now that we've populated replicas_gc, write to the journal to mark
+	 * active journal devices. This handles the case where the journal might
+	 * be empty. Otherwise we could clear all journal replicas and
+	 * temporarily put the fs into an unrecoverable state. Journal recovery
+	 * expects to find devices marked for journal data on unclean mount.
+	 */
+	ret = bch2_journal_meta(&c->journal);
+	if (ret) {
+		mutex_unlock(&c->replicas_gc_lock);
+		return ret;
+	}
 
+	seq = 0;
 	spin_lock(&j->lock);
 	while (!ret) {
 		struct bch_replicas_padded replicas;
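
The effect of the ordering change can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not bcachefs code: struct toy_fs, toy_journal_meta(), toy_journal_replicas_gc() and toy_nr_marked() are hypothetical names invented for illustration, and the model reduces the replicas table to one journal mark per device. It shows a mark-and-sweep gc that drops every mark when the journal is empty, and keeps the marks for active devices when a journal write happens first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_DEVS 4

/* Hypothetical toy model, not a bcachefs structure. */
struct toy_fs {
	bool sb_journal_mark[NR_DEVS];    /* journal marks persisted in the superblock */
	bool active_journal_dev[NR_DEVS]; /* devices the journal currently writes to */
	bool holds_journal_data[NR_DEVS]; /* devices that hold live journal entries */
};

/* Stand-in for a journal write: every active device now holds journal data. */
static void toy_journal_meta(struct toy_fs *fs)
{
	for (size_t i = 0; i < NR_DEVS; i++)
		if (fs->active_journal_dev[i])
			fs->holds_journal_data[i] = true;
}

/*
 * Mark-and-sweep over the journal marks. Only devices that hold journal
 * data are re-marked; every other mark is swept. With write_first set,
 * a journal write is issued before the mark phase, as in the patch.
 */
static void toy_journal_replicas_gc(struct toy_fs *fs, bool write_first)
{
	bool keep[NR_DEVS] = { false };

	assert(fs != NULL);

	if (write_first)
		toy_journal_meta(fs);

	for (size_t i = 0; i < NR_DEVS; i++)	/* mark phase */
		if (fs->holds_journal_data[i])
			keep[i] = true;

	for (size_t i = 0; i < NR_DEVS; i++)	/* sweep phase */
		fs->sb_journal_mark[i] = keep[i];
}

/* Number of devices still marked for journal data in the superblock. */
static size_t toy_nr_marked(const struct toy_fs *fs)
{
	size_t nr = 0;

	for (size_t i = 0; i < NR_DEVS; i++)
		nr += fs->sb_journal_mark[i];
	return nr;
}
```

In this model, running the gc on an empty journal without the write leaves zero marked devices, which corresponds to the state journal recovery rejects after an unclean shutdown. With the write issued first, marks survive for the active devices and only the stale marks are swept.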