xfs: allow queued AG intents to drain before scrubbing

When a writer thread executes a chain of log intent items, the AG header buffer locks will cycle during a transaction roll to get from one intent item to the next in a chain. Although scrub takes all AG header buffer locks, this isn't sufficient to guard against scrub checking an AG while that writer thread is in the middle of finishing a chain because there's no higher level locking primitive guarding allocation groups.

When there's a collision, cross-referencing between data structures (e.g. rmapbt and refcountbt) yields false corruption events; if repair is running, this results in incorrect repairs, which is catastrophic.

Fix this by adding to the perag structure the count of active intents and make scrub wait until it has both AG header buffer locks and the intent counter reaches zero.

One quirk of the drain code is that deferred bmap updates also bump and drop the intent counter. A fundamental decision made during the design phase of the reverse mapping feature is that updates to the rmapbt records are always made by the same code that updates the primary metadata. In other words, callers of bmapi functions expect that the bmapi functions will queue deferred rmap updates.

Some parts of the reflink code queue deferred refcount (CUI) and bmap (BUI) updates in the same head transaction, but the deferred work manager completely finishes the CUI before the BUI work is started. As a result, the CUI drops the intent count long before the deferred rmap (RUI) update even has a chance to bump the intent count. The only way to keep the intent count elevated between the CUI and RUI is for the BUI to bump the counter until the RUI has been created.

A second quirk of the intent drain code is that deferred work items must increment the intent counter as soon as the work item is added to the transaction. When a BUI completes and queues an RUI, the RUI must increment the counter before the BUI decrements it. The only way to accomplish this is to require that the counter be bumped as soon as the deferred work item is created in memory.

In the next patches we'll improve on this facility, but this patch provides the basic functionality.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
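
The ordering rule from the second quirk can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct drain_model`, `model_bump`, `model_drop`, and the two `chain_*` functions are hypothetical names invented for illustration, and C11 atomics stand in for the kernel's `atomic_t`. It counts how often the counter touches zero while a BUI hands off to an RUI; each such moment is one where a waiting scrub could be released.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the per-AG intent counter. */
struct drain_model {
	atomic_int	count;
};

/* A deferred work item was created in memory. */
static void model_bump(struct drain_model *dr)
{
	atomic_fetch_add(&dr->count, 1);
}

/*
 * A deferred work item finished.  Returns true if this drop took the
 * counter to zero, i.e. the point at which waiters would be woken.
 */
static bool model_drop(struct drain_model *dr)
{
	return atomic_fetch_sub(&dr->count, 1) == 1;
}

/*
 * Ordering described in the commit message: the RUI bumps the counter
 * before the BUI that queued it drops.  Returns how many times the
 * counter reached zero over the whole chain.
 */
static int chain_bump_first(struct drain_model *dr)
{
	int	zero_hits = 0;

	model_bump(dr);			/* BUI created */
	model_bump(dr);			/* BUI queues RUI; RUI bumps first */
	zero_hits += model_drop(dr);	/* BUI done: 2 -> 1 */
	zero_hits += model_drop(dr);	/* RUI done: 1 -> 0 */
	return zero_hits;
}

/*
 * Broken ordering, shown for contrast: the BUI drops before the RUI has
 * bumped, so the counter touches zero in the middle of the chain.
 */
static int chain_drop_first(struct drain_model *dr)
{
	int	zero_hits = 0;

	model_bump(dr);			/* BUI created */
	zero_hits += model_drop(dr);	/* BUI done: 1 -> 0, too early */
	model_bump(dr);			/* RUI bumps too late */
	zero_hits += model_drop(dr);	/* RUI done: 1 -> 0 */
	return zero_hits;
}
```

In this model the bump-first chain reaches zero only once, at the very end, whereas the drop-first chain reaches zero in the middle as well, which is the window the commit message is trying to close.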
author: Darrick J. Wong <djwong@kernel.org> 2022-07-14 11:05:36 -0700
committer: Darrick J. Wong <djwong@kernel.org> 2022-11-09 19:07:13 -0800
commit: 41647d914da91ad4280a632c5cbee97cf6f7020a (patch)
tree: 99aac7c510bba9059c1ff4de4edf09aacf440e56 /fs/xfs/xfs_drain.c
parent: f7797961cd1c641bf09cf45c0cd12b54deb8bafa (diff)
1 files changed, 96 insertions, 0 deletions
diff --git a/fs/xfs/xfs_drain.c b/fs/xfs/xfs_drain.c
new file mode 100644
index 000000000000..e8fced914f88
--- /dev/null
+++ b/fs/xfs/xfs_drain.c
@@ -0,0 +1,96 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2022 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_ag.h"
+#include "xfs_trace.h"
+
+void
+xfs_drain_init(
+	struct xfs_drain	*dr)
+{
+	atomic_set(&dr->dr_count, 0);
+	init_waitqueue_head(&dr->dr_waiters);
+}
+
+void
+xfs_drain_free(struct xfs_drain	*dr)
+{
+	ASSERT(atomic_read(&dr->dr_count) == 0);
+}
+
+/* Increase the pending intent count. */
+static inline void xfs_drain_bump(struct xfs_drain *dr)
+{
+	atomic_inc(&dr->dr_count);
+}
+
+/* Decrease the pending intent count, and wake any waiters, if appropriate. */
+static inline void xfs_drain_drop(struct xfs_drain *dr)
+{
+	if (atomic_dec_and_test(&dr->dr_count) &&
+	    wq_has_sleeper(&dr->dr_waiters))
+		wake_up(&dr->dr_waiters);
+}
+
+/* Are there work items pending? */
+static inline bool xfs_drain_busy(struct xfs_drain *dr)
+{
+	return atomic_read(&dr->dr_count) > 0;
+}
+
+/*
+ * Wait for the pending intent count for a drain to hit zero.
+ *
+ * Callers must not hold any locks that would prevent intents from being
+ * finished.
+ */
+static inline int xfs_drain_wait(struct xfs_drain *dr)
+{
+	return wait_event_killable(dr->dr_waiters, !xfs_drain_busy(dr));
+}
+
+/* Add an item to the pending count. */
+void
+xfs_perag_bump_intents(
+	struct xfs_perag	*pag)
+{
+	trace_xfs_perag_bump_intents(pag, __return_address);
+	xfs_drain_bump(&pag->pag_intents);
+}
+
+/* Remove an item from the pending count. */
+void
+xfs_perag_drop_intents(
+	struct xfs_perag	*pag)
+{
+	trace_xfs_perag_drop_intents(pag, __return_address);
+	xfs_drain_drop(&pag->pag_intents);
+}
+
+/*
+ * Wait for the pending intent count for AG metadata to hit zero.
+ * Callers must not hold any AG header buffers.
+ */
+int
+xfs_perag_drain_intents(
+	struct xfs_perag	*pag)
+{
+	trace_xfs_perag_wait_intents(pag, __return_address);
+	return xfs_drain_wait(&pag->pag_intents);
+}
+
+/* Might someone else be processing intents for this AG? */
+bool
+xfs_perag_intents_busy(
+	struct xfs_perag	*pag)
+{
+	return xfs_drain_busy(&pag->pag_intents);
+}
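
How a caller such as scrub might combine the AG header locks with the drain can be sketched in userspace. This is a rough model, not the kernel implementation: every name below (`struct ag_model`, `ag_bump`, `ag_drop`, `ag_busy`, `ag_drain`, `scrub_lock_ag`, `scrub_demo`) is hypothetical, a pthread mutex stands in for the AG header buffer locks, and a mutex plus condition variable stands in for the atomic counter and wait queue. The point it tries to show is the retry loop: take the header lock, and if intents are still pending, release the lock before waiting, since the drain must not be entered while holding AG header buffers.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of one AG: a header lock plus an intent drain. */
struct ag_model {
	pthread_mutex_t	hdr_lock;	/* stands in for AG header buffer locks */
	pthread_mutex_t	lock;		/* protects count */
	pthread_cond_t	idle;		/* signalled when count hits zero */
	int		count;		/* pending intents */
};

static void ag_init(struct ag_model *ag)
{
	pthread_mutex_init(&ag->hdr_lock, NULL);
	pthread_mutex_init(&ag->lock, NULL);
	pthread_cond_init(&ag->idle, NULL);
	ag->count = 0;
}

static void ag_bump(struct ag_model *ag)
{
	pthread_mutex_lock(&ag->lock);
	ag->count++;
	pthread_mutex_unlock(&ag->lock);
}

static void ag_drop(struct ag_model *ag)
{
	pthread_mutex_lock(&ag->lock);
	if (--ag->count == 0)
		pthread_cond_broadcast(&ag->idle);
	pthread_mutex_unlock(&ag->lock);
}

static bool ag_busy(struct ag_model *ag)
{
	bool	busy;

	pthread_mutex_lock(&ag->lock);
	busy = ag->count > 0;
	pthread_mutex_unlock(&ag->lock);
	return busy;
}

/* Wait for the count to hit zero; caller must not hold hdr_lock. */
static void ag_drain(struct ag_model *ag)
{
	pthread_mutex_lock(&ag->lock);
	while (ag->count > 0)
		pthread_cond_wait(&ag->idle, &ag->lock);
	pthread_mutex_unlock(&ag->lock);
}

/* Scrub side: returns holding hdr_lock with no intents pending. */
static void scrub_lock_ag(struct ag_model *ag)
{
	for (;;) {
		pthread_mutex_lock(&ag->hdr_lock);
		if (!ag_busy(ag))
			return;
		/* Let the writer make progress, then try again. */
		pthread_mutex_unlock(&ag->hdr_lock);
		ag_drain(ag);
	}
}

/* Writer side: finish a two-item chain, cycling hdr_lock in between. */
static void *writer(void *arg)
{
	struct ag_model	*ag = arg;

	pthread_mutex_lock(&ag->hdr_lock);
	ag_bump(ag);			/* child intent created first... */
	ag_drop(ag);			/* ...then the parent is dropped */
	pthread_mutex_unlock(&ag->hdr_lock);	/* "transaction roll" */

	pthread_mutex_lock(&ag->hdr_lock);
	ag_drop(ag);			/* child intent finished */
	pthread_mutex_unlock(&ag->hdr_lock);
	return NULL;
}

/* Returns 1 if scrub saw pending intents after locking, else 0. */
int scrub_demo(void)
{
	struct ag_model	ag;
	pthread_t	tid;
	int		seen;

	ag_init(&ag);
	ag_bump(&ag);			/* first intent already queued */
	if (pthread_create(&tid, NULL, writer, &ag) != 0)
		return -1;
	scrub_lock_ag(&ag);
	seen = ag_busy(&ag) ? 1 : 0;
	pthread_mutex_unlock(&ag.hdr_lock);
	pthread_join(tid, NULL);
	return seen;
}
```

Even if the scrub thread wins the header lock during the writer's simulated transaction roll, it sees a nonzero count, backs off, and only proceeds once the whole chain has finished.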