author     Jian Wang <wangjian161@huawei.com>  2019-02-13 10:53:20 +1100
committer  Stephen Rothwell <sfr@canb.auug.org.au>  2019-02-13 14:00:11 +1100
commit     5313f35b1f57d7a9f22efd0a5f19faae1144a7de
tree       0d0a364f36cba0ef2ff6a9f1990e539e8e11094f /fs/ocfs2
parent     fe9345753de5af7bcb2adc63353c540fd31b85ce
ocfs2/dlm: return DLM_CANCELGRANT if the lock is on granted list and the operation is canceled
In dlm_move_lockres_to_recovery_list(), if the lock is on the granted queue and cancel_pending is set, the BUG_ON() fires. I think this BUG_ON() is meaningless, so remove it. A scenario that triggers it is given below.

At the beginning, Node 1 is the master and holds an NL lock, Node 2 holds a PR lock, and Node 3 holds a PR lock too.

  1. Node 2: wants to get an EX lock.
  2. Node 3: wants to get an EX lock.
  3. Node 1: Node 3 is blocking Node 2 from getting the EX lock, so Node 1 sends Node 3 a BAST.
  4. Node 3: receives the BAST from Node 1. The downconvert thread begins to cancel the PR to EX conversion. In dlmunlock_common(), the downconvert thread has set lock->cancel_pending but has not yet entered dlm_send_remote_unlock_request().
  5. Node 2: dies because the host is powered down.
  6. Node 1: in the recovery process, cleans up the locks related to Node 2, then finishes Node 3's PR to EX request and gives Node 3 an AST.
  7. Node 3: receives the AST from Node 1, changes the lock level to EX and moves the lock to the granted list.
  8. Node 1: dies because the host is powered down.
  9. Node 3: in dlm_move_lockres_to_recovery_list(), the lock is on the granted queue and cancel_pending is set, so BUG_ON() fires.

However, once this BUG_ON() is removed, the process hits a second BUG in ocfs2_unlock_ast(). The following scenario triggers it.

At the beginning, Node 1 is the master and holds an NL lock, Node 2 holds a PR lock, and Node 3 holds a PR lock too.

  1. Node 2: wants to get an EX lock.
  2. Node 3: wants to get an EX lock.
  3. Node 1: Node 3 is blocking Node 2 from getting the EX lock, so Node 1 sends Node 3 a BAST.
  4. Node 3: receives the BAST from Node 1. The downconvert thread begins to cancel the PR to EX conversion. In dlmunlock_common(), the downconvert thread has released lock->spinlock and res->spinlock but has not yet entered dlm_send_remote_unlock_request().
  5. Node 2: dies because the host is powered down.
  6. Node 1: in the recovery process, cleans up the locks related to Node 2, then finishes Node 3's PR to EX request and gives Node 3 an AST.
  7. Node 3: receives the AST from Node 1, changes the lock level to EX, moves the lock to the granted list, and sets lockres->l_unlock_action to OCFS2_UNLOCK_INVALID in ocfs2_locking_ast().
  8. Node 1: dies because the host is powered down.
  9. Node 3: realizes that Node 1 is dead and removes Node 1 from domain_map. The downconvert thread gets DLM_NORMAL from dlm_send_remote_unlock_request() and sets *call_ast to 1, then hits the BUG in ocfs2_unlock_ast().

To avoid the second BUG, dlmunlock_common() should return DLM_CANCELGRANT if the lock is on the granted list and the operation is a cancel.

Link: http://lkml.kernel.org/r/98f0e80c-9c13-dbb6-047c-b40e100082b1@huawei.com
Signed-off-by: Jian Wang <wangjian161@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
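The first scenario can be reduced to a small userspace model of the recovery check. This is a hypothetical sketch, not kernel code: model_lock, model_queue, MODEL_NO_CONVERT and the recover_* functions are invented for illustration, and commit_pending_cancel() is assumed to mirror what the recovery path does after its "move back to granted list" message (put the lock on the granted queue and clear the pending conversion). It shows why the assertion adds nothing: by the time Node 1 dies, Node 3's lock has already been granted EX, so committing the cancel leaves it where it already is.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a lock resource's three queues. The names echo
 * DLM_GRANTED_LIST / DLM_CONVERTING_LIST / DLM_BLOCKED_LIST from the
 * patch context, but this is NOT kernel code. */
enum model_queue { MODEL_GRANTED, MODEL_CONVERTING, MODEL_BLOCKED };

/* Hypothetical stand-in for "no conversion requested" (like LKM_IVMODE). */
#define MODEL_NO_CONVERT (-1)

struct model_lock {
	enum model_queue queue; /* queue the lock currently sits on */
	int convert_type;       /* requested mode, or MODEL_NO_CONVERT */
	bool cancel_pending;    /* set by the cancel path before it is sent */
};

enum reco_result { RECO_OK, RECO_WOULD_BUG };

/* Assumed behaviour of committing a pending cancel during recovery:
 * the lock ends up granted with no outstanding conversion. */
static void commit_pending_cancel(struct model_lock *lock)
{
	assert(lock->cancel_pending);
	lock->queue = MODEL_GRANTED;
	lock->convert_type = MODEL_NO_CONVERT;
	lock->cancel_pending = false;
}

/* Before the patch: a pending cancel was expected only on the
 * converting queue; anything else tripped the BUG_ON(). */
static enum reco_result recover_before_patch(struct model_lock *lock)
{
	if (!lock->cancel_pending)
		return RECO_OK;
	if (lock->queue != MODEL_CONVERTING)
		return RECO_WOULD_BUG; /* models BUG_ON(i != DLM_CONVERTING_LIST) */
	commit_pending_cancel(lock);
	return RECO_OK;
}

/* After the patch: commit the cancel wherever the lock is. A lock the
 * AST already granted simply stays on the granted queue. */
static enum reco_result recover_after_patch(struct model_lock *lock)
{
	if (lock->cancel_pending)
		commit_pending_cancel(lock);
	return RECO_OK;
}
```

For Node 3's lock in step 9 ({ MODEL_GRANTED, MODEL_NO_CONVERT, true }), recover_before_patch() reports RECO_WOULD_BUG and leaves the lock untouched, while recover_after_patch() keeps it on the granted queue and clears cancel_pending.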
Diffstat (limited to 'fs/ocfs2')
-rw-r--r--  fs/ocfs2/dlm/dlmrecovery.c  1 -
-rw-r--r--  fs/ocfs2/dlm/dlmunlock.c    5 +++++
2 files changed, 5 insertions, 1 deletion
diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
index 802636d50365..74896525dc1a 100644
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -2134,7 +2134,6 @@ void dlm_move_lockres_to_recovery_list(struct dlm_ctxt *dlm,
* if this had completed successfully
* before sending this lock state to the
* new master */
- BUG_ON(i != DLM_CONVERTING_LIST);
mlog(0, "node died with cancel pending "
"on %.*s. move back to granted list.\n",
res->lockname.len, res->lockname.name);
diff --git a/fs/ocfs2/dlm/dlmunlock.c b/fs/ocfs2/dlm/dlmunlock.c
index bde104a02b74..b92d603b0cc1 100644
--- a/fs/ocfs2/dlm/dlmunlock.c
+++ b/fs/ocfs2/dlm/dlmunlock.c
@@ -183,6 +183,11 @@ static enum dlm_status dlmunlock_common(struct dlm_ctxt *dlm,
flags, owner);
spin_lock(&res->spinlock);
spin_lock(&lock->spinlock);
+
+ if ((flags & LKM_CANCEL) &&
+ dlm_lock_on_list(&res->granted, lock))
+ status = DLM_CANCELGRANT;
+
/* if the master told us the lock was already granted,
* let the ast handle all of these actions */
if (status == DLM_CANCELGRANT) {
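The effect of the dlmunlock.c hunk on the second scenario can be sketched the same way. Again this is a hypothetical model: the MODEL_* names, finish_remote_unlock() and unlock_ast_would_bug() are invented stand-ins for LKM_CANCEL, DLM_NORMAL, DLM_CANCELGRANT, the tail of dlmunlock_common() and ocfs2_unlock_ast(). It assumes (a reading of the o2cb stack glue, not something this patch states) that an unlock AST delivered with DLM_CANCELGRANT is ignored because the lock AST already handled the grant, and that OCFS2_UNLOCK_INVALID is an unexpected action for ocfs2_unlock_ast(). Because the new check runs after res->spinlock and lock->spinlock are retaken, it sees the lock that the AST moved to the granted list, even though the remote request returned DLM_NORMAL.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the o2dlm/ocfs2 names used in the patch. */
#define MODEL_LKM_CANCEL 0x1
enum model_status { MODEL_DLM_NORMAL, MODEL_DLM_CANCELGRANT };
enum model_unlock_action { MODEL_UNLOCK_INVALID, MODEL_UNLOCK_CANCEL_CONVERT };

/* Status the unlock path ends up with once the remote request has
 * returned and the spinlocks have been retaken. */
static enum model_status finish_remote_unlock(enum model_status remote_status,
                                              int flags, bool on_granted_list,
                                              bool with_patch)
{
	enum model_status status = remote_status;

	/* The added check: a cancel whose lock has already been granted. */
	if (with_patch && (flags & MODEL_LKM_CANCEL) && on_granted_list)
		status = MODEL_DLM_CANCELGRANT;
	return status;
}

/* Would the filesystem's unlock AST reach its BUG()? Assumes CANCELGRANT
 * is dropped before it gets there, and that an INVALID unlock action is
 * otherwise fatal. */
static bool unlock_ast_would_bug(enum model_status status,
                                 enum model_unlock_action action)
{
	assert(status == MODEL_DLM_NORMAL || status == MODEL_DLM_CANCELGRANT);
	if (status == MODEL_DLM_CANCELGRANT)
		return false; /* the lock AST already handled the grant */
	return action == MODEL_UNLOCK_INVALID;
}
```

In step 9 the remote request returns DLM_NORMAL for a cancel on a lock already on the granted list: without the check the status stays MODEL_DLM_NORMAL and unlock_ast_would_bug() returns true; with it the status becomes MODEL_DLM_CANCELGRANT and it returns false. An ordinary cancel whose lock is still converting keeps DLM_NORMAL either way.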