author | Jian Wang <wangjian161@huawei.com> | 2019-02-13 10:53:20 +1100 |
---|---|---|
committer | Stephen Rothwell <sfr@canb.auug.org.au> | 2019-02-13 14:00:11 +1100 |
commit | 5313f35b1f57d7a9f22efd0a5f19faae1144a7de (patch) | |
tree | 0d0a364f36cba0ef2ff6a9f1990e539e8e11094f /fs/ocfs2 | |
parent | fe9345753de5af7bcb2adc63353c540fd31b85ce (diff) |
ocfs2/dlm: return DLM_CANCELGRANT if the lock is on granted list and the operation is canceled
In dlm_move_lockres_to_recovery_list(), if the lock is on the granted
queue and cancel_pending is set, the code hits a BUG_ON(). I think this
BUG_ON() is meaningless, so this patch removes it. A scenario that
triggers it is given below.
At the beginning, Node 1 is the master and holds an NL lock; Node 2 and
Node 3 each hold a PR lock.

| Node 1 | Node 2 | Node 3 |
|---|---|---|
| | Wants to get an EX lock. | |
| | | Wants to get an EX lock. |
| Node 3 blocks Node 2 from getting the EX lock, so Node 1 sends Node 3 a BAST. | | |
| | | Receives the BAST from Node 1. The downconvert thread begins to cancel the PR to EX conversion. In dlmunlock_common(), the downconvert thread has set lock->cancel_pending but has not yet entered dlm_send_remote_unlock_request(). |
| | Node 2 dies because the host is powered down. | |
| In the recovery process, cleans up the locks related to Node 2, then finishes Node 3's PR to EX request and gives Node 3 an AST. | | |
| | | Receives the AST from Node 1, changes the lock level to EX, and moves the lock to the granted list. |
| Node 1 dies because the host is powered down. | | |
| | | In dlm_move_lockres_to_recovery_list(), the lock is on the granted queue and cancel_pending is set: BUG_ON. |

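
The end state of this scenario can be modeled in a few lines of C. This is only a sketch: the `recov_*` types and list names are hypothetical stand-ins, not the kernel's definitions, and the predicate restates the removed `BUG_ON(i != DLM_CONVERTING_LIST)` condition as it applies to a lock with cancel_pending set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical list identifiers; the kernel's actual values are not modeled. */
enum recov_list {
	RECOV_GRANTED_LIST,
	RECOV_CONVERTING_LIST,
	RECOV_BLOCKED_LIST
};

/* Hypothetical, reduced view of a lock as seen during recovery. */
struct recov_lock {
	enum recov_list on_list; /* queue the lock is found on */
	bool cancel_pending;     /* set by the canceling thread, not yet cleared */
};

/*
 * True when the removed BUG_ON(i != DLM_CONVERTING_LIST) would have fired:
 * a cancel is pending, but the lock is no longer on the converting list.
 */
static bool recov_old_check_fires(const struct recov_lock *lock)
{
	assert(lock != NULL);
	return lock->cancel_pending && lock->on_list != RECOV_CONVERTING_LIST;
}
```

With `on_list == RECOV_GRANTED_LIST` and `cancel_pending == true`, which is the state Node 3 ends up in, the old check fires even though nothing is actually inconsistent.
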
However, once this BUG_ON() is removed, the process hits a second BUG
in ocfs2_unlock_ast(). The following scenario triggers the second BUG:
At the beginning, Node 1 is the master and holds an NL lock; Node 2 and
Node 3 each hold a PR lock.

| Node 1 | Node 2 | Node 3 |
|---|---|---|
| | Wants to get an EX lock. | |
| | | Wants to get an EX lock. |
| Node 3 blocks Node 2 from getting the EX lock, so Node 1 sends Node 3 a BAST. | | |
| | | Receives the BAST from Node 1. The downconvert thread begins to cancel the PR to EX conversion. In dlmunlock_common(), the downconvert thread has released lock->spinlock and res->spinlock but has not yet entered dlm_send_remote_unlock_request(). |
| | Node 2 dies because the host is powered down. | |
| In the recovery process, cleans up the locks related to Node 2, then finishes Node 3's PR to EX request and gives Node 3 an AST. | | |
| | | Receives the AST from Node 1, changes the lock level to EX, moves the lock to the granted list, and sets lockres->l_unlock_action to OCFS2_UNLOCK_INVALID in ocfs2_locking_ast(). |
| Node 1 dies because the host is powered down. | | |
| | | Realizes that Node 1 is dead and removes Node 1 from domain_map. The downconvert thread gets DLM_NORMAL from dlm_send_remote_unlock_request() and sets *call_ast to 1. The downconvert thread then hits the BUG in ocfs2_unlock_ast(). |

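
The ordering problem in this scenario can be sketched as a tiny state model. The `sketch_*` names are hypothetical; in particular, the cancel-side action and the idea that the unlock AST treats an invalid action as fatal are assumptions made for illustration. Only the reset to OCFS2_UNLOCK_INVALID in ocfs2_locking_ast() comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the OCFS2_UNLOCK_* actions. */
enum sketch_unlock_action {
	SKETCH_UNLOCK_INVALID,
	SKETCH_UNLOCK_CANCEL_CONVERT,
	SKETCH_UNLOCK_DROP_LOCK
};

/* Hypothetical, reduced lock resource: only the pending unlock action. */
struct sketch_lockres {
	enum sketch_unlock_action unlock_action;
};

/* Cancel path (assumed): record which unlock action is in flight. */
static void sketch_start_cancel(struct sketch_lockres *res)
{
	assert(res != NULL);
	res->unlock_action = SKETCH_UNLOCK_CANCEL_CONVERT;
}

/* Grant path: the locking AST resets the action, as in the scenario. */
static void sketch_locking_ast(struct sketch_lockres *res)
{
	assert(res != NULL);
	res->unlock_action = SKETCH_UNLOCK_INVALID;
}

/* True when a late unlock AST would find no valid action (the BUG case). */
static bool sketch_unlock_ast_would_bug(const struct sketch_lockres *res)
{
	assert(res != NULL);
	return res->unlock_action == SKETCH_UNLOCK_INVALID;
}
```

Calling `sketch_start_cancel()`, then `sketch_locking_ast()`, then checking `sketch_unlock_ast_would_bug()` reproduces the order of events on Node 3: the grant arrives between the start of the cancel and the unlock AST, so the AST runs against an already reset action.
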
To avoid hitting the second BUG, dlmunlock_common() should return
DLM_CANCELGRANT if the lock is on the granted list and the operation is
a cancel (LKM_CANCEL).
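
The rule stated above can be expressed as a small stand-alone function. This is a sketch under stated assumptions: the `model_*` types, the flag value, and the function name are hypothetical and only mirror the shape of the check that the patch adds to dlmunlock_common(); they are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's DLM status codes. */
enum model_status { MODEL_NORMAL, MODEL_CANCELGRANT };

/* Hypothetical queue identifiers for where a lock currently sits. */
enum model_list { MODEL_GRANTED, MODEL_CONVERTING, MODEL_BLOCKED };

#define MODEL_LKM_CANCEL 0x1u /* stand-in for LKM_CANCEL; value is arbitrary */

struct model_lock {
	enum model_list on_list;
};

/*
 * Mirrors the added check: a cancel that finds the lock already on the
 * granted list is reported as MODEL_CANCELGRANT, whatever status the
 * unlock request itself returned. Otherwise the status passes through.
 */
static enum model_status model_fixup_status(const struct model_lock *lock,
					    unsigned int flags,
					    enum model_status request_status)
{
	assert(lock != NULL);
	if ((flags & MODEL_LKM_CANCEL) && lock->on_list == MODEL_GRANTED)
		return MODEL_CANCELGRANT;
	return request_status;
}
```

For a lock on the granted list with the cancel flag set, `model_fixup_status()` returns `MODEL_CANCELGRANT` even when the request status was `MODEL_NORMAL`, which is the case Node 3 reaches after Node 1 dies.
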
Link: http://lkml.kernel.org/r/98f0e80c-9c13-dbb6-047c-b40e100082b1@huawei.com
Signed-off-by: Jian Wang <wangjian161@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Diffstat (limited to 'fs/ocfs2')
-rw-r--r-- | fs/ocfs2/dlm/dlmrecovery.c | 1 |
-rw-r--r-- | fs/ocfs2/dlm/dlmunlock.c | 5 |
2 files changed, 5 insertions, 1 deletions
diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
index 802636d50365..74896525dc1a 100644
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -2134,7 +2134,6 @@ void dlm_move_lockres_to_recovery_list(struct dlm_ctxt *dlm,
 					 * if this had completed successfully
 					 * before sending this lock state to the
 					 * new master */
-					BUG_ON(i != DLM_CONVERTING_LIST);
 					mlog(0, "node died with cancel pending "
 					     "on %.*s. move back to granted list.\n",
 					     res->lockname.len, res->lockname.name);
diff --git a/fs/ocfs2/dlm/dlmunlock.c b/fs/ocfs2/dlm/dlmunlock.c
index bde104a02b74..b92d603b0cc1 100644
--- a/fs/ocfs2/dlm/dlmunlock.c
+++ b/fs/ocfs2/dlm/dlmunlock.c
@@ -183,6 +183,11 @@ static enum dlm_status dlmunlock_common(struct dlm_ctxt *dlm,
 							flags, owner);
 		spin_lock(&res->spinlock);
 		spin_lock(&lock->spinlock);
+
+		if ((flags & LKM_CANCEL) &&
+				dlm_lock_on_list(&res->granted, lock))
+			status = DLM_CANCELGRANT;
+
 		/* if the master told us the lock was already granted,
 		 * let the ast handle all of these actions */
 		if (status == DLM_CANCELGRANT) {