Bug #22058
mds: admin socket wait for scrub completion is racy
% Done:
0%
Source:
Development
Tags:
Backport:
luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
fs
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
The code that waits for scrub completion is racy:
If the scrub completes before the wait begins, the admin socket thread waits forever.
The mutex we're using is wrong. We need to implement our own C_SaferCond which takes the mds_lock.
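A rough sketch of what a completion that "takes the mds_lock" could look like, written with standard C++ primitives rather than Ceph's Mutex/Cond. Everything here is an assumption for illustration: `big_lock` is a hypothetical stand-in for mds_lock, and `ExternalLockCond` and `run_scrub_and_wait` are made-up names, not Ceph code and not necessarily the fix that was merged.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the MDS-wide lock; not the real mds_lock.
std::mutex big_lock;

// Completion whose state is protected by the caller's lock instead of a
// private mutex. Both complete() and wait() expect big_lock to be held.
class ExternalLockCond {
  std::condition_variable cond;
  bool done = false;
  int rval = 0;

public:
  // Caller must hold big_lock.
  void complete(int r) {
    done = true;
    rval = r;
    cond.notify_all();
  }

  // Caller passes the held lock; it is released while blocked and
  // re-acquired before this returns.
  int wait(std::unique_lock<std::mutex>& held) {
    cond.wait(held, [this] { return done; });
    return rval;
  }
};

// Runs a fake "scrub" on another thread and waits for its result.
int run_scrub_and_wait(int result) {
  ExternalLockCond fin;
  std::unique_lock<std::mutex> l(big_lock);
  std::thread worker([&] {
    // Blocks until the waiter releases big_lock inside wait().
    std::lock_guard<std::mutex> g(big_lock);
    fin.complete(result);
  });
  int r = fin.wait(l);
  l.unlock();
  worker.join();
  return r;
}
```

Because the `done` flag and the condition wait are both guarded by the same outer lock, the completion cannot slip in between the waiter checking the flag and going to sleep.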
Related issues
History
#1 Updated by Zheng Yan about 6 years ago
class C_SaferCond : public Context {
  Mutex lock;    ///< Mutex to take
  Cond cond;     ///< Cond to signal
  bool done;     ///< true after finish() has been called
  int rval;      ///< return value
public:
  C_SaferCond() : lock("C_SaferCond"), done(false), rval(0) {}
  void finish(int r) override { complete(r); }

  /// We overload complete in order to not delete the context
  void complete(int r) override {
    Mutex::Locker l(lock);
    done = true;
    rval = r;
    cond.Signal();
  }

  /// Returns rval once the Context is called
  int wait() {
    Mutex::Locker l(lock);
    while (!done)
      cond.Wait(lock);
    return rval;
  }
};
I don't understand why it waits forever. The variable 'done' is already true if the scrub completes before wait() is called, so wait() should return immediately.
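The point about 'done' can be checked with a small standalone sketch. This is an approximation using std::mutex and std::condition_variable in place of Ceph's Mutex and Cond; `SaferCondSketch` and `complete_then_wait` are hypothetical names, and the class drops the Context base, so it only mirrors the flag-and-wait logic quoted above.

```cpp
#include <condition_variable>
#include <mutex>

// Simplified analogue of the C_SaferCond logic: a one-shot latch.
class SaferCondSketch {
  std::mutex lock;
  std::condition_variable cond;
  bool done = false;  // true after complete() has been called
  int rval = 0;       // value passed to complete()

public:
  void complete(int r) {
    std::lock_guard<std::mutex> l(lock);
    done = true;
    rval = r;
    cond.notify_one();
  }

  // Returns rval once complete() has been called, whether that
  // happened before or after wait() started.
  int wait() {
    std::unique_lock<std::mutex> l(lock);
    cond.wait(l, [this] { return done; });
    return rval;
  }
};

// Completes first, then waits: the predicate is already true, so
// wait() returns without blocking.
int complete_then_wait(int r) {
  SaferCondSketch c;
  c.complete(r);
  return c.wait();
}
```

With this structure a completion that arrives early is remembered in `done`, so a lost wakeup alone would not explain a hang.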
#2 Updated by Patrick Donnelly about 6 years ago
Hrm, I missed that bit of logic. Yes, I don't know why it waits forever either.
#3 Updated by Zheng Yan about 6 years ago
- Status changed from New to Need More Info
No log available; waiting for it to happen again.
#4 Updated by Zheng Yan about 6 years ago
- Status changed from Need More Info to Fix Under Review
- Backport deleted (luminous)
#5 Updated by Patrick Donnelly about 6 years ago
- Status changed from Fix Under Review to Resolved
#6 Updated by Patrick Donnelly almost 6 years ago
- Status changed from Resolved to Pending Backport
- Backport set to luminous
Needs backport as the bug will be introduced by:
#7 Updated by Nathan Cutler almost 6 years ago
- Copied to Backport #22907: luminous: mds: admin socket wait for scrub completion is racy added
#8 Updated by Nathan Cutler almost 6 years ago
- Status changed from Pending Backport to Resolved