Bug #38263
closed
mds: fix potential re-evaluate stray dentry in _unlink_local_finish
0%
Description
```
2019-01-25 10:08:13.917522 7f882dcca700 1 /data/build_ceph/ceph-build-luminous/BUILD/ceph-12.2.8-217-gaf1d23f093/src/mds/StrayManager.cc: In function 'bool StrayManager::_eval_stray(CDentry*, bool)' thread 7f882dcca700 time 2019-01-25 10:08:13.915560
/data/build_ceph/ceph-build-luminous/BUILD/ceph-12.2.8-217-gaf1d23f093/src/mds/StrayManager.cc: 421: FAILED assert(!dn->state_test(CDentry::STATE_PURGING))
ceph version 12.2.8-217-gaf1d23f093 (af1d23f093441e0fb7550afff43153bd0bb09e3c) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7f883d211f00]
2: (StrayManager::_eval_stray(CDentry*, bool)+0xd13) [0x7f883d045553]
3: (StrayManager::eval_stray(CDentry*, bool)+0x1e) [0x7f883d04565e]
4: (Server::_unlink_local_finish(boost::intrusive_ptr<MDRequestImpl>&, CDentry*, CDentry*, unsigned long)+0x393) [0x7f883cf31273]
5: (MDSIOContextBase::complete(int)+0xa4) [0x7f883d15ac44]
6: (MDSLogContextBase::complete(int)+0x3f) [0x7f883d15b06f]
7: (Finisher::finisher_thread_entry()+0x198) [0x7f883d210e08]
8: (()+0x7dc5) [0x7f883acefdc5]
9: (clone()+0x6d) [0x7f8839dd574d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
```
This crash occurred on an MDS under very heavy load. The root cause appears to be the following sequence:
1. The MDS, in handle_client_unlink, sends an early reply to the client.
2. The client processes the reply quickly and sends a cap release to the MDS.
3. The MDS processes handle_client_cap_release before _unlink_local_finish.
4. The MDS processes _unlink_local_finish:
4.1 respond_to_request drops locks and decrements the reference count, which triggers eval_stray for the first time.
4.2 notify_stray is then called and enters eval_stray a second time, at which point the assertion fails and the MDS crashes.
Normally _unlink_local_finish is processed before handle_client_cap_release, so eval_stray is called only from handle_client_cap_release.
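The ordering described above can be illustrated with a small toy model. This is only a sketch in Python, not Ceph's C++ code: the `StrayDentry` class, the two-reference starting state, and the `guard` flag are all assumptions made for illustration, and the guard is a hypothetical mitigation, not necessarily what the pull request changes.

```python
class StrayDentry:
    """Toy stand-in for a stray CDentry: a reference count plus a PURGING flag."""

    def __init__(self, refs):
        self.refs = refs
        self.purging = False


def eval_stray(dn):
    """Toy analogue of the failing check: evaluating a dentry that is
    already being purged is treated as a fatal error."""
    if dn.purging:
        raise AssertionError("stray dentry evaluated while already purging")
    if dn.refs == 0:
        dn.purging = True  # nothing references the dentry, so start purging
        return True
    return False  # still referenced: the evaluation does nothing


def put_ref(dn):
    """Drop one reference; dropping the last one triggers an evaluation."""
    dn.refs -= 1
    if dn.refs == 0:
        eval_stray(dn)


def unlink_local_finish(dn, guard=False):
    put_ref(dn)  # step 4.1: respond_to_request drops the request's reference
    if guard and dn.purging:
        return  # hypothetical guard: skip the second evaluation
    eval_stray(dn)  # step 4.2: notify_stray evaluates the dentry again


def run(cap_release_first, guard=False):
    # Assumed starting state: one reference held for client caps,
    # one held by the in-flight unlink request.
    dn = StrayDentry(refs=2)
    if cap_release_first:
        put_ref(dn)  # step 3: cap release handled before the finish callback
        unlink_local_finish(dn, guard)
    else:
        unlink_local_finish(dn, guard)
        put_ref(dn)  # normal order: cap release arrives last
    return dn.purging
```

In this model, `run(cap_release_first=False)` purges the dentry exactly once, when the cap release drops the last reference. `run(cap_release_first=True)` raises `AssertionError`, because step 4.1 already started the purge before step 4.2 evaluates the dentry again. Passing `guard=True` in the second case avoids the double evaluation.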
Updated by Patrick Donnelly about 5 years ago
- Status changed from New to Fix Under Review
- Assignee set to Zhi Zhang
- Target version set to v14.0.0
- Start date deleted (02/12/2019)
- Backport set to mimic,luminous
- Pull request ID set to 26374
Updated by Patrick Donnelly about 5 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Nathan Cutler about 5 years ago
- Copied to Backport #38335: mimic: mds: fix potential re-evaluate stray dentry in _unlink_local_finish added
Updated by Nathan Cutler about 5 years ago
- Copied to Backport #38336: luminous: mds: fix potential re-evaluate stray dentry in _unlink_local_finish added
Updated by Nathan Cutler about 5 years ago
- Status changed from Pending Backport to Resolved