Bug #4753
closed
mds/Locker.cc: 4167: FAILED assert(0)
Description
Every MDS crashed after some startup checks with "mds/Locker.cc: 4167: FAILED assert(0)":
mds/Locker.cc: 4167: FAILED assert(0)
ceph version 0.60 (f26f7a39021dbf440c28d6375222e21c94fe8e5c)
1: (Locker::scatter_mix(ScatterLock*, bool*)+0x1cb) [0x6303fb]
2: (Locker::file_eval(ScatterLock*, bool*)+0x6e5) [0x6346b5]
3: (Locker::eval(CInode*, int, bool)+0x8d5) [0x63cd75]
4: (MDCache::reissue_all_caps()+0x333) [0x58b8b3]
5: (MDS::recovery_done()+0xca) [0x4c1a0a]
6: (MDS::handle_mds_map(MMDSMap*)+0x29ab) [0x4d372b]
7: (MDS::handle_core_message(Message*)+0xb93) [0x4d4873]
8: (MDS::_dispatch(Message*)+0x36) [0x4d4a46]
9: (MDS::ms_dispatch(Message*)+0x19b) [0x4d6a3b]
10: (DispatchQueue::entry()+0x319) [0x8b56b9]
11: (DispatchQueue::DispatchThread::entry()+0xd) [0x80fd1d]
12: (()+0x84d8) [0x7fc0dcacd4d8]
13: (clone()+0x6d) [0x7fc0dafc0e8d]
Full level 10 log here: http://mahatma.bspu.unibel.by/download/transit/ceph-mds.4.log.gz
Updated by Greg Farnum about 11 years ago
file_eval is trying to move ifile from "scan" to "mixed" in order to serve up the client caps, and scatter_mix doesn't think that's a valid transition. scatter_mix is probably correct, since we need to determine an inode's state before doing other things with it?
Updated by Sage Weil about 11 years ago
Yeah, that transition doesn't make sense. I think it should do nothing in the scan state.
Updated by Greg Farnum about 11 years ago
- Priority changed from Normal to High
You mean file_eval should just short-circuit if it's scanning? That seems like the most sensible place for it, but I'm not quite sure, and there are a few other states and transitions around it so I'd need to spend some time diving in to figure it out.
However, this is a single-MDS issue on recovery which doesn't go away with a reboot, so High it is!
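For illustration, the short-circuit idea discussed above might look roughly like the toy model below. This is only a sketch and not the actual Ceph code or the eventual fix: LockState, ToyLock, toy_scatter_mix and toy_file_eval are hypothetical names, the set of states is heavily simplified, and the transition rejects a bad source state by throwing rather than calling assert(0) so the failure can be observed.

```cpp
#include <stdexcept>

// Toy model only: hypothetical names, not Ceph's real lock types or states.
enum class LockState { Sync, Lock, Mix, Scan };

struct ToyLock {
    LockState state;
};

// Stand-in for a transition function that only accepts some source states.
// The real code hits assert(0) here; this sketch throws instead.
void toy_scatter_mix(ToyLock& lock) {
    switch (lock.state) {
    case LockState::Sync:
    case LockState::Lock:
        lock.state = LockState::Mix;  // valid source states move to Mix
        return;
    case LockState::Mix:
        return;                       // already in the target state
    case LockState::Scan:
        throw std::logic_error("invalid transition: Scan -> Mix");
    }
}

// Stand-in for the evaluation step with the proposed guard: while the lock
// is still in Scan, do nothing and leave the state untouched.
// Returns true only if a transition to Mix was attempted.
bool toy_file_eval(ToyLock& lock, bool wants_mix) {
    if (lock.state == LockState::Scan) {
        return false;  // short-circuit: inode state is not known yet
    }
    if (wants_mix) {
        toy_scatter_mix(lock);
        return true;
    }
    return false;
}
```

With the guard in place, calling toy_file_eval on a lock in Scan leaves it in Scan, while calling toy_scatter_mix directly on the same lock still fails, which is the situation the backtrace shows. Whether the guard belongs in the evaluation step or elsewhere is exactly the open question in the comment above.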
Updated by Denis kaganovich about 11 years ago
Additional info: I worked around it at runtime by replacing the assert(0) with some lock (IMHO the first one in this case) on one node, and found a forgotten ctdbd process (trying to lock its own file) on one of the machines. I killed it and restarted all MDSes with the assert(0) back in place; now all is good.
Updated by Greg Farnum about 11 years ago
- Project changed from Ceph to CephFS
- Category set to 47
Updated by Sage Weil almost 11 years ago
- Status changed from New to Resolved
fixed this in commit:482733e9603e47a3a427b17bfb9b9189dacd5109