Bug #1850
mds sometimes crashes removing trees with plenty of hardlinks
Description
To reproduce: run rsync -aH /usr/share/zoneinfo/ /mnt/ceph-fuse/subdir/ (the -H flag and a hardlink-heavy /usr/share/zoneinfo are both essential to trigger the bug), let the mds stabilize, and then (perhaps after restarting the mdses and re-mounting; I'm not sure whether that matters) run rm -rf /mnt/ceph-fuse/subdir. Odds are high that the mds will fail assert(anchor_map.count(ino)) in AnchorServer::dec(inodeno_t) and die. After a restart it may be lucky enough to recover (I'm not sure about that), but more likely it will rejoin, become active, and shortly thereafter process deferred messages that fail assert(anchor_map.count(curino) == 1) in AnchorServer::handle_query(MMDSTableRequest*).
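The steps above can be wrapped in a small script. This is only a sketch: the function names, the SETTLE_SECS variable, and its 60-second default are assumptions for illustration, since the report does not say how long the mds needs to stabilize. The count_hardlinked helper is there to confirm that the source tree really contains multiply-linked files before copying.

```shell
#!/bin/sh
# Count regular files under a directory that have more than one hard link.
# (Hypothetical helper; a large count means the tree is "hardlink-plentiful".)
count_hardlinked() {
    find "$1" -type f -links +1 | wc -l | tr -d ' '
}

# Reproduction steps from the report, wrapped in a function.
# SETTLE_SECS is an assumed wait; the report only says
# "let the mds stabilize" and gives no duration.
reproduce() {
    src=${1:-/usr/share/zoneinfo/}
    dst=${2:-/mnt/ceph-fuse/subdir/}
    settle=${SETTLE_SECS:-60}

    echo "hardlinked files in source: $(count_hardlinked "$src")"
    rsync -aH "$src" "$dst" || return 1   # -H preserves hard links
    sleep "$settle"                       # assumed settle time
    rm -rf "$dst"                         # the step that triggers the assert
}
```

Nothing runs until reproduce is called, for example with SETTLE_SECS=120 reproduce /usr/share/zoneinfo/ /mnt/ceph-fuse/subdir/. The optional mds restart and re-mount are left out because the report is unsure whether they are needed.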
Removing only one link at a time seems to avoid this failure mode. My wild guess is that the mds then has enough time to move the inode whose second hard link was removed back into its containing dir; if the next removal arrives before that operation completes, some internal consistency assumption doesn't hold.
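One reading of "one link at a time" is to unlink each file separately with a pause in between. The sketch below does that; the function name and the one-second default delay are assumptions, not something the report specifies, and it has not been confirmed as a reliable workaround.

```shell
#!/bin/sh
# Remove a tree by unlinking one non-directory entry at a time,
# pausing between unlinks, then removing the emptied directories.
# The delay (default 1 second) is an assumed value.
remove_one_at_a_time() {
    dir=$1
    delay=${2:-1}

    # Unlink every entry that is not a directory, one per iteration.
    find "$dir" ! -type d | while IFS= read -r f; do
        rm -f -- "$f"
        sleep "$delay"
    done

    # -depth visits children before parents, so each rmdir sees an empty dir.
    find "$dir" -depth -type d -exec rmdir {} \;
}
```

For example, remove_one_at_a_time /mnt/ceph-fuse/subdir 2 would wait two seconds between unlinks. File names containing newlines are not handled.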
History
#1 Updated by Greg Farnum almost 12 years ago
- Status changed from New to Duplicate
I'm pretty sure you're looking at #1047 here. :)