Bug #1850

closed

mds sometimes crashes removing trees with plenty of hardlinks

Added by Alexandre Oliva over 12 years ago. Updated over 12 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%


Description

To reproduce: run rsync -aH /usr/share/zoneinfo/ /mnt/ceph-fuse/subdir/ (the -H option and a /usr/share/zoneinfo with plenty of hardlinks are essential to trigger the bug), let the mds stabilize, and then (perhaps after restarting the mdses or re-mounting; I'm not sure whether that is needed) run rm -rf /mnt/ceph-fuse/subdir. Odds are high that the mds will fail assert(anchor_map.count(ino)) within AnchorServer::dec(inodeno_t) and die. Upon restart it might be lucky enough to recover (not sure about that), but more likely it will rejoin, become active, and shortly thereafter process deferred messages that fail assert(anchor_map.count(curino) == 1) within AnchorServer::handle_query(MMDSTableRequest*).
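
Since the reproduction depends on the source tree having many hard links, it may help to check that before copying. A minimal sketch, assuming a POSIX shell and find; the function name count_hardlinked_files is made up for illustration and is not part of ceph or rsync:

```shell
# Hypothetical helper: print how many regular files under a directory
# have a link count greater than one (i.e. are hard-linked somewhere).
# Assumes file names contain no newlines, since the count is by line.
count_hardlinked_files() {
    find "$1" -type f -links +1 | wc -l | tr -d ' '
}
```

For example, count_hardlinked_files /usr/share/zoneinfo should print a clearly non-zero number on a system where the tree is suitable for triggering the bug. On the Ceph side, running the same function on /mnt/ceph-fuse/subdir after the rsync is one way to confirm that -H actually preserved the links.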

Removing only one link at a time seems to avoid this failure mode. My wild guess is that the mds then has enough time to move the inode whose second hard link was removed back into its containing dir; if the next removal arrives before that operation is complete, some internal consistency assumption doesn't hold.
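
The one-link-at-a-time removal could be scripted roughly as follows. This is only a sketch of the workaround, assuming a POSIX shell; the function name remove_links_one_at_a_time and the pause between unlinks are assumptions, and the report does not say how long a pause (if any) is needed:

```shell
# Hypothetical workaround: unlink one directory entry at a time,
# pausing between unlinks, instead of a single rm -rf.
# $1 = tree to remove, $2 = pause in whole seconds (default 1, a guess).
# Assumes file names contain no newlines.
remove_links_one_at_a_time() {
    dir=$1
    delay=${2:-1}
    # Unlink every non-directory entry, one rm invocation per name.
    find "$dir" -depth ! -type d -print | while IFS= read -r path; do
        rm -f -- "$path"
        sleep "$delay"
    done
    # Remove the now-empty directories, deepest first.
    find "$dir" -depth -type d -exec rmdir {} \;
}
```

For example, remove_links_one_at_a_time /mnt/ceph-fuse/subdir 1 would replace the rm -rf step of the reproduction. Whether this reliably avoids the assertion is untested here; it only mirrors the observation above.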
