Bug #2494
mds: Cannot remove directory despite it being empty.
Status: Closed
Description
Running ceph 0.47.1 on top of XFS, I've got at least two directories which used to contain files but are now empty and cannot be removed. For example (where /c is my ceph mount):
-bash-3.2$ rmdir /c/users/sbohrer/fio/o5007
rmdir: /c/users/sbohrer/fio/o5007: Directory not empty
-bash-3.2$ ls -al /c/users/sbohrer/fio/o5007/
total 1
drwxr-xr-x 1 sbohrer hbi 18446744073276227584 May 30 16:12 .
drwxr-xr-x 1 sbohrer hbi          30942429184 May 30 16:09 ..
Looking at the mon.a log, I see numerous messages similar to:
2012-05-30 12:18:15.069905 mds.0 192.168.50.194:6800/26686 1 : [ERR] loaded dup inode 1000001aa48 [2,head] v44262 at /users/sbohrer/fio/o5007/writer.1.106, but inode 1000001aa48.head v1630929 already exists at ~mds0/stray8/1000001aa48
2012-05-30 12:18:15.069947 mds.0 192.168.50.194:6800/26686 2 : [ERR] loaded dup inode 1000001aadf [2,head] v44408 at /users/sbohrer/fio/o5007/writer.1.112, but inode 1000001aadf.head v1631159 already exists at ~mds0/stray8/1000001aadf
...
2012-05-30 12:22:26.206209 mds.4109 192.168.50.195:6800/27459 1 : [WRN] replayed op client.4104:70041,70040 used ino 10000035751 but session next is 1000000000d
2012-05-30 12:22:26.206277 mds.4109 192.168.50.195:6800/27459 2 : [WRN] replayed op client.4104:70042,70040 used ino 1000003576f but session next is 1000000000d
See the attached mon.a.log for all of the ERR/WRN messages. Additionally, /var/log/ceph/ceph-mds.a.log is enormous (54,205,911 lines!), but I've attached a massively trimmed-down version (ceph-mds.a.log.gz) that I think has the relevant messages, starting at the time of the problem. It appears that the majority of the log is similar lines repeated over and over.
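The core symptom above is a directory whose listing shows no entries, yet rmdir still fails with "Directory not empty" because the MDS's view disagrees with readdir. A minimal sketch of a check for that mismatch, assuming a POSIX mount of the filesystem; `try_rmdir` is a hypothetical diagnostic helper, not part of Ceph:

```python
import errno
import os

def try_rmdir(path):
    """Attempt rmdir and classify the outcome.

    Distinguishes a genuinely non-empty directory from the
    'looks empty to readdir but rmdir returns ENOTEMPTY'
    inconsistency described in this bug.
    """
    entries = os.listdir(path)  # what readdir shows the client
    try:
        os.rmdir(path)
        return "removed"
    except OSError as e:
        if e.errno == errno.ENOTEMPTY and not entries:
            # readdir says empty, but the filesystem still counts
            # entries (e.g. stray/dup inodes on the MDS)
            return "empty-but-not-removable"
        raise
```

On a healthy filesystem an empty directory simply reports "removed"; the "empty-but-not-removable" result is the signature worth reporting to the MDS logs.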
Updated by Sage Weil almost 12 years ago
- Subject changed from Cannot remove directory despite it being empty. to mds: Cannot remove directory despite it being empty.
- Category set to 1
- Status changed from New to 12
Updated by Anonymous almost 12 years ago
Note that this was triggered frequently by backuppc runs:
http://thread.gmane.org/gmane.comp.file-systems.ceph.devel/6815/focus=6820
Updated by Sage Weil over 11 years ago
- Project changed from Ceph to CephFS
- Category deleted (1)
Updated by Greg Farnum over 11 years ago
- Status changed from 12 to Can't reproduce
The dupe inode suggests this is the problem fixed by Yan's tmap fixes.
Updated by David Galloway almost 7 years ago
- Status changed from Can't reproduce to 12
I'm observing this on our internal cluster. Attempting to remove the empty dir /ceph/teuthology-archive/teuthology-2016-12-11_04\:20\:38-upgrade\:jewel-x-master-distro-basic-vps/624946/remote/vpm169/log fails.
dgalloway@teuthology:~$ sudo ls -lah /home/teuthworker/archive/teuthology-2016-12-11_04\:20\:38-upgrade\:jewel-x-master-distro-basic-vps/624946/remote/vpm169/log
total 0
drwxrwxr-x 1 teuthworker teuthworker 16E Jan 17 01:07 .
drwxrwxr-x 1 teuthworker teuthworker   1 Jan 17 01:06 ..
dgalloway@teuthology:~$ sudo ls -lah /home/teuthworker/archive/teuthology-2016-12-11_04\:20\:38-upgrade\:jewel-x-master-distro-basic-vps/624946/remote/vpm169/
total 0
drwxrwxr-x 1 teuthworker teuthworker   1 Jan 17 01:06 .
drwxrwxr-x 1 teuthworker teuthworker   1 May 25 15:57 ..
drwxrwxr-x 1 teuthworker teuthworker 16E Jan 17 01:07 log
dgalloway@teuthology:~$ sudo rmdir /home/teuthworker/archive/teuthology-2016-12-11_04\:20\:38-upgrade\:jewel-x-master-distro-basic-vps/624946/remote/vpm169/log
rmdir: failed to remove '/home/teuthworker/archive/teuthology-2016-12-11_04:20:38-upgrade:jewel-x-master-distro-basic-vps/624946/remote/vpm169/log': Directory not empty
dgalloway@teuthology:~$ sudo rm -rf /home/teuthworker/archive/teuthology-2016-12-11_04\:20\:38-upgrade\:jewel-x-master-distro-basic-vps/624946/remote/vpm169/log
rm: cannot remove '/home/teuthworker/archive/teuthology-2016-12-11_04:20:38-upgrade:jewel-x-master-distro-basic-vps/624946/remote/vpm169/log': Directory not empty
I don't know how useful this is, but here are entries from the mds log when I attempt to rm it:
2017-05-25 09:05:56.141707 7fc02ea70700 4 mds.0.server handle_client_request client_request(client.27844205:70654081 getattr pAs #1001f5b79d0/log 2017-05-25 09:05:56.131631 caller_uid=0, caller_gid=0{}) v2
2017-05-25 09:05:56.142339 7fc02ea70700 4 mds.0.server handle_client_request client_request(client.27844205:70654082 rmdir #1001f5b79d0/log 2017-05-25 09:05:56.131631 caller_uid=0, caller_gid=0{}) v2
Here's another example, taken from /home/teuthworker/prune.log.dgalloway:
2017-05-22 18:47:19,408.408 ERROR:teuthology.prune:Failed to remove /home/teuthworker/archive/teuthology-2016-12-15_11:30:02-rados-kraken-distro-basic-smithi/638596/remote !
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/prune.py", line 110, in remove
    shutil.rmtree(path)
  File "/usr/lib/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.7/shutil.py", line 256, in rmtree
    onerror(os.rmdir, path, sys.exc_info())
  File "/usr/lib/python2.7/shutil.py", line 254, in rmtree
    os.rmdir(path)
OSError: [Errno 39] Directory not empty: '/home/teuthworker/archive/teuthology-2016-12-15_11:30:02-rados-kraken-distro-basic-smithi/638596/remote/smithi015/log'
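The traceback shows shutil.rmtree aborting the whole prune run on the first ENOTEMPTY. One way a prune script could record such stuck directories and keep going is to pass an error handler to rmtree. This is a hedged sketch, not the actual teuthology code; `rmtree_skip_stuck` is a hypothetical helper:

```python
import errno
import os
import shutil

def rmtree_skip_stuck(path):
    """rmtree variant that collects 'Directory not empty' failures
    instead of aborting the whole run (hypothetical helper).

    Returns the list of paths that could not be removed, so they
    can be logged or set aside for manual repair later.
    """
    stuck = []

    def onerror(func, p, exc_info):
        err = exc_info[1]
        if func is os.rmdir and getattr(err, "errno", None) == errno.ENOTEMPTY:
            # Leave this directory behind for later inspection/repair
            stuck.append(p)
        else:
            raise err

    shutil.rmtree(path, onerror=onerror)
    return stuck
```

Note that when a subdirectory is stuck, its ancestors will also end up in the list (they remain non-empty), so deduplicating to the deepest paths may be worthwhile in practice.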
Updated by David Galloway almost 7 years ago
I've moved these dirs to /ceph/debug-2494 on the Sepia LRC so our prune script will exit cleanly.
Updated by Zheng Yan almost 7 years ago
David Galloway wrote:
I've moved these dirs to /ceph/debug-2494 on the Sepia LRC so our prune script will exit cleanly.
I fixed the undeletable directories with "ceph daemon mds.mira060 scrub_path /debug-2494 repair recursive force" followed by rm -rf /ceph/debug-2494/.
Updated by David Galloway almost 7 years ago
Zheng Yan wrote:
David Galloway wrote:
I've moved these dirs to /ceph/debug-2494 on the Sepia LRC so our prune script will exit cleanly.

I fixed the undeletable directories with "ceph daemon mds.mira060 scrub_path /debug-2494 repair recursive force" followed by rm -rf /ceph/debug-2494/.
That's great, but from a usability perspective, how would I have known that's what I should've run? Didn't a bug cause the dir to get into that state in the first place?