Bug #2494

closed

mds: Cannot remove directory despite it being empty.

Added by Shawn Bohrer almost 12 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Running Ceph 0.47.1 on top of XFS, I have at least two directories that used to contain files but are now empty and cannot be removed. For example, where /c is my Ceph mount:

-bash-3.2$ rmdir /c/users/sbohrer/fio/o5007
rmdir: /c/users/sbohrer/fio/o5007: Directory not empty
-bash-3.2$ ls -al /c/users/sbohrer/fio/o5007/
total 1
drwxr-xr-x 1 sbohrer hbi 18446744073276227584 May 30 16:12 .
drwxr-xr-x 1 sbohrer hbi          30942429184 May 30 16:09 ..
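
The size shown for "." in the listing above is very close to 2^64. One plausible reading, which this thread does not confirm, is that a 64-bit byte counter for the directory went negative and is being printed as unsigned. A minimal sketch of that arithmetic, where the helper name as_signed_64 is made up for illustration:

```python
def as_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as two's-complement signed."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit in 64 bits")
    return value - 2**64 if value >= 2**63 else value


reported = 18446744073276227584  # size of "." in the ls output above
signed = as_signed_64(reported)
print(signed)          # -433324032
print(signed / 2**20)  # -413.25 (MiB)
```

Under that assumption, the directory's accounting would be about 413 MiB below zero, which fits a directory whose files are gone but whose statistics were not updated consistently.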

Looking at the mon.a log, I see numerous messages similar to:

2012-05-30 12:18:15.069905 mds.0 192.168.50.194:6800/26686 1 : [ERR] loaded dup inode 1000001aa48 [2,head] v44262 at /users/sbohrer/fio/o5007/writer.1.106, but inode 1000001aa48.head v1630929 already exists at ~mds0/stray8/1000001aa48
2012-05-30 12:18:15.069947 mds.0 192.168.50.194:6800/26686 2 : [ERR] loaded dup inode 1000001aadf [2,head] v44408 at /users/sbohrer/fio/o5007/writer.1.112, but inode 1000001aadf.head v1631159 already exists at ~mds0/stray8/1000001aadf
...
2012-05-30 12:22:26.206209 mds.4109 192.168.50.195:6800/27459 1 : [WRN]  replayed op client.4104:70041,70040 used ino 10000035751 but session next is 1000000000d
2012-05-30 12:22:26.206277 mds.4109 192.168.50.195:6800/27459 2 : [WRN]  replayed op client.4104:70042,70040 used ino 1000003576f but session next is 1000000000d

See the attached mon.a.log for all of the ERR/WRN messages. Additionally, /var/log/ceph/ceph-mds.a.log is enormous (54205911 lines!), but I've attached a heavily trimmed version, ceph-mds.a.log.gz, that I think contains the relevant messages starting at the time of the problem. The majority of the log appears to be similar lines repeated over and over.
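
Because the logs are large and repetitive, it may help to condense the "loaded dup inode" errors into a count per affected directory. A rough sketch, assuming every such line follows the same layout as the two [ERR] lines quoted above; the pattern and the function name summarize_dup_inodes are illustrative and not part of any Ceph tool:

```python
import re
from collections import Counter

# Pattern modeled on the two "[ERR] loaded dup inode" lines quoted above.
DUP_INODE = re.compile(
    r"\[ERR\] loaded dup inode (?P<ino>[0-9a-f]+) .*? at (?P<path>\S+), "
    r"but inode \S+ v\d+ already exists at (?P<existing>\S+)"
)


def summarize_dup_inodes(lines):
    """Count duplicate-inode errors per parent directory."""
    per_dir = Counter()
    for line in lines:
        match = DUP_INODE.search(line)
        if match:
            # Strip the file name to get the directory holding the entry.
            parent = match.group("path").rsplit("/", 1)[0]
            per_dir[parent] += 1
    return per_dir
```

Calling summarize_dup_inodes on an open handle to mon.a.log would return a Counter keyed by directory, so the directories that cannot be removed can be compared against those named in the errors.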


Files

mon.a.log (126 KB), Shawn Bohrer, 05/31/2012 02:40 PM
ceph-mds.a.log.gz (6.52 MB), Shawn Bohrer, 05/31/2012 02:40 PM
Actions #1

Updated by Sage Weil almost 12 years ago

  • Subject changed from Cannot remove directory despite it being empty. to mds: Cannot remove directory despite it being empty.
  • Category set to 1
  • Status changed from New to 12
Actions #2

Updated by Anonymous almost 12 years ago

Note that this was triggered frequently by backuppc runs:
http://thread.gmane.org/gmane.comp.file-systems.ceph.devel/6815/focus=6820

Actions #3

Updated by Sage Weil over 11 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)
Actions #4

Updated by Greg Farnum over 11 years ago

  • Status changed from 12 to Can't reproduce

The dupe inode suggests this is the problem fixed by Yan's tmap fixes.

Actions #5

Updated by David Galloway almost 7 years ago

  • Status changed from Can't reproduce to 12

I'm observing this on our internal cluster. Attempting to remove empty dir /ceph/teuthology-archive/teuthology-2016-12-11_04\:20\:38-upgrade\:jewel-x-master-distro-basic-vps/624946/remote/vpm169/log fails.

dgalloway@teuthology:~$ sudo ls -lah /home/teuthworker/archive/teuthology-2016-12-11_04\:20\:38-upgrade\:jewel-x-master-distro-basic-vps/624946/remote/vpm169/log
total 0
drwxrwxr-x 1 teuthworker teuthworker 16E Jan 17 01:07 .
drwxrwxr-x 1 teuthworker teuthworker   1 Jan 17 01:06 ..

dgalloway@teuthology:~$ sudo ls -lah /home/teuthworker/archive/teuthology-2016-12-11_04\:20\:38-upgrade\:jewel-x-master-distro-basic-vps/624946/remote/vpm169/
total 0
drwxrwxr-x 1 teuthworker teuthworker   1 Jan 17 01:06 .
drwxrwxr-x 1 teuthworker teuthworker   1 May 25 15:57 ..
drwxrwxr-x 1 teuthworker teuthworker 16E Jan 17 01:07 log

dgalloway@teuthology:~$ sudo rmdir /home/teuthworker/archive/teuthology-2016-12-11_04\:20\:38-upgrade\:jewel-x-master-distro-basic-vps/624946/remote/vpm169/log
rmdir: failed to remove '/home/teuthworker/archive/teuthology-2016-12-11_04:20:38-upgrade:jewel-x-master-distro-basic-vps/624946/remote/vpm169/log': Directory not empty

dgalloway@teuthology:~$ sudo rm -rf /home/teuthworker/archive/teuthology-2016-12-11_04\:20\:38-upgrade\:jewel-x-master-distro-basic-vps/624946/remote/vpm169/log
rm: cannot remove '/home/teuthworker/archive/teuthology-2016-12-11_04:20:38-upgrade:jewel-x-master-distro-basic-vps/624946/remote/vpm169/log': Directory not empty

I don't know how useful this is, but here are the entries from the MDS log when I attempt to rm it:

2017-05-25 09:05:56.141707 7fc02ea70700  4 mds.0.server handle_client_request client_request(client.27844205:70654081 getattr pAs #1001f5b79d0/log 2017-05-25 09:05:56.131631 caller_uid=0, caller_gid=0{}) v2
2017-05-25 09:05:56.142339 7fc02ea70700  4 mds.0.server handle_client_request client_request(client.27844205:70654082 rmdir #1001f5b79d0/log 2017-05-25 09:05:56.131631 caller_uid=0, caller_gid=0{}) v2

Here's another example, taken from /home/teuthworker/prune.log.dgalloway:

2017-05-22 18:47:19,408.408 ERROR:teuthology.prune:Failed to remove /home/teuthworker/archive/teuthology-2016-12-15_11:30:02-rados-kraken-distro-basic-smithi/638596/remote !
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/prune.py", line 110, in remove
    shutil.rmtree(path)
  File "/usr/lib/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.7/shutil.py", line 256, in rmtree
    onerror(os.rmdir, path, sys.exc_info())
  File "/usr/lib/python2.7/shutil.py", line 254, in rmtree
    os.rmdir(path)
OSError: [Errno 39] Directory not empty: '/home/teuthworker/archive/teuthology-2016-12-15_11:30:02-rados-kraken-distro-basic-smithi/638596/remote/smithi015/log'
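
The traceback shows shutil.rmtree failing on the os.rmdir call with ENOTEMPTY (Errno 39 on Linux). A minimal sketch of how a pruning loop could record such directories and continue instead of aborting; this is a hypothetical helper, not the code in teuthology's prune.py, and the name prune and its remove parameter are assumptions for illustration:

```python
import errno
import shutil


def prune(paths, remove=shutil.rmtree):
    """Remove each path; return those that failed with ENOTEMPTY.

    `remove` is injectable so the behavior can be exercised without a
    filesystem that exhibits the bug.
    """
    stuck = []
    for path in paths:
        try:
            remove(path)
        except OSError as exc:
            if exc.errno != errno.ENOTEMPTY:
                raise  # unrelated failure: surface it
            # exc.filename names the directory that rmdir() refused to delete
            stuck.append(exc.filename or path)
    return stuck
```

The returned list could then be logged or handed to whoever repairs the filesystem, while any other kind of error still stops the run.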
Actions #6

Updated by David Galloway almost 7 years ago

I've moved these dirs to /ceph/debug-2494 on the Sepia LRC so our prune script will exit cleanly.

Actions #7

Updated by Zheng Yan almost 7 years ago

David Galloway wrote:

I've moved these dirs to /ceph/debug-2494 on the Sepia LRC so our prune script will exit cleanly.

I fixed the undeletable directories by running "ceph daemon mds.mira060 scrub_path /debug-2494 repair recursive force", followed by rm -rf /ceph/debug-2494/.
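
For anyone scripting the same two steps, a small sketch follows. The command words are copied from the comment above; the MDS name mira060 and the paths are specific to that cluster, and the function names are invented here. It assumes the script runs on the host where that MDS daemon lives, since "ceph daemon" talks to a local admin socket, and that the scrub path is given relative to the CephFS root while the removal uses the mounted path.

```python
import shutil
import subprocess


def scrub_repair_argv(mds_name, fs_path):
    """Build the admin-socket command quoted in the comment above."""
    return ["ceph", "daemon", f"mds.{mds_name}",
            "scrub_path", fs_path, "repair", "recursive", "force"]


def repair_and_remove(mds_name, fs_path, mount_path,
                      run=subprocess.check_call, remove=shutil.rmtree):
    """Run the repair scrub, then delete the tree through the mount.

    `run` and `remove` are injectable so the sequence can be tested
    without a Ceph cluster.
    """
    run(scrub_repair_argv(mds_name, fs_path))
    remove(mount_path)
```

With the values from this thread, the call would be repair_and_remove("mira060", "/debug-2494", "/ceph/debug-2494/").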

Actions #8

Updated by Zheng Yan almost 7 years ago

  • Status changed from 12 to Resolved
Actions #9

Updated by David Galloway almost 7 years ago

Zheng Yan wrote:

David Galloway wrote:

I've moved these dirs to /ceph/debug-2494 on the Sepia LRC so our prune script will exit cleanly.

I fixed undeletable directories by "ceph daemon mds.mira060 scrub_path /debug-2494 repair recursive force" and rm -rf /ceph/debug-2494/

That's great but from a usability perspective, how would I have known that's what I should've run? Didn't a bug cause the dir to get in that state in the first place?
