Bug #2494
mds: Cannot remove directory despite it being empty.
Status: Closed
Description
Running ceph 0.47.1 on top of XFS, I've got at least two directories which used to contain files but are now empty and cannot be removed. For example (where /c is my ceph mount):
-bash-3.2$ rmdir /c/users/sbohrer/fio/o5007
rmdir: /c/users/sbohrer/fio/o5007: Directory not empty
-bash-3.2$ ls -al /c/users/sbohrer/fio/o5007/
total 1
drwxr-xr-x 1 sbohrer hbi 18446744073276227584 May 30 16:12 .
drwxr-xr-x 1 sbohrer hbi          30942429184 May 30 16:09 ..
Looking at the mon.a log, I see numerous messages similar to:
2012-05-30 12:18:15.069905 mds.0 192.168.50.194:6800/26686 1 : [ERR] loaded dup inode 1000001aa48 [2,head] v44262 at /users/sbohrer/fio/o5007/writer.1.106, but inode 1000001aa48.head v1630929 already exists at ~mds0/stray8/1000001aa48
2012-05-30 12:18:15.069947 mds.0 192.168.50.194:6800/26686 2 : [ERR] loaded dup inode 1000001aadf [2,head] v44408 at /users/sbohrer/fio/o5007/writer.1.112, but inode 1000001aadf.head v1631159 already exists at ~mds0/stray8/1000001aadf
...
2012-05-30 12:22:26.206209 mds.4109 192.168.50.195:6800/27459 1 : [WRN] replayed op client.4104:70041,70040 used ino 10000035751 but session next is 1000000000d
2012-05-30 12:22:26.206277 mds.4109 192.168.50.195:6800/27459 2 : [WRN] replayed op client.4104:70042,70040 used ino 1000003576f but session next is 1000000000d
See the attached mon.a.log for all of the ERR/WRN messages. Additionally, /var/log/ceph/ceph-mds.a.log is enormous (54,205,911 lines!), but I've attached a massively trimmed-down version (ceph-mds.a.log.gz) that I think has the relevant messages, starting at the time of the problem. It appears that the majority of the log is similar lines repeated over and over.
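The core symptom above is a directory whose listing shows no entries, yet rmdir still fails with "Directory not empty" because the MDS's view disagrees with readdir. A minimal sketch of a check for that mismatch, assuming a POSIX mount of the filesystem; `try_rmdir` is a hypothetical diagnostic helper, not part of Ceph:

```python
import errno
import os

def try_rmdir(path):
    """Attempt rmdir and classify the outcome.

    Distinguishes a genuinely non-empty directory from the
    'looks empty to readdir but rmdir returns ENOTEMPTY'
    inconsistency described in this bug.
    """
    entries = os.listdir(path)  # what readdir shows the client
    try:
        os.rmdir(path)
        return "removed"
    except OSError as e:
        if e.errno == errno.ENOTEMPTY and not entries:
            # readdir says empty, but the filesystem still counts
            # entries (e.g. stray/dup inodes on the MDS)
            return "empty-but-not-removable"
        raise
```

On a healthy filesystem an empty directory simply reports "removed"; the "empty-but-not-removable" result is the signature worth reporting to the MDS logs.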
Updated by Sage Weil almost 12 years ago
- Subject changed from Cannot remove directory despite it being empty. to mds: Cannot remove directory despite it being empty.
- Category set to 1
- Status changed from New to 12
Updated by Anonymous almost 12 years ago
Note that this was triggered frequently by backuppc runs:
http://thread.gmane.org/gmane.comp.file-systems.ceph.devel/6815/focus=6820
Updated by Sage Weil over 11 years ago
- Project changed from Ceph to CephFS
- Category deleted (1)
Updated by Greg Farnum over 11 years ago
- Status changed from 12 to Can't reproduce
The dupe inode suggests this is the problem fixed by Yan's tmap fixes.
Updated by David Galloway almost 7 years ago
- Status changed from Can't reproduce to 12
I'm observing this on our internal cluster. Attempting to remove the empty dir /ceph/teuthology-archive/teuthology-2016-12-11_04\:20\:38-upgrade\:jewel-x-master-distro-basic-vps/624946/remote/vpm169/log fails.
dgalloway@teuthology:~$ sudo ls -lah /home/teuthworker/archive/teuthology-2016-12-11_04\:20\:38-upgrade\:jewel-x-master-distro-basic-vps/624946/remote/vpm169/log
total 0
drwxrwxr-x 1 teuthworker teuthworker 16E Jan 17 01:07 .
drwxrwxr-x 1 teuthworker teuthworker   1 Jan 17 01:06 ..
dgalloway@teuthology:~$ sudo ls -lah /home/teuthworker/archive/teuthology-2016-12-11_04\:20\:38-upgrade\:jewel-x-master-distro-basic-vps/624946/remote/vpm169/
total 0
drwxrwxr-x 1 teuthworker teuthworker   1 Jan 17 01:06 .
drwxrwxr-x 1 teuthworker teuthworker   1 May 25 15:57 ..
drwxrwxr-x 1 teuthworker teuthworker 16E Jan 17 01:07 log
dgalloway@teuthology:~$ sudo rmdir /home/teuthworker/archive/teuthology-2016-12-11_04\:20\:38-upgrade\:jewel-x-master-distro-basic-vps/624946/remote/vpm169/log
rmdir: failed to remove '/home/teuthworker/archive/teuthology-2016-12-11_04:20:38-upgrade:jewel-x-master-distro-basic-vps/624946/remote/vpm169/log': Directory not empty
dgalloway@teuthology:~$ sudo rm -rf /home/teuthworker/archive/teuthology-2016-12-11_04\:20\:38-upgrade\:jewel-x-master-distro-basic-vps/624946/remote/vpm169/log
rm: cannot remove '/home/teuthworker/archive/teuthology-2016-12-11_04:20:38-upgrade:jewel-x-master-distro-basic-vps/624946/remote/vpm169/log': Directory not empty
I don't know how useful this is, but here are entries from the mds log when I attempt to rm it:
2017-05-25 09:05:56.141707 7fc02ea70700 4 mds.0.server handle_client_request client_request(client.27844205:70654081 getattr pAs #1001f5b79d0/log 2017-05-25 09:05:56.131631 caller_uid=0, caller_gid=0{}) v2
2017-05-25 09:05:56.142339 7fc02ea70700 4 mds.0.server handle_client_request client_request(client.27844205:70654082 rmdir #1001f5b79d0/log 2017-05-25 09:05:56.131631 caller_uid=0, caller_gid=0{}) v2
Here's another example, taken from /home/teuthworker/prune.log.dgalloway:
2017-05-22 18:47:19,408.408 ERROR:teuthology.prune:Failed to remove /home/teuthworker/archive/teuthology-2016-12-15_11:30:02-rados-kraken-distro-basic-smithi/638596/remote !
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/prune.py", line 110, in remove
    shutil.rmtree(path)
  File "/usr/lib/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.7/shutil.py", line 256, in rmtree
    onerror(os.rmdir, path, sys.exc_info())
  File "/usr/lib/python2.7/shutil.py", line 254, in rmtree
    os.rmdir(path)
OSError: [Errno 39] Directory not empty: '/home/teuthworker/archive/teuthology-2016-12-15_11:30:02-rados-kraken-distro-basic-smithi/638596/remote/smithi015/log'
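The traceback shows shutil.rmtree aborting the whole prune run on the first ENOTEMPTY. One way a prune script could record such stuck directories and keep going is to pass an error handler to rmtree. This is a hedged sketch, not the actual teuthology code; `rmtree_skip_stuck` is a hypothetical helper:

```python
import errno
import os
import shutil

def rmtree_skip_stuck(path):
    """rmtree variant that collects 'Directory not empty' failures
    instead of aborting the whole run (hypothetical helper).

    Returns the list of paths that could not be removed, so they
    can be logged or set aside for manual repair later.
    """
    stuck = []

    def onerror(func, p, exc_info):
        err = exc_info[1]
        if func is os.rmdir and getattr(err, "errno", None) == errno.ENOTEMPTY:
            # Leave this directory behind for later inspection/repair
            stuck.append(p)
        else:
            raise err

    shutil.rmtree(path, onerror=onerror)
    return stuck
```

Note that when a subdirectory is stuck, its ancestors will also end up in the list (they remain non-empty), so deduplicating to the deepest paths may be worthwhile in practice.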
Updated by David Galloway almost 7 years ago
I've moved these dirs to /ceph/debug-2494 on the Sepia LRC so our prune script will exit cleanly.
Updated by Zheng Yan almost 7 years ago
David Galloway wrote:
I've moved these dirs to /ceph/debug-2494 on the Sepia LRC so our prune script will exit cleanly.
I fixed the undeletable directories with "ceph daemon mds.mira060 scrub_path /debug-2494 repair recursive force" followed by rm -rf /ceph/debug-2494/.
Updated by David Galloway almost 7 years ago
Zheng Yan wrote:
David Galloway wrote:
I've moved these dirs to /ceph/debug-2494 on the Sepia LRC so our prune script will exit cleanly.

I fixed the undeletable directories with "ceph daemon mds.mira060 scrub_path /debug-2494 repair recursive force" followed by rm -rf /ceph/debug-2494/.
That's great, but from a usability perspective, how would I have known that's what I should've run? Didn't a bug cause the dir to get into that state in the first place?