Bug #17921

CephFS snapshot removal fails with "Stale file handle" error

Added by Ramakrishnan Periyasamy almost 4 years ago. Updated almost 4 years ago.

Status:
Rejected
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature:

Description

Removing a CephFS snapshot fails with a "Stale file handle" error when tested in a loop, even though no snapshot actually remains in the directory.

Error in teuthology log:
2016-11-16T08:49:17.283 INFO:teuthology.orchestra.run.magna079:Running: 'sudo adjust-ulimits daemon-helper kill python -c \'\nimport zlib\npath = "/home/ubuntu/cephtest/mnt.0/source/subdir/file_a"\nf = open(path, \'"\'"\'w\'"\'"\')\nfor i in range(0, 2097152):\n val = zlib.crc32("%s" % i) & 7\n f.write(chr(val))\nf.close()\n\''
2016-11-16T08:49:19.768 INFO:teuthology.orchestra.run.magna079:Running: 'cd /home/ubuntu/cephtest/mnt.0 && sudo mkdir source/.snap/snap6'
2016-11-16T08:49:19.918 INFO:teuthology.orchestra.run.magna079:Running: 'cd /home/ubuntu/cephtest/mnt.0 && sudo rsync -azvh source/.snap/snap6 rsyncdir/dir1/'
2016-11-16T08:49:20.233 INFO:teuthology.orchestra.run.magna079.stdout:sending incremental file list
2016-11-16T08:49:20.233 INFO:teuthology.orchestra.run.magna079.stdout:snap6/
2016-11-16T08:49:20.233 INFO:teuthology.orchestra.run.magna079.stdout:snap6/subdir/
2016-11-16T08:49:20.234 INFO:teuthology.orchestra.run.magna079.stdout:snap6/subdir/file_a
2016-11-16T08:49:20.234 INFO:teuthology.orchestra.run.magna079.stdout:
2016-11-16T08:49:20.234 INFO:teuthology.orchestra.run.magna079.stdout:sent 21.29K bytes received 39 bytes 14.22K bytes/sec
2016-11-16T08:49:20.235 INFO:teuthology.orchestra.run.magna079.stdout:total size is 2.10M speedup is 98.31
2016-11-16T08:49:20.235 INFO:teuthology.orchestra.run.magna079:Running: 'cd /home/ubuntu/cephtest/mnt.0 && sudo rmdir source/.snap/snap6'
2016-11-16T08:49:20.295 INFO:teuthology.orchestra.run.magna079.stderr:rmdir: failed to remove ‘source/.snap/snap6’: Stale file handle
2016-11-16T08:49:20.296 ERROR:tasks.rsync.rsync:Exception in do_rsync:
Traceback (most recent call last):
File "/home/rperiyas/ceph-qa-suite/tasks/rsync.py", line 106, in _run
self.do_rsync()
File "/home/rperiyas/ceph-qa-suite/tasks/rsync.py", line 176, in do_rsync
self.my_mnt.run_shell(["rmdir", "{}".format(self.snapShot)])
File "/home/rperiyas/ceph-qa-suite/tasks/cephfs/mount.py", line 139, in run_shell
return self.client_remote.run(args=args, stdout=StringIO(), wait=wait)
File "/home/rperiyas/teuthology/teuthology/orchestra/remote.py", line 194, in run
r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
File "/home/rperiyas/teuthology/teuthology/orchestra/run.py", line 402, in run
r.wait()
File "/home/rperiyas/teuthology/teuthology/orchestra/run.py", line 166, in wait
label=self.label)
CommandFailedError: Command failed on magna079 with status 1: 'cd /home/ubuntu/cephtest/mnt.0 && sudo rmdir source/.snap/snap6'
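
For readability, the escaped Python one-liner in the first log line above is roughly equivalent to the following (modernized to Python 3; the original ran under Python 2, and the hard-coded path from the log is replaced by parameters here for illustration):

```python
import zlib

def write_pattern(path, count=2097152):
    """Write `count` bytes to `path`; byte i is crc32(str(i)) & 7,
    i.e. a value in 0..7, matching the data generator embedded in
    the teuthology log above."""
    with open(path, "wb") as f:
        for i in range(count):
            val = zlib.crc32(str(i).encode()) & 7
            f.write(bytes([val]))
```

In the actual run this was invoked with the path `/home/ubuntu/cephtest/mnt.0/source/subdir/file_a` and the full 2097152-byte count.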

Steps to reproduce:
1. Configure ceph-fuse and mount
2. Copy some data
3. Create a snapshot (a different name each iteration; it is removed in step 5)
4. rsync the snapshot data to another directory in the same mount
5. Delete the snapshot
6. Delete the source data
7. Repeat steps 2 to 6 in a loop

The problem occurs in 1 or 2 out of every 20 iterations.

Attaching teuthology run logs.

snapshot_stale.zip (75.2 KB) Ramakrishnan Periyasamy, 11/16/2016 02:33 PM

History

#1 Updated by John Spray almost 4 years ago

  • Project changed from www.ceph.com to fs
  • Category set to 89
  • Component(FS) MDS added

#2 Updated by Zheng Yan almost 4 years ago

please set debug_client=20 and try again
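
A minimal sketch of how that debug level is commonly set in ceph.conf on the client side (standard Ceph option syntax; section placement here is an assumption, not taken from this report):

```ini
[client]
    debug client = 20
```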

#3 Updated by Zheng Yan almost 4 years ago

  • Status changed from New to Rejected

You were testing snapshots on a multi-MDS setup, which is known to be broken. For now, please test snapshots only on a single-active-MDS setup.

#4 Updated by John Spray almost 4 years ago

Ah, I didn't notice that.

Rama: modify your .yaml configuration to list only one MDS daemon, or make sure that all but one have a "-s" suffix in their name (this makes them standbys).
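
A hedged sketch of what such a roles entry might look like in a teuthology .yaml file (host layout and daemon names here are illustrative, not taken from the original run):

```yaml
roles:
- [mon.a, osd.0, osd.1, osd.2, mds.a, client.0]            # single active MDS
# with an extra standby instead of a second active:
# - [mon.a, osd.0, osd.1, osd.2, mds.a, mds.b-s, client.0]
```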

It's a little confusing: when running the Python tests in tasks/cephfs/test_*.py we don't have to worry about this, because those tests recreate their own filesystems. However, when running e.g. just rsync+workunit, you need to watch how many MDSs are in your config.
