Bug #50260


pacific: qa: "rmdir: failed to remove '/home/ubuntu/cephtest': Directory not empty"

Added by Patrick Donnelly about 3 years ago. Updated 3 days ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS): qa, qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2021-04-07T19:26:29.037 INFO:teuthology.orchestra.run.smithi038.stderr:rmdir: failed to remove '/home/ubuntu/cephtest': Directory not empty
2021-04-07T19:26:29.037 ERROR:teuthology.run_tasks:Manager failed: internal.base
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_85d61eae4759f46ce21e9a37cd816a7a1a66c9d5/teuthology/run_tasks.py", line 176, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_85d61eae4759f46ce21e9a37cd816a7a1a66c9d5/teuthology/task/internal/__init__.py", line 56, in base
    wait=False,
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_85d61eae4759f46ce21e9a37cd816a7a1a66c9d5/teuthology/orchestra/run.py", line 479, in wait
    proc.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_85d61eae4759f46ce21e9a37cd816a7a1a66c9d5/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_85d61eae4759f46ce21e9a37cd816a7a1a66c9d5/teuthology/orchestra/run.py", line 183, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed on smithi038 with status 1: 'find /home/ubuntu/cephtest -ls ; rmdir -- /home/ubuntu/cephtest'
2021-04-07T19:26:29.038 DEBUG:teuthology.run_tasks:Unwinding manager console_log
2021-04-07T19:26:29.054 DEBUG:teuthology.run_tasks:Exception was not quenched, exiting: CommandFailedError: Command failed on smithi038 with status 1: 'find /home/ubuntu/cephtest -ls ; rmdir -- /home/ubuntu/cephtest'

From: /ceph/teuthology-archive/yuriw-2021-04-07_17:37:43-fs-wip-yuri-testing-2021-04-07-0905-pacific-distro-basic-smithi/6027405/teuthology.log
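A stale mount point left under the test directory (as in the reproduction in comment #2 below) is one way /home/ubuntu/cephtest ends up non-empty. A minimal sketch of a more forgiving cleanup that lazily detaches any stale mount before removing the directory (a hypothetical mitigation for illustration only; this is not the teuthology implementation, and tidy_test_dir is an invented name):

import os
import subprocess

TEST_DIR = "/home/ubuntu/cephtest"

def tidy_test_dir(path=TEST_DIR):
    # Log whatever is left behind, like the `find /home/ubuntu/cephtest -ls`
    # step in internal.base.
    subprocess.run(["find", path, "-ls"], check=False)
    for entry in os.listdir(path):
        sub = os.path.join(path, entry)
        if os.path.ismount(sub):
            # A stale FUSE mount (e.g. mnt.0) keeps the directory non-empty;
            # detach it lazily so the rmdir below can succeed.
            subprocess.run(["sudo", "umount", "-l", sub], check=False)
            os.rmdir(sub)
    os.rmdir(path)  # still raises OSError if anything else remains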

#2

Updated by Xiubo Li 3 days ago · Edited

/a/yuriw-2024-04-09_14:58:21-upgrade-wip-yuri4-testing-2024-04-08-1432-distro-default-smithi/7648870

The unmount of mnt.0 was initiated at 2024-04-10T02:13:47.301:

2024-04-10T02:13:47.301 INFO:tasks.cephfs.fuse_mount:Running fusermount -u on ubuntu@smithi016.front.sepia.ceph.com...
2024-04-10T02:13:47.301 INFO:teuthology.orchestra.run:Running command with timeout 300
2024-04-10T02:13:47.301 DEBUG:teuthology.orchestra.run.smithi016:> sudo fusermount -u /home/ubuntu/cephtest/mnt.0
2024-04-10T02:13:47.328 INFO:teuthology.orchestra.run:waiting for 300
...

But it failed to complete within the 300-second timeout:

2024-04-10T02:18:46.398 ERROR:tasks.cephfs.fuse_mount:process failed to terminate after unmount. This probably indicates a bug within ceph-fuse.
2024-04-10T02:18:46.403 ERROR:teuthology.run_tasks:Manager failed: ceph-fuse
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_6c637841c215537a4502385240412f1966e0faab/teuthology/run_tasks.py", line 154, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python3.8/contextlib.py", line 120, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_3a866975a60f6dc859246f6f19fd8ddb4db2cc2e/qa/tasks/ceph_fuse.py", line 183, in task
    mount.umount_wait()
  File "/home/teuthworker/src/github.com_ceph_ceph-c_3a866975a60f6dc859246f6f19fd8ddb4db2cc2e/qa/tasks/cephfs/fuse_mount.py", line 403, in umount_wait
    run.wait([self.fuse_daemon], timeout)
  File "/home/teuthworker/src/git.ceph.com_teuthology_6c637841c215537a4502385240412f1966e0faab/teuthology/orchestra/run.py", line 473, in wait
    check_time()
  File "/home/teuthworker/src/git.ceph.com_teuthology_6c637841c215537a4502385240412f1966e0faab/teuthology/contextutil.py", line 134, in __call__
    raise MaxWhileTries(error_msg)
teuthology.exceptions.MaxWhileTries: reached maximum tries (51) after waiting for 300 seconds
2024-04-10T02:18:46.411 DEBUG:teuthology.run_tasks:Unwinding manager cephadm
2024-04-10T02:18:46.422 INFO:tasks.cephadm:Teardown begin
2024-04-10T02:18:46.422 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_6c637841c215537a4502385240412f1966e0faab/teuthology/contextutil.py", line 32, in nested
    yield vars
  File "/home/teuthworker/src/github.com_ceph_ceph-c_3a866975a60f6dc859246f6f19fd8ddb4db2cc2e/qa/tasks/cephadm.py", line 1807, in task
    yield
  File "/home/teuthworker/src/git.ceph.com_teuthology_6c637841c215537a4502385240412f1966e0faab/teuthology/run_tasks.py", line 154, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python3.8/contextlib.py", line 120, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_3a866975a60f6dc859246f6f19fd8ddb4db2cc2e/qa/tasks/ceph_fuse.py", line 183, in task
    mount.umount_wait()
  File "/home/teuthworker/src/github.com_ceph_ceph-c_3a866975a60f6dc859246f6f19fd8ddb4db2cc2e/qa/tasks/cephfs/fuse_mount.py", line 403, in umount_wait
    run.wait([self.fuse_daemon], timeout)
  File "/home/teuthworker/src/git.ceph.com_teuthology_6c637841c215537a4502385240412f1966e0faab/teuthology/orchestra/run.py", line 473, in wait
    check_time()
  File "/home/teuthworker/src/git.ceph.com_teuthology_6c637841c215537a4502385240412f1966e0faab/teuthology/contextutil.py", line 134, in __call__
    raise MaxWhileTries(error_msg)
teuthology.exceptions.MaxWhileTries: reached maximum tries (51) after waiting for 300 seconds

...

2024-04-10T02:21:25.294 DEBUG:teuthology.run_tasks:Unwinding manager internal.base
2024-04-10T02:21:25.322 INFO:teuthology.task.internal:Tidying up after the test...
2024-04-10T02:21:25.322 DEBUG:teuthology.orchestra.run.smithi016:> find /home/ubuntu/cephtest -ls ; rmdir -- /home/ubuntu/cephtest
2024-04-10T02:21:25.324 DEBUG:teuthology.orchestra.run.smithi082:> find /home/ubuntu/cephtest -ls ; rmdir -- /home/ubuntu/cephtest
2024-04-10T02:21:25.336 INFO:teuthology.orchestra.run.smithi016.stdout:   265693      4 drwxr-xr-x   3 ubuntu   ubuntu       4096 Apr 10 02:21 /home/ubuntu/cephtest
2024-04-10T02:21:25.336 INFO:teuthology.orchestra.run.smithi016.stdout:   556285      4 d---------   2 ubuntu   ubuntu       4096 Apr 10 02:05 /home/ubuntu/cephtest/mnt.0
2024-04-10T02:21:25.337 DEBUG:teuthology.orchestra.run:got remote process result: 1
2024-04-10T02:21:25.337 INFO:teuthology.orchestra.run.smithi016.stderr:find: ‘/home/ubuntu/cephtest/mnt.0’: Permission denied
2024-04-10T02:21:25.337 INFO:teuthology.orchestra.run.smithi016.stderr:rmdir: failed to remove '/home/ubuntu/cephtest': Directory not empty
2024-04-10T02:21:25.338 ERROR:teuthology.run_tasks:Manager failed: internal.base
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_6c637841c215537a4502385240412f1966e0faab/teuthology/task/internal/__init__.py", line 48, in base
    yield
  File "/home/teuthworker/src/git.ceph.com_teuthology_6c637841c215537a4502385240412f1966e0faab/teuthology/run_tasks.py", line 154, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python3.8/contextlib.py", line 120, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_3a866975a60f6dc859246f6f19fd8ddb4db2cc2e/qa/tasks/ceph_fuse.py", line 183, in task
    mount.umount_wait()
  File "/home/teuthworker/src/github.com_ceph_ceph-c_3a866975a60f6dc859246f6f19fd8ddb4db2cc2e/qa/tasks/cephfs/fuse_mount.py", line 403, in umount_wait
    run.wait([self.fuse_daemon], timeout)
  File "/home/teuthworker/src/git.ceph.com_teuthology_6c637841c215537a4502385240412f1966e0faab/teuthology/orchestra/run.py", line 473, in wait
    check_time()
  File "/home/teuthworker/src/git.ceph.com_teuthology_6c637841c215537a4502385240412f1966e0faab/teuthology/contextutil.py", line 134, in __call__
    raise MaxWhileTries(error_msg)
teuthology.exceptions.MaxWhileTries: reached maximum tries (51) after waiting for 300 seconds

During handling of the above exception, another exception occurred:
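The MaxWhileTries failures above come from teuthology's bounded retry helper in teuthology/contextutil.py; conceptually it is a fixed-interval polling loop that counts attempts and gives up once the timeout elapses, which is how a 300-second wait surfaces as "reached maximum tries (51)". A rough sketch of that shape (the 6-second interval and the bookkeeping here are assumptions, not the exact contextutil code):

import time

class MaxWhileTries(Exception):
    pass

def wait_until(done, timeout=300, interval=6):
    # Poll `done()` every `interval` seconds; once `timeout` seconds have
    # passed without success, raise and report how many attempts were made.
    start = time.time()
    tries = 0
    while not done():
        tries += 1
        if time.time() - start >= timeout:
            raise MaxWhileTries(
                "reached maximum tries (%d) after waiting for %d seconds"
                % (tries, timeout))
        time.sleep(interval)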

However, remote/smithi016/log/ceph-client.0.log.gz shows that the client only received the unmount request at 2024-04-10T02:19:40.359, about 353 seconds after the fusermount -u was issued, and then finished it within 30 seconds:

2024-04-10T02:19:40.359+0000 7f894925b300  2 client.14524 unmounting
...
2024-04-10T02:19:40.359+0000 7f894925b300 20 client.14524 put_inode on 0x1.head(faked_ino=0 nref=5 ll_ref=387 cap_refs={} open={} mode=41777 size=0/0 nlink=1 btime=2024-04-10T02:04:58.477988+0000 mtime=2024-04-10T02:09:25.419876+0000 ctime=2024-04-10T02:09:25.419876+0000 change_attr=5 caps=pAsLsXsFs(0=pAsLsXsFs,1=pAsLsXs) COMPLETE has_dir_layout 0x7f8924008130) n = 1
2024-04-10T02:19:40.359+0000 7f894925b300 20 client.14524 put_inode on 0x1.head(faked_ino=0 nref=5 ll_ref=387 cap_refs={} open={} mode=41777 size=0/0 nlink=1 btime=2024-04-10T02:04:58.477988+0000 mtime=2024-04-10T02:09:25.419876+0000 ctime=2024-04-10T02:09:25.419876+0000 change_attr=5 caps=pAsLsXsFs(0=pAsLsXsFs,1=pAsLsXs) COMPLETE has_dir_layout 0x7f8924008130) n = 1
2024-04-10T02:19:40.359+0000 7f894925b300 10 client.14524 _ll_drop_pins
2024-04-10T02:19:40.359+0000 7f894925b300 20 client.14524 _ll_put 0x7f8924008130 0x1 387 -> 0
2024-04-10T02:19:40.359+0000 7f894925b300 20 client.14524 put_inode on 0x1.head(faked_ino=0 nref=6 ll_ref=0 cap_refs={} open={} mode=41777 size=0/0 nlink=1 btime=2024-04-10T02:04:58.477988+0000 mtime=2024-04-10T02:09:25.419876+0000 ctime=2024-04-10T02:09:25.419876+0000 change_attr=5 caps=pAsLsXsFs(0=pAsLsXsFs,1=pAsLsXs) COMPLETE has_dir_layout 0x7f8924008130) n = 1
2024-04-10T02:19:40.359+0000 7f894925b300 20 client.14524 put_inode on 0x1.head(faked_ino=0 nref=6 ll_ref=0 cap_refs={} open={} mode=41777 size=0/0 nlink=1 btime=2024-04-10T02:04:58.477988+0000 mtime=2024-04-10T02:09:25.419876+0000 ctime=2024-04-10T02:09:25.419876+0000 change_attr=5 caps=pAsLsXsFs(0=pAsLsXsFs,1=pAsLsXs) COMPLETE has_dir_layout 0x7f8924008130) n = 1
2024-04-10T02:19:40.359+0000 7f894925b300 20 client.14524 _release 0x1.head(faked_ino=0 nref=7 ll_ref=0 cap_refs={} open={} mode=41777 size=0/0 nlink=1 btime=2024-04-10T02:04:58.477988+0000 mtime=2024-04-10T02:09:25.419876+0000 ctime=2024-04-10T02:09:25.419876+0000 change_attr=5 caps=pAsLsXsFs(0=pAsLsXsFs,1=pAsLsXs) COMPLETE has_dir_layout 0x7f8924008130)
2024-04-10T02:19:40.359+0000 7f894925b300 10 client.14524 _invalidate_inode_cache 0x1.head(faked_ino=0 nref=7 ll_ref=0 cap_refs={1024=0} open={} mode=41777 size=0/0 nlink=1 btime=2024-04-10T02:04:58.477988+0000 mtime=2024-04-10T02:09:25.419876+0000 ctime=2024-04-10T02:09:25.419876+0000 change_attr=5 caps=pAsLsXsFs(0=pAsLsXsFs,1=pAsLsXs) COMPLETE has_dir_layout 0x7f8924008130)
2024-04-10T02:19:40.359+0000 7f894925b300 10 client.14524 _flush 0x1.head(faked_ino=0 nref=8 ll_ref=0 cap_refs={1024=0} open={} mode=41777 size=0/0 nlink=1 btime=2024-04-10T02:04:58.477988+0000 mtime=2024-04-10T02:09:25.419876+0000 ctime=2024-04-10T02:09:25.419876+0000 change_attr=5 caps=pAsLsXsFs(0=pAsLsXsFs,1=pAsLsXs) COMPLETE has_dir_layout 0x7f8924008130)
2024-04-10T02:19:40.359+0000 7f894925b300 10 client.14524  nothing to flush
2024-04-10T02:19:40.359+0000 7f894925b300 20 client.14524 put_inode on 0x1.head(faked_ino=0 nref=8 ll_ref=0 cap_refs={1024=0} open={} mode=41777 size=0/0 nlink=1 btime=2024-04-10T02:04:58.477988+0000 mtime=2024-04-10T02:09:25.419876+0000 ctime=2024-04-10T02:09:25.419876+0000 change_attr=5 caps=pAsLsXsFs(0=pAsLsXsFs,1=pAsLsXs) COMPLETE has_dir_layout 0x7f8924008130) n = 1
2024-04-10T02:19:40.359+0000 7f894925b300 20 client.14524 put_inode on 0x1.head(faked_ino=0 nref=8 ll_ref=0 cap_refs={1024=0} open={} mode=41777 size=0/0 nlink=1 btime=2024-04-10T02:04:58.477988+0000 mtime=2024-04-10T02:09:25.419876+0000 ctime=2024-04-10T02:09:25.419876+0000 change_attr=5 caps=pAsLsXsFs(0=pAsLsXsFs,1=pAsLsXs) COMPLETE has_dir_layout 0x7f8924008130) n = 1
2024-04-10T02:19:40.359+0000 7f894925b300 10 client.14524 flush_caps_sync
2024-04-10T02:19:40.359+0000 7f894925b300 10 client.14524 wait_sync_caps want 14641 (last is 14641, 0 total flushing)
2024-04-10T02:19:40.359+0000 7f894925b300 20 client.14524 trim_cache size 1 max 16384
2024-04-10T02:19:40.359+0000 7f894925b300 15 client.14524 trim_dentry unlinking dn client.0 in dir 0x1
2024-04-10T02:19:40.359+0000 7f894925b300 15 client.14524 unlink dir 0x7f8924008130 'client.0' dn 0x7f8918024650 inode 0
2024-04-10T02:19:40.359+0000 7f894925b300 15 client.14524 unlink  removing 'client.0' dn 0x7f8918024650
2024-04-10T02:19:40.359+0000 7f894925b300 15 client.14524 close_dir dir 0x7f89180223a0 on 0x7f8924008130
2024-04-10T02:19:40.359+0000 7f894925b300 20 client.14524 put_inode on 0x1.head(faked_ino=0 nref=8 ll_ref=0 cap_refs={1024=0} open={} mode=41777 size=0/0 nlink=1 btime=2024-04-10T02:04:58.477988+0000 mtime=2024-04-10T02:09:25.419876+0000 ctime=2024-04-10T02:09:25.419876+0000 change_attr=5 caps=pAsLsXsFs(0=pAsLsXsFs,1=pAsLsXs) COMPLETE has_dir_layout 0x7f8924008130) n = 1
2024-04-10T02:19:40.359+0000 7f894925b300 10 client.14524 _put_inode on 0x1.head(faked_ino=0 nref=8 ll_ref=0 cap_refs={1024=0} open={} mode=41777 size=0/0 nlink=1 btime=2024-04-10T02:04:58.477988+0000 mtime=2024-04-10T02:09:25.419876+0000 ctime=2024-04-10T02:09:25.419876+0000 change_attr=5 caps=pAsLsXsFs(0=pAsLsXsFs,1=pAsLsXs) COMPLETE has_dir_layout 0x7f8924008130) n = 7
2024-04-10T02:19:40.359+0000 7f894925b300 10 client.14524 remove_cap mds.0 on 0x1.head(faked_ino=0 nref=1 ll_ref=0 cap_refs={1024=0} open={} mode=41777 size=0/0 nlink=1 btime=2024-04-10T02:04:58.477988+0000 mtime=2024-04-10T02:09:25.419876+0000 ctime=2024-04-10T02:09:25.419876+0000 change_attr=5 caps=pAsLsXsFs(0=pAsLsXsFs,1=pAsLsXs) COMPLETE has_dir_layout 0x7f8924008130)
2024-04-10T02:19:40.359+0000 7f894925b300 10 client.14524 remove_cap mds.1 on 0x1.head(faked_ino=0 nref=1 ll_ref=0 cap_refs={1024=0} open={} mode=41777 size=0/0 nlink=1 btime=2024-04-10T02:04:58.477988+0000 mtime=2024-04-10T02:09:25.419876+0000 ctime=2024-04-10T02:09:25.419876+0000 change_attr=5 caps=pAsLsXs(1=pAsLsXs) COMPLETE has_dir_layout 0x7f8924008130)
2024-04-10T02:19:40.359+0000 7f894925b300 15 client.14524 remove_cap last one, closing snaprealm 0x7f8924008040
2024-04-10T02:19:40.359+0000 7f894925b300 20 client.14524 put_snap_realm 0x1 0x7f8924008040 1 -> 0
2024-04-10T02:19:40.359+0000 7f894925b300 10 client.14524 _put_inode deleting 0x1.head(faked_ino=0 nref=1 ll_ref=0 cap_refs={1024=0} open={} mode=41777 size=0/0 nlink=1 btime=2024-04-10T02:04:58.477988+0000 mtime=2024-04-10T02:09:25.419876+0000 ctime=2024-04-10T02:09:25.419876+0000 change_attr=5 caps=- COMPLETE has_dir_layout 0x7f8924008130)
2024-04-10T02:19:40.359+0000 7f894925b300  2 client.14524 _close_mds_session mds.0 seq 6797
2024-04-10T02:19:40.359+0000 7f894925b300  1 -- 192.168.0.1:0/3683042709 --> [v2:172.21.15.16:6826/2557678205,v1:172.21.15.16:6827/2557678205] -- client_session(request_close seq 6797) v5 -- 0x5564697db030 con 0x5564697d8520
2024-04-10T02:19:40.359+0000 7f894925b300  2 client.14524 _close_mds_session mds.1 seq 58307
2024-04-10T02:19:40.359+0000 7f894925b300  1 -- 192.168.0.1:0/3683042709 --> [v2:172.21.15.16:6828/2355422819,v1:172.21.15.16:6829/2355422819] -- client_session(request_close seq 58307) v5 -- 0x5564697247e0 con 0x5564697d5520
2024-04-10T02:19:40.359+0000 7f894925b300  2 client.14524 waiting for 2 mds session(s) to close (timeout: 30s)
2024-04-10T02:19:41.082+0000 7f894579a640  1 -- 192.168.0.1:0/3683042709 >> [v2:172.21.15.82:6816/3567172453,v1:172.21.15.82:6817/3567172453] conn(0x7f891042c680 msgr2=0x7f892551a430 unknown :-1 s=STATE_CONNECTING_RE l=1).process reconnect failed to v2:172.21.15.82:6816/3567172453
...
2024-04-10T02:20:10.359+0000 7f894925b300 10 client.14524 remove_session_caps mds.1
2024-04-10T02:20:10.359+0000 7f894925b300 10 client.14524 kick_requests_closed for mds.1
2024-04-10T02:20:10.359+0000 7f894925b300 20 client.14524 put_snap_realm 0x3 0x7f8925316da0 1 -> 0
2024-04-10T02:20:10.359+0000 7f894925b300 10 client.14524 unmount stopping remount finisher
2024-04-10T02:20:10.359+0000 7f894925b300  2 client.14524 unmounted.
2024-04-10T02:20:10.359+0000 7f894925b300  1 client.14524 shutdown
2024-04-10T02:20:10.359+0000 7f894925b300 10 client.14524 shutdown stopping cache invalidator finisher
2024-04-10T02:20:10.359+0000 7f894925b300 10 client.14524 shutdown stopping dentry invalidator finisher
2024-04-10T02:20:10.359+0000 7f894925b300 10 client.14524 shutdown stopping interrupt finisher

So there is no bug in ceph-fuse itself, but it is still unclear why the unmount request took so long to reach the client daemon.
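For reference, the delay and the unmount duration can be checked directly from the log timestamps (values copied from the logs above):

from datetime import datetime

fusermount_issued = datetime.fromisoformat("2024-04-10T02:13:47.301")  # fusermount -u issued
client_unmounting = datetime.fromisoformat("2024-04-10T02:19:40.359")  # client starts unmounting
client_unmounted  = datetime.fromisoformat("2024-04-10T02:20:10.359")  # client logs "unmounted."

print((client_unmounting - fusermount_issued).total_seconds())  # 353.058: the delay
print((client_unmounted - client_unmounting).total_seconds())   # 30.0: the actual unmount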
