Bug #17391
teuthology-nuke should lazy-unmount fuse mounts (closed)
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):
Description
Maybe a new occurrence of #17084.
Full log: http://paste2.org/Y96czkVh
Command line:
yuriw@teuthology ~ [16:19:57]> for owner in $(teuthology-lock --summary | grep -i scheduled | awk '{ print $4 }' | sort -u); do echo $owner; echo "========"; teuthology-nuke --stale --owner $owner --unlock --noipmi; echo "======== "; done
2016-09-23 16:14:30,365.365 INFO:teuthology.orchestra.run.smithi042.stderr:rm: skipping ‘/var/lib/ceph/osd/ceph-0’, since it's on a different device
2016-09-23 16:14:30,365.365 INFO:teuthology.orchestra.run.smithi042.stderr:rm: skipping ‘/var/lib/ceph/osd/ceph-2’, since it's on a different device
2016-09-23 16:14:30,366.366 INFO:teuthology.orchestra.run.smithi042.stderr:rm: skipping ‘/var/lib/ceph/osd/ceph-1’, since it's on a different device
2016-09-23 16:14:30,366.366 ERROR:teuthology.parallel:Exception in parallel execution
Traceback (most recent call last):
  File "/home/yuriw/teuthology/teuthology/parallel.py", line 83, in __exit__
    for result in self:
  File "/home/yuriw/teuthology/teuthology/parallel.py", line 101, in next
    resurrect_traceback(result)
  File "/home/yuriw/teuthology/teuthology/parallel.py", line 19, in capture_traceback
    return func(*args, **kwargs)
  File "/home/yuriw/teuthology/teuthology/task/install.py", line 340, in _purge_data
    'rm', '-rf', '--one-file-system', '--', '/var/lib/ceph',
  File "/home/yuriw/teuthology/teuthology/orchestra/remote.py", line 192, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/yuriw/teuthology/teuthology/orchestra/run.py", line 403, in run
    r.wait()
  File "/home/yuriw/teuthology/teuthology/orchestra/run.py", line 166, in wait
    label=self.label)
CommandFailedError: Command failed on smithi042 with status 1: "sudo rm -rf --one-file-system -- /var/lib/ceph || true ; test -d /var/lib/ceph && sudo find /var/lib/ceph -mindepth 1 -maxdepth 2 -type d -exec umount '{}' ';' ; sudo rm -rf --one-file-system -- /var/lib/ceph"
2016-09-23 16:14:30,367.367 ERROR:teuthology.nuke:Could not nuke {u'smithi042.front.sepia.ceph.com': u'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDPDyfDLTKxzPJp18SVzilGgNFlYEDrps6FxzmBSGE5I1HOceCbiW2e4k7zSq9uqWGV57KirMafkbNu89qk2pOMeJMw9mhJa5DDrvvm4r39X5KNn3DdDg4zdK64+Rcn+X8Vic3SucZshh0A+v8u8NdmTZm/innViGHQl97WzyEe10S5tnBfRq1U0zUwiKG9yxJbQAA6ICFrZS8tNKNlGMWr6By1M13Z9ZAvvm3gZ7iNXPU06ruw4a9mhZj5cIMhs/jbmcdFXaGJ50oFhJUUYigND7OheaR8vxLBDdJtisTY0Ls+3kGsmzCk2B6NeJz57n2l9TTOCmVXjR9uW6nppVBd'}
Traceback (most recent call last):
  File "/home/yuriw/teuthology/teuthology/nuke/__init__.py", line 280, in nuke_one
    nuke_helper(ctx, should_unlock)
  File "/home/yuriw/teuthology/teuthology/nuke/__init__.py", line 330, in nuke_helper
    remove_installed_packages(ctx)
  File "/home/yuriw/teuthology/teuthology/nuke/actions.py", line 307, in remove_installed_packages
    install_task.purge_data(ctx)
  File "/home/yuriw/teuthology/teuthology/task/install.py", line 314, in purge_data
    p.spawn(_purge_data, remote)
  File "/home/yuriw/teuthology/teuthology/parallel.py", line 83, in __exit__
    for result in self:
  File "/home/yuriw/teuthology/teuthology/parallel.py", line 101, in next
    resurrect_traceback(result)
  File "/home/yuriw/teuthology/teuthology/parallel.py", line 19, in capture_traceback
    return func(*args, **kwargs)
  File "/home/yuriw/teuthology/teuthology/task/install.py", line 340, in _purge_data
    'rm', '-rf', '--one-file-system', '--', '/var/lib/ceph',
  File "/home/yuriw/teuthology/teuthology/orchestra/remote.py", line 192, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/yuriw/teuthology/teuthology/orchestra/run.py", line 403, in run
    r.wait()
  File "/home/yuriw/teuthology/teuthology/orchestra/run.py", line 166, in wait
    label=self.label)
CommandFailedError: Command failed on smithi042 with status 1: "sudo rm -rf --one-file-system -- /var/lib/ceph || true ; test -d /var/lib/ceph && sudo find /var/lib/ceph -mindepth 1 -maxdepth 2 -type d -exec umount '{}' ';' ; sudo rm -rf --one-file-system -- /var/lib/ceph"
2016-09-23 16:14:30,369.369 ERROR:teuthology.nuke:Could not nuke the following targets:
targets:
  smithi042.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDPDyfDLTKxzPJp18SVzilGgNFlYEDrps6FxzmBSGE5I1HOceCbiW2e4k7zSq9uqWGV57KirMafkbNu89qk2pOMeJMw9mhJa5DDrvvm4r39X5KNn3DdDg4zdK64+Rcn+X8Vic3SucZshh0A+v8u8NdmTZm/innViGHQl97WzyEe10S5tnBfRq1U0zUwiKG9yxJbQAA6ICFrZS8tNKNlGMWr6By1M13Z9ZAvvm3gZ7iNXPU06ruw4a9mhZj5cIMhs/jbmcdFXaGJ50oFhJUUYigND7OheaR8vxLBDdJtisTY0Ls+3kGsmzCk2B6NeJz57n2l9TTOCmVXjR9uW6nppVBd
  smithi111.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDSah4cXIMiBX/iRuFcoeh+b9A44nRY1c2Epvs9fVN/0VQCLKPD5Rk//N4QdgFpaxYcYjEhAwW5mD5wh0LPnvcGcgmTiSzn+fmsMsI777Vw5dZovHr5BSifrhUHFwIz2PzjMBd9bPNRFKpkEFXwG0/DGRc4XIYWMAOjj884f+2MVi9zeMuxuLo7CzWZy6pg/8LJJk8vdfg53GdXkFQTliqZEUM8Fvkg1LRrA+xUVdq5fN9kvTZ89zN9k1vnodgXAEtjmEowrcJIpbByCZgahytH/7mc8VYAhpL/83WtM/6OTROH3yFp1rMYrkvr4c5WQ6UFTqe8PU+/h+xNxPsaXMRp
2016-09-23 16:14:31,375.375 INFO:teuthology.nuke:targets:
Updated by Zack Cerza over 7 years ago
You left out the real error:
2016-09-23 16:14:30,359.359 INFO:teuthology.orchestra.run.smithi042.stderr:umount: /var/lib/ceph/osd/ceph-1: device is busy.
2016-09-23 16:14:30,360.360 INFO:teuthology.orchestra.run.smithi042.stderr: (In some cases useful info about processes that use
2016-09-23 16:14:30,360.360 INFO:teuthology.orchestra.run.smithi042.stderr: the device is found by lsof(8) or fuser(1))
Updated by Zack Cerza over 7 years ago
smithi042 was last used in http://pulpito.front.sepia.ceph.com/samuelj-2016-09-22_17:41:48-rados-wip-sam-working---basic-smithi/430918/
Updated by Zack Cerza over 7 years ago
root@smithi042:~# mount | grep '/var/lib'
/dev/nvme0n1p2 on /var/lib/ceph/osd/ceph-0 type btrfs (rw,noatime,user_subvol_rm_allowed)
/dev/nvme0n1p3 on /var/lib/ceph/osd/ceph-1 type btrfs (rw,noatime,user_subvol_rm_allowed)
/dev/nvme0n1p1 on /var/lib/ceph/osd/ceph-2 type btrfs (rw,noatime,user_subvol_rm_allowed)
foo on /var/lib/ceph/osd/ceph-0/fuse type fuse.foo (rw,nosuid,nodev)
foo on /var/lib/ceph/osd/ceph-1/fuse type fuse.foo (rw,nosuid,nodev)
foo on /var/lib/ceph/osd/ceph-2/fuse type fuse.foo (rw,nosuid,nodev)
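The mount table shows a fuse mount nested inside each OSD mount. The failing cleanup command only runs umount on directories up to two levels below /var/lib/ceph, so the fuse mounts one level deeper are never touched, and a mount point that still has a child mount normally cannot be unmounted without the lazy flag. As a rough illustration (not teuthology code; `nested_mounts` is a hypothetical helper), the nesting can be listed deepest-first from `mount` output like this:

```python
def nested_mounts(mount_output, prefix="/var/lib/ceph"):
    """Return mount points under `prefix`, deepest first.

    Hypothetical helper for illustration; assumes each line of `mount`
    output looks like "<device> on <mountpoint> type <fstype> (<options>)".
    """
    base = prefix.rstrip("/")
    points = []
    for line in mount_output.splitlines():
        parts = line.split(" on ", 1)
        if len(parts) != 2:
            continue  # not a mount line
        mountpoint = parts[1].split(" type ", 1)[0]
        if mountpoint == base or mountpoint.startswith(base + "/"):
            points.append(mountpoint)
    # Deeper paths (more "/" separators) must be unmounted before their parents.
    return sorted(points, key=lambda p: p.count("/"), reverse=True)


SAMPLE = """\
/dev/nvme0n1p2 on /var/lib/ceph/osd/ceph-0 type btrfs (rw,noatime,user_subvol_rm_allowed)
foo on /var/lib/ceph/osd/ceph-0/fuse type fuse.foo (rw,nosuid,nodev)
"""

for mountpoint in nested_mounts(SAMPLE):
    print(mountpoint)
```

With the sample above, the fuse mount is listed before the btrfs mount that contains it.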
Updated by Zack Cerza over 7 years ago
Probably unrelated:
[Fri Sep 23 06:44:38 2016] init: tgt main process (28249) killed by KILL signal
[Fri Sep 23 17:44:17 2016] pmlogcheck[17685]: segfault at 0 ip 00000000004017ec sp 00007fff6f18ffc0 error 4 in pmlogcheck[400000+4000]
[Fri Sep 23 17:44:17 2016] pmlogcheck[17687]: segfault at 0 ip 00000000004017ec sp 00007fffe6ab6ac0 error 4 in pmlogcheck[400000+4000]
Updated by Zack Cerza over 7 years ago
root@smithi042:~# umount /var/lib/ceph/osd/ceph-0
umount: /var/lib/ceph/osd/ceph-0: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
root@smithi042:~# lsof | grep '/var/lib/ceph'
lsof: WARNING: can't stat() fuse.foo file system /var/lib/ceph/osd/ceph-0/fuse
      Output information may be incomplete.
lsof: WARNING: can't stat() fuse.foo file system /var/lib/ceph/osd/ceph-1/fuse
      Output information may be incomplete.
lsof: WARNING: can't stat() fuse.foo file system /var/lib/ceph/osd/ceph-2/fuse
      Output information may be incomplete.
root@smithi042:~# fuser '/var/lib/ceph'
root@smithi042:~# fuser '/var/lib/ceph/osd'
root@smithi042:~# fuser '/var/lib/ceph/osd/ceph-0'
root@smithi042:~# fuser '/var/lib/ceph/osd/ceph-0/fuse'
Cannot stat /var/lib/ceph/osd/ceph-0/fuse: Transport endpoint is not connected
Updated by Zack Cerza over 7 years ago
root@smithi042:~# umount -f /var/lib/ceph/osd/ceph-0
umount2: Device or resource busy
umount: /var/lib/ceph/osd/ceph-0: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
umount2: Device or resource busy
root@smithi042:~# umount -l /var/lib/ceph/osd/ceph-0
root@smithi042:~# rm -rf /var/lib/ceph/osd/ceph-0
foo on /var/lib/ceph/osd/ceph-0/fuse type fuse.foo (rw,nosuid,nodev)
root@smithi042:~# umount -l /var/lib/ceph/osd/ceph-1
root@smithi042:~# umount -l /var/lib/ceph/osd/ceph-2
root@smithi042:~# ls /var/lib/ceph/osd/
ceph-1 ceph-2
root@smithi042:~# mount | grep ceph-
foo on /var/lib/ceph/osd/ceph-0/fuse type fuse.foo (rw,nosuid,nodev)
foo on /var/lib/ceph/osd/ceph-1/fuse type fuse.foo (rw,nosuid,nodev)
foo on /var/lib/ceph/osd/ceph-2/fuse type fuse.foo (rw,nosuid,nodev)
root@smithi042:~# rm -rf --one-file-system -- /var/lib/ceph
root@smithi042:~# ls -l /var/lib/ceph
ls: cannot access /var/lib/ceph: No such file or directory
Updated by Zack Cerza over 7 years ago
- Subject changed from teuthology-nuke failed to nuke to teuthology-nuke should lazy-unmount fuse mounts
- Status changed from New to 12
nuke just succeeded on smithi042. So the 'lazy unmount' is all we need here.
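For illustration only, here is a sketch of how the purge command from the traceback could be assembled with `umount -l` in place of plain `umount`. This is an assumption about the shape of the change, not the contents of the actual fix; `build_purge_command` and its parameters are hypothetical names.

```python
import shlex


def build_purge_command(path="/var/lib/ceph", lazy=True):
    """Build the shell command that removes `path`, unmounting what is in the way.

    Hypothetical sketch. With lazy=False it reproduces the command string
    seen in the CommandFailedError; with lazy=True the umount step uses -l.
    """
    umount = "umount -l" if lazy else "umount"
    quoted = shlex.quote(path)
    return (
        f"sudo rm -rf --one-file-system -- {quoted} || true ; "
        f"test -d {quoted} && sudo find {quoted} -mindepth 1 -maxdepth 2 "
        f"-type d -exec {umount} '{{}}' ';' ; "
        f"sudo rm -rf --one-file-system -- {quoted}"
    )


print(build_purge_command())
```

The only difference from the failing command is the `-l` flag, which matches the manual session where `umount -l` succeeded after both plain `umount` and `umount -f` reported the device as busy.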
Updated by Zack Cerza over 7 years ago
- Status changed from 12 to In Progress
- Assignee set to Zack Cerza
Here's my PR that hasn't been tested yet:
https://github.com/ceph/teuthology/pull/960
Just need a node or two to test with.
Updated by Zack Cerza over 7 years ago
- Status changed from In Progress to Resolved