Bug #17391

closed

teuthology-nuke should lazy-unmount fuse mounts

Added by Yuri Weinstein over 7 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Possibly a new occurrence of #17084.

See the full log: http://paste2.org/Y96czkVh

Command line:

yuriw@teuthology ~ [16:19:57]> for owner in $(teuthology-lock --summary | grep -i scheduled | awk '{ print $4 }' | sort -u); do echo $owner; echo "========"; teuthology-nuke --stale --owner $owner --unlock --noipmi; echo "======== "; done
2016-09-23 16:14:30,365.365 INFO:teuthology.orchestra.run.smithi042.stderr:rm: skipping ‘/var/lib/ceph/osd/ceph-0’, since it's on a different device
2016-09-23 16:14:30,365.365 INFO:teuthology.orchestra.run.smithi042.stderr:rm: skipping ‘/var/lib/ceph/osd/ceph-2’, since it's on a different device
2016-09-23 16:14:30,366.366 INFO:teuthology.orchestra.run.smithi042.stderr:rm: skipping ‘/var/lib/ceph/osd/ceph-1’, since it's on a different device
2016-09-23 16:14:30,366.366 ERROR:teuthology.parallel:Exception in parallel execution
Traceback (most recent call last):
  File "/home/yuriw/teuthology/teuthology/parallel.py", line 83, in __exit__
    for result in self:
  File "/home/yuriw/teuthology/teuthology/parallel.py", line 101, in next
    resurrect_traceback(result)
  File "/home/yuriw/teuthology/teuthology/parallel.py", line 19, in capture_traceback
    return func(*args, **kwargs)
  File "/home/yuriw/teuthology/teuthology/task/install.py", line 340, in _purge_data
    'rm', '-rf', '--one-file-system', '--', '/var/lib/ceph',
  File "/home/yuriw/teuthology/teuthology/orchestra/remote.py", line 192, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/yuriw/teuthology/teuthology/orchestra/run.py", line 403, in run
    r.wait()
  File "/home/yuriw/teuthology/teuthology/orchestra/run.py", line 166, in wait
    label=self.label)
CommandFailedError: Command failed on smithi042 with status 1: "sudo rm -rf --one-file-system -- /var/lib/ceph || true ; test -d /var/lib/ceph && sudo find /var/lib/ceph -mindepth 1 -maxdepth 2 -type d -exec umount '{}' ';' ; sudo rm -rf --one-file-system -- /var/lib/ceph" 
2016-09-23 16:14:30,367.367 ERROR:teuthology.nuke:Could not nuke {u'smithi042.front.sepia.ceph.com': u'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDPDyfDLTKxzPJp18SVzilGgNFlYEDrps6FxzmBSGE5I1HOceCbiW2e4k7zSq9uqWGV57KirMafkbNu89qk2pOMeJMw9mhJa5DDrvvm4r39X5KNn3DdDg4zdK64+Rcn+X8Vic3SucZshh0A+v8u8NdmTZm/innViGHQl97WzyEe10S5tnBfRq1U0zUwiKG9yxJbQAA6ICFrZS8tNKNlGMWr6By1M13Z9ZAvvm3gZ7iNXPU06ruw4a9mhZj5cIMhs/jbmcdFXaGJ50oFhJUUYigND7OheaR8vxLBDdJtisTY0Ls+3kGsmzCk2B6NeJz57n2l9TTOCmVXjR9uW6nppVBd'}
Traceback (most recent call last):
  File "/home/yuriw/teuthology/teuthology/nuke/__init__.py", line 280, in nuke_one
    nuke_helper(ctx, should_unlock)
  File "/home/yuriw/teuthology/teuthology/nuke/__init__.py", line 330, in nuke_helper
    remove_installed_packages(ctx)
  File "/home/yuriw/teuthology/teuthology/nuke/actions.py", line 307, in remove_installed_packages
    install_task.purge_data(ctx)
  File "/home/yuriw/teuthology/teuthology/task/install.py", line 314, in purge_data
    p.spawn(_purge_data, remote)
  File "/home/yuriw/teuthology/teuthology/parallel.py", line 83, in __exit__
    for result in self:
  File "/home/yuriw/teuthology/teuthology/parallel.py", line 101, in next
    resurrect_traceback(result)
  File "/home/yuriw/teuthology/teuthology/parallel.py", line 19, in capture_traceback
    return func(*args, **kwargs)
  File "/home/yuriw/teuthology/teuthology/task/install.py", line 340, in _purge_data
    'rm', '-rf', '--one-file-system', '--', '/var/lib/ceph',
  File "/home/yuriw/teuthology/teuthology/orchestra/remote.py", line 192, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/yuriw/teuthology/teuthology/orchestra/run.py", line 403, in run
    r.wait()
  File "/home/yuriw/teuthology/teuthology/orchestra/run.py", line 166, in wait
    label=self.label)
CommandFailedError: Command failed on smithi042 with status 1: "sudo rm -rf --one-file-system -- /var/lib/ceph || true ; test -d /var/lib/ceph && sudo find /var/lib/ceph -mindepth 1 -maxdepth 2 -type d -exec umount '{}' ';' ; sudo rm -rf --one-file-system -- /var/lib/ceph" 
2016-09-23 16:14:30,369.369 ERROR:teuthology.nuke:Could not nuke the following targets:
targets:
  smithi042.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDPDyfDLTKxzPJp18SVzilGgNFlYEDrps6FxzmBSGE5I1HOceCbiW2e4k7zSq9uqWGV57KirMafkbNu89qk2pOMeJMw9mhJa5DDrvvm4r39X5KNn3DdDg4zdK64+Rcn+X8Vic3SucZshh0A+v8u8NdmTZm/innViGHQl97WzyEe10S5tnBfRq1U0zUwiKG9yxJbQAA6ICFrZS8tNKNlGMWr6By1M13Z9ZAvvm3gZ7iNXPU06ruw4a9mhZj5cIMhs/jbmcdFXaGJ50oFhJUUYigND7OheaR8vxLBDdJtisTY0Ls+3kGsmzCk2B6NeJz57n2l9TTOCmVXjR9uW6nppVBd
  smithi111.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDSah4cXIMiBX/iRuFcoeh+b9A44nRY1c2Epvs9fVN/0VQCLKPD5Rk//N4QdgFpaxYcYjEhAwW5mD5wh0LPnvcGcgmTiSzn+fmsMsI777Vw5dZovHr5BSifrhUHFwIz2PzjMBd9bPNRFKpkEFXwG0/DGRc4XIYWMAOjj884f+2MVi9zeMuxuLo7CzWZy6pg/8LJJk8vdfg53GdXkFQTliqZEUM8Fvkg1LRrA+xUVdq5fN9kvTZ89zN9k1vnodgXAEtjmEowrcJIpbByCZgahytH/7mc8VYAhpL/83WtM/6OTROH3yFp1rMYrkvr4c5WQ6UFTqe8PU+/h+xNxPsaXMRp
2016-09-23 16:14:31,375.375 INFO:teuthology.nuke:targets:

Actions #1

Updated by Yuri Weinstein over 7 years ago

  • Description updated (diff)
Actions #2

Updated by Zack Cerza over 7 years ago

You left out the real error:

2016-09-23 16:14:30,359.359 INFO:teuthology.orchestra.run.smithi042.stderr:umount: /var/lib/ceph/osd/ceph-1: device is busy.
2016-09-23 16:14:30,360.360 INFO:teuthology.orchestra.run.smithi042.stderr:        (In some cases useful info about processes that use
2016-09-23 16:14:30,360.360 INFO:teuthology.orchestra.run.smithi042.stderr:         the device is found by lsof(8) or fuser(1))

Actions #4

Updated by Zack Cerza over 7 years ago

root@smithi042:~# mount | grep '/var/lib'
/dev/nvme0n1p2 on /var/lib/ceph/osd/ceph-0 type btrfs (rw,noatime,user_subvol_rm_allowed)
/dev/nvme0n1p3 on /var/lib/ceph/osd/ceph-1 type btrfs (rw,noatime,user_subvol_rm_allowed)
/dev/nvme0n1p1 on /var/lib/ceph/osd/ceph-2 type btrfs (rw,noatime,user_subvol_rm_allowed)
foo on /var/lib/ceph/osd/ceph-0/fuse type fuse.foo (rw,nosuid,nodev)
foo on /var/lib/ceph/osd/ceph-1/fuse type fuse.foo (rw,nosuid,nodev)
foo on /var/lib/ceph/osd/ceph-2/fuse type fuse.foo (rw,nosuid,nodev)
Actions #5

Updated by Zack Cerza over 7 years ago

Probably unrelated:

[Fri Sep 23 06:44:38 2016] init: tgt main process (28249) killed by KILL signal
[Fri Sep 23 17:44:17 2016] pmlogcheck[17685]: segfault at 0 ip 00000000004017ec sp 00007fff6f18ffc0 error 4 in pmlogcheck[400000+4000]
[Fri Sep 23 17:44:17 2016] pmlogcheck[17687]: segfault at 0 ip 00000000004017ec sp 00007fffe6ab6ac0 error 4 in pmlogcheck[400000+4000]

Actions #6

Updated by Zack Cerza over 7 years ago

root@smithi042:~# umount /var/lib/ceph/osd/ceph-0
umount: /var/lib/ceph/osd/ceph-0: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
root@smithi042:~# lsof | grep '/var/lib/ceph'
lsof: WARNING: can't stat() fuse.foo file system /var/lib/ceph/osd/ceph-0/fuse
      Output information may be incomplete.
lsof: WARNING: can't stat() fuse.foo file system /var/lib/ceph/osd/ceph-1/fuse
      Output information may be incomplete.
lsof: WARNING: can't stat() fuse.foo file system /var/lib/ceph/osd/ceph-2/fuse
      Output information may be incomplete.
root@smithi042:~# fuser '/var/lib/ceph'
root@smithi042:~# fuser '/var/lib/ceph/osd'
root@smithi042:~# fuser '/var/lib/ceph/osd/ceph-0'
root@smithi042:~# fuser '/var/lib/ceph/osd/ceph-0/fuse'
Cannot stat /var/lib/ceph/osd/ceph-0/fuse: Transport endpoint is not connected
Actions #7

Updated by Zack Cerza over 7 years ago

root@smithi042:~# umount -f /var/lib/ceph/osd/ceph-0
umount2: Device or resource busy
umount: /var/lib/ceph/osd/ceph-0: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
umount2: Device or resource busy
root@smithi042:~# umount -l /var/lib/ceph/osd/ceph-0
root@smithi042:~# rm -rf /var/lib/ceph/osd/ceph-0
foo on /var/lib/ceph/osd/ceph-0/fuse type fuse.foo (rw,nosuid,nodev)
root@smithi042:~# umount -l /var/lib/ceph/osd/ceph-1
root@smithi042:~# umount -l /var/lib/ceph/osd/ceph-2
root@smithi042:~# ls /var/lib/ceph/osd/
ceph-1  ceph-2
root@smithi042:~# mount | grep ceph-
foo on /var/lib/ceph/osd/ceph-0/fuse type fuse.foo (rw,nosuid,nodev)
foo on /var/lib/ceph/osd/ceph-1/fuse type fuse.foo (rw,nosuid,nodev)
foo on /var/lib/ceph/osd/ceph-2/fuse type fuse.foo (rw,nosuid,nodev)
root@smithi042:~# rm -rf --one-file-system -- /var/lib/ceph
root@smithi042:~# ls -l /var/lib/ceph
ls: cannot access /var/lib/ceph: No such file or directory
Actions #8

Updated by Zack Cerza over 7 years ago

  • Subject changed from teuthology-nuke failed to nuke to teuthology-nuke should lazy-unmount fuse mounts
  • Status changed from New to 12

nuke just succeeded on smithi042. So the 'lazy unmount' is all we need here.
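
The lazy-unmount approach boils down to finding any fuse mounts under /var/lib/ceph and detaching them with `umount -l` before purging, since a regular `umount` fails with "device is busy" (and even `lsof`/`fuser` cannot stat a disconnected endpoint). A minimal sketch of that idea, not the actual PR change; the helper name and the /proc/mounts parsing are illustrative:

```shell
# Sketch only (not the actual fix in PR 960): list fuse mounts under a
# root directory from /proc/mounts-style input, so each one can be
# detached with 'umount -l' even when its endpoint is not connected.
list_fuse_mounts() {
    # $1: mount table text (format of /proc/mounts), $2: root path prefix
    awk -v root="$2" '$3 ~ /^fuse/ && index($2, root) == 1 { print $2 }' <<< "$1"
}

# Against the live mount table this would look like:
#   for mnt in $(list_fuse_mounts "$(cat /proc/mounts)" /var/lib/ceph); do
#       sudo umount -l "$mnt"
#   done
```

`umount -l` detaches the filesystem from the hierarchy immediately and cleans up the references once it is no longer busy, which is why the manual `umount -l` runs in the sessions above succeeded where `umount -f` did not.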

Actions #9

Updated by Zack Cerza over 7 years ago

  • Status changed from 12 to In Progress
  • Assignee set to Zack Cerza

Here's my PR, which hasn't been tested yet:
https://github.com/ceph/teuthology/pull/960

Just need a node or two to test with.

Actions #10

Updated by Zack Cerza over 7 years ago

  • Status changed from In Progress to Resolved