Bug #15098

"stop ceph-all" no longer stops ceph

Added by Yuri Weinstein about 8 years ago. Updated about 8 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite: ceph-deploy
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

On Trusty only.
Run: http://pulpito.ceph.com/teuthology-2016-03-10_21:13:02-ceph-deploy-jewel-distro-basic-vps/
Jobs: ['52029', '52031', '52034']
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2016-03-10_21:13:02-ceph-deploy-jewel-distro-basic-vps/52031/teuthology.log

2016-03-10T21:32:04.228 INFO:teuthology.orchestra.run.vpm036:Running: "python -c 'import os; import tempfile; import sys;(fd,fname) = tempfile.mkstemp();os.close(fd);sys.stdout.write(fname.rstrip());sys.stdout.flush()'" 
2016-03-10T21:32:04.250 INFO:teuthology.orchestra.run.vpm036:Running: 'sudo tar cz -f /tmp/tmpqamrX6 -C /var/lib/ceph/mon -- .'
2016-03-10T21:32:04.367 INFO:teuthology.orchestra.run.vpm036:Running: 'sudo chmod 0666 /tmp/tmpqamrX6'
2016-03-10T21:32:04.500 INFO:teuthology.orchestra.run.vpm036:Running: 'rm -fr /tmp/tmpqamrX6'
2016-03-10T21:32:04.507 DEBUG:teuthology.misc:Transferring archived files from vpm115:/var/lib/ceph/mon to /var/lib/teuthworker/archive/teuthology-2016-03-10_21:13:02-ceph-deploy-jewel-distro-basic-vps/52031/data/mon.vpm115.tgz
2016-03-10T21:32:04.508 INFO:teuthology.orchestra.run.vpm115:Running: "python -c 'import os; import tempfile; import sys;(fd,fname) = tempfile.mkstemp();os.close(fd);sys.stdout.write(fname.rstrip());sys.stdout.flush()'" 
2016-03-10T21:32:04.533 INFO:teuthology.orchestra.run.vpm115:Running: 'sudo tar cz -f /tmp/tmpBs3gcz -C /var/lib/ceph/mon -- .'
2016-03-10T21:32:04.683 INFO:teuthology.orchestra.run.vpm115.stderr:tar: ./ceph-vpm115/store.db/000006.log: file changed as we read it
2016-03-10T21:32:04.686 INFO:tasks.ceph_deploy:Removing ceph-deploy ...
2016-03-10T21:32:04.686 INFO:teuthology.orchestra.run.vpm115:Running: 'rm -rf /home/ubuntu/cephtest/ceph-deploy'
2016-03-10T21:32:04.716 INFO:teuthology.task.install:Removing shipped files: /home/ubuntu/cephtest/valgrind.supp /usr/bin/daemon-helper /usr/bin/adjust-ulimits...
2016-03-10T21:32:04.717 INFO:teuthology.orchestra.run.vpm036:Running: 'sudo rm -f -- /home/ubuntu/cephtest/valgrind.supp /usr/bin/daemon-helper /usr/bin/adjust-ulimits'
2016-03-10T21:32:04.721 INFO:teuthology.orchestra.run.vpm115:Running: 'sudo rm -f -- /home/ubuntu/cephtest/valgrind.supp /usr/bin/daemon-helper /usr/bin/adjust-ulimits'
2016-03-10T21:32:04.768 ERROR:teuthology.run_tasks:Manager failed: ceph-deploy
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 139, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_jewel/tasks/ceph_deploy.py", line 664, in task
    yield
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 44, in nested
    if exit(*exc):
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_jewel/tasks/ceph_deploy.py", line 408, in build_ceph_cluster
    path + '/' + role + '.tgz')
  File "/home/teuthworker/src/teuthology_master/teuthology/misc.py", line 772, in pull_directory_tarball
    remote.get_tar(remotedir, localfile, sudo=True)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/remote.py", line 348, in get_tar
    self.run(args=args)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/remote.py", line 196, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 378, in run
    r.wait()
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 114, in wait
    label=self.label)
CommandFailedError: Command failed on vpm115 with status 1: 'sudo tar cz -f /tmp/tmpBs3gcz -C /var/lib/ceph/mon -- .'
2016-03-10T21:32:04.772 DEBUG:teuthology.run_tasks:Unwinding manager ssh_keys
2016-03-10T21:32:04.799 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 30, in nested
    yield vars
  File "/home/teuthworker/src/teuthology_master/teuthology/task/ssh_keys.py", line 206, in task
    yield
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 139, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_jewel/tasks/ceph_deploy.py", line 664, in task
    yield
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 44, in nested
    if exit(*exc):
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_jewel/tasks/ceph_deploy.py", line 408, in build_ceph_cluster
    path + '/' + role + '.tgz')
  File "/home/teuthworker/src/teuthology_master/teuthology/misc.py", line 772, in pull_directory_tarball
    remote.get_tar(remotedir, localfile, sudo=True)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/remote.py", line 348, in get_tar
    self.run(args=args)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/remote.py", line 196, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 378, in run
    r.wait()
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 114, in wait
    label=self.label)
CommandFailedError: Command failed on vpm115 with status 1: 'sudo tar cz -f /tmp/tmpBs3gcz -C /var/lib/ceph/mon -- .'
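
For context on the failure mode: GNU tar exits with status 1 when a file changes while it is being read, which is what happens here because the mon is still writing to its store.db. A hedged sketch of a wrapper that tolerates that warning (an illustration only, not the teuthology fix; /tmp/mon.tgz is a made-up target path):

sudo tar cz -f /tmp/mon.tgz -C /var/lib/ceph/mon -- .
status=$?
if [ "$status" -gt 1 ]; then
    # status 2 means a fatal tar error
    echo "tar failed fatally (status $status)" >&2
    exit "$status"
elif [ "$status" -eq 1 ]; then
    # status 1 means files changed during the read; the archive is
    # usable, but it implies the daemons were never actually stopped
    echo "warning: files changed while archiving; are ceph daemons still running?" >&2
fi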

Related issues (0 open, 1 closed)

Related to Ceph - Bug #21482: "file changed as we read it" in ceph-deploy-jewel (status: Can't reproduce; added 09/20/2017)

Actions #3

Updated by Yuri Weinstein about 8 years ago

  • Priority changed from Normal to High
Actions #4

Updated by Vasu Kulkarni about 8 years ago

This is due to the ceph service stop issue seen on CentOS, but I guess now it's also happening on Ubuntu. The service stop itself returns success without actually stopping ceph (seems intermittent), causing these other issues to pop up.

Related issue on centos: http://tracker.ceph.com/issues/14839

2016-03-31T21:35:21.295 INFO:teuthology.orchestra.run.vpm133:Running: 'sudo stop ceph-all || sudo service ceph stop || sudo systemctl stop ceph.target'
2016-03-31T21:35:21.323 INFO:teuthology.orchestra.run.vpm133.stdout:ceph-all stop/waiting
2016-03-31T21:35:21.324 INFO:teuthology.orchestra.run.vpm184:Running: 'sudo stop ceph-all || sudo service ceph stop || sudo systemctl stop ceph.target'
2016-03-31T21:35:21.356 INFO:teuthology.orchestra.run.vpm184.stdout:ceph-all stop/waiting
2016-03-31T21:35:21.357 INFO:teuthology.orchestra.run.vpm133:Running: 'sudo status ceph-all || sudo service ceph status || sudo systemctl status ceph.target'
2016-03-31T21:35:21.375 INFO:teuthology.orchestra.run.vpm133.stdout:ceph-all stop/waiting
2016-03-31T21:35:21.376 INFO:teuthology.orchestra.run.vpm184:Running: 'sudo status ceph-all || sudo service ceph status || sudo systemctl status ceph.target'
2016-03-31T21:35:21.419 INFO:teuthology.orchestra.run.vpm184.stdout:ceph-all stop/waiting
2016-03-31T21:35:21.420 INFO:teuthology.orchestra.run.vpm133:Running: 'sudo ps aux | grep -v grep | grep ceph'
2016-03-31T21:35:21.446 INFO:teuthology.orchestra.run.vpm133.stdout:ceph     29384  0.8  1.1 351896 22948 ?        Ssl  04:33   0:01 /usr/bin/ceph-mon --cluster=ceph -i vpm133 -f --setuser ceph --setgroup ceph
2016-03-31T21:35:21.446 INFO:teuthology.orchestra.run.vpm133.stdout:ceph     29804  0.0  0.7 368544 13696 ?        Ssl  04:33   0:00 /usr/bin/ceph-mds --cluster=ceph -i vpm133 -f --setuser ceph --setgroup ceph
2016-03-31T21:35:21.447 INFO:teuthology.orchestra.run.vpm133.stdout:ceph     30511  1.1  1.3 842228 26984 ?        Ssl  04:34   0:00 /usr/bin/ceph-osd --cluster=ceph -i 0 -f --setuser ceph --setgroup ceph
2016-03-31T21:35:21.447 INFO:teuthology.orchestra.run.vpm184:Running: 'sudo ps aux | grep -v grep | grep ceph'
2016-03-31T21:35:21.494 INFO:teuthology.orchestra.run.vpm184.stdout:ceph     29210  0.5  1.2 344704 23652 ?        Ssl  04:33   0:00 /usr/bin/ceph-mon --cluster=ceph -i vpm184 -f --setuser ceph --setgroup ceph
2016-03-31T21:35:21.495 INFO:teuthology.orchestra.run.vpm184.stdout:ceph     30424  1.5  1.3 844280 25404 ?        Ssl  04:34   0:00 /usr/bin/ceph-osd --cluster=ceph -i 1 -f --setuser ceph --setgroup ceph
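
A hedged sketch of a stop-and-verify check for this situation (the stop chain is the one teuthology runs above; the pgrep check and the grace period are illustrative):

sudo stop ceph-all || sudo service ceph stop || sudo systemctl stop ceph.target
sleep 5  # give the daemons a moment to exit
# pgrep -f matches full command lines; any surviving ceph-mon/osd/mds
# means the stop reported success without actually stopping anything
if pgrep -af '/usr/bin/ceph-(mon|osd|mds)'; then
    echo "ERROR: ceph daemons still running after stop" >&2
    exit 1
fi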

Actions #5

Updated by Dan Mick about 8 years ago

  • Project changed from teuthology to Ceph
  • Subject changed from "file changed as we read it" .. "Running: 'sudo tar cz -f" in ceph-deploy-jewel-distro-basic-vps to "stop ceph-all" no longer stops ceph
  • Severity changed from 3 - minor to 2 - major

So if this is not a teuthology bug, it needs to be recategorized so it gets looked at.

Actions #6

Updated by Yuri Weinstein about 8 years ago

  • Subject changed from "stop ceph-all" no longer stops ceph to "stop ceph-all" no longer stops ceph (??)
  • Priority changed from High to Urgent
  • ceph-qa-suite upgrade/infernalis-x added

The comment below was moved into #15427.

I see a similar problem in other suites as well.
Run: http://pulpito.ovh.sepia.ceph.com:8081/teuthology-2016-04-07_02:10:01-upgrade:infernalis-x-jewel-distro-basic-openstack/
Jobs: 30510, 30523
Logs: http://teuthology.ovh.sepia.ceph.com/teuthology/teuthology-2016-04-07_02:10:01-upgrade:infernalis-x-jewel-distro-basic-openstack/30523/teuthology.log
http://teuthology.ovh.sepia.ceph.com/teuthology/teuthology-2016-04-07_02:10:01-upgrade:infernalis-x-jewel-distro-basic-openstack/30510/teuthology.log

Both are hanging at these lines:

2016-04-07T10:43:22.559 INFO:teuthology.orchestra.run.target064051:Running: "python -c 'import os; import tempfile; import sys;(fd,fname) = tempfile.mkstemp();os.close(fd);sys.stdout.write(fname.rstrip());sys.stdout.flush()'" 
2016-04-07T10:43:23.770 INFO:teuthology.orchestra.run.target064051:Running: 'sudo tar cz -f /tmp/tmpBeaXdf -C /var/log/ceph -- .'
2016-04-07T10:44:51.214 INFO:teuthology.orchestra.run.target064051:Running: 'sudo chmod 0666 /tmp/tmpBeaXdf'

Actions #7

Updated by Yuri Weinstein about 8 years ago

  • ceph-qa-suite deleted (upgrade/infernalis-x)
Actions #8

Updated by Yuri Weinstein about 8 years ago

  • Subject changed from "stop ceph-all" no longer stops ceph (??) to "stop ceph-all" no longer stops ceph
Actions #9

Updated by Sage Weil about 8 years ago

  • Status changed from New to Need More Info

I bet this is fallout from the package split and the upstart files being removed and re-added... leaving upstart in a state where the ceph-all or ceph-osd/mon-all tasks aren't in the started state. What would be helpful is to reproduce this manually and see 'initctl list | grep ceph' before/after, and/or the upstart logs in /var/log/upstart.
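
A minimal sketch of that diagnostic, assuming an upstart (Trusty) host; the temp file names are illustrative:

initctl list | grep ceph | sort > /tmp/ceph-jobs.before
sudo stop ceph-all
initctl list | grep ceph | sort > /tmp/ceph-jobs.after
diff -u /tmp/ceph-jobs.before /tmp/ceph-jobs.after  # which jobs actually changed state?
sudo tail -n 50 /var/log/upstart/ceph*.log          # upstart's own view of the jobs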

Actions #10

Updated by Yuri Weinstein about 8 years ago

I see the suite passed after this issue was logged
http://pulpito.ceph.com/teuthology-2016-04-02_21:13:02-ceph-deploy-jewel-distro-basic-vps/

Keeping open for now to confirm more passes

Actions #12

Updated by Yuri Weinstein about 8 years ago

And failed again :(
Run: http://pulpito.ceph.com/teuthology-2016-04-12_21:13:01-ceph-deploy-jewel-distro-basic-vps/
Jobs: 124579, 124581
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2016-04-12_21:13:01-ceph-deploy-jewel-distro-basic-vps/124581/teuthology.log

2016-04-12T21:42:41.334 INFO:teuthology.orchestra.run.vpm020:Running: "python -c 'import os; import tempfile; import sys;(fd,fname) = tempfile.mkstemp();os.close(fd);sys.stdout.write(fname.rstrip());sys.stdout.flush()'" 
2016-04-12T21:42:41.408 INFO:teuthology.orchestra.run.vpm020:Running: 'sudo tar cz -f /tmp/tmps_zsPu -C /var/log/ceph -- .'
2016-04-12T21:42:41.495 INFO:teuthology.orchestra.run.vpm020.stderr:tar: .: file changed as we read it
2016-04-12T21:42:41.498 INFO:tasks.ceph_deploy:Removing ceph-deploy ...
2016-04-12T21:42:41.498 INFO:teuthology.orchestra.run.vpm020:Running: 'rm -rf /home/ubuntu/cephtest/ceph-deploy'
2016-04-12T21:42:41.591 INFO:teuthology.task.install:Removing shipped files: /home/ubuntu/cephtest/valgrind.supp /usr/bin/daemon-helper /usr/bin/adjust-ulimits...
2016-04-12T21:42:41.591 INFO:teuthology.orchestra.run.vpm020:Running: 'sudo rm -f -- /home/ubuntu/cephtest/valgrind.supp /usr/bin/daemon-helper /usr/bin/adjust-ulimits'
2016-04-12T21:42:41.598 INFO:teuthology.orchestra.run.vpm198:Running: 'sudo rm -f -- /home/ubuntu/cephtest/valgrind.supp /usr/bin/daemon-helper /usr/bin/adjust-ulimits'
2016-04-12T21:42:41.658 ERROR:teuthology.run_tasks:Manager failed: ceph-deploy

Snippet of the log (service stop, status, and ps):

2016-04-12T21:40:29.873 INFO:teuthology.task.print:**** done ceph-deploy/ceph-deploy_hello_world.sh
2016-04-12T21:40:29.873 DEBUG:teuthology.run_tasks:Unwinding manager ceph-deploy
2016-04-12T21:40:29.905 INFO:tasks.ceph_deploy:Stopping ceph...
2016-04-12T21:40:29.905 INFO:teuthology.orchestra.run.vpm136:Running: 'sudo stop ceph-all || sudo service ceph stop || sudo systemctl stop ceph.target'
2016-04-12T21:40:29.965 INFO:teuthology.orchestra.run.vpm136.stderr:sudo: stop: command not found
2016-04-12T21:40:29.988 INFO:teuthology.orchestra.run.vpm136.stderr:Redirecting to /bin/systemctl stop  ceph.service
2016-04-12T21:40:29.991 INFO:teuthology.orchestra.run.vpm136.stderr:Failed to stop ceph.service: Unit ceph.service not loaded.
2016-04-12T21:40:30.006 INFO:teuthology.orchestra.run.vpm191:Running: 'sudo stop ceph-all || sudo service ceph stop || sudo systemctl stop ceph.target'
2016-04-12T21:40:30.053 INFO:teuthology.orchestra.run.vpm191.stderr:sudo: stop: command not found
2016-04-12T21:40:30.073 INFO:teuthology.orchestra.run.vpm191.stderr:Redirecting to /bin/systemctl stop  ceph.service
2016-04-12T21:40:30.077 INFO:teuthology.orchestra.run.vpm191.stderr:Failed to stop ceph.service: Unit ceph.service not loaded.
2016-04-12T21:40:30.090 INFO:teuthology.orchestra.run.vpm136:Running: 'sudo status ceph-all || sudo service ceph status || sudo systemctl status ceph.target'
2016-04-12T21:40:30.137 INFO:teuthology.orchestra.run.vpm136.stderr:sudo: status: command not found
2016-04-12T21:40:30.156 INFO:teuthology.orchestra.run.vpm136.stderr:Redirecting to /bin/systemctl status  ceph.service
2016-04-12T21:40:30.160 INFO:teuthology.orchestra.run.vpm136.stdout:● ceph.service
2016-04-12T21:40:30.160 INFO:teuthology.orchestra.run.vpm136.stdout:   Loaded: not-found (Reason: No such file or directory)
2016-04-12T21:40:30.160 INFO:teuthology.orchestra.run.vpm136.stdout:   Active: inactive (dead)
2016-04-12T21:40:30.170 INFO:teuthology.orchestra.run.vpm136.stdout:● ceph.target - ceph target allowing to start/stop all ceph*@.service instances at once
2016-04-12T21:40:30.171 INFO:teuthology.orchestra.run.vpm136.stdout:   Loaded: loaded (/usr/lib/systemd/system/ceph.target; enabled; vendor preset: disabled)
2016-04-12T21:40:30.171 INFO:teuthology.orchestra.run.vpm136.stdout:   Active: inactive (dead)
2016-04-12T21:40:30.171 INFO:teuthology.orchestra.run.vpm136.stdout:
2016-04-12T21:40:30.171 INFO:teuthology.orchestra.run.vpm136.stdout:Apr 13 04:40:29 vpm136 systemd[1]: Stopped target ceph target allowing to start/stop all ceph*@.service instances at once.
2016-04-12T21:40:30.172 INFO:teuthology.orchestra.run.vpm191:Running: 'sudo status ceph-all || sudo service ceph status || sudo systemctl status ceph.target'
2016-04-12T21:40:30.219 INFO:teuthology.orchestra.run.vpm191.stderr:sudo: status: command not found
2016-04-12T21:40:30.238 INFO:teuthology.orchestra.run.vpm191.stderr:Redirecting to /bin/systemctl status  ceph.service
2016-04-12T21:40:30.242 INFO:teuthology.orchestra.run.vpm191.stdout:● ceph.service
2016-04-12T21:40:30.242 INFO:teuthology.orchestra.run.vpm191.stdout:   Loaded: not-found (Reason: No such file or directory)
2016-04-12T21:40:30.243 INFO:teuthology.orchestra.run.vpm191.stdout:   Active: inactive (dead)
2016-04-12T21:40:30.252 INFO:teuthology.orchestra.run.vpm191.stdout:● ceph.target - ceph target allowing to start/stop all ceph*@.service instances at once
2016-04-12T21:40:30.252 INFO:teuthology.orchestra.run.vpm191.stdout:   Loaded: loaded (/usr/lib/systemd/system/ceph.target; enabled; vendor preset: disabled)
2016-04-12T21:40:30.252 INFO:teuthology.orchestra.run.vpm191.stdout:   Active: inactive (dead)
2016-04-12T21:40:30.252 INFO:teuthology.orchestra.run.vpm191.stdout:
2016-04-12T21:40:30.253 INFO:teuthology.orchestra.run.vpm191.stdout:Apr 13 04:40:30 vpm191 systemd[1]: Stopped target ceph target allowing to start/stop all ceph*@.service instances at once.
2016-04-12T21:40:30.254 INFO:teuthology.orchestra.run.vpm136:Running: 'sudo ps aux | grep -v grep | grep ceph'
2016-04-12T21:40:30.319 INFO:teuthology.orchestra.run.vpm136.stdout:ceph      4599  0.7  1.4 344592 26464 ?        Ssl  04:38   0:00 /usr/bin/ceph-mon -f --cluster ceph --id vpm136 --setuser ceph --setgroup ceph
2016-04-12T21:40:30.320 INFO:teuthology.orchestra.run.vpm136.stdout:ceph      5117  0.0  0.5 360092 10612 ?        Ssl  04:38   0:00 /usr/bin/ceph-mds -f --cluster ceph --id vpm136 --setuser ceph --setgroup ceph
2016-04-12T21:40:30.320 INFO:teuthology.orchestra.run.vpm136.stdout:ceph      5931  0.6  2.2 839796 39916 ?        Ssl  04:39   0:00 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph
2016-04-12T21:40:30.320 INFO:teuthology.orchestra.run.vpm191:Running: 'sudo ps aux | grep -v grep | grep ceph'
2016-04-12T21:40:30.374 INFO:teuthology.orchestra.run.vpm191.stdout:ceph      1256  0.3  1.2 336376 22192 ?        Ssl  04:38   0:00 /usr/bin/ceph-mon -f --cluster ceph --id vpm191 --setuser ceph --setgroup ceph
2016-04-12T21:40:30.374 INFO:teuthology.orchestra.run.vpm191.stdout:ceph      2417  0.6  1.4 840572 26488 ?        Ssl  04:39   0:00 /usr/bin/ceph-osd -f --cluster ceph --id 1 --setuser ceph --setgroup ceph
2016-04-12T21:40:30.375 INFO:tasks.ceph_deploy:Archiving mon data...
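
The log above shows a systemd host where ceph.target is reported inactive yet the daemons keep running, i.e. the daemon units were never attached to the target. A hedged workaround sketch (the instance unit names follow the usual ceph systemd convention; whether they exist on these hosts is an assumption):

sudo systemctl stop 'ceph-mon@*' 'ceph-osd@*' 'ceph-mds@*' ceph.target
# verify nothing survived:
pgrep -af '/usr/bin/ceph-(mon|osd|mds)' && echo "daemons still running" >&2
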
Actions #13

Updated by Yuri Weinstein about 8 years ago

  • Status changed from Rejected to New
Actions #14

Updated by Yuri Weinstein about 8 years ago

Tried reproducing this job with hammer.sh: http://qa-proxy.ceph.com/teuthology/teuthology-2016-04-13_21:13:01-ceph-deploy-jewel-distro-basic-mira/128176/orig.config.yaml

The 1st time through the loop the test passed (though note the "[ERROR ] RuntimeError: Failed to execute command: rm -rf --one-file-system -- /var/lib/ceph" below):

2016-04-14 13:05:15,340.340 INFO:teuthology.orchestra.run.mira119.stderr:[mira119][INFO  ] Running command: sudo rm -rf --one-file-system -- /var/lib/ceph
2016-04-14 13:05:15,355.355 INFO:teuthology.orchestra.run.mira119.stderr:[mira119][WARNING] rm: skipping ‘/var/lib/ceph/osd/ceph-0’, since it's on a different device
2016-04-14 13:05:15,356.356 INFO:teuthology.orchestra.run.mira119.stderr:[mira119][ERROR ] RuntimeError: command returned non-zero exit status: 1
2016-04-14 13:05:15,356.356 INFO:teuthology.orchestra.run.mira119.stderr:[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: rm -rf --one-file-system -- /var/lib/ceph
2016-04-14 13:05:15,356.356 INFO:teuthology.orchestra.run.mira119.stderr:
2016-04-14 13:05:15,391.391 INFO:tasks.ceph_deploy:Removing ceph-deploy ...

and the 2nd time through the loop the test failed:

2016-04-14 13:05:19,492.492 INFO:teuthology.task.internal:Checking for old /var/lib/ceph...
2016-04-14 13:05:19,492.492 INFO:teuthology.orchestra.run.mira047:Running: "test '!' -e /var/lib/ceph" 
2016-04-14 13:05:19,528.528 INFO:teuthology.orchestra.run.mira119:Running: "test '!' -e /var/lib/ceph" 
2016-04-14 13:05:19,532.532 ERROR:teuthology.task.internal:Host mira047 has stale /var/lib/ceph, check lock and nuke/cleanup.
Actions #15

Updated by Dan Mick about 8 years ago

ubuntu@mira119:~$ initctl list | grep ceph
ceph-osd (ceph/0) start/running, process 8122

I was going to look at the log for the run to see exactly what it did, until I realized I don't know where the log is (or if there is one).

Actions #16

Updated by Sage Weil about 8 years ago

  • Status changed from New to Fix Under Review

https://github.com/ceph/ceph/pull/8617

Notes for the PR above:

Per Sage's suggestion, I re-ran the same job, stopped it after the "hello_world" test, and on mira084 verified ps before and after "stop ceph-all":

ubuntu@mira084:~$ ps ax | grep ceph
 7077 ?        Ssl    0:00 /usr/bin/ceph-mon --cluster=ceph -i mira084 -f --setuser ceph --setgroup ceph
 8068 ?        Ssl    0:01 /usr/bin/ceph-osd --cluster=ceph -i 1 -f --setuser ceph --setgroup ceph
 8378 pts/3    S+     0:00 grep --color=auto ceph
ubuntu@mira084:~$ initctl list | grep ceph
ceph-mds-all-starter stop/waiting
ceph-mds-all start/running
ceph-osd-all stop/waiting
ceph-osd-all-starter stop/waiting
ceph-all start/running
ceph-mon-all stop/waiting
ceph-mon-all-starter stop/waiting
ceph-mon (ceph/mira084) start/running, process 7077
ceph-disk stop/waiting
ceph-create-keys stop/waiting
ceph-osd (ceph/1) start/running, process 8068
ceph-mds stop/waiting
ubuntu@mira084:~$ stop ceph-all
stop: Rejected send message, 1 matched rules; type="method_call", sender=":1.90" (uid=1000 pid=8381 comm="stop ceph-all ") interface="com.ubuntu.Upstart0_6.Job" member="Stop" error name="(unset)" requested_reply="0" destination="com.ubuntu.Upstart" (uid=0 pid=1 comm="/sbin/init")
ubuntu@mira084:~$ sudo stop ceph-all
ceph-all stop/waiting
ubuntu@mira084:~$ ps ax | grep ceph
 7077 ?        Ssl    0:00 /usr/bin/ceph-mon --cluster=ceph -i mira084 -f --setuser ceph --setgroup ceph
 8068 ?        Ssl    0:01 /usr/bin/ceph-osd --cluster=ceph -i 1 -f --setuser ceph --setgroup ceph
 8388 pts/3    S+     0:00 grep --color=auto ceph
ubuntu@mira084:~$ initctl list | grep ceph
ceph-mds-all-starter stop/waiting
ceph-mds-all stop/waiting
ceph-osd-all stop/waiting
ceph-osd-all-starter stop/waiting
ceph-all stop/waiting
ceph-mon-all stop/waiting
ceph-mon-all-starter stop/waiting
ceph-mon (ceph/mira084) start/running, process 7077
ceph-disk stop/waiting
ceph-create-keys stop/waiting
ceph-osd (ceph/1) start/running, process 8068
ceph-mds stop/waiting
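
Note that the instance jobs (ceph-mon, ceph-osd) remain start/running even though every -all job now reports stop/waiting. A hedged workaround sketch, stopping the instances directly with upstart's instance syntax and the names from the listing above:

sudo stop ceph-mon cluster=ceph id=mira084
sudo stop ceph-osd cluster=ceph id=1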

Actions #17

Updated by Yuri Weinstein about 8 years ago

@Dan: logs are in yuriw@teuthology:~/_run_tests.yaml.out; there's not much in it, as it was run with hammer.sh and only the last run's logs were saved.

Actions #18

Updated by Yuri Weinstein about 8 years ago

Looks to be working now on wip-sage-testing2
sha1: 3bab4ee1d2dc4d1fd2fbbbb18744a03399d05e1c

The same job on that sha1:

ubuntu@mira084:~$ ps ax|grep ceph
 7125 ?        Ssl    0:00 /usr/bin/ceph-mon --cluster=ceph -i mira084 -f --setuser ceph --setgroup ceph
 8112 ?        Ssl    0:00 /usr/bin/ceph-osd --cluster=ceph -i 1 -f --setuser ceph --setgroup ceph
 8421 pts/3    S+     0:00 grep --color=auto ceph
ubuntu@mira084:~$ sudo ceph -s^C
ubuntu@mira084:~$ initctl list | grep ceph
ceph-mds-all-starter stop/waiting
ceph-mds-all start/running
ceph-osd-all start/running
ceph-osd-all-starter stop/waiting
ceph-all start/running
ceph-mon-all start/running
ceph-mon-all-starter stop/waiting
ceph-mon (ceph/mira084) start/running, process 7125
ceph-disk stop/waiting
ceph-create-keys stop/waiting
ceph-osd (ceph/1) start/running, process 8112
ceph-mds stop/waiting
ubuntu@mira084:~$ sudo stop ceph-all
ceph-all stop/waiting
ubuntu@mira084:~$ ps ax|grep ceph
 8456 pts/3    S+     0:00 grep --color=auto ceph
ubuntu@mira084:~$ initctl list | grep ceph
ceph-mds-all-starter stop/waiting
ceph-mds-all stop/waiting
ceph-osd-all stop/waiting
ceph-osd-all-starter stop/waiting
ceph-all stop/waiting
ceph-mon-all stop/waiting
ceph-mon-all-starter stop/waiting
ceph-mon stop/waiting
ceph-disk stop/waiting
ceph-create-keys stop/waiting
ceph-osd stop/waiting
ceph-mds stop/waiting
ubuntu@mira084:~$ 

And I was able to run this test with hammer.sh several times, which was not possible before the fix.

Actions #19

Updated by Dan Mick about 8 years ago

The proposed fix starts the ceph-*-all jobs at package installation time; this won't actually start any services, but it puts the daemons' -all jobs in the started state, so that at stop time (whether from the package prerm or called explicitly, as from teuthology) stopping ceph-all actually stops everything.

(ceph-all is already started in the ceph postinst script)
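
A hypothetical sketch of such a packaging hook (an illustration of the mechanism Dan describes, not the actual change in the PR; the job names are taken from the initctl listings above):

# debian postinst fragment (sketch): mark the -all upstart jobs as
# started at install time so a later 'stop ceph-all' cascades properly
case "$1" in
    configure)
        for job in ceph-mon-all ceph-osd-all ceph-mds-all; do
            # '|| :' so installation doesn't fail on non-upstart hosts
            # or when the job is already in the started state
            start "$job" >/dev/null 2>&1 || :
        done
        ;;
esac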

Actions #20

Updated by Sage Weil about 8 years ago

  • Status changed from Fix Under Review to Resolved
Actions #21

Updated by Yuri Weinstein over 6 years ago

  • Related to Bug #21482: "file changed as we read it" in ceph-deploy-jewel added
