Bug #15098
"stop ceph-all" no longer stops ceph
Status: Closed
Description
on Trusty only
Run: http://pulpito.ceph.com/teuthology-2016-03-10_21:13:02-ceph-deploy-jewel-distro-basic-vps/
Jobs: ['52029', '52031', '52034']
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2016-03-10_21:13:02-ceph-deploy-jewel-distro-basic-vps/52031/teuthology.log
2016-03-10T21:32:04.228 INFO:teuthology.orchestra.run.vpm036:Running: "python -c 'import os; import tempfile; import sys;(fd,fname) = tempfile.mkstemp();os.close(fd);sys.stdout.write(fname.rstrip());sys.stdout.flush()'"
2016-03-10T21:32:04.250 INFO:teuthology.orchestra.run.vpm036:Running: 'sudo tar cz -f /tmp/tmpqamrX6 -C /var/lib/ceph/mon -- .'
2016-03-10T21:32:04.367 INFO:teuthology.orchestra.run.vpm036:Running: 'sudo chmod 0666 /tmp/tmpqamrX6'
2016-03-10T21:32:04.500 INFO:teuthology.orchestra.run.vpm036:Running: 'rm -fr /tmp/tmpqamrX6'
2016-03-10T21:32:04.507 DEBUG:teuthology.misc:Transferring archived files from vpm115:/var/lib/ceph/mon to /var/lib/teuthworker/archive/teuthology-2016-03-10_21:13:02-ceph-deploy-jewel-distro-basic-vps/52031/data/mon.vpm115.tgz
2016-03-10T21:32:04.508 INFO:teuthology.orchestra.run.vpm115:Running: "python -c 'import os; import tempfile; import sys;(fd,fname) = tempfile.mkstemp();os.close(fd);sys.stdout.write(fname.rstrip());sys.stdout.flush()'"
2016-03-10T21:32:04.533 INFO:teuthology.orchestra.run.vpm115:Running: 'sudo tar cz -f /tmp/tmpBs3gcz -C /var/lib/ceph/mon -- .'
2016-03-10T21:32:04.683 INFO:teuthology.orchestra.run.vpm115.stderr:tar: ./ceph-vpm115/store.db/000006.log: file changed as we read it
2016-03-10T21:32:04.686 INFO:tasks.ceph_deploy:Removing ceph-deploy ...
2016-03-10T21:32:04.686 INFO:teuthology.orchestra.run.vpm115:Running: 'rm -rf /home/ubuntu/cephtest/ceph-deploy'
2016-03-10T21:32:04.716 INFO:teuthology.task.install:Removing shipped files: /home/ubuntu/cephtest/valgrind.supp /usr/bin/daemon-helper /usr/bin/adjust-ulimits...
2016-03-10T21:32:04.717 INFO:teuthology.orchestra.run.vpm036:Running: 'sudo rm -f -- /home/ubuntu/cephtest/valgrind.supp /usr/bin/daemon-helper /usr/bin/adjust-ulimits'
2016-03-10T21:32:04.721 INFO:teuthology.orchestra.run.vpm115:Running: 'sudo rm -f -- /home/ubuntu/cephtest/valgrind.supp /usr/bin/daemon-helper /usr/bin/adjust-ulimits'
2016-03-10T21:32:04.768 ERROR:teuthology.run_tasks:Manager failed: ceph-deploy
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 139, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_jewel/tasks/ceph_deploy.py", line 664, in task
    yield
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 44, in nested
    if exit(*exc):
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_jewel/tasks/ceph_deploy.py", line 408, in build_ceph_cluster
    path + '/' + role + '.tgz')
  File "/home/teuthworker/src/teuthology_master/teuthology/misc.py", line 772, in pull_directory_tarball
    remote.get_tar(remotedir, localfile, sudo=True)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/remote.py", line 348, in get_tar
    self.run(args=args)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/remote.py", line 196, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 378, in run
    r.wait()
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 114, in wait
    label=self.label)
CommandFailedError: Command failed on vpm115 with status 1: 'sudo tar cz -f /tmp/tmpBs3gcz -C /var/lib/ceph/mon -- .'
2016-03-10T21:32:04.772 DEBUG:teuthology.run_tasks:Unwinding manager ssh_keys
2016-03-10T21:32:04.799 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 30, in nested
    yield vars
  File "/home/teuthworker/src/teuthology_master/teuthology/task/ssh_keys.py", line 206, in task
    yield
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 139, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_jewel/tasks/ceph_deploy.py", line 664, in task
    yield
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 44, in nested
    if exit(*exc):
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_jewel/tasks/ceph_deploy.py", line 408, in build_ceph_cluster
    path + '/' + role + '.tgz')
  File "/home/teuthworker/src/teuthology_master/teuthology/misc.py", line 772, in pull_directory_tarball
    remote.get_tar(remotedir, localfile, sudo=True)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/remote.py", line 348, in get_tar
    self.run(args=args)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/remote.py", line 196, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 378, in run
    r.wait()
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 114, in wait
    label=self.label)
CommandFailedError: Command failed on vpm115 with status 1: 'sudo tar cz -f /tmp/tmpBs3gcz -C /var/lib/ceph/mon -- .'
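A note on the tar error above: GNU tar exits with status 1 (as opposed to 2 for a hard error) when a file changed while it was being read, which is exactly what happens when archiving a live mon store. A minimal sketch of how a caller could tolerate that specific status (the `archive_dir` helper is hypothetical, not teuthology code):

```shell
# Hypothetical wrapper around the 'tar cz -f ... -C ... -- .' call seen
# in the log. GNU tar exits 1 when a file changed as it was read;
# treat that as a warning rather than a fatal failure.
archive_dir() {
  src=$1
  out=$2
  tar cz -f "$out" -C "$src" -- .
  rc=$?
  if [ "$rc" -eq 1 ]; then
    # Exit status 1 means "some files differ/changed", not a hard error
    echo "warning: files under $src changed while archiving" >&2
    return 0
  fi
  return "$rc"
}
```

This only masks the symptom; stopping the daemons before archiving (the subject of this ticket) is the real fix.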
Updated by Yuri Weinstein about 8 years ago
Run: http://pulpito.ceph.com/teuthology-2016-03-29_21:13:02-ceph-deploy-jewel-distro-basic-vps/
Jobs: 95598, 95604
Updated by Yuri Weinstein about 8 years ago
Run: http://pulpito.ceph.com/teuthology-2016-03-31_21:13:02-ceph-deploy-jewel-distro-basic-vps/
Jobs: 101104, 101107
Updated by Yuri Weinstein about 8 years ago
- Priority changed from Normal to High
Updated by Vasu Kulkarni about 8 years ago
This is due to the ceph service stop issue seen on CentOS, but I guess now it's also happening on Ubuntu. The service stop itself returns success without actually stopping ceph (it seems intermittent), causing these other issues to pop up.
Related issue on centos: http://tracker.ceph.com/issues/14839
2016-03-31T21:35:21.295 INFO:teuthology.orchestra.run.vpm133:Running: 'sudo stop ceph-all || sudo service ceph stop || sudo systemctl stop ceph.target'
2016-03-31T21:35:21.323 INFO:teuthology.orchestra.run.vpm133.stdout:ceph-all stop/waiting
2016-03-31T21:35:21.324 INFO:teuthology.orchestra.run.vpm184:Running: 'sudo stop ceph-all || sudo service ceph stop || sudo systemctl stop ceph.target'
2016-03-31T21:35:21.356 INFO:teuthology.orchestra.run.vpm184.stdout:ceph-all stop/waiting
2016-03-31T21:35:21.357 INFO:teuthology.orchestra.run.vpm133:Running: 'sudo status ceph-all || sudo service ceph status || sudo systemctl status ceph.target'
2016-03-31T21:35:21.375 INFO:teuthology.orchestra.run.vpm133.stdout:ceph-all stop/waiting
2016-03-31T21:35:21.376 INFO:teuthology.orchestra.run.vpm184:Running: 'sudo status ceph-all || sudo service ceph status || sudo systemctl status ceph.target'
2016-03-31T21:35:21.419 INFO:teuthology.orchestra.run.vpm184.stdout:ceph-all stop/waiting
2016-03-31T21:35:21.420 INFO:teuthology.orchestra.run.vpm133:Running: 'sudo ps aux | grep -v grep | grep ceph'
2016-03-31T21:35:21.446 INFO:teuthology.orchestra.run.vpm133.stdout:ceph 29384 0.8 1.1 351896 22948 ? Ssl 04:33 0:01 /usr/bin/ceph-mon --cluster=ceph -i vpm133 -f --setuser ceph --setgroup ceph
2016-03-31T21:35:21.446 INFO:teuthology.orchestra.run.vpm133.stdout:ceph 29804 0.0 0.7 368544 13696 ? Ssl 04:33 0:00 /usr/bin/ceph-mds --cluster=ceph -i vpm133 -f --setuser ceph --setgroup ceph
2016-03-31T21:35:21.447 INFO:teuthology.orchestra.run.vpm133.stdout:ceph 30511 1.1 1.3 842228 26984 ? Ssl 04:34 0:00 /usr/bin/ceph-osd --cluster=ceph -i 0 -f --setuser ceph --setgroup ceph
2016-03-31T21:35:21.447 INFO:teuthology.orchestra.run.vpm184:Running: 'sudo ps aux | grep -v grep | grep ceph'
2016-03-31T21:35:21.494 INFO:teuthology.orchestra.run.vpm184.stdout:ceph 29210 0.5 1.2 344704 23652 ? Ssl 04:33 0:00 /usr/bin/ceph-mon --cluster=ceph -i vpm184 -f --setuser ceph --setgroup ceph
2016-03-31T21:35:21.495 INFO:teuthology.orchestra.run.vpm184.stdout:ceph 30424 1.5 1.3 844280 25404 ? Ssl 04:34 0:00 /usr/bin/ceph-osd --cluster=ceph -i 1 -f --setuser ceph --setgroup ceph
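The log above shows the core problem: the stop command reports `ceph-all stop/waiting` (success) while ceph-mon/ceph-osd/ceph-mds processes are still in the process table. A sketch (not teuthology's actual code) of verifying the stop via the process table rather than trusting the command's exit status, as the manual `ps aux | grep ceph` check above does:

```shell
# Returns 0 (true) if any ceph daemon process is still present.
# Pattern matches the daemon paths seen in the ps output above.
ceph_daemons_running() {
  pgrep -f '/usr/bin/ceph-(mon|osd|mds)' > /dev/null
}

# Hypothetical stop-and-verify: run the same fallback chain teuthology
# uses, then confirm the daemons are actually gone.
stop_ceph_and_verify() {
  sudo stop ceph-all || sudo service ceph stop || sudo systemctl stop ceph.target
  sleep 2
  if ceph_daemons_running; then
    echo "ERROR: ceph daemons survived the stop command" >&2
    return 1
  fi
}
```

With a check like this the failure would surface at stop time instead of later as the misleading "file changed as we read it" tar error.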
Updated by Dan Mick about 8 years ago
- Project changed from teuthology to Ceph
- Subject changed from "file changed as we read it" .. "Running: 'sudo tar cz -f" in ceph-deploy-jewel-distro-basic-vps to "stop ceph-all" no longer stops ceph
- Severity changed from 3 - minor to 2 - major
So if this is not a teuthology bug, it needs to be recategorized to be looked at.
Updated by Yuri Weinstein about 8 years ago
- Subject changed from "stop ceph-all" no longer stops ceph to "stop ceph-all" no longer stops ceph (??)
- Priority changed from High to Urgent
- ceph-qa-suite upgrade/infernalis-x added
The comment below was moved into #15427
I see a similar problem in other suites as well
Run: http://pulpito.ovh.sepia.ceph.com:8081/teuthology-2016-04-07_02:10:01-upgrade:infernalis-x-jewel-distro-basic-openstack/
Jobs: 30510, 30523
Logs: http://teuthology.ovh.sepia.ceph.com/teuthology/teuthology-2016-04-07_02:10:01-upgrade:infernalis-x-jewel-distro-basic-openstack/30523/teuthology.log
http://teuthology.ovh.sepia.ceph.com/teuthology/teuthology-2016-04-07_02:10:01-upgrade:infernalis-x-jewel-distro-basic-openstack/30510/teuthology.log
Both are hanging on lines:
2016-04-07T10:43:22.559 INFO:teuthology.orchestra.run.target064051:Running: "python -c 'import os; import tempfile; import sys;(fd,fname) = tempfile.mkstemp();os.close(fd);sys.stdout.write(fname.rstrip());sys.stdout.flush()'"
2016-04-07T10:43:23.770 INFO:teuthology.orchestra.run.target064051:Running: 'sudo tar cz -f /tmp/tmpBeaXdf -C /var/log/ceph -- .'
2016-04-07T10:44:51.214 INFO:teuthology.orchestra.run.target064051:Running: 'sudo chmod 0666 /tmp/tmpBeaXdf
Updated by Yuri Weinstein about 8 years ago
- ceph-qa-suite deleted (upgrade/infernalis-x)
Updated by Yuri Weinstein about 8 years ago
- Subject changed from "stop ceph-all" no longer stops ceph (??) to "stop ceph-all" no longer stops ceph
Updated by Sage Weil about 8 years ago
- Status changed from New to Need More Info
I bet this is fallout from the package split, with the upstart files being removed and re-added, leaving upstart in a state where the ceph-all or ceph-osd/mon-all jobs aren't in the started state. What would be helpful is to reproduce this manually and look at 'initctl list | grep ceph' before/after, and/or the upstart logs in /var/log/upstart.
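The manual reproduction suggested above could be sketched like this (assumes an upstart host such as Trusty; `snapshot_ceph_jobs` is a hypothetical helper, and the sudo/stop steps are shown as comments since they only make sense on the test node):

```shell
# Capture the upstart job state for ceph; degrades gracefully on hosts
# without upstart so the helper itself never fails.
snapshot_ceph_jobs() {
  initctl list 2>/dev/null | grep ceph || echo "no upstart ceph jobs found"
}

# Intended use on the affected node (as root or via sudo):
#   snapshot_ceph_jobs          # before: is ceph-all start/running?
#   stop ceph-all
#   snapshot_ceph_jobs          # after: did the per-daemon jobs stop?
#   tail -n 20 /var/log/upstart/ceph-all.log
```

If ceph-osd/mon-all were never in the started state, the before snapshot would show them stop/waiting, which would confirm the hypothesis.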
Updated by Yuri Weinstein about 8 years ago
I see the suite passed after this issue was logged
http://pulpito.ceph.com/teuthology-2016-04-02_21:13:02-ceph-deploy-jewel-distro-basic-vps/
Keeping open for now to confirm more passes
Updated by Yuri Weinstein about 8 years ago
- Status changed from Need More Info to Rejected
Updated by Yuri Weinstein about 8 years ago
And failed again :(
Run: http://pulpito.ceph.com/teuthology-2016-04-12_21:13:01-ceph-deploy-jewel-distro-basic-vps/
Jobs: 124579, 124581
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2016-04-12_21:13:01-ceph-deploy-jewel-distro-basic-vps/124581/teuthology.log
2016-04-12T21:42:41.334 INFO:teuthology.orchestra.run.vpm020:Running: "python -c 'import os; import tempfile; import sys;(fd,fname) = tempfile.mkstemp();os.close(fd);sys.stdout.write(fname.rstrip());sys.stdout.flush()'"
2016-04-12T21:42:41.408 INFO:teuthology.orchestra.run.vpm020:Running: 'sudo tar cz -f /tmp/tmps_zsPu -C /var/log/ceph -- .'
2016-04-12T21:42:41.495 INFO:teuthology.orchestra.run.vpm020.stderr:tar: .: file changed as we read it
2016-04-12T21:42:41.498 INFO:tasks.ceph_deploy:Removing ceph-deploy ...
2016-04-12T21:42:41.498 INFO:teuthology.orchestra.run.vpm020:Running: 'rm -rf /home/ubuntu/cephtest/ceph-deploy'
2016-04-12T21:42:41.591 INFO:teuthology.task.install:Removing shipped files: /home/ubuntu/cephtest/valgrind.supp /usr/bin/daemon-helper /usr/bin/adjust-ulimits...
2016-04-12T21:42:41.591 INFO:teuthology.orchestra.run.vpm020:Running: 'sudo rm -f -- /home/ubuntu/cephtest/valgrind.supp /usr/bin/daemon-helper /usr/bin/adjust-ulimits'
2016-04-12T21:42:41.598 INFO:teuthology.orchestra.run.vpm198:Running: 'sudo rm -f -- /home/ubuntu/cephtest/valgrind.supp /usr/bin/daemon-helper /usr/bin/adjust-ulimits'
2016-04-12T21:42:41.658 ERROR:teuthology.run_tasks:Manager failed: ceph-deploy
PS: a snippet of the log:
2016-04-12T21:40:29.873 INFO:teuthology.task.print:**** done ceph-deploy/ceph-deploy_hello_world.sh
2016-04-12T21:40:29.873 DEBUG:teuthology.run_tasks:Unwinding manager ceph-deploy
2016-04-12T21:40:29.905 INFO:tasks.ceph_deploy:Stopping ceph...
2016-04-12T21:40:29.905 INFO:teuthology.orchestra.run.vpm136:Running: 'sudo stop ceph-all || sudo service ceph stop || sudo systemctl stop ceph.target'
2016-04-12T21:40:29.965 INFO:teuthology.orchestra.run.vpm136.stderr:sudo: stop: command not found
2016-04-12T21:40:29.988 INFO:teuthology.orchestra.run.vpm136.stderr:Redirecting to /bin/systemctl stop ceph.service
2016-04-12T21:40:29.991 INFO:teuthology.orchestra.run.vpm136.stderr:Failed to stop ceph.service: Unit ceph.service not loaded.
2016-04-12T21:40:30.006 INFO:teuthology.orchestra.run.vpm191:Running: 'sudo stop ceph-all || sudo service ceph stop || sudo systemctl stop ceph.target'
2016-04-12T21:40:30.053 INFO:teuthology.orchestra.run.vpm191.stderr:sudo: stop: command not found
2016-04-12T21:40:30.073 INFO:teuthology.orchestra.run.vpm191.stderr:Redirecting to /bin/systemctl stop ceph.service
2016-04-12T21:40:30.077 INFO:teuthology.orchestra.run.vpm191.stderr:Failed to stop ceph.service: Unit ceph.service not loaded.
2016-04-12T21:40:30.090 INFO:teuthology.orchestra.run.vpm136:Running: 'sudo status ceph-all || sudo service ceph status || sudo systemctl status ceph.target'
2016-04-12T21:40:30.137 INFO:teuthology.orchestra.run.vpm136.stderr:sudo: status: command not found
2016-04-12T21:40:30.156 INFO:teuthology.orchestra.run.vpm136.stderr:Redirecting to /bin/systemctl status ceph.service
2016-04-12T21:40:30.160 INFO:teuthology.orchestra.run.vpm136.stdout:● ceph.service
2016-04-12T21:40:30.160 INFO:teuthology.orchestra.run.vpm136.stdout: Loaded: not-found (Reason: No such file or directory)
2016-04-12T21:40:30.160 INFO:teuthology.orchestra.run.vpm136.stdout: Active: inactive (dead)
2016-04-12T21:40:30.170 INFO:teuthology.orchestra.run.vpm136.stdout:● ceph.target - ceph target allowing to start/stop all ceph*@.service instances at once
2016-04-12T21:40:30.171 INFO:teuthology.orchestra.run.vpm136.stdout: Loaded: loaded (/usr/lib/systemd/system/ceph.target; enabled; vendor preset: disabled)
2016-04-12T21:40:30.171 INFO:teuthology.orchestra.run.vpm136.stdout: Active: inactive (dead)
2016-04-12T21:40:30.171 INFO:teuthology.orchestra.run.vpm136.stdout:
2016-04-12T21:40:30.171 INFO:teuthology.orchestra.run.vpm136.stdout:Apr 13 04:40:29 vpm136 systemd[1]: Stopped target ceph target allowing to start/stop all ceph*@.service instances at once.
2016-04-12T21:40:30.172 INFO:teuthology.orchestra.run.vpm191:Running: 'sudo status ceph-all || sudo service ceph status || sudo systemctl status ceph.target'
2016-04-12T21:40:30.219 INFO:teuthology.orchestra.run.vpm191.stderr:sudo: status: command not found
2016-04-12T21:40:30.238 INFO:teuthology.orchestra.run.vpm191.stderr:Redirecting to /bin/systemctl status ceph.service
2016-04-12T21:40:30.242 INFO:teuthology.orchestra.run.vpm191.stdout:● ceph.service
2016-04-12T21:40:30.242 INFO:teuthology.orchestra.run.vpm191.stdout: Loaded: not-found (Reason: No such file or directory)
2016-04-12T21:40:30.243 INFO:teuthology.orchestra.run.vpm191.stdout: Active: inactive (dead)
2016-04-12T21:40:30.252 INFO:teuthology.orchestra.run.vpm191.stdout:● ceph.target - ceph target allowing to start/stop all ceph*@.service instances at once
2016-04-12T21:40:30.252 INFO:teuthology.orchestra.run.vpm191.stdout: Loaded: loaded (/usr/lib/systemd/system/ceph.target; enabled; vendor preset: disabled)
2016-04-12T21:40:30.252 INFO:teuthology.orchestra.run.vpm191.stdout: Active: inactive (dead)
2016-04-12T21:40:30.252 INFO:teuthology.orchestra.run.vpm191.stdout:
2016-04-12T21:40:30.253 INFO:teuthology.orchestra.run.vpm191.stdout:Apr 13 04:40:30 vpm191 systemd[1]: Stopped target ceph target allowing to start/stop all ceph*@.service instances at once.
2016-04-12T21:40:30.254 INFO:teuthology.orchestra.run.vpm136:Running: 'sudo ps aux | grep -v grep | grep ceph'
2016-04-12T21:40:30.319 INFO:teuthology.orchestra.run.vpm136.stdout:ceph 4599 0.7 1.4 344592 26464 ? Ssl 04:38 0:00 /usr/bin/ceph-mon -f --cluster ceph --id vpm136 --setuser ceph --setgroup ceph
2016-04-12T21:40:30.320 INFO:teuthology.orchestra.run.vpm136.stdout:ceph 5117 0.0 0.5 360092 10612 ? Ssl 04:38 0:00 /usr/bin/ceph-mds -f --cluster ceph --id vpm136 --setuser ceph --setgroup ceph
2016-04-12T21:40:30.320 INFO:teuthology.orchestra.run.vpm136.stdout:ceph 5931 0.6 2.2 839796 39916 ? Ssl 04:39 0:00 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph
2016-04-12T21:40:30.320 INFO:teuthology.orchestra.run.vpm191:Running: 'sudo ps aux | grep -v grep | grep ceph'
2016-04-12T21:40:30.374 INFO:teuthology.orchestra.run.vpm191.stdout:ceph 1256 0.3 1.2 336376 22192 ? Ssl 04:38 0:00 /usr/bin/ceph-mon -f --cluster ceph --id vpm191 --setuser ceph --setgroup ceph
2016-04-12T21:40:30.374 INFO:teuthology.orchestra.run.vpm191.stdout:ceph 2417 0.6 1.4 840572 26488 ? Ssl 04:39 0:00 /usr/bin/ceph-osd -f --cluster ceph --id 1 --setuser ceph --setgroup ceph
2016-04-12T21:40:30.375 INFO:tasks.ceph_deploy:Archiving mon data...
Updated by Yuri Weinstein about 8 years ago
- Status changed from Rejected to New
Updated by Yuri Weinstein about 8 years ago
Tried reproducing this job with hammer.sh: http://qa-proxy.ceph.com/teuthology/teuthology-2016-04-13_21:13:01-ceph-deploy-jewel-distro-basic-mira/128176/orig.config.yaml
The 1st time through the loop the test passed (see "[ERROR ] RuntimeError: Failed to execute command: rm -rf --one-file-system -- /var/lib/ceph" below)
[0/0] 2016-04-14 13:05:15,340.340 INFO:teuthology.orchestra.run.mira119.stderr:[mira119][INFO ] Running command: sudo rm -rf --one-file-system -- /var/lib/ceph
2016-04-14 13:05:15,355.355 INFO:teuthology.orchestra.run.mira119.stderr:[mira119][WARNING] rm: skipping \342\200\230/var/lib/ceph/osd/ceph-0\342\200\231, since it's on a different device
2016-04-14 13:05:15,356.356 INFO:teuthology.orchestra.run.mira119.stderr:[mira119][ERROR ] RuntimeError: command returned non-zero exit status: 1
2016-04-14 13:05:15,356.356 INFO:teuthology.orchestra.run.mira119.stderr:[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: rm -rf --one-file-system -- /var/lib/ceph
2016-04-14 13:05:15,356.356 INFO:teuthology.orchestra.run.mira119.stderr:
2016-04-14 13:05:15,391.391 INFO:tasks.ceph_deploy:Removing ceph-deploy ...
and the 2nd time through the loop the test failed:
2016-04-14 13:05:19,492.492 INFO:teuthology.task.internal:Checking for old /var/lib/ceph...
2016-04-14 13:05:19,492.492 INFO:teuthology.orchestra.run.mira047:Running: "test '!' -e /var/lib/ceph"
2016-04-14 13:05:19,528.528 INFO:teuthology.orchestra.run.mira119:Running: "test '!' -e /var/lib/ceph"
2016-04-14 13:05:19,532.532 ERROR:teuthology.task.internal:Host mira047 has stale /var/lib/ceph, check lock and nuke/cleanup.
Updated by Dan Mick about 8 years ago
ubuntu@mira119:~$ initctl list | grep ceph
ceph-osd (ceph/0) start/running, process 8122
I was going to look at the log for the run to see exactly what it did, until I realized I don't know where the log is (or if there is one).
Updated by Sage Weil about 8 years ago
- Status changed from New to Fix Under Review
https://github.com/ceph/ceph/pull/8617
Notes for this ^ PR
Per Sage's suggestion, I re-ran the same job, stopped it after the "hello_world" test, and on mira084 verified ps before/after stopping ceph-all:
ubuntu@mira084:~$ ps ax | grep ceph
 7077 ?        Ssl  0:00 /usr/bin/ceph-mon --cluster=ceph -i mira084 -f --setuser ceph --setgroup ceph
 8068 ?        Ssl  0:01 /usr/bin/ceph-osd --cluster=ceph -i 1 -f --setuser ceph --setgroup ceph
 8378 pts/3    S+   0:00 grep --color=auto ceph
ubuntu@mira084:~$ initctl list | grep ceph
ceph-mds-all-starter stop/waiting
ceph-mds-all start/running
ceph-osd-all stop/waiting
ceph-osd-all-starter stop/waiting
ceph-all start/running
ceph-mon-all stop/waiting
ceph-mon-all-starter stop/waiting
ceph-mon (ceph/mira084) start/running, process 7077
ceph-disk stop/waiting
ceph-create-keys stop/waiting
ceph-osd (ceph/1) start/running, process 8068
ceph-mds stop/waiting
ubuntu@mira084:~$ stop ceph-all
stop: Rejected send message, 1 matched rules; type="method_call", sender=":1.90" (uid=1000 pid=8381 comm="stop ceph-all ") interface="com.ubuntu.Upstart0_6.Job" member="Stop" error name="(unset)" requested_reply="0" destination="com.ubuntu.Upstart" (uid=0 pid=1 comm="/sbin/init")
ubuntu@mira084:~$ sudo stop ceph-all
ceph-all stop/waiting
ubuntu@mira084:~$ ps ax | grep ceph
 7077 ?        Ssl  0:00 /usr/bin/ceph-mon --cluster=ceph -i mira084 -f --setuser ceph --setgroup ceph
 8068 ?        Ssl  0:01 /usr/bin/ceph-osd --cluster=ceph -i 1 -f --setuser ceph --setgroup ceph
 8388 pts/3    S+   0:00 grep --color=auto ceph
ubuntu@mira084:~$ initctl list | grep ceph
ceph-mds-all-starter stop/waiting
ceph-mds-all stop/waiting
ceph-osd-all stop/waiting
ceph-osd-all-starter stop/waiting
ceph-all stop/waiting
ceph-mon-all stop/waiting
ceph-mon-all-starter stop/waiting
ceph-mon (ceph/mira084) start/running, process 7077
ceph-disk stop/waiting
ceph-create-keys stop/waiting
ceph-osd (ceph/1) start/running, process 8068
ceph-mds stop/waiting
Updated by Yuri Weinstein about 8 years ago
@Dan logs are in yuriw@teuthology ~/_run_tests.yaml.out; there is not much in it, as it was run with hammer.sh and only the last run's logs were saved.
Updated by Yuri Weinstein about 8 years ago
Looks working now on wip-sage-testing2
sha1: 3bab4ee1d2dc4d1fd2fbbbb18744a03399d05e1c
The same job on that sha1
ubuntu@mira084:~$ ps ax|grep ceph
 7125 ?        Ssl  0:00 /usr/bin/ceph-mon --cluster=ceph -i mira084 -f --setuser ceph --setgroup ceph
 8112 ?        Ssl  0:00 /usr/bin/ceph-osd --cluster=ceph -i 1 -f --setuser ceph --setgroup ceph
 8421 pts/3    S+   0:00 grep --color=auto ceph
ubuntu@mira084:~$ sudo ceph -s^C
ubuntu@mira084:~$ initctl list | grep ceph
ceph-mds-all-starter stop/waiting
ceph-mds-all start/running
ceph-osd-all start/running
ceph-osd-all-starter stop/waiting
ceph-all start/running
ceph-mon-all start/running
ceph-mon-all-starter stop/waiting
ceph-mon (ceph/mira084) start/running, process 7125
ceph-disk stop/waiting
ceph-create-keys stop/waiting
ceph-osd (ceph/1) start/running, process 8112
ceph-mds stop/waiting
ubuntu@mira084:~$ sudo stop ceph-all
ceph-all stop/waiting
ubuntu@mira084:~$ ps ax|grep ceph
 8456 pts/3    S+   0:00 grep --color=auto ceph
ubuntu@mira084:~$ initctl list | grep ceph
ceph-mds-all-starter stop/waiting
ceph-mds-all stop/waiting
ceph-osd-all stop/waiting
ceph-osd-all-starter stop/waiting
ceph-all stop/waiting
ceph-mon-all stop/waiting
ceph-mon-all-starter stop/waiting
ceph-mon stop/waiting
ceph-disk stop/waiting
ceph-create-keys stop/waiting
ceph-osd stop/waiting
ceph-mds stop/waiting
ubuntu@mira084:~$
And I was able to run this test with hammer.sh several times, which was not possible before the fix.
Updated by Dan Mick about 8 years ago
The proposed fix starts the ceph-*-all jobs at package installation time; this won't actually start any services, but it puts the daemons' -all jobs in the started state, so that at stop time (whether from the package prerm or invoked explicitly, as from teuthology) stopping ceph-all actually stops everything.
(ceph-all is already started in the ceph postinst script)
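The mechanism described above could be sketched like this (a hypothetical illustration of the approach in PR 8617; the actual packaging code may differ):

```shell
# Hypothetical postinst helper: put a daemon package's upstart "-all"
# job into the started state at install time. The job starts no
# services by itself, but once "started" it receives the stop cascade
# when 'stop ceph-all' runs, so the per-daemon jobs get stopped too.
ensure_all_job_started() {
  # '|| :' tolerates a job that is already running, or a host without
  # upstart, so the maintainer script never fails on this step
  start "ceph-$1-all" >/dev/null 2>&1 || :
}

# e.g. in ceph-osd's postinst configure step:
#   ensure_all_job_started osd
```

This matches the symptom in the earlier initctl output, where ceph-osd-all and ceph-mon-all sat in stop/waiting while the individual ceph-mon/ceph-osd jobs were start/running, so stopping ceph-all had nothing to cascade to.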
Updated by Sage Weil about 8 years ago
- Status changed from Fix Under Review to Resolved
Updated by Yuri Weinstein over 6 years ago
- Related to Bug #21482: "file changed as we read it" in ceph-deploy-jewel added