Bug #53010: cephadm rm-cluster does not clean up /var/run/ceph
Status: Closed
% Done: 0%
Description
teuthology.exceptions.CommandFailedError: Command failed with status 1: ['../src/stop.sh']
This API test failure has been occurring across PR Jenkins builds. It shows up sporadically: some PR builds pass the test while others fail, and the failures do not appear to be related to any changes in the PRs in which they occur.
Below I have copied the Python traceback along with some surrounding output for context.
Collecting pytz
  Using cached pytz-2021.3-py2.py3-none-any.whl (503 kB)
Installing collected packages: more-itertools, pytz, jaraco.functools, tempora, repoze.lru, portend, idna, cheroot, chardet, Routes, requests, pyopenssl, PyJWT, CherryPy, ceph, bcrypt
  Attempting uninstall: idna
    Found existing installation: idna 3.3
    Uninstalling idna-3.3:
      Successfully uninstalled idna-3.3
  Attempting uninstall: requests
    Found existing installation: requests 2.26.0
    Uninstalling requests-2.26.0:
      Successfully uninstalled requests-2.26.0
  Attempting uninstall: PyJWT
    Found existing installation: PyJWT 2.3.0
    Uninstalling PyJWT-2.3.0:
      Successfully uninstalled PyJWT-2.3.0
  Running setup.py develop for ceph
  Attempting uninstall: bcrypt
    Found existing installation: bcrypt 3.2.0
    Uninstalling bcrypt-3.2.0:
      Successfully uninstalled bcrypt-3.2.0
Successfully installed CherryPy-13.1.0 PyJWT-2.0.1 Routes-2.4.1 bcrypt-3.1.4 ceph-1.0.0 chardet-4.0.0 cheroot-8.5.2 idna-2.10 jaraco.functools-3.3.0 more-itertools-4.1.0 portend-3.0.0 pyopenssl-21.0.0 pytz-2021.3 repoze.lru-0.7 requests-2.25.1 tempora-4.1.2
/tmp/tmp.mAQFsq4fJ8
Processing /home/jenkins-build/.cache/pip/wheels/d8/81/0a/fae9efd3c9c706cefa25842310896e727a46567f2dc2dac6a8/coverage-4.5.2-cp38-cp38-linux_x86_64.whl
Installing collected packages: coverage
Successfully installed coverage-4.5.2
Cannot find device "ceph-brx"
2021-10-21 14:21:10,353.353 INFO:__main__:Creating cluster with 1 MDS daemons
2021-10-21 14:21:10,354.354 INFO:__main__:
tearing down the cluster...
rm: cannot remove '/var/run/ceph': Permission denied
Using guessed paths /home/jenkins-build/build/workspace/ceph-api/build/lib/ ['/home/jenkins-build/build/workspace/ceph-api/qa', '/home/jenkins-build/build/workspace/ceph-api/build/lib/cython_modules/lib.3', '/home/jenkins-build/build/workspace/ceph-api/src/pybind']
Traceback (most recent call last):
  File "../qa/tasks/vstart_runner.py", line 1522, in <module>
    exec_test()
  File "../qa/tasks/vstart_runner.py", line 1357, in exec_test
    teardown_cluster()
  File "../qa/tasks/vstart_runner.py", line 1091, in teardown_cluster
    remote.run(args=[os.path.join(SRC_PREFIX, "stop.sh")], timeout=60)
  File "../qa/tasks/vstart_runner.py", line 410, in run
    return self._do_run(**kwargs)
  File "../qa/tasks/vstart_runner.py", line 478, in _do_run
    proc.wait()
  File "../qa/tasks/vstart_runner.py", line 221, in wait
    raise CommandFailedError(self.args, self.exitstatus)
teuthology.exceptions.CommandFailedError: Command failed with status 1: ['../src/stop.sh']
find: ‘/home/jenkins-build/build/workspace/ceph-api/build/out’: No such file or directory
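The traceback boils down to a simple mechanism: vstart_runner runs stop.sh as a subprocess, and wait() raises CommandFailedError whenever the script exits nonzero; here stop.sh exits nonzero because the rm of /var/run/ceph is denied. A minimal sketch of that flow follows (the class and helper below are illustrative stand-ins, not the actual teuthology/vstart_runner code):

    import subprocess

    class CommandFailedError(Exception):
        # Stand-in for teuthology.exceptions.CommandFailedError.
        def __init__(self, args, exitstatus):
            super().__init__(f"Command failed with status {exitstatus}: {args}")
            self.exitstatus = exitstatus

    def run(args, timeout=60):
        # stop.sh tries to remove /var/run/ceph during teardown; when that
        # directory is root-owned (left behind by an earlier cephadm run),
        # rm prints "Permission denied" and the script exits nonzero.
        proc = subprocess.run(args, timeout=timeout)
        if proc.returncode != 0:
            raise CommandFailedError(args, proc.returncode)

    run(['../src/stop.sh'])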
Sample run:
Updated by Laura Flores over 2 years ago
The issue seems to occur during the "tearing down the cluster..." step.
Successful API test run:
2021-10-19 21:36:10,384.384 INFO:__main__:Creating cluster with 1 MDS daemons
2021-10-19 21:36:10,384.384 INFO:__main__:
tearing down the cluster...
2021-10-19 21:36:12,050.050 INFO:__main__:
ceph cluster torn down
2021-10-19 21:36:12,059.059 INFO:__main__:
running vstart.sh now...
2021-10-19 21:37:08,783.783 INFO:__main__:
vstart.sh finished running
Using guessed paths /home/jenkins-build/build/workspace/ceph-api/build/lib/ ['/home/jenkins-build/build/workspace/ceph-api/qa', '/home/jenkins-build/build/workspace/ceph-api/build/lib/cython_modules/lib.3', '/home/jenkins-build/build/workspace/ceph-api/src/pybind']
Failed API test run:
2021-10-22 02:43:37,352.352 INFO:__main__:Creating cluster with 1 MDS daemons
2021-10-22 02:43:37,353.353 INFO:__main__:
tearing down the cluster...
rm: cannot remove '/var/run/ceph': Permission denied
Using guessed paths /home/jenkins-build/build/workspace/ceph-api/build/lib/ ['/home/jenkins-build/build/workspace/ceph-api/qa', '/home/jenkins-build/teuthology', '/home/jenkins-build/build/workspace/ceph-api/build/lib/cython_modules/lib.3', '/home/jenkins-build/build/workspace/ceph-api/src/pybind']
Perhaps we need to run the teardown with sudo to ensure that /var/run/ceph can be removed?
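A minimal sketch of what that could look like in the teardown path, assuming a hypothetical helper (this is not the actual vstart_runner or stop.sh change):

    import subprocess

    def remove_run_dir(path='/var/run/ceph'):
        # Hypothetical helper: try an unprivileged removal first, which
        # succeeds when the directory belongs to the build user.
        if subprocess.run(['rm', '-rf', path]).returncode == 0:
            return
        # If the directory was left behind by a root-owned cephadm cluster,
        # only a privileged removal can delete it.
        subprocess.run(['sudo', 'rm', '-rf', path], check=True)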
Updated by Ernesto Puerta over 2 years ago
- Description updated (diff)
- Status changed from New to In Progress
- Assignee set to Ernesto Puerta
Updated by Ernesto Puerta over 2 years ago
David found that the issue could come from leftovers from this Jenkins job: https://github.com/ceph/ceph-build/pull/1922/#issuecomment-952062596
The underlying issue could be in cephadm: it seems that "cephadm rm-cluster --fsid $FSID --force" is not enough to clean up everything in /var/run/ceph.
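For illustration only, this is the kind of extra cleanup step rm-cluster could perform; the function name and the assumption that runtime state lives under /var/run/ceph/<fsid> are mine, and the actual change is in the pull request referenced further down:

    import shutil
    from pathlib import Path

    def cleanup_run_dir(fsid: str) -> None:
        # Remove the per-cluster runtime directory (admin sockets, pid
        # files), then drop the parent directory too if it is now empty.
        run_dir = Path('/var/run/ceph') / fsid
        shutil.rmtree(run_dir, ignore_errors=True)
        parent = run_dir.parent
        if parent.is_dir() and not any(parent.iterdir()):
            parent.rmdir()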
Updated by Sebastian Wagner over 2 years ago
- Project changed from teuthology to Orchestrator
- Subject changed from teuthology.exceptions.CommandFailedError: Command failed with status 1: ['../src/stop.sh'] to cephadm rm-cluster does not clean up /var/run/ceph
- Description updated (diff)
- Category changed from QA Suite to cephadm (binary)
Updated by Sebastian Wagner over 2 years ago
It seems as if cephadm does not clean up /var/run/ceph.
Updated by Sebastian Wagner over 2 years ago
- Related to Bug #46655: cephadm rm-cluster: Systemd ceph.target not deleted added
Updated by Sebastian Wagner over 2 years ago
- Status changed from In Progress to New
- Assignee deleted (Ernesto Puerta)
Updated by Sebastian Wagner over 2 years ago
- Related to Bug #44669: cephadm: rm-cluster should clean up /etc/ceph added
Updated by Sebastian Wagner over 2 years ago
- Related to Feature #53815: cephadm rm-cluster should delete log files added
Updated by Redouane Kachach Elhichou about 2 years ago
- Assignee set to Redouane Kachach Elhichou
Updated by Redouane Kachach Elhichou about 2 years ago
- Related to Bug #54018: Suspicious behavior when deleting a cluster (by running cephadm rm-cluster) added
Updated by Redouane Kachach Elhichou about 2 years ago
- Status changed from New to Fix Under Review
Updated by Redouane Kachach Elhichou about 2 years ago
Fixed by: https://github.com/ceph/ceph/pull/44779
Updated by Redouane Kachach Elhichou about 2 years ago
- Status changed from Fix Under Review to Closed
Updated by Redouane Kachach Elhichou about 2 years ago
- Status changed from Closed to Resolved
Updated by Redouane Kachach Elhichou about 2 years ago
- Pull request ID set to 44779
Updated by Redouane Kachach Elhichou about 2 years ago
- Related to Bug #54142: quincy cephadm-purge-cluster needs work added