Bug #53010: cephadm rm-cluster does not clean up /var/run/ceph

Added by Laura Flores over 2 years ago. Updated about 2 years ago.

Status: Resolved
Priority: Normal
Category: cephadm (binary)
Target version: -
% Done: 0%
Source: -
Tags: low-hanging-fruit
Backport: -
Regression: No
Severity: 3 - minor
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Pull request ID: 44779
Crash signature (v1): -
Crash signature (v2): -

Description

teuthology.exceptions.CommandFailedError: Command failed with status 1: ['../src/stop.sh']

This API test failure has been occurring across PR Jenkins builds. It shows up sporadically: some PRs pass the test while others fail, and the failure does not seem related to any of the changes in the PRs where it occurs.

Here, I have copied the Python traceback along with some surrounding output to provide more context.

Collecting pytz
  Using cached pytz-2021.3-py2.py3-none-any.whl (503 kB)
Installing collected packages: more-itertools, pytz, jaraco.functools, tempora, repoze.lru, portend, idna, cheroot, chardet, Routes, requests, pyopenssl, PyJWT, CherryPy, ceph, bcrypt
  Attempting uninstall: idna
    Found existing installation: idna 3.3
    Uninstalling idna-3.3:
      Successfully uninstalled idna-3.3
  Attempting uninstall: requests
    Found existing installation: requests 2.26.0
    Uninstalling requests-2.26.0:
      Successfully uninstalled requests-2.26.0
  Attempting uninstall: PyJWT
    Found existing installation: PyJWT 2.3.0
    Uninstalling PyJWT-2.3.0:
      Successfully uninstalled PyJWT-2.3.0
  Running setup.py develop for ceph
  Attempting uninstall: bcrypt
    Found existing installation: bcrypt 3.2.0
    Uninstalling bcrypt-3.2.0:
      Successfully uninstalled bcrypt-3.2.0
Successfully installed CherryPy-13.1.0 PyJWT-2.0.1 Routes-2.4.1 bcrypt-3.1.4 ceph-1.0.0 chardet-4.0.0 cheroot-8.5.2 idna-2.10 jaraco.functools-3.3.0 more-itertools-4.1.0 portend-3.0.0 pyopenssl-21.0.0 pytz-2021.3 repoze.lru-0.7 requests-2.25.1 tempora-4.1.2
/tmp/tmp.mAQFsq4fJ8
Processing /home/jenkins-build/.cache/pip/wheels/d8/81/0a/fae9efd3c9c706cefa25842310896e727a46567f2dc2dac6a8/coverage-4.5.2-cp38-cp38-linux_x86_64.whl
Installing collected packages: coverage
Successfully installed coverage-4.5.2
Cannot find device "ceph-brx" 
2021-10-21 14:21:10,353.353 INFO:__main__:Creating cluster with 1 MDS daemons
2021-10-21 14:21:10,354.354 INFO:__main__:
tearing down the cluster...
rm: cannot remove '/var/run/ceph': Permission denied
Using guessed paths /home/jenkins-build/build/workspace/ceph-api/build/lib/ ['/home/jenkins-build/build/workspace/ceph-api/qa', '/home/jenkins-build/build/workspace/ceph-api/build/lib/cython_modules/lib.3', '/home/jenkins-build/build/workspace/ceph-api/src/pybind']
Traceback (most recent call last):
  File "../qa/tasks/vstart_runner.py", line 1522, in <module>
    exec_test()
  File "../qa/tasks/vstart_runner.py", line 1357, in exec_test
    teardown_cluster()
  File "../qa/tasks/vstart_runner.py", line 1091, in teardown_cluster
    remote.run(args=[os.path.join(SRC_PREFIX, "stop.sh")], timeout=60)
  File "../qa/tasks/vstart_runner.py", line 410, in run
    return self._do_run(**kwargs)
  File "../qa/tasks/vstart_runner.py", line 478, in _do_run
    proc.wait()
  File "../qa/tasks/vstart_runner.py", line 221, in wait
    raise CommandFailedError(self.args, self.exitstatus)
teuthology.exceptions.CommandFailedError: Command failed with status 1: ['../src/stop.sh']
find: ‘/home/jenkins-build/build/workspace/ceph-api/build/out’: No such file or directory
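For reference, the call path in the traceback reduces to roughly the following (a minimal sketch reconstructed from the traceback above; the function bodies are illustrative assumptions, not the actual vstart_runner code):

import os
import subprocess

SRC_PREFIX = "../src"  # stop.sh lives in the source tree

class CommandFailedError(Exception):
    # Stand-in for teuthology.exceptions.CommandFailedError.
    def __init__(self, args, exitstatus):
        super().__init__(f"Command failed with status {exitstatus}: {args}")

def teardown_cluster():
    # stop.sh removes runtime state, including /var/run/ceph; when that
    # rm fails with "Permission denied", stop.sh exits nonzero ...
    args = [os.path.join(SRC_PREFIX, "stop.sh")]
    proc = subprocess.run(args, timeout=60)
    if proc.returncode != 0:
        # ... and the runner raises CommandFailedError, aborting the run.
        raise CommandFailedError(args, proc.returncode)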
Sample run:

Related issues (5; 0 open, 5 closed)

Related to Orchestrator - Bug #46655: cephadm rm-cluster: Systemd ceph.target not deleted (Resolved; Redouane Kachach Elhichou)
Related to Orchestrator - Bug #44669: cephadm: rm-cluster should clean up /etc/ceph (Resolved; Daniel Pivonka)
Related to Orchestrator - Feature #53815: cephadm rm-cluster should delete log files (Resolved; Redouane Kachach Elhichou)
Related to Orchestrator - Bug #54018: Suspicious behavior when deleting a cluster (by running cephadm rm-cluster) (Resolved; Redouane Kachach Elhichou)
Related to Orchestrator - Bug #54142: quincy cephadm-purge-cluster needs work (Resolved; Redouane Kachach Elhichou)
#1 - Updated by Laura Flores over 2 years ago

The issue seems to occur during a "tearing down the cluster..." step.

Successful API test run:

2021-10-19 21:36:10,384.384 INFO:__main__:Creating cluster with 1 MDS daemons
2021-10-19 21:36:10,384.384 INFO:__main__:
tearing down the cluster...
2021-10-19 21:36:12,050.050 INFO:__main__:
ceph cluster torn down
2021-10-19 21:36:12,059.059 INFO:__main__:
running vstart.sh now...
2021-10-19 21:37:08,783.783 INFO:__main__:
vstart.sh finished running
Using guessed paths /home/jenkins-build/build/workspace/ceph-api/build/lib/ ['/home/jenkins-build/build/workspace/ceph-api/qa', '/home/jenkins-build/build/workspace/ceph-api/build/lib/cython_modules/lib.3', '/home/jenkins-build/build/workspace/ceph-api/src/pybind']

Failed API test run:

2021-10-22 02:43:37,352.352 INFO:__main__:Creating cluster with 1 MDS daemons
2021-10-22 02:43:37,353.353 INFO:__main__:
tearing down the cluster...
rm: cannot remove '/var/run/ceph': Permission denied
Using guessed paths /home/jenkins-build/build/workspace/ceph-api/build/lib/ ['/home/jenkins-build/build/workspace/ceph-api/qa', '/home/jenkins-build/teuthology', '/home/jenkins-build/build/workspace/ceph-api/build/lib/cython_modules/lib.3', '/home/jenkins-build/build/workspace/ceph-api/src/pybind']

Perhaps we need to use sudo to ensure that /var/run/ceph can be removed?
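For example, the teardown could fall back to sudo when the unprivileged rm fails (a hypothetical sketch; remove_run_dir is an illustrative helper, not existing vstart/stop.sh code):

import subprocess

def remove_run_dir(path="/var/run/ceph"):
    # Hypothetical helper: remove the ceph runtime directory, retrying
    # with sudo when the plain rm hits "Permission denied" as in the
    # failed run above.
    result = subprocess.run(["rm", "-rf", path])
    if result.returncode != 0:
        subprocess.run(["sudo", "rm", "-rf", path], check=True)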

#2 - Updated by Ernesto Puerta over 2 years ago

  • Description updated (diff)
  • Status changed from New to In Progress
  • Assignee set to Ernesto Puerta
#3 - Updated by Ernesto Puerta over 2 years ago

David found that the issue could be caused by leftovers from this Jenkins job: https://github.com/ceph/ceph-build/pull/1922/#issuecomment-952062596

The underlying issue could be in cephadm itself, as it seems that cephadm rm-cluster --fsid $FSID --force is not enough to clean up everything in /var/run/ceph.
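In other words, rm-cluster would also need a step that clears the cluster's runtime directory. A sketch of the kind of cleanup that appears to be missing (not cephadm's actual implementation; the per-fsid directory under /var/run/ceph is an assumption here):

import shutil
from pathlib import Path

def cleanup_run_dir(fsid: str) -> None:
    # Hypothetical cleanup step for `cephadm rm-cluster`: remove the
    # per-cluster runtime directory /var/run/ceph/<fsid> so that no
    # root-owned leftovers block a later unprivileged teardown.
    shutil.rmtree(Path("/var/run/ceph") / fsid, ignore_errors=True)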

#4 - Updated by Sebastian Wagner over 2 years ago

  • Project changed from teuthology to Orchestrator
  • Subject changed from teuthology.exceptions.CommandFailedError: Command failed with status 1: ['../src/stop.sh'] to cephadm rm-cluster does not clean up /var/run/ceph
  • Description updated (diff)
  • Category changed from QA Suite to cephadm (binary)
#5 - Updated by Sebastian Wagner over 2 years ago

Seems as if cephadm does not clean up /var/run/ceph.

#6 - Updated by Sebastian Wagner over 2 years ago

  • Related to Bug #46655: cephadm rm-cluster: Systemd ceph.target not deleted added
#7 - Updated by Sebastian Wagner over 2 years ago

  • Status changed from In Progress to New
  • Assignee deleted (Ernesto Puerta)
#8 - Updated by Sebastian Wagner over 2 years ago

  • Related to Bug #44669: cephadm: rm-cluster should clean up /etc/ceph added
#9 - Updated by Sebastian Wagner over 2 years ago

  • Related to Feature #53815: cephadm rm-cluster should delete log files added
#10 - Updated by Sebastian Wagner about 2 years ago

  • Tags set to low-hanging-fruit
#11 - Updated by Redouane Kachach Elhichou about 2 years ago

  • Assignee set to Redouane Kachach Elhichou
#12 - Updated by Redouane Kachach Elhichou about 2 years ago

  • Related to Bug #54018: Suspicious behavior when deleting a cluster (by running cephadm rm-cluster) added
#13 - Updated by Redouane Kachach Elhichou about 2 years ago

  • Status changed from New to Fix Under Review
#15 - Updated by Redouane Kachach Elhichou about 2 years ago

  • Status changed from Fix Under Review to Closed
#16 - Updated by Redouane Kachach Elhichou about 2 years ago

  • Status changed from Closed to Resolved
#17 - Updated by Redouane Kachach Elhichou about 2 years ago

  • Pull request ID set to 44779
#18 - Updated by Redouane Kachach Elhichou about 2 years ago

  • Related to Bug #54142: quincy cephadm-purge-cluster needs work added