Bug #8850

closed

ceph-deploy tests fail during tar due to file changed; incomplete shutdown?

Added by Sage Weil almost 10 years ago. Updated over 9 years ago.

Status: Can't reproduce
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ubuntu@teuthology:/a/teuthology-2014-07-15_19:08:01-ceph-deploy-dumpling-testing-basic-plana/363933

and others.

2014-07-15T23:26:28.943 INFO:teuthology.task.ceph_deploy:Stopping ceph...
2014-07-15T23:26:28.943 DEBUG:teuthology.orchestra.run:Running [10.214.131.7]: 'sudo stop ceph-all || sudo service ceph stop'
2014-07-15T23:26:29.459 INFO:teuthology.orchestra.run.out:[10.214.131.7]: ceph-all stop/waiting
2014-07-15T23:26:29.462 DEBUG:teuthology.orchestra.run:Running [10.214.131.6]: 'sudo stop ceph-all || sudo service ceph stop'
2014-07-15T23:26:29.482 INFO:teuthology.orchestra.run.out:[10.214.131.6]: ceph-all stop/waiting
2014-07-15T23:26:29.485 DEBUG:teuthology.orchestra.run:Running [10.214.132.17]: 'sudo stop ceph-all || sudo service ceph stop'
2014-07-15T23:26:30.591 INFO:teuthology.orchestra.run.out:[10.214.132.17]: ceph-all stop/waiting
2014-07-15T23:26:30.593 INFO:teuthology.task.ceph_deploy:Archiving mon data...
2014-07-15T23:26:30.596 DEBUG:teuthology.misc:Transferring archived files from ubuntu@plana61.front.sepia.ceph.com:/var/lib/ceph/mon to /var/lib/teuthworker/archive/teuthology-2014-07-15_19:08:01-ceph-deploy-dumpling-testing-basic-plana/363933/data/mon.a.tgz
2014-07-15T23:26:30.598 DEBUG:teuthology.orchestra.run:Running [10.214.132.17]: 'sudo tar cz -f - -C /var/lib/ceph/mon -- .'
2014-07-15T23:26:31.097 INFO:teuthology.orchestra.run.err:[10.214.132.17]: tar: ./ceph-plana61/store.db/000006.log: file changed as we read it
2014-07-15T23:26:31.102 INFO:teuthology.task.ceph_deploy:Removing ceph-deploy ...
2014-07-15T23:26:31.103 DEBUG:teuthology.orchestra.run:Running [10.214.132.17]: 'rm -rf /home/ubuntu/cephtest/ceph-deploy'
2014-07-15T23:26:31.205 INFO:teuthology.task.ceph:Removing shipped files: daemon-helper adjust-ulimits chdir-coredump valgrind.supp kcon_most...
2014-07-15T23:26:31.206 DEBUG:teuthology.orchestra.run:Running [10.214.131.7]: 'rm -rf -- /home/ubuntu/cephtest/daemon-helper /home/ubuntu/cephtest/adjust-ulimits /home/ubuntu/cephtest/chdir-coredump /home/ubuntu/cephtest/valgrind.supp /home/ubuntu/cephtest/kcon_most'
2014-07-15T23:26:31.209 DEBUG:teuthology.orchestra.run:Running [10.214.131.6]: 'rm -rf -- /home/ubuntu/cephtest/daemon-helper /home/ubuntu/cephtest/adjust-ulimits /home/ubuntu/cephtest/chdir-coredump /home/ubuntu/cephtest/valgrind.supp /home/ubuntu/cephtest/kcon_most'
2014-07-15T23:26:31.212 DEBUG:teuthology.orchestra.run:Running [10.214.132.17]: 'rm -rf -- /home/ubuntu/cephtest/daemon-helper /home/ubuntu/cephtest/adjust-ulimits /home/ubuntu/cephtest/chdir-coredump /home/ubuntu/cephtest/valgrind.supp /home/ubuntu/cephtest/kcon_most'
2014-07-15T23:26:31.252 ERROR:teuthology.run_tasks:Manager failed: <contextlib.GeneratorContextManager object at 0x2f3e6d0>
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_dumpling/teuthology/run_tasks.py", line 49, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/teuthworker/src/teuthology_dumpling/teuthology/task/ceph_deploy.py", line 435, in task
    yield
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/teuthworker/src/teuthology_dumpling/teuthology/contextutil.py", line 35, in nested
    if exit(*exc):
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/teuthworker/src/teuthology_dumpling/teuthology/task/ceph_deploy.py", line 336, in build_ceph_cluster
    path + '/' + role + '.tgz')
  File "/home/teuthworker/src/teuthology_dumpling/teuthology/misc.py", line 508, in pull_directory_tarball
    proc.exitstatus.get()
  File "/usr/lib/python2.7/dist-packages/gevent/event.py", line 223, in get
    raise self._exception
CommandFailedError: Command failed on 10.214.132.17 with status 1: 'sudo tar cz -f - -C /var/lib/ceph/mon -- .'
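
For context on the failure above: GNU tar documents exit status 1 during archive creation as "some files were changed while being archived", and status 2 as a fatal error. The log shows status 1 together with a "file changed as we read it" message. The following is only a minimal sketch of how a caller could tell these cases apart; `classify_tar_exit` is a hypothetical helper, not part of teuthology, and it does not explain why the mon store was still being written to.

```python
def classify_tar_exit(status, stderr_text=''):
    """Classify the result of a GNU tar create run.

    Hypothetical helper for illustration only (not a teuthology API).
    Returns 'ok', 'changed', or 'error'.
    """
    if status == 0:
        # Archive was written and no file changed while it was read.
        return 'ok'
    if status == 1 and 'file changed as we read it' in stderr_text:
        # GNU tar exit status 1 on create: a file changed during archiving,
        # so the archive may not hold an exact copy of that file.
        return 'changed'
    # Anything else (including status 2) is treated as a hard failure.
    return 'error'


# Example using the stderr line from the log in this report.
stderr_line = 'tar: ./ceph-plana61/store.db/000006.log: file changed as we read it'
result = classify_tar_exit(1, stderr_line)
```

Whether a 'changed' result should fail the job is a separate decision; a file changing after the daemons were told to stop is itself the symptom this bug is about.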

Actions #1

Updated by Alfredo Deza over 9 years ago

  • Assignee set to Alfredo Deza
Actions #2

Updated by Alfredo Deza over 9 years ago

  • Project changed from devops to teuthology
  • Category deleted (ceph-deploy)
Actions #3

Updated by Alfredo Deza over 9 years ago

  • Project changed from teuthology to devops

I initially thought that the ceph daemon was still running, but according to the upstart docs, this output:

stop/waiting

means that the daemon has really, truly stopped.

Instead of adding code that checks the status and acts on it, I am initially more inclined to add more logging that tells us the status, so that we are not left in the dark, making assumptions about the state of the daemons.
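
As a rough illustration of the kind of status logging described here, the sketch below parses a line of Upstart status output (such as the `ceph-all stop/waiting` lines in the log) into its goal and state so they can be logged explicitly. The function names and the regular expression are assumptions made for this example; they are not taken from teuthology or from the linked pull request.

```python
import re

# Hypothetical pattern for one line of Upstart status output, for example
# "ceph-all stop/waiting" or "ceph-mon (ceph/plana61) start/running, process 1234".
_STATUS_RE = re.compile(
    r'^(?P<job>\S+)(?: \((?P<instance>[^)]*)\))? '
    r'(?P<goal>\w+)/(?P<state>[\w-]+)'
    r'(?:, process (?P<pid>\d+))?'
)


def parse_upstart_status(line):
    """Return a dict with job, goal, state and pid, or None if unparseable."""
    match = _STATUS_RE.match(line.strip())
    if match is None:
        return None
    pid = match.group('pid')
    return {
        'job': match.group('job'),
        'goal': match.group('goal'),
        'state': match.group('state'),
        'pid': int(pid) if pid else None,
    }


def is_stopped(line):
    """True only when the job reports goal 'stop' and state 'waiting'."""
    info = parse_upstart_status(line)
    return info is not None and info['goal'] == 'stop' and info['state'] == 'waiting'
```

For example, `is_stopped('ceph-all stop/waiting')` is true, while a line such as `ceph-mon start/running, process 1234` yields a running state and a PID that could be written to the log before the mon data is archived.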

Actions #4

Updated by Alfredo Deza over 9 years ago

  • Status changed from New to 7

An initial take on getting more information about what is going on:

https://github.com/ceph/teuthology/pull/302/files

Actions #5

Updated by Sage Weil over 9 years ago

  • Status changed from 7 to Can't reproduce