Bug #10370
Closed: "MaxWhileTries: 'wait_until_healthy'reached maximum tries (150) after waiting for 900 seconds" in upgrade:dumpling-firefly-x:stress-split-next-distro-basic-vps run
Status: Duplicate
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2014-12-17T23:48:55.902 INFO:tasks.ceph.mon.c.vpm157.stderr:2014-12-18 02:48:55.901685 7f098fba4700 -1 mon.c@2(peon).mds e5 Missing health data for MDS 4112
2014-12-17T23:48:55.916 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN 5 pgs peering; 7 pgs stuck inactive; 7 pgs stuck unclean
2014-12-17T23:49:02.917 INFO:teuthology.orchestra.run.vpm157:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph health'
2014-12-17T23:49:03.175 INFO:tasks.ceph.mon.a.vpm157.stderr:2014-12-18 02:49:03.175628 7fdca36ec700 -1 mon.a@0(leader).mds e5 Missing health data for MDS 4112
2014-12-17T23:49:03.187 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN 5 pgs peering; 7 pgs stuck inactive; 7 pgs stuck unclean
2014-12-17T23:49:04.187 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 55, in run_tasks
    manager.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_next/tasks/ceph.py", line 1090, in restart
    healthy(ctx=ctx, config=None)
  File "/var/lib/teuthworker/src/ceph-qa-suite_next/tasks/ceph.py", line 995, in healthy
    remote=mon0_remote,
  File "/home/teuthworker/src/teuthology_master/teuthology/misc.py", line 853, in wait_until_healthy
    while proceed():
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 133, in __call__
    raise MaxWhileTries(error_msg)
MaxWhileTries: 'wait_until_healthy'reached maximum tries (150) after waiting for 900 seconds
Updated by Greg Farnum over 9 years ago
- Project changed from CephFS to Ceph
- Subject changed from "mds e5 Missing health data for MDS 4112" in upgrade:dumpling-firefly-x:stress-split-next-distro-basic-vps run to "MaxWhileTries: 'wait_until_healthy'reached maximum tries (150) after waiting for 900 seconds" in upgrade:dumpling-firefly-x:stress-split-next-distro-basic-vps run
Sure looks to me like the error here is the HEALTH_WARN on stuck PGs, rather than the MDS not reporting health (presumably because it's a Dumpling MDS with upgraded Giant monitors).
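For readers unfamiliar with the failure mode, here is a minimal sketch of a bounded health-polling loop of the kind the traceback points at. It is not teuthology's actual code: the function signature, the `MaxTriesExceeded` class, and the injectable `get_health` and `sleep_fn` parameters are all assumptions made for illustration. The 6-second interval is inferred from 900 seconds / 150 tries and is not stated in the report.

```python
import time


class MaxTriesExceeded(Exception):
    """Hypothetical stand-in for the MaxWhileTries error seen in the log."""


def wait_until_healthy(get_health, tries=150, sleep=6, sleep_fn=time.sleep):
    """Poll get_health() until it reports HEALTH_OK or the tries run out.

    get_health is any callable returning a status string such as
    "HEALTH_WARN 5 pgs peering; 7 pgs stuck inactive". All parameter
    names here are illustrative, not teuthology's real signature.
    Returns the number of attempts it took to see HEALTH_OK.
    """
    for attempt in range(1, tries + 1):
        if get_health().startswith("HEALTH_OK"):
            return attempt
        # Cluster is still degraded; wait before asking again.
        sleep_fn(sleep)
    raise MaxTriesExceeded(
        "'wait_until_healthy' reached maximum tries ({}) after waiting "
        "for {} seconds".format(tries, tries * sleep)
    )
```

With a cluster that stays in HEALTH_WARN because PGs are stuck inactive, a loop like this can only time out, which fits the reading that the stuck PGs, not the MDS message, are what fails the run.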
Updated by Samuel Just about 9 years ago
- Status changed from New to Can't reproduce
Marking this as Can't reproduce for now.
Updated by Yuri Weinstein about 9 years ago
- Status changed from Can't reproduce to New
Reopening, as this looks similar and we now have a core dump.
Run: http://pulpito.ceph.com/teuthology-2015-02-16_17:13:02-upgrade:firefly-x-hammer-distro-basic-multi/
Job: 761725
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2015-02-16_17:13:02-upgrade:firefly-x-hammer-distro-basic-multi/761725/
In teuthology log:
2015-02-16T21:49:24.166 INFO:tasks.ceph.mon.a.plana80.stderr:2015-02-16 21:49:24.165894 7f9d53b30700 -1 mon.a@0(leader).mds e5 Missing health data for MDS 4112
2015-02-16T21:50:24.167 INFO:tasks.ceph.mon.a.plana80.stderr:2015-02-16 21:50:24.166225 7f9d53b30700 -1 mon.a@0(leader).mds e5 Missing health data for MDS 4112
2015-02-16T21:51:24.167 INFO:tasks.ceph.mon.a.plana80.stderr:2015-02-16 21:51:24.166554 7f9d53b30700 -1 mon.a@0(leader).mds e5 Missing health data for MDS 4112
2015-02-16T21:52:24.167 INFO:tasks.ceph.mon.a.plana80.stderr:2015-02-16 21:52:24.166884 7f9d53b30700 -1 mon.a@0(leader).mds e5 Missing health data for MDS 4112
2015-02-16T21:53:24.168 INFO:tasks.ceph.mon.a.plana80.stderr:2015-02-16 21:53:24.167222 7f9d53b30700 -1 mon.a@0(leader).mds e5 Missing health data for MDS 4112
2015-02-16T21:53:25.937 INFO:tasks.workunit.client.0.plana65.stderr:test_rbd.test_create_defaults ...
2015-02-16T21:53:25.937 INFO:tasks.workunit:Stopping ['rbd/test_librbd_python.sh'] on client.0...
2015-02-16T21:53:25.938 INFO:teuthology.orchestra.run.plana65:Running: 'rm -rf -- /home/ubuntu/cephtest/workunits.list /home/ubuntu/cephtest/workunit.client.0'
2015-02-16T21:53:25.948 ERROR:teuthology.parallel:Exception in parallel execution
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 82, in __exit__
    for result in self:
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 101, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 19, in capture_traceback
    return func(*args, **kwargs)
  File "/var/lib/teuthworker/src/ceph-qa-suite_hammer/tasks/workunit.py", line 360, in _run_tests
    label="workunit test {workunit}".format(workunit=workunit)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/remote.py", line 137, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 378, in run
    r.wait()
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 114, in wait
    label=self.label)
CommandFailedError: Command failed (workunit test rbd/test_librbd_python.sh) on plana65 with status 124: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=firefly TESTDIR="/home/ubuntu/cephtest" CEPH_ID="0" adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/workunit.client.0/rbd/test_librbd_python.sh'
and then in /a/teuthology-2015-02-16_17:13:02-upgrade:firefly-x-hammer-distro-basic-multi/761725/remote/plana13/log/ceph-osd.12.log.gz
/a/teuthology-2015-02-16_17:13:02-upgrade:firefly-x-hammer-distro-basic-multi/761725/remote/plana13/log/ceph-osd.12.log.gz:400572246-2015-02-16 19:05:53.214124 7fe265cc0700 -1 *** Caught signal (Aborted) **
/a/teuthology-2015-02-16_17:13:02-upgrade:firefly-x-hammer-distro-basic-multi/761725/remote/plana13/log/ceph-osd.12.log.gz:400572320- in thread 7fe265cc0700
/a/teuthology-2015-02-16_17:13:02-upgrade:firefly-x-hammer-distro-basic-multi/761725/remote/plana13/log/ceph-osd.12.log.gz:400572344-
/a/teuthology-2015-02-16_17:13:02-upgrade:firefly-x-hammer-distro-basic-multi/761725/remote/plana13/log/ceph-osd.12.log.gz:400572345: ceph version 0.80.8-49-g9ef7743 (9ef77430f3d46789b0ba1a2afa42729627734500)
/a/teuthology-2015-02-16_17:13:02-upgrade:firefly-x-hammer-distro-basic-multi/761725/remote/plana13/log/ceph-osd.12.log.gz:400572421- 1: ceph-osd() [0x99bc2a]
/a/teuthology-2015-02-16_17:13:02-upgrade:firefly-x-hammer-distro-basic-multi/761725/remote/plana13/log/ceph-osd.12.log.gz:400572447- 2: (()+0xfcb0) [0x7fe27b6eccb0]
/a/teuthology-2015-02-16_17:13:02-upgrade:firefly-x-hammer-distro-basic-multi/761725/remote/plana13/log/ceph-osd.12.log.gz:400572480- 3: (gsignal()+0x35) [0x7fe279fd7425]
/a/teuthology-2015-02-16_17:13:02-upgrade:firefly-x-hammer-distro-basic-multi/761725/remote/plana13/log/ceph-osd.12.log.gz:400572518- 4: (abort()+0x17b) [0x7fe279fdab8b]
/a/teuthology-2015-02-16_17:13:02-upgrade:firefly-x-hammer-distro-basic-multi/761725/remote/plana13/log/ceph-osd.12.log.gz:400572555- 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fe27a92a69d]
/a/teuthology-2015-02-16_17:13:02-upgrade:firefly-x-hammer-distro-basic-multi/761725/remote/plana13/log/ceph-osd.12.log.gz:400572625- 6: (()+0xb5846) [0x7fe27a928846]
/a/teuthology-2015-02-16_17:13:02-upgrade:firefly-x-hammer-distro-basic-multi/761725/remote/plana13/log/ceph-osd.12.log.gz:400572659- 7: (()+0xb5873) [0x7fe27a928873]
/a/teuthology-2015-02-16_17:13:02-upgrade:firefly-x-hammer-distro-basic-multi/761725/remote/plana13/log/ceph-osd.12.log.gz:400572693- 8: (()+0xb596e) [0x7fe27a92896e]
/a/teuthology-2015-02-16_17:13:02-upgrade:firefly-x-hammer-distro-basic-multi/761725/remote/plana13/log/ceph-osd.12.log.gz:400572727- 9: (ObjectStore::Transaction::decode(ceph::buffer::list::iterator&)+0x219) [0x83af79]
/a/teuthology-2015-02-16_17:13:02-upgrade:firefly-x-hammer-distro-basic-multi/761725/remote/plana13/log/ceph-osd.12.log.gz:400572814- 10: (ReplicatedBackend::sub_op_modify(std::tr1::shared_ptr<OpRequest>)+0x5f3) [0x7d91f3]
/a/teuthology-2015-02-16_17:13:02-upgrade:firefly-x-hammer-distro-basic-multi/761725/remote/plana13/log/ceph-osd.12.log.gz:400572904- 11: (ReplicatedBackend::handle_message(std::tr1::shared_ptr<OpRequest>)+0x55c) [0x91dedc]
/a/teuthology-2015-02-16_17:13:02-upgrade:firefly-x-hammer-distro-basic-multi/761725/remote/plana13/log/ceph-osd.12.log.gz:400572995- 12: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1ee) [0x7ba37e]
/a/teuthology-2015-02-16_17:13:02-upgrade:firefly-x-hammer-distro-basic-multi/761725/remote/plana13/log/ceph-osd.12.log.gz:400573100- 13: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x34a) [0x614a9a]
/a/teuthology-2015-02-16_17:13:02-upgrade:firefly-x-hammer-distro-basic-multi/761725/remote/plana13/log/ceph-osd.12.log.gz:400573222- 14: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x628) [0x630ca8]
/a/teuthology-2015-02-16_17:13:02-upgrade:firefly-x-hammer-distro-basic-multi/761725/remote/plana13/log/ceph-osd.12.log.gz:400573315- 15: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0x9c) [0x6772ac]
/a/teuthology-2015-02-16_17:13:02-upgrade:firefly-x-hammer-distro-basic-multi/761725/remote/plana13/log/ceph-osd.12.log.gz:400573506- 16: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0xa6ee86]
/a/teuthology-2015-02-16_17:13:02-upgrade:firefly-x-hammer-distro-basic-multi/761725/remote/plana13/log/ceph-osd.12.log.gz:400573574- 17: (ThreadPool::WorkThread::entry()+0x10) [0xa70ea0]
/a/teuthology-2015-02-16_17:13:02-upgrade:firefly-x-hammer-distro-basic-multi/761725/remote/plana13/log/ceph-osd.12.log.gz:400573629- 18: (()+0x7e9a) [0x7fe27b6e4e9a]
/a/teuthology-2015-02-16_17:13:02-upgrade:firefly-x-hammer-distro-basic-multi/761725/remote/plana13/log/ceph-osd.12.log.gz:400573663- 19: (clone()+0x6d) [0x7fe27a0953fd]
/a/teuthology-2015-02-16_17:13:02-upgrade:firefly-x-hammer-distro-basic-multi/761725/remote/plana13/log/ceph-osd.12.log.gz:400573700- NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
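The grep-style prefix on every line makes the backtrace above hard to scan. As a rough aid, the following sketch strips the `<path>.gz:<offset>` prefix and pulls out the frame number, symbol, and address. The regular expression and the `parse_frames` helper are hypothetical, written only against the line shapes visible in this report; they are not part of Ceph or teuthology.

```python
import re

# Optional "<path>.gz:<byte offset>" prefix followed by "-" or ":", then a
# frame of the form "N: <symbol> [0x<address>]".
FRAME_RE = re.compile(
    r"^(?:.*?\.gz:\d+[-:])?\s*"
    r"(?P<index>\d+): (?P<symbol>.*) \[0x(?P<addr>[0-9a-f]+)\]\s*$"
)


def parse_frames(lines):
    """Return (frame number, symbol, address) tuples for backtrace lines.

    Lines that are not stack frames (signal banner, version, NOTE) are skipped.
    """
    frames = []
    for line in lines:
        match = FRAME_RE.match(line)
        if match:
            frames.append(
                (
                    int(match.group("index")),
                    match.group("symbol"),
                    int(match.group("addr"), 16),
                )
            )
    return frames


# Two lines copied from the log above, with the path shortened for brevity.
SAMPLE = [
    "ceph-osd.12.log.gz:400572345: ceph version 0.80.8-49-g9ef7743 "
    "(9ef77430f3d46789b0ba1a2afa42729627734500)",
    "ceph-osd.12.log.gz:400572727- 9: (ObjectStore::Transaction::decode"
    "(ceph::buffer::list::iterator&)+0x219) [0x83af79]",
]

for index, symbol, addr in parse_frames(SAMPLE):
    print(index, symbol, hex(addr))
```

Read this way, the abort reaches `abort()` through `__gnu_cxx::__verbose_terminate_handler()`, which is what an uncaught C++ exception looks like, and the first named Ceph frame is `ObjectStore::Transaction::decode`, called from `ReplicatedBackend::sub_op_modify`, on an OSD running 0.80.8.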
Updated by Samuel Just about 9 years ago
- Status changed from New to Duplicate
That second one at least is #10908, which is fixed as of a few days ago.