Bug #10525
closed
"HEALTH_WARN 44 pgs peering" failure in upgrade:firefly-firefly-distro-basic-vps run
Description
Jobs failed - ['698004', '698005', '698006', '698007', '698008', '698009', '698010', '698011', '698012', '698013', '698015']
Logs for one - http://qa-proxy.ceph.com/teuthology/teuthology-2015-01-11_17:00:03-upgrade:firefly-firefly-distro-basic-vps/698005/
2015-01-12T18:12:15.411 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN 44 pgs peering; 55 pgs stuck inactive; 55 pgs stuck unclean
2015-01-12T18:12:22.412 INFO:teuthology.orchestra.run.vpm198:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph health'
2015-01-12T18:12:22.596 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN 44 pgs peering; 55 pgs stuck inactive; 55 pgs stuck unclean
2015-01-12T18:12:29.597 INFO:teuthology.orchestra.run.vpm198:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph health'
2015-01-12T18:12:29.800 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN 44 pgs peering; 55 pgs stuck inactive; 55 pgs stuck unclean
2015-01-12T18:12:30.799 ERROR:teuthology.parallel:Exception in parallel execution
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 82, in __exit__
    for result in self:
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 101, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 19, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/parallel.py", line 50, in _run_spawned
    mgr = run_tasks.run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 41, in run_one_task
    return fn(**kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/sequential.py", line 48, in task
    mgr.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/ceph.py", line 1086, in restart
    healthy(ctx=ctx, config=None)
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/ceph.py", line 994, in healthy
    remote=mon0_remote,
  File "/home/teuthworker/src/teuthology_master/teuthology/misc.py", line 853, in wait_until_healthy
    while proceed():
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 133, in __call__
    raise MaxWhileTries(error_msg)
MaxWhileTries: 'wait_until_healthy'reached maximum tries (150) after waiting for 900 seconds
2015-01-12T18:12:30.802 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 53, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 41, in run_one_task
    return fn(**kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/parallel.py", line 43, in task
    p.spawn(_run_spawned, ctx, confg, taskname)
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 82, in __exit__
    for result in self:
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 101, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 19, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/parallel.py", line 50, in _run_spawned
    mgr = run_tasks.run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 41, in run_one_task
    return fn(**kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/sequential.py", line 48, in task
    mgr.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/ceph.py", line 1086, in restart
    healthy(ctx=ctx, config=None)
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/ceph.py", line 994, in healthy
    remote=mon0_remote,
  File "/home/teuthworker/src/teuthology_master/teuthology/misc.py", line 853, in wait_until_healthy
    while proceed():
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 133, in __call__
    raise MaxWhileTries(error_msg)
MaxWhileTries: 'wait_until_healthy'reached maximum tries (150) after waiting for 900 seconds
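For readers unfamiliar with the failure mode in the log above, the following is a minimal sketch of a bounded health-polling loop. It is not the teuthology implementation. The names `MaxTriesExceeded`, `get_health`, and `sleeper` are hypothetical, and the 6-second interval is only inferred from the message (900 seconds over 150 tries).

```python
import time


class MaxTriesExceeded(Exception):
    """Hypothetical stand-in for the MaxWhileTries error seen in the traceback."""


def wait_until_healthy(get_health, tries=150, sleep=6, sleeper=time.sleep):
    """Poll get_health() until it reports HEALTH_OK or the tries run out.

    get_health is any callable returning a status string such as
    "HEALTH_WARN 44 pgs peering; ..." (an assumption for this sketch).
    Returns the number of polls it took to see HEALTH_OK.
    """
    for attempt in range(1, tries + 1):
        status = get_health()
        if status.startswith("HEALTH_OK"):
            return attempt
        # Not healthy yet: wait before the next poll.
        sleeper(sleep)
    raise MaxTriesExceeded(
        "reached maximum tries (%d) after waiting for %d seconds"
        % (tries, tries * sleep)
    )


if __name__ == "__main__":
    # Simulate a cluster that never finishes peering, without real sleeping.
    stuck = "HEALTH_WARN 44 pgs peering; 55 pgs stuck inactive; 55 pgs stuck unclean"
    try:
        wait_until_healthy(lambda: stuck, sleeper=lambda seconds: None)
    except MaxTriesExceeded as exc:
        print(exc)
```

A cluster whose PGs stay in peering never returns HEALTH_OK, so a loop of this shape exhausts its tries and raises, which is what the restart task hit here.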
Updated by Sage Weil over 9 years ago
- Assignee set to Yuri Weinstein
- Priority changed from Urgent to Immediate
The older releases need osd map max advance = 100 in the conf to work around the peering queue bug.
Updated by Yuri Weinstein over 9 years ago
Old tests (releases until 0.80.8) don't have the "peering queue fix" and have to be changed to include:
overrides:
  ceph:
    conf:
      osd:
        osd map max advance: 100
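To illustrate how an override like this could end up in the cluster configuration, here is a rough sketch that deep-merges the override into a job's settings and renders the result as conf sections. The helpers `deep_merge` and `render_conf` and the sample `job` dictionary are hypothetical; teuthology's actual merge logic may differ.

```python
import copy


def deep_merge(base, override):
    """Return a copy of base with override merged in, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def render_conf(conf):
    """Render {section: {option: value}} as ini-style text, sorted for stable output."""
    lines = []
    for section in sorted(conf):
        lines.append("[%s]" % section)
        for option in sorted(conf[section]):
            lines.append("%s = %s" % (option, conf[section][option]))
    return "\n".join(lines)


# The override from this ticket, written as a nested dict.
OVERRIDES = {"ceph": {"conf": {"osd": {"osd map max advance": 100}}}}

# Hypothetical pre-existing job settings, for illustration only.
job = {"ceph": {"conf": {"osd": {"debug osd": 20}, "mon": {"debug mon": 10}}}}

merged = deep_merge(job, OVERRIDES)

if __name__ == "__main__":
    print(render_conf(merged["ceph"]["conf"]))
```

The merge keeps existing osd options and adds the new one, so the rendered `[osd]` section carries `osd map max advance = 100` alongside whatever the job already set.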
Updated by Yuri Weinstein over 9 years ago
- Status changed from New to Fix Under Review
- Assignee changed from Yuri Weinstein to Sage Weil
Updated by Yuri Weinstein over 9 years ago
And more here https://github.com/ceph/ceph-qa-suite/pull/292
tests:
http://pulpito.front.sepia.ceph.com/teuthology-2015-01-13_13:37:50-upgrade:firefly:singleton:versions-steps-x-firefly-distro-basic-multi/
http://pulpito.front.sepia.ceph.com/teuthology-2015-01-13_13:38:09-upgrade:firefly:singleton:versions-steps-firefly-distro-basic-multi/
Updated by Yuri Weinstein over 9 years ago
See https://github.com/ceph/ceph-qa-suite/pull/294 (Replaced step for v0.80.6 with v0.80.7)
Updated by Yuri Weinstein over 9 years ago
Final fixes - https://github.com/ceph/ceph-qa-suite/pull/297
Updated by Sage Weil over 9 years ago
- Status changed from Fix Under Review to Resolved