Bug #10525

closed

"HEALTH_WARN 44 pgs peering" failure in upgrade:firefly-firefly-distro-basic-vps run

Added by Yuri Weinstein over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Immediate
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Run http://pulpito.front.sepia.ceph.com/teuthology-2015-01-11_17:00:03-upgrade:firefly-firefly-distro-basic-vps/

Jobs failed - ['698004', '698005', '698006', '698007', '698008', '698009', '698010', '698011', '698012', '698013', '698015']

Logs for one - http://qa-proxy.ceph.com/teuthology/teuthology-2015-01-11_17:00:03-upgrade:firefly-firefly-distro-basic-vps/698005/

2015-01-12T18:12:15.411 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN 44 pgs peering; 55 pgs stuck inactive; 55 pgs stuck unclean
2015-01-12T18:12:22.412 INFO:teuthology.orchestra.run.vpm198:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph health'
2015-01-12T18:12:22.596 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN 44 pgs peering; 55 pgs stuck inactive; 55 pgs stuck unclean
2015-01-12T18:12:29.597 INFO:teuthology.orchestra.run.vpm198:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph health'
2015-01-12T18:12:29.800 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN 44 pgs peering; 55 pgs stuck inactive; 55 pgs stuck unclean
2015-01-12T18:12:30.799 ERROR:teuthology.parallel:Exception in parallel execution
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 82, in __exit__
    for result in self:
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 101, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 19, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/parallel.py", line 50, in _run_spawned
    mgr = run_tasks.run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 41, in run_one_task
    return fn(**kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/sequential.py", line 48, in task
    mgr.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/ceph.py", line 1086, in restart
    healthy(ctx=ctx, config=None)
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/ceph.py", line 994, in healthy
    remote=mon0_remote,
  File "/home/teuthworker/src/teuthology_master/teuthology/misc.py", line 853, in wait_until_healthy
    while proceed():
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 133, in __call__
    raise MaxWhileTries(error_msg)
MaxWhileTries: 'wait_until_healthy'reached maximum tries (150) after waiting for 900 seconds
2015-01-12T18:12:30.802 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 53, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 41, in run_one_task
    return fn(**kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/parallel.py", line 43, in task
    p.spawn(_run_spawned, ctx, confg, taskname)
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 82, in __exit__
    for result in self:
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 101, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 19, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/parallel.py", line 50, in _run_spawned
    mgr = run_tasks.run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 41, in run_one_task
    return fn(**kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/sequential.py", line 48, in task
    mgr.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/ceph.py", line 1086, in restart
    healthy(ctx=ctx, config=None)
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/ceph.py", line 994, in healthy
    remote=mon0_remote,
  File "/home/teuthworker/src/teuthology_master/teuthology/misc.py", line 853, in wait_until_healthy
    while proceed():
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 133, in __call__
    raise MaxWhileTries(error_msg)
MaxWhileTries: 'wait_until_healthy'reached maximum tries (150) after waiting for 900 seconds
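
For readers unfamiliar with the failure mode, the traceback above amounts to a bounded polling loop giving up. The following is a minimal sketch of that pattern, not teuthology's actual code: the function, the MaxTriesReached class, and the get_health/sleeper parameters are hypothetical names, and the 6-second interval is only inferred from the reported 900 seconds / 150 tries.

```python
import time


class MaxTriesReached(Exception):
    """Hypothetical stand-in for the MaxWhileTries error seen in the traceback."""


def wait_until_healthy(get_health, tries=150, sleep=6, sleeper=time.sleep):
    """Poll get_health() until it reports HEALTH_OK or the tries run out.

    get_health is any callable returning the output of `ceph health`.
    sleeper is injectable so the loop can be exercised without real waiting.
    Returns the number of attempts it took to see HEALTH_OK.
    """
    for attempt in range(1, tries + 1):
        status = get_health()
        if status.startswith("HEALTH_OK"):
            return attempt
        # Cluster is still unhealthy (e.g. PGs stuck peering); wait and retry.
        sleeper(sleep)
    raise MaxTriesReached(
        "'wait_until_healthy' reached maximum tries (%d) after waiting for "
        "%d seconds" % (tries, tries * sleep)
    )
```

With a health string that never leaves HEALTH_WARN, as in the log above, this loop exhausts all 150 tries and raises, which is the behaviour the job reported.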

Actions #1

Updated by Sage Weil over 9 years ago

  • Assignee set to Yuri Weinstein
  • Priority changed from Urgent to Immediate

Older releases need osd map max advance = 100 in the conf to work around the peering queue bug.

Actions #2

Updated by Yuri Weinstein over 9 years ago

Old tests (until 0.80.8) don't have the "peering queue fix" and have to be changed to include:

overrides:
  ceph:
    conf:
      osd:
        osd map max advance: 100

See https://github.com/ceph/ceph-qa-suite/pull/289
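
As a rough illustration of what such an override does, the sketch below merges an overrides mapping into a job's ceph conf and renders the result in ceph.conf style. It is an assumption-laden sketch, not the ceph-qa-suite implementation: deep_merge and render_conf are hypothetical helpers, and the "debug osd" entry in the job config is made up for the example. Only the osd map max advance setting comes from this ticket.

```python
import copy


def deep_merge(base, override):
    """Return a copy of base with override merged in recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Otherwise the override value wins.
            merged[key] = copy.deepcopy(value)
    return merged


def render_conf(conf):
    """Render {section: {option: value}} as ceph.conf-style text."""
    lines = []
    for section, options in conf.items():
        lines.append("[%s]" % section)
        for option, value in options.items():
            lines.append("    %s = %s" % (option, value))
    return "\n".join(lines)


# Hypothetical job config; only the override below comes from the ticket.
job = {"ceph": {"conf": {"osd": {"debug osd": 20}}}}
overrides = {"ceph": {"conf": {"osd": {"osd map max advance": 100}}}}

merged = deep_merge(job, overrides)
print(render_conf(merged["ceph"]["conf"]))
```

The point of merging rather than replacing is that existing osd options in the job survive, and the workaround setting is simply added alongside them.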

Actions #3

Updated by Yuri Weinstein over 9 years ago

  • Status changed from New to Fix Under Review
  • Assignee changed from Yuri Weinstein to Sage Weil

Actions #7

Updated by Sage Weil over 9 years ago

  • Status changed from Fix Under Review to Resolved