Bug #13359

giant stale requests

Added by Loïc Dachary over 8 years ago. Updated over 8 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: upgrade/hammer
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://pulpito.ceph.com/loic-2015-10-05_01:41:20-upgrade:hammer-hammer-backports---basic-vps/1088492/

upgrade:hammer/older/{0-cluster/start.yaml 1-install/latest_giant_release.yaml
    2-workload/testrados.yaml 3-upgrade-sequence/upgrade-osd-mon-mds.yaml 4-final/{monthrash.yaml
    osdthrash.yaml testrados.yaml} distros/centos_6.5.yaml}

2015-10-04T20:51:17.163 INFO:tasks.rados.rados.0.vpm159.stdout:1899:  expect (ObjNum 694 snap 234 seq_num 694)
2015-10-04T20:51:19.456 INFO:tasks.ceph.osd.0.vpm186.stdout:starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2015-10-04T20:51:20.148 INFO:tasks.ceph.osd.0.vpm186.stderr:2015-10-04 23:51:20.132469 7f77517ae800 -1 filestore(/var/lib/ceph/osd/ceph-0) FileStore::mount : stale version stamp detected: 3. Proceeding, do_update is set, performing disk format upgrade.
2015-10-04T20:51:20.777 DEBUG:teuthology.misc:6 of 6 OSDs are up
2015-10-04T20:51:20.778 INFO:teuthology.orchestra.run.vpm186:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph health'
2015-10-04T20:51:20.807 INFO:tasks.ceph.osd.0.vpm186.stderr:2015-10-04 23:51:20.791634 7f77517ae800 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2015-10-04T20:51:21.380 INFO:tasks.ceph.osd.0.vpm186.stderr:2015-10-04 23:51:21.365391 7f77517ae800 -1 osd.0 518 PGs are upgrading
2015-10-04T20:51:22.215 INFO:tasks.ceph.osd.0.vpm186.stderr:2015-10-04 23:51:22.198976 7f77517ae800 -1 osd.0 518 log_to_monitors {default=true}
2015-10-04T20:51:22.291 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN 7 requests are blocked > 32 sec
2015-10-04T20:51:25.787 INFO:tasks.rados.rados.0.vpm159.stdout:1893:  finishing write tid 1 to vpm1599889-37
2015-10-04T20:51:25.787 INFO:tasks.rados.rados.0.vpm159.stdout:1893:  finishing write tid 2 to vpm1599889-37
...
2015-10-04T20:52:24.722 INFO:tasks.rados.rados.0.vpm159.stdout:1996:  finishing write tid 7 to vpm1599889-32
2015-10-04T20:52:24.722 INFO:tasks.rados.rados.0.vpm159.stdout:update_object_version oid 32 v 1118 (ObjNum 767 snap 260 seq_num 767) dirty exists
2015-10-04T20:52:24.723 INFO:tasks.rados.rados.0.vpm159.stdout:1996: done (1 left)
2015-10-04T20:52:24.723 INFO:tasks.rados.rados.0.vpm159.stdout:1999: done (0 left)
2015-10-04T20:52:24.836 INFO:tasks.rados.rados.0.vpm159.stderr:0 errors.
2015-10-04T20:52:24.836 INFO:tasks.rados.rados.0.vpm159.stderr:
2015-10-04T20:52:24.953 INFO:tasks.ceph.ceph_manager:removing pool_name unique_pool_0
2015-10-04T20:52:24.953 INFO:teuthology.orchestra.run.vpm186:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage rados rmpool unique_pool_0 unique_pool_0 --yes-i-really-really-mean-it'
2015-10-04T20:52:26.013 INFO:teuthology.orchestra.run.vpm186.stdout:successfully deleted pool unique_pool_0
2015-10-04T20:52:26.015 DEBUG:teuthology.parallel:result is None
2015-10-04T20:52:30.825 INFO:teuthology.orchestra.run.vpm186:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph health'
2015-10-04T20:52:31.178 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN 6 requests are blocked > 32 sec
2015-10-04T20:52:38.179 INFO:teuthology.orchestra.run.vpm186:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph health'
2015-10-04T20:52:38.591 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN 6 requests are blocked > 32 sec
...
2015-10-04T21:11:24.688 INFO:teuthology.orchestra.run.vpm186:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph health'
2015-10-04T21:11:25.042 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN 6 requests are blocked > 32 sec
2015-10-04T21:11:26.042 ERROR:teuthology.parallel:Exception in parallel execution
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 82, in __exit__
    for result in self:
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 101, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 19, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/parallel.py", line 50, in _run_spawned
    mgr = run_tasks.run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 41, in run_one_task
    return fn(**kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/sequential.py", line 48, in task
    mgr.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_hammer/tasks/ceph.py", line 1066, in restart
    healthy(ctx=ctx, config=None)
  File "/var/lib/teuthworker/src/ceph-qa-suite_hammer/tasks/ceph.py", line 972, in healthy
    remote=mon0_remote,
  File "/home/teuthworker/src/teuthology_master/teuthology/misc.py", line 876, in wait_until_healthy
    while proceed():
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 134, in __call__
    raise MaxWhileTries(error_msg)
MaxWhileTries: 'wait_until_healthy' reached maximum tries (150) after waiting for 900 seconds
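
For reference, the failure mode here is teuthology's health gate timing out: wait_until_healthy keeps re-running ceph health on the monitor host and raises MaxWhileTries once its retry budget is spent (150 tries over 900 seconds, so roughly one check every 6 seconds). A minimal sketch of that polling pattern in Python; the function body and parameter names are illustrative only, not teuthology's actual implementation:

import subprocess
import time

class MaxWhileTries(Exception):
    """Raised when the cluster never reports HEALTH_OK within the budget."""

def wait_until_healthy(tries=150, delay=6):
    # Illustrative only: teuthology runs 'ceph health' on the mon host
    # through its orchestration layer; this sketch shells out locally.
    # 150 tries * 6 s delay matches the 900-second budget in the traceback.
    for _ in range(tries):
        out = subprocess.check_output(['ceph', 'health']).decode().strip()
        if out.startswith('HEALTH_OK'):
            return
        time.sleep(delay)
    raise MaxWhileTries("'wait_until_healthy' reached maximum tries (%d) "
                        "after waiting for %d seconds" % (tries, tries * delay))

Any blocked-request warning ("HEALTH_WARN N requests are blocked > 32 sec") keeps such a loop spinning until the budget runs out, which is exactly what the log above shows.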
#1

Updated by Loïc Dachary over 8 years ago

  • Subject changed from wait_until_healthy: to wait_until_healthy: HEALTH_WARN 6 requests are blocked > 32 sec
  • Description updated (diff)
#2

Updated by Sage Weil over 8 years ago

  • Priority changed from Normal to Urgent
#3

Updated by Samuel Just over 8 years ago

  • Assignee set to Samuel Just
#4

Updated by Samuel Just over 8 years ago

  • Subject changed from wait_until_healthy: HEALTH_WARN 6 requests are blocked > 32 sec to wait_until_healthy: timed out
#5

Updated by Samuel Just over 8 years ago

I think /a/loic-2015-10-05_01:41:20-upgrade:hammer-hammer-backports---basic-vps/1088508/remote is the actual path -- the one above seems to be another run.

#6

Updated by Samuel Just over 8 years ago

  • Subject changed from wait_until_healthy: timed out to giant stale requests
  • Assignee deleted (Samuel Just)
  • Priority changed from Urgent to High

I think this is a bug in giant. There seem to be some old requests stuck in the op tracker which I think actually completed. Anyway, not high priority.
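
One way to check whether those requests really completed is to ask each OSD's op tracker directly over the admin socket with dump_ops_in_flight. A rough sketch: the command exists in giant-era Ceph, but the 'ops', 'age', and 'description' JSON field names used here are assumptions and may differ between releases, and it must be run on the host where the OSD lives:

import json
import subprocess

def dump_blocked_ops(osd_id, min_age=32.0):
    # Query the OSD admin socket; run on the host hosting osd.<id>.
    raw = subprocess.check_output(
        ['ceph', 'daemon', 'osd.%d' % osd_id, 'dump_ops_in_flight'])
    data = json.loads(raw)
    # Field names below are assumed from giant-era output.
    for op in data.get('ops', []):
        if op.get('age', 0.0) > min_age:
            print('osd.%d: %s (age %.1fs)' % (
                osd_id, op.get('description'), op['age']))

for i in range(6):  # the run above has six OSDs
    dump_blocked_ops(i)

An op that still shows up here long after its writes were acknowledged would match the "stuck in the op tracker but actually completed" theory above.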

#7

Updated by David Zafman over 8 years ago

  • Status changed from New to Resolved

Giant EOL
