Bug #15519: failed to recover before timeout expired - Ceph - Ceph

Bug #15519

closed

failed to recover before timeout expired

Added by David Zafman about 8 years ago. Updated over 7 years ago.

Status: Can't reproduce
Priority: High
Assignee: Sage Weil
Source: Development
Severity: 3 - minor
ceph-qa-suite: rados

Description

http://pulpito.ceph.com/dzafman-2016-04-14_10:27:09-rados:thrash-jewel---basic-smithi/129483/

All 6 OSDs are up and in.

u'osdmap': {u'osdmap': {u'full': False, u'nearfull': False, u'num_osds': 6, u'num_up_osds': 6, u'epoch': 613, u'num_in_osds': 6, u'num_remapped_pgs': 8}}
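As a quick cross-check of the "up and in" claim, the summary line above can be parsed and compared field by field. This is only a sketch: the helper names `parse_osdmap_summary` and `all_osds_up_and_in` are hypothetical and not part of ceph-qa-suite, and the outer braces are added here because the pasted line omits them.

```python
import ast

# Summary line copied from the report. It is a Python 2 repr, but the u''
# prefixes are still valid literal syntax in Python 3.
RAW = ("u'osdmap': {u'osdmap': {u'full': False, u'nearfull': False, "
       "u'num_osds': 6, u'num_up_osds': 6, u'epoch': 613, "
       "u'num_in_osds': 6, u'num_remapped_pgs': 8}}")


def parse_osdmap_summary(raw):
    """Return the inner osdmap dict from the pasted summary line."""
    # The pasted fragment lacks the enclosing braces, so add them back.
    return ast.literal_eval("{" + raw + "}")["osdmap"]["osdmap"]


def all_osds_up_and_in(raw):
    """True when every OSD in the map is both up and in."""
    s = parse_osdmap_summary(raw)
    return s["num_up_osds"] == s["num_osds"] == s["num_in_osds"]


if __name__ == "__main__":
    print(all_osds_up_and_in(RAW))
    print(parse_osdmap_summary(RAW)["num_remapped_pgs"])
```

The map itself looks healthy from the OSD side; what remains outstanding is the `num_remapped_pgs` count.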

There is a massive teuthology.log because the thrasher background threads didn't stop after this error:

2016-04-14T12:03:42.553 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/var/lib/teuthworker/src/ceph-qa-suite_wip-8885/tasks/ceph_manager.py", line 657, in wrapper
    return func(self)
  File "/var/lib/teuthworker/src/ceph-qa-suite_wip-8885/tasks/ceph_manager.py", line 765, in do_thrash
    timeout=self.config.get('timeout')
  File "/var/lib/teuthworker/src/ceph-qa-suite_wip-8885/tasks/ceph_manager.py", line 1713, in wait_for_recovery
    'failed to recover before timeout expired'
AssertionError: failed to recover before timeout expired
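The traceback shows an assertion firing inside a wait loop once the configured timeout elapses. The general pattern can be sketched as below. This is not the actual `wait_for_recovery` from `ceph_manager.py`: the function name `wait_for_condition` and its `is_done`, `poll_interval`, `clock` and `sleep` parameters are assumptions made for illustration, and only the assertion message is taken from the log.

```python
import time


def wait_for_condition(is_done, timeout, poll_interval=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll is_done() until it returns True or `timeout` seconds elapse.

    Hypothetical helper: raises AssertionError with the message seen in
    the traceback when the deadline passes. timeout=None waits forever.
    """
    start = clock()
    while not is_done():
        if timeout is not None:
            assert clock() - start < timeout, \
                'failed to recover before timeout expired'
        sleep(poll_interval)


if __name__ == "__main__":
    # A condition that becomes true on the third poll.
    polls = iter([False, False, True])
    wait_for_condition(lambda: next(polls), timeout=5, poll_interval=0.01)
    print("recovered")
```

The clock and sleep functions are injectable so the loop can be exercised with a fake clock instead of real waiting.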

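PGs from the report that are not active+clean. The columns appear to follow the `ceph pg dump` layout of that release: pg_stat, objects, mip, degr, misp, unf, bytes, log, disklog, state, state_stamp, v, reported, up, up_primary, acting, acting_primary, last_scrub, scrub_stamp, last_deep_scrub, deep_scrub_stamp.

1.1a    5       0       0       10      0       14483201        129     129     active+remapped+wait_backfill   2016-04-14 18:43:24.346042      213'400 546:191 [1,4]   1       [1,2]   1       113'240 2016-04-14 18:40:48.438370  0'0     2016-04-14 18:38:20.120679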
1.18    0       0       3       0       0       495203  83      83      active+recovery_wait+degraded   2016-04-14 18:43:19.074773      206'83  545:47  [2,5]   2       [2,5]   2       70'34   2016-04-14 18:39:45.187599  0'0     2016-04-14 18:38:20.120661
1.17    4       0       0       16      0       9283748 113     113     active+remapped+wait_backfill   2016-04-14 18:43:30.487653      140'217 548:122 [4,0]   4       [1,3]   1       0'0     2016-04-14 18:38:20.120650  0'0     2016-04-14 18:38:20.120650
1.9     0       0       2       0       0       1307115 80      80      active+recovery_wait+degraded   2016-04-14 18:43:24.342528      185'80  546:87  [1,4]   1       [1,4]   1       0'0     2016-04-14 18:38:20.120670  0'0     2016-04-14 18:38:20.120670
1.e     3       0       2       0       0       8132857 176     176     active+recovery_wait+degraded   2016-04-14 18:43:23.516688      183'176 534:126 [1,4]   1       [1,4]   1       0'0     2016-04-14 18:38:20.120733  0'0     2016-04-14 18:38:20.120733
1.11    4       0       2       0       0       8218136 41      41      active+recovery_wait+degraded   2016-04-14 18:43:32.878493      214'241 549:214 [1,0]   1       [1,0]   1       176'237 2016-04-14 18:42:12.089802  70'119  2016-04-14 18:39:42.471949
1.1e    3       0       9       0       0       7162000 165     165     active+recovery_wait+degraded   2016-04-14 18:43:32.874862      185'165 549:29  [3,0]   3       [3,0]   3       0'0     2016-04-14 18:38:20.120733  0'0     2016-04-14 18:38:20.120733
1.22    4       0       4       12      0       10612650        133     133     undersized+degraded+remapped+wait_backfill+peered       2016-04-14 18:43:25.645207      182'275 546:143 [2,5]   2       [4]     4       64'140  2016-04-14 18:39:36.187705      0'0     2016-04-14 18:38:20.120599
1.26    2       0       4       0       0       6800651 166     166     active+recovery_wait+degraded   2016-04-14 18:43:25.642740      208'166 546:24  [1,4]   1       [1,4]   1       69'79   2016-04-14 18:39:47.155539  0'0     2016-04-14 18:38:20.120641
1.2d    6       0       0       6       0       15427476        121     121     active+remapped+backfilling     2016-04-14 18:43:16.464681      216'448 545:338 [1,5]   1       [1,3]   1       67'194  2016-04-14 18:39:39.106077  0'0     2016-04-14 18:38:20.120720
1.2c    5       0       5       5       0       10473475        12      12      undersized+degraded+remapped+wait_backfill+peered       2016-04-14 18:43:25.649851      185'224 546:139 [3,4]   3       [3]     3       69'131  2016-04-14 18:39:49.157256      0'0     2016-04-14 18:38:20.120704
1.35    3       0       0       6       0       7792057 68      68      active+remapped+wait_backfill   2016-04-14 18:43:17.895333      215'255 545:194 [3,5]   3       [3,2]   3       67'136  2016-04-14 18:39:38.190855  0'0     2016-04-14 18:38:20.120631
1.37    3       0       0       6       0       14833959        230     230     active+remapped+wait_backfill   2016-04-14 18:43:24.343021      185'334 546:161 [4,1]   4       [1,3]   1       0'0     2016-04-14 18:38:20.120650  0'0     2016-04-14 18:38:20.120650
1.3a    5       0       0       10      0       10886470        130     130     active+remapped+wait_backfill   2016-04-14 18:43:24.340794      210'401 546:188 [1,4]   1       [1,2]   1       113'240 2016-04-14 18:40:48.438370  0'0     2016-04-14 18:38:20.120679
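To see at a glance where the PGs above are stuck, the state column can be tallied. The sketch below assumes the state is the tenth whitespace-separated field of each row, which holds for every row in this listing. The `ROWS` text reproduces only the first ten fields of each row, and `count_states` is a hypothetical helper, not part of any Ceph tooling.

```python
from collections import Counter

# First ten whitespace-separated fields of each row above (PG id ... state).
ROWS = """\
1.1a 5 0 0 10 0 14483201 129 129 active+remapped+wait_backfill
1.18 0 0 3 0 0 495203 83 83 active+recovery_wait+degraded
1.17 4 0 0 16 0 9283748 113 113 active+remapped+wait_backfill
1.9 0 0 2 0 0 1307115 80 80 active+recovery_wait+degraded
1.e 3 0 2 0 0 8132857 176 176 active+recovery_wait+degraded
1.11 4 0 2 0 0 8218136 41 41 active+recovery_wait+degraded
1.1e 3 0 9 0 0 7162000 165 165 active+recovery_wait+degraded
1.22 4 0 4 12 0 10612650 133 133 undersized+degraded+remapped+wait_backfill+peered
1.26 2 0 4 0 0 6800651 166 166 active+recovery_wait+degraded
1.2d 6 0 0 6 0 15427476 121 121 active+remapped+backfilling
1.2c 5 0 5 5 0 10473475 12 12 undersized+degraded+remapped+wait_backfill+peered
1.35 3 0 0 6 0 7792057 68 68 active+remapped+wait_backfill
1.37 3 0 0 6 0 14833959 230 230 active+remapped+wait_backfill
1.3a 5 0 0 10 0 10886470 130 130 active+remapped+wait_backfill
"""


def count_states(rows):
    """Return (full-state counts, per-flag counts) for pg dump rows."""
    states = Counter()
    flags = Counter()
    for line in rows.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or truncated lines
        state = fields[9]
        states[state] += 1
        flags.update(state.split('+'))
    return states, flags


if __name__ == "__main__":
    states, flags = count_states(ROWS)
    for state, n in states.most_common():
        print(n, state)
    print("remapped:", flags["remapped"])
```

Of the 14 PGs listed, 8 carry the `remapped` flag, which agrees with `num_remapped_pgs: 8` in the osdmap summary. Six are in `recovery_wait`, seven in `wait_backfill`, and only one is actively backfilling.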

Updated by Sage Weil over 7 years ago

Status changed from New to Can't reproduce
