Bug #24452

Backfill hangs in a test case in master, not mimic

Added by David Zafman about 1 year ago. Updated about 1 year ago.

Status:
Resolved
Priority:
Immediate
Assignee:
Category:
-
Target version:
-
Start date:
06/07/2018
Due date:
% Done:
0%
Source:
Community (dev)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:

Description

../qa/run-standalone.sh "osd-backfill-stats.sh TEST_backfill_down_out" 2>&1 | tee obs.log

This test times out waiting for the PG to go clean.

$ ceph -c td/recout/ceph.conf pg dump pgs
dumped pgs
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES LOG DISK_LOG STATE                                           STATE_STAMP                VERSION REPORTED UP    UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP                LAST_DEEP_SCRUB DEEP_SCRUB_STAMP           SNAPTRIMQ_LEN
1.0         200                  0      200         0       0     0 100      100 undersized+degraded+remapped+backfilling+peered 2018-06-07 21:18:47.197271  30'200   35:240 [1,2]          1    [1]              1        0'0 2018-06-07 21:18:25.325401             0'0 2018-06-07 21:18:25.325401             0
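
The STATE column above is a '+'-separated list of flags, and the test is waiting for a "clean" flag that never appears. As a rough illustration only (this is not how the standalone test itself checks state, and the helper name `pg_is_clean` is hypothetical), the check can be sketched in Python:

```python
def pg_is_clean(state: str) -> bool:
    """Return True if the '+'-separated PG state string contains the 'clean' flag.

    Hypothetical helper for illustration; not part of Ceph or its QA scripts.
    """
    return "clean" in state.split("+")


# State string copied from the pg dump output in this report.
stuck_state = "undersized+degraded+remapped+backfilling+peered"

print(pg_is_clean(stuck_state))     # False
print(pg_is_clean("active+clean"))  # True
```

Splitting on '+' rather than doing a substring search avoids matching a flag that merely contains the word "clean".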

Related issues

Related to RADOS - Bug #38155: PG stuck in undersized+degraded+remapped+backfill_toofull+peered New 02/02/2019
Duplicated by RADOS - Bug #24454: failed to recover before timeout expired Duplicate 06/08/2018

History

#1 Updated by David Zafman about 1 year ago

  • Description updated

#2 Updated by David Zafman about 1 year ago

  • Description updated

#3 Updated by David Zafman about 1 year ago

Final messages logged on the primary about pg 1.0 during backfill:

2018-06-07 21:18:47.306 7fdc6e7bd700 20 osd.1 pg_epoch: 35 pg[1.0( v 30'200 (30'100,30'200] local-lis/les=29/30 n=200 ec=26/26 lis/c 29/29 les/c/f 30/30/0 33/34/26) [1,2]/[1] backfill=[2] r=0 lpr=34 pi=[29,34)/1 bft=2 crt=30'200 lcod 30'199 mlcod 0'0 undersized+degraded+remapped+backfilling+peered mbc={}]  peer shard 2 backfill BackfillInfo(MIN-MIN 0 objects)
2018-06-07 21:18:47.306 7fdc6e7bd700 10 osd.1 pg_epoch: 35 pg[1.0( v 30'200 (30'100,30'200] local-lis/les=29/30 n=200 ec=26/26 lis/c 29/29 les/c/f 30/30/0 33/34/26) [1,2]/[1] backfill=[2] r=0 lpr=34 pi=[29,34)/1 bft=2 crt=30'200 lcod 30'199 mlcod 0'0 undersized+degraded+remapped+backfilling+peered mbc={}]  scanning peer osd.2 from MIN
2018-06-07 21:18:47.306 7fdc6e7bd700 20 osd.1 35 share_map_peer 0x557524024e00 already has epoch 35
2018-06-07 21:18:47.306 7fdc6e7bd700 10 osd.1 pg_epoch: 35 pg[1.0( v 30'200 (30'100,30'200] local-lis/les=29/30 n=200 ec=26/26 lis/c 29/29 les/c/f 30/30/0 33/34/26) [1,2]/[1] backfill=[2] r=0 lpr=34 pi=[29,34)/1 bft=2 crt=30'200 lcod 30'199 mlcod 0'0 undersized+degraded+remapped+backfilling+peered mbc={}] start_recovery_op MAX
2018-06-07 21:18:47.306 7fdc6e7bd700 10 osd.1 35 start_recovery_op pg[1.0( v 30'200 (30'100,30'200] local-lis/les=29/30 n=200 ec=26/26 lis/c 29/29 les/c/f 30/30/0 33/34/26) [1,2]/[1] backfill=[2] r=0 lpr=34 pi=[29,34)/1 rops=1 bft=2 crt=30'200 lcod 30'199 mlcod 0'0 undersized+degraded+remapped+backfilling+peered mbc={}] MAX (0/3 rops)
2018-06-07 21:18:47.306 7fdc6e7bd700  5 osd.1 pg_epoch: 35 pg[1.0( v 30'200 (30'100,30'200] local-lis/les=29/30 n=200 ec=26/26 lis/c 29/29 les/c/f 30/30/0 33/34/26) [1,2]/[1] backfill=[2] r=0 lpr=34 pi=[29,34)/1 rops=1 bft=2 crt=30'200 lcod 30'199 mlcod 0'0 undersized+degraded+remapped+backfilling+peered mbc={}] backfill_pos is MIN
2018-06-07 21:18:47.306 7fdc6e7bd700 10 osd.1 pg_epoch: 35 pg[1.0( v 30'200 (30'100,30'200] local-lis/les=29/30 n=200 ec=26/26 lis/c 29/29 les/c/f 30/30/0 33/34/26) [1,2]/[1] backfill=[2] r=0 lpr=34 pi=[29,34)/1 rops=1 bft=2 crt=30'200 lcod 30'199 mlcod 0'0 undersized+degraded+remapped+backfilling+peered mbc={}] starting new_last_backfill at MIN
2018-06-07 21:18:47.306 7fdc6e7bd700 10 osd.1 pg_epoch: 35 pg[1.0( v 30'200 (30'100,30'200] local-lis/les=29/30 n=200 ec=26/26 lis/c 29/29 les/c/f 30/30/0 33/34/26) [1,2]/[1] backfill=[2] r=0 lpr=34 pi=[29,34)/1 rops=1 bft=2 crt=30'200 lcod 30'199 mlcod 0'0 undersized+degraded+remapped+backfilling+peered mbc={}] possible new_last_backfill at MIN
2018-06-07 21:18:47.306 7fdc6e7bd700 10 osd.1 pg_epoch: 35 pg[1.0( v 30'200 (30'100,30'200] local-lis/les=29/30 n=200 ec=26/26 lis/c 29/29 les/c/f 30/30/0 33/34/26) [1,2]/[1] backfill=[2] r=0 lpr=34 pi=[29,34)/1 rops=1 bft=2 crt=30'200 lcod 30'199 mlcod 0'0 undersized+degraded+remapped+backfilling+peered mbc={}] final new_last_backfill at MIN
2018-06-07 21:18:47.306 7fdc6e7bd700 10 osd.1 pg_epoch: 35 pg[1.0( v 30'200 (30'100,30'200] local-lis/les=29/30 n=200 ec=26/26 lis/c 29/29 les/c/f 30/30/0 33/34/26) [1,2]/[1] backfill=[2] r=0 lpr=34 pi=[29,34)/1 rops=1 bft=2 crt=30'200 lcod 30'199 mlcod 0'0 undersized+degraded+remapped+backfilling+peered mbc={}]  started 1
2018-06-07 21:18:47.306 7fdc6e7bd700 10 osd.1 35 do_recovery started 1/1 on pg[1.0( v 30'200 (30'100,30'200] local-lis/les=29/30 n=200 ec=26/26 lis/c 29/29 les/c/f 30/30/0 33/34/26) [1,2]/[1] backfill=[2] r=0 lpr=34 pi=[29,34)/1 rops=1 bft=2 crt=30'200 lcod 30'199 mlcod 0'0 undersized+degraded+remapped+backfilling+peered mbc={}]
2018-06-07 21:18:47.306 7fdc6e7bd700 10 osd.1 35 release_reserved_pushes(1), recovery_ops_reserved 1 -> 0
2018-06-07 21:18:47.306 7fdc6e7bd700 20 osd.1 op_wq(0) _process empty q, waiting
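
When reading an excerpt like the one above, it can help to strip the repeated PG prefix and look only at the timestamp, thread, debug level, and message. The following is a minimal sketch that assumes the field layout seen in these lines (timestamp, thread id, numeric level, then free text); the function name `parse_osd_log_line` and the regular expression are illustrative and not part of any Ceph tooling.

```python
import re
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    timestamp: str
    thread: str
    level: int
    message: str


# Assumed layout, inferred from the lines in this report:
#   <date> <time> <hex thread id> <debug level> <message>
_LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) ([0-9a-f]+)\s+(-?\d+) (.*)$"
)


def parse_osd_log_line(line: str) -> Optional[LogEntry]:
    """Split one OSD log line into its fields, or return None if it does not match."""
    match = _LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    timestamp, thread, level, message = match.groups()
    return LogEntry(timestamp, thread, int(level), message)


# Last line of the excerpt in this report.
last = parse_osd_log_line(
    "2018-06-07 21:18:47.306 7fdc6e7bd700 20 osd.1 op_wq(0) _process empty q, waiting"
)
print(last.level, last.message)
```

Filtering the parsed entries by level (for example, keeping only level 10 and below) gives a shorter view of the backfill sequence that ends with the work queue going idle.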

#4 Updated by David Zafman about 1 year ago

  • Assignee changed from David Zafman to Sage Weil

#5 Updated by Sage Weil about 1 year ago

  • Status changed from New to Need Review
  • Priority changed from High to Immediate

#6 Updated by Sage Weil about 1 year ago

  • Duplicated by Bug #24454: failed to recover before timeout expired added

#7 Updated by Kefu Chai about 1 year ago

  • Status changed from Need Review to Resolved

#8 Updated by David Zafman 7 months ago

  • Related to Bug #38155: PG stuck in undersized+degraded+remapped+backfill_toofull+peered added
