Bug #2044

closed

osd: pg stuck in active+backfill

Added by Josh Durgin about 12 years ago. Updated about 12 years ago.

Status: Resolved
Priority: Normal
Category: OSD
Target version: -
% Done: 0%

Description

jmlowe ran into this on his cluster several times. The primary doing the backfill failed to requeue the pg for recovery.

This was the last time the recovery thread ran:

2012-02-07 20:51:24.938132 7f0b4a5a6700 osd.10 2387 pg[0.2dc( v 2025'2662 (1119'1662,2025'2662] n=1 ec=1 les/c 2387/2174 2384/2386/2386) [10,0]/[10,0,5] r=0 lpr=2386 bft=0 lcod 0'0 mlcod 0'0 active+backfill]  removing peer b7b82dc/rb.0.13.000000009306/head
2012-02-07 20:51:24.938176 7f0b4a5a6700 osd.10 2387 pg[0.2dc( v 2025'2662 (1119'1662,2025'2662] n=1 ec=1 les/c 2387/2174 2384/2386/2386) [10,0]/[10,0,5] r=0 lpr=2386 bft=0 lcod 0'0 mlcod 0'0 active+backfill] send_remove_op 7b8182dc/rb.0.9.000000000e4d/head from osd.0 tid 2
2012-02-07 20:51:24.938228 7f0b4a5a6700 -- 149.165.228.11:6814/32218 --> osd.0 149.165.228.10:6802/12583 -- osd_sub_op(osd.10.0:2 0.2dc 7b8182dc/rb.0.9.000000000e4d/head [delete] v 219'18557 snapset=0=[]:[] snapc=0=[]) v1 -- ?+0 0x42a7b00
2012-02-07 20:51:24.938284 7f0b4a5a6700 osd.10 2387 pg[0.2dc( v 2025'2662 (1119'1662,2025'2662] n=1 ec=1 les/c 2387/2174 2384/2386/2386) [10,0]/[10,0,5] r=0 lpr=2386 bft=0 lcod 0'0 mlcod 0'0 active+backfill] send_remove_op dbe182dc/rb.0.1a.000000000c57/head from osd.0 tid 3
2012-02-07 20:51:24.938361 7f0b4a5a6700 -- 149.165.228.11:6814/32218 --> osd.0 149.165.228.10:6802/12583 -- osd_sub_op(osd.10.0:3 0.2dc dbe182dc/rb.0.1a.000000000c57/head [delete] v 155'15389 snapset=0=[]:[] snapc=0=[]) v1 -- ?+0 0x42a8600
2012-02-07 20:51:24.938402 7f0b4a5a6700 osd.10 2387 pg[0.2dc( v 2025'2662 (1119'1662,2025'2662] n=1 ec=1 les/c 2387/2174 2384/2386/2386) [10,0]/[10,0,5] r=0 lpr=2386 bft=0 lcod 0'0 mlcod 0'0 active+backfill] send_remove_op 295282dc/rb.0.18.000000002208/head from osd.0 tid 4
2012-02-07 20:51:24.938443 7f0b4a5a6700 -- 149.165.228.11:6814/32218 --> osd.0 149.165.228.10:6802/12583 -- osd_sub_op(osd.10.0:4 0.2dc 295282dc/rb.0.18.000000002208/head [delete] v 110'6343 snapset=0=[]:[] snapc=0=[]) v1 -- ?+0 0x42a8080
2012-02-07 20:51:24.938487 7f0b4a5a6700 osd.10 2387 pg[0.2dc( v 2025'2662 (1119'1662,2025'2662] n=1 ec=1 les/c 2387/2174 2384/2386/2386) [10,0]/[10,0,5] r=0 lpr=2386 bft=0 lcod 0'0 mlcod 0'0 active+backfill] send_remove_op f4a982dc/rb.0.1a.0000000163fa/head from osd.0 tid 5
2012-02-07 20:51:24.938555 7f0b4a5a6700 -- 149.165.228.11:6814/32218 --> osd.0 149.165.228.10:6802/12583 -- osd_sub_op(osd.10.0:5 0.2dc f4a982dc/rb.0.1a.0000000163fa/head [delete] v 155'16264 snapset=0=[]:[] snapc=0=[]) v1 -- ?+0 0x42a9680
2012-02-07 20:51:24.938596 7f0b4a5a6700 osd.10 2387 pg[0.2dc( v 2025'2662 (1119'1662,2025'2662] n=1 ec=1 les/c 2387/2174 2384/2386/2386) [10,0]/[10,0,5] r=0 lpr=2386 bft=0 lcod 0'0 mlcod 0'0 active+backfill] send_remove_op b7b82dc/rb.0.13.000000009306/head from osd.0 tid 6
2012-02-07 20:51:24.938636 7f0b4a5a6700 -- 149.165.228.11:6814/32218 --> osd.0 149.165.228.10:6802/12583 -- osd_sub_op(osd.10.0:6 0.2dc b7b82dc/rb.0.13.000000009306/head [delete] v 108'4426 snapset=0=[]:[] snapc=0=[]) v1 -- ?+0 0x42a9100
2012-02-07 20:51:24.938678 7f0b4a5a6700 -- 149.165.228.11:6814/32218 --> osd.0 149.165.228.10:6802/12583 -- pg_backfill(progress 0.2dc e 2387/2387 lb afa092dc/rb.0.19.000000007971/head) v1 -- ?+0 0x4b93b40
2012-02-07 20:51:24.938743 7f0b4a5a6700 osd.10 2387 pg[0.2dc( v 2025'2662 (1119'1662,2025'2662] n=1 ec=1 les/c 2387/2174 2384/2386/2386) [10,0]/[10,0,5] r=0 lpr=2386 bft=0 lcod 0'0 mlcod 0'0 active+backfill]  peer num_objects now 0 / 1
2012-02-07 20:51:24.938784 7f0b4a5a6700 osd.10 2387 pg[0.2dc( v 2025'2662 (1119'1662,2025'2662] n=1 ec=1 les/c 2387/2174 2384/2386/2386) [10,0]/[10,0,5] r=0 lpr=2386 bft=0 lcod 0'0 mlcod 0'0 active+backfill]  started 5
2012-02-07 20:51:24.938823 7f0b4a5a6700 osd.10 2387 do_recovery started 5 (0/5 rops) on pg[0.2dc( v 2025'2662 (1119'1662,2025'2662] n=1 ec=1 les/c 2387/2174 2384/2386/2386) [10,0]/[10,0,5] r=0 lpr=2386 bft=0 lcod 0'0 mlcod 0'0 active+backfill]

#1

Updated by Josh Durgin about 12 years ago

  • Status changed from New to 7
#2

Updated by Sage Weil about 12 years ago

  • Status changed from 7 to Resolved