Bug #9179

closed

unfound objects, recovery timeout

Added by Sage Weil over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

402/7722 unfound (

all osds up

ubuntu@teuthology:/a/teuthology-2014-08-19_02:30:02-rados-firefly-distro-basic-multi/435301

#1

Updated by Samuel Just over 9 years ago

2014-08-19 13:28:43.734887 7fc31beac700 10 osd.4 pg_epoch: 69 pg[4.0( v 69'11622 lc 33'847 (33'847,69'11622] local-les=69 n=987 ec=11 les/c 69/31 68/68/63) [4,5] r=0 lpr=68 pi=11-67/6 rops=8 crt=61'2451 mlcod 33'847 active+recovering m=402 u=402] recover_primary 1f7a3a78/plana3025712-2298/head//4 61'2448 (unfound) (missing) (missing head)
2014-08-19 13:28:43.735139 7fc31beac700 10 osd.4 pg_epoch: 69 pg[4.0( v 69'11622 lc 33'847 (33'847,69'11622] local-les=69 n=987 ec=11 les/c 69/31 68/68/63) [4,5] r=0 lpr=68 pi=11-67/6 rops=8 crt=61'2451 mlcod 33'847 active+recovering m=402 u=402] recover_primary 1e2ac248/plana3025712-2301/head//4 61'2451 (unfound) (missing) (missing head)
2014-08-19 13:28:43.745228 7fc31beac700 10 osd.4 pg_epoch: 69 pg[4.0( v 69'11622 lc 33'847 (33'847,69'11622] local-les=69 n=987 ec=11 les/c 69/31 68/68/63) [4,5] r=0 lpr=68 pi=11-67/6 rops=8 crt=61'2451 mlcod 33'847 active+recovering m=402 u=402] started 0
2014-08-19 13:28:43.745502 7fc31beac700 10 osd.4 69 do_recovery started 0/1 on pg[4.0( v 69'11622 lc 33'847 (33'847,69'11622] local-les=69 n=987 ec=11 les/c 69/31 68/68/63) [4,5] r=0 lpr=68 pi=11-67/6 rops=8 crt=61'2451 mlcod 33'847 active+recovering m=402 u=402]
2014-08-19 13:28:43.748653 7fc31beac700 10 osd.4 pg_epoch: 69 pg[4.0( v 69'11622 lc 33'847 (33'847,69'11622] local-les=69 n=987 ec=11 les/c 69/31 68/68/63) [4,5] r=0 lpr=68 pi=11-67/6 rops=8 crt=61'2451 mlcod 33'847 active+recovering m=402 u=402] discover_all_missing 402 missing, 402 unfound
2014-08-19 13:28:43.749155 7fc31beac700 10 osd.4 pg_epoch: 69 pg[4.0( v 69'11622 lc 33'847 (33'847,69'11622] local-les=69 n=987 ec=11 les/c 69/31 68/68/63) [4,5] r=0 lpr=68 pi=11-67/6 rops=8 crt=61'2451 mlcod 33'847 active+recovering m=402 u=402] discover_all_missing: osd.0: requesting pg_missing_t
2014-08-19 13:28:43.758733 7fc31beac700 10 osd.4 pg_epoch: 69 pg[4.0( v 69'11622 lc 33'847 (33'847,69'11622] local-les=69 n=987 ec=11 les/c 69/31 68/68/63) [4,5] r=0 lpr=68 pi=11-67/6 rops=8 crt=61'2451 mlcod 33'847 active+recovering m=402 u=402] discover_all_missing: osd.3: requesting pg_missing_t
2014-08-19 13:28:43.759025 7fc31beac700 20 osd.4 pg_epoch: 69 pg[4.0( v 69'11622 lc 33'847 (33'847,69'11622] local-les=69 n=987 ec=11 les/c 69/31 68/68/63) [4,5] r=0 lpr=68 pi=11-67/6 rops=8 crt=61'2451 mlcod 33'847 active+recovering m=402 u=402] discover_all_missing: osd.5: we already have pg_missing_t
2014-08-19 13:28:43.762184 7f7218d3a700 10 osd.0 pg_epoch: 69 pg[4.0( v 61'2451 (0'0,61'2451] local-les=50 n=570 ec=11 les/c 50/31 68/68/63) [4,5] r=-1 lpr=68 pi=30-67/5 crt=47'1824 lcod 61'2450 inactive NOTIFY] handle_query query(fulllog 0'0) from replica 4
2014-08-19 13:28:43.762294 7f7212d2e700 10 osd.0 pg_epoch: 69 pg[4.0( v 61'2451 (0'0,61'2451] local-les=50 n=570 ec=11 les/c 50/31 68/68/63) [4,5] r=-1 lpr=68 pi=30-67/5 crt=47'1824 lcod 61'2450 inactive NOTIFY] handle_peering_event: epoch_sent: 69 epoch_requested: 69 MQuery from 4 query_epoch 69 query: query(fulllog 0'0)
2014-08-19 13:28:43.762322 7f7212d2e700 10 osd.0 pg_epoch: 69 pg[4.0( v 61'2451 (0'0,61'2451] local-les=50 n=570 ec=11 les/c 50/31 68/68/63) [4,5] r=-1 lpr=68 pi=30-67/5 crt=47'1824 lcod 61'2450 inactive NOTIFY] log request from 4
2014-08-19 13:28:43.762339 7f7212d2e700 10 osd.0 pg_epoch: 69 pg[4.0( v 61'2451 (0'0,61'2451] local-les=50 n=570 ec=11 les/c 50/31 68/68/63) [4,5] r=-1 lpr=68 pi=30-67/5 crt=47'1824 lcod 61'2450 inactive NOTIFY] sending info+missing+full log
2014-08-19 13:28:43.763254 7f7212d2e700 10 osd.0 pg_epoch: 69 pg[4.0( v 61'2451 (0'0,61'2451] local-les=50 n=570 ec=11 les/c 50/31 68/68/63) [4,5] r=-1 lpr=68 pi=30-67/5 crt=47'1824 lcod 61'2450 inactive NOTIFY] sending log((0'0,61'2451], crt=47'1824) missing(0)
2014-08-19 13:28:43.766403 7f94d16ec700 10 osd.3 pg_epoch: 69 pg[4.0( v 67'2472 lc 33'946 (0'0,67'2472] local-les=64 n=572 ec=11 les/c 64/31 68/68/63) [4,5] r=-1 lpr=68 pi=30-67/5 crt=47'1829 inactive NOTIFY m=198] handle_query query(fulllog 0'0) from replica 4
2014-08-19 13:28:43.766488 7f94cbee1700 10 osd.3 pg_epoch: 69 pg[4.0( v 67'2472 lc 33'946 (0'0,67'2472] local-les=64 n=572 ec=11 les/c 64/31 68/68/63) [4,5] r=-1 lpr=68 pi=30-67/5 crt=47'1829 inactive NOTIFY m=198] handle_peering_event: epoch_sent: 69 epoch_requested: 69 MQuery from 4 query_epoch 69 query: query(fulllog 0'0)
2014-08-19 13:28:43.768107 7f94cbee1700 10 osd.3 pg_epoch: 69 pg[4.0( v 67'2472 lc 33'946 (0'0,67'2472] local-les=64 n=572 ec=11 les/c 64/31 68/68/63) [4,5] r=-1 lpr=68 pi=30-67/5 crt=47'1829 inactive NOTIFY m=198] log request from 4
2014-08-19 13:28:43.768270 7f94cbee1700 10 osd.3 pg_epoch: 69 pg[4.0( v 67'2472 lc 33'946 (0'0,67'2472] local-les=64 n=572 ec=11 les/c 64/31 68/68/63) [4,5] r=-1 lpr=68 pi=30-67/5 crt=47'1829 inactive NOTIFY m=198] sending info+missing+full log
2014-08-19 13:28:43.768994 7f94cbee1700 10 osd.3 pg_epoch: 69 pg[4.0( v 67'2472 lc 33'946 (0'0,67'2472] local-les=64 n=572 ec=11 les/c 64/31 68/68/63) [4,5] r=-1 lpr=68 pi=30-67/5 crt=47'1829 inactive NOTIFY m=198] sending log((0'0,67'2472], crt=47'1829) missing(198)
2014-08-19 13:28:43.801475 7fc31ceae700 10 osd.4 pg_epoch: 69 pg[4.0( v 69'11622 lc 33'847 (33'847,69'11622] local-les=69 n=987 ec=11 les/c 69/31 68/68/63) [4,5] r=0 lpr=68 pi=11-67/6 rops=8 crt=61'2451 mlcod 33'847 active+recovering m=402 u=402] handle_peering_event: epoch_sent: 69 epoch_requested: 69 MLogRec from 3
2014-08-19 13:28:43.801859 7fc31ceae700 10 osd.4 pg_epoch: 69 pg[4.0( v 69'11622 lc 33'847 (33'847,69'11622] local-les=69 n=987 ec=11 les/c 69/31 68/68/63) [4,5] r=0 lpr=68 pi=11-67/6 rops=8 crt=61'2451 mlcod 33'847 active+recovering m=402 u=402] state<Started/Primary/Active>: searching osd.3 log for unfound items
2014-08-19 13:28:43.802147 7fc31ceae700 10 osd.4 pg_epoch: 69 pg[4.0( v 69'11622 lc 33'847 (33'847,69'11622] local-les=69 n=987 ec=11 les/c 69/31 68/68/63) [4,5] r=0 lpr=68 pi=11-67/6 rops=8 crt=61'2451 mlcod 33'847 active+recovering m=402 u=402] proc_replica_log for osd.3: 4.0( v 67'2472 lc 33'946 (0'0,67'2472] local-les=64 n=572 ec=11 les/c 64/31 68/68/63) log((0'0,67'2472], crt=47'1829) missing(198)
2014-08-19 13:28:43.806767 7fc31ceae700 10 osd.4 pg_epoch: 69 pg[4.0( v 69'11622 lc 33'847 (33'847,69'11622] local-les=69 n=987 ec=11 les/c 69/31 68/68/63) [4,5] r=0 lpr=68 pi=11-67/6 rops=8 crt=61'2451 mlcod 33'847 active+recovering m=402 u=402] peer osd.3 now 4.0( v 67'2472 lc 33'946 (0'0,67'2472] local-les=64 n=572 ec=11 les/c 64/31 68/68/63) missing(198)
2014-08-19 13:28:43.807248 7fc31ceae700 20 osd.4 pg_epoch: 69 pg[4.0( v 69'11622 lc 33'847 (33'847,69'11622] local-les=69 n=987 ec=11 les/c 69/31 68/68/63) [4,5] r=0 lpr=68 pi=11-67/6 rops=8 crt=61'2451 mlcod 33'847 active+recovering m=402 u=402] after missing 0/hit_set_4.0_archive_2014-08-19 12:58:46.677440_2014-08-19 12:59:27.439732/head/.ceph-internal/4 need 40'1393 have 40'1393
2014-08-19 13:28:43.807692 7fc31ceae700 20 osd.4 pg_epoch: 69 pg[4.0( v 69'11622 lc 33'847 (33'847,69'11622] local-les=69 n=987 ec=11 les/c 69/31 68/68/63) [4,5] r=0 lpr=68 pi=11-67/6 rops=8 crt=61'2451 mlcod 33'847 active+recovering m=402 u=402] after missing 20484000/plana3025712-1360/head//4 need 40'1424 have 0'0
2014-08-19 13:28:43.808131 7fc31ceae700 20 osd.4 pg_epoch: 69 pg[4.0( v 69'11622 lc 33'847 (33'847,69'11622] local-les=69 n=987 ec=11 les/c 69/31 68/68/63) [4,5] r=0 lpr=68 pi=11-67/6 rops=8 crt=61'2451 mlcod 33'847 active+recovering m=402 u=402] after missing 55275900/plana3025712-1384/head//4 need 40'1459 have 0'0
2014-08-19 13:28:43.808565 7fc31ceae700 20 osd.4 pg_epoch: 69 pg[4.0( v 69'11622 lc 33'847 (33'847,69'11622] local-les=69 n=987 ec=11 les/c 69/31 68/68/63) [4,5] r=0 lpr=68 pi=11-67/6 rops=8 crt=61'2451 mlcod 33'847 active+recovering m=402 u=402] after missing b999ca00/plana3025712-1330/head//4 need 40'1392 have 0'0

#2

Updated by Samuel Just over 9 years ago

  • Assignee set to Samuel Just

#3

Updated by Samuel Just over 9 years ago

For some reason, we waited a really long time to call discover_all_missing.

#4

Updated by Samuel Just over 9 years ago

  • Status changed from New to 7

OK, discover_all_missing is called in the wrong place: at that point in activate, we don't have the unfound objects yet.
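The following toy model illustrates the ordering problem described above. It is a hypothetical sketch and not Ceph code. `ToyPG`, `compute_unfound`, the `activate` helper and its `fixed` flag are all invented for illustration. Only the name `discover_all_missing` comes from the logs. The model assumes that discovery returns early when nothing is unfound. Under that assumption, calling discovery before the unfound set is computed sends no `pg_missing_t` queries to peers.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPG:
    """Hypothetical stand-in for a primary PG; not Ceph's real PG class."""
    missing: set          # objects the primary still needs
    peers: list           # other OSDs that might hold copies
    unfound: set = field(default_factory=set)
    queried: list = field(default_factory=list)

    def compute_unfound(self, known_locations):
        # Assumption: an object is unfound if it is missing and no peer is known to have it.
        self.unfound = {obj for obj in self.missing if not known_locations.get(obj)}

    def discover_all_missing(self):
        # Assumption: discovery is skipped entirely when nothing is unfound.
        if not self.unfound:
            return
        for peer in self.peers:
            if peer not in self.queried:
                # Stands in for "requesting pg_missing_t" from a peer.
                self.queried.append(peer)


def activate(pg, known_locations, fixed):
    """Hypothetical activate(); `fixed` selects the call order."""
    if fixed:
        pg.compute_unfound(known_locations)
        pg.discover_all_missing()
    else:
        # Buggy order: unfound is still empty, so discovery queries nobody.
        pg.discover_all_missing()
        pg.compute_unfound(known_locations)


before_fix = ToyPG(missing={"obj-a", "obj-b"}, peers=["osd.0", "osd.3"])
activate(before_fix, known_locations={}, fixed=False)
print(len(before_fix.unfound), before_fix.queried)  # 2 []

after_fix = ToyPG(missing={"obj-a", "obj-b"}, peers=["osd.0", "osd.3"])
activate(after_fix, known_locations={}, fixed=True)
print(len(after_fix.unfound), after_fix.queried)  # 2 ['osd.0', 'osd.3']
```

With the buggy order, the toy PG ends up with unfound objects but has not asked any peer for its missing set. Those objects stay unfound until something else calls discovery again.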

#5

Updated by Samuel Just over 9 years ago

  • Status changed from 7 to Pending Backport

#6

Updated by Samuel Just over 9 years ago

wip-sam-testing-firefly

#7

Updated by Samuel Just over 9 years ago

  • Status changed from Pending Backport to Resolved