Bug #37439: Degraded PG does not discover remapped data on originating OSD - RADOS - Ceph

Actions

Copy link

Bug #37439

closed

Degraded PG does not discover remapped data on originating OSD

Added by Jonas Jelten over 5 years ago. Updated over 4 years ago.

Status:

Resolved

Priority:

Urgent

Assignee:

Category:

Backfill/Recovery

Target version:

% Done:

Source:

Tags:

Backport:

nautilus, mimic, luminous

Regression:

Severity:

3 - minor

Reviewed:

Affected Versions:

Ceph - v14.2.0

ceph-qa-suite:

Component(RADOS):

OSD

Pull request ID:

27288

Crash signature (v1):

Crash signature (v2):

Description

There seems to be an issue that an OSD is not queried for missing objects that were remapped, but the OSD for this is up. This happened in two different scenarios for us. In both, data is stored in EC pools (8+3).

Scenario 0¶

To remove a broken disk (e.g. osd.22), it is weighted to 0 with ceph osd out 22. Objects are remapped normally. During object movement, osd.22 is restarted (or crashes and then starts again). Now the bug shows up: Objects will become degraded and stay degraded, because osd.22 is not queried. ceph pq query shows:

    "might_have_unfound": [
      {
        "osd": "22(3)",
        "status": "not queried" 
      }
    ],

A workaround is to in the broken-disk osd temporarily. The osd is then queried and missing objects are discovered. Then, out the osd again. No objects are degraded any more and disk will be emptied.

Scenario 1¶

Add new disks to the cluster. Data is remapped to be transferred from the old disks (e.g. osd.19) to new disks (e.g. > osd.42).
When there is a restart an OSD of the old disks (or it restarts because of a crash), objects become degraded. The missing data is on the osd.19 but again it is not queried. ceph pg query shows:

    "might_have_unfound": [
      {
        "osd": "19(6)",
        "status": "not queried" 
      }
    ],

Only remapped data seems to be missing, if osd.19 is taken down, much more data is degraded. Mind that osd.19 is missing in the acting set in the current state of this PG:

    "up": [38, 36, 28, 17, 13, 39, 48, 10, 29, 5, 47],
    "acting": [36, 15, 28, 17, 13, 32, 2147483647, 10, 29, 5, 20],
    "backfill_targets": [
        "36(1)",
        "38(0)",
        "39(5)",
        "47(10)",
        "48(6)" 
    ],
    "acting_recovery_backfill": [
        "5(9)",
        "10(7)",
        "13(4)",
        "15(1)",
        "17(3)",
        "20(10)",
        "28(2)",
        "29(8)",
        "32(5)",
        "36(0)",
        "36(1)",
        "38(0)",
        "39(5)",
        "47(10)",
        "48(6)" 
    ],

For this scenario, I have not found a workaround yet. The cluster remains degraded until it has recovered by restoring the data.

So, overall I suspect there is a bug which prevents remapped pg data to be discovered. The PG already knows which OSD is the correct candidate, but does not query it.

Files

Download all files

ceph-osd.18.log.xz (27.4 KB) ceph-osd.18.log.xz	log level 20 of OSD 18	Jonas Jelten, 12/13/2018 12:59 PM
ceph-osd.38.log.xz (314 KB) ceph-osd.38.log.xz	log of primary for pg 6.65	Jonas Jelten, 04/01/2019 10:55 PM

Related issues 4 (1 open — 3 closed)

Actions

Copy link

Updated by Jonas Jelten over 5 years ago

As I can't edit the post...

To clarify: With missing I mean the parts of the erasure coded object so the object becomes degraded.

Actions

Copy link

Updated by Greg Farnum over 5 years ago

Priority changed from Normal to High

The first scenario definitely looks like an issue; perhaps we are improperly filtering for out rather than down during peering?

If I've read the second one right it looks like there's actual missing data on the down OSD, and if the OSD is down we obviously can't query it, so I think that's expected?

Actions

Copy link

Updated by Jonas Jelten over 5 years ago

In the second scenario, the cluster was completely healthy before new disks were added. My guess is that non-remapped PGs are found on that OSD, but remapped ones are not queried. The OSD is up, of course.

Actions

Copy link

Updated by Greg Farnum over 5 years ago

See also the ceph-devel mailing list thread "Degraded PG does not discover remapped data on originating OSD".

Actions

Copy link

Updated by Neha Ojha over 5 years ago

Priority changed from High to Urgent

Actions

Copy link

Updated by Jonas Jelten over 5 years ago

Easy steps to reproduce seem to be:

Have a healthy cluster
ceph osd set pause # make sure no writes mess up the test
ceph osd set nobackfill
ceph osd set norecover # make sure the error is not recovered but instead stays
ceph tell 'osd.*' injectargs '--debug_osd=20/20' # turn up logging
ceph osd out $osdid # take out a random osd
observe the state, now objects are degraded already, check pg query.
In my test, I observe that $osdid was "already probed" but it does have the data, the cluster was completely healthy before.
ceph osd down $osdid # repeer this osd
observe the state again, even more objects are degraded now, check pg query.
In my test, $osdid is now "not queried"
ceph osd in 0 # everything turns back to normal and healthy
ceph tell 'osd.*' injectargs '--debug_osd=1/5' # silence logging again

In summary: while preventing recovery, an out osd produces degraded objects. An out and repeered OSD produces even more degraded objects. Taking it in again will discover all missing object copies.

Actions

Copy link

Updated by Jonas Jelten over 5 years ago

please please let us edit issues and comments...

~~I made a mistake in the above post: please ignore the ceph osd set noup line, it doesn't matter.~~

thanks, now I can edit the posts :)

Actions

Copy link

Updated by Jonas Jelten over 5 years ago

File ceph-osd.18.log.xz ceph-osd.18.log.xz added

Tested on a 5-node cluster with 20 OSDs and 14 3-replica pools.

Here's the log file (level 20) of OSD 18, which is the new primary of PG 1.3a and 3.3.

The following output was taken after repeering OSD 0.
All OSDs are up again, OSD 0 is out (and also up).

OSD_STAT USED    AVAIL   TOTAL   HB_PEERS                                        PG_SUM PRIMARY_PG_SUM
19       1.3 GiB 8.7 GiB  10 GiB                [0,1,2,4,5,6,7,9,10,11,12,15,18]     54             12
18       1.3 GiB 8.7 GiB  10 GiB     [0,1,2,3,4,5,6,7,8,10,11,12,13,14,15,17,19]     52             21
17       1.2 GiB 8.8 GiB  10 GiB     [0,1,2,3,4,5,7,8,9,10,11,12,13,14,15,16,18]     55             19
16       1.2 GiB 8.8 GiB  10 GiB      [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,17]     56             16
15       1.2 GiB 8.8 GiB  10 GiB                        [1,3,4,5,6,8,9,14,16,19]     53              8
14       1.2 GiB 8.8 GiB  10 GiB   [0,1,2,3,4,5,6,7,8,9,10,11,13,15,16,17,18,19]     53             23
13       1.3 GiB 8.7 GiB  10 GiB     [0,1,2,3,4,5,6,8,9,10,11,12,14,16,17,18,19]     59             21
12       1.3 GiB 8.7 GiB  10 GiB           [0,1,2,3,4,5,7,8,9,10,11,13,16,17,19]     58             17
11       1.3 GiB 8.7 GiB  10 GiB             [2,3,5,6,7,10,12,13,14,15,16,18,19]     51             15
10       1.3 GiB 8.7 GiB  10 GiB  [0,1,2,3,4,5,6,7,9,11,12,13,14,15,16,17,18,19]     52             20
3        1.2 GiB 8.7 GiB  10 GiB [0,2,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19]     59             20
2        1.3 GiB 8.7 GiB  10 GiB          [0,1,3,5,6,7,8,9,11,12,13,14,17,18,19]     57             11
1        1.3 GiB 8.7 GiB  10 GiB   [0,2,4,5,6,7,8,10,11,12,13,14,15,16,17,18,19]     54             23
0        1.3 GiB 8.7 GiB  10 GiB                          [1,2,3,4,5,6,7,8,9,19]      0              0
4        1.3 GiB 8.7 GiB  10 GiB      [0,1,2,3,5,8,9,10,11,13,14,15,16,17,18,19]     56             17
5        1.2 GiB 8.8 GiB  10 GiB [0,1,2,3,4,6,8,9,10,11,12,13,14,15,16,17,18,19]     56             21
6        1.3 GiB 8.7 GiB  10 GiB    [0,1,2,3,5,7,8,9,10,12,13,14,15,16,17,18,19]     51             18
7        1.2 GiB 8.8 GiB  10 GiB     [0,1,2,3,6,8,10,11,12,13,14,15,16,17,18,19]     49             19
8        1.2 GiB 8.7 GiB  10 GiB       [0,1,2,3,4,6,7,9,12,13,14,15,16,17,18,19]     53             22
9        1.2 GiB 8.8 GiB  10 GiB    [1,2,3,4,5,6,7,8,10,12,13,14,15,16,17,18,19]     51             21

  cluster:
    id:     562ea1a0-3f7f-42ec-876c-2fae6a90ea0e
    health: HEALTH_WARN
            pauserd,pausewr,nobackfill,norecover flag(s) set
            29/3912 objects misplaced (0.741%)
            Degraded data redundancy: 239/3912 objects degraded (6.109%), 35 pgs degraded, 3 pgs undersized

  services:
    mon: 3 daemons, quorum vl-srv1,vl-srv2,vl-srv3
    mgr: vl-srv1(active), standbys: vl-srv2, vl-srv3
    mds: lolfs-1/1/1 up  {0=vl-srv1=up:active}, 4 up:standby
    osd: 20 osds: 20 up, 19 in; 4 remapped pgs
         flags pauserd,pausewr,nobackfill,norecover
    rgw: 2 daemons active

  data:
    pools:   14 pools, 344 pgs
    objects: 1.30 k objects, 1.6 GiB
    usage:   25 GiB used, 175 GiB / 200 GiB avail
    pgs:     239/3912 objects degraded (6.109%)
             29/3912 objects misplaced (0.741%)
             307 active+clean
             28  active+recovery_wait+degraded
             4   active+recovering+degraded
             3   active+undersized+degraded+remapped+backfill_wait
             1   active+recovering
             1   active+remapped+backfill_wait

OSDMAP_FLAGS pauserd,pausewr,nobackfill,norecover flag(s) set
OBJECT_MISPLACED 29/3912 objects misplaced (0.741%)
PG_DEGRADED Degraded data redundancy: 239/3912 objects degraded (6.109%), 35 pgs degraded, 3 pgs undersized
    pg 1.2 is active+recovery_wait+degraded, acting [12,17,4]
    pg 1.11 is active+recovery_wait+degraded, acting [4,3,19]
    pg 1.17 is active+recovering+degraded, acting [13,10,19]
    pg 1.1f is active+recovery_wait+degraded, acting [19,2,10]
    pg 1.2f is active+recovery_wait+degraded, acting [14,4,10]
    pg 1.30 is active+recovery_wait+degraded, acting [12,11,3]
    pg 1.33 is active+recovery_wait+degraded, acting [10,17,15]
    pg 1.39 is active+recovery_wait+degraded, acting [5,11,12]
    pg 1.3a is active+recovery_wait+degraded, acting [18,11,13]
    pg 1.3b is active+recovery_wait+degraded, acting [3,13,18]
    pg 1.3c is active+recovery_wait+degraded, acting [19,9,4]
    pg 1.3d is active+recovery_wait+degraded, acting [2,12,7]
    pg 1.50 is active+recovery_wait+degraded, acting [8,13,16]
    pg 1.51 is active+recovery_wait+degraded, acting [4,1,13]
    pg 1.5a is active+recovery_wait+degraded, acting [12,4,19]
    pg 1.5d is active+recovery_wait+degraded, acting [12,2,10]
    pg 1.60 is active+recovery_wait+degraded, acting [16,14,10]
    pg 1.61 is active+recovery_wait+degraded, acting [17,11,7]
    pg 1.6b is active+recovery_wait+degraded, acting [6,15,17]
    pg 1.75 is active+recovery_wait+degraded, acting [10,12,4]
    pg 1.7e is active+recovery_wait+degraded, acting [14,8,17]
    pg 2.1 is active+recovery_wait+degraded, acting [14,9,2]
    pg 2.5 is active+recovery_wait+degraded, acting [8,4,16]
    pg 3.3 is active+recovering+degraded, acting [18,4,11]
    pg 3.6 is active+recovery_wait+degraded, acting [13,19,4]
    pg 5.3 is stuck undersized for 7118.187866, current state active+undersized+degraded+remapped+backfill_wait, last acting [7,17]
    pg 5.5 is stuck undersized for 7118.196629, current state active+undersized+degraded+remapped+backfill_wait, last acting [19,15]
    pg 8.1d is active+recovery_wait+degraded, acting [1,14,11]
    pg 8.2d is active+recovery_wait+degraded, acting [4,3,11]
    pg 9.1b is active+recovery_wait+degraded, acting [13,5,3]
    pg 10.4 is active+recovering+degraded, acting [8,15,18]
    pg 10.5 is active+recovery_wait+degraded, acting [13,5,9]
    pg 11.0 is active+recovery_wait+degraded, acting [1,4,8]
    pg 12.1 is stuck undersized for 7118.182018, current state active+undersized+degraded+remapped+backfill_wait, last acting [6,14]
    pg 14.6 is active+recovering+degraded, acting [12,17,3]

ceph pg 3.3 query | jq -C .recovery_state | less
  {
    "name": "Started/Primary/Active",
    "enter_time": "2018-12-13 12:09:41.053312",
    "might_have_unfound": [
      {
        "osd": "0",
        "status": "not queried" 
      },
      {
        "osd": "4",
        "status": "already probed" 
      },
      {
        "osd": "11",
        "status": "already probed" 
      }
    ],

Actions

Copy link

Updated by Jonas Jelten about 5 years ago

More findings, now on Nautilus 14.2.0:

OSD.62 once was part of pg 6.65, but content on it got remapped. A restart of OSD.62 once again results in degraded data.

OSD.38 is the primary of 6.65, below is its log (level 10) when OSD.62 comes back online:

2019-04-01 02:38:54.467 7fb19fd7e700  7 osd.38 36469 handle_fast_pg_notify pg_notify((query:36469 sent:36469 6.65s8( v 36208'736652 (34791'733595,36208'736652] local-lis/les=36342/36343 n=198964 ec=26179/18934 lis/c 36342/35479 les/c/f 36343/35480/27050 36444/36445/36275) 8->0)=([35479,36444] intervals=([36255,36258] acting 7(3),12(1),17(6),25(4),38(0),39(5),41(7),51(10),62(8),63(9)),([36267,36273] acting 7(3),12(1),17(6),25(4),36(2),39(5),41(7),51(10),62(8),63(9)),([36278,36281] acting 7(3),12(1),17(6),25(4),36(2),38(0),41(7),51(10),62(8),63(9)),([36285,36291] acting 7(3),12(1),17(6),25(4),36(2),38(0),39(5),51(10),62(8),63(9)),([36340,36341] acting 7(3),12(1),17(6),25(4),36(2),38(0),39(5),41(7),62(8),63(9)),([36441,36443] acting 7(3),12(1),17(6),25(4),36(2),38(0),39(5),41(7),51(10))) epoch 36469) v6 from osd.62
2019-04-01 02:38:54.467 7fb184acb700 10 osd.38 pg_epoch: 36469 pg[6.65s0( v 36208'736652 (34791'733595,36208'736652] local-lis/les=36445/36446 n=198964 ec=26179/18934 lis/c 36445/35479 les/c/f 36446/35480/27050 36444/36445/36275) [38,12,36,7,25,39,17,41,21,63,51]/[38,12,36,7,25,39,17,41,2147483647,63,51]p38(0) backfill=[21(8)] r=0 lpr=36445 pi=[35479,36445)/6 rops=1 crt=36208'736652 lcod 0'0 mlcod 0'0 active+undersized+degraded+remapped+backfilling mbc={0={},1={},2={},3={},4={},5={},6={},7={},8={},9={},10={}}] do_peering_event: epoch_sent: 36469 epoch_requested: 36469 MNotifyRec 6.65s0 from 62(8) notify: (query:36469 sent:36469 6.65s8( v 36208'736652 (34791'733595,36208'736652] local-lis/les=36342/36343 n=198964 ec=26179/18934 lis/c 36342/35479 les/c/f 36343/35480/27050 36444/36445/36275) 8->0) features: 0x3ffddff8ffacffff ([35479,36444] intervals=([36255,36258] acting 7(3),12(1),17(6),25(4),38(0),39(5),41(7),51(10),62(8),63(9)),([36267,36273] acting 7(3),12(1),17(6),25(4),36(2),39(5),41(7),51(10),62(8),63(9)),([36278,36281] acting 7(3),12(1),17(6),25(4),36(2),38(0),41(7),51(10),62(8),63(9)),([36285,36291] acting 7(3),12(1),17(6),25(4),36(2),38(0),39(5),51(10),62(8),63(9)),([36340,36341] acting 7(3),12(1),17(6),25(4),36(2),38(0),39(5),41(7),62(8),63(9)),([36441,36443] acting 7(3),12(1),17(6),25(4),36(2),38(0),39(5),41(7),51(10))) +create_info
2019-04-01 02:38:54.467 7fb184acb700 10 osd.38 pg_epoch: 36469 pg[6.65s0( v 36208'736652 (34791'733595,36208'736652] local-lis/les=36445/36446 n=198964 ec=26179/18934 lis/c 36445/35479 les/c/f 36446/35480/27050 36444/36445/36275) [38,12,36,7,25,39,17,41,21,63,51]/[38,12,36,7,25,39,17,41,2147483647,63,51]p38(0) backfill=[21(8)] r=0 lpr=36445 pi=[35479,36445)/6 rops=1 crt=36208'736652 lcod 0'0 mlcod 0'0 active+undersized+degraded+remapped+backfilling mbc={0={},1={},2={},3={},4={},5={},6={},7={},8={},9={},10={}}] state<Started/Primary/Active>: Active: got notify from 62(8), calling proc_replica_info and discover_all_missing
2019-04-01 02:38:54.467 7fb184acb700 10 osd.38 pg_epoch: 36469 pg[6.65s0( v 36208'736652 (34791'733595,36208'736652] local-lis/les=36445/36446 n=198964 ec=26179/18934 lis/c 36445/35479 les/c/f 36446/35480/27050 36444/36445/36275) [38,12,36,7,25,39,17,41,21,63,51]/[38,12,36,7,25,39,17,41,2147483647,63,51]p38(0) backfill=[21(8)] r=0 lpr=36445 pi=[35479,36445)/6 rops=1 crt=36208'736652 lcod 0'0 mlcod 0'0 active+undersized+degraded+remapped+backfilling mbc={0={},1={},2={},3={},4={},5={},6={},7={},8={},9={},10={}}]  got osd.62(8) 6.65s8( v 36208'736652 (34791'733595,36208'736652] local-lis/les=36342/36343 n=198964 ec=26179/18934 lis/c 36342/35479 les/c/f 36343/35480/27050 36444/36445/36275)
2019-04-01 02:38:54.467 7fb184acb700 10 osd.38 pg_epoch: 36469 pg[6.65s0( v 36208'736652 (34791'733595,36208'736652] local-lis/les=36445/36446 n=198964 ec=26179/18934 lis/c 36445/35479 les/c/f 36446/35480/27050 36444/36445/36275) [38,12,36,7,25,39,17,41,21,63,51]/[38,12,36,7,25,39,17,41,2147483647,63,51]p38(0) backfill=[21(8)] r=0 lpr=36445 pi=[35479,36445)/6 rops=1 crt=36208'736652 lcod 0'0 mlcod 0'0 active+undersized+degraded+remapped+backfilling mbc={0={},1={},2={},3={},4={},5={},6={},7={},8={},9={},10={}}] reg_next_scrub pg 6.65s0 register next scrub, scrub time 2019-04-04 02:38:13.629967, must = 0
2019-04-01 02:38:54.467 7fb184acb700 10 osd.38 pg_epoch: 36469 pg[6.65s0( v 36208'736652 (34791'733595,36208'736652] local-lis/les=36445/36446 n=198964 ec=26179/18934 lis/c 36445/35479 les/c/f 36446/35480/27050 36444/36445/36275) [38,12,36,7,25,39,17,41,21,63,51]/[38,12,36,7,25,39,17,41,2147483647,63,51]p38(0) backfill=[21(8)] r=0 lpr=36445 pi=[35479,36445)/6 rops=1 crt=36208'736652 lcod 0'0 mlcod 0'0 active+undersized+degraded+remapped+backfilling mbc={0={},1={},2={},3={},4={},5={},6={},7={},8={},9={},10={}}]  osd.62(8) has stray content: 6.65s8( v 36208'736652 (34791'733595,36208'736652] local-lis/les=36342/36343 n=198964 ec=26179/18934 lis/c 36342/35479 les/c/f 36343/35480/27050 36444/36445/36275)
2019-04-01 02:38:54.467 7fb184acb700 10 osd.38 pg_epoch: 36469 pg[6.65s0( v 36208'736652 (34791'733595,36208'736652] local-lis/les=36445/36446 n=198964 ec=26179/18934 lis/c 36445/35479 les/c/f 36446/35480/27050 36444/36445/36275) [38,12,36,7,25,39,17,41,21,63,51]/[38,12,36,7,25,39,17,41,2147483647,63,51]p38(0) backfill=[21(8)] r=0 lpr=36445 pi=[35479,36445)/6 rops=1 crt=36208'736652 lcod 0'0 mlcod 0'0 active+undersized+degraded+remapped+backfilling mbc={0={},1={},2={},3={},4={},5={},6={},7={},8={},9={},10={}}] update_heartbeat_peers 7,12,17,21,25,36,38,39,41,51,63 -> 7,12,17,21,25,36,38,39,41,51,62,63
2019-04-01 02:38:54.467 7fb184acb700 10 log is not dirty
2019-04-01 02:38:54.495 7fb19fd7e700  7 osd.38 36469 handle_fast_pg_notify pg_notify((query:36469 sent:36469 6.2cs5( v 36208'738637 (34791'735590,36208'738637] local-lis/les=36462/36463 n=199469 ec=18977/18934 lis/c 36466/36462 les/c/f 36467/36463/27050 36469/36469/36274) 5->0)=([36462,36468] intervals=([36466,36468] acting 15(2),19(7),38(0),39(1),41(10),44(9),46(8),54(3),55(6),69(4))) epoch 36469) v6 from osd.62
2019-04-01 02:38:54.495 7fb1832c8700 10 osd.38 pg_epoch: 36469 pg[6.2cs0( v 36208'738637 (34791'735590,36208'738637] local-lis/les=36466/36467 n=199469 ec=18977/18934 lis/c 36466/36462 les/c/f 36467/36463/27050 36469/36469/36274) [38,39,15,54,69,62,55,19,46,44,41]p38(0) r=0 lpr=36469 pi=[36462,36469)/1 crt=36208'738637 lcod 0'0 mlcod 0'0 peering mbc={}] do_peering_event: epoch_sent: 36469 epoch_requested: 36469 MNotifyRec 6.2cs0 from 62(5) notify: (query:36469 sent:36469 6.2cs5( v 36208'738637 (34791'735590,36208'738637] local-lis/les=36462/36463 n=199469 ec=18977/18934 lis/c 36466/36462 les/c/f 36467/36463/27050 36469/36469/36274) 5->0) features: 0x3ffddff8ffacffff ([36462,36468] intervals=([36466,36468] acting 15(2),19(7),38(0),39(1),41(10),44(9),46(8),54(3),55(6),69(4))) +create_info
2019-04-01 02:38:54.495 7fb1832c8700  7 osd.38 pg_epoch: 36469 pg[6.2cs0( v 36208'738637 (34791'735590,36208'738637] local-lis/les=36466/36467 n=199469 ec=18977/18934 lis/c 36466/36462 les/c/f 36467/36463/27050 36469/36469/36274) [38,39,15,54,69,62,55,19,46,44,41]p38(0) r=0 lpr=36469 pi=[36462,36469)/1 crt=36208'738637 lcod 0'0 mlcod 0'0 peering mbc={}] state<Started/Primary>: handle_pg_notify from osd.62(5)
2019-04-01 02:38:54.495 7fb1832c8700 10 osd.38 pg_epoch: 36469 pg[6.2cs0( v 36208'738637 (34791'735590,36208'738637] local-lis/les=36466/36467 n=199469 ec=18977/18934 lis/c 36466/36462 les/c/f 36467/36463/27050 36469/36469/36274) [38,39,15,54,69,62,55,19,46,44,41]p38(0) r=0 lpr=36469 pi=[36462,36469)/1 crt=36208'738637 lcod 0'0 mlcod 0'0 peering mbc={}]  got dup osd.62(5) info 6.2cs5( v 36208'738637 (34791'735590,36208'738637] local-lis/les=36462/36463 n=199469 ec=18977/18934 lis/c 36466/36462 les/c/f 36467/36463/27050 36469/36469/36274), identical to ours
2019-04-01 02:38:54.495 7fb1832c8700 10 log is not dirty

{
  "state": "active+undersized+degraded+remapped+backfilling",
  "snap_trimq": "[]",
  "snap_trimq_len": 0,
  "epoch": 36476,
  "up": [
    38, 12, 36, 7, 25, 39, 17, 41, 21, 63, 51
  ],
  "acting": [
    38, 12, 36, 7, 25, 39, 17, 41, 2147483647, 63, 51
  ],
  "backfill_targets": [
    "21(8)" 
  ],
  "acting_recovery_backfill": [
    "7(3)",
    "12(1)",
    "17(6)",
    "21(8)",
    "25(4)",
    "36(2)",
    "38(0)",
    "39(5)",
    "41(7)",
    "51(10)",
    "63(9)" 
  ],
  "info": {
...
  }
  "recovery_state": [
    {
      "name": "Started/Primary/Active",
      "enter_time": "2019-04-01 01:47:03.518615",
      "might_have_unfound": [
        {
          "osd": "62(8)",
          "status": "not queried" 
        }
      ],
      "recovery_progress": {
        "backfill_targets": [
          "21(8)" 
        ],
        "waiting_on_backfill": [],
...

We see there that 62 announces itself to the primary (38) and 38 detects 62 got stray data.
The log says calling proc_replica_info and discover_all_missing, meaning we are at PG::RecoveryState::Active::react.
The next log line hints that pg->proc_replica_info is called ( got osd.62(8) 6.65s8... ).
But then reg_next_scrub pg 6.65s0 register next scrub follows, which means pg->discover_all_missing was not called as it would have printed discover_all_missing ... missing ... unfound.

This means that in this function pg->have_unfound() did not return true, even though the PG needs recovery (and 62 has the missing data!). discover_all_missing is somehow not called and thus 62 is not queried for its data, leading to this bug.

Actions

Copy link

#10