Bug #59670 (closed)

Bug #63334: Recovery starts while norecover flag is set when PG splitting occurs

Ceph status shows PG recovering when norecover flag is set

Added by Aishwarya Mathuria about 1 year ago. Updated 5 days ago.

Status: Duplicate
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source:
Tags: backport_processed
Backport: quincy
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

On the Gibba cluster, we observed that ceph -s was showing one PG in the recovering state after the norecover flag was set:

[root@gibba001 ~]# ceph -s
  cluster:
    id:     7e775b16-ea73-11ed-ac35-3cecef3d8fb8
    health: HEALTH_WARN
            nobackfill,norecover,noscrub,nodeep-scrub flag(s) set
            Degraded data redundancy: 2/27732183 objects degraded (0.000%), 1 pg degraded, 1 pg undersized

  services:
    mon: 5 daemons, quorum gibba001,gibba002,gibba003,gibba006,gibba005 (age 78m)
    mgr: gibba006.oxzbun(active, since 71m), standbys: gibba008.fhfdkj
    osd: 62 osds: 62 up (since 74m), 62 in (since 74m); 1 remapped pgs
         flags nobackfill,norecover,noscrub,nodeep-scrub
    rgw: 6 daemons active (6 hosts, 1 zones)

  data:
    pools:   7 pools, 1217 pgs
    objects: 4.62M objects, 203 GiB
    usage:   446 GiB used, 10 TiB / 11 TiB avail
    pgs:     2/27732183 objects degraded (0.000%)
             1216 active+clean
             1    active+recovering+undersized+degraded+remapped
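
For reference, a minimal sketch of how these cluster-wide flags would be set and confirmed with the standard CLI; the exact commands run on Gibba are an assumption, only the resulting flag state is shown above:

# Set the flags listed in the health warning
ceph osd set nobackfill
ceph osd set norecover
ceph osd set noscrub
ceph osd set nodeep-scrub

# Confirm they are present on the OSD map
ceph osd dump | grep flags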

PG dump:


1.0            2                   0         2          0        0  1114656            0           0  1334      1334  active+recovering+undersized+degraded+remapped  2023-05-04T08:34:56.217391+0000  109'1334  123:1683            [30,15,0]          30               [15,0]              15         0'0  2023-05-04T08:34:39.648644+0000              0'0  2023-05-04T08:34:39.648644+0000              0                    0  periodic scrub scheduled @ 2023-05-05T16:09:53.741099+0000                 0                0
dumped all
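
The per-PG state can also be pulled without a full dump (a sketch; the pgid 1.0 is taken from the output above):

# Brief state plus up/acting sets for PG 1.0 only
ceph pg dump pgs_brief | awk '$1 == "1.0"'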

From the cluster logs we can see the norecover flag being set, and when the OSDs came up we observed the following:

2023-05-04T12:16:48.219+0000 7f078e80e700  1 osd.29 82 state: booting -> active
2023-05-04T12:16:48.219+0000 7f078e80e700  1 osd.29 82 pausing recovery (NORECOVER flag set)
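
The check here is against the OSD map flags; they can be confirmed for the epoch mentioned in the log line (a sketch; epoch 82 is taken from the log above, and ceph osd dump accepts an optional epoch):

# OSD map flags as of epoch 82
ceph osd dump 82 | grep flags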

Some time later, we can see the state of PG 1.0 in the logs:

2023-05-04T13:24:21.971+0000 7f07909cb700 30 osd.29 pg_epoch: 121 pg[1.0( v 106'1334 lc 80'163 (0'0,106'1334] local-lis/les=0/0 n=2 ec=74/74 lis/c=0/78 les/c/f=0/79/0 sis=84) [29,15,32]/[15,32] r=-1 lpr=84 pi=[78,84)/1 luod=0'0 lua=106'1324 crt=106'1334 mlcod 80'163 *active+remapped* m=2 mbc={}] lock
2023-05-04T13:25:31.984+0000 7f07909cb700 30 osd.29 pg_epoch: 121 pg[1.0( v 106'1334 lc 80'163 (0'0,106'1334] local-lis/les=0/0 n=2 ec=74/74 lis/c=0/78 les/c/f=0/79/0 sis=84) [29,15,32]/[15,32] r=-1 lpr=84 pi=[78,84)/1 luod=0'0 lua=106'1324 crt=106'1334 mlcod 80'163 *active+remapped* m=2 mbc={}] lock
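
Per-PG lines at this verbosity only appear with the OSD debug level raised; a sketch of enabling it on osd.29 (the level 30 matches the prefix of the lock lines above):

# At runtime, through the monitors
ceph tell osd.29 config set debug_osd 30

# Or locally on the OSD host, through the admin socket
ceph daemon osd.29 config set debug_osd 30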

However, ceph status and ceph pg dump still show PG 1.0 as recovering, even though the OSD log above reports it as active+remapped.
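
The mismatch can be seen side by side by comparing the cluster-level reports with the PG's own query (a sketch; both are standard CLI commands):

# Cluster-level views, still reporting recovering
ceph -s | grep recovering
ceph pg dump pgs_brief | grep recovering

# The primary's view of the PG state
ceph pg 1.0 query | grep '"state"'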


Related issues (1 open, 0 closed)

Copied to RADOS - Backport #66000: quincy: Ceph status shows PG recovering when norecover flag is set (status: New, assignee: Aishwarya Mathuria)