Bug #59670: Ceph status shows PG recovering when norecover flag is set
Related: Bug #63334 (closed): Recovery starts while norecover flag is set when PG splitting occurs
% Done: 0%
Source:
Tags: backport_processed
Backport: quincy
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
On the Gibba cluster, we observed that ceph -s was showing one PG in a recovering state after the norecover flag had been set:
[root@gibba001 ~]# ceph -s
  cluster:
    id:     7e775b16-ea73-11ed-ac35-3cecef3d8fb8
    health: HEALTH_WARN
            nobackfill,norecover,noscrub,nodeep-scrub flag(s) set
            Degraded data redundancy: 2/27732183 objects degraded (0.000%), 1 pg degraded, 1 pg undersized

  services:
    mon: 5 daemons, quorum gibba001,gibba002,gibba003,gibba006,gibba005 (age 78m)
    mgr: gibba006.oxzbun(active, since 71m), standbys: gibba008.fhfdkj
    osd: 62 osds: 62 up (since 74m), 62 in (since 74m); 1 remapped pgs
         flags nobackfill,norecover,noscrub,nodeep-scrub
    rgw: 6 daemons active (6 hosts, 1 zones)

  data:
    pools:   7 pools, 1217 pgs
    objects: 4.62M objects, 203 GiB
    usage:   446 GiB used, 10 TiB / 11 TiB avail
    pgs:     2/27732183 objects degraded (0.000%)
             1216 active+clean
             1    active+recovering+undersized+degraded+remapped
PG dump:
1.0 2 0 2 0 0 1114656 0 0 1334 1334 active+recovering+undersized+degraded+remapped 2023-05-04T08:34:56.217391+0000 109'1334 123:1683 [30,15,0] 30 [15,0] 15 0'0 2023-05-04T08:34:39.648644+0000 0'0 2023-05-04T08:34:39.648644+0000 0 0 periodic scrub scheduled @ 2023-05-05T16:09:53.741099+0000 0 0
dumped all
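For context, the flag involved here is managed with the standard Ceph CLI; the commands below are the stock ones (cluster admin access is assumed, so this is an ops fragment rather than a runnable example):

```shell
# Pause object recovery cluster-wide; the ceph -s output above was
# captured after this had been run:
ceph osd set norecover

# Confirm the flag is recorded in the osdmap:
ceph osd dump | grep flags

# Re-enable recovery once maintenance is done:
ceph osd unset norecover
```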
From the cluster logs, we can see the norecover flag being set; when the OSDs come up, we observe the following:
2023-05-04T12:16:48.219+0000 7f078e80e700 1 osd.29 82 state: booting -> active
2023-05-04T12:16:48.219+0000 7f078e80e700 1 osd.29 82 pausing recovery (NORECOVER flag set)
After some time, the logs show the state of PG 1.0:
2023-05-04T13:24:21.971+0000 7f07909cb700 30 osd.29 pg_epoch: 121 pg[1.0( v 106'1334 lc 80'163 (0'0,106'1334] local-lis/les=0/0 n=2 ec=74/74 lis/c=0/78 les/c/f=0/79/0 sis=84) [29,15,32]/[15,32] r=-1 lpr=84 pi=[78,84)/1 luod=0'0 lua=106'1324 crt=106'1334 mlcod 80'163 *active+remapped* m=2 mbc={}] lock
2023-05-04T13:25:31.984+0000 7f07909cb700 30 osd.29 pg_epoch: 121 pg[1.0( v 106'1334 lc 80'163 (0'0,106'1334] local-lis/les=0/0 n=2 ec=74/74 lis/c=0/78 les/c/f=0/79/0 sis=84) [29,15,32]/[15,32] r=-1 lpr=84 pi=[78,84)/1 luod=0'0 lua=106'1324 crt=106'1334 mlcod 80'163 *active+remapped* m=2 mbc={}] lock
However, ceph status and ceph pg dump still report PG 1.0 as recovering, even though the OSD log above shows its state as active+remapped.
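The mismatch is visible from the report's own output: with norecover among the flags, no PG state should contain "recovering". A minimal sketch that filters the pgs summary lines quoted above (plain text processing only; no cluster needed):

```shell
# PG state summary as reported by ceph -s above:
pgs='1216 active+clean
1 active+recovering+undersized+degraded+remapped'

# With norecover set, any line matching "recovering" is a contradiction:
printf '%s\n' "$pgs" | grep recovering
```

Against the captured status this prints the single offending line for PG 1.0.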