Bug #59670
Ceph status shows PG recovering when norecover flag is set
Status: Open
Description
On the Gibba cluster, we observed that ceph -s showed one PG in the recovering state after the norecover flag was set:
[root@gibba001 ~]# ceph -s
cluster:
id: 7e775b16-ea73-11ed-ac35-3cecef3d8fb8
health: HEALTH_WARN
nobackfill,norecover,noscrub,nodeep-scrub flag(s) set
Degraded data redundancy: 2/27732183 objects degraded (0.000%), 1 pg degraded, 1 pg undersized
services:
mon: 5 daemons, quorum gibba001,gibba002,gibba003,gibba006,gibba005 (age 78m)
mgr: gibba006.oxzbun(active, since 71m), standbys: gibba008.fhfdkj
osd: 62 osds: 62 up (since 74m), 62 in (since 74m); 1 remapped pgs
flags nobackfill,norecover,noscrub,nodeep-scrub
rgw: 6 daemons active (6 hosts, 1 zones)
data:
pools: 7 pools, 1217 pgs
objects: 4.62M objects, 203 GiB
usage: 446 GiB used, 10 TiB / 11 TiB avail
pgs: 2/27732183 objects degraded (0.000%)
1216 active+clean
1 active+recovering+undersized+degraded+remapped
PG dump:
1.0 2 0 2 0 0 1114656 0 0 1334 1334 active+recovering+undersized+degraded+remapped 2023-05-04T08:34:56.217391+0000 109'1334 123:1683 [30,15,0] 30 [15,0] 15 0'0 2023-05-04T08:34:39.648644+0000 0'0 2023-05-04T08:34:39.648644+0000 0 0 periodic scrub scheduled @ 2023-05-05T16:09:53.741099+0000 0 0
dumped all
From the cluster logs we can see the norecover flag being set, and when the OSDs came up we observed the following:
2023-05-04T12:16:48.219+0000 7f078e80e700 1 osd.29 82 state: booting -> active
2023-05-04T12:16:48.219+0000 7f078e80e700 1 osd.29 82 pausing recovery (NORECOVER flag set)
And after some time we can see the state of PG 1.0 in the logs:
2023-05-04T13:24:21.971+0000 7f07909cb700 30 osd.29 pg_epoch: 121 pg[1.0( v 106'1334 lc 80'163 (0'0,106'1334] local-lis/les=0/0 n=2 ec=74/74 lis/c=0/78 les/c/f=0/79/0 sis=84) [29,15,32]/[15,32] r=-1 lpr=84 pi=[78,84)/1 luod=0'0 lua=106'1324 crt=106'1334 mlcod 80'163 *active+remapped* m=2 mbc={}] lock
2023-05-04T13:25:31.984+0000 7f07909cb700 30 osd.29 pg_epoch: 121 pg[1.0( v 106'1334 lc 80'163 (0'0,106'1334] local-lis/les=0/0 n=2 ec=74/74 lis/c=0/78 les/c/f=0/79/0 sis=84) [29,15,32]/[15,32] r=-1 lpr=84 pi=[78,84)/1 luod=0'0 lua=106'1324 crt=106'1334 mlcod 80'163 *active+remapped* m=2 mbc={}] lock
However, ceph status and ceph pg dump still show that PG 1.0 is recovering.
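For reference, the mismatch above can be inspected with standard ceph CLI commands; this is a hedged sketch of how one might cross-check the flag and the PG's reported state (the PG ID 1.0 is the one from this report, and the commands must be run against the affected cluster):

```
# Confirm the norecover flag is actually set on the OSD map
ceph osd dump | grep flags

# List any PGs still reporting a recovering state despite the flag
ceph pg dump pgs_brief | grep recovering

# Query the affected PG directly for its full state as the OSDs see it
ceph pg 1.0 query
```

If ceph pg query reports active+remapped while ceph -s / ceph pg dump show active+recovering, the discrepancy is on the reporting path (ceph-mgr), which is what the next comment asks about.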
Updated by Radoslaw Zarzynski 12 months ago
Has the PG ultimately gone into the proper state? Asking to exclude a race condition in just the reporting via ceph-mgr.
Updated by Wes Dillingham about 1 month ago
I think it's more than just a cosmetic issue of the PG showing recovering as its state. It does in fact recover objects while the norecover flag is set. As a Ceph operator I would expect norecover to prevent PGs from entering the recovery state, though perhaps not to prevent backfill. If this isn't a bug, it's a confusing usage of the term "norecover", IMO.
Updated by Radoslaw Zarzynski 26 days ago
Bump. IIRC there was a very similar ticket that Aishwarya looked into.
Updated by Aishwarya Mathuria 22 days ago
We saw this issue again in another setup and it has been fixed here: https://github.com/ceph/ceph/pull/54708.
The problem was that the autoscaler was enabled while the norecover flag was set and client I/O was ongoing in the cluster.
When there is a read/write to a missing/degraded object, recovery starts for that object even if the norecover flag is set; it was decided that this workflow makes sense, as stopping recovery in such cases would cause client I/O to hang indefinitely.
The fix made in the PR stops the autoscaler from starting if the user has set the norecover flag.
From my memory, on the Gibba cluster we had some read/write workloads going on and the noautoscale flag was not set, so it is probably the same issue. I'll try to see if I can confirm that, but it was a while back.
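Until running a release that includes the fix from the PR above, a possible mitigation consistent with this analysis is to pause the autoscaler before setting norecover, so PG splits/merges cannot kick off recovery while the flag is up. A sketch, assuming a release that supports the cluster-wide noautoscale flag (older releases would instead need pg_autoscale_mode set to off per pool):

```
# Pause the PG autoscaler before entering maintenance
ceph osd set noautoscale
ceph osd set norecover

# ... perform maintenance ...

# Restore normal operation in reverse order
ceph osd unset norecover
ceph osd unset noautoscale
```

Note this does not change the documented behavior that client I/O to a degraded object still triggers recovery of that object even with norecover set.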
Updated by Radoslaw Zarzynski 19 days ago
- Status changed from New to Need More Info
The fix was merged on 5 Jan 2024, so this could fit. It has been backported only to Reef.
Wes Dillingham, do you see it on your cluster? If so, what's the version?