Bug #59670

Ceph status shows PG recovering when norecover flag is set

Added by Aishwarya Mathuria 12 months ago. Updated 15 days ago.

Status: Need More Info
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

On the Gibba cluster, we observed that ceph -s showed one PG in the recovering state after the norecover flag was set:

[root@gibba001 ~]# ceph -s
  cluster:
    id:     7e775b16-ea73-11ed-ac35-3cecef3d8fb8
    health: HEALTH_WARN
            nobackfill,norecover,noscrub,nodeep-scrub flag(s) set
            Degraded data redundancy: 2/27732183 objects degraded (0.000%), 1 pg degraded, 1 pg undersized

  services:
    mon: 5 daemons, quorum gibba001,gibba002,gibba003,gibba006,gibba005 (age 78m)
    mgr: gibba006.oxzbun(active, since 71m), standbys: gibba008.fhfdkj
    osd: 62 osds: 62 up (since 74m), 62 in (since 74m); 1 remapped pgs
         flags nobackfill,norecover,noscrub,nodeep-scrub
    rgw: 6 daemons active (6 hosts, 1 zones)

  data:
    pools:   7 pools, 1217 pgs
    objects: 4.62M objects, 203 GiB
    usage:   446 GiB used, 10 TiB / 11 TiB avail
    pgs:     2/27732183 objects degraded (0.000%)
             1216 active+clean
             1    active+recovering+undersized+degraded+remapped

PG dump:


1.0            2                   0         2          0        0  1114656            0           0  1334      1334  active+recovering+undersized+degraded+remapped  2023-05-04T08:34:56.217391+0000  109'1334  123:1683            [30,15,0]          30               [15,0]              15         0'0  2023-05-04T08:34:39.648644+0000              0'0  2023-05-04T08:34:39.648644+0000              0                    0  periodic scrub scheduled @ 2023-05-05T16:09:53.741099+0000                 0                0
dumped all

From the cluster logs we can see the norecover flag being set, and when the OSDs come up we observe the following:

2023-05-04T12:16:48.219+0000 7f078e80e700  1 osd.29 82 state: booting -> active
2023-05-04T12:16:48.219+0000 7f078e80e700  1 osd.29 82 pausing recovery (NORECOVER flag set)

And after some time we can see the state of PG 1.0 in the logs:

2023-05-04T13:24:21.971+0000 7f07909cb700 30 osd.29 pg_epoch: 121 pg[1.0( v 106'1334 lc 80'163 (0'0,106'1334] local-lis/les=0/0 n=2 ec=74/74 lis/c=0/78 les/c/f=0/79/0 sis=84) [29,15,32]/[15,32] r=-1 lpr=84 pi=[78,84)/1 luod=0'0 lua=106'1324 crt=106'1334 mlcod 80'163 *active+remapped* m=2 mbc={}] lock
2023-05-04T13:25:31.984+0000 7f07909cb700 30 osd.29 pg_epoch: 121 pg[1.0( v 106'1334 lc 80'163 (0'0,106'1334] local-lis/les=0/0 n=2 ec=74/74 lis/c=0/78 les/c/f=0/79/0 sis=84) [29,15,32]/[15,32] r=-1 lpr=84 pi=[78,84)/1 luod=0'0 lua=106'1324 crt=106'1334 mlcod 80'163 *active+remapped* m=2 mbc={}] lock

However, ceph status and ceph pg dump still show that PG 1.0 is recovering.
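To rule out a stale report from ceph-mgr, the PG's state can also be queried directly from its primary OSD and compared against the mgr's cached view. A minimal sketch, assuming PG 1.0 as in this report:

# Ask the primary OSD for the PG's authoritative state (and recovery_state history)
ceph pg 1.0 query | grep '"state"'

# Compare against the state cached by the mgr
ceph pg dump pgs_brief | grep -w '^1\.0'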

Actions #1

Updated by Radoslaw Zarzynski 12 months ago

Has the PG ultimately gone into the proper state? Asking to exclude a race condition in the reporting via ceph-mgr.

Actions #2

Updated by Wes Dillingham about 1 month ago

I think it's more than just a cosmetic issue of the PG showing recovering as its state. It does in fact "recover" objects when the "norecover" flag is set. As a Ceph operator I would expect "norecover" to prevent PGs from entering the "recovery" state, but perhaps not to prevent "backfill". If this isn't a bug, it's a confusing usage of the term "norecover" IMO.

Actions #3

Updated by Radoslaw Zarzynski 22 days ago

Bump. IIRC there was a very similar ticket that Aishwarya has looked into.

Actions #4

Updated by Aishwarya Mathuria 19 days ago

We saw this issue again in another setup, and it has been fixed here: https://github.com/ceph/ceph/pull/54708.
The problem was that the autoscaler was enabled while the norecover flag was set and client I/O was ongoing in the cluster.
When there is a read/write to a missing/degraded object, recovery starts for that object even if the norecover flag is set; it was decided that this behaviour makes sense, as stopping recovery in such cases would cause client I/O to hang indefinitely.
The fix in the PR stops the autoscaler from starting if the user has set the norecover flag.

From memory, on the Gibba cluster we had some read/write workloads running and the noautoscale flag was not set, so it is probably the same issue. I'll try to confirm that, but it was a while back.
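For operators hitting this before the fix, a rough sketch of the workaround implied above (the exact commands depend on the Ceph release, and the pool name below is a placeholder):

# Disable PG autoscaling first, so it cannot trigger PG changes that need recovery
ceph osd pool set noautoscale                      # cluster-wide, recent releases
ceph osd pool set <pool> pg_autoscale_mode off     # or per pool

# Then pause recovery/backfill for the maintenance window
ceph osd set norecover
ceph osd set nobackfill

# Afterwards, undo in reverse order
ceph osd unset nobackfill
ceph osd unset norecover
ceph osd pool unset noautoscale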

Actions #5

Updated by Radoslaw Zarzynski 15 days ago

  • Status changed from New to Need More Info

The fix was merged on 5 Jan 2024, so this could fit. It has been backported only to Reef.

Wes Dillingham, do you see it on your cluster? If so, what's the version?
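A quick way to gather that version information (a sketch; the output differs per cluster):

# Show which Ceph release each daemon type is running
ceph versions

# Version of the locally installed ceph client binary
ceph -v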
