Bug #59670

closed

Bug #63334: Recovery starts while norecover flag is set when PG splitting occurs

Ceph status shows PG recovering when norecover flag is set

Added by Aishwarya Mathuria about 1 year ago. Updated 5 days ago.

Status:
Duplicate
Priority:
Normal
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
backport_processed
Backport:
quincy
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

On the Gibba cluster, we observed that ceph -s was showing one PG in the recovering state after the norecover flag was set:

[root@gibba001 ~]# ceph -s
  cluster:
    id:     7e775b16-ea73-11ed-ac35-3cecef3d8fb8
    health: HEALTH_WARN
            nobackfill,norecover,noscrub,nodeep-scrub flag(s) set
            Degraded data redundancy: 2/27732183 objects degraded (0.000%), 1 pg degraded, 1 pg undersized

  services:
    mon: 5 daemons, quorum gibba001,gibba002,gibba003,gibba006,gibba005 (age 78m)
    mgr: gibba006.oxzbun(active, since 71m), standbys: gibba008.fhfdkj
    osd: 62 osds: 62 up (since 74m), 62 in (since 74m); 1 remapped pgs
         flags nobackfill,norecover,noscrub,nodeep-scrub
    rgw: 6 daemons active (6 hosts, 1 zones)

  data:
    pools:   7 pools, 1217 pgs
    objects: 4.62M objects, 203 GiB
    usage:   446 GiB used, 10 TiB / 11 TiB avail
    pgs:     2/27732183 objects degraded (0.000%)
             1216 active+clean
             1    active+recovering+undersized+degraded+remapped
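For context, the flags shown in the status above are the cluster-wide OSD flags; a minimal sketch of how they are typically set and verified (standard Ceph CLI, not a transcript from this cluster):

# set the maintenance flags seen in the status output
ceph osd set norecover
ceph osd set nobackfill
ceph osd set noscrub
ceph osd set nodeep-scrub

# verify the flags are active
ceph osd dump | grep flags
ceph -s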

PG dump:


1.0            2                   0         2          0        0  1114656            0           0  1334      1334  active+recovering+undersized+degraded+remapped  2023-05-04T08:34:56.217391+0000  109'1334  123:1683            [30,15,0]          30               [15,0]              15         0'0  2023-05-04T08:34:39.648644+0000              0'0  2023-05-04T08:34:39.648644+0000              0                    0  periodic scrub scheduled @ 2023-05-05T16:09:53.741099+0000                 0                0
dumped all

From the cluster logs we can see the norecover flag being set, and when the OSDs come up we observe the following:

2023-05-04T12:16:48.219+0000 7f078e80e700  1 osd.29 82 state: booting -> active
2023-05-04T12:16:48.219+0000 7f078e80e700  1 osd.29 82 pausing recovery (NORECOVER flag set)

Some time later, the OSD logs show the state of PG 1.0:

2023-05-04T13:24:21.971+0000 7f07909cb700 30 osd.29 pg_epoch: 121 pg[1.0( v 106'1334 lc 80'163 (0'0,106'1334] local-lis/les=0/0 n=2 ec=74/74 lis/c=0/78 les/c/f=0/79/0 sis=84) [29,15,32]/[15,32] r=-1 lpr=84 pi=[78,84)/1 luod=0'0 lua=106'1324 crt=106'1334 mlcod 80'163 *active+remapped* m=2 mbc={}] lock
2023-05-04T13:25:31.984+0000 7f07909cb700 30 osd.29 pg_epoch: 121 pg[1.0( v 106'1334 lc 80'163 (0'0,106'1334] local-lis/les=0/0 n=2 ec=74/74 lis/c=0/78 les/c/f=0/79/0 sis=84) [29,15,32]/[15,32] r=-1 lpr=84 pi=[78,84)/1 luod=0'0 lua=106'1324 crt=106'1334 mlcod 80'163 *active+remapped* m=2 mbc={}] lock

However, ceph status and ceph pg dump still show that PG 1.0 is recovering.
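For reference, the per-PG state lines quoted above only show up at elevated OSD debug levels (the leading 30 in those lines is the debug level); a rough sketch of raising and restoring that verbosity on a running OSD, using osd.29 from the logs above (the exact settings used on Gibba are not recorded in this ticket):

# raise OSD log verbosity on a running daemon to capture per-PG state lines
ceph tell osd.29 config set debug_osd 30

# ... reproduce and collect logs ...

# restore the default level afterwards (assumed default of 1/5)
ceph tell osd.29 config set debug_osd 1/5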


Related issues 1 (1 open, 0 closed)

Copied to RADOS - Backport #66000: quincy: Ceph status shows PG recovering when norecover flag is set (New, Aishwarya Mathuria)
Actions #1

Updated by Radoslaw Zarzynski about 1 year ago

Has the PG ultimately gone into the proper state? Asking to exclude a race condition in just the reporting via ceph-mgr.
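One way to distinguish a stale ceph-mgr report from a PG that is genuinely recovering is to query the PG's primary OSD directly and compare that with the mgr-side dump; a minimal sketch using pg 1.0 from this ticket:

# state as aggregated and reported by ceph-mgr
ceph pg dump pgs_brief | grep '^1\.0'

# state as reported by the PG's primary OSD, bypassing mgr aggregation
ceph pg 1.0 query | grep '"state"'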

Actions #2

Updated by Wes Dillingham about 2 months ago

I think it's more than just a cosmetic issue of the PG showing recovering as its state. It does in fact "recover" objects when the "norecover" flag is set. As a Ceph operator I would expect "norecover" to prevent PGs from entering the "recovery" state, though perhaps not to prevent "backfill". If this isn't a bug, it's a confusing usage of the term "norecover" IMO.
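A quick way to check whether objects are actually being recovered (rather than just misreported) is to watch the degraded counters and the list of recovering PGs while norecover is set; a rough sketch, not taken from this cluster:

# list PGs currently reported in the recovering state
ceph pg ls recovering

# if the degraded object count keeps dropping while norecover is set,
# recovery is really making progress
watch -n 10 'ceph -s | grep degraded'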

Actions #3

Updated by Radoslaw Zarzynski about 1 month ago

Bump up. IIRC there was a very similar ticket that Aishwarya has looked into.

Actions #4

Updated by Aishwarya Mathuria about 1 month ago

We saw this issue again in another setup, and it has been fixed here: https://github.com/ceph/ceph/pull/54708.
The problem was that the autoscaler was enabled while the norecover flag was set and client I/O was going on in the cluster.
When there is a read/write to a missing/degraded object, recovery starts for that object even if the norecover flag is set. It was decided that this workflow makes sense, as stopping recovery in such cases would cause client I/O to hang indefinitely.
The fix made in the PR stops the autoscaler from starting if the user has set the norecover flag.

From my memory, in the Gibba cluster we had some read/write workloads going on and the noautoscale flag was not set, so it is probably the same issue. I'll try to see if I can confirm that, but it was a while back.
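Until that PR is available on a given release, a possible mitigation along the lines described above is to keep the autoscaler from splitting PGs while norecover is set; a rough sketch (the pool name is a placeholder, not from this cluster):

# check whether the autoscaler plans to change pg_num on any pool
ceph osd pool autoscale-status

# disable autoscaling on a pool before setting norecover
ceph osd pool set <pool-name> pg_autoscale_mode off

# on releases that support it, the global noautoscale flag can be used instead
ceph osd pool set noautoscale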

Actions #5

Updated by Radoslaw Zarzynski about 1 month ago

  • Status changed from New to Need More Info

The fix was merged on 5 Jan 2024, so this could fit. It has been backported only to Reef.

Wes Dillingham, do you see it on your cluster? If so, what's the version?

Actions #6

Updated by Wes Dillingham 11 days ago

Radoslaw Zarzynski wrote in #note-5:

The fix was merged on 5 Jan 2024, so this could fit. It has been backported only to Reef.

Wes Dillingham, do you see it on your cluster? If so, what's the version?

17.2.7

Actions #7

Updated by Laura Flores 5 days ago

So it looks like https://github.com/ceph/ceph/pull/54708 might need to be backported to Quincy.

@Aishwarya Mathuria mind creating a backport? We checked, and it is indeed absent from Quincy.

Actions #8

Updated by Laura Flores 5 days ago

  • Status changed from Need More Info to Pending Backport
  • Backport set to quincy
Actions #9

Updated by Backport Bot 5 days ago

  • Copied to Backport #66000: quincy: Ceph status shows PG recovering when norecover flag is set added
Actions #10

Updated by Backport Bot 5 days ago

  • Tags set to backport_processed
Actions #11

Updated by Aishwarya Mathuria 5 days ago

@Laura Flores sure, I'll raise a PR for quincy.

Actions #12

Updated by Aishwarya Mathuria 5 days ago

  • Status changed from Pending Backport to Duplicate
  • Parent task set to #63334