Bug #59513
Scrubbing PGs from device_health_metrics takes suspiciously long
Status: Open
Description
I have in "ceph df":
POOL                   ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
device_health_metrics   1  512  257 MiB       36  772 MiB      0     17 TiB
This pool stores extremely little data. I'd expect that scrubbing it is instantaneous.
Yet, in "ceph pg ls" I can see that:
PG     OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES  OMAP_BYTES*  OMAP_KEYS*  LOG  STATE                        SINCE  VERSION    REPORTED      UP             ACTING         SCRUB_STAMP                      DEEP_SCRUB_STAMP                 ...
1.18   0        0         0          0        0      0            0           0    active+clean                 2d     0'0        16802:409663  [28,16,6]p28   [28,16,6]p28   2023-04-14T21:38:21.593846+0000  2023-04-14T21:38:21.593846+0000
1.19   0        0         0          0        0      0            0           0    active+clean                 2d     0'0        16802:483155  [29,9,16]p29   [29,9,16]p29   2023-04-14T21:42:16.403577+0000  2023-04-14T21:42:16.403577+0000
1.1a   0        0         0          0        0      0            0           0    active+clean+scrubbing+deep  78m    0'0        16802:394661  [26,10,16]p26  [26,10,16]p26  2023-04-07T20:11:51.880176+0000  2023-04-07T20:11:51.880176+0000
1.1b   0        0         0          0        0      0            0           0    active+clean                 2d     0'0        16802:645676  [16,5,26]p16   [16,5,26]p16   2023-04-15T04:01:17.971270+0000  2023-04-15T04:01:17.971270+0000
1.1c   0        0         0          0        0      0            0           0    active+clean                 2d     0'0        16802:376623  [18,31,10]p18  [18,31,10]p18  2023-04-20T18:47:32.289124+0000  2023-04-20T18:47:32.289124+0000
...
1.1b9  0        0         0          0        0      0            0           0    active+clean                 2d     0'0        16802:462744  [18,5,33]p18   [18,5,33]p18   2023-04-12T13:31:38.817328+0000  2023-04-12T13:31:38.817328+0000
1.1ba  1        0         0          0        0      8067648      180         539  active+clean                 2d     16770'539  16802:378419  [14,6,30]p14   [14,6,30]p14   2023-04-19T07:27:41.530105+0000  2023-04-19T07:27:41.530105+0000
1.1bb  0        0         0          0        0      0            0           0    active+clean                 2d     0'0        16802:351857  [32,8,17]p32   [32,8,17]p32   2023-04-13T21:08:25.449731+0000  2023-04-13T21:08:25.449731+0000
1.1bc  0        0         0          0        0      0            0           0    active+clean                 2d     0'0        16802:468128  [31,3,23]p31   [31,3,23]p31   2023-04-15T03:55:38.432160+0000  2023-04-15T03:55:38.432160+0000
1.1bd  0        0         0          0        0      0            0           0    active+clean                 2d     0'0        16802:254273  [23,31,10]p23  [23,31,10]p23  2023-04-14T15:04:21.328568+0000  2023-04-14T15:04:21.328568+0000
1.1be  0        0         0          0        0      0            0           0    active+clean                 2d     0'0        16802:442864  [26,16,5]p26   [26,16,5]p26   2023-04-11T05:57:27.832297+0000  2023-04-11T05:57:27.832297+0000
1.1bf  0        0         0          0        0      0            0           0    active+clean+scrubbing+deep  78m    0'0        16802:5378    [3,17,27]p3    [3,17,27]p3    2023-04-11T00:22:41.764823+0000  2023-04-11T00:22:41.764823+0000
1.1c0  0        0         0          0        0      0            0           0    active+clean                 2d     0'0        16802:424055  [28,23,3]p28   [28,23,3]p28   2023-04-19T01:42:34.012205+0000  2023-04-19T01:42:34.012205+0000
It seems suspicious to me that PG 1.1bf has been scrubbing for "78m" even though it contains "0 BYTES".
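A quick way to spot any PGs stuck like this is to filter the "ceph pg ls" output for the scrubbing state. A minimal sketch; the field positions ($1 = PG, $10 = STATE, $11 = SINCE) are taken from the listing above and may shift between Ceph releases:

```shell
# Print PG id, state, and how long the PG has been in that state,
# for every PG whose state includes "scrubbing".
ceph pg ls | awk '$10 ~ /scrubbing/ {print $1, $10, $11}'
```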
Updated by Niklas Hambuechen about 1 year ago
I suspect this might be https://tracker.ceph.com/issues/54172#note-14:
"In short, this happens whenever a deep scrub is started while the noscrub flag is set. The stuck scrub can be cleared by restarting the primary OSD associated with the PG."
I had set `noscrub`, then restarted all OSDs, and then removed `noscrub`. That brought me into the situation described above.
Doing another full restart of all OSDs seems to have fixed the issue, so the workaround provided there very likely worked.
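Per the note in #54172, restarting only the primary OSD of the stuck PG should suffice instead of a full restart. A sketch, assuming a non-containerized systemd deployment; the primary is the "pN" suffix of the ACTING column in "ceph pg ls" (e.g. "[3,17,27]p3" means osd.3 is primary):

```shell
pg=1.1bf  # the stuck PG
# The PG's row contains "[up]pN [acting]pN"; the second bracketed
# group is ACTING, and its trailing "pN" names the primary OSD.
primary=$(ceph pg ls | awk -v pg="$pg" '$1 == pg' \
  | grep -o '\[[0-9,]*\]p[0-9]*' | sed -n '2s/.*p//p')
systemctl restart "ceph-osd@${primary}"
```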