Bug #62811

open

PGs stuck in backfilling state after their primary OSD is removed by setting its crush weight to 0.

Added by Sridhar Seshasayee 8 months ago. Updated 8 months ago.

Status:
New
Priority:
High
Category:
Backfill/Recovery
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I am pasting the problem description from the original BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2233777

Description of problem:

We are observing that OSDs get stuck in the draining process, and not all PGs are removed from the OSD. It looks like the drain is stuck on PGs where the OSD being removed is the primary OSD of that PG.

We had these configs set on the cluster during testing.

mon                                            advanced  mon_osd_down_out_subtree_limit         host
osd                                            advanced  osd_async_recovery_min_cost            1099511627776
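
For reference, a minimal sketch of applying these two overrides at runtime with ceph config set, using the values listed above:

# ceph config set mon mon_osd_down_out_subtree_limit host
# ceph config set osd osd_async_recovery_min_cost 1099511627776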

Procedure:
1. Deploy a 4-node RHCS cluster with a 2+2 EC pool and fill it with data.
# ceph -s
  cluster:
    id:     66070a80-2f84-11ee-bc2c-0cc47af3ea56
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum argo012,argo013,argo014 (age 8d)
    mgr: argo013.akdhka(active, since 12d), standbys: argo012.odttqx, argo014.xfhnzv
    osd: 35 osds: 35 up (since 67m), 35 in (since 30h)
    rgw: 4 daemons active (4 hosts, 1 zones)

  data:
    pools:   7 pools, 673 pgs
    objects: 4.28M objects, 3.0 TiB
    usage:   6.6 TiB used, 7.6 TiB / 14 TiB avail
    pgs:     665 active+clean
             7   active+clean+scrubbing+deep
             1   active+clean+scrubbing

  io:
    client:   7.2 KiB/s rd, 0 B/s wr, 7 op/s rd, 4 op/s wr

[ceph: root@argo012 /]# ceph df
--- RAW STORAGE ---
CLASS    SIZE    AVAIL     USED  RAW USED  %RAW USED
hdd    14 TiB  7.6 TiB  6.6 TiB   6.6 TiB      46.42
TOTAL  14 TiB  7.6 TiB  6.6 TiB   6.6 TiB      46.42

--- POOLS ---
POOL                       ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
.mgr                        1    1   23 MiB        5   68 MiB      0    1.8 TiB
.rgw.root                   2   32  1.3 KiB        4   48 KiB      0    1.8 TiB
default.rgw.log             3   32  3.6 KiB      209  408 KiB      0    1.8 TiB
default.rgw.control         4   32      0 B        8      0 B      0    1.8 TiB
default.rgw.meta            5   32  8.9 KiB       18  207 KiB      0    1.8 TiB
ec22-pool                   6  512  3.0 TiB    4.28M  6.1 TiB  52.35    2.8 TiB
default.rgw.buckets.index   7   32  1.1 GiB       90  3.2 GiB   0.06    1.8 TiB

# ceph osd df tree
ID  CLASS  WEIGHT    REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
-1         14.21661         -   14 TiB  6.6 TiB  6.1 TiB  3.2 GiB   61 GiB  7.6 TiB  46.42  1.00    -          root default
-7          3.65570         -  3.7 TiB  1.7 TiB  1.5 TiB  642 MiB   15 GiB  2.0 TiB  45.23  0.97    -              host argo012
 3    hdd   0.40619   1.00000  416 GiB  183 GiB  167 GiB  111 MiB  1.6 GiB  233 GiB  43.89  0.95   74      up          osd.3
 6    hdd   0.40619   1.00000  416 GiB  194 GiB  178 GiB   57 MiB  1.4 GiB  222 GiB  46.68  1.01   64      up          osd.6
10    hdd   0.40619   1.00000  416 GiB  176 GiB  160 GiB   23 MiB  1.4 GiB  240 GiB  42.32  0.91   61      up          osd.10
15    hdd   0.40619   1.00000  416 GiB  191 GiB  175 GiB   26 MiB  2.0 GiB  225 GiB  45.95  0.99   71      up          osd.15
19    hdd   0.40619   1.00000  416 GiB  189 GiB  173 GiB  180 MiB  1.5 GiB  227 GiB  45.32  0.98   69      up          osd.19
23    hdd   0.40619   1.00000  416 GiB  177 GiB  161 GiB   33 KiB  1.5 GiB  239 GiB  42.43  0.91   67      up          osd.23
27    hdd   0.40619   1.00000  416 GiB  185 GiB  169 GiB  2.9 MiB  1.4 GiB  231 GiB  44.58  0.96   69      up          osd.27
31    hdd   0.40619   1.00000  416 GiB  204 GiB  188 GiB  233 MiB  1.6 GiB  212 GiB  49.06  1.06   81      up          osd.31
35    hdd   0.40619   1.00000  416 GiB  195 GiB  179 GiB  9.8 MiB  2.2 GiB  221 GiB  46.87  1.01   75      up          osd.35
-3          3.65570         -  3.7 TiB  1.7 TiB  1.5 TiB  979 MiB   15 GiB  2.0 TiB  45.23  0.97    -              host argo013
 0    hdd   0.40619   1.00000  416 GiB  179 GiB  163 GiB  312 MiB  1.8 GiB  237 GiB  43.05  0.93   67      up          osd.0
 4    hdd   0.40619   1.00000  416 GiB  191 GiB  175 GiB  149 MiB  1.8 GiB  225 GiB  46.01  0.99   77      up          osd.4
 8    hdd   0.40619   1.00000  416 GiB  186 GiB  170 GiB  134 MiB  1.3 GiB  230 GiB  44.63  0.96   72      up          osd.8
12    hdd   0.40619   1.00000  416 GiB  192 GiB  176 GiB    1 KiB  1.5 GiB  224 GiB  46.04  0.99   67      up          osd.12
16    hdd   0.40619   1.00000  416 GiB  188 GiB  172 GiB   10 MiB  1.9 GiB  228 GiB  45.27  0.98   71      up          osd.16
20    hdd   0.40619   1.00000  416 GiB  186 GiB  170 GiB   46 MiB  1.7 GiB  230 GiB  44.70  0.96   71      up          osd.20
24    hdd   0.40619   1.00000  416 GiB  195 GiB  179 GiB   77 MiB  1.5 GiB  221 GiB  46.78  1.01   72      up          osd.24
28    hdd   0.40619   1.00000  416 GiB  179 GiB  163 GiB   90 MiB  1.1 GiB  237 GiB  43.06  0.93   58      up          osd.28
32    hdd   0.40619   1.00000  416 GiB  198 GiB  182 GiB  161 MiB  1.9 GiB  218 GiB  47.56  1.02   76      up          osd.32
-5          3.65570         -  3.7 TiB  1.7 TiB  1.5 TiB  683 MiB   16 GiB  2.0 TiB  45.23  0.97    -              host argo014
 2    hdd   0.40619   1.00000  416 GiB  183 GiB  167 GiB  136 MiB  1.8 GiB  233 GiB  44.02  0.95   72      up          osd.2
 7    hdd   0.40619   1.00000  416 GiB  185 GiB  169 GiB  9.7 MiB  1.6 GiB  231 GiB  44.54  0.96   69      up          osd.7
11    hdd   0.40619   1.00000  416 GiB  206 GiB  190 GiB    1 KiB  2.1 GiB  210 GiB  49.59  1.07   78      up          osd.11
14    hdd   0.40619   1.00000  416 GiB  188 GiB  172 GiB  5.2 MiB  1.8 GiB  228 GiB  45.25  0.97   61      up          osd.14
18    hdd   0.40619   1.00000  416 GiB  188 GiB  172 GiB  176 MiB  1.4 GiB  228 GiB  45.27  0.98   77      up          osd.18
22    hdd   0.40619   1.00000  416 GiB  182 GiB  166 GiB  152 MiB  1.8 GiB  234 GiB  43.79  0.94   70      up          osd.22
26    hdd   0.40619   1.00000  416 GiB  194 GiB  178 GiB   25 MiB  1.9 GiB  222 GiB  46.66  1.01   66      up          osd.26
30    hdd   0.40619   1.00000  416 GiB  192 GiB  176 GiB  107 MiB  1.8 GiB  224 GiB  46.20  1.00   73      up          osd.30
34    hdd   0.40619   1.00000  416 GiB  174 GiB  158 GiB   72 MiB  1.5 GiB  242 GiB  41.77  0.90   68      up          osd.34
-9          3.24951         -  3.2 TiB  1.6 TiB  1.5 TiB  934 MiB   16 GiB  1.6 TiB  50.41  1.09    -              host argo017
 5    hdd   0.40619   1.00000  416 GiB  234 GiB  218 GiB  162 MiB  2.2 GiB  182 GiB  56.26  1.21   86      up          osd.5
 9    hdd   0.40619   1.00000  416 GiB  207 GiB  191 GiB  124 MiB  2.3 GiB  209 GiB  49.75  1.07   83      up          osd.9
13    hdd   0.40619   1.00000  416 GiB  194 GiB  178 GiB   45 MiB  1.9 GiB  222 GiB  46.61  1.00   72      up          osd.13
17    hdd   0.40619   1.00000  416 GiB  195 GiB  179 GiB   48 MiB  1.6 GiB  221 GiB  46.86  1.01   73      up          osd.17
21    hdd   0.40619   1.00000  416 GiB  219 GiB  203 GiB  112 MiB  2.1 GiB  197 GiB  52.60  1.13   84      up          osd.21
25    hdd   0.40619   1.00000  416 GiB  195 GiB  179 GiB   36 MiB  1.7 GiB  221 GiB  46.78  1.01   72      up          osd.25
29    hdd   0.40619   1.00000  416 GiB  203 GiB  187 GiB  221 MiB  2.2 GiB  213 GiB  48.92  1.05   81      up          osd.29
33    hdd   0.40619   1.00000  416 GiB  231 GiB  215 GiB  185 MiB  2.0 GiB  185 GiB  55.48  1.20   84      up          osd.33
                        TOTAL   14 TiB  6.6 TiB  6.1 TiB  3.2 GiB   61 GiB  7.6 TiB  46.42
MIN/MAX VAR: 0.90/1.21  STDDEV: 3.24
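
For completeness, an EC 2+2 pool like the ec22-pool shown above could be created along these lines (the profile name ec22-profile is an assumption; k=2, m=2 and the 512 PGs match the ceph df output):

# ceph osd erasure-code-profile set ec22-profile k=2 m=2 crush-failure-domain=host
# ceph osd pool create ec22-pool 512 512 erasure ec22-profile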

2. In a healthy cluster, where all PGs are in the active+clean state, try removing an OSD from one host.
In the example below, osd.0 on argo013 was being removed.
# date
Tue Aug 22 12:10:27 UTC 2023

# ceph orch osd rm 0 --force --zap
Scheduled OSD(s) for removal.
[ceph: root@argo012 /]# date
Tue Aug 22 12:11:28 UTC 2023
[ceph: root@argo012 /]# ceph orch osd rm status
OSD  HOST     STATE     PGS  REPLACE  FORCE  ZAP   DRAIN STARTED AT
0    argo013  draining   58  False    True   True  2023-08-22 12:11:27.848424
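
As the issue title notes, the orchestrator drain works by reducing the OSD's crush weight to 0 so that its PGs backfill elsewhere. The same backfill should be reproducible manually with something like the following (illustrative, using osd.0 from this run):

# ceph osd crush reweight osd.0 0
# ceph osd df tree        (the CRUSH weight for osd.0 should now show 0)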

3. Initially, the PGs start draining, but the drain stops after roughly 75% of the PGs have been drained. Checking the PG dump at this point, it looks like the drain is stuck on PGs where the OSD being removed is the primary OSD of that PG.

# ceph orch osd rm status
OSD  HOST     STATE     PGS  REPLACE  FORCE  ZAP   DRAIN STARTED AT
0    argo013  draining   22  False    True   True  2023-08-22 12:11:27.848424
[root@argo012 ~]# date
Wed Aug 23 04:07:22 AM UTC 2023

# ceph orch osd rm status
OSD  HOST     STATE     PGS  REPLACE  FORCE  ZAP   DRAIN STARTED AT
0    argo013  draining   22  False    True   True  2023-08-22 12:11:27.848424
[root@argo012 ~]# date
Wed Aug 23 05:38:27 AM UTC 2023

[root@argo012 ~]# ceph orch osd rm status
OSD  HOST     STATE     PGS  REPLACE  FORCE  ZAP   DRAIN STARTED AT
0    argo013  draining   22  False    True   True  2023-08-22 12:11:27.848424
[root@argo012 ~]# date
Wed Aug 23 11:01:42 AM UTC 2023

At this point, we also do not see any recovery activity in the ceph -s output.

# ceph -s
  cluster:
    id:     66070a80-2f84-11ee-bc2c-0cc47af3ea56
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum argo012,argo013,argo014 (age 9d)
    mgr: argo013.akdhka(active, since 13d), standbys: argo012.odttqx, argo014.xfhnzv
    osd: 35 osds: 35 up (since 21h), 35 in (since 2d); 22 remapped pgs
    rgw: 4 daemons active (4 hosts, 1 zones)

  data:
    pools:   7 pools, 673 pgs
    objects: 4.42M objects, 3.0 TiB
    usage:   6.6 TiB used, 7.6 TiB / 14 TiB avail
    pgs:     171950/17688441 objects misplaced (0.972%)
             651 active+clean
             16  active+remapped+backfilling
             6   active+remapped+backfill_wait

  io:
    client:   10 KiB/s rd, 143 KiB/s wr, 14 op/s rd, 27 op/s wr

[root@argo012 ~]# ceph pg dump | grep backfill
6.1bd       8679                   0         0       8679        0  6519914496            0           0   7605      7605    active+remapped+backfilling  2023-08-22T12:11:40.514461+0000   19194'35959   19194:222764     [4,3,9,18]           4     [0,3,9,18]               0   18421'35707  2023-08-20T18:45:07.674753+0000      18421'35707  2023-08-18T14:56:15.799803+0000              0                   36  queued for scrub                                                             8427                0
6.1b8       8517                   0         0      17034        0  6200819712            0           0   7393      7393    active+remapped+backfilling  2023-08-22T12:11:41.238886+0000   19194'35586   19194:218781  [20,23,26,13]          20   [0,23,26,25]               0   18422'35313  2023-08-20T17:33:29.237283+0000      18422'35313  2023-08-20T17:33:29.237283+0000              0                  175  queued for scrub                                                             8244                0
6.1af       8479                   0         0       8479        0  6370885632            0           0   7261      7261    active+remapped+backfilling  2023-08-22T12:11:40.475878+0000   19194'35370   19194:192817    [12,18,3,9]          12     [0,18,3,9]               0   18422'35131  2023-08-22T12:09:30.568001+0000      18422'35131  2023-08-22T12:09:30.568001+0000              0                  560  periodic scrub scheduled @ 2023-08-23T15:52:04.077250+0000                   8240                0
6.1aa       8662                   0         0       8662        0  6562643968            0           0   7425      7425    active+remapped+backfilling  2023-08-22T12:11:41.434841+0000   19194'35518   19194:193262   [12,11,6,13]          12    [0,11,6,13]               0   18422'35250  2023-08-20T16:57:40.112433+0000      18337'33987  2023-08-14T17:18:34.663615+0000              0                   36  queued for deep scrub                                                        8394                0
6.19d       8627                   0         0       8627        0  6525550592            0           0   7314      7314    active+remapped+backfilling  2023-08-22T12:11:39.263735+0000   19194'35434   19194:197745   [16,25,26,6]          16    [0,25,26,6]               0   18422'35151  2023-08-20T09:25:48.183947+0000      18396'34264  2023-08-16T11:51:33.787266+0000              0                   35  queued for scrub                                                             8344                0
6.180       8826                   0         0       8826        0  6708264960            0           0   6784      6784    active+remapped+backfilling  2023-08-22T12:11:40.553126+0000   19194'34345   19194:182555  [32,30,17,31]          32   [0,30,17,31]               0   18421'34084  2023-08-20T01:22:52.567792+0000      18421'34084  2023-08-20T01:22:52.567792+0000              0                  176  queued for scrub                                                             8565                0
6.59        8661                   0         0       8661        0  6445924352            0           0   7796      7796    active+remapped+backfilling  2023-08-22T14:53:10.041447+0000   19194'36355   19194:117635  [28,18,13,15]          28   [0,18,13,15]               0   18421'36082  2023-08-22T11:26:01.536220+0000      18421'36082  2023-08-22T11:26:01.536220+0000              0                  382  periodic scrub scheduled @ 2023-08-23T22:36:35.973627+0000                   8388                0
6.4d        8510                   0         0       8510        0  6429343744            0           0   7680      7680  active+remapped+backfill_wait  2023-08-22T14:48:22.748410+0000   19194'35958   19194:159698  [12,25,18,19]          12   [0,25,18,19]               0   18422'35685  2023-08-22T11:38:43.676947+0000      18411'35524  2023-08-17T05:34:58.535235+0000              0                   35  periodic scrub scheduled @ 2023-08-23T11:46:45.368400+0000                   8237                0
6.4a        8603                   0         0      17206        0  6389039104            0           0   7624      7624  active+remapped+backfill_wait  2023-08-22T14:52:58.531620+0000   19194'35895   19194:145778  [12,22,25,27]          12   [0,22,33,27]               0   18422'35640  2023-08-22T11:29:58.371470+0000      18422'35640  2023-08-19T23:11:14.163755+0000              0                   35  periodic scrub scheduled @ 2023-08-23T21:00:28.665562+0000                   8348                0
7.1b           3                   0         0          3        0           0     41666444      189764  10048     10048    active+remapped+backfilling  2023-08-22T12:11:41.250313+0000  19194'812502   19194:916015      [28,3,13]          28       [0,3,13]               0  18951'800381  2023-08-22T11:26:05.828515+0000     18951'800381  2023-08-22T11:26:05.828515+0000              0                    4  periodic scrub scheduled @ 2023-08-23T23:25:02.991379+0000                      3                0
3.6            6                   0         0         12        0           0            0           0   9495      9495  active+remapped+backfill_wait  2023-08-22T14:53:10.040326+0000  19194'153995   19194:252905     [26,13,27]          26       [0,13,2]               0  18951'148821  2023-08-22T11:37:46.197728+0000     18510'134991  2023-08-20T11:05:55.008644+0000              0                    1  periodic scrub scheduled @ 2023-08-23T19:10:36.035866+0000                      6                0
7.4            4                   0         0          4        0           0     31724167      138831  10068     10068    active+remapped+backfilling  2023-08-22T12:11:43.275135+0000  19194'416071   19194:486357        [8,9,2]           8        [0,9,2]               0  18422'403982  2023-08-20T14:21:33.629927+0000     18396'351102  2023-08-16T12:45:54.473320+0000              0                    1  queued for scrub                                                                4                0
7.6            7                   0         0          7        0           0     67901442      315891   4422      4422    active+remapped+backfilling  2023-08-22T12:11:41.434578+0000  18508'472722   19193:565304      [12,22,5]          12       [0,22,5]               0  18508'472722  2023-08-22T11:30:09.680199+0000     18508'472722  2023-08-20T09:18:04.840742+0000              0                    1  periodic scrub scheduled @ 2023-08-23T20:39:20.602754+0000                      7                0
6.16        8668                   0         0       8668        0  6560940032            0           0   6920      6920  active+remapped+backfill_wait  2023-08-22T14:48:33.793550+0000   19194'34730    19194:69433   [28,7,23,29]          28    [0,7,23,29]               0   18422'34477  2023-08-21T02:38:11.482845+0000      18333'33321  2023-08-14T10:17:48.909163+0000              0                   36  queued for deep scrub                                                        8415                0
6.b3        8588                   0         0       8588        0  6560677888            0           0   7315      7315  active+remapped+backfill_wait  2023-08-22T14:48:41.968101+0000   19194'35707   19194:159836    [4,3,17,34]           4    [0,3,17,34]               0   18422'35433  2023-08-21T00:43:44.822688+0000      18422'35433  2023-08-21T00:43:44.822688+0000              0                  151  queued for scrub                                                             8314                0
6.c2        8542                   0         0       8542        0  6335627264            0           0   7566      7566  active+remapped+backfill_wait  2023-08-22T14:52:39.323031+0000   19194'35591   19194:187451    [16,3,2,13]          16     [0,3,2,13]               0   18422'35321  2023-08-21T00:15:30.088884+0000      18406'34902  2023-08-16T22:56:36.951863+0000              0                   35  queued for scrub                                                             8272                0
6.c8        8385                   0         0       8385        0  6342639616            0           0   7740      7740    active+remapped+backfilling  2023-08-22T13:53:02.847391+0000   19194'36287   19194:155784     [8,9,3,22]           8     [0,9,3,22]               0   18422'36007  2023-08-20T18:07:00.914425+0000      18337'34831  2023-08-14T15:39:09.546106+0000              0                   35  queued for deep scrub                                                        8105                0
6.ea        8700                   0         0       8700        0  6701318144            0           0   8400      8400    active+remapped+backfilling  2023-08-22T12:37:57.250097+0000   19194'36507   19194:215173   [12,2,29,27]          12    [0,2,29,27]               0   18422'36214  2023-08-20T08:30:19.565009+0000      18422'36214  2023-08-19T02:49:58.782983+0000              0                   36  queued for scrub                                                             8407                0
6.f5        8692                   0         0       8692        0  6698369024            0           0   8315      8315    active+remapped+backfilling  2023-08-22T12:11:40.501888+0000   19194'36206   19194:305781  [12,30,31,17]          12   [0,30,31,17]               0   18422'35937  2023-08-20T17:30:28.074025+0000      18422'35937  2023-08-20T17:30:28.074025+0000              0                  170  queued for scrub                                                             8423                0
6.f7        8521                   0         0       8521        0  6457851904            0           0   8230      8230    active+remapped+backfilling  2023-08-22T12:11:40.120079+0000   19194'36752   19194:206013     [4,23,2,5]           4     [0,23,2,5]               0   18422'36454  2023-08-19T23:33:43.864135+0000      17969'35091  2023-08-13T21:55:02.307873+0000              0                   35  queued for deep scrub                                                        8223                0
dumped all
6.173       8556                   0         0       8556        0  6451757056            0           0   8113      8113    active+remapped+backfilling  2023-08-22T12:11:40.476004+0000   19194'36649   19194:163973   [24,31,5,22]          24    [0,31,5,22]               0   18422'36378  2023-08-20T06:28:28.826979+0000      18422'36378  2023-08-19T00:02:15.118486+0000              0                   35  queued for scrub                                                             8285                0
6.177       8588                   0         0       8588        0  6445072384            0           0   8132      8132    active+remapped+backfilling  2023-08-22T12:11:40.491822+0000   19194'36654   19194:177980    [8,22,29,3]           8    [0,22,29,3]               0   18422'36388  2023-08-20T05:53:14.247617+0000      18422'36388  2023-08-20T05:53:14.247617+0000              0                  155  queued for scrub                                                             8322                0

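To dig further into one of the stuck PGs, the per-PG backfill/peering state can be inspected directly (illustrative, using pg 6.1bd from the dump above; the output file path is arbitrary):

# ceph pg 6.1bd query > /tmp/pg_6.1bd_query.json
# ceph pg dump pgs_brief | grep -E 'backfilling|backfill_wait'
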
4. The PGs remain in the same state, and there is no further progress in the drain or the OSD removal. The OSD service was running on the target host during the entire procedure.

Version-Release number of selected component (if applicable):
ceph version 17.2.6-100.el9cp (ea4e3ef8df2cf26540aae06479df031dcfc80343) quincy (stable)

How reproducible:
2/2 OSDs on 1 Cluster

Steps to Reproduce:
1. Deploy a 4-node RHCS cluster, create an EC 2+2 pool, and fill it with data.
2. Remove a running OSD from any host: ceph orch osd rm <id> --force
3. Observe that the drain does not complete and remains stuck on a few PGs of that OSD.

Actual results:
PGs remain stuck draining and the OSD removal does not complete.

Expected results:
PGs are drained and the OSD is removed.

Additional info:

Observed that if I stop the OSD service, the drain completes within a few hours and the OSD is removed from the cluster.
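
A minimal sketch of that workaround, assuming the OSD daemon is still managed by cephadm on the target host (osd.0 used as an illustration):

# ceph orch daemon stop osd.0
# ceph orch osd rm status        (the drain then progresses and eventually completes)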

Files

stuck_pgs_pg_query_6.15a_acting_primary_osd34.txt (32 KB) - Stuck PGs with osd.34 as acting primary and pg 6.15a details - Sridhar Seshasayee, 09/12/2023 04:37 AM
bz2233777_osd.34_data.tgz (210 KB) - Sridhar Seshasayee, 09/12/2023 04:44 AM