Bug #39249


Some PGs stuck in active+remapped state

Added by Марк Коренберг about 5 years ago. Updated about 5 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (dev)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Sometimes my PGs get stuck in this state. When I stop the primary OSD containing such a PG, the PG becomes `active+undersized+degraded` and does not get remapped, even after I start that OSD again.

How can I debug this? I have plenty of space on the other OSDs, and restarting all OSDs does not help.

```
$ ceph osd df tree
 ID CLASS      WEIGHT REWEIGHT    SIZE      USE   AVAIL  %USE  VAR PGS TYPE NAME
 -1          14.15028        -  14 TiB  5.6 TiB 8.1 TiB 40.80 1.00   - root default
 -2           3.19478        - 3.2 TiB  1.3 TiB 1.9 TiB 41.32 1.01   -     host node1
  6 blue_ssd  0.45599  1.00000 467 GiB  202 GiB 265 GiB 43.26 1.06 256         osd.6
  1 prod      1.82419  0.93387 1.8 TiB  764 GiB 1.1 TiB 40.90 1.00 223         osd.1
  2 prod      0.91460  0.79158 937 GiB  386 GiB 551 GiB 41.19 1.01 107         osd.2
 -3           2.28519        - 2.3 TiB 1000 GiB 1.3 TiB 42.72 1.05   -     host node2
  0 blue_ssd  0.45599  1.00000 467 GiB  202 GiB 265 GiB 43.28 1.06 256         osd.0
  3 prod      0.91460  0.83400 937 GiB  396 GiB 541 GiB 42.29 1.04 104         osd.3
  4 prod      0.91460  0.72214 937 GiB  402 GiB 535 GiB 42.88 1.05 119         osd.4
 -4           2.28996        - 1.8 TiB  826 GiB 1.0 TiB 44.05 1.08   -     host node3
  7 blue_ssd  0.45599  1.00000 467 GiB  202 GiB 265 GiB 43.26 1.06 256         osd.7
 11 prod      0.45969        0     0 B      0 B     0 B     0    0   0         osd.11
 13 prod      0.45969  0.84837 471 GiB  216 GiB 255 GiB 45.86 1.12  57         osd.13
 14 prod      0.91460  0.65007 937 GiB  408 GiB 529 GiB 43.53 1.07  97         osd.14
 -9           3.63689        - 3.6 TiB  1.4 TiB 2.3 TiB 37.66 0.92   -     host node4
  5 prod      0.90919  1.00000 931 GiB  350 GiB 581 GiB 37.58 0.92  97         osd.5
  9 prod      1.81850  1.00000 1.8 TiB  745 GiB 1.1 TiB 40.00 0.98 207         osd.9
 10 prod      0.90919  1.00000 931 GiB  308 GiB 623 GiB 33.04 0.81  92         osd.10
-16           2.74347        - 2.7 TiB  1.1 TiB 1.6 TiB 40.57 0.99   -     host node5
  8 prod      0.91449  0.94768 936 GiB  387 GiB 549 GiB 41.36 1.01 120         osd.8
 12 prod      0.91449  0.84109 936 GiB  377 GiB 559 GiB 40.28 0.99  91         osd.12
 16 prod      0.91449  0.70984 936 GiB  375 GiB 561 GiB 40.07 0.98  93         osd.16
                         TOTAL  14 TiB  5.6 TiB 8.6 TiB 40.80
```
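
A minimal sketch of how the stuck PGs can be enumerated, assuming a Luminous-or-later `ceph` CLI (both `pg dump_stuck` and `pg ls` accept PG state names as filters):

```
# PGs that have been stuck in a non-clean state longer than the default threshold
ceph pg dump_stuck unclean

# List only the PGs currently in the remapped state
ceph pg ls remapped
```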

So, my question is: how do I debug such cases? My CRUSH map does not contain anything special (such as upmap entries), apart from the two device classes I have defined (prod and blue_ssd).
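
For a single PG, the query/map pair usually shows why CRUSH cannot place it; this sketch assumes `1.2f` stands in for an actual stuck PG id:

```
# Full peering/recovery state of one PG (1.2f is a placeholder PG id);
# the "up", "acting", and "recovery_state" sections show why it is stuck
ceph pg 1.2f query

# Just the up/acting OSD sets for the same PG
ceph pg map 1.2f
```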
