Bug #58366


Crimson: unable to initialize pool for rbd due to inactive pgs

Added by Harsh Kumar over 1 year ago. Updated 12 months ago.

Status:
Closed
Priority:
Normal
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
crimson
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
rbd
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

While testing crimson-osd on the DSAL lab cluster, it was observed that once the OSDs were brought up and an OSD pool was created, 31 pgs remained inactive and eventually progressed to the unknown state. Because these pgs are unavailable, the pool cannot be associated with the rbd application.

The issue is not reproducible with Quincy build on the same lab setup.

Crimson image - https://shaman.ceph.com/repos/ceph/main/aa49dee4e60f69d68f1c8252eef8f1c6cd991c08/crimson/267610/

Cephadm shell -

[root@dell-r730-043 /]# cephadm shell
Inferring fsid 129128f4-816f-11ed-ae0e-801844e02b40
Inferring config /var/lib/ceph/129128f4-816f-11ed-ae0e-801844e02b40/mon.dell-r730-043.dsal.lab.eng.rdu2.redhat.com/config
Using ceph image with id 'd92233276102' and tag 'aa49dee4e60f69d68f1c8252eef8f1c6cd991c08-crimson' created on 2022-12-13 16:56:05 +0000 UTC
quay.ceph.io/ceph-ci/ceph@sha256:7b703795d72ebf9fb6e9c28a88f6b50d10161225951107951541631dd2640a1b

[ceph: root@dell-r730-043 /]# ceph -v
ceph version 18.0.0-1417-gaa49dee4 (aa49dee4e60f69d68f1c8252eef8f1c6cd991c08) reef (dev)

ceph status shows a health warning (Reduced data availability: 31 pgs inactive), with the affected pgs reported as unknown -

[ceph: root@dell-r730-043 /]# ceph -s
  cluster:
    id:     129128f4-816f-11ed-ae0e-801844e02b40
    health: HEALTH_WARN
            Reduced data availability: 31 pgs inactive

  services:
    mon: 3 daemons, quorum dell-r730-043.dsal.lab.eng.rdu2.redhat.com,dell-r730-006,dell-r730-026 (age 5d)
    mgr: dell-r730-043.dsal.lab.eng.rdu2.redhat.com.bhyhhe(active, since 5d), standbys: dell-r730-006.efjfln
    osd: 9 osds: 9 up (since 5d), 9 in (since 5d)

  data:
    pools:   2 pools, 33 pgs
    objects: 0 objects, 0 B
    usage:   53 MiB used, 900 GiB / 900 GiB avail
    pgs:     93.939% pgs unknown
             31 unknown
             2  active+clean

  progress:
    Global Recovery Event (0s)
      [............................]
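
The 93.939% figure in the status output is simply the unknown pg count over the total (31 of 33). A quick sanity check:

```python
# pg counts as reported by `ceph -s`: 31 unknown + 2 active+clean = 33 total
total, unknown = 33, 31
print(f"{unknown / total * 100:.3f}% pgs unknown")  # 93.939% pgs unknown
```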

DSAL Lab setup -

HOST                                        ADDR       LABELS                                                                            STATUS  
dell-r730-006.dsal.lab.eng.rdu2.redhat.com  10.1.8.16  crash alertmanager mon mgr osd node-exporter                                              
dell-r730-026.dsal.lab.eng.rdu2.redhat.com  10.1.8.36  crash osd mon node-exporter                                                               
dell-r730-043.dsal.lab.eng.rdu2.redhat.com  10.1.8.53  _admin crash alertmanager mon mgr prometheus osd grafana installer node-exporter          
3 hosts in cluster

Ceph OSDs up and running -

[ceph: root@dell-r730-043 /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME               STATUS  REWEIGHT  PRI-AFF
-1         0.87918  root default                                     
-2         0.29306      host dell-r730-006                           
 2         0.09769          osd.2               up   1.00000  1.00000
 3         0.09769          osd.3               up   1.00000  1.00000
 6         0.09769          osd.6               up   1.00000  1.00000
-4         0.29306      host dell-r730-026                           
 0         0.09769          osd.0               up   1.00000  1.00000
 5         0.09769          osd.5               up   1.00000  1.00000
 7         0.09769          osd.7               up   1.00000  1.00000
-3         0.29306      host dell-r730-043                           
 1         0.09769          osd.1               up   1.00000  1.00000
 4         0.09769          osd.4               up   1.00000  1.00000
 8         0.09769          osd.8               up   1.00000  1.00000
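
The tree itself is consistent: each host weight is the sum of its three osd weights, and the root is the sum of the three hosts, up to the rounding ceph applies when printing crush weights. A quick check with a small tolerance (a sketch; the values are copied from the tree above):

```python
import math

osd_weight = 0.09769   # displayed weight of each of the 9 osds
host_weight = 0.29306  # displayed weight of each of the 3 hosts
root_weight = 0.87918  # displayed weight of the default root

# Displayed weights are rounded, so compare with a small tolerance.
assert math.isclose(3 * osd_weight, host_weight, abs_tol=1e-4)
assert math.isclose(3 * host_weight, root_weight, abs_tol=1e-4)
print("crush weights consistent")
```

So the inactive pgs are not explained by any missing or zero-weighted OSD in the crush tree.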

Ceph Health Details -

[ceph: root@dell-r730-043 /]# ceph health detail
HEALTH_WARN Reduced data availability: 31 pgs inactive
[WRN] PG_AVAILABILITY: Reduced data availability: 31 pgs inactive
    pg 2.1 is stuck inactive for 5d, current state unknown, last acting []
    pg 2.2 is stuck inactive for 5d, current state unknown, last acting []
    pg 2.3 is stuck inactive for 5d, current state unknown, last acting []
    pg 2.4 is stuck inactive for 5d, current state unknown, last acting []
    pg 2.5 is stuck inactive for 5d, current state unknown, last acting []
    pg 2.6 is stuck inactive for 5d, current state unknown, last acting []
    pg 2.7 is stuck inactive for 5d, current state unknown, last acting []
    pg 2.8 is stuck inactive for 5d, current state unknown, last acting []
    pg 2.9 is stuck inactive for 5d, current state unknown, last acting []
    pg 2.a is stuck inactive for 5d, current state unknown, last acting []
    pg 2.b is stuck inactive for 5d, current state unknown, last acting []
    pg 2.c is stuck inactive for 5d, current state unknown, last acting []
    pg 2.d is stuck inactive for 5d, current state unknown, last acting []
    pg 2.e is stuck inactive for 5d, current state unknown, last acting []
    pg 2.f is stuck inactive for 5d, current state unknown, last acting []
    pg 2.10 is stuck inactive for 5d, current state unknown, last acting []
    pg 2.11 is stuck inactive for 5d, current state unknown, last acting []
    pg 2.12 is stuck inactive for 5d, current state unknown, last acting []
    pg 2.13 is stuck inactive for 5d, current state unknown, last acting []
    pg 2.14 is stuck inactive for 5d, current state unknown, last acting []
    pg 2.15 is stuck inactive for 5d, current state unknown, last acting []
    pg 2.16 is stuck inactive for 5d, current state unknown, last acting []
    pg 2.17 is stuck inactive for 5d, current state unknown, last acting []
    pg 2.18 is stuck inactive for 5d, current state unknown, last acting []
    pg 2.19 is stuck inactive for 5d, current state unknown, last acting []
    pg 2.1a is stuck inactive for 5d, current state unknown, last acting []
    pg 2.1b is stuck inactive for 5d, current state unknown, last acting []
    pg 2.1c is stuck inactive for 5d, current state unknown, last acting []
    pg 2.1d is stuck inactive for 5d, current state unknown, last acting []
    pg 2.1e is stuck inactive for 5d, current state unknown, last acting []
    pg 2.1f is stuck inactive for 5d, current state unknown, last acting []
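
The 31 stuck pgs are consecutive: 2.1 through 2.1f (pg ids are <pool>.<seed-in-hex>), i.e. every pg of the 32-pg pool with id 2 except 2.0, which is presumably one of the two active+clean pgs. A quick enumeration confirms the count:

```python
# Pool 2 has 32 pgs, 2.0 through 2.1f; health detail lists all but 2.0 as stuck.
stuck = [f"2.{seed:x}" for seed in range(1, 32)]
print(len(stuck), stuck[0], stuck[-1])  # 31 2.1 2.1f
```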

Ceph pg stats -

[ceph: root@dell-r730-043 /]# ceph pg stat 
33 pgs: 31 unknown, 2 active+clean; 0 B data, 53 MiB used, 900 GiB / 900 GiB avail

Ceph stuck pg dump -

[ceph: root@dell-r730-043 /]# ceph pg dump_stuck  
PG_STAT  STATE    UP  UP_PRIMARY  ACTING  ACTING_PRIMARY
2.e      unknown  []          -1      []              -1
2.d      unknown  []          -1      []              -1
2.c      unknown  []          -1      []              -1
2.b      unknown  []          -1      []              -1
2.a      unknown  []          -1      []              -1
2.9      unknown  []          -1      []              -1
2.8      unknown  []          -1      []              -1
2.7      unknown  []          -1      []              -1
2.6      unknown  []          -1      []              -1
2.5      unknown  []          -1      []              -1
2.3      unknown  []          -1      []              -1
2.1      unknown  []          -1      []              -1
2.2      unknown  []          -1      []              -1
2.4      unknown  []          -1      []              -1
2.f      unknown  []          -1      []              -1
2.1b     unknown  []          -1      []              -1
2.1c     unknown  []          -1      []              -1
2.1a     unknown  []          -1      []              -1
2.1d     unknown  []          -1      []              -1
2.1f     unknown  []          -1      []              -1
2.1e     unknown  []          -1      []              -1
2.19     unknown  []          -1      []              -1
2.18     unknown  []          -1      []              -1
2.17     unknown  []          -1      []              -1
2.16     unknown  []          -1      []              -1
2.15     unknown  []          -1      []              -1
2.14     unknown  []          -1      []              -1
2.13     unknown  []          -1      []              -1
2.12     unknown  []          -1      []              -1
2.11     unknown  []          -1      []              -1
2.10     unknown  []          -1      []              -1
ok
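
For quick triage, the stuck pgs can be tallied by state straight from the pasted table. A minimal sketch (the sample rows below are copied from the dump_stuck output above; a real run would read the full table):

```python
# Tally pg states from `ceph pg dump_stuck` rows: first column is the
# pg id, second is the state.
sample = """\
2.e      unknown  []          -1      []              -1
2.1b     unknown  []          -1      []              -1
"""

counts = {}
for line in sample.splitlines():
    pgid, state = line.split()[:2]
    counts[state] = counts.get(state, 0) + 1

print(counts)  # {'unknown': 2}
```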

Ceph Config Dump -

[ceph: root@dell-r730-043 /]# ceph config dump
WHO     MASK                LEVEL     OPTION                                 VALUE                                                                                              RO
global                      basic     container_image                        quay.ceph.io/ceph-ci/ceph@sha256:7b703795d72ebf9fb6e9c28a88f6b50d10161225951107951541631dd2640a1b  * 
global                      basic     log_to_file                            true                                                                                                 
mon                         advanced  auth_allow_insecure_global_id_reclaim  false                                                                                                
mon                         advanced  public_network                         10.1.8.0/22                                                                                        * 
mgr                         advanced  mgr/cephadm/container_init             True                                                                                               * 
mgr                         advanced  mgr/cephadm/migration_current          5                                                                                                  * 
mgr                         advanced  mgr/dashboard/ALERTMANAGER_API_HOST    http://host.containers.internal:9093                                                               * 
mgr                         advanced  mgr/dashboard/GRAFANA_API_SSL_VERIFY   false                                                                                              * 
mgr                         advanced  mgr/dashboard/GRAFANA_API_URL          https://host.containers.internal:3000                                                              * 
mgr                         advanced  mgr/dashboard/PROMETHEUS_API_HOST      http://host.containers.internal:9095                                                               * 
mgr                         advanced  mgr/dashboard/ssl_server_port          8443                                                                                               * 
mgr                         advanced  mgr/orchestrator/orchestrator          cephadm                                                                                              
osd     host:dell-r730-006  basic     osd_memory_target                      29262227456                                                                                          
osd     host:dell-r730-026  basic     osd_memory_target                      30693042176                                                                                          
osd     host:dell-r730-043  basic     osd_memory_target                      28187644586                                                                                          
osd                         advanced  osd_memory_target_autotune             true

Full ceph pg dump is attached (pg_dump_all.txt)
Ceph logs (/var/log) - https://drive.google.com/file/d/1bVTcnDwnVHZCtvuz1V55Hisz0-MTGzuz/view?usp=share_link


Files

pg_dump_all.txt (15.4 KB) - Harsh Kumar, 12/27/2022 09:25 PM
