Bug #57506
Updated by Samuel Just over 1 year ago
Main (c49b81c7d619cea23e9707d1f5bcc7de3049c4fd) + sjust/wip-io-hang (https://github.com/ceph/ceph/pull/48057)
<pre>
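# bring up a single-node vstart cluster: 1 mon, 1 mgr, 3 crimson OSDs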
MDS=0 MGR=1 OSD=3 MON=1 ../src/vstart.sh --without-dashboard -X --crimson --redirect-output --debug -n --no-restart --crimson-smp 3
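# create a 32-PG replicated pool named "rbd" for the workload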
./bin/ceph osd pool create rbd 32 32 replicated replicated_rule 2 2 2
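# drive a continuous 4 KiB write workload against the pool for 1000s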
./bin/rados bench 1000 write --admin-socket asok/bench.asok -p rbd -b 4096 --debug-ms=1 2>out/bench.stderr
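# simulate an OSD failure while the bench is writing, then mark the OSD down and out to trigger recovery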
kill -9 688982 # kill osd.2
./bin/ceph osd down 2
./bin/ceph osd out 2
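# check cluster state while recovery proceeds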
./bin/ceph -s
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2022-09-12T23:02:44.425+0000 7fa6c5a4c640 -1 WARNING: all dangerous and experimental features are enabled.
2022-09-12T23:02:44.428+0000 7fa6c5a4c640 -1 WARNING: all dangerous and experimental features are enabled.
  cluster:
    id:     07d892de-d0fb-4b86-b74d-cbc11d240a7a
    health: HEALTH_WARN
            Degraded data redundancy: 14449/44218 objects degraded (32.677%), 17 pgs degraded
            1 pool(s) do not have an application enabled

  services:
    mon: 1 daemons, quorum a (age 7m)
    mgr: x(active, since 7m)
    osd: 3 osds: 2 up (since 62s), 2 in (since 21s); 17 remapped pgs

  data:
    pools:   2 pools, 33 pgs
    objects: 22.11k objects, 86 MiB
    usage:   2.2 GiB used, 200 GiB / 202 GiB avail
    pgs:     14449/44218 objects degraded (32.677%)
             15 active+clean
             14 active+recovery_wait+undersized+degraded+remapped+wait
             2  active+recovering+undersized+degraded+remapped+wait
             1  active+undersized+wait
             1  active+recovery_wait+undersized+degraded+remapped

  io:
    client:   122 KiB/s wr, 0 op/s rd, 30 op/s wr
    recovery: 85 KiB/s, 21 objects/s
</pre>
Interestingly, client IO continues anyway.
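For reference, one way to watch whether client IO actually stalls during recovery is to poll the pgmap op counters from the status JSON. A minimal sketch, assuming jq is available and that the pgmap section exposes write_op_per_sec (true of recent releases, but worth verifying against the build above):
<pre>
# poll the client write op rate every 5s; a long run of zeros while the
# bench is still supposed to be writing would indicate an IO hang
while true; do
    ./bin/ceph -s --format json 2>/dev/null | jq '.pgmap.write_op_per_sec // 0'
    sleep 5
done
</pre>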