Bug #14992
Status: Closed
"rados ls -p xxx" hangs forever when all "rbd_directory" object replica OSDs are down
Description
It is easy to reproduce:
1. ceph osd pool create testpool 128 128
2. rbd create testpool/testimg --size 10240
3. ceph osd map testpool rbd_directory
   The result was: osdmap e11539 pool 'testpool' (7) object 'rbd_directory' -> pg 7.682da7d1 (7.51) -> up ([6,4,16], p6) acting ([6,4,16], p6)
4. stop osd.6, osd.4 and osd.16
5. rados ls -p testpool -----> hangs forever
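Since step 5 blocks the shell indefinitely, wrapping the command in coreutils `timeout` makes the hang observable without tying up the terminal (the 10-second bound is arbitrary; pool name taken from the steps above):

```shell
# timeout kills the command after 10s; exit status 124 means it was
# still running when killed, i.e. "rados ls" never returned.
timeout 10 rados ls -p testpool
if [ $? -eq 124 ]; then
    echo "rados ls is hanging"
fi
```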
Updated by Kefu Chai about 8 years ago
- Status changed from New to Rejected
In this case, rbd_directory is mapped to osd.{6,4,16}, and all OSDs serving that PG are down (though still "in") at that moment. The rados client is waiting for an osdmap with an active primary OSD so it can send the pgls request to it.
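One way to confirm that the client is stuck waiting for a primary is to ask the monitors for the PG mapping directly (pgid 7.51 comes from the `ceph osd map` output in the repro steps; the monitors answer even when the PG's OSDs are down):

```shell
# Query the monitors for the current up/acting sets of the PG.
ceph pg map 7.51
# With osd.6, osd.4 and osd.16 all down there is no acting primary,
# so the client has no OSD to send the pgls request to and waits.
```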
OSDs serving a PG are supposed to be deployed to different failure domains, so with a well-designed CRUSH map all of them should not go down at the same time. I think this is the expected behaviour.
Let me know if you think otherwise.
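The failure-domain point above can be sketched with a CRUSH rule that places each replica on a distinct host (the rule name is illustrative; `create-replicated` is available in Luminous and later releases, older ones use `create-simple`):

```shell
# Create a replicated rule whose failure domain is "host", so no two
# replicas of a PG land on OSDs of the same host.
ceph osd crush rule create-replicated rep_by_host default host
# Point the pool from the repro steps at the new rule.
ceph osd pool set testpool crush_rule rep_by_host
```

With such a rule, losing a single host takes down at most one replica of any PG, so `rados ls` can still reach an acting primary.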