Bug #14992

Closed

"rados ls -p xxx" hangs forever when all "rbd_directory" obj replica osds down

Added by science luo about 8 years ago. Updated about 8 years ago.

Status:
Rejected
Priority:
High
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

It is easy to reproduce:
1. ceph osd pool create testpool 128 128
2. rbd create testpool/testimg --size 10240
3. ceph osd map testpool rbd_directory
   Suppose the output is: osdmap e11539 pool 'testpool' (7) object 'rbd_directory' -> pg 7.682da7d1 (7.51) -> up ([6,4,16], p6) acting ([6,4,16], p6)
4. Stop osd.6, osd.4 and osd.16.
5. rados ls -p testpool -----> hangs forever.
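As a client-side stop-gap (not a fix for the cluster state), the hang in step 5 can be bounded with GNU coreutils `timeout`; the pool name below is the one from the steps above, and the 10-second limit is an arbitrary choice:

```shell
#!/bin/sh
# Bound the potentially hanging listing: timeout kills the command
# and exits with status 124 when the time limit is reached.
if ! timeout 10 rados ls -p testpool; then
    echo "rados ls did not finish; check: ceph osd map testpool rbd_directory"
fi
```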

Actions #1

Updated by Kefu Chai about 8 years ago

  • Status changed from New to Rejected

In this case, rbd_directory is mapped to osd.{6,4,16}, and all the OSDs serving that PG are down (but still "in") at that moment. The rados client is waiting for an osdmap with an active primary OSD so that it can send the pgls message to it.

OSDs serving a PG are supposed to be deployed to different failure domains, so with a well-designed crushmap they should never all be down at the same time. I think this is the expected behaviour.

Let me know if you think otherwise.
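For reference, the failure-domain separation described above is what a default-style replicated CRUSH rule expresses; a sketch (the bucket and rule names are illustrative, and the exact syntax varies between Ceph releases):

```
rule replicated_ruleset {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    # Place each replica on a different host, so losing one host
    # (or all the OSDs on it) cannot take a whole PG down.
    step chooseleaf firstn 0 type host
    step emit
}
```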
