Bug #14992
Status: Closed
"rados ls -p xxx" hangs forever when all "rbd_directory" object replica OSDs are down
Description
It is easy to reproduce:
1. ceph osd pool create testpool 128 128
2. rbd create testpool/testimg --size 10240
3. ceph osd map testpool rbd_directory
   The result was: osdmap e11539 pool 'testpool' (7) object 'rbd_directory' -> pg 7.682da7d1 (7.51) -> up ([6,4,16], p6) acting ([6,4,16], p6)
4. stop osd.6, osd.4 and osd.16
5. rados ls -p testpool -----> hangs forever
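Since step 5 blocks the shell indefinitely, wrapping the command in coreutils `timeout` makes the hang observable without tying up the terminal (the 10-second bound is arbitrary; pool name taken from the steps above):

```shell
# timeout kills the command after 10s; exit status 124 means it was
# still running when killed, i.e. "rados ls" never returned.
timeout 10 rados ls -p testpool
if [ $? -eq 124 ]; then
    echo "rados ls is hanging"
fi
```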
Updated by Kefu Chai about 8 years ago
- Status changed from New to Rejected
In this case, rbd_directory is mapped to osd.{6,4,16}, and all OSDs serving that PG are down (though still "in") at that moment. The rados client is waiting for an osdmap with an active primary OSD so it can send the pgls request to it.
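One way to confirm that the client is stuck waiting for a primary is to ask the monitors for the PG mapping directly (pgid 7.51 comes from the `ceph osd map` output in the repro steps; the monitors answer even when the PG's OSDs are down):

```shell
# Query the monitors for the current up/acting sets of the PG.
ceph pg map 7.51
# With osd.6, osd.4 and osd.16 all down there is no acting primary,
# so the client has no OSD to send the pgls request to and waits.
```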
OSDs serving a PG are supposed to be deployed to different failure domains, so with a well-designed CRUSH map all of them should not go down at the same time. I think this is the expected behaviour.
Let me know if you think otherwise.
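The failure-domain point above can be sketched with a CRUSH rule that places each replica on a distinct host (the rule name is illustrative; `create-replicated` is available in Luminous and later releases, older ones use `create-simple`):

```shell
# Create a replicated rule whose failure domain is "host", so no two
# replicas of a PG land on OSDs of the same host.
ceph osd crush rule create-replicated rep_by_host default host
# Point the pool from the repro steps at the new rule.
ceph osd pool set testpool crush_rule rep_by_host
```

With such a rule, losing a single host takes down at most one replica of any PG, so `rados ls` can still reach an acting primary.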