Bug #18515

ceph -s gives wrong information about the cluster when all OSDs in the cluster are removed.

Added by James Liu over 4 years ago. Updated over 4 years ago.

Status:
Rejected
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We stopped all OSD services in a cluster and kept only the monitor running. When we then ran ceph -s, it still showed the cluster as alive with some OSDs up.

#ceph -s
cluster b5e97941-6c74-4853-8c60-bc3c0bd6a12e
health HEALTH_ERR
378 pgs are stuck inactive for more than 300 seconds
480 pgs degraded
21 pgs peering
138 pgs stale
378 pgs stuck inactive
134 pgs stuck unclean
480 pgs undersized
monmap e1: 1 mons at {a27f140009.eu13=172.29.222.16:6789/0}
election epoch 3, quorum 0 a27f140009.eu13
osdmap e164: 24 osds: 7 up, 7 in; 375 remapped pgs
flags sortbitwise
pgmap v634: 512 pgs, 1 pools, 0 bytes data, 0 objects
273 MB used, 12373 GB / 12373 GB avail
149 undersized+degraded+peered
112 stale+undersized+degraded+peered
86 active+undersized+degraded+remapped
76 undersized+degraded+remapped+peered
35 active+undersized+degraded
18 stale+undersized+degraded+remapped+peered
15 remapped+peering
11 active
4 stale+remapped+peering
2 stale+peering
2 activating+undersized+degraded
1 stale+active+undersized+degraded+remapped
1 stale+active+undersized+degraded

History

#1 Updated by Kefu Chai over 4 years ago

  • Status changed from New to Rejected

This is expected. We rely on peer OSDs to report a failure to the monitor. The monitor will also mark an OSD down on its own if it has not received that OSD's PG stats for longer than "mon_osd_report_timeout".

If you shut down all the OSDs at once, no peers remain to send failure reports, so the monitor cannot mark them down quickly and has to wait for the timeout.
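The timeout path described above can be illustrated with a small simulation. This is only a sketch of the idea, not the monitor's actual code: the OsdState class, the function names, and the 900-second timeout are assumptions made for illustration, so check the real "mon_osd_report_timeout" value on your cluster. It shows why, shortly after every OSD is stopped, some OSDs can still be counted as up.

```python
from dataclasses import dataclass

# Assumed value for illustration only; check mon_osd_report_timeout on your cluster.
REPORT_TIMEOUT_S = 900.0


@dataclass
class OsdState:
    """Hypothetical, simplified view of what the monitor tracks per OSD."""
    osd_id: int
    last_report: float  # time (seconds) the monitor last received stats from this OSD
    up: bool = True


def mark_stale_osds_down(osds, now, timeout=REPORT_TIMEOUT_S):
    """Mark down every up OSD whose last report is older than `timeout`.

    Returns the ids of the OSDs marked down in this pass.
    """
    marked = []
    for osd in osds:
        if osd.up and now - osd.last_report > timeout:
            osd.up = False
            marked.append(osd.osd_id)
    return marked


def count_up(osds):
    """Number of OSDs the monitor still considers up."""
    return sum(1 for osd in osds if osd.up)


if __name__ == "__main__":
    # 24 OSDs whose last reports arrived at staggered times, then all were stopped.
    osds = [OsdState(i, last_report=float(i * 10)) for i in range(24)]
    for now in (300.0, 1000.0, 2000.0):
        mark_stale_osds_down(osds, now)
        print(f"t={now:.0f}s: {count_up(osds)} of {len(osds)} osds up")
```

With no peers left to report failures, each OSD only drops out of the up count once its own last report ages past the timeout, so the count falls gradually rather than all at once.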
