Bug #23049
ceph Status shows only WARN when traffic to cluster fails
Description
Hello,
While using Kraken, I saw the status change to ERR, but in Luminous we do not see the Ceph status change from WARN to ERR in failure cases.
Environment used: 3-node cluster
Erasure coding: 2+1
Steps:
1. I stopped the OSDs on all three nodes using systemctl stop ceph-osd.target
2. In this state, all reads and writes to the cluster failed. I expected the status to change to ERR, since no read/write is possible in this state, but after 15 minutes the status was still WARN and not ERR (ceph status copied below).
  cluster:
    id:     c36fb424-038a-4c38-84a4-1469481ad5c8
    health: HEALTH_WARN
            26 osds down
            2 hosts (24 osds) down
            Reduced data availability: 362 pgs inactive, 1024 pgs down
            Degraded data redundancy: 1024 pgs unclean

  services:
    mon: 3 daemons, quorum pl12-cn1,pl12-cn2,pl12-cn3
    mgr: pl12-cn3(active), standbys: pl12-cn2, pl12-cn1
    osd: 36 osds: 10 up, 36 in

  data:
    pools:   1 pools, 1024 pgs
    objects: 1512 objects, 2607 MB
    usage:   44427 MB used, 196 TB / 196 TB avail
    pgs:     100.000% pgs not active
             869 down
             155 stale+down
One more thing I noticed in this status: even though no OSD processes were running on any node, the status still showed 10 OSDs up; it did not change to 0 up. Is this expected?
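The severity gap described above can be illustrated with a small sketch. This is not Ceph's actual implementation; the check names, counters, and aggregation here are assumptions for illustration only:

```python
# Illustrative model of the reported behavior: inactive PGs only ever
# raise a warning, so the aggregated health never escalates to
# HEALTH_ERR even when no I/O can be served at all.
# Check names (OSD_DOWN, PG_AVAILABILITY) and the aggregation rule
# are assumptions for this sketch, not Ceph internals.

def aggregate_health(pgs_total, pgs_inactive, osds_down):
    """Return an overall health level from a few cluster counters."""
    checks = []
    if osds_down > 0:
        checks.append(("OSD_DOWN", "HEALTH_WARN"))
    if pgs_inactive > 0:
        # The reported bug: this stays HEALTH_WARN even when
        # pgs_inactive == pgs_total, i.e. the whole cluster is
        # unavailable for client I/O.
        checks.append(("PG_AVAILABILITY", "HEALTH_WARN"))
    order = {"HEALTH_OK": 0, "HEALTH_WARN": 1, "HEALTH_ERR": 2}
    return max((sev for _, sev in checks),
               key=order.get, default="HEALTH_OK")

# With the numbers from the report: 1024 PGs, all inactive, 26 OSDs down.
print(aggregate_health(1024, 1024, 26))  # → HEALTH_WARN
```

With every PG inactive one would expect the worst check to be HEALTH_ERR, which is what this ticket asks for.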
Related issues
History
#1 Updated by Nokia ceph-users about 6 years ago
Please let me know of the required logs/info to be added if any.
#2 Updated by Josh Durgin about 6 years ago
- Project changed from Ceph to RADOS
- Priority changed from Normal to High
Can reproduce easily - thanks for the report.
Two bugs here:
1) the monitor is still enforcing mon_osd_min_up_ratio even with polite shutdowns
2) HEALTH_WARN doesn't turn into HEALTH_ERR even when all OSDs are down
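The first bug also explains the "10 up" observation from the report. The monitor refuses to mark an OSD down once doing so would leave the fraction of up OSDs below mon_osd_min_up_ratio (default 0.3). A simplified model of that check, not the actual OSDMonitor code:

```python
# Sketch of mon_osd_min_up_ratio enforcement: the monitor stops
# marking OSDs down once the up fraction would fall below the ratio.
# Simplified model for illustration, not Ceph's OSDMonitor code.

MON_OSD_MIN_UP_RATIO = 0.3  # Ceph default

def mark_osds_down(total_osds, reported_down):
    """Mark OSDs down one at a time, honoring the min-up ratio."""
    up = total_osds
    for _ in range(reported_down):
        if up / total_osds < MON_OSD_MIN_UP_RATIO:
            break  # monitor refuses: too few OSDs would remain "up"
        up -= 1
    return up

# 36 OSDs in the report, all 36 stopped: the monitor stops marking
# them down once only 10 remain "up" (10/36 < 0.3), matching the
# "osd: 36 osds: 10 up, 36 in" line in the status output.
print(mark_osds_down(36, 36))  # → 10
```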
#3 Updated by Josh Durgin about 6 years ago
- Category set to Administration/Usability
#4 Updated by Nokia ceph-users almost 6 years ago
Hi,
In which release is the fix expected?
Thanks,
#5 Updated by Josh Durgin almost 6 years ago
- Priority changed from High to Urgent
#6 Updated by Josh Durgin over 4 years ago
- Priority changed from Urgent to Normal
#7 Updated by Greg Farnum over 4 years ago
- Related to Bug #23565: Inactive PGs don't seem to cause HEALTH_ERR added