Feature #16760

detect osd failures when all osds are down

Added by David Peraza almost 8 years ago. Updated over 6 years ago.

Status:
New
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Reviewed:
Affected Versions:
Component(RADOS):
OSD
Pull request ID:

Description

During our system test at Cisco, one of the things we do is simulate a complete storage failure. We do this by taking all storage nodes offline from the CIMC or UCSM. In our test we have 4 storage nodes, 3 of them with 7 OSDs each and a fourth one with 4 OSDs. After taking all storage nodes down and waiting for more than 6 hours, we still see the ceph status in warning, and the OSDs of the last storage node we took down still show as up and in:

cephmon_4612 [ceph@david_server-3 /]$ ceph status
cluster dbc29438-d3e0-4e0c-852b-170aaf4bd935
health HEALTH_WARN
810 pgs degraded
3 pgs stale
810 pgs stuck degraded
386 pgs stuck inactive
3 pgs stuck stale
1024 pgs stuck unclean
810 pgs stuck undersized
810 pgs undersized
monmap e1: 3 mons at {ceph-david_server-1=20.0.0.7:6789/0,ceph-david_server-2=20.0.0.6:6789/0,ceph-david_server-3=20.0.0.5:6789/0}
election epoch 6, quorum 0,1,2 ceph-david_server-3,ceph-david_server-2,ceph-david_server-1
osdmap e230: 25 osds: 7 up, 7 in; 656 remapped pgs
pgmap v498: 1024 pgs, 5 pools, 0 bytes data, 0 objects
328 MB used, 19391 GB / 19392 GB avail
422 active+undersized+degraded+remapped
383 undersized+degraded+peered
211 active
5 active+undersized+degraded
3 stale

And the ceph monitors never get notified that those OSDs are down:
[ceph@david_server-3 /]$ ceph osd stat
osdmap e230: 25 osds: 7 up, 7 in; 656 remapped pgs
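
For a per-OSD view of what the monitors believe, ceph osd tree shows the up/down and in/out state of each OSD; in this state it would still list the last node's OSDs as up (command shown for reference, output omitted):

[ceph@david_server-3 /]$ ceph osd tree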

This could stay like this for hours. The documentation says that if OSDs stop reporting, the monitors will mark them down after 900 seconds (mon osd report timeout) and eventually, after another 300 seconds (mon osd down out interval), mark them out.
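
For reference, the effective values on a running monitor can be checked through its admin socket (a quick sketch; the mon name here matches the monmap above, and the admin socket has to be accessible on that host):

[ceph@david_server-3 /]$ ceph daemon mon.ceph-david_server-3 config show | grep -E 'mon_osd_report_timeout|mon_osd_down_out_interval|mon_osd_min_down_reporters'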

I looked around to check whether there is a config option that would force the OSDs to be reported down to the monitors, but I can't find anything.

We would like to be able to see that the cluster is down and in an error state, but instead it shows up as a warning.
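
As a manual stop-gap in this situation (a sketch only, assuming the monitor quorum is still up; the OSD IDs below are placeholders), the dead OSDs can be marked down and out by hand from a monitor node:

[ceph@david_server-3 /]$ ceph osd down 0 1 2 3 4 5 6
[ceph@david_server-3 /]$ ceph osd out 0 1 2 3 4 5 6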

Here is my ceph info

[ceph@david_server-3 /]$ ceph version
ceph version 0.94.5-9.el7cp (deef183a81111fa5e128ec88c90a32c9587c615d)

[ceph@david_server-3 /]$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.2 (Maipo)

Here is my ceph config:

[ceph@david_server-3 /]$ cat /etc/ceph/ceph.conf
# Please do not change this file directly since it is managed by Ansible and will be overwritten

[global]
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
cephx require signatures = True # Kernel RBD does NOT support signatures!
cephx cluster require signatures = True
cephx service require signatures = False
fsid = dbc29438-d3e0-4e0c-852b-170aaf4bd935
max open files = 131072
osd pool default pg num = 64
osd pool default pgp num = 64
osd pool default size = 3
osd pool default min size = 2
osd pool default crush rule = 0
# Disable in-memory logs
debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_journaler = 0/0
debug_objectcatcher = 0/0
debug_client = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 0/0
debug_paxos = 0/0
debug_rgw = 0/0

[client]
rbd cache = true
rbd cache writethrough until flush = true
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok # must be writable by QEMU and allowed by SELinux or AppArmor
log file = /var/log/qemu/qemu-guest-$pid.log # must be writable by QEMU and allowed by SELinux or AppArmor

[mon]
mon osd down out interval = 300
mon osd min down reporters = 7
[mon.ceph-david_server-1]
host = david_server-1
mon addr = 20.0.0.7
[mon.ceph-david_server-3]
host = david_server-3
mon addr = 20.0.0.5
[mon.ceph-david_server-2]
host = david_server-2
mon addr = 20.0.0.6

[osd]
osd mkfs type = xfs
osd mkfs options xfs = -f -i size=2048
osd mount options xfs = noatime,largeio,inode64,swalloc
osd journal size = 20000
cluster_network = 30.0.0.0/24
public_network = 20.0.0.0/24
osd mon heartbeat interval = 30
# Performance tuning
filestore merge threshold = 40
filestore split multiple = 8
osd op threads = 8
filestore op threads = 8
filestore max sync interval = 5
osd max scrubs = 1
# Recovery tuning
osd recovery max active = 5
osd max backfills = 2
osd recovery op priority = 2
osd recovery max chunk = 1048576
osd recovery threads = 1
osd objectstore = filestore
osd crush update on start = true

Actions #1

Updated by Samuel Just almost 8 years ago

  • Tracker changed from Bug to Feature
  • Subject changed from OSDs not reported down when all storage nodes are offline to detect osd failures when all osds are down
  • Category deleted (Monitor)

This isn't really a bug. We use the osds to do failure detection and there weren't any left around. I'm changing this to a feature request for an external failure monitoring thing.
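
One possible shape for such an external check (purely a sketch, not an existing Ceph tool; the host names and OSD IDs below are placeholders, and in practice the OSD-to-host mapping would come from "ceph osd tree" or the deployment inventory) is a periodic script on a monitor node that pings each OSD host and marks its OSDs down when the host is unreachable:

#!/bin/bash
# Hypothetical external watchdog: mark OSDs down when their host stops responding.
# The mapping below is a placeholder for a 4-node test cluster like the one above.
declare -A OSDS_ON_HOST=(
  [storage-node-1]="0 1 2 3 4 5 6"
  [storage-node-2]="7 8 9 10 11 12 13"
  [storage-node-3]="14 15 16 17 18 19 20"
  [storage-node-4]="21 22 23 24"
)

for host in "${!OSDS_ON_HOST[@]}"; do
  if ! ping -c 1 -W 2 "$host" > /dev/null 2>&1; then
    # Host unreachable: tell the monitors its OSDs are down so health
    # reporting reflects reality even with no peer OSDs left to report failures.
    ceph osd down ${OSDS_ON_HOST[$host]}
  fi
done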

Actions #2

Updated by Loïc Dachary over 7 years ago

  • Target version deleted (v0.94.8)
Actions #3

Updated by Patrick Donnelly over 6 years ago

  • Project changed from Ceph to RADOS
  • Component(RADOS) OSD added