Support #49268


Blocked IOs up to 30 seconds when host powered down

Added by Julien Demais about 3 years ago. Updated almost 3 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Tags:
Reviewed:
Affected Versions:
Component(RADOS):
Pull request ID:

Description

Hello all,

I am facing an "issue" with my ceph cluster.

I have a small six-node cluster.
Each node has 2 OSDs (1 TB each) and runs a radosgw. The first five nodes also run a MON and a MGR. I know we are not supposed to mix roles, but given the small number of nodes I don't have any other choice.
I am doing some failover/resilience tests and I am facing a rather long I/O outage when a node is powered down hard.

When node1 (which is the MON leader) is powered down hard, I/Os are stuck for up to 30 seconds for objects stored on this node's OSDs.

I have read Ceph's documentation and tuned the following parameters in order to speed up the new MON leader election as well as the detection of down OSDs:

mon_osd_adjust_heartbeat_grace = false
mon_lease = 2
mon_election_timeout = 2
osd_heartbeat_interval = 2
osd_heartbeat_grace = 5
osd_mon_heartbeat_interval = 10
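
For reference, on releases with the centralized config database these can also be applied at runtime instead of (or in addition to) editing ceph.conf; this is just a sketch, assuming ceph config set is available on the cluster (daemon-specific sections can be used instead of global):

ceph config set global mon_osd_adjust_heartbeat_grace false   # do not let the MONs stretch the grace period
ceph config set global mon_lease 2                            # shorter MON lease
ceph config set global mon_election_timeout 2                 # faster leader election
ceph config set global osd_heartbeat_interval 2               # OSDs ping their peers more often
ceph config set global osd_heartbeat_grace 5                  # peers declared down sooner
ceph config set global osd_mon_heartbeat_interval 10          # OSD-to-MON heartbeat interval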

This did improve my situation in terms of fast OSD down detection as well as the monitor election, but I still face a rather long I/O interruption due to peering/inactive PGs, as seen in the ceph health output below:

2021-02-11T13:05:43.328841+0000 mon.node3-1 [INF] mon.node3-1 calling monitor election
2021-02-11T13:05:43.329370+0000 mon.node5-1 [INF] mon.node5-1 calling monitor election
2021-02-11T13:05:43.330975+0000 mon.node2-1 [INF] mon.node2-1 calling monitor election
2021-02-11T13:05:43.344531+0000 mon.node4-1 [INF] mon.node4-1 calling monitor election
2021-02-11T13:05:47.363606+0000 mon.node2-1 [INF] mon.node2-1 is new leader, mons node2-1,node3-1,node5-1,node4-1 in quorum (ranks 1,2,3,4)
2021-02-11T13:05:47.376127+0000 mon.node2-1 [WRN] Health check failed: 1/5 mons down, quorum node2-1,node3-1,node5-1,node4-1 (MON_DOWN)
2021-02-11T13:05:47.380482+0000 mon.node2-1 [INF] osd.1 failed (root=default,host=node1-1) (2 reporters from different host after 7.008152 >= grace 5.000000)
2021-02-11T13:05:47.380710+0000 mon.node2-1 [INF] osd.4 failed (root=default,host=node1-1) (2 reporters from different host after 7.008317 >= grace 5.000000)
2021-02-11T13:05:47.385631+0000 mon.node2-1 [WRN] overall HEALTH_WARN 1/5 mons down, quorum node2-1,node3-1,node5-1,node4-1
2021-02-11T13:05:47.433987+0000 mon.node2-1 [WRN] Health check failed: 2 osds down (OSD_DOWN)
2021-02-11T13:05:47.434035+0000 mon.node2-1 [WRN] Health check failed: 1 host (2 osds) down (OSD_HOST_DOWN)
2021-02-11T13:05:49.462627+0000 mon.node2-1 [WRN] Health check failed: Reduced data availability: 3 pgs inactive, 12 pgs peering (PG_AVAILABILITY)
2021-02-11T13:05:49.462677+0000 mon.node2-1 [WRN] Health check failed: Degraded data redundancy: 6933/276867 objects degraded (2.504%), 28 pgs degraded (PG_DEGRADED)
2021-02-11T13:05:55.367734+0000 mon.node2-1 [WRN] Health check update: Reduced data availability: 5 pgs inactive, 10 pgs peering (PG_AVAILABILITY)
2021-02-11T13:05:55.367795+0000 mon.node2-1 [WRN] Health check update: Degraded data redundancy: 41533/276867 objects degraded (15.001%), 163 pgs degraded (PG_DEGRADED)
2021-02-11T13:06:05.607535+0000 mon.node2-1 [WRN] Health check update: Degraded data redundancy: 45603/276867 objects degraded (16.471%), 178 pgs degraded (PG_DEGRADED)
2021-02-11T13:06:05.607587+0000 mon.node2-1 [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 5 pgs inactive, 10 pgs peering)

Between 13:05:41, when the node is shut down, and 13:06:05, when the PG availability health check is cleared, I therefore have a 24-second service interruption on some files.
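
For what it's worth, during the outage the affected PGs can also be listed directly to confirm the interruption is indeed peering-related; standard ceph CLI, shown here only for reference:

ceph health detail              # shows which PGs are counted as inactive/peering
ceph pg dump_stuck inactive     # PGs that have been inactive longer than the stuck threshold
ceph pg dump_stuck unclean      # PGs that have not returned to active+clean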

Are there ways to reduce this downtime even further, to less than 10 seconds, or is that just a fantasy of mine?

#1

Updated by Greg Farnum almost 3 years ago

  • Project changed from Ceph to RADOS
  • Status changed from New to Closed

You can also tune how quickly the OSDs report their peers down from missing heartbeats, but in general losing a monitor (especially the leader) at the same time as some OSDs will result in slower timeouts due to the simultaneous loss of status information.
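
For later readers, the reporting side mentioned above is driven by options along these lines; this is a sketch only, and the exact names and defaults should be checked against the option reference for the release in use:

osd_mon_report_interval = 5            # how often an OSD pushes failure/status reports to the MONs
mon_osd_min_down_reporters = 2         # distinct reporters required before an OSD is marked down
mon_osd_reporter_subtree_level = host  # reporters must come from different CRUSH subtrees at this level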

