Support #49268


Blocked IOs up to 30 seconds when host powered down

Added by Julien Demais about 3 years ago. Updated almost 3 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Tags:
Reviewed:
Affected Versions:
Component(RADOS):
Pull request ID:

Description

Hello all,

I am facing an "issue" with my ceph cluster.

I have a small six-node cluster.
Each node has 2 OSDs (1 TB each) and runs a radosgw. The first five nodes also run a MON and a MGR. I know we are not supposed to mix roles, but given the small number of nodes I don't have any other choice.
I am doing some failover/resilience tests and I am facing a rather long I/O outage when a node is powered down hard.

When node1 (which is the MON leader) is powered down hard, I/Os are stuck for up to 30 seconds for objects stored on this node's OSDs.

I have read Ceph's documentation and tuned the following parameters in order to speed up the new MON leader election as well as the detection of down OSDs:

mon_osd_adjust_heartbeat_grace = false
mon_lease = 2
mon_election_timeout = 2
osd_heartbeat_interval = 2
osd_heartbeat_grace = 5
osd_mon_heartbeat_interval = 10
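
For reference, on releases with the centralized config database these can also be applied at runtime instead of (or in addition to) editing ceph.conf; this is just a sketch, assuming ceph config set is available on the cluster (daemon-specific sections can be used instead of global):

ceph config set global mon_osd_adjust_heartbeat_grace false   # do not let the MONs stretch the grace period
ceph config set global mon_lease 2                            # shorter MON lease
ceph config set global mon_election_timeout 2                 # faster leader election
ceph config set global osd_heartbeat_interval 2               # OSDs ping their peers more often
ceph config set global osd_heartbeat_grace 5                  # peers declared down sooner
ceph config set global osd_mon_heartbeat_interval 10          # OSD-to-MON heartbeat interval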

This did improve my situation in terms of fast OSD down detection as well as the monitor election, but I still face a rather long I/O interruption due to peering/inactive PGs, as seen in the ceph health output below:

2021-02-11T13:05:43.328841+0000 mon.node3-1 [INF] mon.node3-1 calling monitor election
2021-02-11T13:05:43.329370+0000 mon.node5-1 [INF] mon.node5-1 calling monitor election
2021-02-11T13:05:43.330975+0000 mon.node2-1 [INF] mon.node2-1 calling monitor election
2021-02-11T13:05:43.344531+0000 mon.node4-1 [INF] mon.node4-1 calling monitor election
2021-02-11T13:05:47.363606+0000 mon.node2-1 [INF] mon.node2-1 is new leader, mons node2-1,node3-1,node5-1,node4-1 in quorum (ranks 1,2,3,4)
2021-02-11T13:05:47.376127+0000 mon.node2-1 [WRN] Health check failed: 1/5 mons down, quorum node2-1,node3-1,node5-1,node4-1 (MON_DOWN)
2021-02-11T13:05:47.380482+0000 mon.node2-1 [INF] osd.1 failed (root=default,host=node1-1) (2 reporters from different host after 7.008152 >= grace 5.000000)
2021-02-11T13:05:47.380710+0000 mon.node2-1 [INF] osd.4 failed (root=default,host=node1-1) (2 reporters from different host after 7.008317 >= grace 5.000000)
2021-02-11T13:05:47.385631+0000 mon.node2-1 [WRN] overall HEALTH_WARN 1/5 mons down, quorum node2-1,node3-1,node5-1,node4-1
2021-02-11T13:05:47.433987+0000 mon.node2-1 [WRN] Health check failed: 2 osds down (OSD_DOWN)
2021-02-11T13:05:47.434035+0000 mon.node2-1 [WRN] Health check failed: 1 host (2 osds) down (OSD_HOST_DOWN)
2021-02-11T13:05:49.462627+0000 mon.node2-1 [WRN] Health check failed: Reduced data availability: 3 pgs inactive, 12 pgs peering (PG_AVAILABILITY)
2021-02-11T13:05:49.462677+0000 mon.node2-1 [WRN] Health check failed: Degraded data redundancy: 6933/276867 objects degraded (2.504%), 28 pgs degraded (PG_DEGRADED)
2021-02-11T13:05:55.367734+0000 mon.node2-1 [WRN] Health check update: Reduced data availability: 5 pgs inactive, 10 pgs peering (PG_AVAILABILITY)
2021-02-11T13:05:55.367795+0000 mon.node2-1 [WRN] Health check update: Degraded data redundancy: 41533/276867 objects degraded (15.001%), 163 pgs degraded (PG_DEGRADED)
2021-02-11T13:06:05.607535+0000 mon.node2-1 [WRN] Health check update: Degraded data redundancy: 45603/276867 objects degraded (16.471%), 178 pgs degraded (PG_DEGRADED)
2021-02-11T13:06:05.607587+0000 mon.node2-1 [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 5 pgs inactive, 10 pgs peering)

Between 13:05:41, when the node is shut down, and 13:06:05, when the PG availability health check is cleared, I therefore have a 24-second service interruption on some files.
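
For what it's worth, during the outage the affected PGs can also be listed directly to confirm the interruption is indeed peering-related; standard ceph CLI, shown here only for reference:

ceph health detail              # shows which PGs are counted as inactive/peering
ceph pg dump_stuck inactive     # PGs that have been inactive longer than the stuck threshold
ceph pg dump_stuck unclean      # PGs that have not returned to active+clean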

Are there ways to reduce this downtime even further, to less than 10 seconds, or is that just a fantasy of mine?

#1

Updated by Greg Farnum almost 3 years ago

  • Project changed from Ceph to RADOS
  • Status changed from New to Closed

You can also tune how quickly the OSDs report their peers down from missing heartbeats, but in general losing a monitor (especially the leader) at the same time as some OSDs will result in slower timeouts due to the simultaneous loss of status information.
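
For later readers, the reporting side mentioned above is driven by options along these lines; this is a sketch only, and the exact names and defaults should be checked against the option reference for the release in use:

osd_mon_report_interval = 5            # how often an OSD pushes failure/status reports to the MONs
mon_osd_min_down_reporters = 2         # distinct reporters required before an OSD is marked down
mon_osd_reporter_subtree_level = host  # reporters must come from different CRUSH subtrees at this level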

