Project

General

Profile

Bug #23713

High MON cpu usage when cluster is changing

Added by Xiaoxi Chen over 1 year ago. Updated 10 months ago.

Status:
Resolved
Priority:
High
Assignee:
-
Category:
-
Target version:
-
Start date:
04/13/2018
Due date:
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Monitor
Pull request ID:

Description

After upgrading to Luminous 12.2.4 (from Jewel 10.2.5), we consistently see high cpu usage when OSDMap changes , especially from DOWN to UP.

At first the behavior was when one OSD gets down and up (trigger by `ceph osd down osd.x`), monitor quorum will stuck and re-election due to lease timeout. Further investigation shows
1. lease message is not getting handled for 3s+ during this period.
2. the CPU usage reach 300% in that periods, indicating 3 async worker thread is not enough.

2018-04-09 23:46:51.030583 7fecad4de700 10 mon.lvs3b03c-2gsx@1(peon).paxos(paxos active c 112056948..112057621) handle_lease on 112057621 now 2018-04-09 23:46:52.480946

Then we changed to use 8 worker threads , or changed back to simple messenger(as jewel default to simple), both looks better in above cases, i.e , single OSD restart will not causing lease timeout. But, if we reboot a OSD node with 24 osds, or just
`for i in seq 1 10 ; do ceph osd down $i; done;` can cause monitors use massive CPU (we saw 1600% CPU -> 16 cores in simple messenger) and the last 1 or 2 osd take very long time for `boot` message.

It looks like the ms_dispatch thread is busy with handling mon_subscribe from clients ( we have 7000+ volumes), interestingly, it usually take 0.2ms(counted by the interval between each 2 `handle_subscribe mon_subscribe` log from logfile) per mon_subscribe request in both 12.2.4 and 10.2.5, but it may take 15ms + sometimes, during above mentioned case.

@
2018-04-13 02:42:51.740904 7f173c8f5700 10 mon.lvs3b03c-2fht@0(leader) e37 handle_subscribe mon_subscribe({osdmap=2038709}) v2
2018-04-13 02:42:51.740910 7f173c8f5700 10 mon.lvs3b03c-2fht@0(leader).osd e2038709 check_osdmap_sub 0x7f178079f100 next 2038709 (onetime)
2018-04-13 02:42:51.740912 7f173c8f5700 5 mon.lvs3b03c-2fht@0(leader).osd e2038709 send_incremental [2038709..2038709] to client.? 10.173.84.46:0/524267465
2018-04-13 02:42:51.740915 7f173c8f5700 10 mon.lvs3b03c-2fht@0(leader).osd e2038709 build_incremental [2038709..2038709]
2018-04-13 02:42:51.754653 7f173c8f5700 10 mon.lvs3b03c-2fht@0(leader) e37 handle_subscribe mon_subscribe({osdmap=2038708}) v2
2018-04-13 02:42:51.754658 7f173c8f5700 10 mon.lvs3b03c-2fht@0(leader).osd e2038709 check_osdmap_sub 0x7f175dd8db80 next 2038708 (onetime)
2018-04-13 02:42:51.754660 7f173c8f5700 5 mon.lvs3b03c-2fht@0(leader).osd e2038709 send_incremental [2038708..2038709] to client.? 10.161.98.161:0/559412026
2018-04-13 02:42:51.754683 7f173c8f5700 10 mon.lvs3b03c-2fht@0(leader).osd e2038709 build_incremental [2038708..2038709]

@
I dont have further idea about what is going on, and the cluster now is fairly unstable if a bunch of OSDs get up/down in a short interval.

i can get whatever logs for debugging, as it is fair easy to reproduce


Related issues

Copied to RADOS - Backport #23912: luminous: mon: High MON cpu usage when cluster is changing Resolved

History

#1 Updated by Sage Weil over 1 year ago

My guess is that this is the compat reencoding of the OSDMap for the pre-luminous clients.

Are you by chance making use of the crush-compat balancer? That would additionally require a reencoded crush map.

#2 Updated by Greg Farnum over 1 year ago

  • Project changed from Ceph to RADOS
  • Category deleted (Monitor)
  • Component(RADOS) Monitor added

#3 Updated by Sage Weil about 1 year ago

  • Status changed from New to Need Review

#4 Updated by Sage Weil about 1 year ago

  • Status changed from Need Review to Pending Backport
  • Backport set to luminous

#5 Updated by Nathan Cutler about 1 year ago

  • Copied to Backport #23912: luminous: mon: High MON cpu usage when cluster is changing added

#6 Updated by Nathan Cutler 10 months ago

  • Status changed from Pending Backport to Resolved

Also available in: Atom PDF