Bug #23519

closed

mds: mds got laggy because of MDSBeacon stuck in mqueue

Added by dongdong tao about 6 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Community (dev)
Tags:
Backport: mimic,luminous
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS, MDSMonitor
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The MDSBeacon message from the monitor can be stuck in the mqueue for a long time
because the DispatchQueue is busy dispatching an MDSMap (rejoin) message. Most of that time is
spent in process_imported_caps during rejoin.
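
The blocking described above can be illustrated with a minimal sketch. It assumes a single dispatch thread that handles messages strictly in arrival order, which is a simplification and not the actual Ceph messenger code. The function name `simulate_dispatch` and the timings are hypothetical, chosen only to roughly match the delay seen in this report.

```python
def simulate_dispatch(messages):
    """Simulate one dispatch thread handling messages in FIFO order.

    messages: list of (name, enqueue_time, handling_cost) tuples, sorted by
    enqueue_time, all in seconds on a simulated clock.
    Returns a list of (name, wait), where wait is how long the message sat
    in the queue before the thread started handling it.
    """
    clock = 0.0  # time at which the dispatch thread becomes free
    waits = []
    for name, enqueued_at, cost in messages:
        start = max(clock, enqueued_at)  # cannot start before the thread is free
        waits.append((name, start - enqueued_at))
        clock = start + cost
    return waits


# Hypothetical timings: a slow MDSMap handler followed by a cheap beacon.
for name, wait in simulate_dispatch([("mdsmap", 0.0, 72.0), ("beacon", 0.5, 0.0)]):
    print(f"{name}: waited {wait:.1f}s in queue")
```

The beacon itself is cheap to handle, but it cannot be dequeued until the slow handler ahead of it returns, so its queue delay is dominated by the cost of the MDSMap handling.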

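Below is the log. Compare the first and last lines: the beacon with seq 29 (0x557d0b5e6080) is enqueued at 12:51:13.600301 and only leaves the queue at 12:52:25.337199, about 72 seconds later.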

2018-03-29 12:51:13.600301 7fc0cb280700  0 -- 10.19.248.31:6804/731296555 >> 10.19.248.31:6789/0 conn(0x557cfc0dc800 :-1 s=STATE_OPEN pgs=7475863 cs=1 l=1).taodd beacon enqueue
0x557d0b5e6080 mdsbeacon(31079818/enn-yc-31 up:rejoin seq 29 v8293) v7
2018-03-29 12:51:13.600344 7fc0c8b06700  1 mds.0.8291 handle_mds_map i am now mds.0.8291
2018-03-29 12:51:13.600352 7fc0c8b06700  1 mds.0.8291 handle_mds_map state change up:reconnect --> up:rejoin
2018-03-29 12:51:13.600362 7fc0c8b06700  1 mds.0.8291 rejoin_start
2018-03-29 12:51:17.020712 7fc0cb280700  0 -- 10.19.248.31:6804/731296555 >> 10.19.248.31:6789/0 conn(0x557cfc0dc800 :-1 s=STATE_OPEN pgs=7475863 cs=1 l=1).taodd beacon enqueue
0x557d0b5e63c0 mdsbeacon(31079818/enn-yc-31 up:rejoin seq 30 v8293) v7
2018-03-29 12:51:18.420713 7fc0cb280700  0 -- 10.19.248.31:6804/731296555 >> 10.19.248.31:6789/0 conn(0x557cfc0dc800 :-1 s=STATE_OPEN pgs=7475863 cs=1 l=1).taodd mdsmap enqueue
0x557d266d58c0 mdsmap(e 8295) v1
2018-03-29 12:51:21.041255 7fc0cb280700  0 -- 10.19.248.31:6804/731296555 >> 10.19.248.31:6789/0 conn(0x557cfc0dc800 :-1 s=STATE_OPEN pgs=7475863 cs=1 l=1).taodd beacon enqueue
0x557cfc186340 mdsbeacon(31079818/enn-yc-31 up:rejoin seq 31 v8295) v7
2018-03-29 12:51:25.005205 7fc0cb280700  0 -- 10.19.248.31:6804/731296555 >> 10.19.248.31:6789/0 conn(0x557cfc0dc800 :-1 s=STATE_OPEN pgs=7475863 cs=1 l=1).taodd beacon enqueue
0x557d2d6acd00 mdsbeacon(31079818/enn-yc-31 up:rejoin seq 32 v8295) v7
2018-03-29 12:51:29.024516 7fc0cb280700  0 -- 10.19.248.31:6804/731296555 >> 10.19.248.31:6789/0 conn(0x557cfc0dc800 :-1 s=STATE_OPEN pgs=7475863 cs=1 l=1).taodd beacon enqueue
0x557d2d6ad040 mdsbeacon(31079818/enn-yc-31 up:rejoin seq 33 v8295) v7
2018-03-29 12:51:33.032054 7fc0cb280700  0 -- 10.19.248.31:6804/731296555 >> 10.19.248.31:6789/0 conn(0x557cfc0dc800 :-1 s=STATE_OPEN pgs=7475863 cs=1 l=1).taodd beacon enqueue
0x557d2d6ad380 mdsbeacon(31079818/enn-yc-31 up:rejoin seq 34 v8295) v7
2018-03-29 12:51:37.025607 7fc0cb280700  0 -- 10.19.248.31:6804/731296555 >> 10.19.248.31:6789/0 conn(0x557cfc0dc800 :-1 s=STATE_OPEN pgs=7475863 cs=1 l=1).taodd beacon enqueue
0x557d2d6ad6c0 mdsbeacon(31079818/enn-yc-31 up:rejoin seq 35 v8295) v7
2018-03-29 12:51:41.025065 7fc0cb280700  0 -- 10.19.248.31:6804/731296555 >> 10.19.248.31:6789/0 conn(0x557cfc0dc800 :-1 s=STATE_OPEN pgs=7475863 cs=1 l=1).taodd beacon enqueue
0x557d2d6ada00 mdsbeacon(31079818/enn-yc-31 up:rejoin seq 36 v8295) v7
2018-03-29 12:51:45.024202 7fc0cb280700  0 -- 10.19.248.31:6804/731296555 >> 10.19.248.31:6789/0 conn(0x557cfc0dc800 :-1 s=STATE_OPEN pgs=7475863 cs=1 l=1).taodd beacon enqueue
0x557d2d6add40 mdsbeacon(31079818/enn-yc-31 up:rejoin seq 37 v8295) v7
2018-03-29 12:51:49.137766 7fc0cb280700  0 -- 10.19.248.31:6804/731296555 >> 10.19.248.31:6789/0 conn(0x557cfc0dc800 :-1 s=STATE_OPEN pgs=7475863 cs=1 l=1).taodd beacon enqueue
0x557d2d6ae080 mdsbeacon(31079818/enn-yc-31 up:rejoin seq 38 v8295) v7
2018-03-29 12:51:53.141025 7fc0cb280700  0 -- 10.19.248.31:6804/731296555 >> 10.19.248.31:6789/0 conn(0x557cfc0dc800 :-1 s=STATE_OPEN pgs=7475863 cs=1 l=1).taodd beacon enqueue
0x557d2d6ae3c0 mdsbeacon(31079818/enn-yc-31 up:rejoin seq 39 v8295) v7
2018-03-29 12:51:57.024013 7fc0cb280700  0 -- 10.19.248.31:6804/731296555 >> 10.19.248.31:6789/0 conn(0x557cfc0dc800 :-1 s=STATE_OPEN pgs=7475863 cs=1 l=1).taodd beacon enqueue
0x557d2d6ae700 mdsbeacon(31079818/enn-yc-31 up:rejoin seq 40 v8295) v7
2018-03-29 12:52:01.031026 7fc0cb280700  0 -- 10.19.248.31:6804/731296555 >> 10.19.248.31:6789/0 conn(0x557cfc0dc800 :-1 s=STATE_OPEN pgs=7475863 cs=1 l=1).taodd beacon enqueue
0x557d2d6aea40 mdsbeacon(31079818/enn-yc-31 up:rejoin seq 41 v8295) v7
2018-03-29 12:52:05.027415 7fc0cb280700  0 -- 10.19.248.31:6804/731296555 >> 10.19.248.31:6789/0 conn(0x557cfc0dc800 :-1 s=STATE_OPEN pgs=7475863 cs=1 l=1).taodd beacon enqueue
0x557d2d6aed80 mdsbeacon(31079818/enn-yc-31 up:rejoin seq 42 v8295) v7
2018-03-29 12:52:09.129624 7fc0cb280700  0 -- 10.19.248.31:6804/731296555 >> 10.19.248.31:6789/0 conn(0x557cfc0dc800 :-1 s=STATE_OPEN pgs=7475863 cs=1 l=1).taodd beacon enqueue
0x557d2d6af0c0 mdsbeacon(31079818/enn-yc-31 up:rejoin seq 43 v8295) v7
2018-03-29 12:52:13.124805 7fc0cb280700  0 -- 10.19.248.31:6804/731296555 >> 10.19.248.31:6789/0 conn(0x557cfc0dc800 :-1 s=STATE_OPEN pgs=7475863 cs=1 l=1).taodd beacon enqueue
0x557d2d6af400 mdsbeacon(31079818/enn-yc-31 up:rejoin seq 44 v8295) v7
2018-03-29 12:52:17.039225 7fc0cb280700  0 -- 10.19.248.31:6804/731296555 >> 10.19.248.31:6789/0 conn(0x557cfc0dc800 :-1 s=STATE_OPEN pgs=7475863 cs=1 l=1).taodd beacon enqueue
0x557d2d6af740 mdsbeacon(31079818/enn-yc-31 up:rejoin seq 45 v8295) v7
2018-03-29 12:52:21.025163 7fc0cb280700  0 -- 10.19.248.31:6804/731296555 >> 10.19.248.31:6789/0 conn(0x557cfc0dc800 :-1 s=STATE_OPEN pgs=7475863 cs=1 l=1).taodd beacon enqueue
0x557d2d6afa80 mdsbeacon(31079818/enn-yc-31 up:rejoin seq 46 v8295) v7
2018-03-29 12:52:25.025566 7fc0cb280700  0 -- 10.19.248.31:6804/731296555 >> 10.19.248.31:6789/0 conn(0x557cfc0dc800 :-1 s=STATE_OPEN pgs=7475863 cs=1 l=1).taodd beacon enqueue
0x557d87ba0080 mdsbeacon(31079818/enn-yc-31 up:rejoin seq 47 v8295) v7
2018-03-29 12:52:25.337149 7fc0c8b06700  1 mds.0.8291 rejoin_joint_start
2018-03-29 12:52:25.337199 7fc0c8b06700  0 -- 10.19.248.31:6804/731296555 taodd outqueue beacon 0x557d0b5e6080 mdsbeacon(31079818/enn-yc-31 up:rejoin seq 29 v8293) v7
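
The queue delay can also be computed from such a log rather than read off by eye. The following is a rough sketch. It assumes the "taodd beacon enqueue" and "taodd outqueue beacon" markers shown above, which appear to be debug output added by the reporter and not stock Ceph log lines. The helper names are hypothetical.

```python
import re
from datetime import datetime

TS = r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)"
# The message pointer may follow the marker on the same line or on the next one.
ENQUEUE = re.compile(TS + r" .*?taodd beacon enqueue\s+(0x[0-9a-f]+)", re.MULTILINE)
DEQUEUE = re.compile(TS + r" .*?taodd outqueue beacon\s+(0x[0-9a-f]+)", re.MULTILINE)


def parse_ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")


def beacon_queue_delays(log_text):
    """Map message pointer -> seconds between enqueue and dequeue.

    Assumes a pointer is not reused for another beacon within the log.
    """
    enqueued = {ptr: parse_ts(ts) for ts, ptr in ENQUEUE.findall(log_text)}
    delays = {}
    for ts, ptr in DEQUEUE.findall(log_text):
        if ptr in enqueued:
            delays[ptr] = (parse_ts(ts) - enqueued[ptr]).total_seconds()
    return delays


# First and last entries of the log above.
SAMPLE = """\
2018-03-29 12:51:13.600301 7fc0cb280700  0 -- 10.19.248.31:6804/731296555 >> 10.19.248.31:6789/0 conn(0x557cfc0dc800 :-1 s=STATE_OPEN pgs=7475863 cs=1 l=1).taodd beacon enqueue
0x557d0b5e6080 mdsbeacon(31079818/enn-yc-31 up:rejoin seq 29 v8293) v7
2018-03-29 12:52:25.337199 7fc0c8b06700  0 -- 10.19.248.31:6804/731296555 taodd outqueue beacon 0x557d0b5e6080 mdsbeacon(31079818/enn-yc-31 up:rejoin seq 29 v8293) v7
"""

for ptr, seconds in beacon_queue_delays(SAMPLE).items():
    print(f"{ptr}: {seconds:.6f}s in queue")
```

For the two sample lines this gives a delay of roughly 71.7 seconds for the seq 29 beacon, which spans the interval between rejoin_start and rejoin_joint_start in the log.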

Related issues 3 (0 open, 3 closed)

Related to CephFS - Bug #19706: Laggy mon daemons causing MDS failover (symptom: failed to set counters on mds daemons: set(['mds.dir_split'])) (Can't reproduce, 04/20/2017)
Copied to CephFS - Backport #26923: mimic: mds: mds got laggy because of MDSBeacon stuck in mqueue (Resolved, Prashant D)
Copied to CephFS - Backport #26924: luminous: mds: mds got laggy because of MDSBeacon stuck in mqueue (Resolved, Patrick Donnelly)