Project

General

Profile

Bug #17308

MDSMonitor should tolerate paxos delays without failing daemons (Was: Unexplained delay forwarding message between mons)

Added by John Spray about 6 years ago. Updated about 6 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Correctness/Safety
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDSMonitor
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We have had a couple of mysterious new issues crop up in CephFS that I now think might be the result of some weirdness happening in the mon.
http://tracker.ceph.com/issues/17253
http://tracker.ceph.com/issues/17236

For 17253 I looked closely at the failure here:
http://qa-proxy.ceph.com/teuthology/jspray-2016-09-12_14:31:30-fs-wip-jcsp-testing-20160912-distro-basic-mira/411825/teuthology.log
...and found that there was a 15 second delay happening in forwarding a message between mons when tracing an mdsbeacon message:

remote/mira032/log/ceph-mds.a-s.log.gz:2016-09-12 22:40:33.392905 7fba36e86700  1 -- 172.21.4.112:6808/103879 >> 172.21.4.112:6790/0 conn(0x7fba4a9eb800 sd=18 :-1 s=STATE_OPEN pgs=11 cs=1 l=1). == tx == 0x7fba4aa82340 mdsbeacon(4115/a-s up:active seq 286 v7) v7

remote/mira032/log/ceph-mon.c.log.gz:2016-09-12 22:40:33.393058 7fbc737df700  1 -- 172.21.4.112:6790/0 >> 172.21.4.112:6808/103879 conn(0x7fbc854b8800 sd=33 :6790 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=1 cs=1 l=1). == rx == mds.0 seq 323 0x7fbc86e0f6c0 mdsbeacon(4115/a-s up:active seq 286 v7) v7
remote/mira032/log/ceph-mon.c.log.gz:2016-09-12 22:40:33.393187 7fbc727dd700  1 -- 172.21.4.112:6790/0 <== mds.0 172.21.4.112:6808/103879 323 ==== mdsbeacon(4115/a-s up:active seq 286 v7) v7 ==== 367+0+0 (2027675394 0 0) 0x7fbc86e0f6c0 con 0x7fbc854b8800
remote/mira032/log/ceph-mon.c.log.gz:2016-09-12 22:40:33.393246 7fbc727dd700 10 mon.c@2(peon).paxosservice(mdsmap 1..7) dispatch 0x7fbc86e0f6c0 mdsbeacon(4115/a-s up:active seq 286 v7) v7 from mds.0 172.21.4.112:6808/103879 con 0x7fbc854b8800
remote/mira032/log/ceph-mon.c.log.gz:2016-09-12 22:40:48.544674 7fbc727dd700 10 mon.c@2(peon).paxosservice(mdsmap 1..8) dispatch 0x7fbc86e0f6c0 mdsbeacon(4115/a-s up:active seq 286 v7) v7 from mds.0 172.21.4.112:6808/103879 con 0x7fbc854b8800
remote/mira032/log/ceph-mon.c.log.gz:2016-09-12 22:40:48.544686 7fbc727dd700 10 mon.c@2(peon).mds e8 preprocess_query mdsbeacon(4115/a-s up:active seq 286 v7) v7 from mds.0 172.21.4.112:6808/103879
remote/mira032/log/ceph-mon.c.log.gz:2016-09-12 22:40:48.544696 7fbc727dd700 12 mon.c@2(peon).mds e8 preprocess_beacon mdsbeacon(4115/a-s up:active seq 286 v7) v7 from mds.0 172.21.4.112:6808/103879 compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2}
remote/mira032/log/ceph-mon.c.log.gz:2016-09-12 22:40:48.544710 7fbc727dd700 10 mon.c@2(peon) e1 forward_request 765 request mdsbeacon(4115/a-s up:active seq 286 v7) v7 features 1152921504336314367
remote/mira032/log/ceph-mon.c.log.gz:2016-09-12 22:40:48.544714 7fbc727dd700  1 -- 172.21.4.112:6790/0 _send_message--> mon.0 172.21.4.112:6789/0 -- forward(mdsbeacon(4115/a-s up:active seq 286 v7) v7 caps allow * tid 765 con_features 1152921504336314367) to leader v3 -- ?+0 0x7fbc86f1ab00
remote/mira032/log/ceph-mon.c.log.gz:2016-09-12 22:40:48.544721 7fbc727dd700  1 -- 172.21.4.112:6790/0 >> 172.21.4.112:6789/0 conn(0x7fbc854ad000 sd=21 :-1 s=STATE_OPEN pgs=5 cs=1 l=0). == tx == 0x7fbc86f1ab00 forward(mdsbeacon(4115/a-s up:active seq 286 v7) v7 caps allow * tid 765 con_features 1152921504336314367) to leader v3

remote/mira032/log/ceph-mon.b.log.gz:2016-09-12 22:40:48.544817 7fe340fe8700  1 -- 172.21.4.112:6789/0 >> 172.21.4.112:6790/0 conn(0x7fe353203800 sd=30 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=3 cs=1 l=0). == rx == mon.2 seq 2510 0x7fe354495600 forward(mdsbeacon(4115/a-s up:active seq 286 v7) v7 caps allow * tid 765 con_features 1152921504336314367) to leader v3
remote/mira032/log/ceph-mon.b.log.gz:2016-09-12 22:40:48.545342 7fe33ffe6700  1 -- 172.21.4.112:6789/0 <== mon.2 172.21.4.112:6790/0 2510 ==== forward(mdsbeacon(4115/a-s up:active seq 286 v7) v7 caps allow * tid 765 con_features 1152921504336314367) to leader v3 ==== 634+0+0 (1224176700 0 0) 0x7fe354495600 con 0x7fe353203800
remote/mira032/log/ceph-mon.b.log.gz:2016-09-12 22:40:48.545391 7fe33ffe6700 10 mon.b@0(leader).paxosservice(mdsmap 1..8) dispatch 0x7fe354892340 mdsbeacon(4115/a-s up:active seq 286 v7) v7 from mds.0 172.21.4.112:6808/103879 con 0x7fe35359b100
remote/mira032/log/ceph-mon.b.log.gz:2016-09-12 22:40:48.545401 7fe33ffe6700 10 mon.b@0(leader).mds e8 preprocess_query mdsbeacon(4115/a-s up:active seq 286 v7) v7 from mds.0 172.21.4.112:6808/103879
remote/mira032/log/ceph-mon.b.log.gz:2016-09-12 22:40:48.545411 7fe33ffe6700 12 mon.b@0(leader).mds e8 preprocess_beacon mdsbeacon(4115/a-s up:active seq 286 v7) v7 from mds.0 172.21.4.112:6808/103879 compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2}
remote/mira032/log/ceph-mon.b.log.gz:2016-09-12 22:40:48.545416 7fe33ffe6700  7 mon.b@0(leader).mds e8 mds_beacon mdsbeacon(4115/a-s up:active seq 286 v7) v7 is not in fsmap (state up:active)
remote/mira032/log/ceph-mon.b.log.gz:2016-09-12 22:40:48.545431 7fe33ffe6700 15 mon.b@0(leader) e1 send_reply routing reply to 172.21.4.112:6808/103879 via 172.21.4.112:6790/0 for request mdsbeacon(4115/a-s up:active seq 286 v7) v7


Related issues

Related to CephFS - Feature #17309: qa: mon_thrash test for CephFS Resolved

History

#1 Updated by John Spray about 6 years ago

  • Project changed from CephFS to Ceph
  • Category set to Monitor

#2 Updated by Kefu Chai about 6 years ago

  • Assignee set to Kefu Chai

#3 Updated by Kefu Chai about 6 years ago

  1. leader (mon.b@0) sent begin to peons
    • 2016-09-12 22:40:32.630175 7fe3407e7700 10 mon.b@0(leader).paxos(paxos updating c 252..811) sending begin to mon.1
    • 2016-09-12 22:40:32.630269 7fe3407e7700 10 mon.b@0(leader).paxos(paxos updating c 252..811) sending begin to mon.2
  2. peon (mon.c@2) accepted and replied right away
    • 2016-09-12 22:40:32.630521 7fbc727dd700 10 mon.c@2(peon).paxos(paxos active c 252..811) handle_begin paxos(begin lc 811 fc 0 pn 1000 opn 0) v3
    • 2016-09-12 22:40:32.756201 7fbc727dd700 1 -- 172.21.4.112:6790/0 >> 172.21.4.112:6789/0 conn(0x7fbc854ad000 sd=21 :-1 s=STATE_OPEN pgs=5 cs=1 l=0). tx 0x7fbc85f5d200 paxos(accept lc 811 fc 0 pn 1000 opn 0) v3
    • 2016-09-12 22:40:41.539687 7fbc72fde700 1 mon.c@2(peon).paxos(paxos updating c 252..811) lease_timeout -- calling new election
  3. peon (mon.a@1) accepted but replied after 8 seconds. the only blocking call is "get_store()->apply_transaction(t);".
    • 2016-09-12 22:40:32.630903 7f30f4250700 10 mon.a@1(peon).paxos(paxos updating c 252..811) accepting value for 812 pn 1000
    • 2016-09-12 22:40:40.578897 7f30f4250700 1 -- 172.21.5.130:6789/0 >> 172.21.4.112:6789/0 conn(0x7f31066cd000 sd=21 :-1 s=STATE_OPEN pgs=3 cs=1 l=0). tx 0x7f3108072800 paxos(accept lc 811 fc 0 pn 1000 opn 0) v3

so before the leader committed and updated the lease time of its peons, mon.c's lease timed out and it proposed an election.

  1. mon.c@2 stopped for 6 seconds for applying a transaction for updating election_epoch, (mon->store->apply_transaction(t); in bump_epoch()"
    • 2016-09-12 22:40:41.540652 7fbc727dd700 5 mon.c@2(electing).elector(20) start -- can i be leader?
    • 2016-09-12 22:40:41.540662 7fbc727dd700 1 mon.c@2(electing).elector(20) init, last seen epoch 20
    • 2016-09-12 22:40:41.540663 7fbc727dd700 10 mon.c@2(electing).elector(20) bump_epoch 20 to 21
    • 2016-09-12 22:40:47.587977 7fbc727dd700 10 mon.c@2(electing) e1 join_election

that's why it took 15 seconds to recover from this problem:

  1. 8 seconds for mon.a@1 to write the pending proposal to leveldb
  2. 6 seconds for mon.c@2 to write the new election epoch to leveldb and continued with the election.

#4 Updated by John Spray about 6 years ago

  • Project changed from Ceph to CephFS
  • Subject changed from Unexplained delay forwarding message between mons to MDSMonitor should tolerate paxos delays without failing daemons (Was: Unexplained delay forwarding message between mons)
  • Category changed from Monitor to Correctness/Safety
  • Assignee changed from Kefu Chai to John Spray
  • Component(FS) MDSMonitor added

#5 Updated by John Spray about 6 years ago

  • Status changed from New to Fix Under Review

#6 Updated by John Spray about 6 years ago

  • Status changed from Fix Under Review to Resolved

#7 Updated by Patrick Donnelly over 4 years ago

Also available in: Atom PDF