Bug #43216
MDSMonitor: removes MDS coming out of quorum election
0%
Description
Event sequence:
- 2019-12-07T12:26:26.854 mon_thrash kills mon.a(leader)
- 2019-12-07T12:27:07.843 mon_thrash revives mon.a
- 2019-12-07T12:27:12.478 socket fault causes mds.a to reconnect to revived mon.a(probing)
- 2019-12-07T12:27:32.074 mon.b replaces mds.a
- 2019-12-07T12:27:33.494 mon.a back in quorum, drops beacons from evicted mds.a
2019-12-07T11:07:02.936 INFO:teuthology.orchestra.run.smithi146.stderr:dumped fsmap epoch 6 2019-12-07T11:07:02.936 INFO:teuthology.orchestra.run.smithi146.stdout: 2019-12-07T11:07:02.936 INFO:teuthology.orchestra.run.smithi146.stdout:{"epoch":6,"default_fscid":1,"compat":{"compat":{},"ro_compat":{},"incompat":{"feature_1":"base v0.20","feature_2":"client writeable ranges","feature_3":"default file layouts on dirs","feature_4":"dir inode in separate object","feature_5":"mds uses versioned encoding","feature_6":"dirfrag is stored in omap","feature_8":"no anchor table","feature_9":"file layout v2","feature_10":"snaprealm v2"}},"feature_flags":{"enable_multiple":false,"ever_enabled_multiple":false},"standbys":[{"gid":4350,"name":"b","rank":-1,"incarnation":0,"state":"up:standby","state_seq":1,"addr":"172.21.15.194:6833/1676386680","addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.194:6832","nonce":1676386680},{"type":"v1","addr":"172.21.15.194:6833","nonce":1676386680}]},"export_targets":[],"features":4540138292836696063,"flags":0,"epoch":4}],"filesystems":[{"mdsmap":{"epoch":6,"flags":18,"ever_allowed_features":0,"explicitly_allowed_features":0,"created":"2019-12-07T11:06:38.797089+0000","modified":"2019-12-07T11:06:44.297113+0000","tableserver":0,"root":0,"session_timeout":60,"session_autoclose":300,"min_compat_client":"0 (unknown)","max_file_size":1099511627776,"last_failure":0,"last_failure_osd_epoch":0,"compat":{"compat":{},"ro_compat":{},"incompat":{"feature_1":"base v0.20","feature_2":"client writeable ranges","feature_3":"default file layouts on dirs","feature_4":"dir inode in separate object","feature_5":"mds uses versioned encoding","feature_6":"dirfrag is stored in omap","feature_8":"no anchor table","feature_9":"file layout v2","feature_10":"snaprealm v2"}},"max_mds":1,"in":[0],"up":{"mds_0":4341},"failed":[],"damaged":[],"stopped":[],"info":{"gid_4341":{"gid":4341,"name":"a","rank":0,"incarnation":5,"state":"up:active","state_seq":2,"addr":"172.21.15.146:6835/4036894910","addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.146:6834","nonce":4036894910},{"type":"v1","addr":"172.21.15.146:6835","nonce":4036894910}]},"export_targets":[],"features":4540138292836696063,"flags":0}},"data_pools":[3],"metadata_pool":2,"enabled":true,"fs_name":"cephfs","balancer":"","standby_count_wanted":0},"id":1}]} 2019-12-07T11:07:02.941 INFO:teuthology.orchestra.run.smithi146.stderr:2019-12-07T11:07:02.941+0000 7f4718d81700 1 -- 172.21.15.146:0/2284702334 >> [v2:172.21.15.146:6800/8811,v1:172.21.15.146:6801/8811] conn(0x7f46f8023630 msgr2=0x7f46f8025ac0 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down 2019-12-07T11:07:02.941 INFO:teuthology.orchestra.run.smithi146.stderr:2019-12-07T11:07:02.941+0000 7f4718d81700 1 --2- 172.21.15.146:0/2284702334 >> [v2:172.21.15.146:6800/8811,v1:172.21.15.146:6801/8811] conn(0x7f46f8023630 0x7f46f8025ac0 crc :-1 s=READY pgs=96 cs=0 l=1 rx=0 tx=0).stop 2019-12-07T11:07:02.941 INFO:teuthology.orchestra.run.smithi146.stderr:2019-12-07T11:07:02.941+0000 7f4718d81700 1 -- 172.21.15.146:0/2284702334 >> [v2:172.21.15.146:3300/0,v1:172.21.15.146:6789/0] conn(0x7f471410ac20 msgr2=0x7f47141452c0 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down 2019-12-07T11:07:02.941 INFO:teuthology.orchestra.run.smithi146.stderr:2019-12-07T11:07:02.941+0000 7f4718d81700 1 --2- 172.21.15.146:0/2284702334 >> [v2:172.21.15.146:3300/0,v1:172.21.15.146:6789/0] conn(0x7f471410ac20 0x7f47141452c0 secure :-1 s=READY pgs=193 cs=0 l=1 rx=0x7f470c0103b0 tx=0x7f470c010750).stop 2019-12-07T11:07:02.941 INFO:teuthology.orchestra.run.smithi146.stderr:2019-12-07T11:07:02.941+0000 7f4718d81700 1 -- 172.21.15.146:0/2284702334 shutdown_connections 2019-12-07T11:07:02.941 INFO:teuthology.orchestra.run.smithi146.stderr:2019-12-07T11:07:02.941+0000 7f4718d81700 1 --2- 172.21.15.146:0/2284702334 >> [v2:172.21.15.146:6800/8811,v1:172.21.15.146:6801/8811] conn(0x7f46f8023630 0x7f46f8025ac0 unknown :-1 s=CLOSED pgs=96 cs=0 l=1 rx=0 tx=0).stop 2019-12-07T11:07:02.941 INFO:teuthology.orchestra.run.smithi146.stderr:2019-12-07T11:07:02.941+0000 7f4718d81700 1 --2- 172.21.15.146:0/2284702334 >> [v2:172.21.15.146:3300/0,v1:172.21.15.146:6789/0] conn(0x7f471410ac20 0x7f47141452c0 unknown :-1 s=CLOSED pgs=193 cs=0 l=1 rx=0 tx=0).stop 2019-12-07T11:07:02.942 INFO:teuthology.orchestra.run.smithi146.stderr:2019-12-07T11:07:02.941+0000 7f4718d81700 1 -- 172.21.15.146:0/2284702334 shutdown_connections 2019-12-07T11:07:02.942 INFO:teuthology.orchestra.run.smithi146.stderr:2019-12-07T11:07:02.941+0000 7f4718d81700 1 -- 172.21.15.146:0/2284702334 wait complete. 2019-12-07T11:07:02.942 INFO:teuthology.orchestra.run.smithi146.stderr:2019-12-07T11:07:02.941+0000 7f4718d81700 1 -- 172.21.15.146:0/2284702334 >> 172.21.15.146:0/2284702334 conn(0x7f471411ab50 msgr2=0x7f4714111310 unknown :-1 s=STATE_NONE l=0).mark_down 2019-12-07T11:07:02.951 INFO:tasks.mon_thrash.mon_thrasher:start thrashing 2019-12-07T11:07:02.951 INFO:tasks.mon_thrash.mon_thrasher:seed: 1575716820, revive delay: 20.0, thrash delay: 10.0 thrash many: False, maintain quorum: True store thrash: False, probability: 50 freeze mon: prob 10.0 duration 15.0 ... 2019-12-07T12:26:26.854 INFO:tasks.mon_thrash.mon_thrasher:monitors to thrash: ['a'] 2019-12-07T12:26:26.854 INFO:tasks.mon_thrash.mon_thrasher:monitors to freeze: [] 2019-12-07T12:26:26.854 INFO:tasks.mon_thrash.mon_thrasher:thrashing mon.a 2019-12-07T12:26:26.854 INFO:tasks.mon_thrash.mon_thrasher:killing mon.a 2019-12-07T12:26:26.855 DEBUG:tasks.ceph.mon.a:waiting for process to exit 2019-12-07T12:26:26.855 INFO:teuthology.orchestra.run:waiting for 300 2019-12-07T12:26:26.873 INFO:tasks.ceph.mon.a:Stopped 2019-12-07T12:26:26.874 INFO:tasks.mon_thrash.ceph_manager:waiting for quorum size 2 ... 2019-12-07T12:27:07.843 INFO:tasks.mon_thrash.mon_thrasher:reviving mon.a 2019-12-07T12:27:07.843 INFO:tasks.ceph.mon.a:Restarting daemon 2019-12-07T12:27:07.844 INFO:teuthology.orchestra.run.smithi146:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mon -f --cluster ceph -i a ... 2019-12-07T13:06:08.201 INFO:teuthology.orchestra.run.smithi146.stderr:dumped fsmap epoch 56 2019-12-07T13:06:08.201 INFO:teuthology.orchestra.run.smithi146.stdout: 2019-12-07T13:06:08.202 INFO:teuthology.orchestra.run.smithi146.stdout:{"epoch":56,"default_fscid":1,"compat":{"compat":{},"ro_compat":{},"incompat":{"feature_1":"base v0.20","feature_2":"client writeable ranges","feature_3":"default file layouts on dirs","feature_4":"dir inode in separate object","feature_5":"mds uses versioned encoding","feature_6":"dirfrag is stored in omap","feature_8":"no anchor table","feature_9":"file layout v2","feature_10":"snaprealm v2"}},"feature_flags":{"enable_multiple":false,"ever_enabled_multiple":false},"standbys":[{"gid":774256,"name":"a","rank":-1,"incarnation":0,"state":"up:standby","state_seq":1,"addr":"172.21.15.146:6835/2775721496","addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.146:6834","nonce":2775721496},{"type":"v1","addr":"172.21.15.146:6835","nonce":2775721496}]},"export_targets":[],"features":4540138292836696063,"flags":0,"epoch":52}],"filesystems":[{"mdsmap":{"epoch":56,"flags":18,"ever_allowed_features":2,"explicitly_allowed_features":2,"created":"2019-12-07T11:06:38.797089+0000","modified":"2019-12-07T12:27:43.614231+0000","tableserver":0,"root":0,"session_timeout":60,"session_autoclose":300,"min_compat_client":"0 (unknown)","max_file_size":1099511627776,"last_failure":0,"last_failure_osd_epoch":87,"compat":{"compat":{},"ro_compat":{},"incompat":{"feature_1":"base v0.20","feature_2":"client writeable ranges","feature_3":"default file layouts on dirs","feature_4":"dir inode in separate object","feature_5":"mds uses versioned encoding","feature_6":"dirfrag is stored in omap","feature_8":"no anchor table","feature_9":"file layout v2","feature_10":"snaprealm v2"}},"max_mds":1,"in":[0],"up":{"mds_0":4350},"failed":[],"damaged":[],"stopped":[],"info":{"gid_4350":{"gid":4350,"name":"b","rank":0,"incarnation":51,"state":"up:active","state_seq":1219,"addr":"172.21.15.194:6833/1676386680","addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.194:6832","nonce":1676386680},{"type":"v1","addr":"172.21.15.194:6833","nonce":1676386680}]},"export_targets":[],"features":4540138292836696063,"flags":0}},"data_pools":[3],"metadata_pool":2,"enabled":true,"fs_name":"cephfs","balancer":"","standby_count_wanted":0},"id":1}]} 2019-12-07T13:06:08.204 INFO:teuthology.orchestra.run.smithi146.stderr:2019-12-07T13:06:08.201+0000 7f79afe07700 1 -- 172.21.15.146:0/1562519687 >> [v2:172.21.15.146:6800/8811,v1:172.21.15.146:6801/8811] conn(0x7f798c023630 msgr2=0x7f798c025ae0 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down 2019-12-07T13:06:08.205 INFO:teuthology.orchestra.run.smithi146.stderr:2019-12-07T13:06:08.201+0000 7f79afe07700 1 --2- 172.21.15.146:0/1562519687 >> [v2:172.21.15.146:6800/8811,v1:172.21.15.146:6801/8811] conn(0x7f798c023630 0x7f798c025ae0 crc :-1 s=READY pgs=2216 cs=0 l=1 rx=0 tx=0).stop 2019-12-07T13:06:08.205 INFO:teuthology.orchestra.run.smithi146.stderr:2019-12-07T13:06:08.201+0000 7f79afe07700 1 -- 172.21.15.146:0/1562519687 >> [v2:172.21.15.146:3300/0,v1:172.21.15.146:6789/0] conn(0x7f79a811c840 msgr2=0x7f79a811b870 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down 2019-12-07T13:06:08.205 INFO:teuthology.orchestra.run.smithi146.stderr:2019-12-07T13:06:08.201+0000 7f79afe07700 1 --2- 172.21.15.146:0/1562519687 >> [v2:172.21.15.146:3300/0,v1:172.21.15.146:6789/0] conn(0x7f79a811c840 0x7f79a811b870 secure :-1 s=READY pgs=114 cs=0 l=1 rx=0x7f79a00107d0 tx=0x7f79a0010600).stop 2019-12-07T13:06:08.205 INFO:teuthology.orchestra.run.smithi146.stderr:2019-12-07T13:06:08.201+0000 7f79afe07700 1 -- 172.21.15.146:0/1562519687 shutdown_connections 2019-12-07T13:06:08.205 INFO:teuthology.orchestra.run.smithi146.stderr:2019-12-07T13:06:08.201+0000 7f79afe07700 1 --2- 172.21.15.146:0/1562519687 >> [v2:172.21.15.146:6800/8811,v1:172.21.15.146:6801/8811] conn(0x7f798c023630 0x7f798c025ae0 unknown :-1 s=CLOSED pgs=2216 cs=0 l=1 rx=0 tx=0).stop 2019-12-07T13:06:08.205 INFO:teuthology.orchestra.run.smithi146.stderr:2019-12-07T13:06:08.201+0000 7f79afe07700 1 --2- 172.21.15.146:0/1562519687 >> [v2:172.21.15.146:3300/0,v1:172.21.15.146:6789/0] conn(0x7f79a811c840 0x7f79a811b870 unknown :-1 s=CLOSED pgs=114 cs=0 l=1 rx=0 tx=0).stop 2019-12-07T13:06:08.206 INFO:teuthology.orchestra.run.smithi146.stderr:2019-12-07T13:06:08.201+0000 7f79afe07700 1 --2- 172.21.15.146:0/1562519687 >> [v2:172.21.15.194:3301/0,v1:172.21.15.194:6790/0] conn(0x7f79a8123a20 0x7f79a811bdb0 unknown :-1 s=CLOSED pgs=0 cs=0 l=1 rx=0 tx=0).stop 2019-12-07T13:06:08.206 INFO:teuthology.orchestra.run.smithi146.stderr:2019-12-07T13:06:08.201+0000 7f79afe07700 1 -- 172.21.15.146:0/1562519687 shutdown_connections 2019-12-07T13:06:08.206 INFO:teuthology.orchestra.run.smithi146.stderr:2019-12-07T13:06:08.201+0000 7f79afe07700 1 -- 172.21.15.146:0/1562519687 wait complete. 2019-12-07T13:06:08.206 INFO:teuthology.orchestra.run.smithi146.stderr:2019-12-07T13:06:08.201+0000 7f79afe07700 1 -- 172.21.15.146:0/1562519687 >> 172.21.15.146:0/1562519687 conn(0x7f79a81062d0 msgr2=0x7f79a810c6f0 unknown :-1 s=STATE_NONE l=0).mark_down 2019-12-07T13:06:08.216 ERROR:tasks.mon_thrash.mon_thrasher:exception: Traceback (most recent call last): File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20191205.235703/qa/tasks/mon_thrash.py", line 232, in do_thrash self._do_thrash() File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20191205.235703/qa/tasks/mon_thrash.py", line 345, in _do_thrash 'MDS Failover' AssertionError: MDS Failover 2019-12-07T13:06:08.227 INFO:tasks.mon_thrash.ceph_manager:waiting for quorum size 3
From failed mon's log:
2019-12-07T12:27:28.947+0000 7fd04893a700 1 -- [v2:172.21.15.146:3300/0,v1:172.21.15.146:6789/0] <== mds.0 v2:172.21.15.146:6834/4036894910 3 ==== mdsbeacon(4341/a up:active seq 1213 v50) v7 ==== 416+0+0 (crc 0 0 0) 0x55df3e537500 con 0x55df3e7f5800 2019-12-07T12:27:28.947+0000 7fd04893a700 20 mon.a@0(probing) e1 _ms_dispatch existing session 0x55df3e7dfdc0 for mds.0 2019-12-07T12:27:28.947+0000 7fd04893a700 20 mon.a@0(probing) e1 entity mds.a caps allow * 2019-12-07T12:27:28.947+0000 7fd04893a700 10 mon.a@0(probing).paxosservice(mdsmap 1..50) dispatch 0x55df3e537500 mdsbeacon(4341/a up:active seq 1213 v50) v7 from mds.0 v2:172.21.15.146:6834/4036894910 con 0x55df3e7f5800 2019-12-07T12:27:28.947+0000 7fd04893a700 5 mon.a@0(probing).paxos(paxos recovering c 3013..3743) is_readable = 0 - now=2019-12-07T12:27:28.948423+0000 lease_expire=1970-01-01T00:00:00.000000+0000 has v0 lc 3743 2019-12-07T12:27:28.947+0000 7fd04893a700 10 mon.a@0(probing).paxosservice(mdsmap 1..50) waiting for paxos -> readable (v50) 2019-12-07T12:27:28.947+0000 7fd04893a700 5 mon.a@0(probing).paxos(paxos recovering c 3013..3743) is_readable = 0 - now=2019-12-07T12:27:28.948443+0000 lease_expire=1970-01-01T00:00:00.000000+0000 has v0 lc 3743 ... 2019-12-07T12:27:32.477+0000 7fd04893a700 1 -- [v2:172.21.15.146:3300/0,v1:172.21.15.146:6789/0] <== mds.0 v2:172.21.15.146:6834/4036894910 4 ==== mdsbeacon(4341/a up:active seq 1214 v50) v7 ==== 416+0+0 (crc 0 0 0) 0x55df3e537800 con 0x55df3e7f5800 2019-12-07T12:27:32.477+0000 7fd04893a700 20 mon.a@0(probing) e1 _ms_dispatch existing session 0x55df3e7dfdc0 for mds.0 2019-12-07T12:27:32.477+0000 7fd04893a700 20 mon.a@0(probing) e1 entity mds.a caps allow * 2019-12-07T12:27:32.477+0000 7fd04893a700 10 mon.a@0(probing).paxosservice(mdsmap 1..50) dispatch 0x55df3e537800 mdsbeacon(4341/a up:active seq 1214 v50) v7 from mds.0 v2:172.21.15.146:6834/4036894910 con 0x55df3e7f5800 2019-12-07T12:27:32.477+0000 7fd04893a700 5 mon.a@0(probing).paxos(paxos recovering c 3013..3743) is_readable = 0 - now=2019-12-07T12:27:32.478790+0000 lease_expire=1970-01-01T00:00:00.000000+0000 has v0 lc 3743 2019-12-07T12:27:32.477+0000 7fd04893a700 10 mon.a@0(probing).paxosservice(mdsmap 1..50) waiting for paxos -> readable (v50) 2019-12-07T12:27:32.477+0000 7fd04893a700 5 mon.a@0(probing).paxos(paxos recovering c 3013..3743) is_readable = 0 - now=2019-12-07T12:27:32.478797+0000 lease_expire=1970-01-01T00:00:00.000000+0000 has v0 lc 3743 ... 2019-12-07T12:27:33.494+0000 7fd04893a700 20 mon.a@0(leader).paxos(paxos updating c 3264..3799) finish_round waiting_for_readable 2019-12-07T12:27:33.494+0000 7fd04893a700 10 mon.a@0(leader).paxosservice(mdsmap 1..51) dispatch 0x55df3e536900 mdsbeacon(4341/a up:active seq 1210 v50) v7 from mds.0 v2:172.21.15.146:6834/4036894910 con 0x55df3e669c00 2019-12-07T12:27:33.494+0000 7fd04893a700 10 mon.a@0(leader).paxosservice(mdsmap 1..51) discarding message from disconnected client mds.0 v2:172.21.15.146:6834/4036894910 mdsbeacon(4341/a up:active seq 1210 v50) v7 2019-12-07T12:27:33.494+0000 7fd04893a700 10 mon.a@0(leader).paxosservice(mdsmap 1..51) dispatch 0x55df3e536600 mdsbeacon(4341/a up:active seq 1211 v50) v7 from mds.0 v2:172.21.15.146:6834/4036894910 con 0x55df3e7f4400 2019-12-07T12:27:33.494+0000 7fd04893a700 10 mon.a@0(leader).paxosservice(mdsmap 1..51) discarding message from disconnected client mds.0 v2:172.21.15.146:6834/4036894910 mdsbeacon(4341/a up:active seq 1211 v50) v7 2019-12-07T12:27:33.494+0000 7fd04893a700 10 mon.a@0(leader).paxosservice(mdsmap 1..51) dispatch 0x55df3e537200 mdsbeacon(4341/a up:active seq 1212 v50) v7 from mds.0 v2:172.21.15.146:6834/4036894910 con 0x55df3e850400 2019-12-07T12:27:33.494+0000 7fd04893a700 10 mon.a@0(leader).paxosservice(mdsmap 1..51) discarding message from disconnected client mds.0 v2:172.21.15.146:6834/4036894910 mdsbeacon(4341/a up:active seq 1212 v50) v7 2019-12-07T12:27:33.494+0000 7fd04893a700 10 mon.a@0(leader).paxosservice(mdsmap 1..51) dispatch 0x55df3e537500 mdsbeacon(4341/a up:active seq 1213 v50) v7 from mds.0 v2:172.21.15.146:6834/4036894910 con 0x55df3e7f5800 2019-12-07T12:27:33.494+0000 7fd04893a700 10 mon.a@0(leader).paxosservice(mdsmap 1..51) discarding message from disconnected client mds.0 v2:172.21.15.146:6834/4036894910 mdsbeacon(4341/a up:active seq 1213 v50) v7 2019-12-07T12:27:33.494+0000 7fd04893a700 10 mon.a@0(leader).paxosservice(mdsmap 1..51) dispatch 0x55df3e537800 mdsbeacon(4341/a up:active seq 1214 v50) v7 from mds.0 v2:172.21.15.146:6834/4036894910 con 0x55df3e7f5800 2019-12-07T12:27:33.494+0000 7fd04893a700 10 mon.a@0(leader).paxosservice(mdsmap 1..51) discarding message from disconnected client mds.0 v2:172.21.15.146:6834/4036894910 mdsbeacon(4341/a up:active seq 1214 v50) v7
From: /ceph/teuthology-archive/pdonnell-2019-12-06_03:07:34-fs-wip-pdonnell-testing-20191205.235703-distro-basic-smithi/4573014/remote/smithi146/log/ceph-mon.a.log.gz
2019-12-07T12:27:12.479+0000 7fe907327700 1 -- [v2:172.21.15.194:3300/0,v1:172.21.15.194:6789/0] <== mds.0 v2:172.21.15.146:6834/4036894910 14 ==== mdsbeacon(4341/a up:active seq 1209 v50) v7 ==== 416+0+0 (crc 0 0 0) 0x5570be8d1500 con 0x5570c09d2400 2019-12-07T12:27:12.479+0000 7fe907327700 20 mon.b@1(leader) e1 _ms_dispatch existing session 0x5570c0986c40 for mds.0 2019-12-07T12:27:12.479+0000 7fe907327700 20 mon.b@1(leader) e1 entity mds.a caps allow * 2019-12-07T12:27:12.479+0000 7fe907327700 10 mon.b@1(leader).paxosservice(mdsmap 1..50) dispatch 0x5570be8d1500 mdsbeacon(4341/a up:active seq 1209 v50) v7 from mds.0 v2:172.21.15.146:6834/4036894910 con 0x5570c09d2400 2019-12-07T12:27:12.479+0000 7fe907327700 5 mon.b@1(leader).paxos(paxos active c 3264..3774) is_readable = 1 - now=2019-12-07T12:27:12.480273+0000 lease_expire=2019-12-07T12:27:17.106786+0000 has v0 lc 3774 2019-12-07T12:27:12.479+0000 7fe907327700 10 mon.b@1(leader).mds e50 preprocess_query mdsbeacon(4341/a up:active seq 1209 v50) v7 from mds.0 [v2:172.21.15.146:6834/4036894910,v1:172.21.15.146:6835/4036894910] 2019-12-07T12:27:12.479+0000 7fe907327700 20 is_capable service=mds command= exec addr v2:172.21.15.146:6834/4036894910 on cap allow * 2019-12-07T12:27:12.479+0000 7fe907327700 20 allow so far , doing grant allow * 2019-12-07T12:27:12.479+0000 7fe907327700 20 allow all 2019-12-07T12:27:12.479+0000 7fe907327700 5 mon.b@1(leader).mds e50 preprocess_beacon mdsbeacon(4341/a up:active seq 1209 v50) v7 from mds.0 [v2:172.21.15.146:6834/4036894910,v1:172.21.15.146:6835/4036894910] compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2} 2019-12-07T12:27:12.479+0000 7fe907327700 10 mon.b@1(leader).mds e50 preprocess_beacon: GID exists in map: 4341 2019-12-07T12:27:12.479+0000 7fe907327700 5 mon.b@1(leader).mds e50 _note_beacon mdsbeacon(4341/a up:active seq 1209 v50) v7 noting time 2019-12-07T12:27:12.479+0000 7fe907327700 2 mon.b@1(leader) e1 send_reply 0x5570c0d8b5f0 0x5570be8d1200 mdsbeacon(4341/a up:active seq 1209 v50) v7 2019-12-07T12:27:12.479+0000 7fe907327700 1 -- [v2:172.21.15.194:3300/0,v1:172.21.15.194:6789/0] --> [v2:172.21.15.146:6834/4036894910,v1:172.21.15.146:6835/4036894910] -- mdsbeacon(4341/a up:active seq 1209 v50) v7 -- 0x5570be8d1200 con 0x5570c09d2400 2019-12-07T12:27:12.479+0000 7fe906325700 1 -- [v2:172.21.15.194:3300/0,v1:172.21.15.194:6789/0] >> [v2:172.21.15.146:6834/4036894910,v1:172.21.15.146:6835/4036894910] conn(0x5570c09d2400 msgr2=0x5570be8b3180 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 41 2019-12-07T12:27:12.479+0000 7fe906325700 1 -- [v2:172.21.15.194:3300/0,v1:172.21.15.194:6789/0] >> [v2:172.21.15.146:6834/4036894910,v1:172.21.15.146:6835/4036894910] conn(0x5570c09d2400 msgr2=0x5570be8b3180 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed 2019-12-07T12:27:12.479+0000 7fe906325700 1 --2- [v2:172.21.15.194:3300/0,v1:172.21.15.194:6789/0] >> [v2:172.21.15.146:6834/4036894910,v1:172.21.15.146:6835/4036894910] conn(0x5570c09d2400 0x5570be8b3180 crc :-1 s=READY pgs=679 cs=0 l=1 rx=0 tx=0).handle_read_frame_preamble_main read frame length and tag failed r=-1 ((1) Operation not permitted) 2019-12-07T12:27:12.479+0000 7fe906325700 1 --2- [v2:172.21.15.194:3300/0,v1:172.21.15.194:6789/0] >> [v2:172.21.15.146:6834/4036894910,v1:172.21.15.146:6835/4036894910] conn(0x5570c09d2400 0x5570be8b3180 crc :-1 s=READY pgs=679 cs=0 l=1 rx=0 tx=0).stop 2019-12-07T12:27:12.479+0000 7fe907327700 10 mon.b@1(leader) e1 ms_handle_reset 0x5570c09d2400 v2:172.21.15.146:6834/4036894910 2019-12-07T12:27:12.479+0000 7fe907327700 10 mon.b@1(leader) e1 reset/close on session mds.0 [v2:172.21.15.146:6834/4036894910,v1:172.21.15.146:6835/4036894910] 2019-12-07T12:27:12.479+0000 7fe907327700 10 mon.b@1(leader) e1 remove_session 0x5570c0986c40 mds.0 [v2:172.21.15.146:6834/4036894910,v1:172.21.15.146:6835/4036894910] features 0x3f01cfb8ffadffff ... 2019-12-07T12:27:32.074+0000 7fe909b2c700 20 mon.b@1(leader).mds e50 maybe_resize_cluster in 1 max 1 2019-12-07T12:27:32.074+0000 7fe909b2c700 1 mon.b@1(leader).mds e50 no beacon from mds.0.5 (gid: 4341 addr: [v2:172.21.15.146:6834/4036894910,v1:172.21.15.146:6835/4036894910] state: up:active) since 19.5945s 2019-12-07T12:27:32.074+0000 7fe909b2c700 1 mon.b@1(leader).mds e50 replacing 4341 [v2:172.21.15.146:6834/4036894910,v1:172.21.15.146:6835/4036894910] mds.0.5 up:active with 4350/b [v2:172.21.15.194:6832/1676386680,v1:172.21.15.194:6833/1676386680] 2019-12-07T12:27:32.074+0000 7fe909b2c700 0 log_channel(cluster) log [WRN] : daemon mds.a is not responding, replacing it as rank 0 with standby daemon mds.b
From new leader: /ceph/teuthology-archive/pdonnell-2019-12-06_03:07:34-fs-wip-pdonnell-testing-20191205.235703-distro-basic-smithi/4573014/remote/smithi194/log/ceph-mon.b.log.gz
From MDS:
2019-12-07T12:27:12.478+0000 7f3ccadb9700 1 -- [v2:172.21.15.146:6834/4036894910,v1:172.21.15.146:6835/4036894910] <== mon.1 v2:172.21.15.194:3300/0 15 ==== mdsbeacon(4341/a up:active seq 1209 v50) v7 ==== 126+0+0 (crc 0 0 0) 0x55b6ac770900 con 0x55b6abb97000 2019-12-07T12:27:12.478+0000 7f3ccadb9700 5 mds.beacon.a received beacon reply up:active seq 1209 rtt 0.000999984 2019-12-07T12:27:12.478+0000 7f3ccadb9700 0 -- [v2:172.21.15.146:6834/4036894910,v1:172.21.15.146:6835/4036894910] >> [v2:172.21.15.194:3300/0,v1:172.21.15.194:6789/0] conn(0x55b6abb97000 msgr2=0x55b6a6efb600 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until injecting socket failure 2019-12-07T12:27:12.478+0000 7f3ccadb9700 1 -- [v2:172.21.15.146:6834/4036894910,v1:172.21.15.146:6835/4036894910] >> [v2:172.21.15.194:3300/0,v1:172.21.15.194:6789/0] conn(0x55b6abb97000 msgr2=0x55b6a6efb600 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 37 2019-12-07T12:27:12.478+0000 7f3ccadb9700 1 -- [v2:172.21.15.146:6834/4036894910,v1:172.21.15.146:6835/4036894910] >> [v2:172.21.15.194:3300/0,v1:172.21.15.194:6789/0] conn(0x55b6abb97000 msgr2=0x55b6a6efb600 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed 2019-12-07T12:27:12.478+0000 7f3ccadb9700 1 --2- [v2:172.21.15.146:6834/4036894910,v1:172.21.15.146:6835/4036894910] >> [v2:172.21.15.194:3300/0,v1:172.21.15.194:6789/0] conn(0x55b6abb97000 0x55b6a6efb600 crc :-1 s=READY pgs=49 cs=0 l=1 rx=0 tx=0).handle_read_frame_preamble_main read frame length and tag failed r=-1 ((1) Operation not permitted)
Injected socket fault caused MDS to connected to revived mon.a which was in synchronizing state. This caused beacons to be left on the floor.
Related issues
History
#1 Updated by Patrick Donnelly over 4 years ago
- Description updated (diff)
#2 Updated by Patrick Donnelly about 4 years ago
- Status changed from In Progress to New
- Assignee deleted (
Patrick Donnelly)
#3 Updated by Patrick Donnelly about 4 years ago
- Target version deleted (
v15.0.0)
#4 Updated by Patrick Donnelly almost 3 years ago
/ceph/teuthology-archive/pdonnell-2021-06-03_03:40:33-fs-wip-pdonnell-testing-20210603.020013-distro-basic-smithi/6148290/teuthology.log
#5 Updated by Patrick Donnelly almost 3 years ago
/ceph/teuthology-archive/pdonnell-2021-06-11_18:02:10-fs-wip-pdonnell-testing-20210611.162716-distro-basic-smithi/6166663/teuthology.log
#6 Updated by Patrick Donnelly almost 3 years ago
- Status changed from New to Triaged
- Assignee set to Patrick Donnelly
- Target version set to v17.0.0
- Backport deleted (
nautilus,mimic)
/ceph/teuthology-archive/pdonnell-2021-06-16_21:26:55-fs-wip-pdonnell-testing-20210616.191804-distro-basic-smithi/6175768/screen
#7 Updated by Patrick Donnelly over 2 years ago
- Status changed from Triaged to Fix Under Review
- Backport set to pacific
- Pull request ID set to 42943
#8 Updated by Patrick Donnelly over 2 years ago
- Status changed from Fix Under Review to Pending Backport
#9 Updated by Backport Bot over 2 years ago
- Copied to Backport #52636: pacific: MDSMonitor: removes MDS coming out of quorum election added
#10 Updated by Patrick Donnelly over 1 year ago
- Status changed from Pending Backport to Resolved