Bug #20624

Updated by Kefu Chai almost 7 years ago

mgr.x 
 <pre> 
 2017-07-13 18:33:19.097142 7f7539f55700 10 mgr tick tick 
 2017-07-13 18:33:19.097154 7f7539f55700    1 mgr send_beacon active 
 2017-07-13 18:33:19.097228 7f7539f55700 10 mgr send_beacon sending beacon as gid 4104 modules dashboard,restful,status,zabbix 
 2017-07-13 18:33:19.097248 7f7539f55700    1 -- 172.21.15.24:0/981164730 --> 172.21.15.59:6791/0 -- mgrbeacon mgr.x(a33ec6ea-dfe0-400c-93d5-8d56f813c9c0,4104, 172.21.15.24:6800/195140, 1) v3 -- ?+0 0x7f7550983400 con 0x7f755078d200 
 ... 
 2017-07-13 18:33:19.300288 7f7545762700    0 -- 172.21.15.24:0/981164730 >> 172.21.15.59:6791/0 pipe(0x7f7550414000 sd=8 :60336 s=2 pgs=74 cs=1 l=1 c=0x7f755078d200).injecting socket failure 
 2017-07-13 18:33:19.302524 7f753e260700    1 -- 172.21.15.24:0/981164730 mark_down 0x7f755078d200 -- pipe dne 
 2017-07-13 18:33:19.302672 7f753e260700    1 -- 172.21.15.24:0/981164730 --> 172.21.15.24:6790/0 -- auth(proto 0 26 bytes epoch 1) v1 -- ?+0 0x7f7550985c00 con 0x7f7550686200 
 2017-07-13 18:33:19.302685 7f753e260700    1 -- 172.21.15.24:0/981164730 --> 172.21.15.24:6792/0 -- auth(proto 0 26 bytes epoch 1) v1 -- ?+0 0x7f7550985980 con 0x7f754edd4a00 
 2017-07-13 18:33:19.302708 7f753e260700    0 client.0 ms_handle_reset on 172.21.15.59:6791/0 
 2017-07-13 18:33:19.302712 7f753e260700    0 client.0 ms_handle_reset on 172.21.15.59:6791/0 
 2017-07-13 18:33:19.305209 7f753ba5b700    1 -- 172.21.15.24:0/981164730 >> 172.21.15.24:6790/0 pipe(0x7f75508fa000 sd=32 :54300 s=2 pgs=205 cs=1 l=1 c=0x7f7550686200).setting up a delay queue on Pipe 0x7f75508fa000 
 2017-07-13 18:33:19.309341 7f7545762700    1 -- 172.21.15.24:0/981164730 >> 172.21.15.24:6792/0 pipe(0x7f75508fc800 sd=8 :57656 s=2 pgs=158 cs=1 l=1 c=0x7f754edd4a00).setting up a delay queue on Pipe 0x7f75508fc800 
 ... 
 2017-07-13 18:33:21.097582 7f7539f55700 10 mgr tick tick 
 2017-07-13 18:33:21.097593 7f7539f55700    1 mgr send_beacon active 
 2017-07-13 18:33:21.097671 7f7539f55700 10 mgr send_beacon sending beacon as gid 4104 modules dashboard,restful,status,zabbix 
 2017-07-13 18:33:21.097682 7f7539f55700 10 mgr tick 
 ... 
 2017-07-13 18:33:21.885585 7f753ca5d700    1 -- 172.21.15.24:0/981164730 mark_down 0x7f754edd4a00 -- 0x7f75508fc800 
 2017-07-13 18:33:21.885634 7f753ca5d700    1 -- 172.21.15.24:0/981164730 mark_down 0x7f7550686200 -- 0x7f75508fa000 
 2017-07-13 18:33:21.885910 7f753ca5d700    1 -- 172.21.15.24:0/981164730 --> 172.21.15.24:6789/0 -- auth(proto 0 26 bytes epoch 1) v1 -- ?+0 0x7f7550983b80 con 0x7f75506dd000 
 2017-07-13 18:33:21.885926 7f753ca5d700    1 -- 172.21.15.24:0/981164730 --> 172.21.15.59:6793/0 -- auth(proto 0 26 bytes epoch 1) v1 -- ?+0 0x7f7550983900 con 0x7f75506dc200 
 2017-07-13 18:33:21.887931 7f753b95a700    0 -- 172.21.15.24:0/981164730 >> 172.21.15.24:6789/0 pipe(0x7f7550834800 sd=32 :0 s=1 pgs=0 cs=0 l=0 c=0x7f75506dd000).fault 
 2017-07-13 18:33:21.888159 7f7545762700    0 -- 172.21.15.24:0/981164730 >> 172.21.15.59:6793/0 pipe(0x7f7550414000 sd=8 :0 s=1 pgs=0 cs=0 l=0 c=0x7f75506dc200).fault 
 </pre> 

 mon.g 
 <pre> 
 ====== mon.g 
 2017-07-13 18:25:39.311329 7f167ee2c700    4 mon.g@2(peon).mgr e2 active server: -(4104) 
 .. 
 2017-07-13 18:30:33.435350 7f167ee2c700 10 mon.g@2(peon) e1 ms_handle_reset 0x7f169327a900 172.21.15.24:0/981164730 
 2017-07-13 18:30:33.435362 7f167ee2c700 10 mon.g@2(peon) e1 reset/close on session client.4104 172.21.15.24:0/981164730 
 2017-07-13 18:30:33.435370 7f167ee2c700 10 mon.g@2(peon) e1 remove_session 0x7f16925d8480 client.4104 172.21.15.24:0/981164730 features 0xffddff8eea4fffb 

 2017-07-13 18:30:33.369089 7f167ee2c700 10 mon.g@2(peon).paxosservice(mgr 1..4)    discarding message from disconnected client client.4104 172.21.15.24:0/981164730 mgrbeacon mgr.x(a33ec6ea-dfe0-400c-93d5-8d56f813c9c0,4104, 172.21.15.24:6800/195140, 1) v3 

 2017-07-13 18:34:01.942682 7f1681631700    4 mon.g@2(leader).mgr e4 Dropping active0 
 2017-07-13 18:34:01.942685 7f1681631700    4 mon.g@2(leader).mgr e4 Active is laggy but have no standbys to replace it 
 2017-07-13 18:34:01.942687 7f1681631700 10 mon.g@2(leader).mgr e4    exceeded mon_mgr_mkfs_grace 60 seconds 
 2017-07-13 18:34:01.942689 7f1681631700 10 mon.g@2(leader).paxosservice(mgr 1..4) propose_pending 

 2017-07-13 18:34:01.942757 7f1681631700    0 log_channel(cluster) log [WRN] : Health check failed: no active mgr (MGR_DOWN) 


 2017-07-13 18:34:22.040297 7f167ee2c700    1 -- 172.21.15.24:6790/0 <== mon.7 172.21.15.59:6792/0 304 ==== forward(mgrbeacon mgr.x(a33ec6ea-dfe0-400c-93d5-8d56f813c9c0,94138, -, 0) v3 caps allow * tid 33 con_features 1152323339925389307) v3 ==== 307+0+0 (1330286810 0 0) 0x7f1692dc8000 con 0x7f16921b2f00 
 </pre> 

 mon.f 
 <pre> 
 ====== mon.f 
 2017-07-13 18:30:35.062253 7f39cf22e700    1 -- 172.21.15.24:6789/0 <== mon.1 172.21.15.59:6789/0 711233028 ==== forward(mgrbeacon mgr.x(a33ec6ea-dfe0-400c-93d5-8d56f813c9c0,4104, 172.21.15.24:6800/195140, 1) v3 caps allow * tid 23 con_features 1152323339925389307) v3 ==== 295+0+0 (3561408359 0 0) 0x7f39e375ea80 con 0x7f39e34e5000 
 .. 
 2017-07-13 18:30:35.062315 7f39cf22e700    4 mon.f@0(leader).mgr e4 beacon from 4104 
 .. 
 2017-07-13 18:33:11.095775 7f39cf22e700 10 mon.f@0(leader).paxosservice(mgr 1..4) dispatch 0x7f39e3dc3e00 mgrbeacon mgr.x(a33ec6ea-dfe0-400c-93d5-8d56f813c9c0,4104, 172.21.15.24:6800/195140, 1) v3 from client.4104 172.21.15.24:0/981164730 con 0x7f39e3711000 
 .. 
 2017-07-13 18:33:11.542530 7f39cb226700 10 _calc_signature seq 1973526481 front_crc_ = 2008202662 middle_crc = 0 data_crc = 0 sig = 412644305609923577 
 ------ restarted. 
 2017-07-13 18:35:03.596725 7ff828b2be40    0 ceph version 12.1.0-956-g5f5ec76 (5f5ec7631ad0dfc4e378730e75572c0a1a065661) luminous (rc), process (unknown), pid 197457 
 </pre> 

 I think the mgr somehow failed to connect to a monitor after being disconnected. That is why the mon considered it laggy and dropped it. 
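 A rough way to check the timing is to diff timestamps taken from the excerpts above. The sketch below is only an illustration: the helper names `log_time` and `gap_seconds` are made up here, the log lines are shortened copies of the ones quoted above, and it assumes the clocks are comparable because mgr.x, mon.f and mon.g all log from 172.21.15.24.

```python
from datetime import datetime

# Timestamp layout used at the start of every log line in the excerpts.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def log_time(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.ffffff' stamp (26 characters)."""
    return datetime.strptime(line.strip()[:26], FMT)


def gap_seconds(earlier, later):
    """Seconds elapsed between two log lines."""
    return (log_time(later) - log_time(earlier)).total_seconds()


# Shortened copies of lines from the mgr.x, mon.f and mon.g excerpts.
mgr_beacon_1 = "2017-07-13 18:33:19.097154 7f7539f55700 1 mgr send_beacon active"
mgr_beacon_2 = "2017-07-13 18:33:21.097593 7f7539f55700 1 mgr send_beacon active"
mon_f_dispatch = "2017-07-13 18:33:11.095775 7f39cf22e700 10 mon.f@0(leader) dispatch mgrbeacon"
mon_g_drop = "2017-07-13 18:34:01.942682 7f1681631700 4 mon.g@2(leader).mgr e4 Dropping active0"

# Interval between two consecutive beacons sent by the mgr.
print("mgr beacon interval:", gap_seconds(mgr_beacon_1, mgr_beacon_2))
# Time from the last beacon dispatch shown for mon.f to mon.g dropping the mgr.
print("dispatch to drop:", gap_seconds(mon_f_dispatch, mon_g_drop))
```

 This only measures the gaps between the quoted lines. The excerpts are elided, so it does not prove that no beacon reached a monitor in between.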

 <pre> 
         ms inject delay max: 1 
         ms inject delay probability: 0.005 
         ms inject delay type: mon 
         ms inject internal delays: 0.002 
         ms inject socket failures: 2500 
 </pre> 

 It seems that the ms inject settings above delayed the transmission of messages in a destructive way. 

 /a/kchai-2017-07-13_18:13:10-rados-wip-kefu-testing-distro-basic-smithi/1396188
