The MDS went into reconnect at 4:59:50
2014-09-17 04:59:50.571702 7f80658ad700 1 mds.0.107 handle_mds_map state change up:replay --> up:reconnect
2014-09-17 04:59:50.571705 7f80658ad700 1 mds.0.107 reconnect_start
2014-09-17 04:59:50.571706 7f80658ad700 1 mds.0.107 reopen_log
2014-09-17 04:59:50.571711 7f80658ad700 10 mds.0.cache rollback_uncommitted_fragments: 0 pending
2014-09-17 04:59:50.572274 7f80658ad700 1 mds.0.server reconnect_clients -- 4 sessions
2014-09-17 04:59:50.572289 7f80658ad700 10 mds.0.sessionmap dump
2014-09-17 04:59:50.572291 7f80658ad700 10 mds.0.sessionmap client.1633185 0x7f8048004710 state open completed {122544=0} prealloc_inos [10002c55db7~226,10002c55fe7~2,10002c5601e~1c,10002c5612b~1,10002c58260~2,10002c61672~3,10002c61676~4,10002c6167b~4,10002c61680~6,10002c61687~f,10003069331~2,1000306933b~21,1000306a5
70~2,1000306a58a~2d,1000306a5bb~10,1000306a5cc~1,1000306b6df~36,1000306b716~6,1000306b71d~a,1000306b7a3~2,1000306b7bb~a,100030e9f84~3,100030e9f8f~5,100030eeb98~a,100030efd1e~6,100030efe07~c,100030efe57~1f,100030efe77~2,100030efe7f~1f,100030efe9f~1,100030efea7~1,100030efed2~4,100030efed7~6,100030f03a3~3,100030f03ab~2,
100030f03b1~2,100030f03b4~4,100030f03bf~2,100030f0767~9,100030f07ba~1,100030f07c3~1,100030f0845~3,100030f0849~2,100030f0950~5,100030f0db8~5,100030f0f2e~3,100030f0f3c~2,100030f0f58~1,100030f0f5a~4,100030f1095~2,100030f109b~2] used_ions []
2014-09-17 04:59:50.572331 7f80658ad700 10 mds.0.sessionmap client.1756771 0x7f80480074b0 state open completed {20420926=0,20420927=0,20420928=0,20420929=0,20420930=0,20420931=0,20420932=0,20420933=0} prealloc_inos [10002d05cec~32e,10002d6a6ce~2,10002d6a6d1~a,10002dddd1b~d,10002ec1ea8~c,10002f34df6~1,10002f34e17~1,10
002f34e27~1,10002f34e2a~1,10002f34e3a~1,10002f34e40~1,10002f34e43~1,10002f34e46~1,10002f34e49~1,10002f34e4c~1,10002f34e4f~1,10002f34e52~1,10002fa9a4f~c,1000301cade~d,1000307ec9b~1,1000307ecb2~3,1000307ecb6~4,1000307ecbb~4,10003148c55~1,10003148c5a~1,10003148c5c~1,10003148c62~1,10003148c81~9] used_ions []
2014-09-17 04:59:50.572354 7f80658ad700 10 mds.0.sessionmap client.2651662 0x7f8048008020 state open completed {1915007=0,1915009=0,1915011=0,1915013=0,1915015=0,1915017=0,1915019=0,1915021=0,1915023=0,1915025=0,1915027=0,1915029=0,1915031=0,1915033=0,1915035=0,1915037=0,1915039=0,1915041=0,1915043=0,1915045=0,191504
7=0,1915049=0,1915051=0,1915053=0,1915055=0,1915057=0,1915059=0,1915061=0,1915063=0,1915065=0,1915067=0,1915069=0,1915071=0,1915073=0,1915075=0,1915077=0,1915079=0,1915081=0,1915083=0,1915085=0,1915087=0,1915089=0,1915091=0,1915093=0,1915095=0,1915097=0,1915099=0,1915101=0,1915103=0,1915105=0,1915107=0,1915109=0,1915
111=0,1915113=0,1915115=0,1915117=0,1915119=0,1915121=0,1915123=0,1915125=0,1915127=0,1915129=0,1915131=0,1915133=0,1915135=0,1915137=0,1915139=0,1915141=0,1915143=0,1915145=0,1915147=0,1915149=0,1915151=0,1915153=0,1915155=0,1915157=0,1915159=0,1915161=0,1915163=0,1915165=0,1915167=0,1915169=0,1915171=0,1915173=0,19
15175=0,1915177=0,1915179=0,1915181=0,1915183=0,1915185=0,1915187=0,1915189=0,1915191=0,1915193=0,1915195=0,1915197=0,1915199=0,1915201=0,1915203=0,1915205=0,1915207=0,1915209=0,1915211=0,1915213=0,1915215=0,1915217=0} prealloc_inos [10003453e6a~3dd] used_ions []
2014-09-17 04:59:50.572408 7f80658ad700 10 mds.0.sessionmap client.1756816 0x7f80480084d0 state open completed {} prealloc_inos [] used_ions []
It got reconnect messages from two clients (backup01, then gitbuilder-archive), and appears to have processed them correctly
2014-09-17 04:59:50.576813 7f80658ad700 1 -- 10.214.134.10:6801/29320 <== client.1756816 10.214.137.27:0/2508210603 1 ==== client_reconnect(1 caps) v2 ==== 0+0+84 (0 0 41256884) 0x7f7fb00016e0 con 0x7f7fa80011f0
...
2014-09-17 04:59:50.630188 7f80658ad700 1 -- 10.214.134.10:6801/29320 <== client.1756771 10.214.137.25:0/1841820156 1 ==== client_reconnect(2081 caps) v2 ==== 0+0+321792 (0 0 776300702) 0x7f7fb8006a60 con 0x7f7fa8001760
But notice that the backup01 machine only claimed one cap -- perhaps it really wasn't using anything.
Also, flab (I'm assuming; that's what has the IP now) sent a bad message in between the two successful ones:
2014-09-17 04:59:50.602241 7f804d3e1700 -1 failed to decode message of type 24 v2: buffer::end_of_buffer
2014-09-17 04:59:50.602283 7f804d3e1700 0 -- 10.214.134.10:6801/29320 >> 10.214.131.102:0/1174985195 pipe(0x7f7fa8000980 sd=37 :6801 s=2 pgs=18231 cs=1 l=0 c=0x7f7fa8000c10).fault with nothing to send, going to standby
And that IP showed up again at least once. Similarly busted is teuthology:
2014-09-17 04:59:50.900718 7f804cbdd700 0 -- 10.214.134.10:6801/29320 >> 10.214.137.23:0/2533113401 pipe(0x7f7fa8001a40 sd=40 :6801 s=0 pgs=0 cs=0 l=0 c=0x7f7fa8001cd0).accept peer addr is really 10.214.137.23:0/2533113401 (socket is 10.214.137.23:56050/0)
2014-09-17 04:59:50.901068 7f804cbdd700 10 mds.0.107 existing session 0x7f8048008020 for client.2651662 10.214.137.23:0/2533113401 existing con 0, new/authorizing con 0x7f7fa8001cd0
2014-09-17 04:59:50.901119 7f80658ad700 10 mds.0.107 ms_handle_accept 10.214.137.23:0/2533113401 con 0x7f7fa8001cd0 session 0x7f8048008020
2014-09-17 04:59:50.901134 7f80658ad700 10 mds.0.107 session connection 0 -> 0x7f7fa8001cd0
2014-09-17 04:59:50.901375 7f804cbdd700 -1 failed to decode message of type 24 v2: buffer::end_of_buffer
The teuthology machine then sent bunches of cap messages, but never a reconnect.