Actions
Subtask #2633
closedFeature #2611: mon: Single-Paxos
mon: Single-Paxos: ceph tool unable to connect to monitor
% Done:
0%
Source:
Development
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:
Description
This is what usually happens on the monitor side. Every now and then, the ceph tool is able to connect, but we haven't been able to pinpoint why.
2012-06-21 10:04:05.835524 7fffeeffd700 1 -- 127.0.0.1:6791/0 <== client.? 127.0.0.1:0/11486 1 ==== auth(proto 0 30 bytes epoch 0) v1 ==== 60+0+0 (673663173 0 0) 0x7fffd0002f00 con 0x7fffe0001c20 2012-06-21 10:04:05.835554 7fffeeffd700 10 mon.c@2(peon) e1 do not have session, making new one 2012-06-21 10:04:05.835562 7fffeeffd700 10 mon.c@2(peon) e1 ms_dispatch new session MonSession: client.? 127.0.0.1:0/11486 is open for client.? 127.0.0.1:0/11486 2012-06-21 10:04:05.835573 7fffeeffd700 10 mon.c@2(peon) e1 setting timeout on session 2012-06-21 10:04:05.835576 7fffeeffd700 10 mon.c@2(peon).paxosservice(auth) dispatch auth(proto 0 30 bytes epoch 0) v1 from client.? 127.0.0.1:0/11486 2012-06-21 10:04:05.835617 7fffeeffd700 10 mon.c@2(peon).auth v2 update_from_paxos 2012-06-21 10:04:05.835641 7fffeeffd700 10 mon.c@2(peon).auth v2 preprocess_query auth(proto 0 30 bytes epoch 0) v1 from client.? 127.0.0.1:0/11486 2012-06-21 10:04:05.835658 7fffeeffd700 10 mon.c@2(peon).auth v2 prep_auth() blob_size=30 2012-06-21 10:04:05.835678 7fffeeffd700 10 mon.c@2(peon).auth v2 AuthMonitor::assign_global_id m=auth(proto 0 30 bytes epoch 0) v1 mon=2/3 last_allocated=4096 max_global_id=4096 2012-06-21 10:04:05.835693 7fffeeffd700 10 mon.c@2(peon).auth v2 not the leader, requesting more ids from leader 2012-06-21 10:04:05.835710 7fffeeffd700 1 -- 127.0.0.1:6791/0 --> mon.0 127.0.0.1:6789/0 -- global_id (4096) v1 -- ?+0 0x7fffd802db50 2012-06-21 10:04:07.720672 7fffeeffd700 1 -- 127.0.0.1:6791/0 <== mon.0 127.0.0.1:6789/0 46 ==== paxos(lease lc 10 fc 1 pn 0 opn 0) v2 ==== 80+0+0 (1813914843 0 0) 0x7fffe4001b30 con 0x7fffe0000b30 2012-06-21 10:04:07.720703 7fffeeffd700 10 mon.c@2(peon).paxos(paxos active c 1..10) handle_lease on 10 now 2012-06-21 10:04:12.720304 2012-06-21 10:04:07.720713 7fffeeffd700 1 -- 127.0.0.1:6791/0 --> mon.0 127.0.0.1:6789/0 -- paxos(lease_ack lc 10 fc 1 pn 0 opn 0) v2 -- ?+0 0x7fffd802dd70 2012-06-21 10:04:07.720740 7fffeeffd700 10 mon.c@2(peon).paxos(paxos active c 1..10) trim_to 1 (was 1), latest_stashed 7885630734175789420 2012-06-21 10:04:07.720745 7fffeeffd700 1 mon.c@2(peon).paxos(paxos active c 1..10) is_readable now=2012-06-21 10:04:07.720745 lease_expire=2012-06-21 10:04:12.720304 has v0 lc 10 2012-06-21 10:04:07.720825 7fffeeffd700 10 mon.c@2(peon).auth v2 update_from_paxos 2012-06-21 10:04:08.644195 7fffeeffd700 10 mon.c@2(peon) e1 ms_handle_reset 0x7fffe0001c20 127.0.0.1:0/11486 2012-06-21 10:04:08.645232 7fffeeffd700 10 mon.c@2(peon) e1 reset/close on session client.? 127.0.0.1:0/11486 2012-06-21 10:04:08.645249 7fffeeffd700 10 mon.c@2(peon) e1 remove_session MonSession: client.? 127.0.0.1:0/11486 is open client.? 127.0.0.1:0/11486
This is what is seen on the client:
2012-06-21 10:03:50.831570 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=0 cs=0 l=0).connect read peer addr 127.0.0.1:6791/0 on socket 4 2012-06-21 10:03:50.831588 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=0 cs=0 l=0).connect peer addr for me is 127.0.0.1:45583/0 2012-06-21 10:03:50.831643 7f8df3fff700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=0 cs=0 l=0).connect sent my addr 127.0.0.1:0/11486 2012-06-21 10:03:50.831654 7f8df3fff700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=0 cs=0 l=0).connect sending gseq=24 cseq=0 proto=15 2012-06-21 10:03:50.831684 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=0 cs=0 l=0).connect wrote (self +) cseq, waiting for reply 2012-06-21 10:03:50.831827 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=0 cs=0 l=0).connect got reply tag 1 connect_seq 1 global_seq 31 proto 15 flags 1 2012-06-21 10:03:50.831854 7f8df3fff700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).connect success 1, lossy = 1, features 65535 2012-06-21 10:03:50.831871 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> :/0 pipe(0x19817c0 sd=-1 pgs=0 cs=0 l=0).queue_received queuing pipe 2012-06-21 10:03:50.831891 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).connect starting reader 2012-06-21 10:03:50.831899 7f8dfa073700 20 -- 127.0.0.1:0/11486 dispatch_entry pipe 0x19817c0 dequeued 0 2012-06-21 10:03:50.831940 7f8df3fff700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).writer: state = 2 policy.server=0 2012-06-21 10:03:50.831968 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).writer encoding 1 0x7f8de00088c0 auth(proto 0 30 bytes epoch 0) v1 2012-06-21 10:03:50.832000 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).writer sending 1 0x7f8de00088c0 2012-06-21 10:03:50.832014 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).write_message 0x7f8de00088c0 2012-06-21 10:03:50.832066 7f8df3fff700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).writer: state = 2 policy.server=0 2012-06-21 10:03:50.832085 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).writer sleeping 2012-06-21 10:03:50.832109 7f8dfd9c1700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).reader reading tag... 2012-06-21 10:03:51.050347 7f8dfd9c1700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).reader got ACK 2012-06-21 10:03:51.050384 7f8dfd9c1700 15 reader got ack seq 1 2012-06-21 10:03:51.050387 7f8dfd9c1700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).reader reading tag... 2012-06-21 10:03:53.829890 7f8df9872700 1 -- 127.0.0.1:0/11486 mark_down 0x7f8de0008590 -- 0x7f8de0008320 2012-06-21 10:03:53.829911 7f8df9872700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).unregister_pipe 2012-06-21 10:03:53.829924 7f8df9872700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).stop 2012-06-21 10:03:53.829971 7f8df9872700 10 -- 127.0.0.1:0/11486 connect_rank to 127.0.0.1:6790/0, creating pipe and registering 2012-06-21 10:03:53.830010 7f8df9872700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6790/0 pipe(0x7f8de00068d0 sd=-1 pgs=0 cs=0 l=0).register_pipe 2012-06-21 10:03:53.830021 7f8df3efe700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6790/0 pipe(0x7f8de00068d0 sd=-1 pgs=0 cs=0 l=0).writer: state = 1 policy.server=0 2012-06-21 10:03:53.830032 7f8df9872700 1 -- 127.0.0.1:0/11486 --> 127.0.0.1:6790/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8de0007330 con 0x7f8de0007000 2012-06-21 10:03:53.830038 7f8df3efe700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6790/0 pipe(0x7f8de00068d0 sd=-1 pgs=0 cs=0 l=0).connect 0 2012-06-21 10:03:53.830028 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).writer finishing 2012-06-21 10:03:53.830062 7f8df3fff700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).writer done 2012-06-21 10:03:53.830067 7f8df9872700 20 -- 127.0.0.1:0/11486 submit_message auth(proto 0 30 bytes epoch 0) v1 remote 127.0.0.1:6790/0 2012-06-21 10:03:53.830076 7f8df3efe700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6790/0 pipe(0x7f8de00068d0 sd=3 pgs=0 cs=0 l=0).connecting to 127.0.0.1:6790/0 2012-06-21 10:03:53.830071 7f8dfd9c1700 2 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).reader couldn't read tag, Success 2012-06-21 10:03:53.830097 7f8dfd9c1700 2 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).fault 0: Success 2012-06-21 10:03:53.830144 7f8df3efe700 2 -- 127.0.0.1:0/11486 >> 127.0.0.1:6790/0 pipe(0x7f8de00068d0 sd=3 pgs=0 cs=0 l=0).connect error 127.0.0.1:6790/0, 111: Connection refused 2012-06-21 10:03:53.830180 7f8df3efe700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6790/0 pipe(0x7f8de00068d0 sd=3 pgs=0 cs=0 l=0).writer: state = 1 policy.server=0 2012-06-21 10:03:53.830188 7f8df3efe700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6790/0 pipe(0x7f8de00068d0 sd=3 pgs=0 cs=0 l=0).connect 0 2012-06-21 10:03:53.830219 7f8df3efe700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6790/0 pipe(0x7f8de00068d0 sd=3 pgs=0 cs=0 l=0).connecting to 127.0.0.1:6790/0 2012-06-21 10:03:53.830109 7f8dfd9c1700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).fault already closed|closing 2012-06-21 10:03:53.830322 7f8dfd9c1700 10 -- 127.0.0.1:0/11486 queue_reap 0x7f8de0008320 2012-06-21 10:03:53.830294 7f8df3efe700 2 -- 127.0.0.1:0/11486 >> 127.0.0.1:6790/0 pipe(0x7f8de00068d0 sd=3 pgs=0 cs=0 l=0).connect error 127.0.0.1:6790/0, 111: Connection refused 2012-06-21 10:03:53.830341 7f8df3efe700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6790/0 pipe(0x7f8de00068d0 sd=3 pgs=0 cs=0 l=0).fault waiting 0.200000 2012-06-21 10:03:53.830357 7f8dfa874700 10 -- 127.0.0.1:0/11486 reaper 2012-06-21 10:03:53.830366 7f8dfa874700 10 -- 127.0.0.1:0/11486 reaper reaping pipe 0x7f8de0008320 127.0.0.1:6791/0 2012-06-21 10:03:53.830375 7f8dfa874700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).discard_queue 2012-06-21 10:03:53.830387 7f8dfa874700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1). dequeued pipe 2012-06-21 10:03:53.830399 7f8dfa874700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).unregister_pipe - not registered 2012-06-21 10:03:53.830418 7f8dfd9c1700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).reader done 2012-06-21 10:03:53.830476 7f8dfa874700 10 -- 127.0.0.1:0/11486 reaper reaped pipe 0x7f8de0008320 127.0.0.1:6791/0 2012-06-21 10:03:53.830490 7f8dfa874700 10 -- 127.0.0.1:0/11486 reaper deleted pipe 0x7f8de0008320 2012-06-21 10:03:53.830495 7f8dfa874700 10 -- 127.0.0.1:0/11486 reaper done
Actions