Subtask #2633
Feature #2611: mon: Single-Paxos
mon: Single-Paxos: ceph tool unable to connect to monitor
Description
This is what usually happens on the monitor side. Every now and then, the ceph tool is able to connect, but we haven't been able to pinpoint why.
2012-06-21 10:04:05.835524 7fffeeffd700 1 -- 127.0.0.1:6791/0 <== client.? 127.0.0.1:0/11486 1 ==== auth(proto 0 30 bytes epoch 0) v1 ==== 60+0+0 (673663173 0 0) 0x7fffd0002f00 con 0x7fffe0001c20
2012-06-21 10:04:05.835554 7fffeeffd700 10 mon.c@2(peon) e1 do not have session, making new one
2012-06-21 10:04:05.835562 7fffeeffd700 10 mon.c@2(peon) e1 ms_dispatch new session MonSession: client.? 127.0.0.1:0/11486 is open for client.? 127.0.0.1:0/11486
2012-06-21 10:04:05.835573 7fffeeffd700 10 mon.c@2(peon) e1 setting timeout on session
2012-06-21 10:04:05.835576 7fffeeffd700 10 mon.c@2(peon).paxosservice(auth) dispatch auth(proto 0 30 bytes epoch 0) v1 from client.? 127.0.0.1:0/11486
2012-06-21 10:04:05.835617 7fffeeffd700 10 mon.c@2(peon).auth v2 update_from_paxos
2012-06-21 10:04:05.835641 7fffeeffd700 10 mon.c@2(peon).auth v2 preprocess_query auth(proto 0 30 bytes epoch 0) v1 from client.? 127.0.0.1:0/11486
2012-06-21 10:04:05.835658 7fffeeffd700 10 mon.c@2(peon).auth v2 prep_auth() blob_size=30
2012-06-21 10:04:05.835678 7fffeeffd700 10 mon.c@2(peon).auth v2 AuthMonitor::assign_global_id m=auth(proto 0 30 bytes epoch 0) v1 mon=2/3 last_allocated=4096 max_global_id=4096
2012-06-21 10:04:05.835693 7fffeeffd700 10 mon.c@2(peon).auth v2 not the leader, requesting more ids from leader
2012-06-21 10:04:05.835710 7fffeeffd700 1 -- 127.0.0.1:6791/0 --> mon.0 127.0.0.1:6789/0 -- global_id (4096) v1 -- ?+0 0x7fffd802db50
2012-06-21 10:04:07.720672 7fffeeffd700 1 -- 127.0.0.1:6791/0 <== mon.0 127.0.0.1:6789/0 46 ==== paxos(lease lc 10 fc 1 pn 0 opn 0) v2 ==== 80+0+0 (1813914843 0 0) 0x7fffe4001b30 con 0x7fffe0000b30
2012-06-21 10:04:07.720703 7fffeeffd700 10 mon.c@2(peon).paxos(paxos active c 1..10) handle_lease on 10 now 2012-06-21 10:04:12.720304
2012-06-21 10:04:07.720713 7fffeeffd700 1 -- 127.0.0.1:6791/0 --> mon.0 127.0.0.1:6789/0 -- paxos(lease_ack lc 10 fc 1 pn 0 opn 0) v2 -- ?+0 0x7fffd802dd70
2012-06-21 10:04:07.720740 7fffeeffd700 10 mon.c@2(peon).paxos(paxos active c 1..10) trim_to 1 (was 1), latest_stashed 7885630734175789420
2012-06-21 10:04:07.720745 7fffeeffd700 1 mon.c@2(peon).paxos(paxos active c 1..10) is_readable now=2012-06-21 10:04:07.720745 lease_expire=2012-06-21 10:04:12.720304 has v0 lc 10
2012-06-21 10:04:07.720825 7fffeeffd700 10 mon.c@2(peon).auth v2 update_from_paxos
2012-06-21 10:04:08.644195 7fffeeffd700 10 mon.c@2(peon) e1 ms_handle_reset 0x7fffe0001c20 127.0.0.1:0/11486
2012-06-21 10:04:08.645232 7fffeeffd700 10 mon.c@2(peon) e1 reset/close on session client.? 127.0.0.1:0/11486
2012-06-21 10:04:08.645249 7fffeeffd700 10 mon.c@2(peon) e1 remove_session MonSession: client.? 127.0.0.1:0/11486 is open client.? 127.0.0.1:0/11486
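The interesting part of this log is the global_id allocation: the peon has exhausted its pre-allocated range (last_allocated=4096, max_global_id=4096), so it forwards a request for more ids to the leader (mon.0) and leaves the client's auth request pending. In the excerpt above no answer to that global_id request appears before the session is reset about three seconds later. A minimal, self-contained C++ sketch of that decision point follows; the names and the range increment are made up for illustration and this is not the actual AuthMonitor code.

// Sketch (not the real AuthMonitor code) of the behavior visible above: a
// peon whose pre-allocated range is exhausted cannot assign a new global_id
// itself; it asks the leader for a bigger range and the client's auth
// request stays pending until the leader answers.
#include <cstdint>
#include <iostream>
#include <optional>

struct PeonAuthSketch {
  uint64_t last_allocated = 4096;  // values taken from the log above
  uint64_t max_global_id  = 4096;
  bool     is_leader      = false;

  std::optional<uint64_t> assign_global_id() {
    if (last_allocated >= max_global_id) {
      if (!is_leader) {
        // log: "not the leader, requesting more ids from leader"
        // log: "--> mon.0 127.0.0.1:6789/0 -- global_id (4096) v1"
        std::cout << "--> mon.0 global_id(" << max_global_id << ")\n";
        return std::nullopt;  // caller must wait for the leader's reply
      }
      max_global_id += 100;   // hypothetical increment, illustration only
    }
    return ++last_allocated;
  }
};

int main() {
  PeonAuthSketch peon;
  auto id = peon.assign_global_id();
  std::cout << (id ? "assigned new global_id" : "pending: waiting on leader") << "\n";
  return 0;
}

If the leader never replies (or the reply is lost), the client's auth request is never answered, which matches the session reset seen at 10:04:08 above and the client-side timeout below.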
This is what is seen on the client:
2012-06-21 10:03:50.831570 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=0 cs=0 l=0).connect read peer addr 127.0.0.1:6791/0 on socket 4
2012-06-21 10:03:50.831588 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=0 cs=0 l=0).connect peer addr for me is 127.0.0.1:45583/0
2012-06-21 10:03:50.831643 7f8df3fff700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=0 cs=0 l=0).connect sent my addr 127.0.0.1:0/11486
2012-06-21 10:03:50.831654 7f8df3fff700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=0 cs=0 l=0).connect sending gseq=24 cseq=0 proto=15
2012-06-21 10:03:50.831684 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=0 cs=0 l=0).connect wrote (self +) cseq, waiting for reply
2012-06-21 10:03:50.831827 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=0 cs=0 l=0).connect got reply tag 1 connect_seq 1 global_seq 31 proto 15 flags 1
2012-06-21 10:03:50.831854 7f8df3fff700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).connect success 1, lossy = 1, features 65535
2012-06-21 10:03:50.831871 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> :/0 pipe(0x19817c0 sd=-1 pgs=0 cs=0 l=0).queue_received queuing pipe
2012-06-21 10:03:50.831891 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).connect starting reader
2012-06-21 10:03:50.831899 7f8dfa073700 20 -- 127.0.0.1:0/11486 dispatch_entry pipe 0x19817c0 dequeued 0
2012-06-21 10:03:50.831940 7f8df3fff700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).writer: state = 2 policy.server=0
2012-06-21 10:03:50.831968 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).writer encoding 1 0x7f8de00088c0 auth(proto 0 30 bytes epoch 0) v1
2012-06-21 10:03:50.832000 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).writer sending 1 0x7f8de00088c0
2012-06-21 10:03:50.832014 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).write_message 0x7f8de00088c0
2012-06-21 10:03:50.832066 7f8df3fff700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).writer: state = 2 policy.server=0
2012-06-21 10:03:50.832085 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).writer sleeping
2012-06-21 10:03:50.832109 7f8dfd9c1700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).reader reading tag...
2012-06-21 10:03:51.050347 7f8dfd9c1700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).reader got ACK
2012-06-21 10:03:51.050384 7f8dfd9c1700 15 reader got ack seq 1
2012-06-21 10:03:51.050387 7f8dfd9c1700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).reader reading tag...
2012-06-21 10:03:53.829890 7f8df9872700 1 -- 127.0.0.1:0/11486 mark_down 0x7f8de0008590 -- 0x7f8de0008320
2012-06-21 10:03:53.829911 7f8df9872700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).unregister_pipe
2012-06-21 10:03:53.829924 7f8df9872700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).stop
2012-06-21 10:03:53.829971 7f8df9872700 10 -- 127.0.0.1:0/11486 connect_rank to 127.0.0.1:6790/0, creating pipe and registering
2012-06-21 10:03:53.830010 7f8df9872700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6790/0 pipe(0x7f8de00068d0 sd=-1 pgs=0 cs=0 l=0).register_pipe
2012-06-21 10:03:53.830021 7f8df3efe700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6790/0 pipe(0x7f8de00068d0 sd=-1 pgs=0 cs=0 l=0).writer: state = 1 policy.server=0
2012-06-21 10:03:53.830032 7f8df9872700 1 -- 127.0.0.1:0/11486 --> 127.0.0.1:6790/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8de0007330 con 0x7f8de0007000
2012-06-21 10:03:53.830038 7f8df3efe700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6790/0 pipe(0x7f8de00068d0 sd=-1 pgs=0 cs=0 l=0).connect 0
2012-06-21 10:03:53.830028 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).writer finishing
2012-06-21 10:03:53.830062 7f8df3fff700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).writer done
2012-06-21 10:03:53.830067 7f8df9872700 20 -- 127.0.0.1:0/11486 submit_message auth(proto 0 30 bytes epoch 0) v1 remote 127.0.0.1:6790/0
2012-06-21 10:03:53.830076 7f8df3efe700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6790/0 pipe(0x7f8de00068d0 sd=3 pgs=0 cs=0 l=0).connecting to 127.0.0.1:6790/0
2012-06-21 10:03:53.830071 7f8dfd9c1700 2 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).reader couldn't read tag, Success
2012-06-21 10:03:53.830097 7f8dfd9c1700 2 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).fault 0: Success
2012-06-21 10:03:53.830144 7f8df3efe700 2 -- 127.0.0.1:0/11486 >> 127.0.0.1:6790/0 pipe(0x7f8de00068d0 sd=3 pgs=0 cs=0 l=0).connect error 127.0.0.1:6790/0, 111: Connection refused
2012-06-21 10:03:53.830180 7f8df3efe700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6790/0 pipe(0x7f8de00068d0 sd=3 pgs=0 cs=0 l=0).writer: state = 1 policy.server=0
2012-06-21 10:03:53.830188 7f8df3efe700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6790/0 pipe(0x7f8de00068d0 sd=3 pgs=0 cs=0 l=0).connect 0
2012-06-21 10:03:53.830219 7f8df3efe700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6790/0 pipe(0x7f8de00068d0 sd=3 pgs=0 cs=0 l=0).connecting to 127.0.0.1:6790/0
2012-06-21 10:03:53.830109 7f8dfd9c1700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).fault already closed|closing
2012-06-21 10:03:53.830322 7f8dfd9c1700 10 -- 127.0.0.1:0/11486 queue_reap 0x7f8de0008320
2012-06-21 10:03:53.830294 7f8df3efe700 2 -- 127.0.0.1:0/11486 >> 127.0.0.1:6790/0 pipe(0x7f8de00068d0 sd=3 pgs=0 cs=0 l=0).connect error 127.0.0.1:6790/0, 111: Connection refused
2012-06-21 10:03:53.830341 7f8df3efe700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6790/0 pipe(0x7f8de00068d0 sd=3 pgs=0 cs=0 l=0).fault waiting 0.200000
2012-06-21 10:03:53.830357 7f8dfa874700 10 -- 127.0.0.1:0/11486 reaper
2012-06-21 10:03:53.830366 7f8dfa874700 10 -- 127.0.0.1:0/11486 reaper reaping pipe 0x7f8de0008320 127.0.0.1:6791/0
2012-06-21 10:03:53.830375 7f8dfa874700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).discard_queue
2012-06-21 10:03:53.830387 7f8dfa874700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1). dequeued pipe
2012-06-21 10:03:53.830399 7f8dfa874700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).unregister_pipe - not registered
2012-06-21 10:03:53.830418 7f8dfd9c1700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).reader done
2012-06-21 10:03:53.830476 7f8dfa874700 10 -- 127.0.0.1:0/11486 reaper reaped pipe 0x7f8de0008320 127.0.0.1:6791/0
2012-06-21 10:03:53.830490 7f8dfa874700 10 -- 127.0.0.1:0/11486 reaper deleted pipe 0x7f8de0008320
2012-06-21 10:03:53.830495 7f8dfa874700 10 -- 127.0.0.1:0/11486 reaper done
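Putting the two logs together: the client sends its auth request at 10:03:50, waits roughly three seconds without a reply, then marks the pipe down and hunts for another monitor (127.0.0.1:6790, which refuses the connection). A rough, hypothetical sketch of that sequence follows; the function names, the hunting order, and the timeout value are only what the log suggests, not the real MonClient API.

// Hypothetical sketch of the client-side sequence seen above: send the
// initial auth request to one monitor, wait for a reply with a timeout, and
// on timeout mark the connection down and try the next monitor address.
#include <chrono>
#include <iostream>
#include <string>
#include <vector>

// Simulates "send auth, wait for reply"; in the failing case no reply arrives.
bool send_auth_and_wait(const std::string& mon_addr, std::chrono::seconds timeout) {
  std::cout << "--> " << mon_addr << " auth(proto 0 ...) (timeout "
            << timeout.count() << "s)\n";
  return false;  // mimic the log: the reply never comes
}

int main() {
  // Addresses taken from the log; the hunting order here is illustrative.
  const std::vector<std::string> mons = {
      "127.0.0.1:6791", "127.0.0.1:6790", "127.0.0.1:6789"};
  for (const auto& mon : mons) {
    if (send_auth_and_wait(mon, std::chrono::seconds(3))) {
      std::cout << "authenticated via " << mon << "\n";
      return 0;
    }
    std::cout << "mark_down " << mon << ", trying next monitor\n";
  }
  std::cout << "unable to connect to any monitor\n";
  return 1;
}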
History
#1 Updated by Joao Eduardo Luis over 11 years ago
- Status changed from New to Closed
It appears to be fixed.
The ceph tool is able to obtain the status from the monitors.
The 'watch' command doesn't seem to receive the subscription updates (or maybe it's the monitor that isn't sending them; maybe there simply are none). The ceph tool also seems unable to run commands on the monitor. But those, once confirmed to actually be happening, should be tracked as separate bugs.
I'm closing this one for now, until evidence of it happening again pops up.
#2 Updated by Greg Farnum over 11 years ago
Has something changed in the last five hours that you think fixed this?