
Subtask #2633

Feature #2611: mon: Single-Paxos

mon: Single-Paxos: ceph tool unable to connect to monitor

Added by Joao Eduardo Luis almost 12 years ago. Updated about 11 years ago.

Status: Closed
Priority: Normal
Assignee: Joao Eduardo Luis
Category: Monitor
% Done: 0%
Source: Development

Description

This is what usually happens on the monitor side. Every now and then, the ceph tool is able to connect, but we haven't been able to pinpoint why.

2012-06-21 10:04:05.835524 7fffeeffd700  1 -- 127.0.0.1:6791/0 <== client.? 127.0.0.1:0/11486 1 ==== auth(proto 0 30 bytes epoch 0) v1 ==== 60+0+0 (673663173 0 0) 0x7fffd0002f00 con 0x7fffe0001c20
2012-06-21 10:04:05.835554 7fffeeffd700 10 mon.c@2(peon) e1 do not have session, making new one
2012-06-21 10:04:05.835562 7fffeeffd700 10 mon.c@2(peon) e1 ms_dispatch new session MonSession: client.? 127.0.0.1:0/11486 is open for client.? 127.0.0.1:0/11486
2012-06-21 10:04:05.835573 7fffeeffd700 10 mon.c@2(peon) e1 setting timeout on session
2012-06-21 10:04:05.835576 7fffeeffd700 10 mon.c@2(peon).paxosservice(auth) dispatch auth(proto 0 30 bytes epoch 0) v1 from client.? 127.0.0.1:0/11486
2012-06-21 10:04:05.835617 7fffeeffd700 10 mon.c@2(peon).auth v2 update_from_paxos
2012-06-21 10:04:05.835641 7fffeeffd700 10 mon.c@2(peon).auth v2 preprocess_query auth(proto 0 30 bytes epoch 0) v1 from client.? 127.0.0.1:0/11486
2012-06-21 10:04:05.835658 7fffeeffd700 10 mon.c@2(peon).auth v2 prep_auth() blob_size=30
2012-06-21 10:04:05.835678 7fffeeffd700 10 mon.c@2(peon).auth v2 AuthMonitor::assign_global_id m=auth(proto 0 30 bytes epoch 0) v1 mon=2/3 last_allocated=4096 max_global_id=4096
2012-06-21 10:04:05.835693 7fffeeffd700 10 mon.c@2(peon).auth v2 not the leader, requesting more ids from leader
2012-06-21 10:04:05.835710 7fffeeffd700  1 -- 127.0.0.1:6791/0 --> mon.0 127.0.0.1:6789/0 -- global_id  (4096) v1 -- ?+0 0x7fffd802db50
2012-06-21 10:04:07.720672 7fffeeffd700  1 -- 127.0.0.1:6791/0 <== mon.0 127.0.0.1:6789/0 46 ==== paxos(lease lc 10 fc 1 pn 0 opn 0) v2 ==== 80+0+0 (1813914843 0 0) 0x7fffe4001b30 con 0x7fffe0000b30
2012-06-21 10:04:07.720703 7fffeeffd700 10 mon.c@2(peon).paxos(paxos active c 1..10) handle_lease on 10 now 2012-06-21 10:04:12.720304
2012-06-21 10:04:07.720713 7fffeeffd700  1 -- 127.0.0.1:6791/0 --> mon.0 127.0.0.1:6789/0 -- paxos(lease_ack lc 10 fc 1 pn 0 opn 0) v2 -- ?+0 0x7fffd802dd70
2012-06-21 10:04:07.720740 7fffeeffd700 10 mon.c@2(peon).paxos(paxos active c 1..10) trim_to 1 (was 1), latest_stashed 7885630734175789420
2012-06-21 10:04:07.720745 7fffeeffd700  1 mon.c@2(peon).paxos(paxos active c 1..10) is_readable now=2012-06-21 10:04:07.720745 lease_expire=2012-06-21 10:04:12.720304 has v0 lc 10
2012-06-21 10:04:07.720825 7fffeeffd700 10 mon.c@2(peon).auth v2 update_from_paxos
2012-06-21 10:04:08.644195 7fffeeffd700 10 mon.c@2(peon) e1 ms_handle_reset 0x7fffe0001c20 127.0.0.1:0/11486
2012-06-21 10:04:08.645232 7fffeeffd700 10 mon.c@2(peon) e1 reset/close on session client.? 127.0.0.1:0/11486
2012-06-21 10:04:08.645249 7fffeeffd700 10 mon.c@2(peon) e1 remove_session MonSession: client.? 127.0.0.1:0/11486 is open client.? 127.0.0.1:0/11486
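
The interesting part above is the auth dispatch: the peon's id pool is exhausted (last_allocated=4096 == max_global_id=4096), so it cannot assign a global_id itself and forwards a request for more ids to the leader, leaving the client's auth request pending. Below is a minimal sketch of that decision; last_allocated and max_global_id are the values printed by AuthMonitor::assign_global_id, while the struct, method, and the +100 pool increment are illustrative only, not the actual Ceph code.

// Simplified model of the peon-side decision visible in the log above.
#include <cstdint>
#include <iostream>
#include <optional>

struct PeonAuthModel {
  uint64_t last_allocated = 4096;  // from the log line above
  uint64_t max_global_id  = 4096;  // pool exhausted: equal to last_allocated
  bool is_leader = false;          // mon.c is a peon here

  // Try to hand out the next global_id for a new client session.
  std::optional<uint64_t> assign_global_id() {
    if (last_allocated >= max_global_id) {
      if (!is_leader) {
        // A peon cannot extend the pool on its own; it asks the leader
        // for more ids and leaves the client's auth request pending
        // until a larger max_global_id arrives (or the client times out).
        std::cout << "not the leader, requesting more ids from leader\n";
        return std::nullopt;
      }
      max_global_id += 100;  // hypothetical increment; the real value is a config option
    }
    return ++last_allocated;
  }
};

int main() {
  PeonAuthModel peon;
  auto id = peon.assign_global_id();
  std::cout << (id ? "assigned new global_id" : "request deferred, waiting on leader") << "\n";
  return 0;
}

In the log, the leader's answer never makes it back before the client gives up, so the pending auth request dies with the session reset at 10:04:08.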

This is what is seen on the client:

2012-06-21 10:03:50.831570 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=0 cs=0 l=0).connect read peer addr 127.0.0.1:6791/0 on socket 4
2012-06-21 10:03:50.831588 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=0 cs=0 l=0).connect peer addr for me is 127.0.0.1:45583/0
2012-06-21 10:03:50.831643 7f8df3fff700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=0 cs=0 l=0).connect sent my addr 127.0.0.1:0/11486
2012-06-21 10:03:50.831654 7f8df3fff700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=0 cs=0 l=0).connect sending gseq=24 cseq=0 proto=15
2012-06-21 10:03:50.831684 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=0 cs=0 l=0).connect wrote (self +) cseq, waiting for reply
2012-06-21 10:03:50.831827 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=0 cs=0 l=0).connect got reply tag 1 connect_seq 1 global_seq 31 proto 15 flags 1
2012-06-21 10:03:50.831854 7f8df3fff700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).connect success 1, lossy = 1, features 65535
2012-06-21 10:03:50.831871 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> :/0 pipe(0x19817c0 sd=-1 pgs=0 cs=0 l=0).queue_received queuing pipe
2012-06-21 10:03:50.831891 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).connect starting reader
2012-06-21 10:03:50.831899 7f8dfa073700 20 -- 127.0.0.1:0/11486 dispatch_entry pipe 0x19817c0 dequeued 0
2012-06-21 10:03:50.831940 7f8df3fff700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).writer: state = 2 policy.server=0
2012-06-21 10:03:50.831968 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).writer encoding 1 0x7f8de00088c0 auth(proto 0 30 bytes epoch 0) v1
2012-06-21 10:03:50.832000 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).writer sending 1 0x7f8de00088c0
2012-06-21 10:03:50.832014 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).write_message 0x7f8de00088c0
2012-06-21 10:03:50.832066 7f8df3fff700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).writer: state = 2 policy.server=0
2012-06-21 10:03:50.832085 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).writer sleeping
2012-06-21 10:03:50.832109 7f8dfd9c1700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).reader reading tag...
2012-06-21 10:03:51.050347 7f8dfd9c1700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).reader got ACK
2012-06-21 10:03:51.050384 7f8dfd9c1700 15 reader got ack seq 1
2012-06-21 10:03:51.050387 7f8dfd9c1700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).reader reading tag...
2012-06-21 10:03:53.829890 7f8df9872700  1 -- 127.0.0.1:0/11486 mark_down 0x7f8de0008590 -- 0x7f8de0008320
2012-06-21 10:03:53.829911 7f8df9872700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).unregister_pipe
2012-06-21 10:03:53.829924 7f8df9872700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).stop
2012-06-21 10:03:53.829971 7f8df9872700 10 -- 127.0.0.1:0/11486 connect_rank to 127.0.0.1:6790/0, creating pipe and registering
2012-06-21 10:03:53.830010 7f8df9872700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6790/0 pipe(0x7f8de00068d0 sd=-1 pgs=0 cs=0 l=0).register_pipe
2012-06-21 10:03:53.830021 7f8df3efe700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6790/0 pipe(0x7f8de00068d0 sd=-1 pgs=0 cs=0 l=0).writer: state = 1 policy.server=0
2012-06-21 10:03:53.830032 7f8df9872700  1 -- 127.0.0.1:0/11486 --> 127.0.0.1:6790/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8de0007330 con 0x7f8de0007000
2012-06-21 10:03:53.830038 7f8df3efe700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6790/0 pipe(0x7f8de00068d0 sd=-1 pgs=0 cs=0 l=0).connect 0
2012-06-21 10:03:53.830028 7f8df3fff700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).writer finishing
2012-06-21 10:03:53.830062 7f8df3fff700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).writer done
2012-06-21 10:03:53.830067 7f8df9872700 20 -- 127.0.0.1:0/11486 submit_message auth(proto 0 30 bytes epoch 0) v1 remote 127.0.0.1:6790/0
2012-06-21 10:03:53.830076 7f8df3efe700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6790/0 pipe(0x7f8de00068d0 sd=3 pgs=0 cs=0 l=0).connecting to 127.0.0.1:6790/0
2012-06-21 10:03:53.830071 7f8dfd9c1700  2 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).reader couldn't read tag, Success
2012-06-21 10:03:53.830097 7f8dfd9c1700  2 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).fault 0: Success
2012-06-21 10:03:53.830144 7f8df3efe700  2 -- 127.0.0.1:0/11486 >> 127.0.0.1:6790/0 pipe(0x7f8de00068d0 sd=3 pgs=0 cs=0 l=0).connect error 127.0.0.1:6790/0, 111: Connection refused
2012-06-21 10:03:53.830180 7f8df3efe700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6790/0 pipe(0x7f8de00068d0 sd=3 pgs=0 cs=0 l=0).writer: state = 1 policy.server=0
2012-06-21 10:03:53.830188 7f8df3efe700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6790/0 pipe(0x7f8de00068d0 sd=3 pgs=0 cs=0 l=0).connect 0
2012-06-21 10:03:53.830219 7f8df3efe700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6790/0 pipe(0x7f8de00068d0 sd=3 pgs=0 cs=0 l=0).connecting to 127.0.0.1:6790/0
2012-06-21 10:03:53.830109 7f8dfd9c1700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).fault already closed|closing
2012-06-21 10:03:53.830322 7f8dfd9c1700 10 -- 127.0.0.1:0/11486 queue_reap 0x7f8de0008320
2012-06-21 10:03:53.830294 7f8df3efe700  2 -- 127.0.0.1:0/11486 >> 127.0.0.1:6790/0 pipe(0x7f8de00068d0 sd=3 pgs=0 cs=0 l=0).connect error 127.0.0.1:6790/0, 111: Connection refused
2012-06-21 10:03:53.830341 7f8df3efe700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6790/0 pipe(0x7f8de00068d0 sd=3 pgs=0 cs=0 l=0).fault waiting 0.200000
2012-06-21 10:03:53.830357 7f8dfa874700 10 -- 127.0.0.1:0/11486 reaper
2012-06-21 10:03:53.830366 7f8dfa874700 10 -- 127.0.0.1:0/11486 reaper reaping pipe 0x7f8de0008320 127.0.0.1:6791/0
2012-06-21 10:03:53.830375 7f8dfa874700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).discard_queue
2012-06-21 10:03:53.830387 7f8dfa874700 20 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1). dequeued pipe 
2012-06-21 10:03:53.830399 7f8dfa874700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).unregister_pipe - not registered
2012-06-21 10:03:53.830418 7f8dfd9c1700 10 -- 127.0.0.1:0/11486 >> 127.0.0.1:6791/0 pipe(0x7f8de0008320 sd=4 pgs=31 cs=1 l=1).reader done
2012-06-21 10:03:53.830476 7f8dfa874700 10 -- 127.0.0.1:0/11486 reaper reaped pipe 0x7f8de0008320 127.0.0.1:6791/0
2012-06-21 10:03:53.830490 7f8dfa874700 10 -- 127.0.0.1:0/11486 reaper deleted pipe 0x7f8de0008320
2012-06-21 10:03:53.830495 7f8dfa874700 10 -- 127.0.0.1:0/11486 reaper done
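
On this side the pattern is the ordinary monitor failover: the auth request to the peon at :6791 gets an ACK but never a reply, so the client calls mark_down and retries the next monitor (:6790, which refuses the connection). Below is a toy model of that loop, where try_auth() is a hypothetical stand-in for the real messenger round-trip, not a Ceph API.

// Toy model of the failover loop in the log above: send auth to one
// monitor, and if no usable reply arrives, mark the connection down
// and move on to the next monitor in the list.
#include <iostream>
#include <string>
#include <vector>

static bool try_auth(const std::string& addr) {
  // In the captured run, the peon at :6791 ACKs the message but never
  // answers, and :6790 refuses the connection outright, so every
  // attempt ends the same way.
  (void)addr;
  return false;  // simulate "no usable reply"
}

int main() {
  const std::vector<std::string> mons = {
      "127.0.0.1:6791", "127.0.0.1:6790", "127.0.0.1:6789"};
  for (const auto& addr : mons) {
    std::cout << "trying " << addr << "\n";
    if (try_auth(addr)) {
      std::cout << "authenticated via " << addr << "\n";
      return 0;
    }
    std::cout << "mark_down " << addr << ", trying next monitor\n";
  }
  std::cout << "unable to authenticate with any monitor\n";
  return 1;
}

With every monitor either silent or refusing, the tool never completes authentication, which matches the reported behavior.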

History

#1 Updated by Joao Eduardo Luis almost 12 years ago

  • Status changed from New to Closed

It appears to be fixed.

The ceph tool is able to obtain the status from the monitors.

The 'watch' command doesn't seem to receive the subscription updates (or maybe it's the monitor that doesn't send them); maybe there are none. Also, the ceph tool seems unable to run commands on the monitor. But those, once confirmed to actually be happening, should be filed as separate bugs.

I'm closing this one for now, until evidence of it happening again pops up.

#2 Updated by Greg Farnum almost 12 years ago

Has something changed in the last five hours that you think fixed this?
