Subtask #2616
closed
Feature #2611: mon: Single-Paxos
mon: Single-Paxos: AuthMonitor: key_server has no entries
Added by Joao Eduardo Luis almost 12 years ago.
Updated about 11 years ago.
Description
The Monitor's key_server has no entries, even though we made sure to populate mon.X/keyring with every single service key in existence.
Debugging is in progress. Will update this as we get more information.
The problem appears to affect all mon clients, and it may also be the reason why our OSDs do not work.
Log snippet from an MDS being brought up:
2012-06-20 08:34:21.394367 7fffeeffd700 20 mon.c@2(peon) e1 ms_dispatch existing session MonSession: mds.? 127.0.0.1:6800/17231 is open for mds.? 127.0.0.1:6800/17231
2012-06-20 08:34:21.394374 7fffeeffd700 20 mon.c@2(peon) e1 caps
2012-06-20 08:34:21.394377 7fffeeffd700 10 mon.c@2(peon).paxosservice(auth) dispatch auth(proto 2 32 bytes epoch 0) v1 from mds.? 127.0.0.1:6800/17231
2012-06-20 08:34:21.394413 7fffeeffd700 10 mon.c@2(peon).auth v2 update_from_paxos
2012-06-20 08:34:21.394434 7fffeeffd700 10 mon.c@2(peon).auth v2 preprocess_query auth(proto 2 32 bytes epoch 0) v1 from mds.? 127.0.0.1:6800/17231
2012-06-20 08:34:21.394449 7fffeeffd700 10 mon.c@2(peon).auth v2 prep_auth() blob_size=32
2012-06-20 08:34:21.394462 7fffeeffd700 10 cephx server mds.a: handle_request get_auth_session_key for mds.a
2012-06-20 08:34:21.394465 7fffeeffd700 0 cephx server mds.a: couldn't find entity name: mds.a
2012-06-20 08:34:21.394468 7fffeeffd700 1 -- 127.0.0.1:6791/0 --> 127.0.0.1:6800/17231 -- auth_reply(proto 2 -1 Operation not permitted) v1 -- ?+0 0x7fffd8086020 con 0x7fffe00023b0
We were encoding an empty "full version" of the key server during AuthMonitor::encode_pending(), alongside the incrementals we actually need.
This led the AuthMonitor to read the empty full version in AuthMonitor::update_from_paxos() and to ignore the incrementals. This has been fixed, and we are now testing.
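The failure mode described above can be illustrated with a minimal sketch. This is a hypothetical Python model, not Ceph code: the `Store` layout, the `write_empty_full` flag and the function signatures are assumptions made for illustration, and the real AuthMonitor logic may differ in detail. It only shows how an empty full snapshot written next to an incremental can mask that incremental on the read side.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical stand-in for the monitor's on-disk state."""
    incrementals: dict = field(default_factory=dict)  # version -> {entity: key}
    fulls: dict = field(default_factory=dict)         # version -> {entity: key}


def encode_pending(store, version, pending, write_empty_full):
    """Persist the pending changes for `version`.

    With write_empty_full=True this models the bug: a full snapshot
    with no entries is written alongside the incremental.
    """
    store.incrementals[version] = dict(pending)
    if write_empty_full:
        store.fulls[version] = {}


def update_from_paxos(store, have_version, key_server):
    """Bring `key_server` up to the latest stored version.

    A full snapshot newer than `have_version` is preferred; only
    incrementals newer than that snapshot are replayed afterwards.
    """
    latest = max(store.incrementals, default=have_version)
    if latest <= have_version:
        return have_version
    newer_fulls = [v for v in store.fulls if have_version < v <= latest]
    if newer_fulls:
        v = max(newer_fulls)
        key_server.clear()
        key_server.update(store.fulls[v])
        have_version = v  # incrementals up to v are skipped
    for v in range(have_version + 1, latest + 1):
        key_server.update(store.incrementals.get(v, {}))
    return latest


# Buggy path: the empty full snapshot wins and the key is lost.
buggy, buggy_keys = Store(), {}
encode_pending(buggy, 1, {"mds.a": "secret"}, write_empty_full=True)
update_from_paxos(buggy, 0, buggy_keys)

# Fixed path: only the incremental is written, so the key is applied.
fixed, fixed_keys = Store(), {}
encode_pending(fixed, 1, {"mds.a": "secret"}, write_empty_full=False)
update_from_paxos(fixed, 0, fixed_keys)
```

In this model the buggy path leaves the key server with no entries, which matches the "couldn't find entity name: mds.a" symptom in the log, while the fixed path ends up with the `mds.a` entry.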
Although this appears to be fixed, we are still unable to authenticate clients.
My current suspicion is that the services spend far too much time inactive, mainly because we are waiting for our proposals to finish, and that this somehow causes the auth requests on the AuthMonitor to expire (?).
This is just the theory du jour, based on the logs showing that auth requests are queued and only dealt with some time later.
Debugging this is pending on fixing some weird state changes in Paxos and in proposal queueing. More info as we get it.
Appears to be fixed.
The ceph tool is able to connect to the cluster and obtain status information.
However, the MDSs are not. This may be related to this issue, or it may be a completely different one; that is yet to be determined.
- Status changed from In Progress to Resolved
- Status changed from Resolved to Closed