Feature #2611: mon: Single-Paxos
mon: Single-Paxos: AuthMonitor: key_server has no entries
The Monitor's key_server has no entries, even though we made sure to populate mon.X/keyring with every single service key in existence.
Debugging is in progress. Will update this as we get more information.
#1 Updated by Joao Eduardo Luis over 11 years ago
The problem appears to affect all mon clients, and it may also be the reason why our OSDs do not work.
Log snippet from an MDS being brought up:
2012-06-20 08:34:21.394367 7fffeeffd700 20 mon.c@2(peon) e1 ms_dispatch existing session MonSession: mds.? 127.0.0.1:6800/17231 is open for mds.? 127.0.0.1:6800/17231
2012-06-20 08:34:21.394374 7fffeeffd700 20 mon.c@2(peon) e1 caps
2012-06-20 08:34:21.394377 7fffeeffd700 10 mon.c@2(peon).paxosservice(auth) dispatch auth(proto 2 32 bytes epoch 0) v1 from mds.? 127.0.0.1:6800/17231
2012-06-20 08:34:21.394413 7fffeeffd700 10 mon.c@2(peon).auth v2 update_from_paxos
2012-06-20 08:34:21.394434 7fffeeffd700 10 mon.c@2(peon).auth v2 preprocess_query auth(proto 2 32 bytes epoch 0) v1 from mds.? 127.0.0.1:6800/17231
2012-06-20 08:34:21.394449 7fffeeffd700 10 mon.c@2(peon).auth v2 prep_auth() blob_size=32
2012-06-20 08:34:21.394462 7fffeeffd700 10 cephx server mds.a: handle_request get_auth_session_key for mds.a
2012-06-20 08:34:21.394465 7fffeeffd700 0 cephx server mds.a: couldn't find entity name: mds.a
2012-06-20 08:34:21.394468 7fffeeffd700 1 -- 127.0.0.1:6791/0 --> 127.0.0.1:6800/17231 -- auth_reply(proto 2 -1 Operation not permitted) v1 -- ?+0 0x7fffd8086020 con 0x7fffe00023b0
#2 Updated by Joao Eduardo Luis over 11 years ago
We were encoding an empty "full version" of the key server during AuthMonitor::encode_pending(), alongside the incrementals we actually need.
This leads the AuthMonitor to read the (empty) full version in AuthMonitor::update_from_paxos() and to ignore the incrementals, so the key server ends up with no entries. This has been fixed, and we are now testing.
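For anyone following along, here is a minimal, self-contained sketch of that failure mode. It is not the actual AuthMonitor code. Every name in it (KeyMap, Incremental, PendingState, encode_pending_buggy, encode_pending_fixed, update_from_state) is hypothetical and exists only for illustration. The "fixed" encoder shows one plausible way to avoid the problem (not writing a full version at all when only incrementals are pending) and is an assumption, not necessarily what the real fix does.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins; none of these are Ceph's real types.
using KeyMap = std::map<std::string, std::string>;  // entity name -> secret

struct Incremental {   // a single "add this key" change
  std::string entity;
  std::string secret;
};

struct PendingState {          // what one proposal carries
  std::optional<KeyMap> full;  // full snapshot, if one was encoded
  std::vector<Incremental> incs;
};

// Buggy encoder: always attaches a full snapshot, even though it is empty.
PendingState encode_pending_buggy(const std::vector<Incremental>& incs) {
  PendingState p;
  p.full = KeyMap{};  // empty "full version" written next to the incrementals
  p.incs = incs;
  return p;
}

// Assumed fix: write only the incrementals when there is no real snapshot.
PendingState encode_pending_fixed(const std::vector<Incremental>& incs) {
  PendingState p;
  p.incs = incs;
  return p;
}

// Reader: a full snapshot, when present, wins and the incrementals are skipped.
KeyMap update_from_state(KeyMap current, const PendingState& p) {
  if (p.full) {
    return *p.full;  // with the buggy encoder this is an empty map
  }
  for (const auto& inc : p.incs) {
    current[inc.entity] = inc.secret;
  }
  return current;
}
```

Feeding update_from_state the output of encode_pending_buggy for mds.a and osd.0 yields an empty map, which would match the "couldn't find entity name: mds.a" symptom; the output of encode_pending_fixed yields both entries.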
#3 Updated by Joao Eduardo Luis over 11 years ago
Although this appears to be fixed, we are still unable to authenticate clients.
My current suspicion is that the services spend far too much time inactive, mainly because they are waiting for our proposals to finish, and that this somehow leads the auth requests on the AuthMonitor to expire (?).
This is just the theory du jour, based on the logs, which show that an auth request is queued and only dealt with some time later.
Debugging this is pending fixing some weird state changes in Paxos and the proposal queueing. More info as we get it.