Subtask #2616

closed

Feature #2611: mon: Single-Paxos

mon: Single-Paxos: AuthMonitor: key_server has no entries

Added by Joao Eduardo Luis almost 12 years ago. Updated about 11 years ago.

Status: Closed
Priority: Normal
Assignee: Joao Eduardo Luis
Category: Monitor
Target version:
% Done: 0%
Source: Development
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

The Monitor's key_server has no entries, even though we made sure to populate mon.X/keyring with every single service key in existence.

Debugging is in progress. Will update this as we get more information.

Actions #1

Updated by Joao Eduardo Luis almost 12 years ago

The problem appears to affect all mon clients, and it may be the reason why our OSDs do not work either.

Log snippet from an MDS being brought up:


2012-06-20 08:34:21.394367 7fffeeffd700 20 mon.c@2(peon) e1 ms_dispatch existing session MonSession: mds.? 127.0.0.1:6800/17231 is open for mds.? 127.0.0.1:6800/17231
2012-06-20 08:34:21.394374 7fffeeffd700 20 mon.c@2(peon) e1  caps 
2012-06-20 08:34:21.394377 7fffeeffd700 10 mon.c@2(peon).paxosservice(auth) dispatch auth(proto 2 32 bytes epoch 0) v1 from mds.? 127.0.0.1:6800/17231
2012-06-20 08:34:21.394413 7fffeeffd700 10 mon.c@2(peon).auth v2 update_from_paxos
2012-06-20 08:34:21.394434 7fffeeffd700 10 mon.c@2(peon).auth v2 preprocess_query auth(proto 2 32 bytes epoch 0) v1 from mds.? 127.0.0.1:6800/17231
2012-06-20 08:34:21.394449 7fffeeffd700 10 mon.c@2(peon).auth v2 prep_auth() blob_size=32
2012-06-20 08:34:21.394462 7fffeeffd700 10 cephx server mds.a: handle_request get_auth_session_key for mds.a
2012-06-20 08:34:21.394465 7fffeeffd700  0 cephx server mds.a: couldn't find entity name: mds.a
2012-06-20 08:34:21.394468 7fffeeffd700  1 -- 127.0.0.1:6791/0 --> 127.0.0.1:6800/17231 -- auth_reply(proto 2 -1 Operation not permitted) v1 -- ?+0 0x7fffd8086020 con 0x7fffe00023b0
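
To check whether other daemons hit the same failure, the monitor log can be scanned for the cephx "couldn't find entity name" message shown above. The following is only a minimal sketch: the helper name `missing_entities` is hypothetical, and the pattern assumes the message wording stays exactly as in this snippet.

```python
import re

# Pattern taken from the log line above; the exact wording is an assumption
# and may differ between versions.
MISSING_ENTITY = re.compile(r"couldn't find entity name: (\S+)")


def missing_entities(log_lines):
    """Return entity names the cephx server failed to find, in first-seen order."""
    seen = []
    for line in log_lines:
        match = MISSING_ENTITY.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Two lines copied from the snippet above.
sample = [
    "2012-06-20 08:34:21.394462 7fffeeffd700 10 cephx server mds.a: "
    "handle_request get_auth_session_key for mds.a",
    "2012-06-20 08:34:21.394465 7fffeeffd700  0 cephx server mds.a: "
    "couldn't find entity name: mds.a",
]
print(missing_entities(sample))
```

Running it over the full log of each monitor should show whether only mds.a is affected or every entity is missing from the key server.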
Actions #2

Updated by Joao Eduardo Luis almost 12 years ago

We were encoding an empty "full version" of the key server during AuthMonitor::encode_pending(), alongside the incrementals we actually need.

As a result, AuthMonitor::update_from_paxos() reads the (empty) full version and ignores the incrementals. This has been fixed, and we are now testing.
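
The failure mode can be illustrated with a small model. This is a sketch under stated assumptions, not the actual AuthMonitor code: the `Store` class, the `write_empty_full` flag, and both function bodies are hypothetical stand-ins, and the only behavior taken from this ticket is that a full version, when present, is read in preference to the incrementals.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical stand-in for the monitor's versioned store."""
    full: dict = field(default_factory=dict)         # version -> complete key map
    incremental: dict = field(default_factory=dict)  # version -> key map delta


def encode_pending(store, version, pending, write_empty_full):
    """Write the pending changes for `version` as an incremental."""
    store.incremental[version] = dict(pending)
    if write_empty_full:
        # The bug described above: a full version with no entries
        # is written next to the incremental.
        store.full[version] = {}


def update_from_store(store, version, keys):
    """Prefer a full version if one exists; otherwise apply the incremental."""
    if version in store.full:
        return dict(store.full[version])
    updated = dict(keys)
    updated.update(store.incremental.get(version, {}))
    return updated


buggy = Store()
encode_pending(buggy, 1, {"mds.a": "secret"}, write_empty_full=True)
print(update_from_store(buggy, 1, {}))   # key map stays empty

fixed = Store()
encode_pending(fixed, 1, {"mds.a": "secret"}, write_empty_full=False)
print(update_from_store(fixed, 1, {}))   # incremental is applied
```

In this model the reader never looks at the incremental once an empty full version exists, which matches the symptom of a key server with no entries.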

Actions #3

Updated by Joao Eduardo Luis almost 12 years ago

Although this appears to be fixed, we are still unable to authenticate clients.

My current suspicion is that the services spend far too long inactive, mainly while waiting for our proposals to finish, and that this somehow causes the auth requests on the AuthMonitor to expire (?).

This is just the theory du jour, based on the logs showing that auth requests are queued and only dealt with some time later.

Debugging this is blocked on fixing some weird state changes in Paxos and in proposal queueing. More info as we get it.

Actions #4

Updated by Joao Eduardo Luis almost 12 years ago

Appears to be fixed.

The ceph tool is able to connect to the cluster and obtain status information.

However, the MDSs are not able to. This may be related to this issue or be a completely different one; that is yet to be determined.

Actions #5

Updated by Joao Eduardo Luis almost 12 years ago

  • Status changed from In Progress to Resolved
Actions #6

Updated by Joao Eduardo Luis almost 12 years ago

  • Status changed from Resolved to Closed