Bug #9859
closed
Commit 2ac2a96 appears to break OSD creation
Added by Mark Nelson over 9 years ago.
Updated over 9 years ago.
Description
Narrowed this down to this commit through Joao's comments and bisecting. Not sure if this only happens under specific circumstances (perhaps due to the way CBT is creating OSDs?). It appears to be very reproducible on all branches that include this commit, up to and including the latest giant and master builds as of 10/21/2014. Anything prior to this commit (including the v0.86 release) appears to work perfectly.
Joao's comments after examining the mon logs:
18:17 < joao> ah, the osd is seen initially by the monitor as a 'client.'
18:17 < joao> no wonder adding caps to 'osd.X' doesn't help
18:17 < joao> now I wonder why this happens
18:23 < joao> nhm, the only feasible explanation is that we used to serve maps to everyone regardless of caps
18:23 < joao> which stopped being so after my patch
18:24 < nhm> hrm, are the caps wrong?
18:26 < joao> no, and what's weird is that the patch has been merged for a while and this is the first time I'm hearing about this
I believe Joao is referring to c0e3bc9a above, though it does not (at least in isolation) appear to be the culprit.
Specifically, this is with OSD creation where the monmap isn't specified (similar to how vstart does it, but not ceph-disk).
i.e.:
ceph-osd -c /tmp/cbt/ceph/ceph.conf -i 0 --mkfs --mkkey --osd-uuid 7ef32c0f-75e0-4f25-bb02-2fb8c8dedf19
- Priority changed from Urgent to Immediate
- Status changed from New to In Progress
- Assignee set to Joao Eduardo Luis
Yesterday I got as far as determining that the monitor is not handling 'MMonGetMap' messages from the OSD during mkfs because the OSD doesn't have enough caps.
The OSD attempts to get the map during mkfs using MonClient::get_monmap_privately(), which uses an entity of type client with id '-1'. The monitor is unable to find caps for such a client and refuses to handle its messages because there's no read cap for the client.
The real kicker is that, given this is a no-cephx cluster, the admin client also doesn't have any auth key or caps to go with it and yet it manages to operate the cluster.
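To make the failure mode above concrete, here is a toy model in Python. It is not Ceph code: the ToyMonitor class, the enforce_caps flag, the single-letter caps, and the "client.-1" spelling of the entity name are all assumptions made for illustration. It only shows why granting caps to 'osd.X' cannot help when the request arrives under a different identity.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMonitor:
    """Hypothetical stand-in for the monitor's cap check (not Ceph code)."""

    # Entity name -> set of caps; "r" stands in for a read cap.
    caps: dict = field(default_factory=dict)
    # False models the earlier behaviour of serving maps regardless of caps.
    enforce_caps: bool = True

    def handle_get_map(self, entity: str) -> bool:
        """Return True if a map request from `entity` would be served."""
        if not self.enforce_caps:
            return True
        # Unknown entities have no caps at all, hence no read cap.
        return "r" in self.caps.get(entity, set())


# Caps exist for the OSD itself...
mon = ToyMonitor(caps={"osd.0": {"r", "w", "x"}})

# ...but during mkfs the request is assumed to arrive as an anonymous
# client ("client.-1" is an illustrative name), so they are never consulted.
served_new = mon.handle_get_map("client.-1")

# Same request against a monitor that does not enforce caps.
old_mon = ToyMonitor(caps=mon.caps, enforce_caps=False)
served_old = old_mon.handle_get_map("client.-1")

print(served_new, served_old)
```

In this sketch the request is refused once caps are enforced and served when they are not, which matches the behaviour change described in the IRC log. It does not model why the admin client still works on a no-cephx cluster.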
Also, 2ac2a96 is the merge commit for the branch containing c0e3bc9a.
Problem has been identified.
This went unnoticed as vstart.sh, even with cephx disabled, always creates a keyring, which apparently masks the issue. I will be modifying vstart.sh so as to skip keyring generation when cephx is disabled.
Fix will be out shortly.
- Status changed from In Progress to Resolved