Bug #9859

closed

Commit 2ac2a96 appears to break OSD creation

Added by Mark Nelson over 9 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Immediate
Category:
Monitor
Target version:
-
% Done:
0%

Source:
other
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I narrowed this down to this commit through Joao's comments and bisecting. I'm not sure whether it only happens under specific circumstances (perhaps due to the way CBT creates OSDs?). It is very reproducible on all branches that include this commit, up to and including the latest giant and master builds as of 10/21/2014. Anything prior to this commit (including the v0.86 release) appears to work perfectly.

Joao's comments after examining the mon logs:

18:17 < joao> ah, the osd is seen initially by the monitor as a 'client.'
18:17 < joao> no wonder adding caps to 'osd.X' doesn't help
18:17 < joao> now I wonder why this happens
18:23 < joao> nhm, the only feasible explanation is that we used to serve maps to everyone regardless of caps
18:23 < joao> which stopped being so after my patch
18:24 < nhm> hrm, are the caps wrong?
18:26 < joao> no, and what's weird is that the patch has been merged for a while and this is the first time I'm hearing about this

I believe Joao is referring to c0e3bc9a above, though it does not (at least in isolation) appear to be the culprit.

Actions #1

Updated by Mark Nelson over 9 years ago

Specifically, this happens with OSD creation where the monmap isn't specified (similar to how vstart does it, but not ceph-disk).

i.e.:

ceph-osd -c /tmp/cbt/ceph/ceph.conf -i 0 --mkfs --mkkey --osd-uuid 7ef32c0f-75e0-4f25-bb02-2fb8c8dedf19
Actions #2

Updated by Sage Weil over 9 years ago

  • Priority changed from Urgent to Immediate
Actions #3

Updated by Joao Eduardo Luis over 9 years ago

  • Status changed from New to In Progress
  • Assignee set to Joao Eduardo Luis
Actions #4

Updated by Joao Eduardo Luis over 9 years ago

Yesterday I got as far as establishing that the monitor does not handle 'MMonGetMap' messages from the OSD during mkfs because the OSD doesn't have enough caps.

The OSD attempts to get the map during mkfs using MonClient::get_monmap_privately(), which uses an entity of type client with id '-1'. The monitor is unable to find caps for such a client and refuses to handle its messages because the client has no read cap.

The real kicker is that, given this is a no-cephx cluster, the admin client also doesn't have any auth key or caps to go with it and yet it manages to operate the cluster.
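
To make the failure mode described above concrete, here is a toy model in Python. It is not Ceph code: the ToyMonitor class, its caps table, and the enforce_caps switch are all hypothetical names invented for illustration, and the entity string "client.-1" is simply the type and id mentioned above joined with a dot. It only sketches the idea that a request arriving under an entity with no caps entry is refused once the read cap is checked, even though 'osd.0' itself has caps.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMonitor:
    """Hypothetical stand-in for the monitor's cap check (not Ceph's code)."""

    # Assumed caps table: entity name -> set of granted permissions.
    caps: dict = field(default_factory=dict)
    # True models the behaviour after the patch; False models the earlier
    # behaviour suggested in the IRC log (maps served regardless of caps).
    enforce_caps: bool = True

    def handle_get_map(self, entity: str) -> bool:
        """Return True if a monmap request from `entity` would be served."""
        if not self.enforce_caps:
            return True
        # An entity with no caps entry has no read cap, so it is refused.
        return "r" in self.caps.get(entity, set())


# Caps exist for osd.0, but the mkfs request arrives as a client entity.
after_patch = ToyMonitor(caps={"osd.0": {"r", "w", "x"}})
before_patch = ToyMonitor(caps=after_patch.caps, enforce_caps=False)

for name, mon in (("before", before_patch), ("after", after_patch)):
    print(name, "client.-1 served:", mon.handle_get_map("client.-1"))
```

In this sketch, adding caps for 'osd.0' changes nothing for the request, which matches the observation in the IRC log that adding caps to 'osd.X' doesn't help.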

Actions #5

Updated by Joao Eduardo Luis over 9 years ago

Also, 2ac2a96 is the merge commit for the branch containing c0e3bc9a.

Actions #6

Updated by Joao Eduardo Luis over 9 years ago

  • Category set to Monitor

Problem has been identified.

This went unnoticed because vstart.sh always creates a keyring, even with cephx disabled, which apparently masks the issue. I will be modifying vstart.sh so that it skips keyring generation when cephx is disabled.

Fix will be out shortly.

Actions #7

Updated by Sage Weil over 9 years ago

  • Status changed from In Progress to Resolved