Bug #37835

closed

ceph-mgr 13.2.4 fails to start with "auth_reply(proto 2 -22 (22) Invalid argument)"

Added by Randall Smith over 5 years ago. Updated over 4 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have a luminous cluster that I am attempting to upgrade to mimic. Prior to the upgrade, the mgrs were connecting just fine. Following the upgrade guide, I upgraded the monitors to mimic first. That upgrade went fine. Next I attempted to upgrade the mgrs. Unfortunately, the upgraded mgrs fail to connect.

I started with 13.2.2 and have attempted versions up to 13.2.4. There has been no change in behavior between versions.

I've attached the output of /usr/bin/ceph-mgr --cluster ceph --id 8 -d --debug_ms 20 and what I believe is the relevant portion of the monitor log (collected with debug_ms = 10/10).

I have been working on this problem on the ceph-users list (http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-January/032099.html) but have been unable to make progress.


Files

ceph-mgr.8.output (54.7 KB), Randall Smith, 01/08/2019 04:32 PM
ceph-mon.7.log (4.83 KB), Randall Smith, 01/08/2019 05:08 PM
Actions #1

Updated by Randall Smith about 5 years ago

I'm still having this issue. Is there anything else that is needed to help troubleshoot this?

Actions #2

Updated by Sebastian Wagner almost 5 years ago

Not sure if ceph-mgr is really the best project to track this issue. Is this a core problem that just happens to affect the mgr?

Actions #3

Updated by Randall Smith almost 5 years ago

mgr is the only service that I am having problems with. Every other service is working. That doesn't preclude a core issue, of course.

Actions #4

Updated by Randall Smith almost 5 years ago

I finally found the problem and the fix. I have a keyring set in the [global] section of ceph.conf. ceph-mgr was trying to use that instead of the default in /var/lib/ceph/mgr/$cluster-$id/keyring. (I think this behavior changed with mimic but I didn't trace it down in the code.)

The fix was to set the keyring path in a [mgr] section in ceph.conf. Once that was done, the mgr started and authenticated just fine.
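
For anyone hitting the same error, the change described above might look roughly like the following ceph.conf fragment. This is only a sketch: the [global] keyring path is a placeholder (the report does not say what it was actually set to), and the [mgr] value assumes the default mgr keyring location mentioned in the previous paragraph.

```ini
[global]
# Placeholder value: the actual [global] keyring path is not given in this report.
keyring = /etc/ceph/ceph.client.admin.keyring

[mgr]
# Override for mgr daemons only, pointing at the default mgr keyring location
# named in this report. $cluster and $id are expanded by Ceph per daemon.
keyring = /var/lib/ceph/mgr/$cluster-$id/keyring
```

With a section-specific setting like this, the [mgr] value would take precedence over [global] for mgr daemons, while other services keep using whatever they used before.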

This can be closed.

Actions #5

Updated by Sebastian Wagner over 4 years ago

  • Status changed from New to Closed

closed as requested
