Bug #40053

ceph-mgr 13.2.4 fails to start with "auth_reply(proto 2 -22 (22) Invalid argument)"

Added by Randall Smith almost 5 years ago. Updated almost 5 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Regression: No
Severity: 2 - major

Description

I have a Luminous cluster that I am attempting to upgrade to Mimic. Before the upgrade, the mgrs were connecting just fine. Following the upgrade guide, I upgraded the monitors to Mimic first, and that went fine. Next I attempted to upgrade the mgrs. Unfortunately, the upgraded mgrs fail to connect.

I started with 13.2.2 and have attempted versions up to 13.2.4. There has been no change in behavior between versions.

I've attached the output of /usr/bin/ceph-mgr --cluster ceph --id 8 -d --debug_ms 20 and what I believe is the relevant portion of the monitor log (collected with debug_ms = 10/10).

I have been working on this problem on the ceph-users list (http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-January/032099.html) but have been unable to make progress.

The log messages have been posted as part of issue #37835.

This is explicitly a duplicate of #37835. I'd really like to get this fixed, but the previous issue seems to have gotten lost in the noise.


Files

osd-73.log (55.7 KB), added by Randall Smith, 05/31/2019 09:05 PM

#1

Updated by Randall Smith almost 5 years ago

This is no longer only causing problems for my mgrs. It has now prevented an OSD from connecting after I migrated it from FileStore to BlueStore.

I've attached the log from attempting to start the OSD like so:

/usr/bin/ceph-osd --cluster ceph --id 73 --setuser ceph --setgroup ceph -d --debug_ms 20
#2

Updated by Randall Smith almost 5 years ago

Please disregard the OSD issue. It looks like the OSD keyring file didn't match the key reported by `ceph auth get osd.73`.

I have verified that that is not the case for the ceph-mgr service.

#3

Updated by Randall Smith almost 5 years ago

I finally found the problem and the fix. I have a keyring set in the [global] section of ceph.conf, and ceph-mgr was trying to use that instead of the default, /var/lib/ceph/mgr/$cluster-$id/keyring. (I think this behavior changed with Mimic, but I didn't track it down in the code.)

The fix was to set the keyring path in a [mgr] section in ceph.conf. Once that was done, the mgr started and authenticated just fine.
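A minimal ceph.conf sketch of that fix follows. It assumes the [global] keyring points at some other keyring file; the [global] path below is hypothetical, since the report does not say which file was configured there. The [mgr] entry restores the per-daemon default path quoted above, with $cluster and $id expanded by Ceph (for example, to ceph-8 for `--id 8` on a cluster named ceph).

```ini
[global]
# Hypothetical path: a keyring set here was also being picked up by ceph-mgr
keyring = /etc/ceph/$cluster.client.admin.keyring

[mgr]
# Override for mgr daemons only: use each mgr's own keyring
# ($cluster and $id are Ceph config metavariables)
keyring = /var/lib/ceph/mgr/$cluster-$id/keyring
```

Settings in a daemon-type section such as [mgr] take precedence over [global], so the other daemons keep using the [global] value. The mgr daemons need to be restarted to pick up the change.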

This can be closed.

#4

Updated by Greg Farnum almost 5 years ago

  • Project changed from Ceph to mgr
  • Status changed from New to Closed