Bug #20950

closed

key mismatch for mgr after upgrade from jewel to luminous(dev)

Added by Vasu Kulkarni over 6 years ago. Updated over 6 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

1. Set up a cluster with ceph-deploy using the jewel release.
2. Upgrade to luminous using ceph-deploy (dev=luminous, to pick up the latest dev build, which should include the fix for auth create for mgr, https://github.com/ceph/ceph/pull/16395, mentioned in tracker http://tracker.ceph.com/issues/20848).

[ubuntu@smithi022 cd]$ sudo ceph daemon mon.smithi022 version  
{"version":"12.1.2-469-ge29c598","release":"luminous","release_type":"rc"}

3. Run mgr create to create the mgr. While the mgr is being created, the mon log shows the following:

[ubuntu@smithi022 ~]$ sudo tail -f /var/log/ceph/ceph-mon.smithi022.log
2017-08-08 23:20:55.939986 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e26: no daemons active
2017-08-08 23:21:00.955907 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e27: no daemons active
2017-08-08 23:21:05.989035 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e28: no daemons active
2017-08-08 23:21:10.979647 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e29: no daemons active
2017-08-08 23:21:15.995454 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e30: no daemons active
2017-08-08 23:21:21.011366 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e31: no daemons active
2017-08-08 23:21:26.018917 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e32: no daemons active
2017-08-08 23:21:31.034912 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e33: no daemons active
2017-08-08 23:21:36.050718 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e34: no daemons active
2017-08-08 23:21:41.066701 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e35: no daemons active
2017-08-08 23:21:46.082633 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e36: no daemons active
2017-08-08 23:21:50.530084 7f965ead8700  0 mon.smithi022@0(leader).data_health(5) update_stats avail 93% total 916 GB, used 9432 MB, avail 860 GB
2017-08-08 23:21:51.098663 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e37: no daemons active
2017-08-08 23:21:56.139458 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e38: no daemons active
2017-08-08 23:21:58.234998 7f965c2d3700  0 cephx server client.bootstrap-mgr:  unexpected key: req.key=5dc3ed0bd2defac3 expected_key=32830f9952c9e52
2017-08-08 23:22:01.147021 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e39: no daemons active
2017-08-08 23:22:06.162826 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e40: no daemons active
2017-08-08 23:22:11.200540 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e41: no daemons active
2017-08-08 23:22:16.194801 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e42: no daemons active
^C[ubuntu@smithi022 ~]$  sudo cat /var/lib/ceph/bootstrap-mgr/ceph.keyring
[client.bootstrap-mgr]
        key = AQDKK31ZYcZNKRAANhRFer5Fr0McbuW/QHla1w==
[ubuntu@smithi022 ~]$ sudo ceph auth list
installed auth entries:

osd.0
        key: AQB1LH1ZRBtTIRAAd4PkIVg4FAAwMaJJ3dKaAA==
        caps: [mgr] allow profile osd
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.1
        key: AQCELH1Z2QXFNxAApvrhJD60mxRpN2FhhNrvHg==
        caps: [mgr] allow profile osd
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.2
        key: AQCPLH1ZDt0FOBAA4TZrHGJIlDRl+0BF2Sy+MQ==
        caps: [mgr] allow profile osd
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.3
        key: AQCaLH1ZRhyyOBAAH4XK0+sEUu+lqba+SwEhJg==
        caps: [mgr] allow profile osd
        caps: [mon] allow profile osd
        caps: [osd] allow *
client.admin
        key: AQDIK31ZLGS9GhAA5R2+1pXEWZNR29VYhIjs1A==
        caps: [mds] allow *
        caps: [mgr] allow *
        caps: [mon] allow *
        caps: [osd] allow *
client.bootstrap-mds
        key: AQDJK31ZRWkhDBAAmwRH6NsAw8Bf/VcrfRLIDw==
        caps: [mgr] allow r
        caps: [mon] allow profile bootstrap-mds
client.bootstrap-mgr
        key: AQDaRopZckmUIxAASSuQsbDmggTPrzPIt1NXJg==
        caps: [mon] allow profile bootstrap-mgr

I also have a teuthology test that fails here on 4 nodes: http://pulpito.ceph.com/vasu-2017-08-08_18:34:23-upgrade-master-distro-basic-vps/
I recreated this on a single node, smithi022, as well; feel free to log in to that node. I believe the auth create workaround is not needed here because of the latest sha.


Related issues 1 (0 open, 1 closed)

Copied to mgr - Backport #22034: luminous: key mismatch for mgr after upgrade from jewel to luminous(dev) (Resolved, Shinobu Kinjo)
Actions #1

Updated by Vasu Kulkarni over 6 years ago

  • Release set to luminous
Actions #2

Updated by John Spray over 6 years ago

Hmm, so looking at that smithi022 node, I'm not sure what I'm seeing:
[root@smithi022 jspray]# zgrep bootstrap-mgr /var/log/ceph/ceph.audit.log-20170730.gz 
2017-07-30 00:43:54.365900 mon.0 172.21.15.22:6789/0 29 : audit [INF] from='client.? 172.21.15.22:0/627587046' entity='mon.' cmd=[{"prefix": "auth get", "entity": "client.bootstrap-mgr"}]: dispatch
2017-07-30 00:43:54.692779 mon.0 172.21.15.22:6789/0 30 : audit [INF] from='client.? 172.21.15.22:0/2367782365' entity='mon.' cmd=[{"prefix": "auth get-or-create", "entity": "client.bootstrap-mgr", "caps": ["mon", "allow profile bootstrap-mgr"]}]: dispatch
2017-07-30 00:43:54.708204 mon.0 172.21.15.22:6789/0 31 : audit [INF] from='client.? 172.21.15.22:0/2367782365' entity='mon.' cmd='[{"prefix": "auth get-or-create", "entity": "client.bootstrap-mgr", "caps": ["mon", "allow profile bootstrap-mgr"]}]': finished
[root@smithi022 jspray]# ls -l /var/lib/ceph/bootstrap-mgr/ceph.keyring
-rw-------. 1 root root 71 Aug  8 23:21 /var/lib/ceph/bootstrap-mgr/ceph.keyring

That's a big time gap between when the bootstrap-mgr key was created (2017-07-30 in the audit log) and when the keyring file was written out (Aug 8).

Was that machine blank before the issue was reproduced, or could we be seeing ghosts of something older?
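
For reference, the gap can be put in numbers. This is only a minimal Python sketch: it assumes the keyring mtime from `ls -l` falls in 2017 (the listing omits the year), and `audit_timestamp` and `gap_since_audit` are hypothetical helper names, not Ceph tooling.

```python
from datetime import datetime

def audit_timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.ffffff' stamp of an audit log line."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S.%f")

def gap_since_audit(line, keyring_mtime):
    """Return how long after the audit entry the keyring file was last written."""
    return keyring_mtime - audit_timestamp(line)

# The 'finished' get-or-create entry from ceph.audit.log, shortened to its prefix.
finished = "2017-07-30 00:43:54.708204 mon.0 172.21.15.22:6789/0 31 : audit [INF]"
# mtime shown by ls -l (Aug  8 23:21); the year 2017 is an assumption.
mtime = datetime(2017, 8, 8, 23, 21)

gap = gap_since_audit(finished, mtime)
print(gap)
```

Under that assumption the keyring file was written almost ten days after the key was created on the mon.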

Actions #3

Updated by Vasu Kulkarni over 6 years ago

Smithi was initially set up as jewel, and after some days I upgraded it to luminous to recreate the issue. I also have mon logs from VPS nodes (which are created fresh):

http://pulpito.ceph.com/vasu-2017-08-08_18:34:23-upgrade-master-distro-basic-vps/

You can search for "unexpected key" in this log: http://qa-proxy.ceph.com/teuthology/vasu-2017-08-08_18:34:23-upgrade-master-distro-basic-vps/1497830/remote/vpm101/log/ceph-mon.vpm101.log.gz
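
A rough Python sketch of that search, in case it helps anyone scanning several mon logs. The regular expression is modelled only on the single "unexpected key" line quoted in the description, so other Ceph versions may format it differently; `find_key_mismatches` is a hypothetical helper name.

```python
import re

# Pattern based on the one cephx line quoted in this ticket (an assumption,
# not a documented log format).
MISMATCH = re.compile(
    r"cephx server (?P<entity>\S+):\s+unexpected key: "
    r"req\.key=(?P<req_key>[0-9a-f]+) expected_key=(?P<expected_key>[0-9a-f]+)"
)

def find_key_mismatches(lines):
    """Yield (entity, req_key, expected_key) for each 'unexpected key' line."""
    for line in lines:
        match = MISMATCH.search(line)
        if match:
            yield (match.group("entity"),
                   match.group("req_key"),
                   match.group("expected_key"))

# Two lines copied from the mon log in the description.
sample = [
    "2017-08-08 23:21:56.139458 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e38: no daemons active",
    "2017-08-08 23:21:58.234998 7f965c2d3700  0 cephx server client.bootstrap-mgr:  unexpected key: req.key=5dc3ed0bd2defac3 expected_key=32830f9952c9e52",
]
mismatches = list(find_key_mismatches(sample))
print(mismatches)
```

For a downloaded `.log.gz` file, the same function can be fed the file object returned by `gzip.open(path, "rt")`, where `path` is wherever the log was saved locally.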

Actions #4

Updated by Vasu Kulkarni over 6 years ago

A similar issue was also reported on the ceph-devel list recently: https://www.spinics.net/lists/ceph-devel/msg37911.html

Hi,

Just had a go at this - 12.1.3 from a freshly deployed Jewel (10.2.9) on Ubuntu 16.04, following the notes in http://ceph.com/releases/v12-1-3-luminous-rc-released/

It all worked nicely *except* for the mgr deploy (arrrg - again)! This time it is a new wrinkle: it appears that the bootstrap-mgr auth key does not match the on-disk keyring:

markir@ceph0:~$ sudo cat /var/lib/ceph/bootstrap-mgr/ceph.keyring
[client.bootstrap-mgr]
        key = AQBWP45ZZVsOKRAAQoLg48bT6niU/dI8BmGqJQ==

markir@ceph0:~$ sudo ceph auth get client.bootstrap-mgr
exported keyring for client.bootstrap-mgr
[client.bootstrap-mgr]
        key = AQC7RY5ZsEHRKRAAOhUNB2rc8r/Xdg7xXIAteA==
        caps mon = "allow profile bootstrap-mgr" 

Editing the on disk key to make it match the one 'ceph auth get' shows solves the problem, so pretty simple to work around - if you think to check that file and auth get for differences!

regards

Mark
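
The comparison Mark describes (on-disk keyring versus `ceph auth get` output) could be scripted roughly as follows. This is a sketch under assumptions: it parses the keyring text as pasted in the mail above, and `keyring_key` and `keys_match` are hypothetical helper names, not part of Ceph.

```python
def keyring_key(text, entity):
    """Return the 'key' value for `entity` in keyring-style text, or None."""
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
        elif section == entity and "=" in line:
            # Split on the first '=' only; the base64 key itself may end in '='.
            name, _, value = line.partition("=")
            if name.strip() == "key":
                return value.strip()
    return None

def keys_match(on_disk_text, mon_text, entity):
    """True only if both texts carry the same key for `entity`."""
    disk_key = keyring_key(on_disk_text, entity)
    return disk_key is not None and disk_key == keyring_key(mon_text, entity)

# Contents of /var/lib/ceph/bootstrap-mgr/ceph.keyring from the mail.
on_disk = """[client.bootstrap-mgr]
        key = AQBWP45ZZVsOKRAAQoLg48bT6niU/dI8BmGqJQ==
"""
# Output of 'ceph auth get client.bootstrap-mgr' from the mail.
from_mon = """exported keyring for client.bootstrap-mgr
[client.bootstrap-mgr]
        key = AQC7RY5ZsEHRKRAAOhUNB2rc8r/Xdg7xXIAteA==
        caps mon = "allow profile bootstrap-mgr"
"""

print(keys_match(on_disk, from_mon, "client.bootstrap-mgr"))
```

With the two pastes from the mail the check reports a mismatch, which is the condition that produces the "unexpected key" line in the mon log.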
Actions #5

Updated by Vasu Kulkarni over 6 years ago

John,

Here is another run with a recent sha. Could you look into the logs and see if there is anything missing?

http://pulpito.ceph.com/vasu-2017-08-22_16:13:55-upgrade-master-distro-basic-vps/1551616/

Actions #7

Updated by John Spray over 6 years ago

  • Status changed from New to Fix Under Review
Actions #8

Updated by John Spray over 6 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to luminous
Actions #9

Updated by Nathan Cutler over 6 years ago

  • Copied to Backport #22034: luminous: key mismatch for mgr after upgrade from jewel to luminous(dev) added
Actions #10

Updated by Kefu Chai over 6 years ago

  • Status changed from Pending Backport to Resolved