Bug #20950

key mismatch for mgr after upgrade from jewel to luminous(dev)

Added by Vasu Kulkarni 4 months ago. Updated about 1 month ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
Start date: 08/08/2017
Due date:
% Done: 0%
Source:
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release: luminous
Needs Doc: No

Description

1. Set up a cluster with ceph-deploy using the jewel release.
2. Upgrade to luminous with ceph-deploy (using dev=luminous to pick up the latest dev build; this should include the fix for auth create for mgr, https://github.com/ceph/ceph/pull/16395, mentioned in tracker http://tracker.ceph.com/issues/20848).

[ubuntu@smithi022 cd]$ sudo ceph daemon mon.smithi022 version  
{"version":"12.1.2-469-ge29c598","release":"luminous","release_type":"rc"}

3. Run mgr create to create the mgr. While the mgr is being created, the mon log shows the following:

[ubuntu@smithi022 ~]$ sudo tail -f /var/log/ceph/ceph-mon.smithi022.log
2017-08-08 23:20:55.939986 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e26: no daemons active
2017-08-08 23:21:00.955907 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e27: no daemons active
2017-08-08 23:21:05.989035 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e28: no daemons active
2017-08-08 23:21:10.979647 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e29: no daemons active
2017-08-08 23:21:15.995454 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e30: no daemons active
2017-08-08 23:21:21.011366 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e31: no daemons active
2017-08-08 23:21:26.018917 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e32: no daemons active
2017-08-08 23:21:31.034912 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e33: no daemons active
2017-08-08 23:21:36.050718 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e34: no daemons active
2017-08-08 23:21:41.066701 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e35: no daemons active
2017-08-08 23:21:46.082633 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e36: no daemons active
2017-08-08 23:21:50.530084 7f965ead8700  0 mon.smithi022@0(leader).data_health(5) update_stats avail 93% total 916 GB, used 9432 MB, avail 860 GB
2017-08-08 23:21:51.098663 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e37: no daemons active
2017-08-08 23:21:56.139458 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e38: no daemons active
2017-08-08 23:21:58.234998 7f965c2d3700  0 cephx server client.bootstrap-mgr:  unexpected key: req.key=5dc3ed0bd2defac3 expected_key=32830f9952c9e52
2017-08-08 23:22:01.147021 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e39: no daemons active
2017-08-08 23:22:06.162826 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e40: no daemons active
2017-08-08 23:22:11.200540 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e41: no daemons active
2017-08-08 23:22:16.194801 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e42: no daemons active
^C[ubuntu@smithi022 ~]$  sudo cat /var/lib/ceph/bootstrap-mgr/ceph.keyring
[client.bootstrap-mgr]
        key = AQDKK31ZYcZNKRAANhRFer5Fr0McbuW/QHla1w==
[ubuntu@smithi022 ~]$ sudo ceph auth list
installed auth entries:

osd.0
        key: AQB1LH1ZRBtTIRAAd4PkIVg4FAAwMaJJ3dKaAA==
        caps: [mgr] allow profile osd
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.1
        key: AQCELH1Z2QXFNxAApvrhJD60mxRpN2FhhNrvHg==
        caps: [mgr] allow profile osd
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.2
        key: AQCPLH1ZDt0FOBAA4TZrHGJIlDRl+0BF2Sy+MQ==
        caps: [mgr] allow profile osd
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.3
        key: AQCaLH1ZRhyyOBAAH4XK0+sEUu+lqba+SwEhJg==
        caps: [mgr] allow profile osd
        caps: [mon] allow profile osd
        caps: [osd] allow *
client.admin
        key: AQDIK31ZLGS9GhAA5R2+1pXEWZNR29VYhIjs1A==
        caps: [mds] allow *
        caps: [mgr] allow *
        caps: [mon] allow *
        caps: [osd] allow *
client.bootstrap-mds
        key: AQDJK31ZRWkhDBAAmwRH6NsAw8Bf/VcrfRLIDw==
        caps: [mgr] allow r
        caps: [mon] allow profile bootstrap-mds
client.bootstrap-mgr
        key: AQDaRopZckmUIxAASSuQsbDmggTPrzPIt1NXJg==
        caps: [mon] allow profile bootstrap-mgr

I also have a teuthology test that fails here on 4 nodes: http://pulpito.ceph.com/vasu-2017-08-08_18:34:23-upgrade-master-distro-basic-vps/
I recreated this on the single node smithi022 as well; feel free to log in to that node. I believe the auth create workaround is not needed here because of the latest sha.


Related issues

Copied to mgr - Backport #22034: luminous: key mismatch for mgr after upgrade from jewel to luminous(dev) Resolved

History

#1 Updated by Vasu Kulkarni 4 months ago

  • Release luminous added

#2 Updated by John Spray 4 months ago

Hmm, so looking at that smithi022 node, I'm not sure what I'm seeing:
[root@smithi022 jspray]# zgrep bootstrap-mgr /var/log/ceph/ceph.audit.log-20170730.gz 
2017-07-30 00:43:54.365900 mon.0 172.21.15.22:6789/0 29 : audit [INF] from='client.? 172.21.15.22:0/627587046' entity='mon.' cmd=[{"prefix": "auth get", "entity": "client.bootstrap-mgr"}]: dispatch
2017-07-30 00:43:54.692779 mon.0 172.21.15.22:6789/0 30 : audit [INF] from='client.? 172.21.15.22:0/2367782365' entity='mon.' cmd=[{"prefix": "auth get-or-create", "entity": "client.bootstrap-mgr", "caps": ["mon", "allow profile bootstrap-mgr"]}]: dispatch
2017-07-30 00:43:54.708204 mon.0 172.21.15.22:6789/0 31 : audit [INF] from='client.? 172.21.15.22:0/2367782365' entity='mon.' cmd='[{"prefix": "auth get-or-create", "entity": "client.bootstrap-mgr", "caps": ["mon", "allow profile bootstrap-mgr"]}]': finished
[root@smithi022 jspray]# ls -l /var/lib/ceph/bootstrap-mgr/ceph.keyring
-rw-------. 1 root root 71 Aug  8 23:21 /var/lib/ceph/bootstrap-mgr/ceph.keyring

That's a big time gap between when the bootstrap-mgr keys were created, and when the keyring file was written out.

Was that machine blank before the issue was reproduced, or could we be seeing ghosts of something older?
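The gap described above can be quantified with a small sketch. It is only an illustration: the audit line is abbreviated from the zgrep output, and the year of the keyring mtime is an assumption, because `ls -l` printed only "Aug  8 23:21". On a live node the mtime would more likely come from `os.stat()` on the keyring file.

```python
from datetime import datetime


def audit_timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.ffffff' stamp of an audit log line."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S.%f")


# Start of the 'finished' audit entry quoted above (rest of the line omitted).
audit_line = "2017-07-30 00:43:54.708204 mon.0 172.21.15.22:6789/0 31 : audit [INF]"

# Keyring mtime as shown by ls -l; the year 2017 is assumed, ls did not print it.
keyring_mtime = datetime(2017, 8, 8, 23, 21)

gap = keyring_mtime - audit_timestamp(audit_line)
print(f"keyring written {gap.days} days after the auth get-or-create")
```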

#3 Updated by Vasu Kulkarni 4 months ago

Smithi was initially set up as jewel, and after some days I upgraded it to luminous to recreate the issue. I also have mon logs from VPS nodes (which are created fresh):

http://pulpito.ceph.com/vasu-2017-08-08_18:34:23-upgrade-master-distro-basic-vps/

You can search for "unexpected key" in this log: http://qa-proxy.ceph.com/teuthology/vasu-2017-08-08_18:34:23-upgrade-master-distro-basic-vps/1497830/remote/vpm101/log/ceph-mon.vpm101.log.gz
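A rough sketch of that search, for anyone who wants to script it. The regular expression is modelled only on the single "unexpected key" line quoted in the description, so other Ceph versions may format the message differently; `find_unexpected_keys` is a hypothetical helper name, not part of any Ceph tool.

```python
import re

# Pattern based on the one cephx log line in this report (an assumption, not
# an official log format specification).
UNEXPECTED_KEY = re.compile(
    r"cephx server (?P<entity>\S+):\s+unexpected key: "
    r"req\.key=(?P<req>[0-9a-f]+) expected_key=(?P<expected>[0-9a-f]+)"
)


def find_unexpected_keys(lines):
    """Return (entity, req_key, expected_key) for every matching log line."""
    hits = []
    for line in lines:
        match = UNEXPECTED_KEY.search(line)
        if match:
            hits.append(
                (match.group("entity"), match.group("req"), match.group("expected"))
            )
    return hits


# Two lines copied from the mon log in the description.
sample = [
    "2017-08-08 23:21:56.139458 7f96612dd700  0 log_channel(cluster) log [DBG] : mgrmap e38: no daemons active",
    "2017-08-08 23:21:58.234998 7f965c2d3700  0 cephx server client.bootstrap-mgr:  unexpected key: req.key=5dc3ed0bd2defac3 expected_key=32830f9952c9e52",
]
for entity, req_key, expected_key in find_unexpected_keys(sample):
    print(entity, req_key, expected_key)
```

For a downloaded `.log.gz` file, the lines could be supplied with `gzip.open(path, "rt")` from the standard library instead of the inline sample.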

#4 Updated by Vasu Kulkarni 4 months ago

A similar issue was also reported on the ceph-devel list recently: https://www.spinics.net/lists/ceph-devel/msg37911.html

Hi,

Just had a go at this - 12.1.3 from a freshly deployed Jewel (10.2.9) on Ubuntu 16.04, following the notes in http://ceph.com/releases/v12-1-3-luminous-rc-released/

It all worked nicely *except* for the mgr deploy (arrrg - again)! This time it is a new wrinkle: it appears that the bootstrap-mgr auth key does not match the on-disk keyring:

markir@ceph0:~$ sudo cat /var/lib/ceph/bootstrap-mgr/ceph.keyring
[client.bootstrap-mgr]
        key = AQBWP45ZZVsOKRAAQoLg48bT6niU/dI8BmGqJQ==

markir@ceph0:~$ sudo ceph auth get client.bootstrap-mgr
exported keyring for client.bootstrap-mgr
[client.bootstrap-mgr]
        key = AQC7RY5ZsEHRKRAAOhUNB2rc8r/Xdg7xXIAteA==
        caps mon = "allow profile bootstrap-mgr" 

Editing the on disk key to make it match the one 'ceph auth get' shows solves the problem, so pretty simple to work around - if you think to check that file and auth get for differences!

regards

Mark
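The check described in the quoted mail (compare the on-disk keyring with what `ceph auth get` reports) can be sketched as below. This is a minimal illustration that works on the text of both outputs; `keys_by_entity` and `keyring_matches` are hypothetical helper names, and the parsing assumes the simple `[entity]` / `key = ...` layout shown in this report.

```python
import re

# Matches lines such as "        key = AQBWP45Z...==" (layout assumed from
# the keyring output quoted in this report).
KEY_LINE = re.compile(r"^\s*key\s*=\s*(\S+)\s*$")


def keys_by_entity(text):
    """Map each [entity] section in keyring-style text to its key."""
    keys = {}
    entity = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            entity = stripped[1:-1]
            continue
        match = KEY_LINE.match(line)
        if match and entity is not None:
            keys[entity] = match.group(1)
    return keys


def keyring_matches(on_disk_text, auth_get_text, entity):
    """True only if both texts carry the same key for the entity."""
    disk_key = keys_by_entity(on_disk_text).get(entity)
    mon_key = keys_by_entity(auth_get_text).get(entity)
    return disk_key is not None and disk_key == mon_key


# Outputs copied from the mail quoted above.
on_disk = """[client.bootstrap-mgr]
        key = AQBWP45ZZVsOKRAAQoLg48bT6niU/dI8BmGqJQ==
"""
auth_get = """[client.bootstrap-mgr]
        key = AQC7RY5ZsEHRKRAAOhUNB2rc8r/Xdg7xXIAteA==
        caps mon = "allow profile bootstrap-mgr"
"""
print(keyring_matches(on_disk, auth_get, "client.bootstrap-mgr"))
```

With the two keys from the mail the comparison reports a mismatch, which is the condition that precedes the "unexpected key" message in the mon log.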

#5 Updated by Vasu Kulkarni 4 months ago

John,

Here is another run with a recent sha. Could you look into the logs and see if anything is missing?

http://pulpito.ceph.com/vasu-2017-08-22_16:13:55-upgrade-master-distro-basic-vps/1551616/

#7 Updated by John Spray about 2 months ago

  • Status changed from New to Need Review

#8 Updated by John Spray about 1 month ago

  • Status changed from Need Review to Pending Backport
  • Backport set to luminous

#9 Updated by Nathan Cutler about 1 month ago

  • Copied to Backport #22034: luminous: key mismatch for mgr after upgrade from jewel to luminous(dev) added

#10 Updated by Kefu Chai about 1 month ago

  • Status changed from Pending Backport to Resolved
