Bug #20950
Status: Closed
key mismatch for mgr after upgrade from jewel to luminous(dev)
Description
1. Set up a cluster with ceph-deploy using the jewel release.
2. Use ceph-deploy to upgrade to luminous (dev=luminous to pick up the latest dev build, which should include the fix for auth create for mgr: https://github.com/ceph/ceph/pull/16395, mentioned in tracker http://tracker.ceph.com/issues/20848).
[ubuntu@smithi022 cd]$ sudo ceph daemon mon.smithi022 version
{"version":"12.1.2-469-ge29c598","release":"luminous","release_type":"rc"}
3. Run mgr create to create the mgr; in the mon log I see the following while creating it:
[ubuntu@smithi022 ~]$ sudo tail -f /var/log/ceph/ceph-mon.smithi022.log
2017-08-08 23:20:55.939986 7f96612dd700 0 log_channel(cluster) log [DBG] : mgrmap e26: no daemons active
2017-08-08 23:21:00.955907 7f96612dd700 0 log_channel(cluster) log [DBG] : mgrmap e27: no daemons active
2017-08-08 23:21:05.989035 7f96612dd700 0 log_channel(cluster) log [DBG] : mgrmap e28: no daemons active
2017-08-08 23:21:10.979647 7f96612dd700 0 log_channel(cluster) log [DBG] : mgrmap e29: no daemons active
2017-08-08 23:21:15.995454 7f96612dd700 0 log_channel(cluster) log [DBG] : mgrmap e30: no daemons active
2017-08-08 23:21:21.011366 7f96612dd700 0 log_channel(cluster) log [DBG] : mgrmap e31: no daemons active
2017-08-08 23:21:26.018917 7f96612dd700 0 log_channel(cluster) log [DBG] : mgrmap e32: no daemons active
2017-08-08 23:21:31.034912 7f96612dd700 0 log_channel(cluster) log [DBG] : mgrmap e33: no daemons active
2017-08-08 23:21:36.050718 7f96612dd700 0 log_channel(cluster) log [DBG] : mgrmap e34: no daemons active
2017-08-08 23:21:41.066701 7f96612dd700 0 log_channel(cluster) log [DBG] : mgrmap e35: no daemons active
2017-08-08 23:21:46.082633 7f96612dd700 0 log_channel(cluster) log [DBG] : mgrmap e36: no daemons active
2017-08-08 23:21:50.530084 7f965ead8700 0 mon.smithi022@0(leader).data_health(5) update_stats avail 93% total 916 GB, used 9432 MB, avail 860 GB
2017-08-08 23:21:51.098663 7f96612dd700 0 log_channel(cluster) log [DBG] : mgrmap e37: no daemons active
2017-08-08 23:21:56.139458 7f96612dd700 0 log_channel(cluster) log [DBG] : mgrmap e38: no daemons active
2017-08-08 23:21:58.234998 7f965c2d3700 0 cephx server client.bootstrap-mgr: unexpected key: req.key=5dc3ed0bd2defac3 expected_key=32830f9952c9e52
2017-08-08 23:22:01.147021 7f96612dd700 0 log_channel(cluster) log [DBG] : mgrmap e39: no daemons active
2017-08-08 23:22:06.162826 7f96612dd700 0 log_channel(cluster) log [DBG] : mgrmap e40: no daemons active
2017-08-08 23:22:11.200540 7f96612dd700 0 log_channel(cluster) log [DBG] : mgrmap e41: no daemons active
2017-08-08 23:22:16.194801 7f96612dd700 0 log_channel(cluster) log [DBG] : mgrmap e42: no daemons active
^C
[ubuntu@smithi022 ~]$ sudo cat /var/lib/ceph/bootstrap-mgr/ceph.keyring
[client.bootstrap-mgr]
	key = AQDKK31ZYcZNKRAANhRFer5Fr0McbuW/QHla1w==
[ubuntu@smithi022 ~]$ sudo ceph auth list
installed auth entries:

osd.0
	key: AQB1LH1ZRBtTIRAAd4PkIVg4FAAwMaJJ3dKaAA==
	caps: [mgr] allow profile osd
	caps: [mon] allow profile osd
	caps: [osd] allow *
osd.1
	key: AQCELH1Z2QXFNxAApvrhJD60mxRpN2FhhNrvHg==
	caps: [mgr] allow profile osd
	caps: [mon] allow profile osd
	caps: [osd] allow *
osd.2
	key: AQCPLH1ZDt0FOBAA4TZrHGJIlDRl+0BF2Sy+MQ==
	caps: [mgr] allow profile osd
	caps: [mon] allow profile osd
	caps: [osd] allow *
osd.3
	key: AQCaLH1ZRhyyOBAAH4XK0+sEUu+lqba+SwEhJg==
	caps: [mgr] allow profile osd
	caps: [mon] allow profile osd
	caps: [osd] allow *
client.admin
	key: AQDIK31ZLGS9GhAA5R2+1pXEWZNR29VYhIjs1A==
	caps: [mds] allow *
	caps: [mgr] allow *
	caps: [mon] allow *
	caps: [osd] allow *
client.bootstrap-mds
	key: AQDJK31ZRWkhDBAAmwRH6NsAw8Bf/VcrfRLIDw==
	caps: [mgr] allow r
	caps: [mon] allow profile bootstrap-mds
client.bootstrap-mgr
	key: AQDaRopZckmUIxAASSuQsbDmggTPrzPIt1NXJg==
	caps: [mon] allow profile bootstrap-mgr
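The mismatch above (the on-disk keyring holds AQDKK31Z… while the monitors hold AQDaRopZ… for client.bootstrap-mgr) can be detected mechanically. Below is a minimal self-contained sketch of that check; on a live node the cluster-side value would come from `sudo ceph auth get-key client.bootstrap-mgr`, but here it is stubbed with the key from the `ceph auth list` output above so the comparison logic stands alone.

```shell
# Compare the key in the bootstrap-mgr keyring file against the key the
# monitors hold. Both values here are taken from the outputs above; the
# cluster_key line is a stand-in for a real `ceph auth get-key` call.

keyring=$(mktemp)   # stands in for /var/lib/ceph/bootstrap-mgr/ceph.keyring
cat > "$keyring" <<'EOF'
[client.bootstrap-mgr]
	key = AQDKK31ZYcZNKRAANhRFer5Fr0McbuW/QHla1w==
EOF

# Extract the key value from the ceph keyring (INI-style) file.
disk_key=$(awk -F' = ' '/^[[:space:]]*key/ {print $2}' "$keyring")

# Stand-in for: cluster_key=$(sudo ceph auth get-key client.bootstrap-mgr)
cluster_key='AQDaRopZckmUIxAASSuQsbDmggTPrzPIt1NXJg=='

if [ "$disk_key" != "$cluster_key" ]; then
    echo "MISMATCH: on-disk key differs from cluster key"
fi
rm -f "$keyring"
```

With the values from this bug the script reports a mismatch, which is exactly the condition that makes the mon log the "cephx server client.bootstrap-mgr: unexpected key" error.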
I also have a teuthology test that fails here for a 4-node run: http://pulpito.ceph.com/vasu-2017-08-08_18:34:23-upgrade-master-distro-basic-vps/
I recreated this on the single node smithi022 as well; feel free to log in to that node. I believe the auth create workaround is not needed here due to the latest sha.
Updated by John Spray over 6 years ago
Hmm, so looking at that smithi022 node, I'm not sure what I'm seeing:
[root@smithi022 jspray]# zgrep bootstrap-mgr /var/log/ceph/ceph.audit.log-20170730.gz
2017-07-30 00:43:54.365900 mon.0 172.21.15.22:6789/0 29 : audit [INF] from='client.? 172.21.15.22:0/627587046' entity='mon.' cmd=[{"prefix": "auth get", "entity": "client.bootstrap-mgr"}]: dispatch
2017-07-30 00:43:54.692779 mon.0 172.21.15.22:6789/0 30 : audit [INF] from='client.? 172.21.15.22:0/2367782365' entity='mon.' cmd=[{"prefix": "auth get-or-create", "entity": "client.bootstrap-mgr", "caps": ["mon", "allow profile bootstrap-mgr"]}]: dispatch
2017-07-30 00:43:54.708204 mon.0 172.21.15.22:6789/0 31 : audit [INF] from='client.? 172.21.15.22:0/2367782365' entity='mon.' cmd='[{"prefix": "auth get-or-create", "entity": "client.bootstrap-mgr", "caps": ["mon", "allow profile bootstrap-mgr"]}]': finished
[root@smithi022 jspray]# ls -l /var/lib/ceph/bootstrap-mgr/ceph.keyring
-rw-------. 1 root root 71 Aug 8 23:21 /var/lib/ceph/bootstrap-mgr/ceph.keyring
That's a big time gap between when the bootstrap-mgr key was created (July 30) and when the keyring file was written out (August 8).
Was that machine blank before the issue was reproduced, or could we be seeing ghosts of something older?
Updated by Vasu Kulkarni over 6 years ago
smithi022 was initially set up as jewel and after some days I upgraded it to luminous to recreate the issue, but I also have mon logs from the VPS nodes (which are created fresh):
http://pulpito.ceph.com/vasu-2017-08-08_18:34:23-upgrade-master-distro-basic-vps/
You can search for "unexpected key" in the mon log here: http://qa-proxy.ceph.com/teuthology/vasu-2017-08-08_18:34:23-upgrade-master-distro-basic-vps/1497830/remote/vpm101/log/ceph-mon.vpm101.log.gz
Updated by Vasu Kulkarni over 6 years ago
A similar issue was also reported on the ceph-devel list recently: https://www.spinics.net/lists/ceph-devel/msg37911.html

Hi,

Just had a go at this - 12.1.3 from a freshly deployed Jewel (10.2.9) on Ubuntu 16.04, following the notes in http://ceph.com/releases/v12-1-3-luminous-rc-released/

It all worked nicely *except* for the mgr deploy (arrrg - again)! This time it is a new wrinkle: it appears that the bootstrap-mgr auth key does not match the on-disk keyring:

markir@ceph0:~$ sudo cat /var/lib/ceph/bootstrap-mgr/ceph.keyring
[client.bootstrap-mgr]
	key = AQBWP45ZZVsOKRAAQoLg48bT6niU/dI8BmGqJQ==

markir@ceph0:~$ sudo ceph auth get client.bootstrap-mgr
exported keyring for client.bootstrap-mgr
[client.bootstrap-mgr]
	key = AQC7RY5ZsEHRKRAAOhUNB2rc8r/Xdg7xXIAteA==
	caps mon = "allow profile bootstrap-mgr"

Editing the on-disk key to make it match the one 'ceph auth get' shows solves the problem, so it is pretty simple to work around - if you think to check that file and auth get for differences!

regards
Mark
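Mark's workaround (replacing the stale on-disk key with the cluster's copy) can be scripted rather than hand-edited. The sketch below is self-contained: `fetch_cluster_keyring` is a hypothetical stub standing in for `sudo ceph auth get client.bootstrap-mgr`, the temp file stands in for /var/lib/ceph/bootstrap-mgr/ceph.keyring, and the key values are the ones from Mark's mail above.

```shell
# Sketch of the workaround: overwrite the stale bootstrap-mgr keyring
# with the cluster's authoritative copy.

fetch_cluster_keyring() {
    # Stub: on a real node this would be `sudo ceph auth get client.bootstrap-mgr`
    printf '[client.bootstrap-mgr]\n\tkey = AQC7RY5ZsEHRKRAAOhUNB2rc8r/Xdg7xXIAteA==\n\tcaps mon = "allow profile bootstrap-mgr"\n'
}

keyring=$(mktemp)   # stands in for /var/lib/ceph/bootstrap-mgr/ceph.keyring
# Seed it with the stale key that was on disk in Mark's report.
printf '[client.bootstrap-mgr]\n\tkey = AQBWP45ZZVsOKRAAQoLg48bT6niU/dI8BmGqJQ==\n' > "$keyring"

# Replace the file with the cluster's version and keep it root-readable only,
# matching the permissions seen on the node (-rw-------).
fetch_cluster_keyring > "$keyring"
chmod 600 "$keyring"
```

On a real node the same effect comes from exporting the entity directly over the file, e.g. `sudo ceph auth get client.bootstrap-mgr -o /var/lib/ceph/bootstrap-mgr/ceph.keyring`, after which `ceph-deploy mgr create` should proceed without the cephx "unexpected key" rejection.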
Updated by Vasu Kulkarni over 6 years ago
John,
Here is another run with a recent sha; could you look into the logs and see if there is anything missing?
http://pulpito.ceph.com/vasu-2017-08-22_16:13:55-upgrade-master-distro-basic-vps/1551616/
Updated by John Spray over 6 years ago
- Status changed from New to Fix Under Review
Updated by John Spray over 6 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to luminous
Updated by Nathan Cutler over 6 years ago
- Copied to Backport #22034: luminous: key mismatch for mgr after upgrade from jewel to luminous(dev) added
Updated by Kefu Chai over 6 years ago
- Status changed from Pending Backport to Resolved