Bug #38941
Error when enabling mgr module 'restful' (status: Closed)
Description
Some jobs in teuthology appear to be failing when deploying Ceph with ceph-ansible, with the following error:
2019-03-25T04:44:05.524 INFO:teuthology.orchestra.run.ovh027.stdout:failed: [ovh028.front.sepia.ceph.com -> ovh010.front.sepia.ceph.com] (item=restful) => {
2019-03-25T04:44:05.524 INFO:teuthology.orchestra.run.ovh027.stdout: "changed": true,
2019-03-25T04:44:05.524 INFO:teuthology.orchestra.run.ovh027.stdout: "cmd": [
2019-03-25T04:44:05.524 INFO:teuthology.orchestra.run.ovh027.stdout: "ceph",
2019-03-25T04:44:05.524 INFO:teuthology.orchestra.run.ovh027.stdout: "--cluster",
2019-03-25T04:44:05.525 INFO:teuthology.orchestra.run.ovh027.stdout: "ceph",
2019-03-25T04:44:05.525 INFO:teuthology.orchestra.run.ovh027.stdout: "mgr",
2019-03-25T04:44:05.525 INFO:teuthology.orchestra.run.ovh027.stdout: "module",
2019-03-25T04:44:05.525 INFO:teuthology.orchestra.run.ovh027.stdout: "enable",
2019-03-25T04:44:05.525 INFO:teuthology.orchestra.run.ovh027.stdout: "restful"
2019-03-25T04:44:05.525 INFO:teuthology.orchestra.run.ovh027.stdout: ],
2019-03-25T04:44:05.525 INFO:teuthology.orchestra.run.ovh027.stdout: "delta": "0:00:00.291293",
2019-03-25T04:44:05.525 INFO:teuthology.orchestra.run.ovh027.stdout: "end": "2019-03-25 04:44:05.477688",
2019-03-25T04:44:05.526 INFO:teuthology.orchestra.run.ovh027.stdout: "item": "restful",
2019-03-25T04:44:05.526 INFO:teuthology.orchestra.run.ovh027.stdout: "rc": 2,
2019-03-25T04:44:05.526 INFO:teuthology.orchestra.run.ovh027.stdout: "start": "2019-03-25 04:44:05.186395"
2019-03-25T04:44:05.526 INFO:teuthology.orchestra.run.ovh027.stdout:}
2019-03-25T04:44:05.526 INFO:teuthology.orchestra.run.ovh027.stdout:
2019-03-25T04:44:05.526 INFO:teuthology.orchestra.run.ovh027.stdout:STDERR:
2019-03-25T04:44:05.526 INFO:teuthology.orchestra.run.ovh027.stdout:
2019-03-25T04:44:05.527 INFO:teuthology.orchestra.run.ovh027.stdout:Error ENOENT: all mgr daemons do not support module 'restful', pass --force to force enablement
2019-03-25T04:44:05.527 INFO:teuthology.orchestra.run.ovh027.stdout:
2019-03-25T04:44:05.527 INFO:teuthology.orchestra.run.ovh027.stdout:
2019-03-25T04:44:05.527 INFO:teuthology.orchestra.run.ovh027.stdout:MSG:
2019-03-25T04:44:05.527 INFO:teuthology.orchestra.run.ovh027.stdout:
2019-03-25T04:44:05.527 INFO:teuthology.orchestra.run.ovh027.stdout:non-zero return code
I couldn't reproduce this issue outside of a teuthology context:
[root@mon0 ~]# ceph mgr module disable restful
[root@mon0 ~]# ceph mgr module enable restful
[root@mon0 ~]# ceph status
  cluster:
    id:     08a81dcb-bd86-406b-93e5-3fafa57ee3b8
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum mon0,mon1,mon2 (age 64m)
    mgr: mgr0(active, since 5s), standbys: mgr1
So I'm thinking of a possible race condition in 'pending_map' here: https://github.com/ceph/ceph/commit/d953198aff3e5aaa6e2eadcf8a53c9e0279a30de#diff-0bd5017ae61455d420a77a69ed75b4d4R626
Is there a case where the cluster would report that the 'restful' module isn't available on all mgr daemons, even though they are all running the same version?
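One way to probe this question is to look at which modules the MgrMap currently advertises. A minimal sketch, assuming the Nautilus-era `ceph mgr dump` JSON in which `available_modules` is a list of objects with a `name` field; `mgr_lists_module` is a hypothetical helper name, not part of Ceph. It checks only the active mgr's entry, not the standbys.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ceph): succeed if the active mgr's entry
# in the current MgrMap lists the given module under "available_modules".
# Assumes each entry is a JSON object with a "name" field (Nautilus-era dump).
mgr_lists_module() {
  local module="$1"
  # Print one module name per line, then look for an exact, whole-line match.
  ceph mgr dump | jq -r '.available_modules[].name' | grep -qx -- "$module"
}
```

For example, running `mgr_lists_module restful || echo 'restful not listed yet'` right after a mgr (re)start might show whether the module list is still empty at that moment. This is a guess at how to observe the suspected race, not a confirmed reproduction.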
All mons and mgrs are running the same ceph version when this error occurs:
2019-03-25T04:42:44.002 INFO:teuthology.orchestra.run.ovh027.stdout:ok: [ovh042.front.sepia.ceph.com] => {
2019-03-25T04:42:44.002 INFO:teuthology.orchestra.run.ovh027.stdout: "ansible_facts": {
2019-03-25T04:42:44.002 INFO:teuthology.orchestra.run.ovh027.stdout: "ceph_version": "14.2.0-458-g673f1e8"
2019-03-25T04:42:44.002 INFO:teuthology.orchestra.run.ovh027.stdout: },
2019-03-25T04:42:44.002 INFO:teuthology.orchestra.run.ovh027.stdout: "changed": false
2019-03-25T04:42:44.002 INFO:teuthology.orchestra.run.ovh027.stdout:}
2019-03-25T04:42:44.052 INFO:teuthology.orchestra.run.ovh027.stdout:ok: [ovh028.front.sepia.ceph.com] => {
2019-03-25T04:42:44.052 INFO:teuthology.orchestra.run.ovh027.stdout: "ansible_facts": {
2019-03-25T04:42:44.052 INFO:teuthology.orchestra.run.ovh027.stdout: "ceph_version": "14.2.0-458-g673f1e8"
2019-03-25T04:42:44.052 INFO:teuthology.orchestra.run.ovh027.stdout: },
2019-03-25T04:42:44.053 INFO:teuthology.orchestra.run.ovh027.stdout: "changed": false
2019-03-25T04:42:44.053 INFO:teuthology.orchestra.run.ovh027.stdout:}
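The same check can also be made from the cluster side instead of from Ansible facts. A minimal sketch; `mgr_versions_match` is a hypothetical helper name. It relies on `ceph versions`, which prints a JSON object keyed by daemon type that maps each version string to a daemon count, so exactly one key under `mgr` means every running mgr reports the same version.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ceph): succeed only if all running mgr
# daemons report the same version. `ceph versions` groups daemons by type
# ("mon", "mgr", "osd", ...) and, within a type, by version string.
mgr_versions_match() {
  # One distinct version string under "mgr" means the mgrs agree.
  [ "$(ceph versions | jq '.mgr | length')" = "1" ]
}
```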
Updated by Tim Serong about 5 years ago
I'm guessing `ceph mgr module enable restful` is being run very shortly after the ceph mgr starts. We dealt with this in DeepSea by waiting in a loop on `test "$(ceph mgr dump | jq .available)" = "true"` (see https://github.com/SUSE/DeepSea/pull/1563).
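The wait described above can be sketched as a small shell function. This is a minimal sketch, not the actual DeepSea or ceph-ansible code: the name `wait_for_mgr_available` and the default retry count and delay are assumptions. It polls `ceph mgr dump | jq .available` until it prints `true`, and returns a non-zero status after the given number of attempts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not DeepSea/ceph-ansible code): poll the MgrMap until
# an active mgr is reported as available, or give up after N attempts.
wait_for_mgr_available() {
  local retries="${1:-30}"   # maximum number of polls (assumed default)
  local delay="${2:-2}"      # seconds to sleep between polls (assumed default)
  local i
  for ((i = 0; i < retries; i++)); do
    # `ceph mgr dump` prints the MgrMap as JSON; `.available` is true once
    # an active mgr is up. Any error output simply counts as "not yet".
    if [ "$(ceph mgr dump | jq .available)" = "true" ]; then
      return 0
    fi
    sleep "$delay"
  done
  return 1   # mgr never reported available within the retry budget
}
```

A caller would then gate the enable step on it, e.g. `wait_for_mgr_available 30 2 && ceph mgr module enable restful`. Note that `.available` only says an active mgr is up; whether it always implies the mgr has already reported its module list is an assumption this sketch does not check.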
Updated by Guillaume Abrioux about 5 years ago
I've implemented the fix suggested by Tim Serong in c1 to deal with this issue at the ceph-ansible level.
I guess we can close this since it looks more like an orchestration issue.
Updated by Sebastian Wagner almost 5 years ago
- Status changed from New to Closed
Closed accordingly.