Bug #38941

closed

Error when enabling mgr module 'restful'

Added by Guillaume Abrioux about 5 years ago. Updated almost 5 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version:
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Some teuthology jobs are failing when deploying Ceph with ceph-ansible. The task that enables the 'restful' mgr module exits with a non-zero return code and the following error:

2019-03-25T04:44:05.524 INFO:teuthology.orchestra.run.ovh027.stdout:failed: [ovh028.front.sepia.ceph.com -> ovh010.front.sepia.ceph.com] (item=restful) => {
2019-03-25T04:44:05.524 INFO:teuthology.orchestra.run.ovh027.stdout: "changed": true,
2019-03-25T04:44:05.524 INFO:teuthology.orchestra.run.ovh027.stdout: "cmd": [
2019-03-25T04:44:05.524 INFO:teuthology.orchestra.run.ovh027.stdout: "ceph",
2019-03-25T04:44:05.524 INFO:teuthology.orchestra.run.ovh027.stdout: "--cluster",
2019-03-25T04:44:05.525 INFO:teuthology.orchestra.run.ovh027.stdout: "ceph",
2019-03-25T04:44:05.525 INFO:teuthology.orchestra.run.ovh027.stdout: "mgr",
2019-03-25T04:44:05.525 INFO:teuthology.orchestra.run.ovh027.stdout: "module",
2019-03-25T04:44:05.525 INFO:teuthology.orchestra.run.ovh027.stdout: "enable",
2019-03-25T04:44:05.525 INFO:teuthology.orchestra.run.ovh027.stdout: "restful"
2019-03-25T04:44:05.525 INFO:teuthology.orchestra.run.ovh027.stdout: ],
2019-03-25T04:44:05.525 INFO:teuthology.orchestra.run.ovh027.stdout: "delta": "0:00:00.291293",
2019-03-25T04:44:05.525 INFO:teuthology.orchestra.run.ovh027.stdout: "end": "2019-03-25 04:44:05.477688",
2019-03-25T04:44:05.526 INFO:teuthology.orchestra.run.ovh027.stdout: "item": "restful",
2019-03-25T04:44:05.526 INFO:teuthology.orchestra.run.ovh027.stdout: "rc": 2,
2019-03-25T04:44:05.526 INFO:teuthology.orchestra.run.ovh027.stdout: "start": "2019-03-25 04:44:05.186395"
2019-03-25T04:44:05.526 INFO:teuthology.orchestra.run.ovh027.stdout:}
2019-03-25T04:44:05.526 INFO:teuthology.orchestra.run.ovh027.stdout:
2019-03-25T04:44:05.526 INFO:teuthology.orchestra.run.ovh027.stdout:STDERR:
2019-03-25T04:44:05.526 INFO:teuthology.orchestra.run.ovh027.stdout:
2019-03-25T04:44:05.527 INFO:teuthology.orchestra.run.ovh027.stdout:Error ENOENT: all mgr daemons do not support module 'restful', pass --force to force enablement
2019-03-25T04:44:05.527 INFO:teuthology.orchestra.run.ovh027.stdout:
2019-03-25T04:44:05.527 INFO:teuthology.orchestra.run.ovh027.stdout:
2019-03-25T04:44:05.527 INFO:teuthology.orchestra.run.ovh027.stdout:MSG:
2019-03-25T04:44:05.527 INFO:teuthology.orchestra.run.ovh027.stdout:
2019-03-25T04:44:05.527 INFO:teuthology.orchestra.run.ovh027.stdout:non-zero return code

I couldn't reproduce this issue outside of the teuthology context:

[root@mon0 ~]# ceph mgr module disable restful
[root@mon0 ~]# ceph mgr module enable restful
[root@mon0 ~]# ceph status
  cluster:
    id:     08a81dcb-bd86-406b-93e5-3fafa57ee3b8
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum mon0,mon1,mon2 (age 64m)
    mgr: mgr0(active, since 5s), standbys: mgr1

So I suspect a possible race condition in 'pending_map' here: https://github.com/ceph/ceph/commit/d953198aff3e5aaa6e2eadcf8a53c9e0279a30de#diff-0bd5017ae61455d420a77a69ed75b4d4R626

Is there a case where the cluster would report that the 'restful' module isn't supported by all mgr daemons even though they are all running the same version?

All mons and mgrs are running the same ceph version when this error occurs:

2019-03-25T04:42:44.002 INFO:teuthology.orchestra.run.ovh027.stdout:ok: [ovh042.front.sepia.ceph.com] => {
2019-03-25T04:42:44.002 INFO:teuthology.orchestra.run.ovh027.stdout: "ansible_facts": {
2019-03-25T04:42:44.002 INFO:teuthology.orchestra.run.ovh027.stdout: "ceph_version": "14.2.0-458-g673f1e8"
2019-03-25T04:42:44.002 INFO:teuthology.orchestra.run.ovh027.stdout: },
2019-03-25T04:42:44.002 INFO:teuthology.orchestra.run.ovh027.stdout: "changed": false
2019-03-25T04:42:44.002 INFO:teuthology.orchestra.run.ovh027.stdout:}
2019-03-25T04:42:44.052 INFO:teuthology.orchestra.run.ovh027.stdout:ok: [ovh028.front.sepia.ceph.com] => {
2019-03-25T04:42:44.052 INFO:teuthology.orchestra.run.ovh027.stdout: "ansible_facts": {
2019-03-25T04:42:44.052 INFO:teuthology.orchestra.run.ovh027.stdout: "ceph_version": "14.2.0-458-g673f1e8"
2019-03-25T04:42:44.052 INFO:teuthology.orchestra.run.ovh027.stdout: },
2019-03-25T04:42:44.053 INFO:teuthology.orchestra.run.ovh027.stdout: "changed": false
2019-03-25T04:42:44.053 INFO:teuthology.orchestra.run.ovh027.stdout:}

Actions #1

Updated by Tim Serong about 5 years ago

I'm guessing `ceph mgr module enable restful` is being run very shortly after the ceph mgr starts, before it reports itself as available. We dealt with this in DeepSea by waiting on `test "$(ceph mgr dump | jq .available)" = "true"` in a loop (see https://github.com/SUSE/DeepSea/pull/1563).
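
For illustration, here is a minimal Python sketch of that kind of wait loop. It is not the DeepSea or ceph-ansible implementation. The function names, the timeout and interval defaults, and the use of `--format json` are assumptions made for this example; it only relies on `ceph mgr dump` exposing an `available` field, as in the `jq` check above.

```python
import json
import subprocess
import time


def mgr_dump():
    """Return the parsed output of `ceph mgr dump` (assumes the ceph CLI is on PATH)."""
    out = subprocess.check_output(["ceph", "mgr", "dump", "--format", "json"])
    return json.loads(out)


def wait_for_mgr_available(dump=mgr_dump, timeout=60.0, interval=2.0, sleep=time.sleep):
    """Poll the mgr map until `available` is true; return False if the timeout expires.

    `dump` and `sleep` are injectable (hypothetical parameters for this sketch)
    so the loop can be exercised without a running cluster.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if dump().get("available") is True:
                return True
        except (subprocess.CalledProcessError, OSError, ValueError):
            # The CLI failed or its output was not valid JSON; retry until the deadline.
            pass
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

An orchestration step would call `wait_for_mgr_available()` and only run `ceph mgr module enable restful` once it returns `True`, failing the deployment otherwise.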

Actions #2

Updated by Brad Hubbard about 5 years ago

  • Project changed from Ceph to mgr

Actions #3

Updated by Brad Hubbard about 5 years ago

  • Source set to Development

Actions #4

Updated by Brad Hubbard about 5 years ago

Related to #38853?

Actions #5

Updated by Guillaume Abrioux about 5 years ago

I've implemented the fix suggested by Tim Serong in c1 to deal with this issue at the ceph-ansible level.

I guess we can close this, since it looks more like an orchestration issue than a Ceph bug.

Actions #6

Updated by Sebastian Wagner almost 5 years ago

  • Status changed from New to Closed

Closed accordingly.
