Bug #16477

ceph cli: Rados object in state configuring race

Added by Stefan Rubner over 1 year ago. Updated about 2 months ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
Start date: 06/25/2016
Due date:
% Done: 0%
Source: other
Tags:
Backport: jewel
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite: ceph-deploy
Release: jewel
Needs Doc: No

Description

On a clean install of the Jewel 10.2.2 packages on Ubuntu 16.04, I'm running into the same problem as described in #16379.

$ sudo dpkg -l | grep ceph
ii  ceph                                 10.2.2-1xenial                  amd64        distributed storage and file system
ii  ceph-base                            10.2.2-1xenial                  amd64        common ceph daemon libraries and management tools
ii  ceph-common                          10.2.2-1xenial                  amd64        common utilities to mount and interact with a ceph storage cluster
ii  ceph-deploy                          1.5.34                          all          Ceph-deploy is an easy to use configuration tool
ii  ceph-mds                             10.2.2-1xenial                  amd64        metadata server for the ceph distributed file system
ii  ceph-mon                             10.2.2-1xenial                  amd64        monitor server for the ceph storage system
ii  ceph-osd                             10.2.2-1xenial                  amd64        OSD server for the ceph storage system
ii  libcephfs1                           10.2.2-1xenial                  amd64        Ceph distributed file system client library
ii  python-cephfs                        10.2.2-1xenial                  amd64        Python libraries for the Ceph libcephfs library

Running ceph-deploy mon create-initial results in:

[2016-06-24 16:03:45,628][ceph_deploy.conf][DEBUG ] found configuration file at: /home/cephir/.cephdeploy.conf
[2016-06-24 16:03:45,629][ceph_deploy.cli][INFO  ] Invoked (1.5.34): /usr/bin/ceph-deploy mon create-initial
[2016-06-24 16:03:45,629][ceph_deploy.cli][INFO  ] ceph-deploy options:
[2016-06-24 16:03:45,629][ceph_deploy.cli][INFO  ]  username                      : None
[2016-06-24 16:03:45,629][ceph_deploy.cli][INFO  ]  verbose                       : False
[2016-06-24 16:03:45,629][ceph_deploy.cli][INFO  ]  overwrite_conf                : False
[2016-06-24 16:03:45,630][ceph_deploy.cli][INFO  ]  subcommand                    : create-initial
[2016-06-24 16:03:45,630][ceph_deploy.cli][INFO  ]  quiet                         : False
[2016-06-24 16:03:45,630][ceph_deploy.cli][INFO  ]  cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7fe6fd4decf8>
[2016-06-24 16:03:45,630][ceph_deploy.cli][INFO  ]  cluster                       : ceph
[2016-06-24 16:03:45,630][ceph_deploy.cli][INFO  ]  func                          : <function mon at 0x7fe6fd4bf140>
[2016-06-24 16:03:45,630][ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[2016-06-24 16:03:45,631][ceph_deploy.cli][INFO  ]  keyrings                      : None
[2016-06-24 16:03:45,631][ceph_deploy.cli][INFO  ]  default_release               : False
[2016-06-24 16:03:45,632][ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts ceph-mon-01
[2016-06-24 16:03:45,632][ceph_deploy.mon][DEBUG ] detecting platform for host ceph-mon-01 ...
[2016-06-24 16:03:46,037][ceph-mon-01][DEBUG ] connection detected need for sudo
[2016-06-24 16:03:46,404][ceph-mon-01][DEBUG ] connected to host: ceph-mon-01 
[2016-06-24 16:03:46,405][ceph-mon-01][DEBUG ] detect platform information from remote host
[2016-06-24 16:03:46,467][ceph-mon-01][DEBUG ] detect machine type
[2016-06-24 16:03:46,474][ceph-mon-01][DEBUG ] find the location of an executable
[2016-06-24 16:03:46,475][ceph_deploy.mon][INFO  ] distro info: Ubuntu 16.04 xenial
[2016-06-24 16:03:46,476][ceph-mon-01][DEBUG ] determining if provided host has same hostname in remote
[2016-06-24 16:03:46,476][ceph-mon-01][DEBUG ] get remote short hostname
[2016-06-24 16:03:46,477][ceph-mon-01][DEBUG ] deploying mon to ceph-mon-01
[2016-06-24 16:03:46,478][ceph-mon-01][DEBUG ] get remote short hostname
[2016-06-24 16:03:46,479][ceph-mon-01][DEBUG ] remote hostname: ceph-mon-01
[2016-06-24 16:03:46,482][ceph-mon-01][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[2016-06-24 16:03:46,485][ceph-mon-01][DEBUG ] create the mon path if it does not exist
[2016-06-24 16:03:46,487][ceph-mon-01][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-ceph-mon-01/done
[2016-06-24 16:03:46,488][ceph-mon-01][DEBUG ] create a done file to avoid re-doing the mon deployment
[2016-06-24 16:03:46,489][ceph-mon-01][DEBUG ] create the init path if it does not exist
[2016-06-24 16:03:46,493][ceph-mon-01][INFO  ] Running command: sudo systemctl enable ceph.target
[2016-06-24 16:03:46,617][ceph-mon-01][INFO  ] Running command: sudo systemctl enable ceph-mon@ceph-mon-01
[2016-06-24 16:03:46,788][ceph-mon-01][INFO  ] Running command: sudo systemctl start ceph-mon@ceph-mon-01
[2016-06-24 16:03:48,814][ceph-mon-01][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon-01.asok mon_status
[2016-06-24 16:03:48,980][ceph-mon-01][DEBUG ] ********************************************************************************
[2016-06-24 16:03:48,980][ceph-mon-01][DEBUG ] status for monitor: mon.ceph-mon-01
[2016-06-24 16:03:48,981][ceph-mon-01][DEBUG ] {
[2016-06-24 16:03:48,981][ceph-mon-01][DEBUG ]   "election_epoch": 3, 
[2016-06-24 16:03:48,981][ceph-mon-01][DEBUG ]   "extra_probe_peers": [
[2016-06-24 16:03:48,981][ceph-mon-01][DEBUG ]     "172.29.50.231:6789/0" 
[2016-06-24 16:03:48,981][ceph-mon-01][DEBUG ]   ], 
[2016-06-24 16:03:48,981][ceph-mon-01][DEBUG ]   "monmap": {
[2016-06-24 16:03:48,982][ceph-mon-01][DEBUG ]     "created": "2016-06-24 16:02:37.367266", 
[2016-06-24 16:03:48,982][ceph-mon-01][DEBUG ]     "epoch": 1, 
[2016-06-24 16:03:48,982][ceph-mon-01][DEBUG ]     "fsid": "76849e1f-1002-4add-ab2e-a8da7d163ed0", 
[2016-06-24 16:03:48,982][ceph-mon-01][DEBUG ]     "modified": "2016-06-24 16:02:37.367266", 
[2016-06-24 16:03:48,982][ceph-mon-01][DEBUG ]     "mons": [
[2016-06-24 16:03:48,982][ceph-mon-01][DEBUG ]       {
[2016-06-24 16:03:48,983][ceph-mon-01][DEBUG ]         "addr": "172.28.50.231:6789/0", 
[2016-06-24 16:03:48,983][ceph-mon-01][DEBUG ]         "name": "ceph-mon-01", 
[2016-06-24 16:03:48,983][ceph-mon-01][DEBUG ]         "rank": 0
[2016-06-24 16:03:48,983][ceph-mon-01][DEBUG ]       }
[2016-06-24 16:03:48,983][ceph-mon-01][DEBUG ]     ]
[2016-06-24 16:03:48,983][ceph-mon-01][DEBUG ]   }, 
[2016-06-24 16:03:48,984][ceph-mon-01][DEBUG ]   "name": "ceph-mon-01", 
[2016-06-24 16:03:48,984][ceph-mon-01][DEBUG ]   "outside_quorum": [], 
[2016-06-24 16:03:48,984][ceph-mon-01][DEBUG ]   "quorum": [
[2016-06-24 16:03:48,984][ceph-mon-01][DEBUG ]     0
[2016-06-24 16:03:48,984][ceph-mon-01][DEBUG ]   ], 
[2016-06-24 16:03:48,984][ceph-mon-01][DEBUG ]   "rank": 0, 
[2016-06-24 16:03:48,985][ceph-mon-01][DEBUG ]   "state": "leader", 
[2016-06-24 16:03:48,985][ceph-mon-01][DEBUG ]   "sync_provider": []
[2016-06-24 16:03:48,985][ceph-mon-01][DEBUG ] }
[2016-06-24 16:03:48,985][ceph-mon-01][DEBUG ] ********************************************************************************
[2016-06-24 16:03:48,985][ceph-mon-01][INFO  ] monitor: mon.ceph-mon-01 is running
[2016-06-24 16:03:48,988][ceph-mon-01][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon-01.asok mon_status
[2016-06-24 16:03:49,154][ceph_deploy.mon][INFO  ] processing monitor mon.ceph-mon-01
[2016-06-24 16:03:49,557][ceph-mon-01][DEBUG ] connection detected need for sudo
[2016-06-24 16:03:49,932][ceph-mon-01][DEBUG ] connected to host: ceph-mon-01 
[2016-06-24 16:03:49,933][ceph-mon-01][DEBUG ] detect platform information from remote host
[2016-06-24 16:03:49,997][ceph-mon-01][DEBUG ] detect machine type
[2016-06-24 16:03:50,004][ceph-mon-01][DEBUG ] find the location of an executable
[2016-06-24 16:03:50,009][ceph-mon-01][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon-01.asok mon_status
[2016-06-24 16:03:50,176][ceph_deploy.mon][INFO  ] mon.ceph-mon-01 monitor has reached quorum!
[2016-06-24 16:03:50,176][ceph_deploy.mon][INFO  ] all initial monitors are running and have formed quorum
[2016-06-24 16:03:50,176][ceph_deploy.mon][INFO  ] Running gatherkeys...
[2016-06-24 16:03:50,178][ceph_deploy.gatherkeys][INFO  ] Storing keys in temp directory /tmp/tmpkoz_QK
[2016-06-24 16:03:50,533][ceph-mon-01][DEBUG ] connection detected need for sudo
[2016-06-24 16:03:50,888][ceph-mon-01][DEBUG ] connected to host: ceph-mon-01 
[2016-06-24 16:03:50,889][ceph-mon-01][DEBUG ] detect platform information from remote host
[2016-06-24 16:03:50,949][ceph-mon-01][DEBUG ] detect machine type
[2016-06-24 16:03:50,955][ceph-mon-01][DEBUG ] get remote short hostname
[2016-06-24 16:03:50,957][ceph-mon-01][DEBUG ] fetch remote file
[2016-06-24 16:03:50,960][ceph-mon-01][INFO  ] Running command: sudo /usr/bin/ceph --connect-timeout=25 --cluster=ceph --admin-daemon=/var/run/ceph/ceph-mon.ceph-mon-01.asok mon_status
[2016-06-24 16:03:51,129][ceph-mon-01][INFO  ] Running command: sudo /usr/bin/ceph --connect-timeout=25 --cluster=ceph --name mon. --keyring=/var/lib/ceph/mon/ceph-ceph-mon-01/keyring auth get-or-create client.admin osd allow * mds allow * mon allow *
[2016-06-24 16:04:16,286][ceph-mon-01][ERROR ] "ceph auth get-or-create for keytype admin returned 1
[2016-06-24 16:04:16,286][ceph-mon-01][DEBUG ] 2016-06-24 16:03:51.257412 7f8293d22700  0 -- :/2431738263 >> 172.29.50.231:6789/0 pipe(0x7f8298059be0 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8298059a20).fault
[2016-06-24 16:04:16,286][ceph-mon-01][DEBUG ] 2016-06-24 16:03:54.257802 7f8293c21700  0 -- :/2431738263 >> 172.29.50.231:6789/0 pipe(0x7f8288000cc0 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8288002000).fault
[2016-06-24 16:04:16,286][ceph-mon-01][DEBUG ] 2016-06-24 16:03:57.259150 7f8293d22700  0 -- :/2431738263 >> 172.29.50.231:6789/0 pipe(0x7f82880052c0 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f82880065a0).fault
[2016-06-24 16:04:16,286][ceph-mon-01][DEBUG ] 2016-06-24 16:04:00.258997 7f8293c21700  0 -- :/2431738263 >> 172.29.50.231:6789/0 pipe(0x7f8288000cc0 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f82880024d0).fault
[2016-06-24 16:04:16,287][ceph-mon-01][DEBUG ] 2016-06-24 16:04:03.259256 7f8293d22700  0 -- :/2431738263 >> 172.29.50.231:6789/0 pipe(0x7f82880052c0 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8288002ff0).fault
[2016-06-24 16:04:16,287][ceph-mon-01][DEBUG ] 2016-06-24 16:04:06.259529 7f8293c21700  0 -- :/2431738263 >> 172.29.50.231:6789/0 pipe(0x7f8288000cc0 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8288003610).fault
[2016-06-24 16:04:16,287][ceph-mon-01][DEBUG ] 2016-06-24 16:04:09.260115 7f8293d22700  0 -- :/2431738263 >> 172.29.50.231:6789/0 pipe(0x7f82880052c0 sd=6 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8288004340).fault
[2016-06-24 16:04:16,287][ceph-mon-01][DEBUG ] 2016-06-24 16:04:12.260518 7f8293c21700  0 -- :/2431738263 >> 172.29.50.231:6789/0 pipe(0x7f8288000cc0 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8288008fd0).fault
[2016-06-24 16:04:16,287][ceph-mon-01][DEBUG ] 2016-06-24 16:04:15.260927 7f8293d22700  0 -- :/2431738263 >> 172.29.50.231:6789/0 pipe(0x7f82880052c0 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8288009bb0).fault
[2016-06-24 16:04:16,287][ceph-mon-01][DEBUG ] Traceback (most recent call last):
[2016-06-24 16:04:16,287][ceph-mon-01][DEBUG ]   File "/usr/bin/ceph", line 948, in <module>
[2016-06-24 16:04:16,288][ceph-mon-01][DEBUG ]     retval = main()
[2016-06-24 16:04:16,288][ceph-mon-01][DEBUG ]   File "/usr/bin/ceph", line 852, in main
[2016-06-24 16:04:16,288][ceph-mon-01][DEBUG ]     prefix='get_command_descriptions')
[2016-06-24 16:04:16,288][ceph-mon-01][DEBUG ]   File "/usr/lib/python2.7/dist-packages/ceph_argparse.py", line 1291, in json_command
[2016-06-24 16:04:16,288][ceph-mon-01][DEBUG ]     raise RuntimeError('"{0}": exception {1}'.format(argdict, e))
[2016-06-24 16:04:16,288][ceph-mon-01][DEBUG ] RuntimeError: "None": exception "['{"prefix": "get_command_descriptions"}']": exception You cannot perform that operation on a Rados object in state configuring.
[2016-06-24 16:04:16,290][ceph_deploy.gatherkeys][ERROR ] Failed to connect to host:ceph-mon-01
[2016-06-24 16:04:16,290][ceph_deploy.gatherkeys][INFO  ] Destroy temp directory /tmp/tmpkoz_QK
[2016-06-24 16:04:16,290][ceph_deploy][ERROR ] RuntimeError: Failed to connect any mon

log.gz - ceph-helpers test log (55.3 KB) Loic Dachary, 09/13/2016 06:42 AM


Related issues

Copied to Ceph - Backport #17385: jewel: ceph cli: Rados object in state configuring race Resolved

History

#1 Updated by Loic Dachary about 1 year ago

  • File log.gz added
  • Subject changed from Fresh install of Jewel 10.2.2 fails on Ubuntu 1604 due to #16379 to ceph cli: Rados object in state configuring race
  • Status changed from New to Verified

I think this is a race condition. It also happened during a test run and appears to be rare.

/home/jenkins-build/build/workspace/ceph-pull-requests/qa/workunits/ceph-helpers.sh:217: test_kill_daemon:  ceph --connect-timeout 60 status
ceph-mon: mon.noname-a 127.0.0.1:7109/0 is local, renaming to mon.a
ceph-mon: set fsid to 4678a81b-ece6-4d52-a5e1-5bbc64007ea4
Traceback (most recent call last):
  File "/home/jenkins-build/build/workspace/ceph-pull-requests/build/bin/ceph", line 949, in <module>
    retval = main()
  File "/home/jenkins-build/build/workspace/ceph-pull-requests/build/bin/ceph", line 853, in main
    prefix='get_command_descriptions')
  File "/home/jenkins-build/build/workspace/ceph-pull-requests/src/pybind/ceph_argparse.py", line 1312, in json_command
    raise RuntimeError('"{0}": exception {1}'.format(argdict, e))
RuntimeError: "None": exception "['{"prefix": "get_command_descriptions"}']": exception You cannot perform that operation on a Rados object in state configuring.
/home/jenkins-build/build/workspace/ceph-pull-requests/qa/workunits/ceph-helpers.sh:220: test_kill_daemon:  teardown testdir/ceph-helpers
/home/jenkins-build/build/workspace/ceph-pull-requests/qa/workunits/ceph-helpers.sh:118: teardown:  local dir=testdir/ceph-helpers
/home/jenkins-build/build/workspace/ceph-pull-requests/qa/workunits/ceph-helpers.sh:119: teardown:  kill_daemons testdir/ceph-helpers KILL
//home/jenkins-build/build/workspace/ceph-pull-requests/qa/workunits/ceph-helpers.sh:252: kill_daemons:  shopt -q -o xtrace

When it happens, the ceph command appears to exit with a success status even though it should report an error, but that's a detail.
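The report does not pin down which code path leaves the handle in the `configuring` state. As a rough illustration only, here is a hypothetical Python model (the `ceph` CLI in the tracebacks is itself a Python script) of one way such a race can happen: `connect()` runs in a worker thread with a bounded wait, and the caller then issues a command without checking whether the connection actually completed. `FakeClusterHandle`, `StateError`, `connect_with_timeout`, `racy_main` and `guarded_main` are illustrative names, not Ceph or librados APIs; only the error text is copied from the logs above.

```python
import threading
import time


class StateError(Exception):
    """Raised when a handle is used before it reaches the required state."""


class FakeClusterHandle:
    """Hypothetical stand-in for a cluster handle (NOT the rados binding).

    It stays in state "configuring" until connect() returns, mimicking a
    slow monitor handshake with a sleep.
    """

    def __init__(self, connect_delay):
        self.state = "configuring"
        self._connect_delay = connect_delay

    def connect(self):
        time.sleep(self._connect_delay)  # simulated handshake latency
        self.state = "connected"

    def command(self, prefix):
        # Refuse to run commands until connect() has finished.
        if self.state != "connected":
            raise StateError(
                "You cannot perform that operation on a Rados object "
                "in state %s." % self.state
            )
        return "ok: " + prefix


def connect_with_timeout(handle, timeout):
    """Run handle.connect() in a worker thread, waiting at most `timeout` s.

    Returns True only if connect() actually finished. Thread.join(timeout)
    returns silently when the timeout expires, so the caller must check.
    """
    worker = threading.Thread(target=handle.connect)
    worker.daemon = True  # do not block interpreter exit on a hung connect
    worker.start()
    worker.join(timeout)
    return not worker.is_alive()


def racy_main(handle, timeout):
    """Ignores the connect result, so it may use a half-initialized handle."""
    connect_with_timeout(handle, timeout)
    try:
        print(handle.command("get_command_descriptions"))
    except StateError as e:
        print("error:", e)
    return 0  # exit status claims success even when the command failed


def guarded_main(handle, timeout):
    """Refuses to issue commands unless connect() completed in time."""
    if not connect_with_timeout(handle, timeout):
        print("error: timed out connecting to the cluster")
        return 1
    print(handle.command("get_command_descriptions"))
    return 0


if __name__ == "__main__":
    # A 0.5 s simulated handshake against a 0.1 s wait triggers the race.
    print(racy_main(FakeClusterHandle(connect_delay=0.5), timeout=0.1))
    print(guarded_main(FakeClusterHandle(connect_delay=0.5), timeout=0.1))
    print(guarded_main(FakeClusterHandle(connect_delay=0.0), timeout=1.0))
```

In this model, `racy_main` fails with the same message as the logs and still returns 0, which mirrors the success exit status observed in the test; `guarded_main` checks whether the worker thread finished and returns 1 instead. This sketches the failure mode under the stated assumptions; it is not taken from the patch that resolved this issue.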

#2 Updated by Loic Dachary about 1 year ago

  • Status changed from Verified to Need Review

#3 Updated by Loic Dachary about 1 year ago

  • Backport set to jewel

#4 Updated by Kefu Chai about 1 year ago

  • Status changed from Need Review to Pending Backport
  • Assignee set to Loic Dachary
  • Needs Doc set to No

#5 Updated by Nathan Cutler about 1 year ago

  • Copied to Backport #17385: jewel: ceph cli: Rados object in state configuring race added

#6 Updated by Nathan Cutler about 2 months ago

  • Status changed from Pending Backport to Resolved
