Bug #5895

closed

Bug #5971: ceph-deploy: ceph-create-keys hung during mon create in dumpling release on centos 6.4

ceph-deploy: mon create command hung on ceph-create-keys in cuttlefish branch on RHEL 6.3

Added by Tamilarasi muthamizhan over 10 years ago. Updated over 10 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: ceph-deploy
Target version: -
% Done: 100%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

On CentOS 6.3, when trying to deploy Ceph using the --stable=cuttlefish branch, the mon create command hung on ceph-create-keys [the firewall is turned off].

Leaving the test setup in its current state for you to take a look at: burnupi27, burnupi28.

[ubuntu@burnupi27 ceph-deploy]$ ./ceph-deploy mon create burnupi27 burnupi28
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts burnupi27 burnupi28
[ceph_deploy.mon][DEBUG ] detecting platform for host burnupi27 ...
[ceph_deploy.mon][INFO  ] distro info: RedHatEnterpriseServer 6.3 Santiago
[burnupi27][DEBUG ] deploying mon to burnupi27
[burnupi27][DEBUG ] remote hostname: burnupi27
[burnupi27][INFO  ] write cluster configuration to /etc/ceph/{cluster}.conf
[burnupi27][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-burnupi27/done
[burnupi27][INFO  ] create a done file to avoid re-doing the mon deployment
[burnupi27][INFO  ] create the init path if it does not exist
[burnupi27][INFO  ] locating `service` executable...
[burnupi27][INFO  ] found `service` executable: /sbin/service
[burnupi27][INFO  ] Running command: /sbin/service ceph start mon.burnupi27
[burnupi27][INFO  ] === mon.burnupi27 === 
[burnupi27][INFO  ] Starting Ceph mon.burnupi27 on burnupi27...
[burnupi27][INFO  ] Starting ceph-create-keys on burnupi27...


Subtasks 1 (0 open, 1 closed)

Bug #5975: Find a real fix for the pushy issue of hanging/deadlocking during long-running processes | Resolved | Alfredo Deza | 08/15/2013

Actions #1

Updated by Tamilarasi muthamizhan over 10 years ago

  • Subject changed from ceph-deploy: mon create command hung on ceph-create-keys in cuttlefish branch on centos 6.3 to ceph-deploy: mon create command hung on ceph-create-keys in cuttlefish branch on RHEL 6.3

Oops, it is not CentOS 6.3, it is RHEL 6.3.

Actions #2

Updated by Sage Weil over 10 years ago

It looks to me like the last command that mon create ran has finished, but pushy still has its connection open. That function does almost nothing except check_call() (which we see ran and completed), but pushy didn't finish the call. Possibly a problem with the close() call?

distro.mon.create(distro, rlogger, args, monitor_keyring)
distro.sudo_conn.close()
Actions #3

Updated by Tamilarasi muthamizhan over 10 years ago

  • Priority changed from Urgent to Immediate

This issue seems to occur on Ubuntu systems as well.

We need this issue resolved so we can test upgrades from cuttlefish to dumpling using ceph-deploy. [Until then, we are using the teuthology install.upgrade task.]

Actions #4

Updated by Alfredo Deza over 10 years ago

  • Status changed from New to 4

Tamil, you mentioned on IRC that you could not reproduce this anymore. Can you confirm that is the case, so we can resolve this ticket?

Actions #5

Updated by Tamilarasi muthamizhan over 10 years ago

Alfredo, I mentioned that the issue is not reproducible on a single node but the problem still exists with 2 nodes.

Actions #6

Updated by Alfredo Deza over 10 years ago

Ugh, I keep failing to replicate :(

The error on the second node is the famous ulimit configuration problem, but that is not related to this issue.

ceph-deploy mon create burnupi27.front.sepia.ceph.com burnupi28.front.sepia.ceph.com
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts burnupi27.front.sepia.ceph.com burnupi28.front.sepia.ceph.com
[ceph_deploy.mon][DEBUG ] detecting platform for host burnupi27.front.sepia.ceph.com ...
[ceph_deploy.mon][INFO  ] distro info: RedHatEnterpriseServer 6.3 Santiago
[burnupi27.front.sepia.ceph.com][DEBUG ] deploying mon to burnupi27.front.sepia.ceph.com
[burnupi27.front.sepia.ceph.com][DEBUG ] remote hostname: burnupi27
[burnupi27.front.sepia.ceph.com][INFO  ] write cluster configuration to /etc/ceph/{cluster}.conf
[burnupi27.front.sepia.ceph.com][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-burnupi27/done
[burnupi27.front.sepia.ceph.com][INFO  ] create a done file to avoid re-doing the mon deployment
[burnupi27.front.sepia.ceph.com][INFO  ] create the init path if it does not exist
[burnupi27.front.sepia.ceph.com][INFO  ] locating `service` executable...
[burnupi27.front.sepia.ceph.com][INFO  ] found `service` executable: /sbin/service
[burnupi27.front.sepia.ceph.com][INFO  ] Running command: /sbin/service ceph start mon.burnupi27
[burnupi27.front.sepia.ceph.com][INFO  ] === mon.burnupi27 ===
[burnupi27.front.sepia.ceph.com][INFO  ] Starting Ceph mon.burnupi27 on burnupi27...
[burnupi27.front.sepia.ceph.com][INFO  ] Starting ceph-create-keys on burnupi27...
[ceph_deploy.mon][DEBUG ] detecting platform for host burnupi28.front.sepia.ceph.com ...
The authenticity of host 'burnupi28.front.sepia.ceph.com (10.214.135.34)' can't be established.
RSA key fingerprint is 74:f1:99:c3:ee:55:45:f0:39:0c:48:ac:f3:09:79:90.
Are you sure you want to continue connecting (yes/no)? yes
[ceph_deploy.mon][INFO  ] distro info: RedHatEnterpriseServer 6.3 Santiago
[burnupi28.front.sepia.ceph.com][DEBUG ] deploying mon to burnupi28.front.sepia.ceph.com
[burnupi28.front.sepia.ceph.com][DEBUG ] remote hostname: burnupi28
[burnupi28.front.sepia.ceph.com][INFO  ] write cluster configuration to /etc/ceph/{cluster}.conf
[burnupi28.front.sepia.ceph.com][INFO  ] creating path: /var/lib/ceph/mon/ceph-burnupi28
[burnupi28.front.sepia.ceph.com][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-burnupi28/done
[burnupi28.front.sepia.ceph.com][DEBUG ] done path does not exist: /var/lib/ceph/mon/ceph-burnupi28/done
[burnupi28.front.sepia.ceph.com][INFO  ] creating keyring file: /var/lib/ceph/tmp/ceph-burnupi28.mon.keyring
[burnupi28.front.sepia.ceph.com][INFO  ] create the monitor keyring file
[burnupi28.front.sepia.ceph.com][INFO  ] Running command: ceph-mon --cluster ceph --mkfs -i burnupi28 --keyring /var/lib/ceph/tmp/ceph-burnupi28.mon.keyring
[burnupi28.front.sepia.ceph.com][INFO  ] ceph-mon: set fsid to 072e8e7b-d417-4c6a-b383-f5e0c8465bbd
[burnupi28.front.sepia.ceph.com][INFO  ] ceph-mon: created monfs at /var/lib/ceph/mon/ceph-burnupi28 for mon.burnupi28
[burnupi28.front.sepia.ceph.com][INFO  ] unlinking keyring file /var/lib/ceph/tmp/ceph-burnupi28.mon.keyring
[burnupi28.front.sepia.ceph.com][INFO  ] create a done file to avoid re-doing the mon deployment
[burnupi28.front.sepia.ceph.com][INFO  ] create the init path if it does not exist
[burnupi28.front.sepia.ceph.com][INFO  ] locating `service` executable...
[burnupi28.front.sepia.ceph.com][INFO  ] found `service` executable: /sbin/service
[burnupi28.front.sepia.ceph.com][INFO  ] Running command: /sbin/service ceph start mon.burnupi28
[burnupi28.front.sepia.ceph.com][ERROR ] Traceback (most recent call last):
[burnupi28.front.sepia.ceph.com][ERROR ]   File "/Users/alfredo/python/ceph-deploy/ceph_deploy/hosts/centos/mon/create.py", line 16, in create
[burnupi28.front.sepia.ceph.com][ERROR ]   File "/Users/alfredo/python/ceph-deploy/ceph_deploy/util/decorators.py", line 10, in inner
[burnupi28.front.sepia.ceph.com][ERROR ]   File "/Users/alfredo/python/ceph-deploy/ceph_deploy/util/wrappers.py", line 6, in remote_call
[burnupi28.front.sepia.ceph.com][ERROR ]   File "/usr/lib64/python2.6/subprocess.py", line 502, in check_call
[burnupi28.front.sepia.ceph.com][ERROR ]     raise CalledProcessError(retcode, cmd)
[burnupi28.front.sepia.ceph.com][ERROR ] CalledProcessError: Command '['/sbin/service', 'ceph', 'start', 'mon.burnupi28']' returned non-zero exit status 1
[burnupi28.front.sepia.ceph.com][INFO  ] === mon.burnupi28 ===
[burnupi28.front.sepia.ceph.com][INFO  ] Starting Ceph mon.burnupi28 on burnupi28...
[burnupi28.front.sepia.ceph.com][INFO  ] failed: 'ulimit -n 32768;  /usr/bin/ceph-mon -i burnupi28 --pid-file /var/run/ceph/mon.burnupi28.pid -c /etc/ceph/ceph.conf '
[burnupi28.front.sepia.ceph.com][INFO  ] Starting ceph-create-keys on burnupi28...
Actions #7

Updated by Alfredo Deza over 10 years ago

  • Status changed from 4 to Fix Under Review

With Sage's suggestion to execute directly on those nodes, I was able to replicate the problem and find a fix.

A pull request was opened: https://github.com/ceph/ceph-deploy/pull/41

Actions #8

Updated by Alfredo Deza over 10 years ago

  • Status changed from Fix Under Review to Resolved

Merged into ceph-deploy master branch: 40c8088e011670d881432a94d8f159355d57d1e5

The problem here is that, by not re-raising when there was an exception, pushy would hang on to the stdout/stderr streams, which, combined with the context manager utility, would create a deadlock when the remote had an exception.

This fix has a second benefit: we will no longer continue executing remote commands once an exception has been raised at some point.
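
The re-raise behavior described in this comment can be sketched roughly as follows. This is only an illustration and not the code from the merged commit: `captured_streams` and `run_steps` are hypothetical names, the capture is done locally with `io.StringIO`, and pushy's real API is not used. The point is the `try`/`finally` around the `yield`: the streams are restored whether or not the block fails, and the exception keeps propagating, so later steps do not run.

```python
import contextlib
import io
import sys


@contextlib.contextmanager
def captured_streams():
    """Hypothetical helper: capture stdout/stderr, always restore them.

    An exception raised inside the `with` block is not swallowed; it
    propagates to the caller after the cleanup in `finally` has run.
    """
    old_out, old_err = sys.stdout, sys.stderr
    out, err = io.StringIO(), io.StringIO()
    sys.stdout, sys.stderr = out, err
    try:
        yield out, err
    finally:
        # Runs on success and on failure, so the streams are never left
        # redirected when a step raises.
        sys.stdout, sys.stderr = old_out, old_err


def run_steps(steps, completed):
    """Run callables in order, recording the names of those that finish.

    The first step that raises aborts the remaining steps.
    """
    with captured_streams():
        for step in steps:
            step()
            completed.append(step.__name__)
```

With steps `[ok, boom, never]`, where `boom` raises, only `ok` is recorded as completed, the exception reaches the caller, and `sys.stdout` is back to its original value afterwards.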

Actions #9

Updated by Tamilarasi muthamizhan over 10 years ago

  • Status changed from Resolved to In Progress

Hitting this issue now on CentOS 6.4: burnupi05, burnupi21.

Alfredo is already on it.

Actions #10

Updated by Alfredo Deza over 10 years ago

I opened a bug in Pushy: https://github.com/axw/pushy/issues/45

Still investigating and trying all kinds of things. So so frustrating :(

Actions #11

Updated by Alfredo Deza over 10 years ago

The only way I see around this (other than pushy fixing this problem) is to avoid capturing the stdout/stderr of the remote end as we are currently doing.

Until pushy gets a fix, I am going to implement a flag that will not capture anything, so that if this problem comes up again we can just pass a 'no-capture' flag.

Very unfortunate because it was nice to be able to capture everything from the remote end.
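
A rough sketch of what such a switch could look like, using only the standard `subprocess` module. This is not ceph-deploy's or pushy's code: the `run` function and its `capture` parameter are hypothetical, and discarding output via `subprocess.DEVNULL` is an assumption made for the illustration. It also shows one general way a capture can block (not a confirmed diagnosis of the pushy bug): when a command leaves a background child holding the captured pipes, the reader does not see end-of-file until that child exits.

```python
import subprocess


def run(cmd, capture=True):
    """Run cmd and return (returncode, stdout, stderr).

    capture=True:  read stdout/stderr through pipes. communicate() waits
                   for end-of-file on both pipes, so it blocks while any
                   process still holds them open.
    capture=False: hypothetical 'no-capture' mode. Output is discarded,
                   so the caller only waits for cmd itself to exit.
    """
    if capture:
        proc = subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE
        )
        out, err = proc.communicate()
        return proc.returncode, out, err

    returncode = subprocess.call(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return returncode, None, None
```

For example, `run(["sh", "-c", "sleep 1 & echo started"])` returns only after the backgrounded `sleep` exits, because it inherited the pipes, while the same command with `capture=False` returns as soon as the shell itself exits. The cost is the one noted above: nothing from the command's output is available to the caller.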

Actions #12

Updated by Tamilarasi muthamizhan over 10 years ago

  • Status changed from In Progress to Duplicate
  • Parent task set to #5971
Actions #13

Updated by Alfredo Deza over 10 years ago

  • Status changed from Duplicate to In Progress
Actions #14

Updated by Alfredo Deza over 10 years ago

  • Status changed from In Progress to Fix Under Review

I have opened a new pull request with some tested changes that fix this problem: https://github.com/ceph/ceph-deploy/pull/44

Actions #15

Updated by Zack Cerza over 10 years ago

I'll merge this pull request but I really want a ticket to stay open reminding us that this needs to be fixed and not just worked around.

Actions #17

Updated by Alfredo Deza over 10 years ago

  • Status changed from Fix Under Review to Resolved