Bug #5390: ceph-deploy osd create hangs
Status: Closed
Description
On Ubuntu 13.04 with ceph 0.61.3, ceph-deploy hangs when creating a new OSD:
ceph@ceph-node4:~/mycluster$ ceph-deploy disk zap ceph-node4:sdd
ceph@ceph-node4:~/mycluster$ ceph-deploy disk zap ceph-node4:sdb
ceph@ceph-node4:~/mycluster$ ceph-deploy osd create ceph-node4:sdb:sdd
^CTraceback (most recent call last):
  File "/home/ceph/ceph-deploy/ceph-deploy", line 9, in <module>
    load_entry_point('ceph-deploy==0.1', 'console_scripts', 'ceph-deploy')()
  File "/home/ceph/ceph-deploy/ceph_deploy/cli.py", line 112, in main
    return args.func(args)
  File "/home/ceph/ceph-deploy/ceph_deploy/osd.py", line 425, in osd
    prepare(args, cfg, activate_prepared_disk=True)
  File "/home/ceph/ceph-deploy/ceph_deploy/osd.py", line 265, in prepare
    dmcrypt_dir=args.dmcrypt_key_dir,
  File "/home/ceph/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/protocol/proxy.py", line 255, in <lambda>
    (conn.operator(type_, self, args, kwargs))
  File "/home/ceph/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/protocol/connection.py", line 66, in operator
    return self.send_request(type_, (object, args, kwargs))
  File "/home/ceph/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/protocol/baseconnection.py", line 315, in send_request
    m = self.__waitForResponse(handler)
  File "/home/ceph/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/protocol/baseconnection.py", line 412, in _waitForResponse
    self._processing_condition.wait()
  File "/usr/lib/python2.7/threading.py", line 339, in wait
    waiter.acquire()
ps aux | grep ceph
ceph 4015 0.0 1.1 118412 11404 pts/1 Sl+ 20:51 0:00 /home/ceph/ceph-deploy/virtualenv/bin/python /home/ceph/ceph-deploy/ceph-deploy osd create ceph-node4:sdb:sdd
root 4043 0.0 0.0 4444 628 pts/1 S+ 20:51 0:00 /bin/sh /usr/sbin/ceph-disk-prepare -- /dev/sdb /dev/sdd
root 4049 0.1 0.9 43216 9876 pts/1 S+ 20:51 0:00 /usr/bin/python /usr/sbin/ceph-disk prepare -- /dev/sdb /dev/sdd
As I said on the mailing list, the root cause has been found:
Previously I ran "ceph-deploy osd create ceph-node4:sdb" by mistake and terminated it with Ctrl-C. As a result, the lock on /var/lib/ceph/tmp/ceph-disk.prepare.lock.lock was never released, so the next "ceph-deploy osd create" hung waiting for that lock.
It's a user error, but not an easy one to locate.
To avoid this problem, maybe we can catch SIGINT in the ceph-disk command:
import signal
import sys

def signal_handler(signum, frame):
    # release the prepare lock before exiting on Ctrl-C
    prepare_lock.release()
    sys.exit(0)
....
signal.signal(signal.SIGINT, signal_handler)
Or at least, for better problem determination, "ceph-deploy osd prepare" should IMHO print a meaningful error message instead of running until it hangs.
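A minimal sketch of what that could look like: attempt the lock non-blocking and fail with a clear message rather than waiting forever. The lock path and function name here are hypothetical stand-ins, not the actual ceph-disk code, and flock(2) is used purely for illustration:

```python
import errno
import fcntl
import os
import sys
import tempfile

# Hypothetical stand-in for /var/lib/ceph/tmp/ceph-disk.prepare.lock.lock
LOCK_PATH = os.path.join(tempfile.gettempdir(), "demo-prepare.lock")

def acquire_or_fail(path):
    """Try to take an exclusive lock without blocking; complain instead of hanging."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as e:
        os.close(fd)
        if e.errno in (errno.EACCES, errno.EAGAIN):
            # Someone (or a stale holder) has the lock: say so and stop.
            sys.stderr.write("prepare lock already held: %s\n" % path)
            sys.exit(1)
        raise
    return fd

fd = acquire_or_fail(LOCK_PATH)
print("lock acquired")
os.close(fd)
```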
Updated by Sage Weil almost 11 years ago
- Status changed from New to In Progress
- Priority changed from Normal to High
See also #5387. And I'll add the SIGINT handler to reduce the probability of this happening!
Updated by Sage Weil almost 11 years ago
- Status changed from In Progress to Fix Under Review
- Priority changed from High to Urgent
Care to review the top patch in wip-ceph-disk?
Alternatively, do you know of a replacement for lockfile that will detect when the owning pid is not running? That would be more robust...
Updated by Sage Weil almost 11 years ago
Starting with the Mercurial lock implementation, which uses a pid; see wip-ceph-disk-lock, though it's still incomplete.
Updated by Sage Weil almost 11 years ago
Bah, a trivial fcntl(2) lock is all we need here.
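The property being relied on is that the kernel drops fcntl(2) locks automatically when the holder exits, so a Ctrl-C (or even kill -9) can never leave a stale lock behind. A small demonstration, not the actual patch, using Python's fcntl.lockf wrapper:

```python
import fcntl
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "prepare.lock")

# A child process takes the lock, then dies without cleaning anything up.
pid = os.fork()
if pid == 0:
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    fcntl.lockf(fd, fcntl.LOCK_EX)
    os._exit(0)          # exit while holding the lock; the kernel releases it
os.waitpid(pid, 0)

# The parent can lock immediately: no stale lock survives the dead holder.
fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # would raise if still held
print("lock acquired after holder died")
os.close(fd)
```

The trade-off versus a lockfile-style scheme is that the lock only protects against concurrent processes on the same host, which is exactly what ceph-disk prepare needs.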
Updated by Sage Weil almost 11 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Sage Weil almost 11 years ago
- Status changed from Pending Backport to Resolved