Bug #8344

closed

Upstart scripts silently fail when asok missing

Added by Mike Dawson almost 10 years ago. Updated almost 10 years ago.

Status: Can't reproduce
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In situations like #7188, the admin socket at /var/run/ceph/ceph-<daemon>.<name>.asok can disappear. When this happens, the Upstart (and possibly sysvinit) scripts do not properly stop or restart Ceph daemons.

root@node1:~# ls /var/run/ceph
total 0
drwxr-xr-x 2 root root 100 May 13 18:55 .
drwxr-xr-x 22 root root 860 May 13 18:33 ..
srwxr-xr-x 1 root root 0 May 13 18:55 ceph-mon.node1.asok
srwxr-xr-x 1 root root 0 May 13 18:55 ceph-osd.0.asok
srwxr-xr-x 1 root root 0 May 13 18:55 ceph-osd.1.asok

root@node1:~# ps -ef | grep ceph-mon
root 4679 1 0 18:55 ? 00:00:00 /bin/sh -e -c /usr/bin/ceph-mon --cluster="${cluster:-ceph}" -i "$id" -f /bin/sh
root 4680 4679 0 18:55 ? 00:00:05 /usr/bin/ceph-mon --cluster=ceph -i node1 -f
root 17116 29198 0 19:38 pts/0 00:00:00 grep --color=auto ceph-mon

root@node1:~# logrotate --force /etc/logrotate.d/ceph

root@node1:~# ls /var/run/ceph
total 0
drwxr-xr-x 2 root root 80 May 13 19:39 .
drwxr-xr-x 22 root root 860 May 13 18:33 ..
srwxr-xr-x 1 root root 0 May 13 18:55 ceph-osd.0.asok
srwxr-xr-x 1 root root 0 May 13 18:55 ceph-osd.1.asok

root@node1:~# ps -ef | grep ceph-mon
root 4680 1 0 18:55 ? 00:00:05 /usr/bin/ceph-mon --cluster=ceph -i node1 -f
root 17329 29198 0 19:39 pts/0 00:00:00 grep --color=auto ceph-mon

root@node1:~# stop ceph-all
ceph-all stop/waiting

root@node1:~# ps -ef | grep ceph-mon
root 4680 1 0 18:55 ? 00:00:05 /usr/bin/ceph-mon --cluster=ceph -i node1 -f
root 17416 29198 0 19:39 pts/0 00:00:00 grep --color=auto ceph-mon

root@node1:~# restart ceph-all
restart: Unknown instance:

root@node1:~# ps -ef | grep ceph-mon
root 4680 1 0 18:55 ? 00:00:05 /usr/bin/ceph-mon --cluster=ceph -i node1 -f
root 17491 29198 0 19:39 pts/0 00:00:00 grep --color=auto ceph-mon

root@node1:~# start ceph-all
ceph-all start/running

root@node1:~# ps -ef | grep ceph-mon
root 4680 1 0 18:55 ? 00:00:05 /usr/bin/ceph-mon --cluster=ceph -i node1 -f
root 17833 29198 0 19:40 pts/0 00:00:00 grep --color=auto ceph-mon

root@node1:~# ls /var/run/ceph
total 0
drwxr-xr-x 2 root root 80 May 13 19:39 .
drwxr-xr-x 22 root root 860 May 13 18:33 ..
srwxr-xr-x 1 root root 0 May 13 19:39 ceph-osd.0.asok
srwxr-xr-x 1 root root 0 May 13 19:39 ceph-osd.1.asok

Ideally, the Upstart scripts would be able to stop or restart the daemon even when no asok is present. At a minimum, they should print a message telling the operator that manual intervention is needed.
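
To illustrate the second option, here is a rough sketch of the kind of check that could warn the operator. It is not taken from the actual Upstart jobs. The function name check_asok and its arguments are hypothetical; the only thing it borrows from this report is the /var/run/ceph/ceph-<daemon>.<name>.asok naming.

```shell
#!/bin/sh
# Hypothetical helper, not part of Ceph: warn when an expected admin socket is missing.
# Usage: check_asok RUN_DIR DAEMON...   (DAEMON is e.g. "mon.node1" or "osd.0")
check_asok() {
    run_dir=$1
    shift
    missing=0
    for daemon in "$@"; do
        sock="$run_dir/ceph-$daemon.asok"
        # -e only checks that the path exists; -S would also require it to be a socket.
        if [ ! -e "$sock" ]; then
            echo "WARNING: admin socket missing for $daemon: $sock" >&2
            missing=$((missing + 1))
        fi
    done
    # Exit status is 0 only when every expected socket was found.
    [ "$missing" -eq 0 ]
}

# Example call (commented out so that sourcing this file has no side effects):
# check_asok /var/run/ceph mon.node1 osd.0 osd.1
```

Against the directory listing taken after logrotate ran, the example call would warn about mon.node1 only and return a non-zero status, which a wrapper could use to tell the operator that the daemon needs manual attention.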

Actions #1

Updated by Sage Weil almost 10 years ago

  • Priority changed from High to Urgent
Actions #2

Updated by Sage Weil almost 10 years ago

  • Assignee set to Sage Weil
Actions #3

Updated by Sage Weil almost 10 years ago

  • Status changed from New to Can't reproduce
  • Priority changed from Urgent to High

I tried reproducing this on both master and dumpling and couldn't make it happen. I couldn't see any problems when the asok file was removed (daemons would still stop/start), and I also didn't see the asok file disappear when logrotate ran (I had to remove the files manually).

Are there changes to your logrotate file or something? What version are you running?

Actions #4

Updated by Mike Dawson almost 10 years ago

I saw this with Emperor and Firefly running on Ubuntu Raring. Perhaps it only affects Raring, similar to #7188.
