Bug #8344: Upstart scripts silently fail when asok missing - Ceph - Ceph

Bug #8344

closed

Upstart scripts silently fail when asok missing

Added by Mike Dawson almost 10 years ago. Updated almost 10 years ago.

Status: Can't reproduce
Priority: High
Assignee: Sage Weil
Source: Community (user)
Severity: 3 - minor

Description

In situations like Issue 7188, the admin socket can be lost from /var/run/ceph/ceph-<daemon>.<name>.asok. When this happens, the Upstart (and possibly sysvinit) scripts do not properly stop or restart ceph daemons.

root@node1:~# ls /var/run/ceph
total 0
drwxr-xr-x 2 root root 100 May 13 18:55 .
drwxr-xr-x 22 root root 860 May 13 18:33 ..
srwxr-xr-x 1 root root 0 May 13 18:55 ceph-mon.node1.asok
srwxr-xr-x 1 root root 0 May 13 18:55 ceph-osd.0.asok
srwxr-xr-x 1 root root 0 May 13 18:55 ceph-osd.1.asok

root@node1:~# ps -ef | grep ceph-mon
root 4679 1 0 18:55 ? 00:00:00 /bin/sh -e -c /usr/bin/ceph-mon --cluster="${cluster:-ceph}" -i "$id" -f /bin/sh
root 4680 4679 0 18:55 ? 00:00:05 /usr/bin/ceph-mon --cluster=ceph -i node1 -f
root 17116 29198 0 19:38 pts/0 00:00:00 grep --color=auto ceph-mon

root@node1:~# logrotate --force /etc/logrotate.d/ceph

root@node1:~# ls /var/run/ceph
total 0
drwxr-xr-x 2 root root 80 May 13 19:39 .
drwxr-xr-x 22 root root 860 May 13 18:33 ..
srwxr-xr-x 1 root root 0 May 13 18:55 ceph-osd.0.asok
srwxr-xr-x 1 root root 0 May 13 18:55 ceph-osd.1.asok

root@node1:~# ps -ef | grep ceph-mon
root 4680 1 0 18:55 ? 00:00:05 /usr/bin/ceph-mon --cluster=ceph -i node1 -f
root 17329 29198 0 19:39 pts/0 00:00:00 grep --color=auto ceph-mon

root@node1:~# stop ceph-all
ceph-all stop/waiting

root@node1:~# ps -ef | grep ceph-mon
root 4680 1 0 18:55 ? 00:00:05 /usr/bin/ceph-mon --cluster=ceph -i node1 -f
root 17416 29198 0 19:39 pts/0 00:00:00 grep --color=auto ceph-mon

root@node1:~# restart ceph-all
restart: Unknown instance:

root@node1:~# ps -ef | grep ceph-mon
root 4680 1 0 18:55 ? 00:00:05 /usr/bin/ceph-mon --cluster=ceph -i node1 -f
root 17491 29198 0 19:39 pts/0 00:00:00 grep --color=auto ceph-mon

root@node1:~# start ceph-all
ceph-all start/running

root@node1:~# ps -ef | grep ceph-mon
root 4680 1 0 18:55 ? 00:00:05 /usr/bin/ceph-mon --cluster=ceph -i node1 -f
root 17833 29198 0 19:40 pts/0 00:00:00 grep --color=auto ceph-mon

root@node1:~# ls /var/run/ceph
total 0
drwxr-xr-x 2 root root 80 May 13 19:39 .
drwxr-xr-x 22 root root 860 May 13 18:33 ..
srwxr-xr-x 1 root root 0 May 13 19:39 ceph-osd.0.asok
srwxr-xr-x 1 root root 0 May 13 19:39 ceph-osd.1.asok

Ideally, the Upstart scripts would be able to stop or restart the daemon without an asok present, or at least print some output to notify the operator that manual intervention is needed.
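
As a rough illustration of the "at least notify the operator" option, here is a minimal sketch of a check that could run before a stop or restart. It is not taken from the Ceph init scripts: the function names `asok_path` and `check_asok` are hypothetical, and the socket naming is assumed from the path pattern quoted above (/var/run/ceph/ceph-<daemon>.<name>.asok, with "ceph" being the cluster name).

```shell
#!/bin/sh
# Hypothetical helpers, not part of Ceph: warn when a daemon's admin
# socket is missing instead of failing silently.

# Build the expected socket path.
# $1 = run directory, $2 = cluster name, $3 = daemon type (mon, osd), $4 = daemon id
asok_path() {
    printf '%s/%s-%s.%s.asok\n' "$1" "$2" "$3" "$4"
}

# Return 0 if the socket exists; otherwise print a warning to stderr and return 1.
check_asok() {
    path=$(asok_path "$1" "$2" "$3" "$4")
    if [ -S "$path" ]; then
        return 0
    fi
    echo "WARNING: $3.$4 has no admin socket at $path" >&2
    return 1
}
```

For the monitor in the transcript above, a call might look like `check_asok /var/run/ceph ceph mon node1 || echo "manual intervention may be needed"`. This only reports the condition; it does not stop or restart anything.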

Updated by Sage Weil almost 10 years ago

Priority changed from High to Urgent

Updated by Sage Weil almost 10 years ago

Assignee set to Sage Weil

Updated by Sage Weil almost 10 years ago

Status changed from New to Can't reproduce
Priority changed from Urgent to High

I tried reproducing this on both master and dumpling and couldn't make it happen. I couldn't see any problems when the asok file was removed (daemons would still stop/start), and I also didn't see the asok file disappear when logrotate ran (I had to remove the files manually).

Are there changes to your logrotate file or something? What version are you running?

Updated by Mike Dawson almost 10 years ago

I saw this with Emperor and Firefly running on Ubuntu Raring. Perhaps it only affects Raring, similar to #7188.
