Bug #7188
Updated by Loïc Dachary almost 10 years ago
h3. h1. Description

Steps to reproduce, using a fresh install of Ubuntu raring (it does not show on Ubuntu trusty) and firefly.

<pre>
root@raring:/etc/ceph# start ceph-mon id=raring
ceph-mon (ceph/raring) start/running, process 6488
root@raring:/etc/ceph# ps -fauwwx | grep ceph-
warning: bad ps syntax, perhaps a bogus '-'? See http://gitorious.org/procps/procps/blobs/master/Documentation/FAQ
root 6506 0.0 0.0 6560 640 pts/1 S+ 16:17 0:00 \_ grep --color=auto ceph-
root 6488 0.0 0.0 4444 624 ? Ss 16:17 0:00 /bin/sh -e -c /usr/bin/ceph-mon --cluster="${cluster:-ceph}" -i "$id" -f /bin/sh
root 6489 1.0 0.8 134772 9116 ? Sl 16:17 0:00 \_ /usr/bin/ceph-mon --cluster=ceph -i raring -f
root@raring:/etc/ceph# reload ceph-mon id=raring
root@raring:/etc/ceph# ls -l /var/run/ceph
total 0
root@raring:/etc/ceph# status ceph-mon id=raring
status: Unknown instance: ceph/raring
</pre>

Note that if the intermediate "/bin/sh" process is not present, the problem does not show. Run *restart* until the "/bin/sh" parent process of */usr/bin/ceph-mon* shows up. *reload* kills the intermediate shell; upstart then notices that it is dead and runs the *post-stop* script, which removes the *asok* file. The *ceph-mon* is no longer managed by upstart and cannot be notified.

h3. h1. Initial report

My /var/run/ceph/*.asok files for the OSDs and mon were mysteriously gone when I came back in the morning after running overnight. I noticed that they had disappeared at the same time as a log rotation, and sure enough, calling "logrotate --force /etc/logrotate.d/ceph" leaves services running with no socket files.

It looks like the part that's causing the problem is the "initctl reload ceph-mon id=xxx", which on my system leaves the original service PID running and tries to start a new one at the same time: logs show the new process failing to start with "failed to create new leveldb store" while the existing process continues to function.
Presumably it's the new process which is deleting the socket files in spite of failing to come up successfully. Package in use is 0.72.2-1raring. This is kind of severe for anyone using a monitoring system that relies on the socket files to see and talk to the Ceph processes. If we can show that this issue is limited to Ubuntu 13.04 then this is less of a big deal: I wouldn't be surprised if it's quite sensitive to the exact distro version in use.
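
Because the daemons keep running after their socket files vanish, the condition is easy to miss. A check along the following lines could be run after a log rotation to flag it. This is only a sketch: the function name @missing_asok@ is hypothetical (it is not part of Ceph), and it assumes the default socket naming @ceph-<name>.asok@ with the cluster name "ceph".

```shell
#!/bin/sh
# Hypothetical helper, not part of Ceph: report daemons whose admin
# socket file is missing from the run directory.
# Usage: missing_asok RUN_DIR NAME...   (e.g. NAME = mon.raring, osd.0)
# Assumption: sockets are named "<run_dir>/ceph-<name>.asok", i.e. the
# default admin socket path with the cluster name "ceph".
missing_asok() {
    run_dir=$1
    shift
    rc=0
    for name in "$@"; do
        sock="$run_dir/ceph-$name.asok"
        # -e (exists) is used instead of -S (is a socket) so the check
        # can also be exercised against plain files in a scratch directory.
        if [ ! -e "$sock" ]; then
            echo "missing: $sock"
            rc=1
        fi
    done
    # Non-zero status if at least one socket file is missing.
    return $rc
}
```

For the reproduction above, @missing_asok /var/run/ceph mon.raring@ would be expected to print the missing path and return non-zero once *reload* has removed the socket. It only detects the symptom and does nothing to restore the file.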