Bug #6043
closed
upstart does not reflect running ceph-osd daemons (ubuntu 13.04 only)
100%
Description
Workaround
Using restart instead of reload in the logrotate script restarts the daemons rather than sending them a signal that gracefully reopens the log.
perl -pi -e 's/reload/restart/' /etc/logrotate.d/ceph
Original description
ubuntu 13.04, ceph from ceph.com repository. (0.67.1-1raring)
according to the documentation (http://ceph.com/docs/master/rados/operations/add-or-rm-osds/), this should work.
root@zc2store:~# ps faux | grep ceph-osd
root 21656 0.0 0.0 9436 960 pts/2 S+ 12:52 0:00 \_ grep --color=auto ceph-osd
root 21475 0.0 0.0 4440 628 ? Ss 12:47 0:00 /bin/sh -e -c /usr/bin/ceph-osd --cluster="${cluster:-ceph}" -i "$id" -f /bin/sh
root 21476 0.3 0.0 438156 24556 ? Sl 12:47 0:00 \_ /usr/bin/ceph-osd --cluster=ceph -i 0 -f
root@zc2store:~# /etc/init.d/ceph stop osd.0
/etc/init.d/ceph: osd.0 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph defines )
root@zc2store:~# mount | grep ceph
/dev/sdb1 on /var/lib/ceph/osd/ceph-0 type xfs (rw,noatime)
root@zc2store:~#
Updated by Sage Weil over 10 years ago
- Status changed from New to Rejected
stop ceph-osd id=0
or
stop ceph-osd-all
Updated by Zoltan Arnold Nagy over 10 years ago
well...
root@signina:~# service ceph-osd id=11
ceph-osd: unrecognized service
root@signina:~# service ceph osd id=11
/etc/init.d/ceph: id=.11 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph defines )
root@signina:~# service ceph osd 11
/etc/init.d/ceph: 11. not found (/etc/ceph/ceph.conf defines , /var/lib/ceph defines )
but more importantly:
root@signina:~# ps faux | grep ceph | grep '\-i 11'
root 3047 0.3 0.0 707088 172644 ? Sl Aug17 10:31 /usr/bin/ceph-osd --cluster=ceph -i 11 -f
root@signina:~# service ceph osd.11
root@signina:~# ps faux | grep ceph | grep '\-i 11'
root 3047 0.3 0.0 707088 172644 ? Sl Aug17 10:31 /usr/bin/ceph-osd --cluster=ceph -i 11 -f
root@signina:~#
Updated by Sage Weil over 10 years ago
- Status changed from Rejected to In Progress
is the ceph package still installed? some older versions didn't stop the jobs before they uninstalled, which might explain this.
initctl list | grep ceph
Updated by Zoltan Arnold Nagy over 10 years ago
getting somewhere. but it still can't find it.
root@signina:~# ps faux | grep ceph
root 3028 0.3 0.1 786888 267732 ? Ssl Aug17 11:54 /usr/bin/ceph-osd --cluster=ceph -i 6 -f
root 3032 0.3 0.0 699544 184632 ? Sl Aug17 10:35 /usr/bin/ceph-osd --cluster=ceph -i 8 -f
root 3039 0.3 0.1 765468 250896 ? Sl Aug17 12:10 /usr/bin/ceph-osd --cluster=ceph -i 10 -f
root 3043 0.3 0.0 653020 131000 ? Sl Aug17 10:54 /usr/bin/ceph-osd --cluster=ceph -i 7 -f
root 3049 0.3 0.1 718976 205100 ? Sl Aug17 10:42 /usr/bin/ceph-osd --cluster=ceph -i 9 -f
root 3078 0.3 0.1 767140 252344 ? Sl Aug17 13:03 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
root 5076 0.0 0.0 9436 956 pts/4 S+ 22:02 0:00 \_ grep --color=auto ceph
root@signina:~# initctl list | grep ceph
ceph-mds-all-starter stop/waiting
ceph-mds-all start/running
ceph-osd-all start/running
ceph-osd-all-starter stop/waiting
ceph-all start/running
ceph-mon-all start/running
ceph-mon-all-starter stop/waiting
ceph-mon stop/waiting
ceph-create-keys stop/waiting
ceph-osd (ceph/6) start/running, process 3028
ceph-mds stop/waiting
root@signina:~# stop ceph-osd id=10
stop: Unknown instance: ceph/10
root@signina:~#
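The mismatch in the session above (six ceph-osd processes, one upstart instance) can be listed mechanically. The following is a rough sketch, not an official Ceph tool: the function names running_ids and tracked_ids are made up for illustration, the sample data is copied from the output above (the initctl list is abridged), and the patterns assume the command-line and initctl formats shown there.

```shell
#!/bin/sh
# Sample data copied from the session above; on a live node, feed
# `ps faux` and `initctl list` into the two filters instead.
sample_ps() {
cat <<'EOF'
root 3028 0.3 0.1 786888 267732 ? Ssl Aug17 11:54 /usr/bin/ceph-osd --cluster=ceph -i 6 -f
root 3032 0.3 0.0 699544 184632 ? Sl Aug17 10:35 /usr/bin/ceph-osd --cluster=ceph -i 8 -f
root 3039 0.3 0.1 765468 250896 ? Sl Aug17 12:10 /usr/bin/ceph-osd --cluster=ceph -i 10 -f
root 3043 0.3 0.0 653020 131000 ? Sl Aug17 10:54 /usr/bin/ceph-osd --cluster=ceph -i 7 -f
root 3049 0.3 0.1 718976 205100 ? Sl Aug17 10:42 /usr/bin/ceph-osd --cluster=ceph -i 9 -f
root 3078 0.3 0.1 767140 252344 ? Sl Aug17 13:03 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
root 5076 0.0 0.0 9436 956 pts/4 S+ 22:02 0:00 \_ grep --color=auto ceph
EOF
}

sample_initctl() {
cat <<'EOF'
ceph-osd-all start/running
ceph-mon stop/waiting
ceph-osd (ceph/6) start/running, process 3028
ceph-mds stop/waiting
EOF
}

# OSD ids that appear as running ceph-osd processes (stdin: ps output).
running_ids() {
  sed -n 's|.*ceph-osd --cluster=[^ ]* -i \([0-9][0-9]*\) -f.*|\1|p' | LC_ALL=C sort -u
}

# OSD ids that upstart lists as started instances (stdin: initctl list output).
tracked_ids() {
  sed -n 's|^ceph-osd ([^/]*/\([0-9][0-9]*\)) start/.*|\1|p' | LC_ALL=C sort -u
}

running_file=$(mktemp)
tracked_file=$(mktemp)
sample_ps | running_ids > "$running_file"
sample_initctl | tracked_ids > "$tracked_file"

# comm -23 keeps the lines found only in the first (running) list.
untracked=$(LC_ALL=C comm -23 "$running_file" "$tracked_file" | sort -n | paste -sd' ' -)
rm -f "$running_file" "$tracked_file"

echo "running but unknown to upstart: $untracked"
# prints: running but unknown to upstart: 5 7 8 9 10
```

For the sample data this reports OSDs 5, 7, 8, 9 and 10, which matches the "Unknown instance: ceph/10" error above.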
Updated by Sage Weil over 10 years ago
- Subject changed from stopping an osd doesn't work on ubuntu to upstart does not reflect running ceph-osd daemons
- Status changed from In Progress to Need More Info
is this still a problem? unless we can figure out the sequence to reproduce this i'm not sure what to do here. upstart is generally quite reliable about tracking the running processes. was there an upgrade involved?
Updated by Hunter Nield over 10 years ago
I seem to be running into a similar issue. I'm running 13.04 and 0.67.2, but it was also happening with 0.61.7. It seems that the admin sockets (for OSDs and Mon) get deleted at some point. I've not been able to determine when or how this happens. I'll keep investigating.
Updated by Hunter Nield over 10 years ago
Has anyone else experienced this issue? It seems to be affecting a few others - http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-August/003402.html
Updated by Hunter Nield over 10 years ago
I've done some further investigation on this. It seems that logrotate is the culprit.
After executing the following lines in /etc/logrotate.d/ceph
initctl list \
  | sed -n 's/^\(ceph-\(mon\|osd\|mds\)\+\)[ \t]\+(\([^ \/]\+\)\/\([^ \/]\+\))[ \t]\+start\/.*$/\1 cluster=\3 id=\4/p' \
  | while read l; do
      initctl reload -- $l 2>/dev/null || :
    done
Then the admin sockets are missing from /var/run/ceph, and the parent processes for each mon/osd (e.g. /bin/sh -e -c /usr/bin/ceph-osd --cluster="${cluster:-ceph}" -i "$id" -f /bin/sh) are missing.
If I run initctl reload -- ceph-osd cluster=ceph id=X for a healthy OSD then I can trigger the issue.
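For readers unfamiliar with the sed expression in the logrotate snippet: it keeps only the started per-instance jobs from initctl list and rewrites each one into the arguments that initctl reload expects. A small sketch, assuming GNU sed (the expression relies on the \+ and \| extensions) and using sample lines taken from the initctl output earlier in this thread; the function name to_job_args is made up for illustration.

```shell
#!/bin/sh
# Rewrite "JOB (CLUSTER/ID) start/..." lines into "JOB cluster=CLUSTER id=ID".
# The sed expression is the one from /etc/logrotate.d/ceph quoted above.
to_job_args() {
  sed -n 's/^\(ceph-\(mon\|osd\|mds\)\+\)[ \t]\+(\([^ \/]\+\)\/\([^ \/]\+\))[ \t]\+start\/.*$/\1 cluster=\3 id=\4/p'
}

result=$(printf '%s\n' \
  'ceph-osd-all start/running' \
  'ceph-mon stop/waiting' \
  'ceph-osd (ceph/6) start/running, process 3028' \
  | to_job_args)

echo "$result"
# prints: ceph-osd cluster=ceph id=6
```

Aggregate jobs such as ceph-osd-all and stopped jobs such as ceph-mon produce no output, so only instances that upstart believes are running receive the reload.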
Updated by Hunter Nield over 10 years ago
I should have done a little more digging in my previous update but it looks more likely that sending a HUP is where the problem lies - kill -s HUP <process> will trigger the issue.
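To check for the symptom reported in this thread (admin sockets missing from /var/run/ceph while the daemons keep running), something like the following sketch could be run before and after the HUP. It assumes the default admin socket naming of <cluster>-osd.<id>.asok with the cluster name "ceph"; the helper missing_sockets is hypothetical, and the demo uses a scratch directory with plain files standing in for sockets.

```shell
#!/bin/sh
# missing_sockets DIR ID...
# Print each OSD id that has no admin socket file in DIR.
# Assumes the default naming ceph-osd.<id>.asok (cluster name "ceph").
missing_sockets() {
  dir=$1
  shift
  for id in "$@"; do
    [ -e "$dir/ceph-osd.$id.asok" ] || printf '%s\n' "$id"
  done
}

# Demo against a scratch directory instead of /var/run/ceph.
demo_dir=$(mktemp -d)
: > "$demo_dir/ceph-osd.6.asok"   # plain file standing in for a socket

missing=$(missing_sockets "$demo_dir" 5 6 7 | paste -sd' ' -)
echo "OSDs without an admin socket: $missing"

rm -rf "$demo_dir"
```

On a real node the call would be missing_sockets /var/run/ceph followed by the OSD ids hosted on that machine.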
Updated by Samuel Just over 10 years ago
- Status changed from Need More Info to 12
- Assignee set to Tamilarasi muthamizhan
Need to reproduce.
Updated by Sage Weil over 10 years ago
- Status changed from 12 to Need More Info
- Priority changed from Normal to High
Tamil, can you try to reproduce this? (look at the last two comments.. sending HUP or issuing the reload seems to break things)
Updated by Tamilarasi muthamizhan over 10 years ago
- Status changed from Need More Info to Can't reproduce
I am not able to reproduce this issue on raring with latest stable dumpling branch [v0.67.4]
test setup tried: vpm018. It is still in the same state, if anyone is interested.
Updated by Samuel Just over 10 years ago
Hunter, what are the osd data directories named?
Updated by Hunter Nield over 10 years ago
We're using Chef to install our nodes (which uses ceph-disk tools to manage the disks) so:
/var/lib/ceph/osd/ceph-<id>/
Updated by Loïc Dachary almost 10 years ago
- Subject changed from upstart does not reflect running ceph-osd daemons to upstart does not reflect running ceph-osd daemons (ubuntu 13.04 only)
- Description updated (diff)
- Status changed from Can't reproduce to Won't Fix
- % Done changed from 0 to 100