Bug #6043: upstart does not reflect running ceph-osd daemons (ubuntu 13.04 only) - Ceph



closed


Added by Zoltan Arnold Nagy over 10 years ago. Updated almost 10 years ago.

Status: Won't Fix
Priority: High
Assignee: Tamilarasi muthamizhan
% Done: 100%
Source: Community (user)
Severity: 3 - minor

Description

Workaround

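Using restart instead of reload in /etc/logrotate.d/ceph makes logrotate restart the daemons instead of sending them a signal (SIGHUP) asking them to gracefully reopen their logs. The following one-liner makes that change: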

perl -pi -e 's/reload/restart/' /etc/logrotate.d/ceph
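The same substitution, written as a small function that keeps a backup and shows the resulting change, may be easier to review before touching a production node. This is a sketch: the function name apply_restart_workaround is hypothetical, it uses sed instead of perl, and it assumes the postrotate hook calls initctl reload as in the logrotate snippet quoted further down in this report.

```shell
# Hypothetical helper (not shipped with ceph): switch the logrotate
# postrotate hook from "initctl reload" to "initctl restart".
apply_restart_workaround() {
    conf="${1:-/etc/logrotate.d/ceph}"
    if [ ! -f "$conf" ]; then
        echo "no such file: $conf" >&2
        return 1
    fi
    # Rewrite only the initctl call; -i.bak keeps the original as $conf.bak.
    sed -i.bak 's/initctl reload/initctl restart/' "$conf"
    # diff exits 1 when the files differ, which is the expected outcome here.
    diff "$conf.bak" "$conf" || true
}
```

Run it as root with no argument to edit /etc/logrotate.d/ceph in place. Keep in mind that restart stops and starts every running ceph daemon at each rotation, so each one is briefly down.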

Original description

Ubuntu 13.04, Ceph from the ceph.com repository (0.67.1-1raring).

According to the documentation (http://ceph.com/docs/master/rados/operations/add-or-rm-osds/), this should work:

root@zc2store:~# ps faux | grep ceph-osd
root     21656  0.0  0.0   9436   960 pts/2    S+   12:52   0:00          \_ grep --color=auto ceph-osd
root     21475  0.0  0.0   4440   628 ?        Ss   12:47   0:00 /bin/sh -e -c /usr/bin/ceph-osd --cluster="${cluster:-ceph}" -i "$id" -f /bin/sh
root     21476  0.3  0.0 438156 24556 ?        Sl   12:47   0:00  \_ /usr/bin/ceph-osd --cluster=ceph -i 0 -f
root@zc2store:~# /etc/init.d/ceph stop osd.0
/etc/init.d/ceph: osd.0 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph defines )
root@zc2store:~# mount | grep ceph
/dev/sdb1 on /var/lib/ceph/osd/ceph-0 type xfs (rw,noatime)
root@zc2store:~#

Related issues: 1 (0 open, 1 closed)


Updated by Sage Weil over 10 years ago

Status changed from New to Rejected

stop ceph-osd id=0
or
stop ceph-osd-all
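Upstart instance names combine the cluster and the id (the later error "Unknown instance: ceph/10" shows the format), so a sysvinit-style name such as osd.11 has to be split before it can be passed to stop, start or status. A minimal sketch, assuming the default cluster name ceph; the helper name upstart_cmd_for is hypothetical, and it only prints the command so it is safe to try.

```shell
# Hypothetical helper: turn a sysvinit-style daemon name such as "osd.11"
# into the equivalent upstart invocation. Cluster defaults to "ceph".
upstart_cmd_for() {
    action="$1"          # start | stop | status | restart
    name="$2"            # e.g. osd.11, mon.a, mds.b
    cluster="${3:-ceph}"
    type="${name%%.*}"   # text before the first dot, e.g. osd
    id="${name#*.}"      # text after the first dot, e.g. 11
    case "$type" in
        osd|mon|mds) ;;
        *) echo "unsupported daemon type: $type" >&2; return 1 ;;
    esac
    # Reject names without a dot or with an empty id.
    if [ "$type" = "$name" ] || [ -z "$id" ]; then
        echo "expected <type>.<id>, got: $name" >&2
        return 1
    fi
    echo "$action ceph-$type cluster=$cluster id=$id"
}
```

For example, upstart_cmd_for stop osd.11 prints stop ceph-osd cluster=ceph id=11, which can then be run as root.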


Updated by Zoltan Arnold Nagy over 10 years ago

well...

root@signina:~# service ceph-osd id=11
ceph-osd: unrecognized service
root@signina:~# service ceph osd id=11
/etc/init.d/ceph: id=.11 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph defines )
root@signina:~# service ceph osd 11
/etc/init.d/ceph: 11. not found (/etc/ceph/ceph.conf defines , /var/lib/ceph defines )

but more importantly:

root@signina:~# ps faux | grep ceph | grep '\-i 11'
root      3047  0.3  0.0 707088 172644 ?       Sl   Aug17  10:31 /usr/bin/ceph-osd --cluster=ceph -i 11 -f
root@signina:~# service ceph osd.11
root@signina:~# ps faux | grep ceph | grep '\-i 11'
root      3047  0.3  0.0 707088 172644 ?       Sl   Aug17  10:31 /usr/bin/ceph-osd --cluster=ceph -i 11 -f
root@signina:~#


Updated by Sage Weil over 10 years ago

Status changed from Rejected to In Progress

Is the ceph package still installed? Some older versions didn't stop the jobs before they were uninstalled, which might explain this.

initctl list | grep ceph


Updated by Zoltan Arnold Nagy over 10 years ago

Getting somewhere, but it still can't find it.

root@signina:~# ps faux | grep ceph
root      3028  0.3  0.1 786888 267732 ?       Ssl  Aug17  11:54 /usr/bin/ceph-osd --cluster=ceph -i 6 -f
root      3032  0.3  0.0 699544 184632 ?       Sl   Aug17  10:35 /usr/bin/ceph-osd --cluster=ceph -i 8 -f
root      3039  0.3  0.1 765468 250896 ?       Sl   Aug17  12:10 /usr/bin/ceph-osd --cluster=ceph -i 10 -f
root      3043  0.3  0.0 653020 131000 ?       Sl   Aug17  10:54 /usr/bin/ceph-osd --cluster=ceph -i 7 -f
root      3049  0.3  0.1 718976 205100 ?       Sl   Aug17  10:42 /usr/bin/ceph-osd --cluster=ceph -i 9 -f
root      3078  0.3  0.1 767140 252344 ?       Sl   Aug17  13:03 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
root      5076  0.0  0.0   9436   956 pts/4    S+   22:02   0:00          \_ grep --color=auto ceph
root@signina:~# initctl list | grep ceph
ceph-mds-all-starter stop/waiting
ceph-mds-all start/running
ceph-osd-all start/running
ceph-osd-all-starter stop/waiting
ceph-all start/running
ceph-mon-all start/running
ceph-mon-all-starter stop/waiting
ceph-mon stop/waiting
ceph-create-keys stop/waiting
ceph-osd (ceph/6) start/running, process 3028
ceph-mds stop/waiting
root@signina:~# stop ceph-osd id=10
stop: Unknown instance: ceph/10
root@signina:~#
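The mismatch above (six ceph-osd processes, but only osd.6 known to upstart) can be checked mechanically. A minimal sketch, assuming a single cluster named ceph, daemons started as /usr/bin/ceph-osd ... -i <id>, and saved ps faux and initctl list output; all function names are hypothetical.

```shell
# Hypothetical helpers: find ceph-osd daemons that are running but have no
# matching upstart instance. The parsers read saved command output on stdin.

# Ids from "ps faux" lines such as "/usr/bin/ceph-osd --cluster=ceph -i 6 -f".
running_osd_ids() {
    sed -n 's#.*/usr/bin/ceph-osd .*-i \([0-9][0-9]*\).*#\1#p' | LC_ALL=C sort -u
}

# Ids from "initctl list" lines such as
# "ceph-osd (ceph/6) start/running, process 3028".
upstart_osd_ids() {
    sed -n 's#^ceph-osd (\([^/]*\)/\([0-9][0-9]*\)) start/.*#\2#p' | LC_ALL=C sort -u
}

# Print ids present in the ps output but absent from the initctl output.
untracked_osds() {
    ps_file="$1"
    initctl_file="$2"
    running=$(mktemp)
    tracked=$(mktemp)
    running_osd_ids < "$ps_file" > "$running"
    upstart_osd_ids < "$initctl_file" > "$tracked"
    # comm -23 keeps lines that appear only in the first (running) list.
    LC_ALL=C comm -23 "$running" "$tracked" | sort -n
    rm -f "$running" "$tracked"
}
```

Usage: ps faux > /tmp/ps.txt; initctl list > /tmp/initctl.txt; untracked_osds /tmp/ps.txt /tmp/initctl.txt. Fed the output from this comment, it prints 5, 7, 8, 9 and 10, one per line.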


Updated by Sage Weil over 10 years ago

Subject changed from stopping an osd doesn't work on ubuntu to upstart does not reflect running ceph-osd daemons
Status changed from In Progress to Need More Info

Is this still a problem? Unless we can figure out the sequence to reproduce this, I'm not sure what to do here. Upstart is generally quite reliable about tracking the running processes. Was there an upgrade involved?


Updated by Hunter Nield over 10 years ago

I seem to be running into a similar issue. I'm running 13.04 and 0.67.2, but it was also happening with 0.61.7. It seems that the admin sockets (for the OSDs and the mon) get deleted at some point. I've not been able to determine when or how this is happening. I'll keep investigating.


Updated by Hunter Nield over 10 years ago

Has anyone else experienced this issue? It seems to be affecting a few others - http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-August/003402.html


Updated by Hunter Nield over 10 years ago

I've done some further investigation on this. It seems that logrotate is the culprit.

After executing the following lines from /etc/logrotate.d/ceph:

initctl list \
        | sed -n 's/^\(ceph-\(mon\|osd\|mds\)\+\)[ \t]\+(\([^ \/]\+\)\/\([^ \/]\+\))[ \t]\+start\/.*$/\1 cluster=\3 id=\4/p' \
        | while read l; do
        initctl reload -- $l 2>/dev/null || :
        done

the admin sockets are missing from /var/run/ceph, and the parent processes for each mon/osd (e.g. /bin/sh -e -c /usr/bin/ceph-osd --cluster="${cluster:-ceph}" -i "$id" -f /bin/sh) are gone.

If I run initctl reload -- ceph-osd cluster=ceph id=X for a healthy OSD then I can trigger the issue.
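To see which jobs that loop actually touches, the parsing step can be run on its own. The sketch below is our rewrite of the same sed expression as a POSIX extended regex (the original's \| and \+ are GNU sed extensions, and [[:space:]] stands in for [ \t]); it only prints the reload commands instead of running them. The function names are hypothetical.

```shell
# Portable (POSIX ERE) rewrite of the logrotate sed expression: turn each
# running ceph-mon/osd/mds upstart instance into "job cluster=X id=Y".
# '#' is the s-command delimiter so '/' needs no escaping.
upstart_instances() {
    sed -n -E 's#^(ceph-(mon|osd|mds)+)[[:space:]]+\(([^ /]+)/([^ /]+)\)[[:space:]]+start/.*$#\1 cluster=\3 id=\4#p'
}

# Print, without executing, the reload command logrotate would issue
# for each instance found on stdin (initctl list output).
show_reload_commands() {
    upstart_instances | while read -r l; do
        echo "initctl reload -- $l"
    done
}
```

Usage: initctl list | show_reload_commands. Fed the initctl list output from Zoltan's earlier comment, it prints a single line, initctl reload -- ceph-osd cluster=ceph id=6, because only osd.6 had an upstart instance there.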


Updated by Hunter Nield over 10 years ago

I should have done a little more digging before my previous update, but it looks more likely that the HUP signal itself is where the problem lies: kill -s HUP <process> will trigger the issue.
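One way to confirm the symptom on a test node is to check for the admin sockets before and after sending HUP to a single OSD. A minimal sketch, assuming the default admin socket path /var/run/ceph/<cluster>-osd.<id>.asok; the function name and the RUN_DIR and CLUSTER override variables are hypothetical, for setups where ceph.conf sets admin socket differently.

```shell
# Hypothetical helper: for each OSD id given, report whether its admin
# socket exists at the assumed default path $RUN_DIR/$CLUSTER-osd.<id>.asok.
check_osd_sockets() {
    run_dir="${RUN_DIR:-/var/run/ceph}"
    cluster="${CLUSTER:-ceph}"
    missing=0
    for id in "$@"; do
        sock="$run_dir/$cluster-osd.$id.asok"
        if [ -S "$sock" ]; then
            echo "osd.$id ok $sock"
        else
            echo "osd.$id MISSING $sock"
            missing=$((missing + 1))
        fi
    done
    # Non-zero exit status if any socket is missing, for use in scripts.
    [ "$missing" -eq 0 ]
}
```

Usage: run check_osd_sockets 5 6 7, send kill -s HUP to one of those OSDs, then run it again. A MISSING line for that OSD afterwards would match what is described above.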


Updated by Samuel Just over 10 years ago

Status changed from Need More Info to 12
Assignee set to Tamilarasi muthamizhan

Need to reproduce.


Updated by Sage Weil over 10 years ago

Status changed from 12 to Need More Info
Priority changed from Normal to High

Tamil, can you try to reproduce this? (Look at the last two comments: sending HUP or issuing the reload seems to break things.)


Updated by Tamilarasi muthamizhan over 10 years ago

Status changed from Need More Info to Can't reproduce

I am not able to reproduce this issue on raring with the latest stable dumpling branch (v0.67.4).

Test setup tried: vpm018. It is still in the same state, if anyone is interested.


Updated by Samuel Just over 10 years ago

Hunter, what are the osd data directories named?


Updated by Hunter Nield over 10 years ago

We're using Chef to install our nodes (it uses the ceph-disk tools to manage the disks), so:

/var/lib/ceph/osd/ceph-<id>/
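With that layout the cluster name and OSD id can be read straight from the directory name. The sketch below also reports which init-system marker file each directory holds. It assumes ceph-disk drops an empty upstart or sysvinit file into the data directory, and that /etc/init.d/ceph only manages directories carrying a sysvinit marker; that assumption is ours and is not confirmed in this thread, but it would be consistent with the empty "/var/lib/ceph defines" message in the original report. The function name is hypothetical.

```shell
# Hypothetical helper: list OSD data directories named <cluster>-<id> and
# the init-system marker files found in each (assumed ceph-disk convention).
list_osd_dirs() {
    base="${1:-/var/lib/ceph/osd}"
    for dir in "$base"/*-*; do
        [ -d "$dir" ] || continue      # glob matched nothing
        name="${dir##*/}"              # e.g. ceph-0
        cluster="${name%%-*}"          # text before the first dash
        id="${name#*-}"                # text after the first dash
        markers=""
        for m in upstart sysvinit; do
            if [ -e "$dir/$m" ]; then
                markers="$markers${markers:+,}$m"
            fi
        done
        echo "cluster=$cluster id=$id init=${markers:-none}"
    done
}
```

Usage: list_osd_dirs with no argument scans /var/lib/ceph/osd. A cluster name that itself contains a dash would be split incorrectly, since the sketch cuts at the first dash.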


Updated by Loïc Dachary almost 10 years ago

Subject changed from upstart does not reflect running ceph-osd daemons to upstart does not reflect running ceph-osd daemons (ubuntu 13.04 only)
Description updated
Status changed from Can't reproduce to Won't Fix
% Done changed from 0 to 100
