Bug #2382
closed
osd: unable to start due to 1 child already started
Added by Joao Eduardo Luis almost 12 years ago.
Updated almost 12 years ago.
Description
I had seen this bug a few days ago while setting up ceph on my desktop, but it went away after rerunning ./ceph-osd, so I didn't give it a second thought.
jeffp, however, has been seeing this bug quite a lot:
<jeffp> it's been intermittent but now it's happening every time
<gregaf> oh dear, we've got a race somewhere then...
<jeffp> yeah that's what i was thinking
<jeffp> i'm running a 5 node cluster running on centos inside virtualbox all on the same machine
<jeffp> maybe the slowness is exacerbating it
The error: -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
jeffp's log:
[root@osd0 ceph]# /etc/init.d/ceph -a start
=== mon.a ===
Starting Ceph mon.a on osd0...
starting mon.a rank 0 at 192.168.56.100:6789/0 mon_data /var/local/ceph/mon.a fsid abaf5302-13cc-4531-990f-c56935679649
=== mon.b ===
Starting Ceph mon.b on osd1...
starting mon.b rank 1 at 192.168.56.101:6789/0 mon_data /var/local/ceph/mon.b fsid abaf5302-13cc-4531-990f-c56935679649
=== mon.c ===
Starting Ceph mon.c on osd2...
starting mon.c rank 2 at 192.168.56.102:6789/0 mon_data /var/local/ceph/mon.c fsid abaf5302-13cc-4531-990f-c56935679649
=== osd.0 ===
Starting Ceph osd.0 on osd0...
starting osd.0 at :/0 osd_data /var/local/ceph/osd.0 /var/local/ceph/osd.0/journal
2012-05-03 17:36:31.005537 7f98df29f760 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
failed: ' /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf '
Update: if I start the OSDs manually with ceph-osd this doesn't seem to happen, just with /etc/init.d/ceph start and service ceph start.
- Status changed from New to 12
- Assignee set to Sage Weil
- Priority changed from High to Urgent
I saw this on congress too. Will reproduce on my burnupi cluster and investigate.
I re-triggered this using 'CEPH_NUM_OSD=1 CEPH_NUM_MDS=1 CEPH_NUM_MON=1 ./vstart.sh' on my desktop (granted, it's a desktop).
The bug can pop up not only on the osd but also on the mon and the mds.
Occasionally this also happens with ./init-ceph when starting all services, or each one individually. For instance, this happened three times in a row:
jecluis@Magrathea:~/Code/ceph/src$ ./init-ceph start mds
=== mds.a ===
Starting Ceph mds.a on Magrathea...
starting mds.a at :/0
2012-05-04 02:04:56.412819 7f9c5f5ae780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
failed: ' ./ceph-mds -i a --pid-file deploy/out/mds.a.pid -c ./ceph.conf '
It has never happened (if I recall correctly) when starting each service individually with ./ceph-{osd,mon,mds}.
I just built fresh 0.46 rpms (ran 0.45 before) and now I'm seeing this too.
Notice the timestamps. I had to call this quite a few times in rapid succession until the daemon finally started:
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:45.695992 7f0689416780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:56.127370 7f0cfc205780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:56.532800 7f7358edd780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:56.914539 7f7bf6424780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:57.265571 7f3f87f1d780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:57.652731 7f881ffbf780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:58.025576 7f95ad08e780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:58.398080 7face0770780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:58.758070 7fac62bb0780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:59.087255 7f28331ac780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:59.426267 7fe92a33a780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2012-05-04 02:59:59.776314 7fa4c3abf780 -1 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[root@ceph1 ~]# /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
[root@ceph1 ~]#
OK, this is just a bad check. We're verifying there are no child threads because fork()/daemonize() will destroy them. The problem is that we've just stopped a thread (via join()) and then look in /proc to count threads, and the kernel apparently removes the /proc entry asynchronously.
I'm inclined to just remove this check. If threads get wiped out at daemonize() time we'll just have to figure that out the hard way.
- Status changed from 12 to Fix Under Review
- Status changed from Fix Under Review to Resolved
Sounds good to me; I never liked depending on /proc for that anyway.
Merged into master. We probably want to put it in stable as well, but it was branched off master and I wasn't sure so I'll leave that for you.
(And I just put the first inktank email into the git history! Hurray!)