Backport #13512

Updated by Nathan Cutler about 7 years ago

https://github.com/ceph/ceph/pull/13104

This is happening at startup in a small minority of test runs.

teuthology-2015-10-12_23:08:03-kcephfs-master-testing-basic-multi/1105038/

The ceph-osd daemons start and their logs keep happily spinning away, but they never get as far as sending their boot messages to the mon.

I caught one in the act and tried to attach a debugger, but gdb hung. I then tried running a fresh osd process under a debugger, and it hung on Ctrl-C.

I happened to notice that the host mira106 had some dead krbd volumes (presumably from some other test, see #13510).

It seems highly likely that the OSD process is hanging inside get_device_by_uuid. For some reason, the heartbeat map does not notice that this thread is hung.
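A heartbeat-map-style watchdog only reports threads that have registered with it and armed a timeout. A minimal Python sketch of that idea follows; it is a simplified illustration, not Ceph's actual HeartbeatMap, and the class and method names (HeartbeatMap, add_worker, reset_timeout, clear_timeout, unhealthy) and the thread names are assumptions chosen for the example. One plausible reading of the observation above (also an assumption, not confirmed in this report) is that the thread stuck in get_device_by_uuid is not being tracked by such a watchdog at that point, so nothing ever flags it.

```python
import threading
import time


class HeartbeatMap:
    """Hypothetical, simplified watchdog (not Ceph's real API).

    Workers register a handle, then arm a deadline with reset_timeout()
    before doing work. A checker reports every armed handle whose
    deadline has passed. Threads that never register are invisible.
    """

    def __init__(self):
        self._lock = threading.Lock()
        # handle name -> absolute deadline in time.monotonic() seconds,
        # or None when the worker is registered but currently idle
        self._deadlines = {}

    def add_worker(self, name):
        with self._lock:
            self._deadlines[name] = None

    def reset_timeout(self, name, grace):
        # The worker promises to check in again within `grace` seconds.
        with self._lock:
            self._deadlines[name] = time.monotonic() + grace

    def clear_timeout(self, name):
        # The worker finished its unit of work and is idle again.
        with self._lock:
            self._deadlines[name] = None

    def unhealthy(self, now=None):
        # Return the sorted names of armed workers whose deadline passed.
        now = time.monotonic() if now is None else now
        with self._lock:
            return sorted(
                name
                for name, deadline in self._deadlines.items()
                if deadline is not None and now > deadline
            )


# Demo with an explicit "now" so no real waiting is needed.
hb = HeartbeatMap()
hb.add_worker("op_worker")
hb.reset_timeout("op_worker", grace=15.0)
# Suppose op_worker now blocks forever without checking in again.
later = time.monotonic() + 60.0
print(hb.unhealthy(now=later))  # ['op_worker']
# A hypothetical startup thread stuck probing a dead device was never
# add_worker()'d, so the checker has no entry for it and cannot report it.
```

The design point the sketch illustrates is that this kind of watchdog is opt-in: coverage depends entirely on which code paths register and arm a timeout, so a hang in an unregistered path produces no heartbeat complaint at all.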