Bug #589
OSD: crash on startup, PG::read_state
Status: closed
Description
After upgrading to today's unstable, all my OSDs crashed directly after startup. Take osd0 as an example.
Last log lines after starting it up:
2010-11-17 20:28:46.542978 7f300569a720 filestore(/srv/ceph/osd.0) write /srv/ceph/osd.0/current/meta/pginfo_0.63_0 0~8
2010-11-17 20:28:46.543029 7f300569a720 filestore(/srv/ceph/osd.0) queue_flusher ep 0 fd 11 0~8 qlen 1
2010-11-17 20:28:46.543038 7f300569a720 filestore(/srv/ceph/osd.0) write /srv/ceph/osd.0/current/meta/pginfo_0.63_0 0~8 = 8
2010-11-17 20:28:46.545741 7f300569a720 journal WARNING: disk write cache is ON; journaling will not be reliable
2010-11-17 20:28:46.545759 7f300569a720 journal on kernels prior to 2.6.33 (recent kernels are safe)
2010-11-17 20:28:46.545772 7f300569a720 journal disable with 'hdparm -W 0 /dev/sda1'
2010-11-17 20:28:46.545952 7f30000ba710 filestore(/srv/ceph/osd.0) sync_entry waiting for max_interval 5.000000
2010-11-17 20:28:46.546047 7f2ffe8b7710 filestore(/srv/ceph/osd.0) flusher_entry start
2010-11-17 20:28:46.546071 7f2ffe8b7710 filestore(/srv/ceph/osd.0) flusher_entry flushing+closing 11 ep 0
2010-11-17 20:28:46.546093 7f300569a720 filestore(/srv/ceph/osd.0) mount: enabling PARALLEL journal mode: btrfs, SNAP_CREATE_ASYNC detected and 'filestore btrfs snap' mode is enabled
2010-11-17 20:28:46.546112 7f300569a720 osd0 0 boot
2010-11-17 20:28:46.546135 7f300569a720 filestore(/srv/ceph/osd.0) read /srv/ceph/osd.0/current/meta/osd_superblock_0 0~0
2010-11-17 20:28:46.546147 7f2ffe8b7710 filestore(/srv/ceph/osd.0) flusher_entry sleeping
2010-11-17 20:28:46.546201 7f300569a720 filestore(/srv/ceph/osd.0) read /srv/ceph/osd.0/current/meta/osd_superblock_0 0~123 = 123
2010-11-17 20:28:46.546222 7f300569a720 osd0 0 read_superblock sb(850e9018-0809-e464-c693-d93aeb9e7d29 osd0 e4532 [1,4532] lci=[3842,4532])
2010-11-17 20:28:46.546264 7f300569a720 filestore(/srv/ceph/osd.0) read /srv/ceph/osd.0/current/meta/osdmap.4532_0 0~0
2010-11-17 20:28:46.546417 7f300569a720 filestore(/srv/ceph/osd.0) read /srv/ceph/osd.0/current/meta/osdmap.4532_0 0~6705 = 6705
2010-11-17 20:28:46.546504 7f300569a720 osd0 4532 clear_temp
2010-11-17 20:28:46.546516 7f300569a720 filestore(/srv/ceph/osd.0) collection_list /srv/ceph/osd.0/current/temp
2010-11-17 20:28:46.546553 7f300569a720 filestore(/srv/ceph/osd.0) collection_list /srv/ceph/osd.0/current/temp sorting 0 objects
2010-11-17 20:28:46.546564 7f300569a720 filestore(/srv/ceph/osd.0) collection_list /srv/ceph/osd.0/current/temp = 0 (0 objects)
2010-11-17 20:28:46.546572 7f300569a720 osd0 4532 0 objects
2010-11-17 20:28:46.546583 7f300569a720 filestore(/srv/ceph/osd.0) collection_stat /srv/ceph/osd.0/current/temp
2010-11-17 20:28:46.546592 7f300569a720 filestore(/srv/ceph/osd.0) collection_stat /srv/ceph/osd.0/current/temp = 0
2010-11-17 20:28:46.546601 7f300569a720 osd0 4532 load_pgs
2010-11-17 20:28:46.546609 7f300569a720 filestore(/srv/ceph/osd.0) list_collections
2010-11-17 20:28:46.547277 7f300569a720 osd0 4532 load_pgs skipping non-pg meta
2010-11-17 20:28:46.547285 7f300569a720 osd0 4532 load_pgs skipping non-pg temp
2010-11-17 20:28:46.547294 7f300569a720 osd0 4532 _open_lock_pg 1.1p0
2010-11-17 20:28:46.547313 7f300569a720 osd0 4532 _get_pool 1 0 -> 1
2010-11-17 20:28:46.547416 7f300569a720 filestore(/srv/ceph/osd.0) collection_getattr /srv/ceph/osd.0/current/1.1p0_head 'info'
2010-11-17 20:28:46.547457 7f300569a720 filestore(/srv/ceph/osd.0) collection_getattr /srv/ceph/osd.0/current/1.1p0_head 'info' = 305
*** Caught signal (ABRT) ***
ceph version 0.24~rc (commit:d57181d3d5b05b893ed75621a03e860281d98dd5)
1: (sigabrt_handler(int)+0x7d) [0x5dea5d]
2: (()+0x33af0) [0x7f3003d24af0]
3: (gsignal()+0x35) [0x7f3003d24a75]
4: (abort()+0x180) [0x7f3003d285c0]
5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f30045da8e5]
6: (()+0xcad16) [0x7f30045d8d16]
7: (()+0xcad43) [0x7f30045d8d43]
8: (()+0xcae3e) [0x7f30045d8e3e]
9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x11e) [0x45b7ae]
10: (decode(PG::Info&, ceph::buffer::list::iterator&)+0x246) [0x515686]
11: (PG::read_state(ObjectStore*)+0x10c) [0x54548c]
12: (OSD::load_pgs()+0x15c) [0x4dff5c]
13: (OSD::init()+0x4a9) [0x4e6259]
14: (main()+0x16ca) [0x45924a]
15: (__libc_start_main()+0xfd) [0x7f3003d0fc4d]
16: /usr/bin/cosd() [0x457969]

All the OSDs went down with the same backtrace, but you might want to check out:
- osd0 (node01)
- osd1 (node02)
- osd2 (node03)
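
To double-check that the backtraces really match across OSDs, something like the following could be run over the collected logs. This is only a rough sketch: the helper names `backtrace_signature` and `same_backtrace` are made up for illustration, and it assumes each log holds one frame per line in the `N: (symbol+offset) [address]` form shown above. The absolute address in brackets is dropped before comparing, since shared library load addresses can differ between nodes.

```python
import re

# One backtrace frame, e.g. "11: (PG::read_state(ObjectStore*)+0x10c) [0x54548c]".
# Group 2 keeps the symbol and offset; the bracketed absolute address is ignored.
FRAME_RE = re.compile(r"^\s*(\d+): (.+) \[0x[0-9a-f]+\]\s*$")


def backtrace_signature(log_text):
    """Return the backtrace frames found in log_text as a tuple of strings."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append(match.group(2))
    return tuple(frames)


def same_backtrace(logs):
    """True if every log in {name: text} yields the same non-empty signature."""
    signatures = {backtrace_signature(text) for text in logs.values()}
    return len(signatures) == 1 and () not in signatures
```

Calling `same_backtrace({"osd0": text0, "osd1": text1, "osd2": text2})` with the contents of the three logs returns `True` only if all of them contain an identical sequence of frames.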