Bug #589

OSD: crash on startup, PG::read_state

Added by Wido den Hollander almost 9 years ago. Updated almost 9 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: OSD
Target version:
Start date: 11/17/2010
Due date:
% Done: 0%
Spent time:
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

After upgrading to today's unstable, all my OSDs crashed immediately after startup. For example, osd0:

Last log lines after starting it:

2010-11-17 20:28:46.542978 7f300569a720 filestore(/srv/ceph/osd.0) write /srv/ceph/osd.0/current/meta/pginfo_0.63_0 0~8
2010-11-17 20:28:46.543029 7f300569a720 filestore(/srv/ceph/osd.0) queue_flusher ep 0 fd 11 0~8 qlen 1
2010-11-17 20:28:46.543038 7f300569a720 filestore(/srv/ceph/osd.0) write /srv/ceph/osd.0/current/meta/pginfo_0.63_0 0~8 = 8
2010-11-17 20:28:46.545741 7f300569a720 journal WARNING: disk write cache is ON; journaling will not be reliable
2010-11-17 20:28:46.545759 7f300569a720 journal          on kernels prior to 2.6.33 (recent kernels are safe)
2010-11-17 20:28:46.545772 7f300569a720 journal          disable with 'hdparm -W 0 /dev/sda1'
2010-11-17 20:28:46.545952 7f30000ba710 filestore(/srv/ceph/osd.0) sync_entry waiting for max_interval 5.000000
2010-11-17 20:28:46.546047 7f2ffe8b7710 filestore(/srv/ceph/osd.0) flusher_entry start
2010-11-17 20:28:46.546071 7f2ffe8b7710 filestore(/srv/ceph/osd.0) flusher_entry flushing+closing 11 ep 0
2010-11-17 20:28:46.546093 7f300569a720 filestore(/srv/ceph/osd.0) mount: enabling PARALLEL journal mode: btrfs, SNAP_CREATE_ASYNC detected and 'filestore btrfs snap' mode is enabled
2010-11-17 20:28:46.546112 7f300569a720 osd0 0 boot
2010-11-17 20:28:46.546135 7f300569a720 filestore(/srv/ceph/osd.0) read /srv/ceph/osd.0/current/meta/osd_superblock_0 0~0
2010-11-17 20:28:46.546147 7f2ffe8b7710 filestore(/srv/ceph/osd.0) flusher_entry sleeping
2010-11-17 20:28:46.546201 7f300569a720 filestore(/srv/ceph/osd.0) read /srv/ceph/osd.0/current/meta/osd_superblock_0 0~123 = 123
2010-11-17 20:28:46.546222 7f300569a720 osd0 0 read_superblock sb(850e9018-0809-e464-c693-d93aeb9e7d29 osd0 e4532 [1,4532] lci=[3842,4532])
2010-11-17 20:28:46.546264 7f300569a720 filestore(/srv/ceph/osd.0) read /srv/ceph/osd.0/current/meta/osdmap.4532_0 0~0
2010-11-17 20:28:46.546417 7f300569a720 filestore(/srv/ceph/osd.0) read /srv/ceph/osd.0/current/meta/osdmap.4532_0 0~6705 = 6705
2010-11-17 20:28:46.546504 7f300569a720 osd0 4532 clear_temp
2010-11-17 20:28:46.546516 7f300569a720 filestore(/srv/ceph/osd.0) collection_list /srv/ceph/osd.0/current/temp
2010-11-17 20:28:46.546553 7f300569a720 filestore(/srv/ceph/osd.0) collection_list /srv/ceph/osd.0/current/temp sorting 0 objects
2010-11-17 20:28:46.546564 7f300569a720 filestore(/srv/ceph/osd.0) collection_list /srv/ceph/osd.0/current/temp = 0 (0 objects)
2010-11-17 20:28:46.546572 7f300569a720 osd0 4532 0 objects
2010-11-17 20:28:46.546583 7f300569a720 filestore(/srv/ceph/osd.0) collection_stat /srv/ceph/osd.0/current/temp
2010-11-17 20:28:46.546592 7f300569a720 filestore(/srv/ceph/osd.0) collection_stat /srv/ceph/osd.0/current/temp = 0
2010-11-17 20:28:46.546601 7f300569a720 osd0 4532 load_pgs
2010-11-17 20:28:46.546609 7f300569a720 filestore(/srv/ceph/osd.0) list_collections
2010-11-17 20:28:46.547277 7f300569a720 osd0 4532 load_pgs skipping non-pg meta
2010-11-17 20:28:46.547285 7f300569a720 osd0 4532 load_pgs skipping non-pg temp
2010-11-17 20:28:46.547294 7f300569a720 osd0 4532 _open_lock_pg 1.1p0
2010-11-17 20:28:46.547313 7f300569a720 osd0 4532 _get_pool 1 0 -> 1
2010-11-17 20:28:46.547416 7f300569a720 filestore(/srv/ceph/osd.0) collection_getattr /srv/ceph/osd.0/current/1.1p0_head 'info'
2010-11-17 20:28:46.547457 7f300569a720 filestore(/srv/ceph/osd.0) collection_getattr /srv/ceph/osd.0/current/1.1p0_head 'info' = 305
*** Caught signal (ABRT) ***
 ceph version 0.24~rc (commit:d57181d3d5b05b893ed75621a03e860281d98dd5)
 1: (sigabrt_handler(int)+0x7d) [0x5dea5d]
 2: (()+0x33af0) [0x7f3003d24af0]
 3: (gsignal()+0x35) [0x7f3003d24a75]
 4: (abort()+0x180) [0x7f3003d285c0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f30045da8e5]
 6: (()+0xcad16) [0x7f30045d8d16]
 7: (()+0xcad43) [0x7f30045d8d43]
 8: (()+0xcae3e) [0x7f30045d8e3e]
 9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x11e) [0x45b7ae]
 10: (decode(PG::Info&, ceph::buffer::list::iterator&)+0x246) [0x515686]
 11: (PG::read_state(ObjectStore*)+0x10c) [0x54548c]
 12: (OSD::load_pgs()+0x15c) [0x4dff5c]
 13: (OSD::init()+0x4a9) [0x4e6259]
 14: (main()+0x16ca) [0x45924a]
 15: (__libc_start_main()+0xfd) [0x7f3003d0fc4d]
 16: /usr/bin/cosd() [0x457969]
All the OSDs went down with the same backtrace, but you might want to check out:
  • osd0 (node01)
  • osd1 (node02)
  • osd2 (node03)
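
Frames 5 to 9 of the backtrace are consistent with an exception thrown by ceph::buffer::list::iterator::copy() while decode(PG::Info&, ...) was reading the 305-byte 'info' attribute, with nothing catching it, so the C++ runtime's terminate handler called abort(). The sketch below illustrates that general failure pattern only; it is not Ceph's code. BufferIterator, Info, decode and try_decode are hypothetical names, and the "record shorter than the decoder expects" scenario is an assumption made purely for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a buffer iterator; not Ceph's actual class.
class BufferIterator {
public:
    explicit BufferIterator(const std::vector<char>& data)
        : data_(data), pos_(0) {}

    // Copy len bytes into dest, throwing if the buffer ends first.
    void copy(std::size_t len, char* dest) {
        if (len > data_.size() - pos_) {
            throw std::out_of_range("end of buffer");
        }
        std::memcpy(dest, data_.data() + pos_, len);
        pos_ += len;
    }

private:
    const std::vector<char>& data_;
    std::size_t pos_;
};

// Hypothetical record: assume a newer format appended the second field.
struct Info {
    std::uint32_t epoch = 0;
    std::uint32_t extra = 0;
};

// Decodes both fields unconditionally, so a short (older) record throws.
void decode(Info& info, BufferIterator& it) {
    it.copy(sizeof(info.epoch), reinterpret_cast<char*>(&info.epoch));
    it.copy(sizeof(info.extra), reinterpret_cast<char*>(&info.extra));
}

// Returns true on success and false if the buffer was too short.
// Without this try/catch the exception would propagate out of main(),
// std::terminate would run, and the process would abort with SIGABRT.
bool try_decode(const std::vector<char>& bytes, Info& out) {
    BufferIterator it(bytes);
    try {
        decode(out, it);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

Calling try_decode with a 4-byte buffer returns false, while an 8-byte buffer decodes successfully. Note that on failure the output struct may be partially filled, so a caller should discard it.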

History

#1 Updated by Sage Weil almost 9 years ago

  • Status changed from New to Resolved
  • Assignee set to Sage Weil
  • Target version set to v0.24

OK, this is fixed by commit 7e9812b4a9bbf320a8b0bd0abec48c1c5d78fe66. Assuming your fs is old enough, you should be OK just upgrading. Otherwise, mkcephfs!
