Bug #416
OSD crash: PG::read_state (closed)
Status:
Won't Fix
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%
Description
I'm not sure if this is a duplicate of #345, but to me the backtrace seems a bit different, so I'm opening a new issue for it.
After I reduced my btrfs stripe back to a single disk, the OSD no longer caused a kernel panic (a btrfs issue), but it started to crash with:
2010-09-17 14:05:13.171400 7fdcbf68c720 filestore(/srv/ceph/osd.4) parse meta -> meta = 1
2010-09-17 14:05:13.171409 7fdcbf68c720 filestore(/srv/ceph/osd.4) parse temp -> temp = 1
2010-09-17 14:05:13.171418 7fdcbf68c720 filestore(/srv/ceph/osd.4) parse commit_op_seq -> meta = 0
2010-09-17 14:05:13.171455 7fdcbf68c720 osd4 12134 _open_lock_pg 0.0p4
2010-09-17 14:05:13.171479 7fdcb92f6710 filestore(/srv/ceph/osd.4) flusher_entry flushing+closing 12 ep 0
2010-09-17 14:05:13.171526 7fdcbf68c720 osd4 12134 _get_pool 0 0 -> 1
2010-09-17 14:05:13.171684 7fdcbf68c720 filestore(/srv/ceph/osd.4) collection_getattr /srv/ceph/osd.4/current/0.0p4_head 'info'
2010-09-17 14:05:13.171748 7fdcbf68c720 filestore(/srv/ceph/osd.4) collection_getattr /srv/ceph/osd.4/current/0.0p4_head 'info' = -61
./include/buffer.h: In function 'void ceph::buffer::ptr::copy_out(unsigned int, unsigned int, char*) const':
./include/buffer.h:457: FAILED assert(_raw)
 1: (PG::read_state(ObjectStore*)+0x17e) [0x54108e]
 2: (OSD::load_pgs()+0x145) [0x4e5f75]
 3: (OSD::init()+0x4b0) [0x4e6950]
 4: (main()+0x1d92) [0x458162]
 5: (__libc_start_main()+0xfd) [0x7fdcbdf4dc4d]
 6: /usr/bin/cosd() [0x4561b9]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
I checked, and /srv/ceph/osd.4/current/0.0p4_head was empty. As a test I removed this directory, but then the OSD started to crash with:
2010-09-17 21:31:55.697582 7f18cefc4720 filestore(/srv/ceph/osd.4) parse 4.0_head -> 4.0_head??? = 1
2010-09-17 21:31:55.697593 7f18cefc4720 filestore(/srv/ceph/osd.4) parse 9.7_head -> 9.7_head??? = 1
2010-09-17 21:31:55.697601 7f18cefc4720 filestore(/srv/ceph/osd.4) parse meta -> meta = 1
2010-09-17 21:31:55.697610 7f18cefc4720 filestore(/srv/ceph/osd.4) parse temp -> temp = 1
2010-09-17 21:31:55.697619 7f18cefc4720 filestore(/srv/ceph/osd.4) parse commit_op_seq -> meta = 0
2010-09-17 21:31:55.697672 7f18cefc4720 osd4 12134 _open_lock_pg 0.10
2010-09-17 21:31:55.697701 7f18cefc4720 osd4 12134 _get_pool 0 0 -> 1
2010-09-17 21:31:55.697887 7f18cefc4720 filestore(/srv/ceph/osd.4) collection_getattr /srv/ceph/osd.4/current/0.10_head 'info'
2010-09-17 21:31:55.698005 7f18cefc4720 filestore(/srv/ceph/osd.4) collection_getattr /srv/ceph/osd.4/current/0.10_head 'info' = -61
./include/buffer.h: In function 'void ceph::buffer::ptr::copy_out(unsigned int, unsigned int, char*) const':
./include/buffer.h:457: FAILED assert(_raw)
 1: (PG::read_state(ObjectStore*)+0x17e) [0x54108e]
 2: (OSD::load_pgs()+0x145) [0x4e5f75]
 3: (OSD::init()+0x4b0) [0x4e6950]
 4: (main()+0x1d92) [0x458162]
 5: (__libc_start_main()+0xfd) [0x7f18cd885c4d]
 6: /usr/bin/cosd() [0x4561b9]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
In this case, /srv/ceph/osd.4/current/0.10_head was NOT empty.
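In both logs the last line before the assert is a collection_getattr for 'info' returning -61, which on Linux is ENODATA (the attribute is not there). My reading, not confirmed anywhere in this report, is that PG::read_state then tries to decode an empty buffer and hits assert(_raw). To find every PG collection affected by this in a log, something like the sketch below could help. The regex is only based on the two log lines above, and the function name failed_getattrs is made up for this example; it is not part of Ceph.

```python
import errno
import re

# Assumption: result lines look like the ones in this report, i.e.
#   ... collection_getattr <path> '<attr>' = <return value>
GETATTR_RESULT = re.compile(
    r"collection_getattr\s+(?P<path>\S+)\s+'(?P<attr>[^']*)'\s+=\s+(?P<ret>-?\d+)"
)


def failed_getattrs(log_text):
    """Return (path, attr, ret, errno_name) for every collection_getattr
    result line whose return value is negative."""
    failures = []
    for match in GETATTR_RESULT.finditer(log_text):
        ret = int(match.group("ret"))
        if ret < 0:
            # errno numbers are platform specific; on Linux 61 is ENODATA.
            name = errno.errorcode.get(-ret, "UNKNOWN")
            failures.append((match.group("path"), match.group("attr"), ret, name))
    return failures


# Two lines copied from the second log above: the request and its result.
sample = (
    "2010-09-17 21:31:55.697887 7f18cefc4720 filestore(/srv/ceph/osd.4) "
    "collection_getattr /srv/ceph/osd.4/current/0.10_head 'info'\n"
    "2010-09-17 21:31:55.698005 7f18cefc4720 filestore(/srv/ceph/osd.4) "
    "collection_getattr /srv/ceph/osd.4/current/0.10_head 'info' = -61\n"
)

for path, attr, ret, name in failed_getattrs(sample):
    print(path, attr, ret, name)
```

Only the result line (the one ending in "= -61") is matched; the request line without a return value is skipped. Because the OSD asserts on the first bad PG it loads, the log will only ever show one failure per start, so this is mostly useful across several restarts.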
I've uploaded the cores, binary and logfile to logger.ceph.widodh.nl:/srv/ceph/issues/osd_crash_read_pg_state. I preserved the timestamps of the corefiles, so they match the log.
This crash occurred on node05.ceph.widodh.nl.