Bug #416

closed

OSD crash: PG::read_state

Added by Wido den Hollander over 13 years ago. Updated over 13 years ago.

Status:
Won't Fix
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%


Description

I'm not sure whether this is a duplicate of #345, but the backtrace looks a bit different to me, so I'm opening a new issue for it.

After bringing my btrfs stripe back to one disk, the OSD no longer triggered a kernel panic (a btrfs issue), but it started to crash with:

2010-09-17 14:05:13.171400 7fdcbf68c720 filestore(/srv/ceph/osd.4) parse meta -> meta = 1
2010-09-17 14:05:13.171409 7fdcbf68c720 filestore(/srv/ceph/osd.4) parse temp -> temp = 1
2010-09-17 14:05:13.171418 7fdcbf68c720 filestore(/srv/ceph/osd.4) parse commit_op_seq -> meta = 0
2010-09-17 14:05:13.171455 7fdcbf68c720 osd4 12134 _open_lock_pg 0.0p4
2010-09-17 14:05:13.171479 7fdcb92f6710 filestore(/srv/ceph/osd.4) flusher_entry flushing+closing 12 ep 0
2010-09-17 14:05:13.171526 7fdcbf68c720 osd4 12134 _get_pool 0 0 -> 1
2010-09-17 14:05:13.171684 7fdcbf68c720 filestore(/srv/ceph/osd.4) collection_getattr /srv/ceph/osd.4/current/0.0p4_head 'info'
2010-09-17 14:05:13.171748 7fdcbf68c720 filestore(/srv/ceph/osd.4) collection_getattr /srv/ceph/osd.4/current/0.0p4_head 'info' = -61
./include/buffer.h: In function 'void ceph::buffer::ptr::copy_out(unsigned int, unsigned int, char*) const':
./include/buffer.h:457: FAILED assert(_raw)
 1: (PG::read_state(ObjectStore*)+0x17e) [0x54108e]
 2: (OSD::load_pgs()+0x145) [0x4e5f75]
 3: (OSD::init()+0x4b0) [0x4e6950]
 4: (main()+0x1d92) [0x458162]
 5: (__libc_start_main()+0xfd) [0x7fdcbdf4dc4d]
 6: /usr/bin/cosd() [0x4561b9]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

I checked, and /srv/ceph/osd.4/current/0.0p4_head was empty. As a test I removed this directory, but then the OSD started to crash with:

2010-09-17 21:31:55.697582 7f18cefc4720 filestore(/srv/ceph/osd.4) parse 4.0_head -> 4.0_head??? = 1
2010-09-17 21:31:55.697593 7f18cefc4720 filestore(/srv/ceph/osd.4) parse 9.7_head -> 9.7_head??? = 1
2010-09-17 21:31:55.697601 7f18cefc4720 filestore(/srv/ceph/osd.4) parse meta -> meta = 1
2010-09-17 21:31:55.697610 7f18cefc4720 filestore(/srv/ceph/osd.4) parse temp -> temp = 1
2010-09-17 21:31:55.697619 7f18cefc4720 filestore(/srv/ceph/osd.4) parse commit_op_seq -> meta = 0
2010-09-17 21:31:55.697672 7f18cefc4720 osd4 12134 _open_lock_pg 0.10
2010-09-17 21:31:55.697701 7f18cefc4720 osd4 12134 _get_pool 0 0 -> 1
2010-09-17 21:31:55.697887 7f18cefc4720 filestore(/srv/ceph/osd.4) collection_getattr /srv/ceph/osd.4/current/0.10_head 'info'
2010-09-17 21:31:55.698005 7f18cefc4720 filestore(/srv/ceph/osd.4) collection_getattr /srv/ceph/osd.4/current/0.10_head 'info' = -61
./include/buffer.h: In function 'void ceph::buffer::ptr::copy_out(unsigned int, unsigned int, char*) const':
./include/buffer.h:457: FAILED assert(_raw)
 1: (PG::read_state(ObjectStore*)+0x17e) [0x54108e]
 2: (OSD::load_pgs()+0x145) [0x4e5f75]
 3: (OSD::init()+0x4b0) [0x4e6950]
 4: (main()+0x1d92) [0x458162]
 5: (__libc_start_main()+0xfd) [0x7f18cd885c4d]
 6: /usr/bin/cosd() [0x4561b9]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

In this case, /srv/ceph/osd.4/current/0.10_head was NOT empty.
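
In both logs the last line before the failed assert is a collection_getattr call for the 'info' attribute that returns -61; on Linux, errno 61 is ENODATA ("No data available"). To check whether other PG directories show the same result, the OSD log can be scanned for failed collection_getattr calls. The following is a minimal sketch, not part of Ceph: the function name failed_getattrs and the regular expression are hypothetical helpers written against the log format shown above, and the errno table only covers the Linux value seen here. It assumes each log entry is on a single line.

```python
import re

# Linux errno value seen in the log above; 61 is ENODATA on Linux.
# This table is an illustration and is deliberately not exhaustive.
LINUX_ERRNO_NAMES = {61: "ENODATA"}

# Matches result lines such as:
#   ... collection_getattr /srv/ceph/osd.4/current/0.10_head 'info' = -61
# The pattern is an assumption based on the log excerpts in this report.
GETATTR_RESULT = re.compile(
    r"collection_getattr\s+(?P<path>\S+)\s+'(?P<attr>[^']+)'\s+=\s+(?P<rc>-?\d+)"
)


def failed_getattrs(log_text):
    """Return (path, attr, rc, errno_name) for each collection_getattr with rc < 0."""
    failures = []
    for line in log_text.splitlines():
        match = GETATTR_RESULT.search(line)
        if match is None:
            continue  # not a collection_getattr result line
        rc = int(match.group("rc"))
        if rc < 0:
            name = LINUX_ERRNO_NAMES.get(-rc, "unknown")
            failures.append((match.group("path"), match.group("attr"), rc, name))
    return failures


if __name__ == "__main__":
    sample = (
        "2010-09-17 21:31:55.697887 7f18cefc4720 filestore(/srv/ceph/osd.4) "
        "collection_getattr /srv/ceph/osd.4/current/0.10_head 'info'\n"
        "2010-09-17 21:31:55.698005 7f18cefc4720 filestore(/srv/ceph/osd.4) "
        "collection_getattr /srv/ceph/osd.4/current/0.10_head 'info' = -61\n"
    )
    for path, attr, rc, name in failed_getattrs(sample):
        print(f"{path}: attr {attr!r} returned {rc} ({name})")
```

Running this over the full log would list every collection whose 'info' attribute could not be read, which may help tell whether the problem is limited to 0.0p4_head and 0.10_head.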

I've uploaded the cores, binary and logfile to logger.ceph.widodh.nl:/srv/ceph/issues/osd_crash_read_pg_state. I preserved the timestamps of the core files, so they match the log.

This crash occurred on node05.ceph.widodh.nl.
