Bug #345: OSD crash: PG::read_state - Ceph - Ceph

closed

Added by Wido den Hollander almost 14 years ago. Updated over 13 years ago.

Status:

Resolved

Priority:

Normal

Category:

OSD

Description

This might be a duplicate of #279, but I'm not sure.

This morning I saw that 4 of my 12 OSDs were down (most of them killed by the OOM killer, although I'm using tcmalloc).

I tried to start them again, but then osd5 crashed:

10.08.11_09:12:35.468138 7f17fb9d3720 osd5 2559 pg[0.18d( v 889'2876 lc 0'0 (889'2874,889'2876]+backlog n=2758 ec=2 les=2497 2552/2552/879) [] r=0 (info mismatch, log(0'0,0'0]) mlcod 0'0 inactive] read_log 0~250978
10.08.11_09:12:35.468157 7f17fb9d3720 filestore(/srv/ceph/osd5) read /srv/ceph/osd5/current/meta/pglog_0.18d_0 0~250978
10.08.11_09:12:35.468218 7f17fb9d3720 filestore(/srv/ceph/osd5) read couldn't open /srv/ceph/osd5/current/meta/pglog_0.18d_0 errno 2 No such file or directory
10.08.11_09:12:35.468228 7f17fb9d3720 filestore(/srv/ceph/osd5) read /srv/ceph/osd5/current/meta/pglog_0.18d_0 0~250978 = -2
10.08.11_09:12:35.468237 7f17fb9d3720 osd5 2559 pg[0.18d( v 889'2876 lc 0'0 (889'2874,889'2876]+backlog n=2758 ec=2 les=2497 2552/2552/879) [] r=0 (info mismatch, log(889'2874,0'0]+backlog) (log bound mismatch, empty) mlcod 0'0 inactive] read_log got 0 bytes, expected 250978-0=250978
osd/PG.cc: In function 'void PG::read_log(ObjectStore*)':
osd/PG.cc:2168: FAILED assert(0)
 1: (PG::read_state(ObjectStore*)+0x846) [0x532746]
 2: (OSD::load_pgs()+0x145) [0x4e6b25]
 3: (OSD::init()+0x4b8) [0x4e7508]
 4: (main()+0x1d72) [0x458022]
 5: (__libc_start_main()+0xfd) [0x7f17fa295c4d]
 6: /usr/bin/cosd() [0x456099]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

As the log says, /srv/ceph/osd5/current/meta/pglog_0.18d_0 is missing from the disk (the filesystem is not full).
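For anyone triaging a similar crash, the check described above can be sketched in Python. This is a minimal illustration, not Ceph code: the function names `missing_files` and `readable_bytes` are hypothetical, and the regular expression assumes the filestore log format shown in the excerpt above.

```python
import re
from pathlib import Path

# Matches filestore lines of the form seen in the log above, e.g.
#   filestore(/srv/ceph/osd5) read couldn't open <path> errno 2 No such file or directory
# The pattern is an assumption based only on that excerpt.
OPEN_FAILED = re.compile(
    r"filestore\([^)]+\) read couldn't open (?P<path>\S+) errno (?P<errno>\d+)"
)


def missing_files(log_lines):
    """Return (path, errno) pairs for every failed filestore open in the log."""
    found = []
    for line in log_lines:
        match = OPEN_FAILED.search(line)
        if match:
            found.append((match.group("path"), int(match.group("errno"))))
    return found


def readable_bytes(path, offset, length):
    """Return how many bytes of [offset, offset + length) exist in the file.

    A missing file yields 0, which mirrors the
    "read_log got 0 bytes, expected 250978" message in the log.
    """
    p = Path(path)
    if not p.is_file():
        return 0
    size = p.stat().st_size
    return max(0, min(size, offset + length) - offset)
```

Run against the excerpt above, `missing_files` would report the pglog_0.18d_0 path with errno 2, and `readable_bytes` on that path with offset 0 and length 250978 would return 0 as long as the file is absent.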

This looks like Christian Brunner's post on the mailing list (cosd dying after start), but I am not using the rbd branch on my client; I'm running the latest unstable (0eb6cd49f6e3ec523787d09cf08d3179be270db4).

As mentioned on the mailing list, I tried a scrub, but that fails:

root@node14:~# ceph osd scrub 5
10.08.11_09:24:22.453954 mon <- [osd,scrub,5]
10.08.11_09:24:22.455788 mon0 -> 'unknown command scrub' (-22)
root@node14:~#
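The numeric codes in the output above (-2 from the filestore read, -22 from the monitor) look like negated Linux errno values. The first is confirmed by the "errno 2 No such file or directory" line; treating -22 the same way is an assumption. A small Python sketch for decoding them, with `describe` as a hypothetical helper name:

```python
import errno
import os


def describe(ret):
    """Translate a return code as printed in the logs into (name, message).

    Assumes the code is a negated errno value, as is common in Linux tools.
    """
    code = -ret if ret < 0 else ret
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


if __name__ == "__main__":
    for ret in (-2, -22):
        name, message = describe(ret)
        print(ret, name, message)
```

Under that assumption, -2 maps to ENOENT and -22 to EINVAL, which fits a monitor rejecting a command it does not recognise.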

I've uploaded the core, logfile, and binary to logger.ceph.widodh.nl in the directory /srv/ceph/issues/cosd_crash_pg_read_state.

Updated by Wido den Hollander almost 14 years ago

Updated by Sage Weil almost 14 years ago
