Bug #345

closed

OSD crash: PG::read_state

Added by Wido den Hollander almost 14 years ago. Updated over 13 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%


Description

This might be a duplicate of #279, but I'm not sure.

This morning I saw that 4 of my 12 OSDs were down (most of them killed by the OOM killer, even though I'm using tcmalloc).

Tried to start them again, but then osd5 crashed:

10.08.11_09:12:35.468138 7f17fb9d3720 osd5 2559 pg[0.18d( v 889'2876 lc 0'0 (889'2874,889'2876]+backlog n=2758 ec=2 les=2497 2552/2552/879) [] r=0 (info mismatch, log(0'0,0'0]) mlcod 0'0 inactive] read_log 0~250978
10.08.11_09:12:35.468157 7f17fb9d3720 filestore(/srv/ceph/osd5) read /srv/ceph/osd5/current/meta/pglog_0.18d_0 0~250978
10.08.11_09:12:35.468218 7f17fb9d3720 filestore(/srv/ceph/osd5) read couldn't open /srv/ceph/osd5/current/meta/pglog_0.18d_0 errno 2 No such file or directory
10.08.11_09:12:35.468228 7f17fb9d3720 filestore(/srv/ceph/osd5) read /srv/ceph/osd5/current/meta/pglog_0.18d_0 0~250978 = -2
10.08.11_09:12:35.468237 7f17fb9d3720 osd5 2559 pg[0.18d( v 889'2876 lc 0'0 (889'2874,889'2876]+backlog n=2758 ec=2 les=2497 2552/2552/879) [] r=0 (info mismatch, log(889'2874,0'0]+backlog) (log bound mismatch, empty) mlcod 0'0 inactive] read_log got 0 bytes, expected 250978-0=250978
osd/PG.cc: In function 'void PG::read_log(ObjectStore*)':
osd/PG.cc:2168: FAILED assert(0)
 1: (PG::read_state(ObjectStore*)+0x846) [0x532746]
 2: (OSD::load_pgs()+0x145) [0x4e6b25]
 3: (OSD::init()+0x4b8) [0x4e7508]
 4: (main()+0x1d72) [0x458022]
 5: (__libc_start_main()+0xfd) [0x7f17fa295c4d]
 6: /usr/bin/cosd() [0x456099]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

As the log says, /srv/ceph/osd5/current/meta/pglog_0.18d_0 is missing from the disk (the filesystem is not full).
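
The missing file can also be picked out of an OSD log mechanically. This is a minimal sketch, assuming the log lines follow the filestore format quoted above. The function name `missing_files` and the regular expression are illustrative and not part of Ceph.

```python
import re

# Matches filestore lines of the form quoted in this report:
#   ... filestore(<osd dir>) read couldn't open <path> errno <n> <message>
# The pattern is an assumption based only on those quoted lines.
OPEN_FAILURE = re.compile(
    r"filestore\((?P<store>[^)]+)\) read couldn't open (?P<path>\S+) errno (?P<errno>\d+)"
)


def missing_files(log_lines):
    """Return paths the filestore failed to open with errno 2 (ENOENT), in first-seen order."""
    seen = []
    for line in log_lines:
        match = OPEN_FAILURE.search(line)
        if match and int(match.group("errno")) == 2 and match.group("path") not in seen:
            seen.append(match.group("path"))
    return seen


# Two lines copied from the log in this report.
sample = [
    "10.08.11_09:12:35.468157 7f17fb9d3720 filestore(/srv/ceph/osd5) read "
    "/srv/ceph/osd5/current/meta/pglog_0.18d_0 0~250978",
    "10.08.11_09:12:35.468218 7f17fb9d3720 filestore(/srv/ceph/osd5) read couldn't open "
    "/srv/ceph/osd5/current/meta/pglog_0.18d_0 errno 2 No such file or directory",
]

print(missing_files(sample))
```

Running the same function over a full OSD log would show whether other pglog files are missing besides the one for pg 0.18d.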

This looks like Christian Brunner's post on the ML (cosd dying after start), but I am not using the rbd branch on my client; I'm running the latest unstable (0eb6cd49f6e3ec523787d09cf08d3179be270db4).

As mentioned on the ML, I tried a scrub, but that fails:

root@node14:~# ceph osd scrub 5
10.08.11_09:24:22.453954 mon <- [osd,scrub,5]
10.08.11_09:24:22.455788 mon0 -> 'unknown command scrub' (-22)
root@node14:~#

I've uploaded the core, logfile and binary to logger.ceph.widodh.nl in the directory /srv/ceph/issues/cosd_crash_pg_read_state
