Bug #1426

osd assert fail ../../src/osd/PG.cc: 4271: FAILED assert(query.query.type == Query::MISSING)

Added by Sam Lang over 12 years ago. Updated over 12 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: OSD
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I'm seeing assertion failures on several nodes in my cluster. 10/33 hit the following assertion; the other 17 are still running, but the monitor has marked most of them down and out:

osd e8854: 36 osds: 3 up, 3 in

This is the assertion in one of the osd logs: http://fpaste.org/hc1X/raw/
This is the stack trace: http://fpaste.org/RZjY/raw/

I've also attached the log from the leader monitor:

mon.alpha.log View (7.87 MB) Sam Lang, 08/22/2011 07:15 AM

History

#1 Updated by Sam Lang over 12 years ago

(gdb) p query.query.type
$3 = 0
(gdb) p query.query
$4 = (const PG::Query &) @0x78db028: {type = 0, since = {version = 0, epoch = 0, __pad = 0}, history = {epoch_created = 2, last_epoch_started = 8766, last_epoch_clean = 8857, last_epoch_split = 8766, same_up_since = 8907, same_acting_since = 8907, same_primary_since = 8878, last_scrub = {version = 0, epoch = 0, __pad = 0}, last_scrub_stamp = {tv = {tv_sec = 1313991400, tv_nsec = 660218000}}}}
(gdb)

#2 Updated by Sage Weil over 12 years ago

  • Target version set to v0.35

type=0 is INFO, not MISSING, which is why the assert fails.
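
To make the mismatch concrete, here is a minimal standalone sketch of the check. It is not the code from PG.cc: the `Query` struct is a hypothetical stand-in, and only `INFO = 0` is confirmed by the gdb output above. The numeric values given to `LOG` and `MISSING` are placeholders chosen for illustration.

```cpp
#include <cassert>
#include <string>

// Only INFO = 0 is confirmed by this report; the other values are
// placeholders for illustration and may not match the real enum.
enum QueryType : int { INFO = 0, LOG = 1, MISSING = 4 };

// Hypothetical stand-in for PG::Query, reduced to the one field that matters.
struct Query {
    int type;
};

// Mirrors the condition in the failed assert: true only for a MISSING query.
bool passes_missing_assert(const Query& q) {
    return q.type == MISSING;
}

// Human-readable name for a raw type value, as one would want when reading gdb output.
const char* type_name(int t) {
    switch (t) {
        case INFO:    return "INFO";
        case LOG:     return "LOG";
        case MISSING: return "MISSING";
        default:      return "unknown";
    }
}
```

With the value from the core dump, `passes_missing_assert(Query{0})` is false and `type_name(0)` is "INFO", so the code path received an INFO query where it expected a MISSING one.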

#3 Updated by Sage Weil over 12 years ago

  • Priority changed from Normal to High

#4 Updated by Sage Weil over 12 years ago


#5 Updated by Samuel Just over 12 years ago

This is probably caused by the same last_warm_restart problem as before. My previous patch handled the case where the handle_create callback is invoked. However, handle_create is only used for handle_pg_create and pg splitting, not for the other ways a PG leaves the Initial state. reset_last_warm_restart is now invoked from the Initial state's exit handler instead, which covers all exit conditions.

cf3b7cf6a9d3f873ad27a313cc1635822bdd89a1

Thanks for the bug report!
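
The idea behind the fix in #5 (run the reset in the state's exit handler rather than in one particular callback) can be sketched with a small hand-rolled state machine. This is an illustration only, not the code from the commit: `PGContext`, `Machine`, `on_exit`, and the target states `Peering` and `Stray` are hypothetical names, and the body of `reset_last_warm_restart` merely counts calls because the thread does not show what the real function does.

```cpp
#include <cassert>
#include <memory>

// Hypothetical per-PG data. The real reset_last_warm_restart is not shown
// in this report, so this version only records that it was called.
struct PGContext {
    int resets = 0;
    void reset_last_warm_restart() { ++resets; }
};

// Base state with an exit hook that runs whenever the state is left.
struct State {
    virtual ~State() = default;
    virtual void on_exit(PGContext&) {}
};

// The reset lives in Initial's exit hook, so every way out of Initial runs it.
struct Initial : State {
    void on_exit(PGContext& pg) override { pg.reset_last_warm_restart(); }
};

// Illustrative target states; they add no exit behavior of their own.
struct Peering : State {};
struct Stray : State {};

class Machine {
public:
    explicit Machine(PGContext& pg)
        : pg_(pg), current_(std::make_unique<Initial>()) {}

    // Leave the current state (running its exit hook), then enter Next.
    template <typename Next>
    void transit() {
        current_->on_exit(pg_);
        current_ = std::make_unique<Next>();
    }

private:
    PGContext& pg_;
    std::unique_ptr<State> current_;
};

// Helper: number of resets observed after leaving Initial for state Next.
template <typename Next>
int resets_when_leaving_initial_for() {
    PGContext pg;
    Machine m(pg);
    m.transit<Next>();
    return pg.resets;
}
```

Whichever state follows Initial, the reset runs exactly once, and later transitions do not repeat it. Tying the call to a single callback, as the earlier patch did, misses any exit path that does not go through that callback.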

#6 Updated by Sage Weil over 12 years ago

  • Status changed from New to 7

#7 Updated by Sage Weil over 12 years ago

  • Status changed from 7 to Resolved
