Bug #1426
osd assert fail ../../src/osd/PG.cc: 4271: FAILED assert(query.query.type == Query::MISSING)
Description
I'm seeing assertion failures on a few nodes in my cluster. 10/33 hit the following assertion; the other 17 remain running, but the monitor has marked most of them down and out:
osd e8854: 36 osds: 3 up, 3 in
This is the assertion in one of the osd logs: http://fpaste.org/hc1X/raw/
This is the stack trace: http://fpaste.org/RZjY/raw/
I've also attached the log from the leader monitor.
This is the monitor log:
History
#1 Updated by Sam Lang over 12 years ago
(gdb) p query.query.type
$3 = 0
(gdb) p query.query
$4 = (const PG::Query &) @0x78db028: {type = 0, since = {version = 0, epoch = 0, __pad = 0}, history = {epoch_created = 2, last_epoch_started = 8766, last_epoch_clean = 8857, last_epoch_split = 8766, same_up_since = 8907, same_acting_since = 8907, same_primary_since = 8878, last_scrub = {version = 0, epoch = 0, __pad = 0}, last_scrub_stamp = {tv = {tv_sec = 1313991400, tv_nsec = 660218000}}}}
(gdb)
#3 Updated by Sage Weil over 12 years ago
- Priority changed from Normal to High
#4 Updated by Sage Weil over 12 years ago
- Position set to 2
#5 Updated by Samuel Just over 12 years ago
This is probably caused by the same last_warm_restart issue as before. My previous patch only handled the case where the handle_create callback is invoked. However, handle_create is only used for handle_pg_create and PG splitting. reset_last_warm_restart will now be invoked from the Initial state's exit handler instead, which covers all exit conditions.
cf3b7cf6a9d3f873ad27a313cc1635822bdd89a1
Thanks for the bug report!
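As a rough illustration of why moving the reset into the Initial state's exit handler covers every path, here is a minimal standalone sketch. It is not the code from the commit above. PgContext, InitialState, ExitPath and cleared_after are hypothetical names, last_warm_restart is modelled as a plain integer (its real type is not shown in this report), and a C++ destructor stands in for the state machine's exit handler.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for per-PG bookkeeping; not Ceph's real PG class.
struct PgContext {
    unsigned last_warm_restart = 42;  // assumed leftover value from an earlier run
    int resets = 0;                   // counts how often the reset was invoked

    void reset_last_warm_restart() {
        last_warm_restart = 0;
        ++resets;
    }
};

// Hypothetical Initial state. The destructor plays the role of the exit
// handler: it runs no matter how the state is left.
class InitialState {
public:
    explicit InitialState(PgContext& ctx) : ctx_(ctx) {}
    ~InitialState() { ctx_.reset_last_warm_restart(); }

    InitialState(const InitialState&) = delete;
    InitialState& operator=(const InitialState&) = delete;

private:
    PgContext& ctx_;
};

// Illustrative ways of leaving Initial; the names are assumptions.
enum class ExitPath { Create, Load, Error };

// Leaves Initial via the given path and reports whether the reset ran
// exactly once and cleared the value.
bool cleared_after(ExitPath path) {
    PgContext ctx;
    try {
        InitialState initial(ctx);
        if (path == ExitPath::Error) {
            throw std::runtime_error("simulated failure while in Initial");
        }
        // Create and Load leave the state by falling out of this scope.
    } catch (const std::runtime_error&) {
        // The destructor has already run during stack unwinding.
    }
    return ctx.last_warm_restart == 0 && ctx.resets == 1;
}

// Exercises every exit path; each one must trigger the reset.
void check_all_paths() {
    assert(cleared_after(ExitPath::Create));
    assert(cleared_after(ExitPath::Load));
    assert(cleared_after(ExitPath::Error));
}
```

Calling check_all_paths() from a main function exercises all three paths. A reset tied to a single callback would only run on the path that invokes that callback, which is the gap the comment above describes.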
#6 Updated by Sage Weil over 12 years ago
- Status changed from New to 7
#7 Updated by Sage Weil over 12 years ago
- Status changed from 7 to Resolved