Bug #1191

closed

FAILED assert(!missing.is_missing(soid))

Added by ar Fred almost 13 years ago. Updated almost 13 years ago.

Status: Can't reproduce
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%


Description

Within 30 seconds, this assertion failed on 4 out of 8 OSDs. This is with today's stable branch.

It happened after some btrfs-related problems. I had to restart the whole cluster; ceph then took approximately 5-10 minutes to reach 100% active+clean with all OSDs up+in. Then, within seconds of the cluster being up and repaired, my OSDs started to crash one after the other.

osd/ReplicatedPG.cc: In function 'void ReplicatedPG::sub_op_modify(MOSDSubOp*)', in thread '0x7fb7b7da2700'
osd/ReplicatedPG.cc: 3058: FAILED assert(!missing.is_missing(soid))
 ceph version  (commit:)
 1: (ReplicatedPG::sub_op_modify(MOSDSubOp*)+0x8e2) [0x4bd5e2]
 2: (OSD::dequeue_op(PG*)+0x3a5) [0x51e4d5]
 3: (ThreadPool::worker()+0x2a6) [0x61a726]
 4: (ThreadPool::WorkThread::entry()+0xd) [0x539bcd]
 5: (()+0x6d8c) [0x7fb7c5e6fd8c]
 6: (clone()+0x6d) [0x7fb7c4abd04d]

This looks a lot like http://marc.info/?l=ceph-devel&m=129192415004110&w=2

I'm also attaching the osd log.


Files

osd.8.log (148 KB), ar Fred, 06/16/2011 08:02 AM
Actions #1

Updated by ar Fred almost 13 years ago

Core was generated by `/usr/bin/cosd -i 8 -c /etc/ceph/ceph.conf'.
Program terminated with signal 6, Aborted.
#0  0x00007fb7c5e78b3b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0  0x00007fb7c5e78b3b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x000000000064d1d2 in reraise_fatal (signum=6) at common/signal.cc:61
#2  0x000000000064dc5d in handle_fatal_signal (signum=6) at common/signal.cc:108
#3  <signal handler called>
#4  0x00007fb7c4a0ad05 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x00007fb7c4a0eab6 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x00007fb7c52c16dd in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007fb7c52bf926 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007fb7c52bf953 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007fb7c52bfa5e in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00000000006189f2 in ceph::__ceph_assert_fail (assertion=<value optimized out>, file=<value optimized out>, line=3058, func=0x6a50e0 "void ReplicatedPG::sub_op_modify(MOSDSubOp*)") at common/assert.cc:86
#11 0x00000000004bd5e2 in ReplicatedPG::sub_op_modify (this=0x2633000, op=0x2433b00) at osd/ReplicatedPG.cc:3058
#12 0x000000000051e4d5 in OSD::dequeue_op (this=0x1bf3000, pg=0x2633000) at osd/OSD.cc:5281
#13 0x000000000061a726 in ThreadPool::worker (this=0x1bf33f0) at common/WorkQueue.cc:44
#14 0x0000000000539bcd in ThreadPool::WorkThread::entry (this=<value optimized out>) at ./common/WorkQueue.h:113
#15 0x00007fb7c5e6fd8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#16 0x00007fb7c4abd04d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#17 0x0000000000000000 in ?? ()

Actions #2

Updated by Sage Weil almost 13 years ago

  • Target version set to v0.31
Actions #3

Updated by Sage Weil almost 13 years ago

  • Assignee set to Samuel Just
Actions #4

Updated by Samuel Just almost 13 years ago

  • Status changed from New to 4

This is probably caused by a bug in missing set construction during log merging. It may be fixed in 33aa578656f64606612a9c0feab87548be1cd123 or b418896d43e7d6b3d900f5c51463683c2e938c3e, which I just pushed to master. If you get a chance to retest, it'd be good to know if this resolves the problem.
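
To make the suspected failure mode concrete, here is a toy model of the invariant behind the assert at osd/ReplicatedPG.cc:3058: when a modify sub-op arrives for an object, that object must not be in the PG's missing set. This is only a sketch under stated assumptions. `LogEntry`, `MissingSet`, `build_missing`, and `can_apply_sub_op` are hypothetical simplified names invented for illustration, not Ceph's actual types or log-merge code. If the missing set is built incorrectly during log merging, an object that is actually up to date could be left marked as missing, and the next sub-op for it would trip the assert.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for illustration only; not Ceph's types.
struct LogEntry {
    std::string soid;   // object id
    unsigned version;   // version this log entry brings the object to
};

struct MissingSet {
    std::map<std::string, unsigned> need;  // soid -> version still needed

    bool is_missing(const std::string& soid) const {
        return need.count(soid) > 0;
    }
    void add(const std::string& soid, unsigned v) { need[soid] = v; }
};

// Build the missing set from an authoritative log (assumed to be in version
// order): an object is missing if the log records a version newer than the
// one held locally. An absent local object is treated as version 0.
MissingSet build_missing(const std::vector<LogEntry>& auth_log,
                         const std::map<std::string, unsigned>& local_versions) {
    MissingSet m;
    for (const auto& e : auth_log) {
        auto it = local_versions.find(e.soid);
        unsigned have = (it == local_versions.end()) ? 0 : it->second;
        if (e.version > have) {
            m.add(e.soid, e.version);  // later entries overwrite earlier needs
        }
    }
    return m;
}

// The precondition the failed assert expresses: a modify sub-op may only be
// applied to an object that is not in the missing set.
bool can_apply_sub_op(const MissingSet& m, const std::string& soid) {
    return !m.is_missing(soid);
}
```

In this model, an object whose local version already matches the newest log entry stays out of the missing set, so a sub-op for it is accepted; an object that is behind the log is marked missing and a sub-op for it would violate the precondition.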

Actions #5

Updated by Sage Weil almost 13 years ago

  • Target version changed from v0.31 to v0.32
Actions #6

Updated by Sage Weil almost 13 years ago

  • Target version changed from v0.32 to v0.33
Actions #7

Updated by Sage Weil almost 13 years ago

  • Status changed from 4 to Can't reproduce
  • Target version deleted (v0.33)