Bug #215

osd crash: FAILED assert(seq >= last_committed_seq)

Added by ar Fred almost 9 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: OSD
Target version:
Start date: 06/20/2010
Due date:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

This is Ceph unstable at commit c626ac384678661b765c1ae1dee8db48b2c70993.

#0  0x00007f41b1a18a75 in raise () from /lib/libc.so.6
#1  0x00007f41b1a1c5c0 in abort () from /lib/libc.so.6
#2  0x00007f41b22cd8e5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#3  0x00007f41b22cbd16 in ?? () from /usr/lib/libstdc++.so.6
#4  0x00007f41b22cbd43 in std::terminate() () from /usr/lib/libstdc++.so.6
#5  0x00007f41b22cbe3e in __cxa_throw () from /usr/lib/libstdc++.so.6
#6  0x00000000005b39f8 in ceph::__ceph_assert_fail (assertion=0x5ec3b2 "seq >= last_committed_seq", file=<value optimized out>, line=711, func=<value optimized out>) at common/assert.cc:30
#7  0x00000000005649e1 in FileJournal::committed_thru (this=0x1116310, seq=0) at os/FileJournal.cc:711
#8  0x000000000055d265 in JournalingObjectStore::commit_finish (this=0x1125740) at os/JournalingObjectStore.cc:186
#9  0x00000000005543f3 in FileStore::sync_entry (this=0x1125740) at os/FileStore.cc:1714
#10 0x00000000004ef93d in FileStore::SyncThread::entry() ()
#11 0x0000000000469a4a in Thread::_entry_func (arg=0x6315) at ./common/Thread.h:39
#12 0x00007f41b28ab9ca in start_thread () from /lib/libpthread.so.0
#13 0x00007f41b1acb6cd in clone () from /lib/libc.so.6
#14 0x0000000000000000 in ?? ()

The full log of that OSD is attached.

I kept all the logs, the core file, and so on; just ask!

osd1.gz (54.2 KB) ar Fred, 06/20/2010 10:39 AM

commit_op_seq (5 Bytes) ar Fred, 06/22/2010 01:42 AM

History

#1 Updated by Sage Weil almost 9 years ago

  • Status changed from New to Testing
  • Target version set to v0.21

This should be fixed by bf3d52a4b725a0f2d3db39ea9ad5b412171ea0ad. Can you please confirm?

Thanks!

#2 Updated by ar Fred almost 9 years ago

I got a crash after restarting that OSD, with the same stack trace (ignoring the line-number differences caused by your recent commit):

#0  0x00007fbc5be06a75 in raise () from /lib/libc.so.6
#1  0x00007fbc5be0a5c0 in abort () from /lib/libc.so.6
#2  0x00007fbc5c6bb8e5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#3  0x00007fbc5c6b9d16 in ?? () from /usr/lib/libstdc++.so.6
#4  0x00007fbc5c6b9d43 in std::terminate() () from /usr/lib/libstdc++.so.6
#5  0x00007fbc5c6b9e3e in __cxa_throw () from /usr/lib/libstdc++.so.6
#6  0x00000000005b3a08 in ceph::__ceph_assert_fail (assertion=0x5ec3d2 "seq >= last_committed_seq", file=<value optimized out>, line=711, func=<value optimized out>) at common/assert.cc:30
#7  0x00000000005649f1 in FileJournal::committed_thru (this=0x1def3c0, seq=0) at os/FileJournal.cc:711
#8  0x000000000055d265 in JournalingObjectStore::commit_finish (this=0x1df5740) at os/JournalingObjectStore.cc:189
#9  0x00000000005543f3 in FileStore::sync_entry (this=0x1df5740) at os/FileStore.cc:1714
#10 0x00000000004ef93d in FileStore::SyncThread::entry() ()
#11 0x0000000000469a4a in Thread::_entry_func (arg=0x6c0b) at ./common/Thread.h:39
#12 0x00007fbc5cc999ca in start_thread () from /lib/libpthread.so.0
#13 0x00007fbc5beb96cd in clone () from /lib/libc.so.6
#14 0x0000000000000000 in ?? ()

Should your patch fix the actual problem, or only its cause? That is, should I expect my OSD to restart, or should I reformat?

Just in case, in frame 7:

(gdb) p seq
$1 = 0
(gdb) p last_committed_seq
$2 = 1975

#3 Updated by Sage Weil almost 9 years ago

Oh... it may have written the bad value (0) to current/commit_op_seq. Can you confirm that the file has 0 in it? If so, then you either need to re-mkfs, or probably

$ echo 1975 > current/commit_op_seq

will do the trick.
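A slightly more defensive version of the same fix could look like the sketch below. It is not part of Ceph: the function name `repair_commit_op_seq` and its arguments are hypothetical, and it assumes the first line of current/commit_op_seq holds the committed sequence number as plain decimal text (which is what the echo above writes). It keeps a backup copy and only overwrites the file when the stored value is lower than the one requested.

```shell
# Hypothetical helper (not a Ceph tool):
#   repair_commit_op_seq <osd_data_dir> <seq>
# Assumes the first line of current/commit_op_seq is a decimal number.
repair_commit_op_seq() {
    rcos_file="$1/current/commit_op_seq"
    rcos_want="$2"

    if [ ! -f "$rcos_file" ]; then
        echo "missing $rcos_file" >&2
        return 1
    fi

    # Only the first line is compared; any extra lines are ignored.
    rcos_cur=$(head -n 1 "$rcos_file")
    case "$rcos_cur" in
        ''|*[!0-9]*)
            echo "unexpected content in $rcos_file: $rcos_cur" >&2
            return 1
            ;;
    esac

    # Leave the file alone if it is already at or past the requested value.
    if [ "$rcos_cur" -ge "$rcos_want" ]; then
        echo "commit_op_seq is already $rcos_cur, nothing to do"
        return 0
    fi

    # Keep a copy of the old file, then write the new value
    # (this replaces the whole file, exactly like the plain echo).
    cp -p "$rcos_file" "$rcos_file.bak" || return 1
    echo "$rcos_want" > "$rcos_file"
}
```

With the OSD stopped, it would be called as `repair_commit_op_seq <osd data dir> 1975`, where 1975 is the last_committed_seq value reported by gdb above.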

#4 Updated by ar Fred almost 9 years ago

I'm attaching the commit_op_seq file because its content is not what I was expecting: it does have a 0 in it, but it also holds a second line.

The echo trick worked: this OSD has now joined the other, and all the PGs are active+clean.
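For anyone wanting to check their own file the same way, a minimal sketch follows. The function name `describe_seq_file` is made up for illustration; it only uses standard `head`, `wc`, and `tr`, and it makes no assumption about what the second line means.

```shell
# Hypothetical helper: summarize a commit_op_seq file.
#   describe_seq_file <path/to/commit_op_seq>
describe_seq_file() {
    dsf_first=$(head -n 1 "$1")
    # tr strips the padding some wc implementations add.
    dsf_lines=$(wc -l < "$1" | tr -d ' ')
    dsf_bytes=$(wc -c < "$1" | tr -d ' ')
    echo "first_line=$dsf_first lines=$dsf_lines bytes=$dsf_bytes"
}
```

Running `od -c` on the file as well shows the raw bytes, which helps when a line is not printable text.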

Thanks

#5 Updated by Sage Weil almost 9 years ago

  • Status changed from Testing to Resolved
