Bug #339
OSD crash: ReplicatedPG::sub_op_modify
Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Two OSDs were killed by the OOM killer. After restarting both (osd4 and osd5), one crashed with the following message:
Core was generated by `/usr/bin/cosd -i 5 -c /etc/ceph/ceph.conf'.
Program terminated with signal 6, Aborted.
#0  0x00007f9bdc464a75 in raise () from /lib/libc.so.6
(gdb) bt
#0  0x00007f9bdc464a75 in raise () from /lib/libc.so.6
#1  0x00007f9bdc4685c0 in abort () from /lib/libc.so.6
#2  0x00007f9bdcd198e5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#3  0x00007f9bdcd17d16 in ?? () from /usr/lib/libstdc++.so.6
#4  0x00007f9bdcd17d43 in std::terminate() () from /usr/lib/libstdc++.so.6
#5  0x00007f9bdcd17e3e in __cxa_throw () from /usr/lib/libstdc++.so.6
#6  0x00000000005c02b8 in ceph::__ceph_assert_fail (assertion=0x5eb240 "!missing.is_missing(soid)", file=<value optimized out>, line=2776, func=<value optimized out>) at common/assert.cc:30
#7  0x00000000004905e9 in ReplicatedPG::sub_op_modify (this=<value optimized out>, op=0x7f9bc402a810) at osd/ReplicatedPG.cc:2776
#8  0x00000000004d94a4 in OSD::dequeue_op (this=0xe64120, pg=0xfc8ba0) at osd/OSD.cc:4740
#9  0x00000000005c097f in ThreadPool::worker (this=0xe64600) at common/WorkQueue.cc:44
#10 0x00000000004f89ad in ThreadPool::WorkThread::entry() ()
#11 0x000000000046d32a in Thread::_entry_func (arg=0x4cc0) at ./common/Thread.h:39
#12 0x00007f9bdd2f79ca in start_thread () from /lib/libpthread.so.0
#13 0x00007f9bdc5176cd in clone () from /lib/libc.so.6
#14 0x0000000000000000 in ?? ()
(gdb)
During this time the cluster was degraded, since the OSDs had been down for some time.
I've uploaded the logs, core dump, and binary to logger.ceph.widodh.nl in /srv/ceph/issues/osd_crash_ReplicatedPG_sub_op_modify
After this crash I tried to start the OSD again with a higher log level (20), but it didn't crash again.