Bug #1875
Status: closed
osd: ReplicatedPG::do_op
Description
I just noticed two OSDs (osd.11 and osd.20) go down in my cluster.
Both OSDs show the same backtrace:
Core was generated by `/usr/bin/ceph-osd -i 20 -c /etc/ceph/ceph.conf'.
Program terminated with signal 6, Aborted.
#0 0x00007f085d298f2b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0 0x00007f085d298f2b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00000000005f99e2 in reraise_fatal (signum=6) at global/signal_handler.cc:59
#2 0x00000000005f9b9d in handle_fatal_signal (signum=6) at global/signal_handler.cc:106
#3 <signal handler called>
#4 0x00007f085b8163a5 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x00007f085b819b0b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x00007f085c0d4d7d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7 0x00007f085c0d2f26 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8 0x00007f085c0d2f53 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9 0x00007f085c0d304e in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00000000005cde57 in copy_out (dest=<optimized out>, l=<optimized out>, o=<optimized out>, this=<optimized out>) at ./include/buffer.h:193
#11 ceph::buffer::list::iterator::copy (this=0x7f084dca91a0, len=4, dest=0x7f084dca93c4 "") at common/buffer.cc:493
#12 0x00000000004be3f0 in decode_raw<unsigned long long> (t=@0x7f084dca93c0, p=...) at ./include/encoding.h:56
#13 decode (p=..., v=@0x7f084dca9178) at ./include/encoding.h:99
#14 decode (i=..., p=...) at ./include/object.h:194
#15 decode (this=0x7f084dca9170, bl=...) at ./include/object.h:347
#16 decode (p=..., c=...) at ./include/object.h:355
#17 ReplicatedPG::do_pg_op (this=0x527e000, op=0x375fd80) at osd/ReplicatedPG.cc:285
#18 0x00000000004ec8e5 in ReplicatedPG::do_op (this=0x527e000, op=0x375fd80) at osd/ReplicatedPG.cc:420
#19 0x0000000000530edd in OSD::dequeue_op (this=0x2a48000, pg=0x527e000) at osd/OSD.cc:5532
#20 0x00000000005cbbc6 in ThreadPool::worker (this=0x2a48408) at common/WorkQueue.cc:54
#21 0x000000000055368d in ThreadPool::WorkThread::entry (this=<optimized out>) at ./common/WorkQueue.h:120
#22 0x00007f085d290efc in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#23 0x00007f085b8c189d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#24 0x0000000000000000 in ?? ()
(gdb)
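Reading the backtrace: frames #6 to #9 show an exception going from __cxa_throw straight into std::terminate, which is what happens when no handler catches it, and frames #10 to #17 place the throw in a buffer copy reached from a decode call in ReplicatedPG::do_pg_op. Below is a minimal sketch of that failure mode, assuming the exception is an end-of-buffer error raised while decoding a truncated payload (the backtrace does not show the exception type). The names `Cursor`, `end_of_buffer`, `decode_u32` and `try_decode_u32` are hypothetical and are not Ceph's API; the guarded wrapper only illustrates one way to keep such an exception from reaching std::terminate, and is not the actual fix.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the error thrown when a read runs past the
// end of the available bytes.
struct end_of_buffer : std::runtime_error {
    end_of_buffer() : std::runtime_error("end of buffer") {}
};

// Hypothetical minimal read cursor over a byte buffer.
class Cursor {
public:
    explicit Cursor(const std::vector<unsigned char>& data)
        : data_(data), off_(0) {}

    // Copy len bytes to dest, or throw if fewer than len bytes remain.
    void copy(std::size_t len, void* dest) {
        if (len > data_.size() - off_) {
            throw end_of_buffer();
        }
        std::memcpy(dest, data_.data() + off_, len);
        off_ += len;
    }

private:
    const std::vector<unsigned char>& data_;
    std::size_t off_;
};

// Unguarded decode: if the payload is too short, the exception propagates
// to the caller. With no handler anywhere up the stack, the C++ runtime
// calls std::terminate(), which aborts the process (signal 6).
std::uint32_t decode_u32(Cursor& c) {
    std::uint32_t v = 0;
    c.copy(sizeof v, &v);
    return v;
}

// Guarded decode: reports a truncated payload as a failed result instead
// of letting the exception escape.
bool try_decode_u32(const std::vector<unsigned char>& payload,
                    std::uint32_t& out) {
    try {
        Cursor c(payload);
        out = decode_u32(c);
        return true;
    } catch (const end_of_buffer&) {
        return false;
    }
}
```

In this sketch, calling `try_decode_u32` on a two-byte payload returns false, whereas calling `decode_u32` on the same payload without a surrounding try block would end the process the same way frames #5 to #9 do.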
The two core files were written about 26 seconds apart, going by each host's own clock:
root@atom2:~# stat /core.atom2.31856
  File: `/core.atom2.31856'
  Size: 630030336  Blocks: 81136  IO Block: 4096  regular file
Device: fc00h/64512d  Inode: 62  Links: 1
Access: (0600/-rw-------)  Uid: ( 0/ root)  Gid: ( 0/ root)
Access: 2012-01-04 14:15:28.106616697 +0100
Modify: 2012-01-04 14:15:28.522611737 +0100
Change: 2012-01-04 14:15:28.522611737 +0100
root@atom2:~#

root@atom5:~# stat /core.atom5.22990
  File: `/core.atom5.22990'
  Size: 614453248  Blocks: 55392  IO Block: 4096  regular file
Device: fc00h/64512d  Inode: 2833  Links: 1
Access: (0600/-rw-------)  Uid: ( 0/ root)  Gid: ( 0/ root)
Access: 2012-01-04 14:15:02.692203774 +0100
Modify: 2012-01-04 14:15:02.880204313 +0100
Change: 2012-01-04 14:15:02.880204313 +0100
root@atom5:~#
For some unknown reason all my log files are empty, so I have no logs for this crash. It will probably be hard to track down without them.
Both OSDs had been running for about three weeks.
The version of both OSDs: ceph version 0.39-140-ge5f4910 (e5f49104ab62ba7bc42cf6ecf41c9257b46585f7)