Bug #3440
Running OSDs on ZFS on Linux
Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%
Source:
Development
Tags:
osd,zfs
Description
I just tried running a small setup on ZFS on Linux (http://zfsonlinux.org/).
The OSDs boot just fine, but once you start writing data they fail at some point.
This is the backtrace I got:
Core was generated by `/usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.con'.
Program terminated with signal 6, Aborted.
#0 0x00007f10d7885b7b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0 0x00007f10d7885b7b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00000000006edc0d in reraise_fatal (signum=6) at global/signal_handler.cc:58
#2 handle_fatal_signal (signum=6) at global/signal_handler.cc:104
#3 <signal handler called>
#4 0x00007f10d645d425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x00007f10d6460b8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x00007f10d6daf69d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7 0x00007f10d6dad846 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8 0x00007f10d6dad873 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9 0x00007f10d6dad96e in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x000000000080a4d1 in object_info_t::decode (this=<optimized out>, bl=...) at osd/osd_types.cc:2150
#11 0x0000000000572ee4 in decode (bl=..., this=0x7f10c6b809f0) at osd/osd_types.h:1669
#12 object_info_t::object_info_t (this=0x7f10c6b809f0, bl=...) at osd/osd_types.h:1684
#13 0x000000000053d7bd in ReplicatedPG::get_object_context (this=0x2cd7000, soid=..., oloc=..., can_create=true) at osd/ReplicatedPG.cc:3895
#14 0x000000000053f0c9 in ReplicatedPG::find_object_context (this=0x2cd7000, oid=..., oloc=..., pobc=0x7f10c6b81780, can_create=true, psnapid=0x7f10c6b81730) at osd/ReplicatedPG.cc:3946
#15 0x00000000005639d5 in ReplicatedPG::do_op (this=0x2cd7000, op=...) at osd/ReplicatedPG.cc:665
#16 0x0000000000600839 in PG::do_request (this=0x2cd7000, op=...) at osd/PG.cc:1462
#17 0x00000000005bfaf8 in OSD::dequeue_op (this=0x2218000, pg=0x2cd7000) at osd/OSD.cc:5819
#18 0x000000000079f835 in ThreadPool::worker (this=0x2218408) at common/WorkQueue.cc:54
#19 0x00000000005d87cd in ThreadPool::WorkThread::entry (this=<optimized out>) at ./common/WorkQueue.h:126
#20 0x00007f10d787de9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#21 0x00007f10d651acbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#22 0x0000000000000000 in ?? ()
(gdb)
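For anyone reading the backtrace: the frame directly below `__cxa_throw` is the one that raised the C++ exception which then ended in `std::terminate()` and `abort()`. The snippet below is a rough sketch for pulling that frame out of pasted gdb `bt` output. The helper names `parse_frames` and `throwing_frame` are made up for illustration and are not part of Ceph or gdb, and the regular expression only assumes the frame layout seen in this report.

```python
import re

# Matches gdb frame lines as they appear in this report, for example:
#   "#10 0x000000000080a4d1 in object_info_t::decode (this=...) at osd/osd_types.cc:2150"
#   "#2 handle_fatal_signal (signum=6) at global/signal_handler.cc:104"
# Lines without an argument list (e.g. "#3 <signal handler called>") are skipped.
FRAME_RE = re.compile(r"#(\d+)\s+(?:0x[0-9a-f]+ in )?(.+?) \(")


def parse_frames(bt_text):
    """Return a list of (frame_number, function_name) from gdb 'bt' output."""
    return [(int(num), func) for num, func in FRAME_RE.findall(bt_text)]


def throwing_frame(frames):
    """Return the frame just below __cxa_throw, i.e. the caller that threw."""
    for i, (_, func) in enumerate(frames):
        if func == "__cxa_throw" and i + 1 < len(frames):
            return frames[i + 1]
    return None


# Excerpt copied from the backtrace in this report.
sample = """\
#8  0x00007f10d6dad873 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007f10d6dad96e in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x000000000080a4d1 in object_info_t::decode (this=<optimized out>, bl=...) at osd/osd_types.cc:2150
#11 0x0000000000572ee4 in decode (bl=..., this=0x7f10c6b809f0) at osd/osd_types.h:1669
"""

print(throwing_frame(parse_frames(sample)))
```

On the excerpt above this points at `object_info_t::decode`, so the abort comes from an uncaught exception thrown while decoding object info, not from a direct assert in the OSD. The backtrace alone does not show why the decode failed.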
The logs that go with it are:
-11> 2012-11-05 16:16:31.151479 7f01867da780 1 journal _open /dev/ceph/osd/journal-0 fd 28: 2147483648 bytes, block size 4096 bytes, directio = 1, aio = 0
-10> 2012-11-05 16:16:31.151855 7f01867da780 2 osd.0 0 boot
-9> 2012-11-05 16:16:31.151910 7f01867da780 15 filestore(/ceph/osd/0) read meta/23c2fcde/osd_superblock/0//-1 0~0
-8> 2012-11-05 16:16:31.152101 7f01867da780 10 filestore(/ceph/osd/0) FileStore::read meta/23c2fcde/osd_superblock/0//-1 0~144/144
-7> 2012-11-05 16:16:31.152158 7f01867da780 15 filestore(/ceph/osd/0) read meta/fd6e4231/osdmap.9/0//-1 0~0
-6> 2012-11-05 16:16:31.152222 7f01867da780 10 filestore(/ceph/osd/0) FileStore::read meta/fd6e4231/osdmap.9/0//-1 0~2479/2479
-5> 2012-11-05 16:16:31.152310 7f01867da780 10 filestore(/ceph/osd/0) list_collections
-4> 2012-11-05 16:16:31.154345 7f01867da780 15 filestore(/ceph/osd/0) collection_getattr /ceph/osd/0/current/1.3_head 'info'
-3> 2012-11-05 16:16:31.154424 7f01867da780 10 filestore(/ceph/osd/0) collection_getattr /ceph/osd/0/current/1.3_head 'info' = 100
-2> 2012-11-05 16:16:31.155720 7f017a07c700 20 filestore(/ceph/osd/0) flusher_entry start
-1> 2012-11-05 16:16:31.155785 7f017a07c700 20 filestore(/ceph/osd/0) flusher_entry sleeping
0> 2012-11-05 16:16:31.156860 7f01867da780 -1 *** Caught signal (Aborted) **
in thread 7f01867da780
ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
1: /usr/bin/ceph-osd() [0x6edaba]
2: (()+0xfcb0) [0x7f0185c74cb0]
3: (gsignal()+0x35) [0x7f018484c425]
4: (abort()+0x17b) [0x7f018484fb8b]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f018519e69d]
6: (()+0xb5846) [0x7f018519c846]
7: (()+0xb5873) [0x7f018519c873]
8: (()+0xb596e) [0x7f018519c96e]
9: (pg_info_t::decode(ceph::buffer::list::iterator&)+0x37f) [0x80b9cf]
10: (PG::read_state(ObjectStore*)+0x176) [0x62e3a6]
11: (OSD::load_pgs()+0x71f) [0x5d1b2f]
12: (OSD::init()+0x585) [0x5d26a5]
13: (main()+0x2377) [0x518067]
14: (__libc_start_main()+0xed) [0x7f018483776d]
15: /usr/bin/ceph-osd() [0x51a239]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- end dump of recent events ---
This was run with 'debug filestore = 20'.
In this case the OSD is trying to start again after the earlier crash (I forgot to enable debug logging the first time).
This was tested with:
- Ubuntu 12.04
- ZOL 0.6.0-rc11
- Argonaut 0.48.1argonaut (a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
- Journal on ZFS ZVOL (/dev/ceph/X)
I haven't tried a newer version yet, since I assume nothing has changed in this area.