Bug #3440

closed

Running OSDs on ZFS on Linux

Added by Wido den Hollander over 11 years ago. Updated almost 11 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%

Source:
Development
Tags:
osd,zfs

Description

I just tried running a small setup on ZFS on Linux (http://zfsonlinux.org/).

The OSDs boot just fine, but once you start writing data they fail at some point.

This is the backtrace I got:

Core was generated by `/usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.con'.
Program terminated with signal 6, Aborted.
#0  0x00007f10d7885b7b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0  0x00007f10d7885b7b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00000000006edc0d in reraise_fatal (signum=6) at global/signal_handler.cc:58
#2  handle_fatal_signal (signum=6) at global/signal_handler.cc:104
#3  <signal handler called>
#4  0x00007f10d645d425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x00007f10d6460b8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x00007f10d6daf69d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007f10d6dad846 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007f10d6dad873 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007f10d6dad96e in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x000000000080a4d1 in object_info_t::decode (this=<optimized out>, bl=...) at osd/osd_types.cc:2150
#11 0x0000000000572ee4 in decode (bl=..., this=0x7f10c6b809f0) at osd/osd_types.h:1669
#12 object_info_t::object_info_t (this=0x7f10c6b809f0, bl=...) at osd/osd_types.h:1684
#13 0x000000000053d7bd in ReplicatedPG::get_object_context (this=0x2cd7000, soid=..., oloc=..., can_create=true) at osd/ReplicatedPG.cc:3895
#14 0x000000000053f0c9 in ReplicatedPG::find_object_context (this=0x2cd7000, oid=..., oloc=..., pobc=0x7f10c6b81780, can_create=true, 
    psnapid=0x7f10c6b81730) at osd/ReplicatedPG.cc:3946
#15 0x00000000005639d5 in ReplicatedPG::do_op (this=0x2cd7000, op=...) at osd/ReplicatedPG.cc:665
#16 0x0000000000600839 in PG::do_request (this=0x2cd7000, op=...) at osd/PG.cc:1462
#17 0x00000000005bfaf8 in OSD::dequeue_op (this=0x2218000, pg=0x2cd7000) at osd/OSD.cc:5819
#18 0x000000000079f835 in ThreadPool::worker (this=0x2218408) at common/WorkQueue.cc:54
#19 0x00000000005d87cd in ThreadPool::WorkThread::entry (this=<optimized out>) at ./common/WorkQueue.h:126
#20 0x00007f10d787de9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#21 0x00007f10d651acbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#22 0x0000000000000000 in ?? ()
(gdb)

The logs that go with it are:

   -11> 2012-11-05 16:16:31.151479 7f01867da780  1 journal _open /dev/ceph/osd/journal-0 fd 28: 2147483648 bytes, block size 4096 bytes, directio = 1, aio = 0
   -10> 2012-11-05 16:16:31.151855 7f01867da780  2 osd.0 0 boot
    -9> 2012-11-05 16:16:31.151910 7f01867da780 15 filestore(/ceph/osd/0) read meta/23c2fcde/osd_superblock/0//-1 0~0
    -8> 2012-11-05 16:16:31.152101 7f01867da780 10 filestore(/ceph/osd/0) FileStore::read meta/23c2fcde/osd_superblock/0//-1 0~144/144
    -7> 2012-11-05 16:16:31.152158 7f01867da780 15 filestore(/ceph/osd/0) read meta/fd6e4231/osdmap.9/0//-1 0~0
    -6> 2012-11-05 16:16:31.152222 7f01867da780 10 filestore(/ceph/osd/0) FileStore::read meta/fd6e4231/osdmap.9/0//-1 0~2479/2479
    -5> 2012-11-05 16:16:31.152310 7f01867da780 10 filestore(/ceph/osd/0) list_collections
    -4> 2012-11-05 16:16:31.154345 7f01867da780 15 filestore(/ceph/osd/0) collection_getattr /ceph/osd/0/current/1.3_head 'info'
    -3> 2012-11-05 16:16:31.154424 7f01867da780 10 filestore(/ceph/osd/0) collection_getattr /ceph/osd/0/current/1.3_head 'info' = 100
    -2> 2012-11-05 16:16:31.155720 7f017a07c700 20 filestore(/ceph/osd/0) flusher_entry start
    -1> 2012-11-05 16:16:31.155785 7f017a07c700 20 filestore(/ceph/osd/0) flusher_entry sleeping
     0> 2012-11-05 16:16:31.156860 7f01867da780 -1 *** Caught signal (Aborted) **
 in thread 7f01867da780

 ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
 1: /usr/bin/ceph-osd() [0x6edaba]
 2: (()+0xfcb0) [0x7f0185c74cb0]
 3: (gsignal()+0x35) [0x7f018484c425]
 4: (abort()+0x17b) [0x7f018484fb8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f018519e69d]
 6: (()+0xb5846) [0x7f018519c846]
 7: (()+0xb5873) [0x7f018519c873]
 8: (()+0xb596e) [0x7f018519c96e]
 9: (pg_info_t::decode(ceph::buffer::list::iterator&)+0x37f) [0x80b9cf]
 10: (PG::read_state(ObjectStore*)+0x176) [0x62e3a6]
 11: (OSD::load_pgs()+0x71f) [0x5d1b2f]
 12: (OSD::init()+0x585) [0x5d26a5]
 13: (main()+0x2377) [0x518067]
 14: (__libc_start_main()+0xed) [0x7f018483776d]
 15: /usr/bin/ceph-osd() [0x51a239]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- end dump of recent events ---

This was run with 'debug filestore = 20'.

In this case the OSD is trying to start again after it crashed previously (I forgot to enable debug logging before that first crash).
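In the log above, collection_getattr returns 100 bytes for 'info' right before pg_info_t::decode aborts. One possible explanation, which is only an assumption and not something confirmed in this report, is that the filesystem hands back extended attribute data that differs from what was written. A minimal sketch for checking that on the OSD data directory follows; the function name, attribute names, and payload sizes are made up for illustration and are not part of Ceph:

```python
import os
import tempfile


def xattr_roundtrip(directory, sizes=(100, 512, 4096)):
    """Write and read back user xattrs of several sizes on a temp file.

    Returns a dict mapping payload size to True (bytes match),
    False (bytes differ), or None (xattr call not supported or rejected).
    """
    results = {}
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        for size in sizes:
            # Non-repeating-looking payload so truncation or padding is visible.
            payload = bytes(i % 251 for i in range(size))
            name = "user.roundtrip_%d" % size  # hypothetical attribute name
            try:
                os.setxattr(path, name, payload)
                results[size] = os.getxattr(path, name) == payload
            except (OSError, AttributeError):
                # Filesystem or platform does not support this xattr.
                results[size] = None
    finally:
        os.remove(path)
    return results


if __name__ == "__main__":
    # /ceph/osd/0 is the data path seen in the log; adjust as needed.
    target = "/ceph/osd/0" if os.path.isdir("/ceph/osd/0") else tempfile.gettempdir()
    for size, ok in xattr_roundtrip(target).items():
        print(size, ok)
```

A False result for any size would point at the filesystem's xattr handling; all True would suggest looking elsewhere.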

This was tested with:

I didn't try a newer version yet, since I'm assuming nothing has changed regarding this.


Files

ceph.conf (304 Bytes), added by Wido den Hollander, 04/04/2013 06:47 AM