Bug #3440

closed

Running OSDs on ZFS on Linux

Added by Wido den Hollander over 11 years ago. Updated almost 11 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%

Source:
Development
Tags:
osd,zfs
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I gave running a small setup on ZFS on Linux (http://zfsonlinux.org/) a try.

The OSDs boot just fine, but when you start writing data they will fail at some point.

This is the backtrace I got:

Core was generated by `/usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.con'.
Program terminated with signal 6, Aborted.
#0  0x00007f10d7885b7b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0  0x00007f10d7885b7b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00000000006edc0d in reraise_fatal (signum=6) at global/signal_handler.cc:58
#2  handle_fatal_signal (signum=6) at global/signal_handler.cc:104
#3  <signal handler called>
#4  0x00007f10d645d425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x00007f10d6460b8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x00007f10d6daf69d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007f10d6dad846 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007f10d6dad873 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007f10d6dad96e in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x000000000080a4d1 in object_info_t::decode (this=<optimized out>, bl=...) at osd/osd_types.cc:2150
#11 0x0000000000572ee4 in decode (bl=..., this=0x7f10c6b809f0) at osd/osd_types.h:1669
#12 object_info_t::object_info_t (this=0x7f10c6b809f0, bl=...) at osd/osd_types.h:1684
#13 0x000000000053d7bd in ReplicatedPG::get_object_context (this=0x2cd7000, soid=..., oloc=..., can_create=true) at osd/ReplicatedPG.cc:3895
#14 0x000000000053f0c9 in ReplicatedPG::find_object_context (this=0x2cd7000, oid=..., oloc=..., pobc=0x7f10c6b81780, can_create=true, 
    psnapid=0x7f10c6b81730) at osd/ReplicatedPG.cc:3946
#15 0x00000000005639d5 in ReplicatedPG::do_op (this=0x2cd7000, op=...) at osd/ReplicatedPG.cc:665
#16 0x0000000000600839 in PG::do_request (this=0x2cd7000, op=...) at osd/PG.cc:1462
#17 0x00000000005bfaf8 in OSD::dequeue_op (this=0x2218000, pg=0x2cd7000) at osd/OSD.cc:5819
#18 0x000000000079f835 in ThreadPool::worker (this=0x2218408) at common/WorkQueue.cc:54
#19 0x00000000005d87cd in ThreadPool::WorkThread::entry (this=<optimized out>) at ./common/WorkQueue.h:126
#20 0x00007f10d787de9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#21 0x00007f10d651acbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#22 0x0000000000000000 in ?? ()
(gdb)

The logs that go with it are:

   -11> 2012-11-05 16:16:31.151479 7f01867da780  1 journal _open /dev/ceph/osd/journal-0 fd 28: 2147483648 bytes, block size 4096 bytes, directio = 1, aio = 0
   -10> 2012-11-05 16:16:31.151855 7f01867da780  2 osd.0 0 boot
    -9> 2012-11-05 16:16:31.151910 7f01867da780 15 filestore(/ceph/osd/0) read meta/23c2fcde/osd_superblock/0//-1 0~0
    -8> 2012-11-05 16:16:31.152101 7f01867da780 10 filestore(/ceph/osd/0) FileStore::read meta/23c2fcde/osd_superblock/0//-1 0~144/144
    -7> 2012-11-05 16:16:31.152158 7f01867da780 15 filestore(/ceph/osd/0) read meta/fd6e4231/osdmap.9/0//-1 0~0
    -6> 2012-11-05 16:16:31.152222 7f01867da780 10 filestore(/ceph/osd/0) FileStore::read meta/fd6e4231/osdmap.9/0//-1 0~2479/2479
    -5> 2012-11-05 16:16:31.152310 7f01867da780 10 filestore(/ceph/osd/0) list_collections
    -4> 2012-11-05 16:16:31.154345 7f01867da780 15 filestore(/ceph/osd/0) collection_getattr /ceph/osd/0/current/1.3_head 'info'
    -3> 2012-11-05 16:16:31.154424 7f01867da780 10 filestore(/ceph/osd/0) collection_getattr /ceph/osd/0/current/1.3_head 'info' = 100
    -2> 2012-11-05 16:16:31.155720 7f017a07c700 20 filestore(/ceph/osd/0) flusher_entry start
    -1> 2012-11-05 16:16:31.155785 7f017a07c700 20 filestore(/ceph/osd/0) flusher_entry sleeping
     0> 2012-11-05 16:16:31.156860 7f01867da780 -1 *** Caught signal (Aborted) **
 in thread 7f01867da780

 ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
 1: /usr/bin/ceph-osd() [0x6edaba]
 2: (()+0xfcb0) [0x7f0185c74cb0]
 3: (gsignal()+0x35) [0x7f018484c425]
 4: (abort()+0x17b) [0x7f018484fb8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f018519e69d]
 6: (()+0xb5846) [0x7f018519c846]
 7: (()+0xb5873) [0x7f018519c873]
 8: (()+0xb596e) [0x7f018519c96e]
 9: (pg_info_t::decode(ceph::buffer::list::iterator&)+0x37f) [0x80b9cf]
 10: (PG::read_state(ObjectStore*)+0x176) [0x62e3a6]
 11: (OSD::load_pgs()+0x71f) [0x5d1b2f]
 12: (OSD::init()+0x585) [0x5d26a5]
 13: (main()+0x2377) [0x518067]
 14: (__libc_start_main()+0xed) [0x7f018483776d]
 15: /usr/bin/ceph-osd() [0x51a239]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- end dump of recent events ---

This was run with 'debug filestore = 20'.

The OSD in this case is trying to start again after it crashed previously (I forgot to enable the debug logs then).

This was tested with ceph 0.48.1argonaut (see the version line in the log above).

I didn't try a newer version yet, since I'm assuming nothing has changed regarding this.
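
For reference, the debug levels mentioned above go in the [osd] section of ceph.conf; this is a minimal sketch, not the exact configuration I used:

[osd]
    debug osd = 20
    debug filestore = 20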


Files

ceph.conf (304 Bytes) - Wido den Hollander, 04/04/2013 06:47 AM
#1

Updated by Sage Weil about 11 years ago

Hey Wido,

Want to give this a go with the latest code? It would be nice to make this work, at least in a basic way!

#2

Updated by Wido den Hollander about 11 years ago

I just tested it. It boots, but isn't very stable.

You can't run your journal on a file though, since ZoL doesn't do O_DIRECT: https://github.com/zfsonlinux/zfs/issues/224

Running the journal on a ZVOL seems to work, however.
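
A quick way to confirm the missing O_DIRECT support is to attempt a direct write on the ZFS filesystem; on a ZoL version without O_DIRECT the open should fail with 'Invalid argument' (the target path is just an example):

dd if=/dev/zero of=/desktop/ceph/osd/0/direct-test bs=4096 count=1 oflag=direct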

I've attached my ceph.conf as a reference for my quick test.
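
Roughly, the relevant part looks like the sketch below; the attached file is authoritative, the data path follows from the "zfs list" output further down, and the zvol device path for the journal is an assumption:

[osd]
    osd data = /desktop/ceph/osd/$id
    osd journal = /dev/zvol/desktop/journal-$id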

This is what my "zfs list" output looks like:

root@wido-desktop:~# zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
desktop                   98.6G  2.58T   144K  /desktop
desktop/ceph              77.8M  2.58T   152K  /desktop/ceph
desktop/ceph/mon          2.25M  2.58T   144K  /desktop/ceph/mon
desktop/ceph/mon/desktop  2.11M  2.58T  2.11M  /desktop/ceph/mon/desktop
desktop/ceph/osd          75.4M  2.58T   152K  /desktop/ceph/osd
desktop/ceph/osd/0        42.4M  2.58T  42.4M  /desktop/ceph/osd/0
desktop/ceph/osd/1        32.8M  2.58T  32.8M  /desktop/ceph/osd/1
desktop/home              94.4G  2.58T  94.4G  /home
desktop/journal-0         2.06G  2.58T  11.0M  -
desktop/journal-1         2.06G  2.58T  4.81M  -
root@wido-desktop:~#
root@wido-desktop:~# ceph -s
   health HEALTH_OK
   monmap e1: 1 mons at {desktop=[::1]:6789/0}, election epoch 2, quorum 0 desktop
   osdmap e5: 2 osds: 2 up, 2 in
    pgmap v30: 576 pgs: 576 active+clean; 0 bytes data, 83328 KB used, 5283 GB / 5283 GB avail
   mdsmap e1: 0/0/1 up

root@wido-desktop:~#

When I start "rados load-gen", the OSDs crash immediately with this backtrace:

 ceph version 0.60 (f26f7a39021dbf440c28d6375222e21c94fe8e5c)
 1: /usr/bin/ceph-osd() [0x7834ca]
 2: (()+0xfcb0) [0x7fb5c89bfcb0]
 3: (gsignal()+0x35) [0x7fb5c715e425]
 4: (abort()+0x17b) [0x7fb5c7161b8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fb5c7ab769d]
 6: (()+0xb5846) [0x7fb5c7ab5846]
 7: (()+0xb5873) [0x7fb5c7ab5873]
 8: (()+0xb596e) [0x7fb5c7ab596e]
 9: (object_info_t::decode(ceph::buffer::list::iterator&)+0x7f5) [0x8a2d15]
 10: (object_info_t::object_info_t(ceph::buffer::list&)+0x184) [0x5d2264]
 11: (ReplicatedPG::get_object_context(hobject_t const&, object_locator_t const&, bool)+0x147) [0x588247]
 12: (ReplicatedPG::find_object_context(hobject_t const&, object_locator_t const&, ObjectContext**, bool, snapid_t*)+0x539) [0x592749]
 13: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x5be) [0x5b90de]
 14: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0x59a) [0x69ab9a]
 15: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>)+0x323) [0x5f5ab3]
 16: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>)+0x49b) [0x60becb]
 17: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x31) [0x646771]
 18: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0x9c) [0x64699c]
 19: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x824426]
 20: (ThreadPool::WorkThread::entry()+0x10) [0x826250]
 21: (()+0x7e9a) [0x7fb5c89b7e9a]
 22: (clone()+0x6d) [0x7fb5c721bcbd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

I've set debug osd to 20, but see nothing that really points to ZFS.

#3

Updated by Wido den Hollander about 11 years ago

I also tried to remove an object manually:

rados -p bench rm obj-srJ1R77RPri4Pwz

That crashed osd.0 immediately; these are the log lines (debug osd and debug filestore = 20):

    -6> 2013-04-04 15:51:31.202646 7f4431e9b700 10 osd.0 16 dequeue_op 0x3697d20 prio 63 cost 0 latency 0.000165 osd_op(client.4115.0:1 obj-srJ1R77RPri4Pwz [delete] 3.dd148d7f e16) v4 pg pg[3.7( v 12'44 (0'0,12'44] local-les=16 n=44 ec=6 les/c 16/16 15/15/15) [0,1] r=0 lpr=15 lcod 0'0 mlcod 0'0 active+clean]
    -5> 2013-04-04 15:51:31.202704 7f4431e9b700  5 --OSD::tracker-- reqid: client.4115.0:1, seq: 1444, time: 2013-04-04 15:51:31.202704, event: reached_pg, request: osd_op(client.4115.0:1 obj-srJ1R77RPri4Pwz [delete] 3.dd148d7f e16) v4
    -4> 2013-04-04 15:51:31.202728 7f4431e9b700 20 osd.0 pg_epoch: 16 pg[3.7( v 12'44 (0'0,12'44] local-les=16 n=44 ec=6 les/c 16/16 15/15/15) [0,1] r=0 lpr=15 lcod 0'0 mlcod 0'0 active+clean] op_has_sufficient_caps pool=3 (bench) owner=0 need_read_cap=0 need_write_cap=1 need_class_read_cap=0 need_class_write_cap=0 -> yes
    -3> 2013-04-04 15:51:31.202766 7f4431e9b700 10 osd.0 pg_epoch: 16 pg[3.7( v 12'44 (0'0,12'44] local-les=16 n=44 ec=6 les/c 16/16 15/15/15) [0,1] r=0 lpr=15 lcod 0'0 mlcod 0'0 active+clean] do_op osd_op(client.4115.0:1 obj-srJ1R77RPri4Pwz [delete] 3.dd148d7f e16) v4 may_write
    -2> 2013-04-04 15:51:31.202845 7f4431e9b700 15 filestore(/desktop/ceph/osd/0) getattr 3.7_head/dd148d7f/obj-srJ1R77RPri4Pwz/head//3 '_'
    -1> 2013-04-04 15:51:31.218193 7f4431e9b700 10 filestore(/desktop/ceph/osd/0) getattr 3.7_head/dd148d7f/obj-srJ1R77RPri4Pwz/head//3 '_' = 100
     0> 2013-04-04 15:51:31.223444 7f4431e9b700 -1 *** Caught signal (Aborted) **
 in thread 7f4431e9b700

 ceph version 0.60 (f26f7a39021dbf440c28d6375222e21c94fe8e5c)
 1: /usr/bin/ceph-osd() [0x7834ca]
 2: (()+0xfcb0) [0x7f4442f2fcb0]
 3: (gsignal()+0x35) [0x7f44416ce425]
 4: (abort()+0x17b) [0x7f44416d1b8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f444202769d]
 6: (()+0xb5846) [0x7f4442025846]
 7: (()+0xb5873) [0x7f4442025873]
 8: (()+0xb596e) [0x7f444202596e]
 9: (object_info_t::decode(ceph::buffer::list::iterator&)+0x7f5) [0x8a2d15]
 10: (object_info_t::object_info_t(ceph::buffer::list&)+0x184) [0x5d2264]
 11: (ReplicatedPG::get_object_context(hobject_t const&, object_locator_t const&, bool)+0x147) [0x588247]
 12: (ReplicatedPG::find_object_context(hobject_t const&, object_locator_t const&, ObjectContext**, bool, snapid_t*)+0x539) [0x592749]
 13: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x5be) [0x5b90de]
 14: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0x59a) [0x69ab9a]
 15: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>)+0x323) [0x5f5ab3]
 16: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>)+0x49b) [0x60becb]
 17: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x31) [0x646771]
 18: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0x9c) [0x64699c]
 19: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x824426]
 20: (ThreadPool::WorkThread::entry()+0x10) [0x826250]
 21: (()+0x7e9a) [0x7f4442f27e9a]
 22: (clone()+0x6d) [0x7f444178bcbd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
#4

Updated by Wido den Hollander about 11 years ago

Looking at the logs again, and seeing the posts on the mailing list today, it does indeed seem to be something with the xattrs:

    -2> 2013-04-04 15:51:31.202845 7f4431e9b700 15 filestore(/desktop/ceph/osd/0) getattr 3.7_head/dd148d7f/obj-srJ1R77RPri4Pwz/head//3 '_'
    -1> 2013-04-04 15:51:31.218193 7f4431e9b700 10 filestore(/desktop/ceph/osd/0) getattr 3.7_head/dd148d7f/obj-srJ1R77RPri4Pwz/head//3 '_' = 100
     0> 2013-04-04 15:51:31.223444 7f4431e9b700 -1 *** Caught signal (Aborted) **
 in thread 7f4431e9b700
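
To inspect the raw xattr that FileStore is reading here, something like this should work (the on-disk object filename is escaped, so use the actual file under the PG directory; FileStore keeps these attributes under user.ceph.* names):

getfattr -d -e hex /desktop/ceph/osd/0/current/3.7_head/<object file>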

I'll see if I can try the patch from Brian Behlendorf as I have a setup ready to test it on.

#5

Updated by Wido den Hollander about 11 years ago

I tried it with the patch and it works for me. Some comments are on GitHub: https://github.com/zfsonlinux/zfs/pull/1409

Short story: running Ceph 0.60 with ZoL 0.6.1 plus the xattr patch, I see no crashes.

#6

Updated by Sage Weil almost 11 years ago

  • Status changed from New to Resolved