
Bug #3287

OSD dies when using zfs

Added by Mike Lowe over 11 years ago. Updated almost 11 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

OSD dies during startup of a newly created Ceph filesystem when the OSD data directories are backed by ZFS.

osd.0.log View (1.5 MB) Mike Lowe, 10/11/2012 10:50 AM

osd.1.log View (8.66 KB) Mike Lowe, 10/11/2012 10:50 AM

osd.2.log View (7.27 KB) Mike Lowe, 10/11/2012 10:50 AM

mon.alpha.log View (8.49 KB) Mike Lowe, 10/11/2012 10:50 AM

mds.alpha.log View (14 MB) Mike Lowe, 10/11/2012 10:50 AM

History

#1 Updated by Josh Durgin over 11 years ago

For reference, osd.0 is failing with this backtrace:

2012-10-11 13:43:42.059623 7f4414285780  0 ceph version 0.52 (commit:e48859474c4944d4ff201ddc9f5fd400e8898173), process ceph-osd, pid 338
2012-10-11 13:43:42.074612 7f4414285780  0 filestore(/data/osd.0) mount FIEMAP ioctl is NOT supported
2012-10-11 13:43:42.074632 7f4414285780  0 filestore(/data/osd.0) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2012-10-11 13:43:42.074856 7f4414285780  0 filestore(/data/osd.0) mount did NOT detect btrfs
2012-10-11 13:43:42.078440 7f4414285780  0 filestore(/data/osd.0) mount syncfs(2) syscall fully supported (by glibc and kernel)
2012-10-11 13:43:42.078607 7f4414285780  0 filestore(/data/osd.0) mount found snaps <>
2012-10-11 13:43:42.083826 7f4414285780  0 filestore(/data/osd.0) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2012-10-11 13:43:42.084270 7f4414285780  1 journal _open /data/osd.0/journal fd 20: 1048576000 bytes, block size 131072 bytes, directio = 0, aio = 0
2012-10-11 13:43:42.084647 7f4414285780  1 journal _open /data/osd.0/journal fd 20: 1048576000 bytes, block size 131072 bytes, directio = 0, aio = 0
2012-10-11 13:43:42.085213 7f4414285780  1 journal close /data/osd.0/journal
2012-10-11 13:43:42.093305 7f4414285780  0 filestore(/data/osd.0) mount FIEMAP ioctl is NOT supported
2012-10-11 13:43:42.093320 7f4414285780  0 filestore(/data/osd.0) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2012-10-11 13:43:42.093467 7f4414285780  0 filestore(/data/osd.0) mount did NOT detect btrfs
2012-10-11 13:43:42.096317 7f4414285780  0 filestore(/data/osd.0) mount syncfs(2) syscall fully supported (by glibc and kernel)
2012-10-11 13:43:42.096426 7f4414285780  0 filestore(/data/osd.0) mount found snaps <>
2012-10-11 13:43:42.100054 7f4414285780  0 filestore(/data/osd.0) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2012-10-11 13:43:42.100268 7f4414285780  1 journal _open /data/osd.0/journal fd 29: 1048576000 bytes, block size 131072 bytes, directio = 0, aio = 0
2012-10-11 13:43:42.100488 7f4414285780  1 journal _open /data/osd.0/journal fd 29: 1048576000 bytes, block size 131072 bytes, directio = 0, aio = 0
2012-10-11 13:43:53.681117 7f4402a29700 -1 *** Caught signal (Aborted) **
 in thread 7f4402a29700

 ceph version 0.52 (commit:e48859474c4944d4ff201ddc9f5fd400e8898173)
 1: /usr/bin/ceph-osd() [0x71d86a]
 2: (()+0xfcb0) [0x7f441372acb0]
 3: (gsignal()+0x35) [0x7f4412304445]
 4: (abort()+0x17b) [0x7f4412307bab]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f4412c5469d]
 6: (()+0xb5846) [0x7f4412c52846]
 7: (()+0xb5873) [0x7f4412c52873]
 8: (()+0xb596e) [0x7f4412c5296e]
 9: (object_info_t::decode(ceph::buffer::list::iterator&)+0x5f1) [0x827ef1]
 10: (object_info_t::object_info_t(ceph::buffer::list&)+0x184) [0x58ac44]
 11: (ReplicatedPG::get_object_context(hobject_t const&, object_locator_t const&, bool)+0x145) [0x554fb5]
 12: (ReplicatedPG::recover_object_replicas(hobject_t const&, eversion_t)+0xf0) [0x563ae0]
 13: (ReplicatedPG::wait_for_degraded_object(hobject_t const&, std::tr1::shared_ptr<OpRequest>)+0x17b) [0x564ffb]
 14: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x9c0) [0x573d10]
 15: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0x140) [0x6460d0]
 16: (OSD::dequeue_op(PG*)+0x2b2) [0x5ad642]
 17: (ThreadPool::worker()+0x4da) [0x7b9aea]
 18: (ThreadPool::WorkThread::entry()+0xd) [0x5ec7ed]
 19: (()+0x7e9a) [0x7f4413722e9a]
 20: (clone()+0x6d) [0x7f44123c1dbd]

That means the OSD could not decode the on-disk metadata for an object. How did you set up the cluster and ZFS?

#2 Updated by Mike Lowe over 11 years ago

Clean, newly created, unused ZFS filesystems are mounted at /data/osd.N; /data itself is on the root ext4 filesystem. The Ceph filesystem was created with 'mkcephfs -a -c /etc/ceph/ceph.conf -k ceph.keyring'.
The configuration is all defaults except for host/IP and the following lines:
max open files = 131072
osd journal = /data/$name/journal
osd journal size = 1000 ; journal size, in megabytes
journal dio = false
filestore xattr use omap = true
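For context, a sketch of how those overrides might sit in a full ceph.conf; the section layout, hostname, and comments are reconstructions, not taken from the report:

```ini
; Hypothetical ceph.conf fragment reconstructing the reported setup.
[global]
    max open files = 131072
    filestore xattr use omap = true   ; keep xattrs in omap; needed on
                                      ; filesystems with limited xattr support

[osd]
    osd journal = /data/$name/journal
    osd journal size = 1000           ; journal size, in megabytes
    journal dio = false               ; no direct I/O (ZFS on Linux lacked
                                      ; O_DIRECT support at the time)

[osd.0]
    host = alpha                      ; placeholder hostname
```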

#3 Updated by Sage Weil over 11 years ago

  • Status changed from New to Need More Info

Can you reproduce this crash with

debug osd = 20
debug filestore = 20

in your [osd] section of ceph.conf? There isn't quite enough info to determine which metadata is corrupt, or who is writing it.
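Placed in ceph.conf, the requested overrides would look like this (a minimal sketch; the rest of the file stays as before):

```ini
[osd]
    ; verbose logging for the OSD and FileStore subsystems (0 = quiet, 20 = max)
    debug osd = 20
    debug filestore = 20
```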

Thanks!

#4 Updated by Sage Weil over 11 years ago

  • Project changed from CephFS to Ceph

#5 Updated by Sage Weil over 11 years ago

  • Category set to OSD
  • Source changed from Development to Community (user)

#6 Updated by Dan Mick over 11 years ago

Mike, did you ever have a chance to try to reproduce this with more debug on?

#7 Updated by Sage Weil almost 11 years ago

  • Status changed from Need More Info to Resolved
