Bug #15017


ceph-osd segfault during start process

Added by Sergey Baukin about 8 years ago. Updated about 8 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Regression:
No
Severity:
3 - minor

Description

ceph-osd receives a segmentation fault within 10 seconds of starting.

(Ubuntu 14.04, ceph 0.94.6-1trusty)

2016-03-09 01:08:02.256086 7fb88ad0c8c0 0 ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403), process ceph-osd, pid 91765
2016-03-09 01:08:02.276958 7fb88ad0c8c0 0 filestore(/var/lib/ceph/osd/ceph-0) backend generic (magic 0xef53)
2016-03-09 01:08:02.278857 7fb88ad0c8c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is supported and appears to work
2016-03-09 01:08:02.278878 7fb88ad0c8c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2016-03-09 01:08:02.357483 7fb88ad0c8c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2016-03-09 01:08:02.361354 7fb88ad0c8c0 0 filestore(/var/lib/ceph/osd/ceph-0) limited size xattrs
2016-03-09 01:08:02.428515 7fb88ad0c8c0 0 filestore(/var/lib/ceph/osd/ceph-0) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2016-03-09 01:08:02.430027 7fb88ad0c8c0 1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2016-03-09 01:08:02.430033 7fb88ad0c8c0 1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 19: 5368709120 bytes, block size 4096 bytes, directio = 1, aio = 0
2016-03-09 01:08:02.717829 7fb88ad0c8c0 1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 17: 5368709120 bytes, block size 4096 bytes, directio = 1, aio = 0
2016-03-09 01:08:02.722408 7fb88ad0c8c0 0 <cls> cls/hello/cls_hello.cc:271: loading cls_hello
2016-03-09 01:08:02.726077 7fb88ad0c8c0 0 osd.0 6186 crush map has features 35467296768, adjusting msgr requires for clients
2016-03-09 01:08:02.726090 7fb88ad0c8c0 0 osd.0 6186 crush map has features 35467296768 was 8705, adjusting msgr requires for mons
2016-03-09 01:08:02.726096 7fb88ad0c8c0 0 osd.0 6186 crush map has features 35467296768, adjusting msgr requires for osds
2016-03-09 01:08:02.726110 7fb88ad0c8c0 0 osd.0 6186 load_pgs
2016-03-09 01:08:06.511388 7fb88ad0c8c0 0 osd.0 6186 load_pgs opened 128 pgs
2016-03-09 01:08:06.517560 7fb88ad0c8c0 -1 osd.0 6186 log_to_monitors {default=true}
2016-03-09 01:08:06.625070 7fb8764d9700 0 osd.0 6186 ignoring osdmap until we have initialized
2016-03-09 01:08:06.625157 7fb8764d9700 0 osd.0 6186 ignoring osdmap until we have initialized
2016-03-09 01:08:06.634312 7fb88ad0c8c0 0 osd.0 6186 done with init, starting boot process
2016-03-09 01:08:09.024210 7fb860895700 0 -- 10.0.253.128:6809/91765 >> 10.0.253.129:6804/29372 pipe(0x1f8af000 sd=23 :6809 s=0 pgs=0 cs=0 l=0 c=0x1ec5a100).accept connect_seq 0 vs existing 0 state connecting
2016-03-09 01:08:11.221110 7fb85018f700 0 -- 10.0.253.128:6809/91765 >> 10.0.253.128:6804/2770 pipe(0x1f8aa000 sd=301 :6809 s=0 pgs=0 cs=0 l=0 c=0x1ec59a20).accept connect_seq 0 vs existing 0 state connecting
2016-03-09 01:08:11.221467 7fb84ff8d700 0 -- 10.0.253.128:6809/91765 >> 10.0.253.128:6800/2567 pipe(0x21f1f000 sd=302 :6809 s=0 pgs=0 cs=0 l=0 c=0x1ec59ce0).accept connect_seq 0 vs existing 0 state connecting
2016-03-09 01:08:11.222255 7fb84fd8b700 0 -- 10.0.253.128:6809/91765 >> 10.0.253.129:6800/2931 pipe(0x21f24000 sd=305 :6809 s=0 pgs=0 cs=0 l=0 c=0x218fc260).accept connect_seq 0 vs existing 0 state connecting
2016-03-09 01:08:11.646970 7fb86d0c3700 0 log_channel(cluster) log [INF] : 21.26 restarting backfill on osd.7 from (4683'1084551,6186'1087556] 0//0//-1 to 6186'1087557
2016-03-09 01:08:11.718170 7fb86d0c3700 0 log_channel(cluster) log [INF] : 21.71 restarting backfill on osd.7 from (4681'1106580,6186'1109607] 0//0//-1 to 6186'1109608
2016-03-09 01:08:11.769547 7fb86c8c2700 0 log_channel(cluster) log [INF] : 21.6d restarting backfill on osd.7 from (4681'1051185,6186'1054394] 0//0//-1 to 6186'1054395
2016-03-09 01:08:11.941220 7fb8688ba700 -1 *** Caught signal (Segmentation fault) **
in thread 7fb8688ba700

ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
1: /usr/bin/ceph-osd() [0xaaff6a]
2: (()+0x10340) [0x7fb889c0b340]
3: (()+0x74aaa) [0x7fb8889c9aaa]
4: (ReplicatedPG::make_writeable(ReplicatedPG::OpContext*)+0x139) [0x8851e9]
5: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x3e3) [0x886cc3]
6: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x5ba) [0x88730a]
7: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>&)+0x4559) [0x88cee9]
8: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x66a) [0x82702a]
9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3bd) [0x6961dd]
10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x338) [0x696708]
11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x875) [0xb98555]
12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xb9a670]
13: (()+0x8182) [0x7fb889c03182]
14: (clone()+0x6d) [0x7fb88816e47d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Actions #1

Updated by Sergey Baukin about 8 years ago

The OSD starts OK after renaming one particular PG directory:

/var/local/osd0/current# mv 21.20_head moved-21.20_head

Actions #2

Updated by Samuel Just about 8 years ago

Can you reproduce with

debug osd = 20
debug ms = 1
debug filestore = 20

or get a backtrace with line numbers from gdb?
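
For reference, those debug levels would normally go in the [osd] section of ceph.conf on the affected host (a sketch; the section placement is an assumption, and the OSD must be restarted, or the values injected with `ceph tell osd.0 injectargs`, for them to take effect):

```ini
# /etc/ceph/ceph.conf on the affected host
[osd]
    debug osd = 20
    debug ms = 1
    debug filestore = 20
```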

Actions #3

Updated by Samuel Just about 8 years ago

  • Status changed from New to Can't reproduce