Bug #51030

open

OSD crashes when writing to an EC pool with jaeger tracing enabled

Added by Tomohiro Misono almost 3 years ago. Updated almost 3 years ago.

Status: Fix Under Review
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

On CentOS 8 (x86_64):

1. Compile Ceph with -DWITH_JAEGER=ON.
2. Start a vstart cluster.
3. Write to an EC pool (e.g. rados bench write).
4. Some OSDs crash.

Log from a crashed OSD:

     0> 2021-06-01T14:44:14.052+0900 7f0325e6a700 -1 *** Caught signal (Aborted) **
 in thread 7f0325e6a700 thread_name:tp_osd_tp

 ceph version Development (no_version) quincy (dev)
 1: /workdir/ceph/build/bin/ceph-osd(+0x3374582) [0x55df7847a582]
 2: /lib64/libpthread.so.0(+0x12b20) [0x7f034c135b20]
 3: gsignal()
 4: abort()
 5: /lib64/libc.so.6(+0x21b09) [0x7f034ad87b09]
 6: /lib64/libc.so.6(+0x2fde6) [0x7f034ad95de6]
 7: (boost::intrusive_ptr<OpRequest>::operator->() const+0x37) [0x55df77912fe5]
 8: (ECBackend::try_reads_to_commit()+0x173b) [0x55df780e4df3]
 9: (ECBackend::check_ops()+0x28) [0x55df780e5a9a]
 10: (ECBackend::handle_sub_write_reply(pg_shard_t, ECSubWriteReply const&, ZTracer::Trace const&)+0x3b1) [0x55df780db901]
 11: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2eb) [0x55df780d7fff]
 12: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x78) [0x55df77d48a06]
 13: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xc9c) [0x55df77ba7a72]
 14: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x38e) [0x55df779c1e96]
 15: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x98) [0x55df77e75206]
 16: (ceph::osd::scheduler::OpSchedulerItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x4b) [0x55df779efaa9]
 17: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x3d18) [0x55df779d153c]
 18: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x580) [0x55df785159b8]
 19: (ShardedThreadPool::WorkThreadSharded::entry()+0x25) [0x55df78517525]
 20: (Thread::entry_wrapper()+0x83) [0x55df78501c7d]
 21: (Thread::_entry_func(void*)+0x18) [0x55df78501bf0]
 22: /lib64/libpthread.so.0(+0x814a) [0x7f034c12b14a]
 23: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

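This appears to be a simple null pointer dereference: frame 7 of the backtrace shows boost::intrusive_ptr<OpRequest>::operator->() aborting inside ECBackend::try_reads_to_commit(). I will send a PR.
Note that current master also has a build problem with jaeger: #51029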
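
For readers unfamiliar with this failure mode, here is a minimal sketch of the kind of guard that avoids dereferencing an empty smart pointer before reading tracing data from it. It is not the change in the PR. Request, Op, client_op, trace_id and trace_id_for are hypothetical stand-ins, and std::shared_ptr is used in place of boost::intrusive_ptr so the example has no external dependencies. It also assumes that some operations legitimately carry no client request.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a client request that carries tracing data.
struct Request {
  std::string trace_id;
};

// Hypothetical stand-in for an in-flight write operation.
// Assumption: client_op may be empty for some operations.
struct Op {
  std::shared_ptr<Request> client_op;
};

// Returns the trace id to attach to a span, or an empty string when the
// operation has no client request. Checking the pointer first avoids the
// operator->() call on an empty pointer.
std::string trace_id_for(const Op& op) {
  if (!op.client_op) {
    return "";
  }
  return op.client_op->trace_id;
}
```

With boost::intrusive_ptr the unguarded access trips BOOST_ASSERT inside operator->() when assertions are enabled, which would be consistent with the abort() seen in frames 3 to 7 above.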

Actions #2

Updated by Neha Ojha almost 3 years ago

  • Project changed from Ceph to RADOS
  • Category deleted (OSD)
  • Status changed from New to Fix Under Review
  • Pull request ID set to 41604