Bug #51030
OSD crash while writing to an EC pool when Jaeger tracing is enabled
Status: Fix Under Review
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
On CentOS 8 (x86_64):
1. Compile with -DWITH_JAEGER=ON.
2. Start a vstart cluster.
3. Write to an EC pool (e.g. rados bench write).
4. Some OSDs crash.
Log from the crashed OSD:
0> 2021-06-01T14:44:14.052+0900 7f0325e6a700 -1 *** Caught signal (Aborted) **
 in thread 7f0325e6a700 thread_name:tp_osd_tp
 ceph version Development (no_version) quincy (dev)
 1: /workdir/ceph/build/bin/ceph-osd(+0x3374582) [0x55df7847a582]
 2: /lib64/libpthread.so.0(+0x12b20) [0x7f034c135b20]
 3: gsignal()
 4: abort()
 5: /lib64/libc.so.6(+0x21b09) [0x7f034ad87b09]
 6: /lib64/libc.so.6(+0x2fde6) [0x7f034ad95de6]
 7: (boost::intrusive_ptr<OpRequest>::operator->() const+0x37) [0x55df77912fe5]
 8: (ECBackend::try_reads_to_commit()+0x173b) [0x55df780e4df3]
 9: (ECBackend::check_ops()+0x28) [0x55df780e5a9a]
 10: (ECBackend::handle_sub_write_reply(pg_shard_t, ECSubWriteReply const&, ZTracer::Trace const&)+0x3b1) [0x55df780db901]
 11: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2eb) [0x55df780d7fff]
 12: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x78) [0x55df77d48a06]
 13: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xc9c) [0x55df77ba7a72]
 14: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x38e) [0x55df779c1e96]
 15: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x98) [0x55df77e75206]
 16: (ceph::osd::scheduler::OpSchedulerItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x4b) [0x55df779efaa9]
 17: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x3d18) [0x55df779d153c]
 18: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x580) [0x55df785159b8]
 19: (ShardedThreadPool::WorkThreadSharded::entry()+0x25) [0x55df78517525]
 20: (Thread::entry_wrapper()+0x83) [0x55df78501c7d]
 21: (Thread::_entry_func(void*)+0x18) [0x55df78501bf0]
 22: /lib64/libpthread.so.0(+0x814a) [0x7f034c12b14a]
 23: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
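This appears to be a simple null pointer dereference: frame 7 of the backtrace aborts inside boost::intrusive_ptr<OpRequest>::operator->(), called from ECBackend::try_reads_to_commit(). I will send a PR.
Note that current master has a build problem with Jaeger: #51029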
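For illustration only, here is a minimal C++ sketch of the failure pattern and a guarded alternative. It is not the actual Ceph code or the fix in the PR. `Request`, `WriteOp`, `trace_id_unchecked` and `trace_id_for` are hypothetical stand-ins, and `std::shared_ptr` replaces `boost::intrusive_ptr` so the example builds without Boost. The `assert` mimics what the backtrace suggests: an assertion in `operator->` that aborts when the pointer is empty.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for OpRequest; it only carries a trace label.
struct Request {
    std::string trace_id;
};

// Hypothetical stand-in for a write op whose request pointer may be empty.
struct WriteOp {
    std::shared_ptr<Request> client_op;
};

// Failing pattern: the dereference is protected only by an assertion,
// so an empty pointer aborts the process in assert-enabled builds.
const std::string& trace_id_unchecked(const WriteOp& op) {
    assert(op.client_op != nullptr);
    return op.client_op->trace_id;
}

// Guarded pattern: check the pointer first and fall back to an empty
// trace id when no request is attached to the op.
std::string trace_id_for(const WriteOp& op) {
    if (!op.client_op) {
        return "";
    }
    return op.client_op->trace_id;
}
```

Calling `trace_id_for` on a `WriteOp` with an empty `client_op` returns an empty string, whereas `trace_id_unchecked` would abort in the same situation, as the crashed OSD did.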
Updated by Tomohiro Misono almost 3 years ago
Updated by Neha Ojha almost 3 years ago
- Project changed from Ceph to RADOS
- Category deleted (OSD)
- Status changed from New to Fix Under Review
- Pull request ID set to 41604