Bug #62568
Coredump in rados_aio_write_op_operate (open)
Description
We are seeing a crash in the function rados_aio_write_op_operate(). Please find the stack trace below.
The versions we are currently using are ceph version 14.2.2 and librados version librados2-14.2.2-0.el7.x86_64.
bt
#0 0x00007f81f12b8207 in raise () from /lib64/libc.so.6
#1 0x00007f81f12b98f8 in abort () from /lib64/libc.so.6
#2 0x00007f81f12fad27 in __libc_message () from /lib64/libc.so.6
#3 0x00007f81f13015d4 in malloc_printerr () from /lib64/libc.so.6
#4 0x00007f81f1304de4 in _int_malloc () from /lib64/libc.so.6
#5 0x00007f81f13071ac in malloc () from /lib64/libc.so.6
#6 0x00007f81f1bc5ecd in operator new(unsigned long) () from /lib64/libstdc++.so.6
#7 0x00007f81f0fe9dfa in Objecter::_prepare_osd_op(Objecter::Op*) () from /lib64/librados.so.2
#8 0x00007f81f0ff43b8 in Objecter::_send_op(Objecter::Op*) () from /lib64/librados.so.2
#9 0x00007f81f0ff6723 in Objecter::_op_submit(Objecter::Op*, ceph::shunique_lock<std::shared_mutex>&, unsigned long*) () from /lib64/librados.so.2
#10 0x00007f81f10013cd in Objecter::_op_submit_with_budget(Objecter::Op*, ceph::shunique_lock<std::shared_mutex>&, unsigned long*, int*) () from /lib64/librados.so.2
#11 0x00007f81f1001610 in Objecter::op_submit(Objecter::Op*, unsigned long*, int*) () from /lib64/librados.so.2
#12 0x00007f81f0fc6f83 in librados::IoCtxImpl::aio_operate(object_t const&, ObjectOperation*, librados::AioCompletionImpl*, SnapContext const&, int, blkin_trace_info const*) () from /lib64/librados.so.2
#13 0x00007f81f0f88d8f in rados_aio_write_op_operate () from /lib64/librados.so.2
Updated by Nokia ceph-users 8 months ago
Is there any recommendation to move to the latest version to overcome this coredump?
Updated by Radoslaw Zarzynski 8 months ago
At this stage I'm not sure it's actually a bug (a coredump does not always indicate a bug).
The crash was caused by calling abort() within the memory allocator.
I could imagine a situation where this is a result of, e.g., exhausted memory.
And BTW, nautilus is EOL, so testing on quincy / reef would be really useful.
Updated by Nokia ceph-users 8 months ago
We are considering using quincy/reef for testing. BTW, even if memory is exhausted, shouldn't the expected response from malloc be a memory allocation failure instead of a crash?
Updated by Radoslaw Zarzynski 8 months ago
#3 0x00007f81f13015d4 in malloc_printerr () from /lib64/libc.so.6
What if in order to print an error msg a malloc implementation / libc needs a dynamic allocation? ;-)
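
For reference on the out-of-memory question raised above, here is a minimal C++ sketch of how an ordinary allocation failure is normally reported. It assumes a typical glibc/libstdc++ setup like the one in the trace. The helper names (new_throws_bad_alloc, nothrow_new_returns_null, impossible_size) are hypothetical and are not part of librados or Ceph. The sketch only shows the plain failure path, where operator new throws std::bad_alloc and the nothrow form returns a null pointer. It does not reproduce the abort() seen in the backtrace.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <new>

// Hypothetical helper (not a librados API): returns true if a plain
// `operator new` request for `bytes` fails by throwing std::bad_alloc.
bool new_throws_bad_alloc(std::size_t bytes) {
    try {
        void* p = ::operator new(bytes);
        ::operator delete(p);  // request succeeded, release it again
        return false;
    } catch (const std::bad_alloc&) {
        return true;           // ordinary, recoverable failure report
    }
}

// Hypothetical helper: returns true if the nothrow form of `operator new`
// reports failure by returning a null pointer instead of throwing.
bool nothrow_new_returns_null(std::size_t bytes) {
    void* p = ::operator new(bytes, std::nothrow);
    if (p == nullptr) {
        return true;
    }
    ::operator delete(p);
    return false;
}

// A request size that no allocator can satisfy. The value is read through a
// volatile so the compiler does not diagnose the oversized request statically.
std::size_t impossible_size() {
    volatile std::size_t n = std::numeric_limits<std::size_t>::max();
    return n;
}

// Example usage: both forms report the failure to the caller.
void demo() {
    assert(new_throws_bad_alloc(impossible_size()));
    assert(nothrow_new_returns_null(impossible_size()));
    assert(!new_throws_bad_alloc(16));  // a small request normally succeeds
}
```

In both cases the caller gets a chance to handle the failure, which matches the expectation stated in the question. This is only an illustration of the standard behaviour and makes no claim about the root cause of the crash reported here.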