Bug #62568

open

Coredump in rados_aio_write_op_operate

Added by Nokia ceph-users 8 months ago. Updated 8 months ago.

Status: New
Priority: Normal
Assignee: -
Category: Dev Interfaces
Target version: -
% Done: 0%
Source: Community (dev)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): librados
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We are hitting a crash in the function rados_aio_write_op_operate(). Please find the stack trace below:

We are currently using ceph version 14.2.2 and librados version librados2-14.2.2-0.el7.x86_64.

bt
#0 0x00007f81f12b8207 in raise () from /lib64/libc.so.6
#1 0x00007f81f12b98f8 in abort () from /lib64/libc.so.6
#2 0x00007f81f12fad27 in __libc_message () from /lib64/libc.so.6
#3 0x00007f81f13015d4 in malloc_printerr () from /lib64/libc.so.6
#4 0x00007f81f1304de4 in _int_malloc () from /lib64/libc.so.6
#5 0x00007f81f13071ac in malloc () from /lib64/libc.so.6
#6 0x00007f81f1bc5ecd in operator new(unsigned long) () from /lib64/libstdc++.so.6
#7 0x00007f81f0fe9dfa in Objecter::_prepare_osd_op(Objecter::Op*) () from /lib64/librados.so.2
#8 0x00007f81f0ff43b8 in Objecter::_send_op(Objecter::Op*) () from /lib64/librados.so.2
#9 0x00007f81f0ff6723 in Objecter::_op_submit(Objecter::Op*, ceph::shunique_lock<std::shared_mutex>&, unsigned long*) () from /lib64/librados.so.2
#10 0x00007f81f10013cd in Objecter::_op_submit_with_budget(Objecter::Op*, ceph::shunique_lock<std::shared_mutex>&, unsigned long*, int*) () from /lib64/librados.so.2
#11 0x00007f81f1001610 in Objecter::op_submit(Objecter::Op*, unsigned long*, int*) () from /lib64/librados.so.2
#12 0x00007f81f0fc6f83 in librados::IoCtxImpl::aio_operate(object_t const&, ObjectOperation*, librados::AioCompletionImpl*, SnapContext const&, int, blkin_trace_info const*) () from /lib64/librados.so.2
#13 0x00007f81f0f88d8f in rados_aio_write_op_operate () from /lib64/librados.so.2

#1

Updated by Nokia ceph-users 8 months ago

Is there a recommendation to move to the latest version to get past this coredump?

#2

Updated by Radoslaw Zarzynski 8 months ago

At this stage I'm not sure it's actually a bug (a coredump does not always indicate a bug).
The crash was caused by abort() being called within the memory allocator.
I could imagine a situation where this is the result of, e.g., exhausted memory.

And BTW, Nautilus is EOL, so testing on Quincy / Reef would be really useful.

#3

Updated by Nokia ceph-users 8 months ago

We are considering testing with Quincy/Reef. BTW, even if memory is exhausted, shouldn't the expected response from malloc be an allocation failure rather than a crash?

#4

Updated by Nokia ceph-users 8 months ago

Hi, any feedback?

#5

Updated by Radoslaw Zarzynski 8 months ago

#3 0x00007f81f13015d4 in malloc_printerr () from /lib64/libc.so.6

What if, in order to print an error message, the malloc implementation / libc needs a dynamic allocation? ;-)
