Bug #22253

closed

"rbd info" crashed: stack smashing detected

Added by Sebastian Wagner over 6 years ago. Updated over 6 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Community (dev)
Regression:
No
Severity:
3 - minor

Description

Environment: a fairly small vstart cluster.

This is the stack trace:

#3  0x00007fffed44711c in __GI___fortify_fail (msg=<optimized out>, msg@entry=0x7fffed4bd441 "stack smashing detected") at fortify_fail.c:37
#4  0x00007fffed4470c0 in __stack_chk_fail () at stack_chk_fail.c:28
#5  0x00007ffff78f0beb in librbd::ImageCtx::perf_start (this=this@entry=0x555555b7bf70, name="librbd-8c39e2ae8944a-rbd-huge2") at /home/sebastian/Repos/ceph/src/librbd/ImageCtx.cc:397
#6  0x00007ffff78f3cb4 in librbd::ImageCtx::init (this=0x555555b7bf70) at /home/sebastian/Repos/ceph/src/librbd/ImageCtx.cc:275
#7  0x00007ffff799dacd in librbd::image::OpenRequest<librbd::ImageCtx>::send_register_watch (this=this@entry=0x555555b7fe60) at /home/sebastian/Repos/ceph/src/librbd/image/OpenRequest.cc:477
#8  0x00007ffff79a3102 in librbd::image::OpenRequest<librbd::ImageCtx>::handle_v2_apply_metadata (this=this@entry=0x555555b7fe60, result=result@entry=0x7fffb77fa374) at /home/sebastian/Repos/ceph/src/librbd/image/OpenRequest.cc:471
#9  0x00007ffff79a351f in librbd::util::detail::rados_state_callback<librbd::image::OpenRequest<librbd::ImageCtx>, &librbd::image::OpenRequest<librbd::ImageCtx>::handle_v2_apply_metadata, true> (c=<optimized out>, arg=0x555555b7fe60) at /home/sebastian/Repos/ceph/src/librbd/Utils.h:39
#10 0x00007ffff75d678d in librados::C_AioComplete::finish (this=0x7fffd0001b60, r=<optimized out>) at /home/sebastian/Repos/ceph/src/librados/AioCompletionImpl.h:169
#11 0x0000555555613949 in Context::complete (this=0x7fffd0001b60, r=<optimized out>) at /home/sebastian/Repos/ceph/src/include/Context.h:70
#12 0x00007fffeeab6010 in Finisher::finisher_thread_entry (this=0x555555acb3e8) at /home/sebastian/Repos/ceph/src/common/Finisher.cc:72
#13 0x00007fffee3a86ba in start_thread (arg=0x7fffb77fe700) at pthread_create.c:333
#14 0x00007fffed4353dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109


Files

gdb_session.log (55.5 KB) gdb_session.log Sebastian Wagner, 11/27/2017 02:31 PM
rbd_info_out.log (59.7 KB) rbd_info_out.log Sebastian Wagner, 11/27/2017 02:33 PM
client.admin.29881.log (58.7 KB) client.admin.29881.log Sebastian Wagner, 11/27/2017 02:34 PM
rbd_ls_out.log (19.4 KB) rbd_ls_out.log Sebastian Wagner, 11/27/2017 02:35 PM
ceph.conf (5.66 KB) ceph.conf Sebastian Wagner, 11/27/2017 02:37 PM
valgrind_out.log (18.8 KB) valgrind_out.log Sebastian Wagner, 11/27/2017 03:29 PM
Actions #1

Updated by Joao Eduardo Luis over 6 years ago

  • Description updated (diff)
Actions #2

Updated by Sebastian Wagner over 6 years ago

My exact version is:

$ git describe
v12.2.0-1124-g5e519ae
Actions #3

Updated by Jason Dillaman over 6 years ago

  • Status changed from New to Need More Info

@Sebastian I.: please retest on the latest available version. Your line numbers do not align with v12.2.0.

Actions #4

Updated by Sebastian Wagner over 6 years ago

I don't think this will be easy to reproduce, because:

  • That RBD is untouched: no data was ever written to it, no snapshots were created, it was never resized, etc.
  • I have restarted the cluster many times since the RBD was created.
Actions #5

Updated by Jason Dillaman over 6 years ago

@Sebastian I.: without line numbers that actually align to the code at the mentioned version, there really isn't much I can do to assist. Perhaps attempt to run it through "valgrind --tool=memcheck rbd XYZ".

Actions #6

Updated by Sebastian Wagner over 6 years ago

Added valgrind output.

@Jason Borden, should I recompile and retest on the latest luminous branch or on v12.2.1?

Actions #7

Updated by Jason Dillaman over 6 years ago

@Sebastian I.: that Valgrind output doesn't help since it failed on an "unknown instruction" error. Can you reproduce on distro or Ceph-provided packages instead of your home-grown build?

Actions #8

Updated by Sebastian Wagner over 6 years ago

Jason Dillaman wrote:

Can you reproduce on distro or Ceph-provided packages instead of your home-grown build?

If there is documentation on how to run a cluster created by vstart.sh with distro packages, I'd give it a try.

Actions #9

Updated by Jason Dillaman over 6 years ago

@Sebastian I.: vstart is for development. Just install the Ceph client packages on a VM, copy the vstart-generated ceph.conf to that host, and retest.

Actions #10

Updated by Sebastian Wagner over 6 years ago

Jason Dillaman wrote:

@Sebastian I.: vstart is for development. Just install the Ceph client packages on a VM, copy the vstart-generated ceph.conf to that host, and retest.

OK, but I also have to make sure the daemons keep using the existing data, e.g. the OSDs.

Actions #11

Updated by Jason Dillaman over 6 years ago

@Sebastian I.: yup, that's why you would copy the "ceph.conf" so that the VM can connect to your vstart-created cluster.

Actions #12

Updated by Sebastian Wagner over 6 years ago

So, I recompiled 12.2.1 and can no longer reproduce this one. It seems to be gone now.

Actions #13

Updated by Jason Dillaman over 6 years ago

  • Status changed from Need More Info to Can't reproduce