Bug #23187

closed

librados: segfault when fetching omap vals

Added by Jeff Layton about 6 years ago. Updated about 6 years ago.

Status:
Rejected
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Build the attached program with something like:

$ gcc -o test_rados ./test_rados.c -lrados

...then run it against an object in an existing pool. Pass it any old object name (it doesn't matter if it exists). I see consistent segfaults:

(gdb) bt
#0  librados::IoCtxImpl::operate_read (this=this@entry=0x7fffffffd978, oid=..., o=o@entry=0x72f9b0, pbl=pbl@entry=0x0, flags=0)
    at /usr/src/debug/ceph-13.0.1-2356.gf2b88f364515.fc27.x86_64/src/librados/IoCtxImpl.cc:750
#1  0x00007ffff7adb48f in rados_read_op_operate (read_op=0x72f9b0, io=0x7fffffffd978, oid=0x7fffffffde33 "foo", flags=0) at /usr/src/debug/ceph-13.0.1-2356.gf2b88f364515.fc27.x86_64/src/librados/librados.cc:6247
#2  0x0000000000400b6d in main ()
(gdb) list
745      version_t ver;
746    
747      Context *onack = new C_SafeCond(&mylock, &cond, &done, &r);
748    
749      int op = o->ops[0].op.op;
750      ldout(client->cct, 10) << ceph_osd_op_name(op) << " oid=" << oid << " nspace=" << oloc.nspace << dendl;
751      Objecter::Op *objecter_op = objecter->prepare_read_op(oid, oloc,
752                                              *o, snap_seq, pbl, flags,
753                                              onack, &ver);
754      objecter->op_submit(objecter_op);
(gdb) p oloc
$1 = {pool = 140737354129712, key = "\004", nspace = <error: Cannot access memory at address 0x395de2c042aa3930>, hash = 0}

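Looks like it's dying because oloc is bogus: gdb can't read oloc.nspace, and pool (140737354129712, i.e. 0x7ffff7ffe130) looks like a pointer rather than a pool ID.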

It's possible I'm doing something wrong here, but I'm seeing other, similar problems in ganesha. I first noticed this with my own hand-built librados packages (based on a relatively recent master branch), but I seem to get the same results with the packages in the Fedora 27 and CentOS7 repos as well.


Files

test_rados.c (1.91 KB) Jeff Layton, 03/01/2018 11:55 AM
test_rados.c (1.96 KB) Updated reproducer, Jeff Layton, 03/01/2018 03:05 PM
Actions #1

Updated by Patrick Donnelly about 6 years ago

  • Project changed from Ceph to RADOS
  • Subject changed from segfault when fetching omap vals to librados: segfault when fetching omap vals
  • Category deleted (librados)
  • Source set to Development
  • Component(RADOS) librados added
Actions #2

Updated by Jeff Layton about 6 years ago

Slightly updated reproducer. Doug F. suggested that the problem might be the clnt going out of scope in cluster_connect, but changing that didn't help. Patrick suggested using rados_create2, but that didn't help either.

Actions #3

Updated by Jeff Layton about 6 years ago

  • Description updated (diff)
Actions #4

Updated by Jeff Layton about 6 years ago

  • Project changed from RADOS to Ceph
  • Status changed from New to Rejected
  • Source deleted (Development)

Ahh this was my bug. I was passing &io_ctx to rados_read_op_operate instead of io_ctx. Since rados_ioctx_t is an aliased void pointer, the compiler can't catch this. I wonder if we ought to change that to something better-defined?

In any case, false alarm on this bug.
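
For anyone hitting the same thing, here is a minimal sketch of why the compiler stays quiet, and of what a better-defined handle type might look like. It does not use librados at all: struct demo_ctx, loose_ioctx_t, strict_ioctx_t and the two getters are hypothetical names made up for illustration. The only assumption carried over from this report is that the handle is a typedef for void *. Note that the backtrace above shows io=0x7fffffffd978, which looks like a stack address and is consistent with the address of the handle variable being passed rather than the handle itself.

```c
#include <assert.h>

/* Hypothetical stand-in for the library's private context; not librados code. */
struct demo_ctx {
    long pool;
};

/* Loose handle: an aliased void pointer, like rados_ioctx_t in this report. */
typedef void *loose_ioctx_t;

/* Strict handle: a pointer to a named struct type (an assumption for this
 * sketch, not an existing librados typedef). */
typedef struct demo_ctx *strict_ioctx_t;

long loose_get_pool(loose_ioctx_t io)
{
    /* The callee has to trust that io really points at a demo_ctx. */
    return ((struct demo_ctx *)io)->pool;
}

long strict_get_pool(strict_ioctx_t io)
{
    return io->pool;
}

/* Exercises only the correct calls; returns 0 on success. */
int demo(void)
{
    struct demo_ctx ctx = { 7 };
    loose_ioctx_t loose = &ctx;
    strict_ioctx_t strict = &ctx;

    assert(loose_get_pool(loose) == 7);
    assert(strict_get_pool(strict) == 7);

    /*
     * The mistake from this report, deliberately left commented out:
     *
     *   loose_get_pool(&loose);    accepted silently: void ** converts
     *                              implicitly to void *, so the callee
     *                              would read the handle variable itself
     *                              as if it were a demo_ctx
     *
     *   strict_get_pool(&strict);  diagnosed: struct demo_ctx ** is not
     *                              compatible with struct demo_ctx *
     */
    return 0;
}
```

With the strict typedef, the stray & draws a diagnostic at compile time (a warning or an error, depending on compiler and version) instead of a segfault at run time. In a real header the struct would presumably be declared but left undefined, so callers still could not look inside it.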
