Bug #23187: librados: segfault when fetching omap vals - Ceph - Ceph


Bug #23187

closed

librados: segfault when fetching omap vals

Added by Jeff Layton about 6 years ago. Updated about 6 years ago.

Status: Rejected
Priority: Normal
Severity: 2 - major

Description

Build the attached program with something like:

$ gcc -o test_rados ./test_rados.c -lrados

...then run it against an object in an existing pool. Pass it any old object name (it doesn't matter if it exists). I see consistent segfaults:

(gdb) bt
#0  librados::IoCtxImpl::operate_read (this=this@entry=0x7fffffffd978, oid=..., o=o@entry=0x72f9b0, pbl=pbl@entry=0x0, flags=0)
    at /usr/src/debug/ceph-13.0.1-2356.gf2b88f364515.fc27.x86_64/src/librados/IoCtxImpl.cc:750
#1  0x00007ffff7adb48f in rados_read_op_operate (read_op=0x72f9b0, io=0x7fffffffd978, oid=0x7fffffffde33 "foo", flags=0) at /usr/src/debug/ceph-13.0.1-2356.gf2b88f364515.fc27.x86_64/src/librados/librados.cc:6247
#2  0x0000000000400b6d in main ()
(gdb) list
745      version_t ver;
746    
747      Context *onack = new C_SafeCond(&mylock, &cond, &done, &r);
748    
749      int op = o->ops[0].op.op;
750      ldout(client->cct, 10) << ceph_osd_op_name(op) << " oid=" << oid << " nspace=" << oloc.nspace << dendl;
751      Objecter::Op *objecter_op = objecter->prepare_read_op(oid, oloc,
752                                              *o, snap_seq, pbl, flags,
753                                              onack, &ver);
754      objecter->op_submit(objecter_op);
(gdb) p oloc
$1 = {pool = 140737354129712, key = "\004", nspace = <error: Cannot access memory at address 0x395de2c042aa3930>, hash = 0}

Looks like it's dying because oloc.nspace was bogus.

It's possible I'm doing something wrong here, but I'm seeing other, similar problems in Ganesha. I first noticed this with my own hand-built librados packages (based on a relatively recent master branch), but I seem to get the same results with the packages in the Fedora 27 and CentOS 7 repos as well.

Files


test_rados.c (1.91 KB), Jeff Layton, 03/01/2018 11:55 AM
test_rados.c (1.96 KB), Updated reproducer, Jeff Layton, 03/01/2018 03:05 PM


Updated by Patrick Donnelly about 6 years ago

Project changed from Ceph to RADOS
Subject changed from segfault when fetching omap vals to librados: segfault when fetching omap vals
Category deleted (~~librados~~)
Source set to Development
Component(RADOS) librados added


Updated by Jeff Layton about 6 years ago

File test_rados.c added

Slightly updated reproducer. Doug F. suggested that the problem might be clnt going out of scope in cluster_connect, but changing that didn't help. Patrick suggested using rados_create2, but that didn't help either.


Updated by Jeff Layton about 6 years ago

Description updated (diff)


Updated by Jeff Layton about 6 years ago

Project changed from RADOS to Ceph
Status changed from New to Rejected
Source deleted (~~Development~~)

Ahh, this was my bug. I was passing &io_ctx to rados_read_op_operate instead of io_ctx. Since rados_ioctx_t is just a typedef for void *, the compiler can't catch this. I wonder if we ought to change it to a better-defined type?

In any case, false alarm on this bug.

