Bug #23187
closed
librados: segfault when fetching omap vals
Description
Build the attached program with something like:
$ gcc -o test_rados ./test_rados.c -lrados
...then run it against an object in an existing pool. Pass it any old object name (it doesn't matter if it exists). I see consistent segfaults:
(gdb) bt
#0  librados::IoCtxImpl::operate_read (this=this@entry=0x7fffffffd978, oid=..., o=o@entry=0x72f9b0, pbl=pbl@entry=0x0, flags=0) at /usr/src/debug/ceph-13.0.1-2356.gf2b88f364515.fc27.x86_64/src/librados/IoCtxImpl.cc:750
#1  0x00007ffff7adb48f in rados_read_op_operate (read_op=0x72f9b0, io=0x7fffffffd978, oid=0x7fffffffde33 "foo", flags=0) at /usr/src/debug/ceph-13.0.1-2356.gf2b88f364515.fc27.x86_64/src/librados/librados.cc:6247
#2  0x0000000000400b6d in main ()
(gdb) list
745       version_t ver;
746
747       Context *onack = new C_SafeCond(&mylock, &cond, &done, &r);
748
749       int op = o->ops[0].op.op;
750       ldout(client->cct, 10) << ceph_osd_op_name(op) << " oid=" << oid << " nspace=" << oloc.nspace << dendl;
751       Objecter::Op *objecter_op = objecter->prepare_read_op(oid, oloc,
752                                                   *o, snap_seq, pbl, flags,
753                                                   onack, &ver);
754       objecter->op_submit(objecter_op);
(gdb) p oloc
$1 = {pool = 140737354129712, key = "\004", nspace = <error: Cannot access memory at address 0x395de2c042aa3930>, hash = 0}
Looks like it's dying because oloc.nspace was bogus.
It's possible I'm doing something wrong here, but I'm seeing other, similar problems in ganesha. I first noticed this with my own hand-built librados packages (based on a relatively recent master branch), but I seem to get the same results with the packages in the Fedora 27 and CentOS7 repos as well.
Files
Updated by Patrick Donnelly about 6 years ago
- Project changed from Ceph to RADOS
- Subject changed from segfault when fetching omap vals to librados: segfault when fetching omap vals
- Category deleted (librados)
- Source set to Development
- Component(RADOS) librados added
Updated by Jeff Layton about 6 years ago
- File test_rados.c added
Slightly updated reproducer. Doug F. suggested that it might be the clnt going out of scope in cluster_connect, but changing that didn't help anything. Patrick suggested using rados_create2, but that also didn't help.
Updated by Jeff Layton about 6 years ago
- Project changed from RADOS to Ceph
- Status changed from New to Rejected
- Source deleted (Development)
Ahh this was my bug. I was passing &io_ctx to rados_read_op_operate instead of io_ctx. Since rados_ioctx_t is an aliased void pointer, the compiler can't catch this. I wonder if we ought to change that to something better-defined?
In any case, false alarm on this bug.
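For anyone hitting the same thing, here is a minimal sketch of why the compiler stays silent. It does not use librados at all: struct fake_ctx, loose_handle_t, strict_handle_t and the functions below are hypothetical stand-ins, with loose_handle_t only imitating the "handle is a typedef for void *" style of rados_ioctx_t. Because any object pointer converts implicitly to void *, passing the address of the handle is accepted just like passing the handle itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an I/O context; NOT a librados type. */
struct fake_ctx {
    long pool;
};

/* Imitates the rados_ioctx_t style: an opaque handle aliased to void *. */
typedef void *loose_handle_t;

/* Treats the handle as a struct fake_ctx and reads its pool id. */
long loose_get_pool(loose_handle_t h)
{
    assert(h != NULL);
    return ((struct fake_ctx *)h)->pool;
}

/* Returns its argument unchanged; used only to show what the compiler accepts. */
loose_handle_t loose_echo(loose_handle_t h)
{
    return h;
}

/* Alternative (an assumption, not the librados API): a pointer to a named struct. */
typedef struct fake_ctx *strict_handle_t;

long strict_get_pool(strict_handle_t h)
{
    assert(h != NULL);
    return h->pool;
}

/*
 * Given:
 *     struct fake_ctx ctx = { 42 };
 *     loose_handle_t  h = &ctx;
 *     strict_handle_t s = &ctx;
 *
 * loose_echo(&h) compiles cleanly, because void ** converts implicitly
 * to void *. Calling loose_get_pool(&h) would compile just as cleanly and
 * then read the handle's own storage as if it were a struct fake_ctx.
 *
 * strict_get_pool(&s) passes a struct fake_ctx ** where a struct fake_ctx *
 * is expected, so the compiler diagnoses an incompatible pointer type.
 */
```

With the void * handle, the wrong level of indirection only shows up at run time, as garbage fields like the oloc above. With a pointer to a named (even incomplete) struct type, the same mistake is flagged at compile time.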