Bug #3300
ceph::buffer::end_of_buffer isn't caught
0%
Description
A program using librados terminates with the following uncaught exception:
terminate called after throwing an instance of 'ceph::buffer::end_of_buffer' what(): buffer::end_of_buffer **24564** custom_sigaction, signal 6 at 0x516E87C: VALGRIND_PRINTF_BACKTRACE(char const*, ...) (valgrind.h:3771) ==24564== by 0x516EE64: img2::custom_sigaction(int) (conf.cpp:77) ==24564== by 0x8B004EF: ??? (in /lib/x86_64-linux-gnu/libc-2.13.so) ==24564== by 0x8B00474: raise (raise.c:64) ==24564== by 0x8B036EF: abort (abort.c:92) ==24564== by 0x83946DC: __gnu_cxx::__verbose_terminate_handler() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17) ==24564== by 0x83927E5: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17) ==24564== by 0x8392812: std::terminate() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17) ==24564== by 0x8392A3D: __cxa_throw (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17) ==24564== by 0xD3B2762: ceph::buffer::list::iterator::get_current_ptr() (buffer.cc:509) ==24564== by 0xD356DBE: Pipe::read_message(Message**) (Pipe.cc:1492) ==24564== by 0xD364CFB: Pipe::reader() (Pipe.cc:1138)
Operating system is Debian Wheezy.
dpkg -l | grep ceph ii ceph 0.52-1~bpo70+1 amd64 distributed storage and file system ii ceph-common 0.52-1~bpo70+1 amd64 common utilities to mount and interact with a ceph storage cluster ii ceph-fs-common 0.52-1~bpo70+1 amd64 common utilities to mount and interact with a ceph file system ii ceph-fuse 0.52-1~bpo70+1 amd64 FUSE-based client for the Ceph distributed file system ii ceph-mds 0.52-1~bpo70+1 amd64 metadata server for the ceph distributed file system ii libcephfs1 0.52-1~bpo70+1 amd64 Ceph distributed file system client library
History
#1 Updated by Artem Grinblat almost 11 years ago
Same problem with librados-dev 0.53-1~bpo70+1
.
#2 Updated by Artem Grinblat almost 11 years ago
GDB stack trace:
(gdb) bt full #0 *__GI_abort () at abort.c:59 act = {__sigaction_handler = {sa_handler = 0x100000000, sa_sigaction = 0x100000000}, sa_mask = {__val = {0, 0, 140363076780821, 0, 0, 140363068152736, 140363148288000, 0, 4294967295, 0, 5, 3080464, 0, 28611552, 4790, 140363076640768}}, sa_flags = -679974414, sa_restorer = 0x5} sigs = {__val = {10, 140363068152736, 1, 140362987698374, 140363071802776, 29882848, 140362489698416, 140363146250837, 140362489702080, 140363068962717, 1, 0, 31033520, 10, 140362489702144, 140363148292096}} #1 0x00007fa8d358f6dd in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 No symbol table info available. #2 0x00007fa8d358d7e6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 No symbol table info available. #3 0x00007fa8d358d813 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 No symbol table info available. #4 0x00007fa8d358da3e in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 No symbol table info available. #5 0x00007fa8ce004bd3 in ceph::buffer::list::iterator::get_current_ptr (this=<optimized out>) at common/buffer.cc:509 No locals. #6 0x00007fa8cdfa82fd in Pipe::read_message (this=this@entry=0x1b493e0, pm=pm@entry=0x7fa8b0569c38) at msg/Pipe.cc:1482 p = <optimized out> bp = {_raw = 0x0, _off = 0, _len = 82} read = <optimized out> got = <optimized out> rxbuf = {_buffers = {<std::_List_base<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >> = { _M_impl = {<std::allocator<std::_List_node<ceph::buffer::ptr> >> = {<__gnu_cxx::new_allocator<std::_List_node<ceph::buffer::ptr> >> = {<No data fields>}, <No data fields>}, _M_node = {_M_next = 0x1d98840, _M_prev = 0x1c84e70}}}, <No data fields>}, _len = 4872, append_buffer = {_raw = 0x0, _off = 0, _len = 0}, last_p = {bl = 0x7fa8b0569a20, ls = 0x7fa8b0569a20, off = 0, p = {_M_node = 0x7fa8b0569a20}, p_off = 0}} offset = <optimized out> left = 4790 newbuf = {_buffers = {<std::_List_base<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >> = { _M_impl = {<std::allocator<std::_List_node<ceph::buffer::ptr> >> = {<__gnu_cxx::new_allocator<std::_List_node<ceph::buffer::ptr> >> = {<No data fields>}, <No data fields>}, _M_node = {_M_next = 0x7fa8b05699d0, _M_prev = 0x7fa8b05699d0}}}, <No data fields>}, _len = 0, append_buffer = {_raw = 0x0, _off = 0, _len = 0}, last_p = { bl = 0x7fa8b05699d0, ls = 0x7fa8b05699d0, off = 0, p = {_M_node = 0x7fa8b05699d0}, p_off = 0}} blp = {bl = 0x7fa8c0100e38, ls = 0x7fa8c0100e38, off = 82, p = {_M_node = 0x7fa8c0100e38}, p_off = 0} rxbuf_version = 1 ret = -1 front = {_buffers = {<std::_List_base<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >> = { _M_impl = {<std::allocator<std::_List_node<ceph::buffer::ptr> >> = {<__gnu_cxx::new_allocator<std::_List_node<ceph::buffer::ptr> >> = {<No data fields>}, <No data fields>}, _M_node = {_M_next = 0x1d99060, _M_prev = 0x1d99060}}}, <No data fields>}, _len = 128, append_buffer = {_raw = 0x0, _off = 0, _len = 0}, last_p = {bl = 0x7fa8b05698e0, ls = 0x7fa8b05698e0, off = 0, p = {_M_node = 0x7fa8b05698e0}, p_off = 0}} data_len = 4872 header = {seq = {v = 59}, tid = {v = 121}, type = {v = 43}, priority = {v = 127}, version = {v = 4}, front_len = {v = 128}, middle_len = {v = 0}, data_len = {v = 4872}, data_off = {v = 0}, src = {type = 4 '\004', num = {v = 1}}, compat_version = {v = 2}, reserved = {v = 0}, crc = {v = 3483066176}} middle = {_buffers = {<std::_List_base<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >> = { _M_impl = {<std::allocator<std::_List_node<ceph::buffer::ptr> >> = {<__gnu_cxx::new_allocator<std::_List_node<ceph::buffer::ptr> >> = {<No data fields>}, <No data fields>}, _M_node = {_M_next = 0x7fa8b0569930, _M_prev = 0x7fa8b0569930}}}, <No data fields>}, _len = 0, append_buffer = {_raw = 0x0, _off = 0, _len = 0}, last_p = { bl = 0x7fa8b0569930, ls = 0x7fa8b0569930, off = 0, p = {_M_node = 0x7fa8b0569930}, p_off = 0}} message_size = 5000 header_crc = <optimized out> data = {_buffers = {<std::_List_base<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >> = { _M_impl = {<std::allocator<std::_List_node<ceph::buffer::ptr> >> = {<__gnu_cxx::new_allocator<std::_List_node<ceph::buffer::ptr> >> = {<No data fields>}, <No data fields>}, _M_node = {_M_next = 0x1c7f9b0, _M_prev = 0x1c7f9b0}}}, <No data fields>}, _len = 82, append_buffer = {_raw = 0x0, _off = 0, _len = 0}, last_p = {bl = 0x7fa8b0569980, ls = 0x7fa8b0569980, off = 0, p = {_M_node = 0x7fa8b0569980}, p_off = 0}} front_len = <optimized out> data_off = 0 aborted = <optimized out> message = <optimized out> recv_stamp = {tv = {tv_sec = 1350648484, tv_nsec = 631445000}} throttle_stamp = {tv = {tv_sec = 1350648484, tv_nsec = 631467000}} footer = {front_crc = {v = 31033408}, middle_crc = {v = 0}, data_crc = {v = 29904496}, flags = 0 '\000'} middle_len = <optimized out> waited_on_throttle = <optimized out> #7 0x00007fa8cdfb60a4 in Pipe::reader (this=0x1b493e0) at msg/Pipe.cc:1128 m = 0x0 r = <optimized out> buf = '\000' <repeats 79 times> tag = 7 '\a' __PRETTY_FUNCTION__ = "void Pipe::reader()" #8 0x00007fa8cdfb87bd in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:45 No locals. #9 0x00007fa8d2af5b50 in start_thread (arg=<optimized out>) at pthread_create.c:304 __res = <optimized out> pd = 0x7fa8b056a700 unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140362489702144, -6857338112705148839, 28270096, 140362489702848, 140363148398656, 7, 6882828424773519449, 6882751542522955865}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = <optimized out> freesize = <optimized out> __PRETTY_FUNCTION__ = "start_thread" #10 0x00007fa8d2de370d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#3 Updated by Josh Durgin almost 11 years ago
What are you doing with librados that's causing this? A log with 'debug ms = 1' and 'debug rados = 20' would help, as well as a description of any objects that aren't created as part of the log.
#4 Updated by Artem Grinblat almost 11 years ago
Here's the code that's causing this:
// iterating: for (auto it = radosCtx->objects_begin(), end = radosCtx->objects_end(); !EXIT && it != end; ++it) { const string& id = it->first; // extracting: librados::bufferlist bl; int rc = radosCtx->getxattr (id, "img2meta", bl); if (rc == 0 && bl.length()) { bl.copy (0, bl.length(), meta); // Have meta in xattr. } else { uint64_t size; time_t time; rc = radosCtx->stat (id, &size, &time); if (rc == -ENOENT) continue; // File was removed in the process, go on. if (rc) {log_warn ("ceph stat (" << id << "): " << strerror (-rc)); continue;} if (size == 0) {log_warn ("ceph - empty file? " << id); continue;} rc = radosCtx->read (id, bl, size, 0); if (rc < 0) {log_warn ("ceph read (" << id << "): " << strerror (-rc)); continue;} if (rc != (int) size) {log_warn ("ceph read: expected " << size << " but got " << rc); continue;} string value; value.reserve (bl.length()); bl.copy (0, bl.length(), value); Img2Value iv; iv.ParseFromString (value); ptree mt; infoToPtree (iv, mt); std::ostringstream buf; write_json (buf, mt, false); meta = buf.str(); bl.clear(); bl.append (meta); radosCtx->setxattr (id, "img2meta", bl); // Cache meta in Ceph's xattr. }
If I leave only the "iterating" part it works okay, but with the "extracting" part program dies with that exception.
I'll try to produce the log.
Should it be from a single osd or from all of them?
#5 Updated by Sage Weil almost 11 years ago
Also curious what version of librados is running vs what version of ceph-osd it is talking to.
#6 Updated by Artem Grinblat almost 11 years ago
Version is the already mentioned "librados-dev 0.53-1~bpo70+1".
There are four OSD servers, all track the same Debian apt repository.
#7 Updated by Artem Grinblat almost 11 years ago
E.g. both ceph and librados are from:
deb http://ceph.com/debian-testing/ wheezy main
#8 Updated by Josh Durgin almost 11 years ago
I was thinking a log of the client side. I'm not sure if the log from an osd will be necessary. You can set the debugging and e.g. 'log file = /tmp/librados.log' via librados' conf_set() or ceph.conf if your program is using conf_read_file().
#9 Updated by Artem Grinblat almost 11 years ago
- File librados.log View added
Attached is the log (with "log file = /tmp/librados.log\n debug ms = 1\n debug rados = 20" in ceph.conf).
#10 Updated by Artem Grinblat almost 11 years ago
Modifying the "extracting" part not to reuse the bufferlist seems to workaround the problem:
librados::bufferlist bl1; int rc = radosCtx->getxattr (id, "img2meta", bl1); if (rc == 0 && bl1.length()) { bl1.copy (0, bl1.length(), meta); // Have meta in xattr. } else { uint64_t size; time_t time; rc = radosCtx->stat (id, &size, &time); if (rc == -ENOENT) continue; // File was removed in the process, go on. if (rc) {log_warn ("ceph stat (" << id << "): " << strerror (-rc)); continue;} if (size == 0) {log_warn ("ceph - empty file? " << id); continue;} glim::NsecTimer readTimer; librados::bufferlist bl2; rc = radosCtx->read (id, bl2, size, 0); if (rc < 0) {log_warn ("ceph read (" << id << "): " << strerror (-rc)); continue;} if (rc != (int) size) {log_warn ("ceph read: expected " << size << " but got " << rc); continue;} string value; value.reserve (bl2.length()); bl2.copy (0, bl2.length(), value); log_info ("ceph: read " << id << " in " << readTimer.seconds (4) << " seconds (" << value.size() << " bytes)"); Img2Value iv; iv.ParseFromString (value); ptree mt; infoToPtree (iv, mt); std::ostringstream buf; write_json (buf, mt, false); meta = buf.str(); librados::bufferlist bl3; bl3.clear(); bl3.append (meta); radosCtx->setxattr (id, "img2meta", bl3); // Cache meta in Ceph's xattr. }
#11 Updated by Josh Durgin almost 11 years ago
So I think there are two things here:
1) your return code check for getxattr will never pass - getxattr() returns the length read of the attribute read, so it'll never be 0 if bl.length() is non-zero. This is documented in the C api, but not the C++ one. That should be documented.
2) the bufferlist interface is confusing and easy to use in a way that won't do what you expect. If you used a single buffer list, but called bl.clear() before doing the read, it should work, just like using separate bufferlists worked. This requirement to use an empty bufferlist should be fixed or documented, and should not cause a crash.
#12 Updated by Artem Grinblat almost 11 years ago
1) yes, some documentation in librados.hpp would be nice
2) that `ctx->read (id, bl, size, 0)` might append into `bl` is intuitive enough, i don't see an interface problem here! in my case it would've been caught by protobuf's ParseFromString. the real problem is that it shouldn't crash.
#13 Updated by Artem Grinblat almost 11 years ago
2) e.g. if `ctx->read` needs an empty list, it should throw from `ctx->read`, where the application can catch exceptions and make sense of things, not from the internal Ceph thread where it terminates the application and looks like some internal bug.
#14 Updated by Josh Durgin almost 11 years ago
- Status changed from New to In Progress
- Assignee set to Josh Durgin
- Source changed from Development to Community (dev)
#15 Updated by Ian Colle over 10 years ago
- Priority changed from High to Normal
Josh - Is this just a case where the documentation needs to be updated?
#16 Updated by Sage Weil about 10 years ago
- Status changed from In Progress to Resolved