Project

General

Profile

Actions

Bug #3300

closed

ceph::buffer::end_of_buffer isn't caught

Added by Artem Grinblat over 11 years ago. Updated over 10 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
librados
Target version:
-
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

A program using librados terminates with the following uncaught exception:

terminate called after throwing an instance of 'ceph::buffer::end_of_buffer'
  what():  buffer::end_of_buffer

**24564** custom_sigaction, signal 6   at 0x516E87C: VALGRIND_PRINTF_BACKTRACE(char const*, ...) (valgrind.h:3771)
==24564==    by 0x516EE64: img2::custom_sigaction(int) (conf.cpp:77)
==24564==    by 0x8B004EF: ??? (in /lib/x86_64-linux-gnu/libc-2.13.so)
==24564==    by 0x8B00474: raise (raise.c:64)
==24564==    by 0x8B036EF: abort (abort.c:92)
==24564==    by 0x83946DC: __gnu_cxx::__verbose_terminate_handler() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17)
==24564==    by 0x83927E5: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17)
==24564==    by 0x8392812: std::terminate() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17)
==24564==    by 0x8392A3D: __cxa_throw (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17)
==24564==    by 0xD3B2762: ceph::buffer::list::iterator::get_current_ptr() (buffer.cc:509)
==24564==    by 0xD356DBE: Pipe::read_message(Message**) (Pipe.cc:1492)
==24564==    by 0xD364CFB: Pipe::reader() (Pipe.cc:1138)

Operating system is Debian Wheezy.

 dpkg -l | grep ceph
ii  ceph                                           0.52-1~bpo70+1               amd64        distributed storage and file system
ii  ceph-common                                    0.52-1~bpo70+1               amd64        common utilities to mount and interact with a ceph storage cluster
ii  ceph-fs-common                                 0.52-1~bpo70+1               amd64        common utilities to mount and interact with a ceph file system
ii  ceph-fuse                                      0.52-1~bpo70+1               amd64        FUSE-based client for the Ceph distributed file system
ii  ceph-mds                                       0.52-1~bpo70+1               amd64        metadata server for the ceph distributed file system
ii  libcephfs1                                     0.52-1~bpo70+1               amd64        Ceph distributed file system client library

Files

librados.log (27.3 KB) librados.log Artem Grinblat, 10/19/2012 09:33 AM
Actions #1

Updated by Artem Grinblat over 11 years ago

Same problem with librados-dev 0.53-1~bpo70+1.

Actions #2

Updated by Artem Grinblat over 11 years ago

GDB stack trace:

(gdb) bt full
#0  *__GI_abort () at abort.c:59
        act = {__sigaction_handler = {sa_handler = 0x100000000, sa_sigaction = 0x100000000}, sa_mask = {__val = {0, 0, 140363076780821, 0, 0, 140363068152736, 140363148288000, 0,
              4294967295, 0, 5, 3080464, 0, 28611552, 4790, 140363076640768}}, sa_flags = -679974414, sa_restorer = 0x5}
        sigs = {__val = {10, 140363068152736, 1, 140362987698374, 140363071802776, 29882848, 140362489698416, 140363146250837, 140362489702080, 140363068962717, 1, 0, 31033520,
            10, 140362489702144, 140363148292096}}
#1  0x00007fa8d358f6dd in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
No symbol table info available.
#2  0x00007fa8d358d7e6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
No symbol table info available.
#3  0x00007fa8d358d813 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
No symbol table info available.
#4  0x00007fa8d358da3e in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
No symbol table info available.
#5  0x00007fa8ce004bd3 in ceph::buffer::list::iterator::get_current_ptr (this=<optimized out>) at common/buffer.cc:509
No locals.
#6  0x00007fa8cdfa82fd in Pipe::read_message (this=this@entry=0x1b493e0, pm=pm@entry=0x7fa8b0569c38) at msg/Pipe.cc:1482
        p = <optimized out>
        bp = {_raw = 0x0, _off = 0, _len = 82}
        read = <optimized out>
        got = <optimized out>
        rxbuf = {_buffers = {<std::_List_base<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >> = {
              _M_impl = {<std::allocator<std::_List_node<ceph::buffer::ptr> >> = {<__gnu_cxx::new_allocator<std::_List_node<ceph::buffer::ptr> >> = {<No data fields>}, <No data fields>}, _M_node = {_M_next = 0x1d98840, _M_prev = 0x1c84e70}}}, <No data fields>}, _len = 4872, append_buffer = {_raw = 0x0, _off = 0, _len = 0}, last_p = {bl = 0x7fa8b0569a20,
            ls = 0x7fa8b0569a20, off = 0, p = {_M_node = 0x7fa8b0569a20}, p_off = 0}}
        offset = <optimized out>
        left = 4790
        newbuf = {_buffers = {<std::_List_base<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >> = {
              _M_impl = {<std::allocator<std::_List_node<ceph::buffer::ptr> >> = {<__gnu_cxx::new_allocator<std::_List_node<ceph::buffer::ptr> >> = {<No data fields>}, <No data fields>}, _M_node = {_M_next = 0x7fa8b05699d0, _M_prev = 0x7fa8b05699d0}}}, <No data fields>}, _len = 0, append_buffer = {_raw = 0x0, _off = 0, _len = 0}, last_p = {
            bl = 0x7fa8b05699d0, ls = 0x7fa8b05699d0, off = 0, p = {_M_node = 0x7fa8b05699d0}, p_off = 0}}
        blp = {bl = 0x7fa8c0100e38, ls = 0x7fa8c0100e38, off = 82, p = {_M_node = 0x7fa8c0100e38}, p_off = 0}
        rxbuf_version = 1
        ret = -1
        front = {_buffers = {<std::_List_base<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >> = {
              _M_impl = {<std::allocator<std::_List_node<ceph::buffer::ptr> >> = {<__gnu_cxx::new_allocator<std::_List_node<ceph::buffer::ptr> >> = {<No data fields>}, <No data fields>}, _M_node = {_M_next = 0x1d99060, _M_prev = 0x1d99060}}}, <No data fields>}, _len = 128, append_buffer = {_raw = 0x0, _off = 0, _len = 0}, last_p = {bl = 0x7fa8b05698e0,
            ls = 0x7fa8b05698e0, off = 0, p = {_M_node = 0x7fa8b05698e0}, p_off = 0}}
        data_len = 4872
        header = {seq = {v = 59}, tid = {v = 121}, type = {v = 43}, priority = {v = 127}, version = {v = 4}, front_len = {v = 128}, middle_len = {v = 0}, data_len = {v = 4872},
          data_off = {v = 0}, src = {type = 4 '\004', num = {v = 1}}, compat_version = {v = 2}, reserved = {v = 0}, crc = {v = 3483066176}}
        middle = {_buffers = {<std::_List_base<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >> = {
              _M_impl = {<std::allocator<std::_List_node<ceph::buffer::ptr> >> = {<__gnu_cxx::new_allocator<std::_List_node<ceph::buffer::ptr> >> = {<No data fields>}, <No data fields>}, _M_node = {_M_next = 0x7fa8b0569930, _M_prev = 0x7fa8b0569930}}}, <No data fields>}, _len = 0, append_buffer = {_raw = 0x0, _off = 0, _len = 0}, last_p = {
            bl = 0x7fa8b0569930, ls = 0x7fa8b0569930, off = 0, p = {_M_node = 0x7fa8b0569930}, p_off = 0}}
        message_size = 5000
        header_crc = <optimized out>
        data = {_buffers = {<std::_List_base<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >> = {
              _M_impl = {<std::allocator<std::_List_node<ceph::buffer::ptr> >> = {<__gnu_cxx::new_allocator<std::_List_node<ceph::buffer::ptr> >> = {<No data fields>}, <No data fields>}, _M_node = {_M_next = 0x1c7f9b0, _M_prev = 0x1c7f9b0}}}, <No data fields>}, _len = 82, append_buffer = {_raw = 0x0, _off = 0, _len = 0}, last_p = {bl = 0x7fa8b0569980,
            ls = 0x7fa8b0569980, off = 0, p = {_M_node = 0x7fa8b0569980}, p_off = 0}}
        front_len = <optimized out>
        data_off = 0
        aborted = <optimized out>
        message = <optimized out>
        recv_stamp = {tv = {tv_sec = 1350648484, tv_nsec = 631445000}}
        throttle_stamp = {tv = {tv_sec = 1350648484, tv_nsec = 631467000}}
        footer = {front_crc = {v = 31033408}, middle_crc = {v = 0}, data_crc = {v = 29904496}, flags = 0 '\000'}
        middle_len = <optimized out>
        waited_on_throttle = <optimized out>
#7  0x00007fa8cdfb60a4 in Pipe::reader (this=0x1b493e0) at msg/Pipe.cc:1128
        m = 0x0
        r = <optimized out>
        buf = '\000' <repeats 79 times>
        tag = 7 '\a'
        __PRETTY_FUNCTION__ = "void Pipe::reader()" 
#8  0x00007fa8cdfb87bd in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:45
No locals.
#9  0x00007fa8d2af5b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
        __res = <optimized out>
        pd = 0x7fa8b056a700
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140362489702144, -6857338112705148839, 28270096, 140362489702848, 140363148398656, 7, 6882828424773519449,
                6882751542522955865}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread" 
#10 0x00007fa8d2de370d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112

Actions #3

Updated by Josh Durgin over 11 years ago

What are you doing with librados that's causing this? A log with 'debug ms = 1' and 'debug rados = 20' would help, as well as a description of any objects that aren't created as part of the log.

Actions #4

Updated by Artem Grinblat over 11 years ago

Here's the code that's causing this:

// iterating:
    for (auto it = radosCtx->objects_begin(), end = radosCtx->objects_end(); !EXIT && it != end; ++it) {
      const string& id = it->first;

// extracting:
          librados::bufferlist bl;
          int rc = radosCtx->getxattr (id, "img2meta", bl);
          if (rc == 0 && bl.length()) {
            bl.copy (0, bl.length(), meta); // Have meta in xattr.
          } else {
            uint64_t size; time_t time; rc = radosCtx->stat (id, &size, &time);
            if (rc == -ENOENT) continue; // File was removed in the process, go on.
            if (rc) {log_warn ("ceph stat (" << id << "): " << strerror (-rc)); continue;}
            if (size == 0) {log_warn ("ceph - empty file? " << id); continue;}
            rc = radosCtx->read (id, bl, size, 0);
            if (rc < 0) {log_warn ("ceph read (" << id << "): " << strerror (-rc)); continue;}
            if (rc != (int) size) {log_warn ("ceph read: expected " << size << " but got " << rc); continue;}
            string value; value.reserve (bl.length()); bl.copy (0, bl.length(), value);
            Img2Value iv; iv.ParseFromString (value); ptree mt; infoToPtree (iv, mt);
            std::ostringstream buf; write_json (buf, mt, false); meta = buf.str();
            bl.clear(); bl.append (meta); radosCtx->setxattr (id, "img2meta", bl); // Cache meta in Ceph's xattr.
          }

If I leave only the "iterating" part it works okay, but with the "extracting" part program dies with that exception.

I'll try to produce the log.
Should it be from a single osd or from all of them?

Actions #5

Updated by Sage Weil over 11 years ago

Also curious what version of librados is running vs what version of ceph-osd it is talking to.

Actions #6

Updated by Artem Grinblat over 11 years ago

Version is the already mentioned "librados-dev 0.53-1~bpo70+1".
There are four OSD servers, all track the same Debian apt repository.

Actions #7

Updated by Artem Grinblat over 11 years ago

E.g. both ceph and librados are from:

deb http://ceph.com/debian-testing/ wheezy main

Actions #8

Updated by Josh Durgin over 11 years ago

I was thinking a log of the client side. I'm not sure if the log from an osd will be necessary. You can set the debugging and e.g. 'log file = /tmp/librados.log' via librados' conf_set() or ceph.conf if your program is using conf_read_file().

Actions #9

Updated by Artem Grinblat over 11 years ago

Attached is the log (with "log file = /tmp/librados.log\n debug ms = 1\n debug rados = 20" in ceph.conf).

Actions #10

Updated by Artem Grinblat over 11 years ago

Modifying the "extracting" part not to reuse the bufferlist seems to workaround the problem:

          librados::bufferlist bl1;
          int rc = radosCtx->getxattr (id, "img2meta", bl1);
          if (rc == 0 && bl1.length()) {
            bl1.copy (0, bl1.length(), meta); // Have meta in xattr.
          } else {
            uint64_t size; time_t time; rc = radosCtx->stat (id, &size, &time);
            if (rc == -ENOENT) continue; // File was removed in the process, go on.
            if (rc) {log_warn ("ceph stat (" << id << "): " << strerror (-rc)); continue;}
            if (size == 0) {log_warn ("ceph - empty file? " << id); continue;}
            glim::NsecTimer readTimer;
            librados::bufferlist bl2;
            rc = radosCtx->read (id, bl2, size, 0);
            if (rc < 0) {log_warn ("ceph read (" << id << "): " << strerror (-rc)); continue;}
            if (rc != (int) size) {log_warn ("ceph read: expected " << size << " but got " << rc); continue;}
            string value; value.reserve (bl2.length()); bl2.copy (0, bl2.length(), value);
            log_info ("ceph: read " << id << " in " << readTimer.seconds (4) << " seconds (" << value.size() << " bytes)");
            Img2Value iv; iv.ParseFromString (value); ptree mt; infoToPtree (iv, mt);
            std::ostringstream buf; write_json (buf, mt, false); meta = buf.str();
            librados::bufferlist bl3;
            bl3.clear(); bl3.append (meta); radosCtx->setxattr (id, "img2meta", bl3); // Cache meta in Ceph's xattr.
          }

Actions #11

Updated by Josh Durgin over 11 years ago

So I think there are two things here:

1) your return code check for getxattr will never pass - getxattr() returns the length read of the attribute read, so it'll never be 0 if bl.length() is non-zero. This is documented in the C api, but not the C++ one. That should be documented.

2) the bufferlist interface is confusing and easy to use in a way that won't do what you expect. If you used a single buffer list, but called bl.clear() before doing the read, it should work, just like using separate bufferlists worked. This requirement to use an empty bufferlist should be fixed or documented, and should not cause a crash.

Actions #12

Updated by Artem Grinblat over 11 years ago

1) yes, some documentation in librados.hpp would be nice
2) that `ctx->read (id, bl, size, 0)` might append into `bl` is intuitive enough, i don't see an interface problem here! in my case it would've been caught by protobuf's ParseFromString. the real problem is that it shouldn't crash.

Actions #13

Updated by Artem Grinblat over 11 years ago

2) e.g. if `ctx->read` needs an empty list, it should throw from `ctx->read`, where the application can catch exceptions and make sense of things, not from the internal Ceph thread where it terminates the application and looks like some internal bug.

Actions #14

Updated by Josh Durgin over 11 years ago

  • Status changed from New to In Progress
  • Assignee set to Josh Durgin
  • Source changed from Development to Community (dev)
Actions #15

Updated by Ian Colle over 11 years ago

  • Priority changed from High to Normal

Josh - Is this just a case where the documentation needs to be updated?

Actions #16

Updated by Sage Weil over 10 years ago

  • Status changed from In Progress to Resolved
Actions

Also available in: Atom PDF