Bug #12253

closed

Sometimes mds dump_ops_in_flight will crash mds

Added by Zhi Zhang almost 9 years ago. Updated almost 8 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Sometimes the MDS has no suitable ops. If we run dump_ops_in_flight in that state, the MDS crashes with an assertion failure. See the core dump backtrace below.

#10 0x00000000008c755a in ceph::__ceph_assert_fail (assertion=0x53c46b0 "\260\306\356\004", file=<value optimized out>, line=77856768, 
    func=0xaf4ca0 "virtual void MDRequestImpl::_dump(utime_t, ceph::Formatter*) const") at common/assert.cc:77
        tss = <incomplete type>
        buf = "mds/Mutation.cc: In function 'virtual void MDRequestImpl::_dump(utime_t, ceph::Formatter*) const' thread 7f2692a40700 time 2015-07-02 18:35:49.820326\nmds/Mutation.cc: 352: FAILED assert(internal_op !="...
        bt = 0x4af16c0
        oss = <incomplete type>
#11 0x0000000000650d51 in MDRequestImpl::_dump (this=0x616cb00, now=<value optimized out>, f=0x4af2d80) at mds/Mutation.cc:352
        __PRETTY_FUNCTION__ = "virtual void MDRequestImpl::_dump(utime_t, ceph::Formatter*) const" 
#12 0x000000000081c21b in TrackedOp::dump (this=0x616cd58, now=..., f=0x4af2d80) at common/TrackedOp.cc:331
        name = <incomplete type>
#13 0x000000000081c63f in OpTracker::dump_ops_in_flight (this=0x4b00590, f=0x4af2d80) at common/TrackedOp.cc:109
        p = {cur = 0x616cd60}
        sdata = 0x4a502a0
        locker = {mutex = @0x4a502a0}
        i = <value optimized out>
        total_ops_in_flight = <value optimized out>
        now = {tv = {tv_sec = 1435833349, tv_nsec = 820179660}}
        __PRETTY_FUNCTION__ = "void OpTracker::dump_ops_in_flight(ceph::Formatter*)" 
#14 0x00000000005b87f3 in MDS::asok_command (this=0x4b00000, command="dump_ops_in_flight", cmdmap=std::map with 1 elements = {...}, format="json-pretty", ss=...) at mds/MDS.cc:245
        f = 0x4af2d80
        __func__ = "asok_command" 
        __PRETTY_FUNCTION__ = "bool MDS::asok_command(std::string, cmdmap_t&, std::string, std::ostream&)" 
#15 0x00000000005d3ea0 in MDSSocketHook::call (this=0x49f01b0, command="dump_ops_in_flight", cmdmap=std::map with 1 elements = {...}, format=<value optimized out>, out=...)
    at mds/MDS.cc:213
        ss = <incomplete type>
        r = <value optimized out>
#16 0x00000000008b5bfa in AdminSocket::do_accept (this=0x4a80000) at common/admin_socket.cc:362
        args = "" 
        success = <value optimized out>
        len = <value optimized out>
        ret = <value optimized out>
        cmdvec = std::vector of length 1, capacity 1 = {"{\"prefix\": \"dump_ops_in_flight\"}"}
        errss = <incomplete type>
        match = "dump_ops_in_flight" 
        format = "json-pretty" 
        p = {first = "dump_ops_in_flight", second = }
        out = {_buffers = empty std::list, _len = 0, _memcopy_count = 0, append_buffer = {_raw = 0x0, _off = 0, _len = 0}, last_p = {bl = 0x7f2692a3f490, ls = 0x7f2692a3f490, off = 0, p = 
    {_raw = , _off = 0, _len = 0}, p_off = 0}, static CLAIM_DEFAULT = 0, static CLAIM_ALLOW_NONSHAREABLE = 1}
        connection_fd = 21
        c = "dump_ops_in_flight" 
        cmdmap = std::map with 1 elements = {
          ["prefix"] = {which_ = 0, storage_ = {<boost::detail::aligned_storage::aligned_storage_imp<24ul, 8ul>> = {data_ = {
                  buf = "\250B\263\n\000\000\000\000\200\005\237\004\000\000\000\000\000\026\f\006\000\000\000", align_ = {<No data fields>}}}, static size = <optimized out>, 
              static alignment = <optimized out>}}
        }
        address = {sun_family = 1, 
          sun_path = "\000\000\000\000\000\000\003", '\000' <repeats 15 times>"\260, \317q\226&\177\000\000\005", '\000' <repeats 23 times>, "0v\274\224&\177\000\000n\022\277\000\000\000\000\000\000\000\250\004\000\000\000\000\300\374\243\222&\177\000\000\220\244\250\004\000\000\000\000\000\000\244\004\000\000\000\000u6r\226&\177"}
        address_length = 2
        cmd = "{\"prefix\": \"dump_ops_in_flight\"}\000ons\"}\000\000\260\367\243\222&\177\000\000\022\000\000\000\000\000\000\000\070~\221\226&\177\000\000\064\000\206\356\000\000\000\000\320\367\243\222&\177\000\000\022\000\000\000\000\000\000\000\070~\221\226&\177\000\000\346\037\250\201\000\000\000\000\207\213q\226&\177\000\000\000\000\000\000\000\000\000\000\177\240\006\002\000\000\000\000&\000\000\000&\177\000\000\310\064J\226&\177\000\000\000\000\000\000\000\000\000\000P\371\243\222&\177\000\000\230\067J\226&\177\000\000\030xJ\226&\177\000\000\000\000\000\000\000\000\000\000`\326\222\226&\177\000\000`\326\222\226&\177", '\000' <repeats 18 times>, "`\326\222\226&\177\000\000`\326\222\226&\177", '\000' <repeats 18 times>, "`\326\222\226&\177\000\000\327\021K\226&\177"...
        pos = <value optimized out>
        rval = false
#17 0x00000000008b6e20 in AdminSocket::entry (this=0x4a80000) at common/admin_socket.cc:252
        fds = {{fd = 7, events = 129, revents = 1}, {fd = 5, events = 129, revents = 0}}
        ret = <value optimized out>
#18 0x00007f2695d157f1 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#19 0x00007f2694ca3ccd in clone () from /lib64/libc.so.6
No symbol table info available.

Such strict checking is not necessary in an admin socket dump command. It would be better to return a hint than to crash the MDS.
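As a rough illustration of the idea, here is a minimal sketch of a dump routine that emits a hint instead of asserting. It is not the actual Ceph code: `SketchFormatter`, `SketchKind`, `dump_request_kind`, and the `op_type` key and its values are hypothetical names chosen for this example.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for ceph::Formatter: a flat key/value store.
using SketchFormatter = std::map<std::string, std::string>;

// Hypothetical classification of a tracked request. kUnknown models the
// state seen in the backtrace: no suitable op is attached to the request.
enum class SketchKind { kClientRequest, kSlaveRequest, kInternalOp, kUnknown };

// Writes a description of the request kind into the formatter.
// The strict version would assert on kUnknown and abort the daemon;
// this version records a hint and returns normally.
void dump_request_kind(SketchKind kind, SketchFormatter& f) {
  switch (kind) {
    case SketchKind::kClientRequest:
      f["op_type"] = "client_request";
      break;
    case SketchKind::kSlaveRequest:
      f["op_type"] = "slave_request";
      break;
    case SketchKind::kInternalOp:
      f["op_type"] = "internal_op";
      break;
    case SketchKind::kUnknown:
      // Hint text is a placeholder, not taken from the real fix.
      f["op_type"] = "unknown (no suitable op found)";
      break;
  }
}
```

With this shape, an admin socket query against a request in the unexpected state still produces output that points at the problem, and the daemon keeps running.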

Actions #1

Updated by Zhi Zhang almost 9 years ago

https://github.com/ceph/ceph/pull/5175

We've made some minor changes. Could someone take a look?

Thanks.

Actions #2

Updated by Zheng Yan almost 9 years ago

Are you running multiple MDSes? Could you check the value of mdr->slave_to_mds?

Actions #3

Updated by Zhi Zhang almost 9 years ago

Yes, we were running multiple MDSes when we hit this issue. We have since switched to a single MDS.

Actions #4

Updated by Zhi Zhang almost 9 years ago

Please check mdr->slave_to_mds below:

(gdb) frame 11 
#11 0x0000000000650d51 in MDRequestImpl::_dump (this=0x616cb00, now=<value optimized out>, f=0x4af2d80) at mds/Mutation.cc:352
352          assert(internal_op != -1);
(gdb) p this->slave_to_mds
$2 = {<boost::totally_ordered1<mds_rank_t, boost::totally_ordered2<mds_rank_t, int, boost::detail::empty_base<mds_rank_t> > >> = {<boost::less_than_comparable1<mds_rank_t, boost::equality_comparable1<mds_rank_t, boost::totally_ordered2<mds_rank_t, int, boost::detail::empty_base<mds_rank_t> > > >> = {<boost::equality_comparable1<mds_rank_t, boost::totally_ordered2<mds_rank_t, int, boost::detail::empty_base<mds_rank_t> > >> = {<boost::totally_ordered2<mds_rank_t, int, boost::detail::empty_base<mds_rank_t> >> = {<boost::less_than_comparable2<mds_rank_t, int, boost::equality_comparable2<mds_rank_t, int, boost::detail::empty_base<mds_rank_t> > >> = {<boost::equality_comparable2<mds_rank_t, int, boost::detail::empty_base<mds_rank_t> >> = {<boost::detail::empty_base<mds_rank_t>> = {<No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, t = 2}
Actions #5

Updated by Zheng Yan almost 9 years ago

Please make the code check is_slave() first, then check if slave_request is NULL. If slave_request is NULL, just output nothing.
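
The ordering suggested here can be sketched as follows. This is only an illustration under stated assumptions, not the code from the pull request: `SketchRequest` and `describe_for_dump` are hypothetical, the boolean members stand in for the real message pointers, and `-1` is assumed to be the "no rank" sentinel for `slave_to_mds`.

```cpp
#include <string>

// Hypothetical stand-in for MDRequestImpl. Only the members named in this
// ticket are modelled; the real class has many more.
struct SketchRequest {
  bool has_client_request = false;  // stand-in for a non-null client_request
  bool has_slave_request = false;   // stand-in for a non-null slave_request
  int slave_to_mds = -1;            // assumed sentinel: -1 means "no rank"
  int internal_op = -1;             // -1 means "not an internal op"

  // Assumption: a request counts as a slave request once slave_to_mds is set.
  bool is_slave() const { return slave_to_mds != -1; }
};

// Returns the text a dump would emit; an empty string means "output nothing".
std::string describe_for_dump(const SketchRequest& r) {
  if (r.has_client_request) {
    return "client_request";
  }
  if (r.is_slave()) {
    // Check is_slave() first, then whether slave_request is present.
    if (!r.has_slave_request) {
      return "";  // slave_request is NULL: output nothing, do not assert
    }
    return "slave_request (slave_to_mds=" + std::to_string(r.slave_to_mds) + ")";
  }
  if (r.internal_op != -1) {
    return "internal_op " + std::to_string(r.internal_op);
  }
  return "";
}
```

The state captured in gdb above (slave_to_mds of 2 with internal_op equal to -1) falls into the is_slave() branch in this sketch, so the internal_op check is never reached for it.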

Actions #6

Updated by Zheng Yan almost 9 years ago

  • Status changed from New to Resolved
Actions #7

Updated by Greg Farnum almost 8 years ago

  • Component(FS) MDS added