Bug #12253 (closed)
Sometimes mds dump_ops_in_flight will crash mds
Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (dev)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Sometimes there are no suitable ops in the MDS. If we then run dump_ops_in_flight, the MDS crashes due to an assertion failure. See the core dump below.
#10 0x00000000008c755a in ceph::__ceph_assert_fail (assertion=0x53c46b0 "\260\306\356\004", file=<value optimized out>, line=77856768, func=0xaf4ca0 "virtual void MDRequestImpl::_dump(utime_t, ceph::Formatter*) const") at common/assert.cc:77
        tss = <incomplete type>
        buf = "mds/Mutation.cc: In function 'virtual void MDRequestImpl::_dump(utime_t, ceph::Formatter*) const' thread 7f2692a40700 time 2015-07-02 18:35:49.820326\nmds/Mutation.cc: 352: FAILED assert(internal_op !="...
        bt = 0x4af16c0
        oss = <incomplete type>
#11 0x0000000000650d51 in MDRequestImpl::_dump (this=0x616cb00, now=<value optimized out>, f=0x4af2d80) at mds/Mutation.cc:352
        __PRETTY_FUNCTION__ = "virtual void MDRequestImpl::_dump(utime_t, ceph::Formatter*) const"
#12 0x000000000081c21b in TrackedOp::dump (this=0x616cd58, now=..., f=0x4af2d80) at common/TrackedOp.cc:331
        name = <incomplete type>
#13 0x000000000081c63f in OpTracker::dump_ops_in_flight (this=0x4b00590, f=0x4af2d80) at common/TrackedOp.cc:109
        p = {cur = 0x616cd60}
        sdata = 0x4a502a0
        locker = {mutex = @0x4a502a0}
        i = <value optimized out>
        total_ops_in_flight = <value optimized out>
        now = {tv = {tv_sec = 1435833349, tv_nsec = 820179660}}
        __PRETTY_FUNCTION__ = "void OpTracker::dump_ops_in_flight(ceph::Formatter*)"
#14 0x00000000005b87f3 in MDS::asok_command (this=0x4b00000, command="dump_ops_in_flight", cmdmap=std::map with 1 elements = {...}, format="json-pretty", ss=...) at mds/MDS.cc:245
        f = 0x4af2d80
        __func__ = "asok_command"
        __PRETTY_FUNCTION__ = "bool MDS::asok_command(std::string, cmdmap_t&, std::string, std::ostream&)"
#15 0x00000000005d3ea0 in MDSSocketHook::call (this=0x49f01b0, command="dump_ops_in_flight", cmdmap=std::map with 1 elements = {...}, format=<value optimized out>, out=...) at mds/MDS.cc:213
        ss = <incomplete type>
        r = <value optimized out>
#16 0x00000000008b5bfa in AdminSocket::do_accept (this=0x4a80000) at common/admin_socket.cc:362
        args = ""
        success = <value optimized out>
        len = <value optimized out>
        ret = <value optimized out>
        cmdvec = std::vector of length 1, capacity 1 = {"{\"prefix\": \"dump_ops_in_flight\"}"}
        errss = <incomplete type>
        match = "dump_ops_in_flight"
        format = "json-pretty"
        p = {first = "dump_ops_in_flight", second = }
        out = {_buffers = empty std::list, _len = 0, _memcopy_count = 0, append_buffer = {_raw = 0x0, _off = 0, _len = 0}, last_p = {bl = 0x7f2692a3f490, ls = 0x7f2692a3f490, off = 0, p = {_raw = , _off = 0, _len = 0}, p_off = 0}, static CLAIM_DEFAULT = 0, static CLAIM_ALLOW_NONSHAREABLE = 1}
        connection_fd = 21
        c = "dump_ops_in_flight"
        cmdmap = std::map with 1 elements = { ["prefix"] = {which_ = 0, storage_ = {<boost::detail::aligned_storage::aligned_storage_imp<24ul, 8ul>> = {data_ = { buf = "\250B\263\n\000\000\000\000\200\005\237\004\000\000\000\000\000\026\f\006\000\000\000", align_ = {<No data fields>}}}, static size = <optimized out>, static alignment = <optimized out>}} }
        address = {sun_family = 1, sun_path = "\000\000\000\000\000\000\003", '\000' <repeats 15 times>"\260, \317q\226&\177\000\000\005", '\000' <repeats 23 times>, "0v\274\224&\177\000\000n\022\277\000\000\000\000\000\000\000\250\004\000\000\000\000\300\374\243\222&\177\000\000\220\244\250\004\000\000\000\000\000\000\244\004\000\000\000\000u6r\226&\177"}
        address_length = 2
        cmd = "{\"prefix\": \"dump_ops_in_flight\"}\000ons\"}\000\000\260\367\243\222&\177\000\000\022\000\000\000\000\000\000\000\070~\221\226&\177\000\000\064\000\206\356\000\000\000\000\320\367\243\222&\177\000\000\022\000\000\000\000\000\000\000\070~\221\226&\177\000\000\346\037\250\201\000\000\000\000\207\213q\226&\177\000\000\000\000\000\000\000\000\000\000\177\240\006\002\000\000\000\000&\000\000\000&\177\000\000\310\064J\226&\177\000\000\000\000\000\000\000\000\000\000P\371\243\222&\177\000\000\230\067J\226&\177\000\000\030xJ\226&\177\000\000\000\000\000\000\000\000\000\000`\326\222\226&\177\000\000`\326\222\226&\177", '\000' <repeats 18 times>, "`\326\222\226&\177\000\000`\326\222\226&\177", '\000' <repeats 18 times>, "`\326\222\226&\177\000\000\327\021K\226&\177"...
        pos = <value optimized out>
        rval = false
#17 0x00000000008b6e20 in AdminSocket::entry (this=0x4a80000) at common/admin_socket.cc:252
        fds = {{fd = 7, events = 129, revents = 1}, {fd = 5, events = 129, revents = 0}}
        ret = <value optimized out>
#18 0x00007f2695d157f1 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#19 0x00007f2694ca3ccd in clone () from /lib64/libc.so.6
No symbol table info available.
Such strict checking is not necessary in an admin socket dump command. It would be better to return a hint than to crash the MDS.
Updated by Zhi Zhang almost 9 years ago
https://github.com/ceph/ceph/pull/5175
We've made some minor changes. Could someone take a look?
Thanks.
Updated by Zheng Yan almost 9 years ago
Are you running multiple MDSes? Could you check the value of mdr->slave_to_mds?
Updated by Zhi Zhang almost 9 years ago
Yes, we were running multiple MDSes when we hit this issue. We have since switched to a single MDS.
Updated by Zhi Zhang almost 9 years ago
Please check the mdr->slave_to_mds below:
(gdb) frame 11
#11 0x0000000000650d51 in MDRequestImpl::_dump (this=0x616cb00, now=<value optimized out>, f=0x4af2d80) at mds/Mutation.cc:352
352 assert(internal_op != -1);
(gdb) p this->slave_to_mds
$2 = {<boost::totally_ordered1<mds_rank_t, boost::totally_ordered2<mds_rank_t, int, boost::detail::empty_base<mds_rank_t> > >> = {<boost::less_than_comparable1<mds_rank_t, boost::equality_comparable1<mds_rank_t, boost::totally_ordered2<mds_rank_t, int, boost::detail::empty_base<mds_rank_t> > > >> = {<boost::equality_comparable1<mds_rank_t, boost::totally_ordered2<mds_rank_t, int, boost::detail::empty_base<mds_rank_t> > >> = {<boost::totally_ordered2<mds_rank_t, int, boost::detail::empty_base<mds_rank_t> >> = {<boost::less_than_comparable2<mds_rank_t, int, boost::equality_comparable2<mds_rank_t, int, boost::detail::empty_base<mds_rank_t> > >> = {<boost::equality_comparable2<mds_rank_t, int, boost::detail::empty_base<mds_rank_t> >> = {<boost::detail::empty_base<mds_rank_t>> = {<No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, t = 2}
Updated by Zheng Yan almost 9 years ago
Please make the code check is_slave() first, then check if slave_request is NULL. If slave_request is NULL, just output nothing.
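The ordering suggested above can be sketched roughly as follows. This is only an illustration with made-up stand-in types, not the actual Ceph code or the contents of the pull request: Request, SlaveRequest, dump_request, kRankNone, the -1 sentinel, and the "no_available_op_found" hint string are all hypothetical names chosen for the example. The final fallback branch follows the bug description's suggestion to return a hint instead of asserting.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for the MDS types; these are NOT Ceph's real definitions.
constexpr int kRankNone = -1;  // assumed sentinel for "not a slave of any rank"

struct SlaveRequest {
  int op = 0;
};

struct Request {
  bool has_client_request = false;
  int slave_to_mds = kRankNone;                 // rank of the master MDS, if any
  std::shared_ptr<SlaveRequest> slave_request;  // may be null even when slave_to_mds is set
  int internal_op = -1;

  bool is_slave() const { return slave_to_mds != kRankNone; }
};

// Collects the fields that would be written to the formatter.
std::map<std::string, std::string> dump_request(const Request& r) {
  std::map<std::string, std::string> out;
  if (r.has_client_request) {
    out["op_type"] = "client_request";
  } else if (r.is_slave()) {
    // Check is_slave() first, then whether slave_request is null.
    if (r.slave_request) {
      out["op_type"] = "slave_request";
      out["master"] = std::to_string(r.slave_to_mds);
    }
    // Null slave_request: output nothing instead of asserting.
  } else if (r.internal_op != -1) {
    out["op_type"] = "internal_op";
    out["internal_op"] = std::to_string(r.internal_op);
  } else {
    // Hypothetical hint in place of assert(internal_op != -1).
    out["op_type"] = "no_available_op_found";
  }
  return out;
}
```

With this ordering, a request in the state shown in the gdb session (slave_to_mds set to 2, internal_op equal to -1, and assuming slave_request is null) yields an empty dump rather than reaching an assertion.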