Bug #9916
osd: crash in check_ops_in_flight
Status: Closed
Description
Assertion failure:
0> 2014-10-27 06:53:05.979871 7fd7b2156700 -1 common/buffer.cc: In function 'ceph::buffer::ptr::ptr(const ceph::buffer::ptr&, unsigned int, unsigned int)' thread 7fd7b2156700 time 2014-10-27 06:53:05.941971
common/buffer.cc: 574: FAILED assert(_raw)
ceph version 0.80.4 (7c241cfaa6c8c068bc9da8578ca00b9f4fc7567f)
1: /home/y/bin64/ceph-osd() [0xaca133]
2: (ceph::buffer::list::substr_of(ceph::buffer::list const&, unsigned int, unsigned int)+0xe4) [0xacbc84]
3: (ceph::buffer::list::write(int, int, std::ostream&) const+0x81) [0xacbe61]
4: (operator<<(std::ostream&, OSDOp const&)+0x1c8) [0x6cddc8]
5: (MOSDOp::print(std::ostream&) const+0x1df) [0x71e51f]
6: (OpTracker::check_ops_in_flight(std::vector<std::string, std::allocator<std::string> >&)+0x89a) [0x6ba68a]
7: (OSD::check_ops_in_flight()+0x4d) [0x60767d]
8: (OSD::tick()+0x146) [0x65f956]
9: (Context::complete(int)+0x9) [0x660d69]
10: (SafeTimer::timer_thread()+0x453) [0x9d1823]
11: (SafeTimerThread::entry()+0xd) [0x9d39dd]
12: /lib64/libpthread.so.0() [0x37aa0079d1]
13: (clone()+0x6d) [0x37a98e8b5d]
Core dump shows the following stack (with more details):
ostream& operator<<(ostream& out, const OSDOp& op)
  -> case CEPH_OSD_OP_SRC_CMPXATTR: op.indata.write(0, op.op.xattr.name_len, out);
    -> buffer::list::write(int off, int len, std::ostream& out)
However, op.indata's internal raw pointer for this op is null, so the assertion fails.
ceph version 0.80.4 (7c241cfaa6c8c068bc9da8578ca00b9f4fc7567f) on RHEL6.5
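For context, a minimal sketch (simplified stand-in types, not the actual common/buffer.cc source) of why a null raw pointer trips this assert: substr_of copies buffer::ptr segments out of the source list, and the ptr sub-range constructor asserts that the underlying raw buffer still exists.

// Minimal sketch of the failing pattern, assuming simplified types
// (the real ceph::buffer classes live in common/buffer.cc).
#include <cassert>

struct raw { char data[16]; };

struct ptr {
  raw*     _raw = nullptr;
  unsigned _off = 0, _len = 0;

  ptr() = default;
  // Analogue of buffer::ptr::ptr(const ptr&, unsigned, unsigned):
  // taking a sub-range of a segment requires a live raw buffer.
  ptr(const ptr& p, unsigned o, unsigned l)
      : _raw(p._raw), _off(p._off + o), _len(l) {
    assert(_raw);  // the "FAILED assert(_raw)" seen in the trace
  }
};

int main() {
  ptr corrupted;              // _raw == nullptr, as in the core dump below
  corrupted._len = 38;        // a stale length survives the corruption
  ptr sub(corrupted, 0, 14);  // aborts here, matching frame #11
  (void)sub;
}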
Updated by Sage Weil over 9 years ago
- Status changed from New to Need More Info
how is the OSDOp being formed? this looks like a bug on the client side to me. the attr ops should have name_len bytes in the indata bufferlist with the attr name ...
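(For reference, a hedged sketch of the invariant Sage describes, with illustrative names and types rather than the real Objecter helpers: the attribute name occupies the first name_len bytes of indata, which is exactly what the printer tries to read back out.)

// Hedged sketch of the expected encoding (illustrative types only;
// the real client-side code lives around osdc/Objecter.h).
#include <string>

struct XattrOpSketch {
  unsigned    name_len = 0;
  std::string indata;  // stand-in for ceph::buffer::list
};

XattrOpSketch make_cmpxattr(const std::string& name, const std::string& value) {
  XattrOpSketch op;
  op.name_len = name.size();  // the printer reads this many bytes...
  op.indata += name;          // ...from the front of indata
  op.indata += value;         // the comparison value follows the name
  return op;
}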
Updated by Guang Yang over 9 years ago
The crash happened with radosgw as the client, so I guess it is formed by objecter - https://github.com/ceph/ceph/blob/master/src/osdc/Objecter.h#L136 ?
Updated by Guang Yang over 9 years ago
Hi Yehuda,
After taking a look at the rgw code, I failed to find which (HTTP) request would need the CEPH_OSD_OP_SRC_CMPXATTR op; maybe I missed something obvious?
Thanks,
Guang
Updated by Wenjun Huang over 9 years ago
Sage Weil wrote:
how is the OSDOp being formed? this looks like a bug on the client side to me. the attr ops should have name_len bytes in the indata bufferlist with the attr name ...
Our code base is ceph version 0.80.4 (7c241cfaa6c8c068bc9da8578ca00b9f4fc7567f)
Here is a new core dump stack for this bug:
#0 0x0000003bd200f5db in raise () from /lib64/libpthread.so.0
#1 0x00000000009af6c7 in reraise_fatal (signum=6) at global/signal_handler.cc:59
#2 handle_fatal_signal (signum=6) at global/signal_handler.cc:105
#3 <signal handler called>
#4 0x0000003bd1832925 in raise () from /lib64/libc.so.6
#5 0x0000003bd1834105 in abort () from /lib64/libc.so.6
#6 0x0000003bd3cbea5d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6
#7 0x0000003bd3cbcbe6 in ?? () from /usr/lib64/libstdc++.so.6
#8 0x0000003bd3cbcc13 in std::terminate() () from /usr/lib64/libstdc++.so.6
#9 0x0000003bd3cbcd0e in __cxa_throw () from /usr/lib64/libstdc++.so.6
#10 0x0000000000a36fc9 in ceph::__ceph_assert_fail (assertion=0x355a230 "\001", file=0x4054d520 "Pt`8", line=574, func=0xc54120 "ceph::buffer::ptr::ptr(const ceph::buffer::ptr&, unsigned int, unsigned int)") at common/assert.cc:77
#11 0x0000000000aca133 in ceph::buffer::ptr::ptr (this=<value optimized out>, p=<value optimized out>, o=<value optimized out>, l=<value optimized out>) at common/buffer.cc:574
#12 0x0000000000acbc84 in ceph::buffer::list::substr_of (this=0x7f11b43201d0, other=<value optimized out>, off=<value optimized out>, len=<value optimized out>) at common/buffer.cc:1357
#13 0x0000000000acbe61 in ceph::buffer::list::write (this=0xe8bc038, off=0, len=14, out=...) at common/buffer.cc:1441
#14 0x00000000006cddc8 in operator<< (out=..., op=...) at osd/osd_types.cc:4447
#15 0x000000000071e51f in operator<< <OSDOp> (this=0x4812afc0, out=...) at ./include/types.h:99
#16 MOSDOp::print (this=0x4812afc0, out=...) at ./messages/MOSDOp.h:365
#17 0x00000000006ba68a in operator<< (this=0x357cd20, warning_vector=std::vector of length 3, capacity 6 = {...}) at ./msg/Message.h:535
#18 OpTracker::check_ops_in_flight (this=0x357cd20, warning_vector=std::vector of length 3, capacity 6 = {...}) at common/TrackedOp.cc:175
#19 0x000000000060767d in OSD::check_ops_in_flight (this=0x357c000) at osd/OSD.cc:3324
#20 0x000000000065f956 in OSD::tick (this=0x357c000) at osd/OSD.cc:3316
#21 0x0000000000660d69 in Context::complete (this=0x418cf220, r=<value optimized out>) at ./include/Context.h:64
#22 0x00000000009d1823 in SafeTimer::timer_thread (this=0x357c070) at common/Timer.cc:105
#23 0x00000000009d39dd in SafeTimerThread::entry (this=<value optimized out>) at common/Timer.cc:38
#24 0x0000003bd20079d1 in start_thread () from /lib64/libpthread.so.0
#25 0x0000003bd18e8b5d in clone () from /lib64/libc.so.6
We can see that the bug originates from OSD::tick(), not from the client. Diving into the core dump, I found some strange data changes.
In frame 15:
template<class A>
inline ostream& operator<<(ostream& out, const vector<A>& v) {
  out << "[";
  for (typename vector<A>::const_iterator p = v.begin(); p != v.end(); ++p) {
    if (p != v.begin()) out << ",";
    out << *p;
  }
  out << "]";
  return out;
}
I can see that v's length is 0, but it still enters the loop. Then in frame 14, I can see that op.indata is:
(gdb) p op.indata
$26 = {_buffers = std::list = {
  [0] = {_raw = 0x0, _off = 0, _len = 38},
  [1] = {_raw = 0x0, _off = 0, _len = 3039},
  [2] = {_raw = 0x0, _off = 0, _len = 77824},
  [3] = {_raw = 0xffffffff, _off = 1768189551, _len = 27507},
  [4] = {_raw = 0x0, _off = 0, _len = 264},
  [5] = {_raw = 0x0, _off = 2732, _len = 133},
  [6] = {_raw = 0x0, _off = 2886, _len = 25},
  [7] = {_raw = 0x0, _off = 2924, _len = 33},
  [8] = {_raw = 0x0, _off = 52, _len = 24},
  [9] = {_raw = 0x0, _off = 2250, _len = 470},
  [10] = {_raw = 0x0, _off = 2983, _len = 56},
  [11] = {_raw = 0xb740d78ffffffff, _off = 1885433459, _len = 7628147},
  [12] = {_raw = 0x0, _off = 0, _len = 31},
  [13] = {_raw = 0xfb9f080, _off = 1768161375, _len = 27507},
  [14] = {_raw = 0xffffffff, _off = 1886678633, _len = 29300},
  [15] = {_raw = 0xffffffff, _off = 1885433459, _len = 7628147},
  [16] = {_raw = 0x0, _off = 0, _len = 31},
  [17] = {_raw = 0xffffffff, _off = 95, _len = 56},
  [18] = {_raw = 0x0, _off = 0, _len = 264},
  [19] = {_raw = 0x0, _off = 2983, _len = 56},
  [20] = {_raw = 0x0, _off = 2924, _len = 33},
  [21] = {_raw = 0x0, _off = 2886, _len = 25},
  [22] = {_raw = 0x0, _off = 2732, _len = 133},
  [23] = {_raw = 0x0, _off = 2250, _len = 470},
  [24] = {_raw = 0x0, _off = 52, _len = 24},
  [25] = {_raw = 0x0, _off = 0, _len = 2233},
  [26] = {_raw = 0x0, _off = 76, _len = 77748},
  [27] = {_raw = 0x0, _off = 52, _len = 24},
  [28] = {_raw = 0x59e56a8, _off = 0, _len = 31},
  [29] = {_raw = 0x629cf50, _off = 0, _len = 4096},
  [30] = {_raw = 0xffffffff, _off = 1769235297, _len = 25974},
  [31] = <error reading variable>
But in frame 13, in buffer::list, I can see:
(gdb) p *this
$31 = {_buffers = empty std::list, _len = 0, append_buffer = {_raw = 0x0, _off = 0, _len = 0},
  last_p = {bl = 0x7f11b43201d0, ls = 0x7f11b43201d0, off = 0, p = {_raw = , _off = 0, _len = 0}, p_off = 0}}
So across these two data transfers, the data changes unexpectedly:
1. In include/types.h:
template<class A>
inline ostream& operator<<(ostream& out, const vector<A>& v) {
  out << "[";
  for (typename vector<A>::const_iterator p = v.begin(); p != v.end(); ++p) {
    if (p != v.begin()) out << ",";
    out << *p;
  }
  out << "]";
  return out;
}
gdb prints v as empty, but the loop still executes.
2. In osd/osd_types.cc:
In some cases it executes:
op.indata.write(0, op.op.xattr.name_len, out);
And then in common/buffer.cc:
void buffer::list::write(int off, int len, std::ostream& out) const
{
  list s;
  s.substr_of(*this, off, len);
  for (std::list<ptr>::const_iterator it = s._buffers.begin();
       it != s._buffers.end();
       ++it)
    if (it->length())
      out.write(it->c_str(), it->length());
  /*iterator p(this, off);
  while (len > 0 && !p.end()) {
    int l = p.left_in_this_buf();
    if (l > len)
      l = len;
    out.write(p.c_str(), l);
    len -= l;
  }*/
}
it invokes the buffer::list::write(...) method, but stepping into the method, *this is not the same object as op.indata.
I think there may be a multithreading problem in the code flow.
Updated by Wenjun Huang over 9 years ago
In file osd/OSD.cc:
OSD::_dispatch(Message *m) method:
default:
  {
    OpRequestRef op = op_tracker.create_request<OpRequest, Message*>(m);
    op->mark_event("waiting_for_osdmap");
    // no map? starting up?
    if (!osdmap) {
      dout(7) << "no OSDMap, not booted" << dendl;
      waiting_for_osdmap.push_back(op);
      break;
    }
    // need OSDMap
    dispatch_op(op);
  }
The op goes down two branches: one to op_tracker for monitoring the op, the other to dispatch_op(op) to handle it.
So when the tracker prints the op, it invokes the MOSDOp::print() method.
In the printing flow, the ops may be modified concurrently by the dispatch_op flow, which causes the issue while traversing the ops.
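To illustrate the class of bug suggested here, a self-contained sketch (plain C++, not Ceph code) of one thread printing a container while another clears it; with no lock held this is undefined behavior, and it can surface as exactly the kind of garbage and null pointers seen in the core dump:

// Self-contained illustration of the suspected race (not Ceph code):
// one thread traverses shared state while another clears it, no lock held.
#include <iostream>
#include <thread>
#include <vector>

int main() {
  std::vector<int> ops(100000, 42);  // stands in for the tracked op's data

  std::thread tracker([&] {
    for (int v : ops)                // reader: like OpTracker printing the op
      std::cout << v << ' ';         // may read freed storage -> UB
  });
  std::thread dispatcher([&] {
    ops.clear();                     // writer: like the done path clearing data
  });

  tracker.join();
  dispatcher.join();
}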
I have opened pull request https://github.com/ceph/ceph/pull/3117
Updated by Wenjun Huang over 9 years ago
Updated by Sage Weil over 9 years ago
just saw this on another cluster:
-22> 2014-12-14 10:08:38.610096 7f3bd0593700 1 -- 1.2.68.105:6836/13539 --> 1.2.68.102:6789/0 -- pg_stats(13 pgs tid 15715 v 0) v1 -- ?+0 0x2c8ca080 con 0x12653c80
-21> 2014-12-14 10:08:38.610289 7f3bc0b1a700 1 -- 1.2.68.105:6837/13539 <== osd.39 1.2.3.115:0/10730 24944 ==== osd_ping(ping e73183 stamp 2014-12-14 10:08:38.343103) v2 ==== 47+0+0 (3810113608 0 0) 0x1c938a80 con 0x142c56a0
-20> 2014-12-14 10:08:38.610330 7f3bc0b1a700 1 -- 1.2.68.105:6837/13539 --> 1.2.3.115:0/10730 -- osd_ping(ping_reply e73183 stamp 2014-12-14 10:08:38.343103) v2 -- ?+0 0x2ec9ad80 con 0x142c56a0
-19> 2014-12-14 10:08:38.616211 7f3bc0b1a700 1 -- 1.2.68.105:6837/13539 <== osd.33 1.2.3.115:0/17791 25002 ==== osd_ping(ping e73183 stamp 2014-12-14 10:08:38.254878) v2 ==== 47+0+0 (2978887605 0 0) 0x1d59d240 con 0x142c4360
-18> 2014-12-14 10:08:38.616278 7f3bc0b1a700 1 -- 1.2.68.105:6837/13539 --> 1.2.3.115:0/17791 -- osd_ping(ping_reply e73183 stamp 2014-12-14 10:08:38.254878) v2 -- ?+0 0x1c938a80 con 0x142c4360
-17> 2014-12-14 10:08:38.616315 7f3bc0b1a700 1 -- 1.2.68.105:6837/13539 <== osd.46 1.2.3.115:0/9312 25134 ==== osd_ping(ping e73183 stamp 2014-12-14 10:08:37.974745) v2 ==== 47+0+0 (2704133161 0 0) 0x2a2ce900 con 0x14384a40
-16> 2014-12-14 10:08:38.616337 7f3bc0b1a700 1 -- 1.2.68.105:6837/13539 --> 1.2.3.115:0/9312 -- osd_ping(ping_reply e73183 stamp 2014-12-14 10:08:37.974745) v2 -- ?+0 0x1d59d240 con 0x14384a40
-15> 2014-12-14 10:08:38.616370 7f3bc0b1a700 1 -- 1.2.68.105:6837/13539 <== osd.30 1.2.3.115:0/20341 24994 ==== osd_ping(ping e73183 stamp 2014-12-14 10:08:37.239113) v2 ==== 47+0+0 (2127826468 0 0) 0x291d8a80 con 0x14384780
-14> 2014-12-14 10:08:38.616385 7f3bc0b1a700 1 -- 1.2.68.105:6837/13539 --> 1.2.3.115:0/20341 -- osd_ping(ping_reply e73183 stamp 2014-12-14 10:08:37.239113) v2 -- ?+0 0x2a2ce900 con 0x14384780
-13> 2014-12-14 10:08:38.616418 7f3bc0b1a700 1 -- 1.2.68.105:6837/13539 <== osd.53 1.2.3.114:0/6878 25036 ==== osd_ping(ping e73183 stamp 2014-12-14 10:08:38.071152) v2 ==== 47+0+0 (1828542634 0 0) 0x1fcf0fc0 con 0x14315c20
-12> 2014-12-14 10:08:38.616432 7f3bc0b1a700 1 -- 1.2.68.105:6837/13539 --> 1.2.3.114:0/6878 -- osd_ping(ping_reply e73183 stamp 2014-12-14 10:08:38.071152) v2 -- ?+0 0x291d8a80 con 0x14315c20
-11> 2014-12-14 10:08:38.617036 7f3bbfb18700 1 -- 1.2.3.105:6825/13539 <== osd.65 1.2.68.114:0/18358 25053 ==== osd_ping(ping e73183 stamp 2014-12-14 10:08:38.610883) v2 ==== 47+0+0 (969273065 0 0) 0x16bc6580 con 0x14315800
-10> 2014-12-14 10:08:38.623682 7f3bc0b1a700 1 -- 1.2.68.105:6837/13539 <== osd.31 1.2.3.115:0/14423 25056 ==== osd_ping(ping e73183 stamp 2014-12-14 10:08:38.126331) v2 ==== 47+0+0 (2730504651 0 0) 0x1713ec80 con 0x14315960
-9> 2014-12-14 10:08:38.623743 7f3bbfb18700 1 -- 1.2.3.105:6825/13539 --> 1.2.68.114:0/18358 -- osd_ping(ping_reply e73183 stamp 2014-12-14 10:08:38.610883) v2 -- ?+0 0x21996e40 con 0x14315800
-8> 2014-12-14 10:08:38.623678 7f3bc7be7700 1 -- 1.2.3.105:6824/13539 --> osd.59 1.2.3.114:6844/23266 -- osd_sub_op_reply(client.8564947.0:709059 3.2d1 d35552d1/rbd_data.2049bc14917142.00000000000002f8/head//3 [] ondisk, result = 0) v2 -- ?+0 0x2ed92580
-7> 2014-12-14 10:08:38.646460 7f3bb8309700 1 -- 1.2.3.105:6824/13539 --> 1.2.3.115:6810/7283 -- osd_sub_op(unknown.0.0:0 3.4dd 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[]) v10 -- ?+2067 0x19d55800 con 0x12691b80
-6> 2014-12-14 10:08:38.696854 7f3bbfb18700 1 -- 1.2.3.105:6825/13539 <== osd.43 1.2.3.115:0/16262 25058 ==== osd_ping(ping e73183 stamp 2014-12-14 10:08:38.696662) v2 ==== 47+0+0 (778527481 0 0) 0x23a8c0c0 con 0x14339a20
-5> 2014-12-14 10:08:38.757048 7f3b737db700 2 -- 1.2.68.105:6836/13539 >> 1.2.68.71:0/2021866 pipe(0x15e74b00 sd=81 :6836 s=2 pgs=206482 cs=1 l=1 c=0x7f80420).reader couldn't read tag, (0) Success
-4> 2014-12-14 10:08:38.757158 7f3b737db700 2 -- 1.2.68.105:6836/13539 >> 1.2.68.71:0/2021866 pipe(0x15e74b00 sd=81 :6836 s=2 pgs=206482 cs=1 l=1 c=0x7f80420).fault (0) Success
-3> 2014-12-14 10:08:38.757382 7f3bc331f700 1 osd.4 73183 ms_handle_reset con 0x7f80420 session 0x1afcb560
-2> 2014-12-14 10:08:38.761998 7f3b737db700 1 -- 1.2.68.105:6836/13539 >> :/0 pipe(0x1a8a4d80 sd=81 :6836 s=0 pgs=0 cs=0 l=0 c=0x912e9e0).accept sd=81 1.2.68.71:49159/0
-1> 2014-12-14 10:08:38.889330 7f3bc331f700 1 -- 1.2.68.105:6836/13539 <== mon.1 1.2.68.102:6789/0 18082 ==== pg_stats_ack(13 pgs tid 15715) v1 ==== 381+0+0 (904121805 0 0) 0x2d74c7c0 con 0x12653c80
0> 2014-12-14 10:08:40.221316 7f3bd0593700 -1 *** Caught signal (Segmentation fault) **
in thread 7f3bd0593700
ceph version 0.80.5-22-g17dc605 (17dc605483640ce8c23b5ae71cb984264cb296bf)
1: /usr/bin/ceph-osd() [0x98eaf6]
2: (()+0xf130) [0x7f3bd55af130]
3: (std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long)+0x72) [0x7f3bd4b26482]
4: (std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)+0x5c) [0x7f3bd4b26afc]
5: (OpTracker::check_ops_in_flight(std::vector<std::string, std::allocator<std::string> >&)+0xe2c) [0x6a9cfc]
6: (OSD::check_ops_in_flight()+0x51) [0x5f5541]
7: (OSD::tick()+0x15f) [0x64a0af]
8: (Context::complete(int)+0x9) [0x652599]
9: (SafeTimer::timer_thread()+0x104) [0xa631f4]
10: (SafeTimerThread::entry()+0xd) [0xa6428d]
11: (()+0x7df3) [0x7f3bd55a7df3]
Updated by Sage Weil over 9 years ago
- Subject changed from OSD assertion failure when printing CEPH_OSD_OP_SRC_CMPXATTR op to osd: crash in check_ops_in_flight
- Status changed from Need More Info to 12
Updated by Dong Lei over 9 years ago
I think I have found the problem.
First, some clarification about the strange data changes Wenjun Huang found.
1. vector size 0 but still enters the loop
gdb does not work well with std::vector; we need "p v._M_impl._M_start" to see the vector's real contents (see the gdb sketch after this list).
2. "this" pointer changes
we confused frame 13 with frame 12; the this pointer did not actually change.
Second, we broke at printing CEPH_OSD_OP_CMPXATTR, not at printing CEPH_OSD_OP_SRC_CMPXATTR. Sorry for our carelessness.
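A hedged illustration of that gdb workaround (the addresses and count here are made up, not taken from this core dump):

(gdb) p v
$1 = std::vector of length 0, capacity 0
(gdb) p v._M_impl._M_start
$2 = (OSDOp *) 0x4812afc0
(gdb) p v._M_impl._M_finish - v._M_impl._M_start
$3 = 9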
Here comes my analysis:
1. from the core dump, I found that the data of this op is already corrupted just after entering check_ops_in_flight
2. from the log, I see two lines about the op. One line says this op is applied:
-119> 2014-11-06 05:42:00.738922 7f11adb14700 5 - op tracker -- , seq: 38064197, time: 2014-11-06 05:42:00.738921, event: sub_op_applied, request: osd_op(client.128435.0:95025845 default.12488.236_2465159741_f33b486caf_o.jpg [cmpxattr user.rgw.idtag (24) op 1 mode 1,create 0~0,delete,setxattr user.rgw.idtag (24),writefull 0~79981,setxattr user.rgw.manifest (470),setxattr user.rgw.acl (133),setxattr user.rgw.content_type (25),setxattr user.rgw.etag (33),setxattr user.rgw.x-amz-meta-origin (56)] 3.cc09cb01 ondisk+write e24684) v4
another line says it's done:
-54> 2014-11-06 05:42:00.761210 7f11a06e0700 5 - op tracker -- , seq: 38064197, time: 2014-11-06 05:42:00.761209, event: done, request: osd_op(client.128435.0:95025845 default.12488.236_2465159741_f33b486caf_o.jpg [cmpxattr user.rgw.idtag (24) op 1 mode 1,create 0~0,delete,setxattr user.rgw.idtag (24),writefull 0~79981,setxattr user.rgw.manifest (470),setxattr user.rgw.acl (133),setxattr user.rgw.content_type (25),setxattr user.rgw.etag (33),setxattr user.rgw.x-amz-meta-origin (56)] 3.cc09cb01 ondisk+write e24684) v4
so this op is marked done but still gets checked in check_ops_in_flight.
BTW: I'm sure this op is the one being printed (by comparing with the core dump and the timestamps).
3. I checked the logic of marking it "done". The flow is:
mark it done
clear_data
clear_payload // the problem is here
// start to unregister the op
get check_ops_in_flight lock
remove from list
release lock
So the data is cleared before the op is unregistered, and the clear is not protected by the lock.
4. I confirmed that the thread that unregisters the op and the thread that runs check_ops_in_flight are different.
5. I confirmed that the clear_data() function destructs the vector<OSDOp> ops field, which leaves null pointers behind. Accessing a std::vector from multiple threads is undefined behavior; when the printer dereferences a null pointer, it crashes.
Updated by Dong Lei over 9 years ago
I think this has been fixed in master:
https://github.com/ceph/ceph/commit/11082f7aa02f3f41c7c74c5930b67216dba88ea1
I'm using firefly; the code is:
void OpTracker::unregister_inflight_op(TrackedOp *i)
{
// caller checks;
assert(tracking_enabled);
// moving the following two lines to the bottom of the function will fix it
i->request->clear_data();
i->request->clear_payload();
//
Mutex::Locker locker(ops_in_flight_lock);
assert(i->xitem.get_list() == &ops_in_flight);
utime_t now = ceph_clock_now(cct);
i->xitem.remove_myself();
history.insert(now, TrackedOpRef(i));
}
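For clarity, here is a sketch of the reordered function along the lines suggested above (based on this comment and the referenced master commit, not a verbatim copy of the final patch):

void OpTracker::unregister_inflight_op(TrackedOp *i)
{
  // caller checks;
  assert(tracking_enabled);
  {
    Mutex::Locker locker(ops_in_flight_lock);
    assert(i->xitem.get_list() == &ops_in_flight);
    utime_t now = ceph_clock_now(cct);
    i->xitem.remove_myself();
    history.insert(now, TrackedOpRef(i));
  }
  // Clear only after the op is no longer visible to check_ops_in_flight,
  // so the printing path cannot race with the teardown.
  i->request->clear_data();
  i->request->clear_payload();
}

The point of the reordering is that the op leaves ops_in_flight under the lock before its buffers are torn down, so check_ops_in_flight can never observe a half-cleared op.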
Updated by Dong Lei over 9 years ago
PR for firefly: