Bug #9916
osd: crash in check_ops_in_flight
Status: Closed
Description
Assertion failure:
0> 2014-10-27 06:53:05.979871 7fd7b2156700 -1 common/buffer.cc: In function 'ceph::buffer::ptr::ptr(const ceph::buffer::ptr&, unsigned int, unsigned int)' thread 7fd7b2156700 time 2014-10-27 06:53:05.941971
common/buffer.cc: 574: FAILED assert(_raw)
ceph version 0.80.4 (7c241cfaa6c8c068bc9da8578ca00b9f4fc7567f)
1: /home/y/bin64/ceph-osd() [0xaca133]
2: (ceph::buffer::list::substr_of(ceph::buffer::list const&, unsigned int, unsigned int)+0xe4) [0xacbc84]
3: (ceph::buffer::list::write(int, int, std::ostream&) const+0x81) [0xacbe61]
4: (operator<<(std::ostream&, OSDOp const&)+0x1c8) [0x6cddc8]
5: (MOSDOp::print(std::ostream&) const+0x1df) [0x71e51f]
6: (OpTracker::check_ops_in_flight(std::vector<std::string, std::allocator<std::string> >&)+0x89a) [0x6ba68a]
7: (OSD::check_ops_in_flight()+0x4d) [0x60767d]
8: (OSD::tick()+0x146) [0x65f956]
9: (Context::complete(int)+0x9) [0x660d69]
10: (SafeTimer::timer_thread()+0x453) [0x9d1823]
11: (SafeTimerThread::entry()+0xd) [0x9d39dd]
12: /lib64/libpthread.so.0() [0x37aa0079d1]
13: (clone()+0x6d) [0x37a98e8b5d]
Core dump shows the following stack (with more details):
ostream& operator<<(ostream& out, const OSDOp& op)
  -> case CEPH_OSD_OP_SRC_CMPXATTR: op.indata.write(0, op.op.xattr.name_len, out);
    -> buffer::list::write(int off, int len, std::ostream& out)
However, op.indata's internal raw pointer for this op is null, so the assertion fails.
ceph version 0.80.4 (7c241cfaa6c8c068bc9da8578ca00b9f4fc7567f) on RHEL6.5
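For context, a minimal sketch (simplified stand-in types, not the actual common/buffer.cc source) of why a null raw pointer trips this assert: substr_of copies buffer::ptr segments out of the source list, and the ptr sub-range constructor asserts that the underlying raw buffer still exists.

// Minimal sketch of the failing pattern, assuming simplified types
// (the real ceph::buffer classes live in common/buffer.cc).
#include <cassert>

struct raw { char data[16]; };

struct ptr {
  raw*     _raw = nullptr;
  unsigned _off = 0, _len = 0;

  ptr() = default;
  // Analogue of buffer::ptr::ptr(const ptr&, unsigned, unsigned):
  // taking a sub-range of a segment requires a live raw buffer.
  ptr(const ptr& p, unsigned o, unsigned l)
      : _raw(p._raw), _off(p._off + o), _len(l) {
    assert(_raw);  // the "FAILED assert(_raw)" seen in the trace
  }
};

int main() {
  ptr corrupted;              // _raw == nullptr, as in the core dump below
  corrupted._len = 38;        // a stale length survives the corruption
  ptr sub(corrupted, 0, 14);  // aborts here, matching frame #11
  (void)sub;
}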
Updated by Sage Weil over 9 years ago
- Status changed from New to Need More Info
how is the OSDOp being formed? this looks like a bug on the client side to me. the attr ops should have name_len bytes in the indata bufferlist with the attr name ...
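(For reference, a hedged sketch of the invariant Sage describes, with illustrative names and types rather than the real Objecter helpers: the attribute name occupies the first name_len bytes of indata, which is exactly what the printer tries to read back out.)

// Hedged sketch of the expected encoding (illustrative types only;
// the real client-side code lives around osdc/Objecter.h).
#include <string>

struct XattrOpSketch {
  unsigned    name_len = 0;
  std::string indata;  // stand-in for ceph::buffer::list
};

XattrOpSketch make_cmpxattr(const std::string& name, const std::string& value) {
  XattrOpSketch op;
  op.name_len = name.size();  // the printer reads this many bytes...
  op.indata += name;          // ...from the front of indata
  op.indata += value;         // the comparison value follows the name
  return op;
}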
Updated by Guang Yang over 9 years ago
The crash happened with radosgw as the client, so I guess it is formed by objecter - https://github.com/ceph/ceph/blob/master/src/osdc/Objecter.h#L136 ?
Updated by Guang Yang over 9 years ago
Hi Yehuda,
After taking a look at the rgw code, I failed to find which (HTTP) request would need the CEPH_OSD_OP_SRC_CMPXATTR op; maybe I missed something obvious?
Thanks,
Guang
Updated by Wenjun Huang over 9 years ago
Sage Weil wrote:
how is the OSDOp being formed? this looks like a bug on the client side to me. the attr ops should have name_len bytes in the indata bufferlist with the attr name ...
Our code base is ceph version 0.80.4 (7c241cfaa6c8c068bc9da8578ca00b9f4fc7567f)
Here is a new core dump stack for this bug:
#0 0x0000003bd200f5db in raise () from /lib64/libpthread.so.0
#1 0x00000000009af6c7 in reraise_fatal (signum=6) at global/signal_handler.cc:59
#2 handle_fatal_signal (signum=6) at global/signal_handler.cc:105
#3 <signal handler called>
#4 0x0000003bd1832925 in raise () from /lib64/libc.so.6
#5 0x0000003bd1834105 in abort () from /lib64/libc.so.6
#6 0x0000003bd3cbea5d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6
#7 0x0000003bd3cbcbe6 in ?? () from /usr/lib64/libstdc++.so.6
#8 0x0000003bd3cbcc13 in std::terminate() () from /usr/lib64/libstdc++.so.6
#9 0x0000003bd3cbcd0e in __cxa_throw () from /usr/lib64/libstdc++.so.6
#10 0x0000000000a36fc9 in ceph::__ceph_assert_fail (assertion=0x355a230 "\001", file=0x4054d520 "Pt`8", line=574, func=0xc54120 "ceph::buffer::ptr::ptr(const ceph::buffer::ptr&, unsigned int, unsigned int)") at common/assert.cc:77
#11 0x0000000000aca133 in ceph::buffer::ptr::ptr (this=<value optimized out>, p=<value optimized out>, o=<value optimized out>, l=<value optimized out>) at common/buffer.cc:574
#12 0x0000000000acbc84 in ceph::buffer::list::substr_of (this=0x7f11b43201d0, other=<value optimized out>, off=<value optimized out>, len=<value optimized out>) at common/buffer.cc:1357
#13 0x0000000000acbe61 in ceph::buffer::list::write (this=0xe8bc038, off=0, len=14, out=...) at common/buffer.cc:1441
#14 0x00000000006cddc8 in operator<< (out=..., op=...) at osd/osd_types.cc:4447
#15 0x000000000071e51f in operator<< <OSDOp> (this=0x4812afc0, out=...) at ./include/types.h:99
#16 MOSDOp::print (this=0x4812afc0, out=...) at ./messages/MOSDOp.h:365
#17 0x00000000006ba68a in operator<< (this=0x357cd20, warning_vector=std::vector of length 3, capacity 6 = {...}) at ./msg/Message.h:535
#18 OpTracker::check_ops_in_flight (this=0x357cd20, warning_vector=std::vector of length 3, capacity 6 = {...}) at common/TrackedOp.cc:175
#19 0x000000000060767d in OSD::check_ops_in_flight (this=0x357c000) at osd/OSD.cc:3324
#20 0x000000000065f956 in OSD::tick (this=0x357c000) at osd/OSD.cc:3316
#21 0x0000000000660d69 in Context::complete (this=0x418cf220, r=<value optimized out>) at ./include/Context.h:64
#22 0x00000000009d1823 in SafeTimer::timer_thread (this=0x357c070) at common/Timer.cc:105
#23 0x00000000009d39dd in SafeTimerThread::entry (this=<value optimized out>) at common/Timer.cc:38
#24 0x0000003bd20079d1 in start_thread () from /lib64/libpthread.so.0
#25 0x0000003bd18e8b5d in clone () from /lib64/libc.so.6
We can see that the bug originates from OSD::tick(), not from the client. Diving into the core dump, I found some strange data changes.
In frame 15:
template<class A>
inline ostream& operator<<(ostream& out, const vector<A>& v) {
  out << "[";
  for (typename vector<A>::const_iterator p = v.begin(); p != v.end(); ++p) {
    if (p != v.begin()) out << ",";
    out << *p;
  }
  out << "]";
  return out;
}
I can see that v's length is 0, but it still enters the loop. Then in frame 14, I can see that op.indata is:
(gdb) p op.indata
$26 = {_buffers = std::list = {
  [0] = {_raw = 0x0, _off = 0, _len = 38},
  [1] = {_raw = 0x0, _off = 0, _len = 3039},
  [2] = {_raw = 0x0, _off = 0, _len = 77824},
  [3] = {_raw = 0xffffffff, _off = 1768189551, _len = 27507},
  [4] = {_raw = 0x0, _off = 0, _len = 264},
  [5] = {_raw = 0x0, _off = 2732, _len = 133},
  [6] = {_raw = 0x0, _off = 2886, _len = 25},
  [7] = {_raw = 0x0, _off = 2924, _len = 33},
  [8] = {_raw = 0x0, _off = 52, _len = 24},
  [9] = {_raw = 0x0, _off = 2250, _len = 470},
  [10] = {_raw = 0x0, _off = 2983, _len = 56},
  [11] = {_raw = 0xb740d78ffffffff, _off = 1885433459, _len = 7628147},
  [12] = {_raw = 0x0, _off = 0, _len = 31},
  [13] = {_raw = 0xfb9f080, _off = 1768161375, _len = 27507},
  [14] = {_raw = 0xffffffff, _off = 1886678633, _len = 29300},
  [15] = {_raw = 0xffffffff, _off = 1885433459, _len = 7628147},
  [16] = {_raw = 0x0, _off = 0, _len = 31},
  [17] = {_raw = 0xffffffff, _off = 95, _len = 56},
  [18] = {_raw = 0x0, _off = 0, _len = 264},
  [19] = {_raw = 0x0, _off = 2983, _len = 56},
  [20] = {_raw = 0x0, _off = 2924, _len = 33},
  [21] = {_raw = 0x0, _off = 2886, _len = 25},
  [22] = {_raw = 0x0, _off = 2732, _len = 133},
  [23] = {_raw = 0x0, _off = 2250, _len = 470},
  [24] = {_raw = 0x0, _off = 52, _len = 24},
  [25] = {_raw = 0x0, _off = 0, _len = 2233},
  [26] = {_raw = 0x0, _off = 76, _len = 77748},
  [27] = {_raw = 0x0, _off = 52, _len = 24},
  [28] = {_raw = 0x59e56a8, _off = 0, _len = 31},
  [29] = {_raw = 0x629cf50, _off = 0, _len = 4096},
  [30] = {_raw = 0xffffffff, _off = 1769235297, _len = 25974},
  [31] = <error reading variable>
But in frame 13, in buffer::list, I can see:
(gdb) p *this
$31 = {_buffers = empty std::list, _len = 0, append_buffer = {_raw = 0x0, _off = 0, _len = 0},
  last_p = {bl = 0x7f11b43201d0, ls = 0x7f11b43201d0, off = 0, p = {_raw = , _off = 0, _len = 0}, p_off = 0}}
So across these two data transfers, the data changes unexpectedly:
1. In include/types.h:
template<class A>
inline ostream& operator<<(ostream& out, const vector<A>& v) {
  out << "[";
  for (typename vector<A>::const_iterator p = v.begin(); p != v.end(); ++p) {
    if (p != v.begin()) out << ",";
    out << *p;
  }
  out << "]";
  return out;
}
gdb prints v as empty, but the loop still executes.
2. In osd/osd_types.cc:
In some cases it executes:
op.indata.write(0, op.op.xattr.name_len, out);
And then in common/buffer.cc:
void buffer::list::write(int off, int len, std::ostream& out) const
{
  list s;
  s.substr_of(*this, off, len);
  for (std::list<ptr>::const_iterator it = s._buffers.begin();
       it != s._buffers.end();
       ++it)
    if (it->length())
      out.write(it->c_str(), it->length());
  /*iterator p(this, off);
  while (len > 0 && !p.end()) {
    int l = p.left_in_this_buf();
    if (l > len)
      l = len;
    out.write(p.c_str(), l);
    len -= l;
  }*/
}
it invokes the buffer::list::write(...) method, but stepping into the method, *this is not the same object as op.indata.
I think there may be a multithreading problem in the code flow.
Updated by Wenjun Huang over 9 years ago
In file osd/OSD.cc:
OSD::_dispatch(Message *m) method:
default:
  {
    OpRequestRef op = op_tracker.create_request<OpRequest, Message*>(m);
    op->mark_event("waiting_for_osdmap");
    // no map? starting up?
    if (!osdmap) {
      dout(7) << "no OSDMap, not booted" << dendl;
      waiting_for_osdmap.push_back(op);
      break;
    }
    // need OSDMap
    dispatch_op(op);
  }
The op goes down two branches: one to op_tracker for monitoring the op, the other to dispatch_op(op) to handle it.
So when the tracker prints the op, it invokes the MOSDOp::print() method.
In the printing flow, the ops may be modified concurrently by the dispatch_op flow, which causes the issue while traversing the ops.
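To illustrate the class of bug suggested here, a self-contained sketch (plain C++, not Ceph code) of one thread printing a container while another clears it; with no lock held this is undefined behavior, and it can surface as exactly the kind of garbage and null pointers seen in the core dump:

// Self-contained illustration of the suspected race (not Ceph code):
// one thread traverses shared state while another clears it, no lock held.
#include <iostream>
#include <thread>
#include <vector>

int main() {
  std::vector<int> ops(100000, 42);  // stands in for the tracked op's data

  std::thread tracker([&] {
    for (int v : ops)                // reader: like OpTracker printing the op
      std::cout << v << ' ';         // may read freed storage -> UB
  });
  std::thread dispatcher([&] {
    ops.clear();                     // writer: like the done path clearing data
  });

  tracker.join();
  dispatcher.join();
}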
I have opened pull request https://github.com/ceph/ceph/pull/3117
Updated by Wenjun Huang over 9 years ago
Updated by Sage Weil over 9 years ago
just saw this on another cluster:
-22> 2014-12-14 10:08:38.610096 7f3bd0593700 1 -- 1.2.68.105:6836/13539 --> 1.2.68.102:6789/0 -- pg_stats(13 pgs tid 15715 v 0) v1 -- ?+0 0x2c8ca080 con 0x12653c80
-21> 2014-12-14 10:08:38.610289 7f3bc0b1a700 1 -- 1.2.68.105:6837/13539 <== osd.39 1.2.3.115:0/10730 24944 ==== osd_ping(ping e73183 stamp 2014-12-14 10:08:38.343103) v2 ==== 47+0+0 (3810113608 0 0) 0x1c938a80 con 0x142c56a0
-20> 2014-12-14 10:08:38.610330 7f3bc0b1a700 1 -- 1.2.68.105:6837/13539 --> 1.2.3.115:0/10730 -- osd_ping(ping_reply e73183 stamp 2014-12-14 10:08:38.343103) v2 -- ?+0 0x2ec9ad80 con 0x142c56a0
-19> 2014-12-14 10:08:38.616211 7f3bc0b1a700 1 -- 1.2.68.105:6837/13539 <== osd.33 1.2.3.115:0/17791 25002 ==== osd_ping(ping e73183 stamp 2014-12-14 10:08:38.254878) v2 ==== 47+0+0 (2978887605 0 0) 0x1d59d240 con 0x142c4360
-18> 2014-12-14 10:08:38.616278 7f3bc0b1a700 1 -- 1.2.68.105:6837/13539 --> 1.2.3.115:0/17791 -- osd_ping(ping_reply e73183 stamp 2014-12-14 10:08:38.254878) v2 -- ?+0 0x1c938a80 con 0x142c4360
-17> 2014-12-14 10:08:38.616315 7f3bc0b1a700 1 -- 1.2.68.105:6837/13539 <== osd.46 1.2.3.115:0/9312 25134 ==== osd_ping(ping e73183 stamp 2014-12-14 10:08:37.974745) v2 ==== 47+0+0 (2704133161 0 0) 0x2a2ce900 con 0x14384a40
-16> 2014-12-14 10:08:38.616337 7f3bc0b1a700 1 -- 1.2.68.105:6837/13539 --> 1.2.3.115:0/9312 -- osd_ping(ping_reply e73183 stamp 2014-12-14 10:08:37.974745) v2 -- ?+0 0x1d59d240 con 0x14384a40
-15> 2014-12-14 10:08:38.616370 7f3bc0b1a700 1 -- 1.2.68.105:6837/13539 <== osd.30 1.2.3.115:0/20341 24994 ==== osd_ping(ping e73183 stamp 2014-12-14 10:08:37.239113) v2 ==== 47+0+0 (2127826468 0 0) 0x291d8a80 con 0x14384780
-14> 2014-12-14 10:08:38.616385 7f3bc0b1a700 1 -- 1.2.68.105:6837/13539 --> 1.2.3.115:0/20341 -- osd_ping(ping_reply e73183 stamp 2014-12-14 10:08:37.239113) v2 -- ?+0 0x2a2ce900 con 0x14384780
-13> 2014-12-14 10:08:38.616418 7f3bc0b1a700 1 -- 1.2.68.105:6837/13539 <== osd.53 1.2.3.114:0/6878 25036 ==== osd_ping(ping e73183 stamp 2014-12-14 10:08:38.071152) v2 ==== 47+0+0 (1828542634 0 0) 0x1fcf0fc0 con 0x14315c20
-12> 2014-12-14 10:08:38.616432 7f3bc0b1a700 1 -- 1.2.68.105:6837/13539 --> 1.2.3.114:0/6878 -- osd_ping(ping_reply e73183 stamp 2014-12-14 10:08:38.071152) v2 -- ?+0 0x291d8a80 con 0x14315c20
-11> 2014-12-14 10:08:38.617036 7f3bbfb18700 1 -- 1.2.3.105:6825/13539 <== osd.65 1.2.68.114:0/18358 25053 ==== osd_ping(ping e73183 stamp 2014-12-14 10:08:38.610883) v2 ==== 47+0+0 (969273065 0 0) 0x16bc6580 con 0x14315800
-10> 2014-12-14 10:08:38.623682 7f3bc0b1a700 1 -- 1.2.68.105:6837/13539 <== osd.31 1.2.3.115:0/14423 25056 ==== osd_ping(ping e73183 stamp 2014-12-14 10:08:38.126331) v2 ==== 47+0+0 (2730504651 0 0) 0x1713ec80 con 0x14315960
-9> 2014-12-14 10:08:38.623743 7f3bbfb18700 1 -- 1.2.3.105:6825/13539 --> 1.2.68.114:0/18358 -- osd_ping(ping_reply e73183 stamp 2014-12-14 10:08:38.610883) v2 -- ?+0 0x21996e40 con 0x14315800
-8> 2014-12-14 10:08:38.623678 7f3bc7be7700 1 -- 1.2.3.105:6824/13539 --> osd.59 1.2.3.114:6844/23266 -- osd_sub_op_reply(client.8564947.0:709059 3.2d1 d35552d1/rbd_data.2049bc14917142.00000000000002f8/head//3 [] ondisk, result = 0) v2 -- ?+0 0x2ed92580
-7> 2014-12-14 10:08:38.646460 7f3bb8309700 1 -- 1.2.3.105:6824/13539 --> 1.2.3.115:6810/7283 -- osd_sub_op(unknown.0.0:0 3.4dd 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[]) v10 -- ?+2067 0x19d55800 con 0x12691b80
-6> 2014-12-14 10:08:38.696854 7f3bbfb18700 1 -- 1.2.3.105:6825/13539 <== osd.43 1.2.3.115:0/16262 25058 ==== osd_ping(ping e73183 stamp 2014-12-14 10:08:38.696662) v2 ==== 47+0+0 (778527481 0 0) 0x23a8c0c0 con 0x14339a20
-5> 2014-12-14 10:08:38.757048 7f3b737db700 2 -- 1.2.68.105:6836/13539 >> 1.2.68.71:0/2021866 pipe(0x15e74b00 sd=81 :6836 s=2 pgs=206482 cs=1 l=1 c=0x7f80420).reader couldn't read tag, (0) Success
-4> 2014-12-14 10:08:38.757158 7f3b737db700 2 -- 1.2.68.105:6836/13539 >> 1.2.68.71:0/2021866 pipe(0x15e74b00 sd=81 :6836 s=2 pgs=206482 cs=1 l=1 c=0x7f80420).fault (0) Success
-3> 2014-12-14 10:08:38.757382 7f3bc331f700 1 osd.4 73183 ms_handle_reset con 0x7f80420 session 0x1afcb560
-2> 2014-12-14 10:08:38.761998 7f3b737db700 1 -- 1.2.68.105:6836/13539 >> :/0 pipe(0x1a8a4d80 sd=81 :6836 s=0 pgs=0 cs=0 l=0 c=0x912e9e0).accept sd=81 1.2.68.71:49159/0
-1> 2014-12-14 10:08:38.889330 7f3bc331f700 1 -- 1.2.68.105:6836/13539 <== mon.1 1.2.68.102:6789/0 18082 ==== pg_stats_ack(13 pgs tid 15715) v1 ==== 381+0+0 (904121805 0 0) 0x2d74c7c0 con 0x12653c80
0> 2014-12-14 10:08:40.221316 7f3bd0593700 -1 *** Caught signal (Segmentation fault) **
in thread 7f3bd0593700
ceph version 0.80.5-22-g17dc605 (17dc605483640ce8c23b5ae71cb984264cb296bf)
1: /usr/bin/ceph-osd() [0x98eaf6]
2: (()+0xf130) [0x7f3bd55af130]
3: (std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long)+0x72) [0x7f3bd4b26482]
4: (std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)+0x5c) [0x7f3bd4b26afc]
5: (OpTracker::check_ops_in_flight(std::vector<std::string, std::allocator<std::string> >&)+0xe2c) [0x6a9cfc]
6: (OSD::check_ops_in_flight()+0x51) [0x5f5541]
7: (OSD::tick()+0x15f) [0x64a0af]
8: (Context::complete(int)+0x9) [0x652599]
9: (SafeTimer::timer_thread()+0x104) [0xa631f4]
10: (SafeTimerThread::entry()+0xd) [0xa6428d]
11: (()+0x7df3) [0x7f3bd55a7df3]
Updated by Sage Weil over 9 years ago
- Subject changed from OSD assertion failure when printing CEPH_OSD_OP_SRC_CMPXATTR op to osd: crash in check_ops_in_flight
- Status changed from Need More Info to 12
Updated by Dong Lei over 9 years ago
I think I have found the problem.
First, some clarification about the strange data changes Wenjun Huang found.
1. vector size 0 but still enters the loop
gdb does not work well with std::vector; we need "p v._M_impl._M_start" to see the vector's real contents (see the gdb sketch after this list).
2. "this" pointer changes
we confused frame 13 with frame 12; the this pointer did not actually change.
Second, we broke at printing CEPH_OSD_OP_CMPXATTR, not at printing CEPH_OSD_OP_SRC_CMPXATTR. Sorry for our carelessness.
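A hedged illustration of that gdb workaround (the addresses and count here are made up, not taken from this core dump):

(gdb) p v
$1 = std::vector of length 0, capacity 0
(gdb) p v._M_impl._M_start
$2 = (OSDOp *) 0x4812afc0
(gdb) p v._M_impl._M_finish - v._M_impl._M_start
$3 = 9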
Here comes my analysis:
1. from the core dump, I found that the data of this op is already corrupted just after entering check_ops_in_flight
2. from the log, I see two lines about the op. One line says this op is applied:
-119> 2014-11-06 05:42:00.738922 7f11adb14700 5 - op tracker -- , seq: 38064197, time: 2014-11-06 05:42:00.738921, event: sub_op_applied, request: osd_op(client.128435.0:95025845 default.12488.236_2465159741_f33b486caf_o.jpg [cmpxattr user.rgw.idtag (24) op 1 mode 1,create 0~0,delete,setxattr user.rgw.idtag (24),writefull 0~79981,setxattr user.rgw.manifest (470),setxattr user.rgw.acl (133),setxattr user.rgw.content_type (25),setxattr user.rgw.etag (33),setxattr user.rgw.x-amz-meta-origin (56)] 3.cc09cb01 ondisk+write e24684) v4
another line says it's done:
-54> 2014-11-06 05:42:00.761210 7f11a06e0700 5 - op tracker -- , seq: 38064197, time: 2014-11-06 05:42:00.761209, event: done, request: osd_op(client.128435.0:95025845 default.12488.236_2465159741_f33b486caf_o.jpg [cmpxattr user.rgw.idtag (24) op 1 mode 1,create 0~0,delete,setxattr user.rgw.idtag (24),writefull 0~79981,setxattr user.rgw.manifest (470),setxattr user.rgw.acl (133),setxattr user.rgw.content_type (25),setxattr user.rgw.etag (33),setxattr user.rgw.x-amz-meta-origin (56)] 3.cc09cb01 ondisk+write e24684) v4
so this op is marked done but still gets checked in check_ops_in_flight.
BTW: I'm sure this op is the one being printed (by comparing with the core dump and the timestamps).
3. I checked the logic of marking it "done". The flow is:
mark it done
clear_data
clear_payload // the problem is here
// start to unregister the op
get check_ops_in_flight lock
remove from list
release lock
So the data is cleared before the op is unregistered, and the clear is not protected by the lock.
4. I confirmed that the thread that unregisters the op and the thread that runs check_ops_in_flight are different.
5. I confirmed that the clear_data() function destructs the vector<OSDOp> ops field, which leaves null pointers behind. Accessing a std::vector from multiple threads is undefined behavior; when the printer dereferences a null pointer, it crashes.
Updated by Dong Lei over 9 years ago
I think this has been fixed in master:
https://github.com/ceph/ceph/commit/11082f7aa02f3f41c7c74c5930b67216dba88ea1
I'm using firefly; the code is:
void OpTracker::unregister_inflight_op(TrackedOp *i)
{
// caller checks;
assert(tracking_enabled);
// moving the following two lines to the bottom of the function will fix it
i->request->clear_data();
i->request->clear_payload();
//
Mutex::Locker locker(ops_in_flight_lock);
assert(i->xitem.get_list() == &ops_in_flight);
utime_t now = ceph_clock_now(cct);
i->xitem.remove_myself();
history.insert(now, TrackedOpRef(i));
}
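For clarity, here is a sketch of the reordered function along the lines suggested above (based on this comment and the referenced master commit, not a verbatim copy of the final patch):

void OpTracker::unregister_inflight_op(TrackedOp *i)
{
  // caller checks;
  assert(tracking_enabled);
  {
    Mutex::Locker locker(ops_in_flight_lock);
    assert(i->xitem.get_list() == &ops_in_flight);
    utime_t now = ceph_clock_now(cct);
    i->xitem.remove_myself();
    history.insert(now, TrackedOpRef(i));
  }
  // Clear only after the op is no longer visible to check_ops_in_flight,
  // so the printing path cannot race with the teardown.
  i->request->clear_data();
  i->request->clear_payload();
}

The point of the reordering is that the op leaves ops_in_flight under the lock before its buffers are torn down, so check_ops_in_flight can never observe a half-cleared op.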
Updated by Dong Lei over 9 years ago
PR for firefly: