Bug #9916


osd: crash in check_ops_in_flight

Added by Guang Yang over 9 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
firefly
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Assertion failure:

 0> 2014-10-27 06:53:05.979871 7fd7b2156700 -1 common/buffer.cc: In function 'ceph::buffer::ptr::ptr(const ceph::buffer::ptr&
, unsigned int, unsigned int)' thread 7fd7b2156700 time 2014-10-27 06:53:05.941971
common/buffer.cc: 574: FAILED assert(_raw)

 ceph version 0.80.4 (7c241cfaa6c8c068bc9da8578ca00b9f4fc7567f)
 1: /home/y/bin64/ceph-osd() [0xaca133]
 2: (ceph::buffer::list::substr_of(ceph::buffer::list const&, unsigned int, unsigned int)+0xe4) [0xacbc84]
 3: (ceph::buffer::list::write(int, int, std::ostream&) const+0x81) [0xacbe61]
 4: (operator<<(std::ostream&, OSDOp const&)+0x1c8) [0x6cddc8]
 5: (MOSDOp::print(std::ostream&) const+0x1df) [0x71e51f]
 6: (OpTracker::check_ops_in_flight(std::vector<std::string, std::allocator<std::string> >&)+0x89a) [0x6ba68a]
 7: (OSD::check_ops_in_flight()+0x4d) [0x60767d]
 8: (OSD::tick()+0x146) [0x65f956]
 9: (Context::complete(int)+0x9) [0x660d69]
 10: (SafeTimer::timer_thread()+0x453) [0x9d1823]
 11: (SafeTimerThread::entry()+0xd) [0x9d39dd]
 12: /lib64/libpthread.so.0() [0x37aa0079d1]
 13: (clone()+0x6d) [0x37a98e8b5d]

The core dump shows the following stack (with more details):

   ostream& operator<<(ostream& out, const OSDOp& op)
    -> case CEPH_OSD_OP_SRC_CMPXATTR: op.indata.write(0, op.op.xattr.name_len, out);
      -> buffer::list::write(int off, int len, std::ostream& out)

However, op.indata's internal pointer for this op is null, which results in the assertion failure.
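
For illustration, here is a minimal standalone sketch (simplified types, not the real Ceph buffer code) of why taking a sub-range of a buffer whose internal pointer is null trips exactly this assert:

#include <cassert>

// Simplified stand-ins for ceph::buffer::raw / ceph::buffer::ptr; this is NOT
// the real Ceph buffer implementation, only an illustration of the failing path.
struct raw { char data[64]; };

struct ptr {
  raw *_raw;
  unsigned _off, _len;

  ptr() : _raw(nullptr), _off(0), _len(0) {}

  // analogue of buffer::ptr::ptr(const ptr&, unsigned, unsigned):
  // taking a sub-range of an existing ptr requires it to reference real memory
  ptr(const ptr &p, unsigned o, unsigned l)
    : _raw(p._raw), _off(p._off + o), _len(l) {
    assert(_raw);   // "FAILED assert(_raw)" fires here when p._raw is null
  }
};

int main() {
  ptr corrupted;              // _raw == nullptr, like op.indata's buffers above
  ptr sub(corrupted, 0, 14);  // what substr_of() does per buffer -> aborts here
  (void)sub;
  return 0;
}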

ceph version 0.80.4 (7c241cfaa6c8c068bc9da8578ca00b9f4fc7567f) on RHEL6.5

Actions #1

Updated by Sage Weil over 9 years ago

  • Status changed from New to Need More Info

how is the OSDOp being formed? this looks like a bug on the client side to me. the attr ops should have name_len bytes in the indata bufferlist with the attr name ...
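
As a hedged illustration of the point above (hypothetical simplified types, not the actual Objecter code): a well-formed cmpxattr op should carry the attribute name bytes in its input data, with name_len matching what was actually appended.

#include <cstdint>
#include <string>

// Hypothetical simplified shapes; the real client code builds an OSDOp and
// appends the name (and compare value) to a bufferlist.
struct FakeXattrOp {
  struct Xattr { uint32_t name_len = 0, value_len = 0; };
  Xattr xattr;
  std::string indata;   // stands in for the indata bufferlist
};

FakeXattrOp make_cmpxattr(const std::string &name, const std::string &value) {
  FakeXattrOp op;
  op.xattr.name_len  = name.size();   // the printer later reads indata[0..name_len)
  op.xattr.value_len = value.size();
  op.indata = name + value;           // the name bytes must really be present
  return op;
}

// If indata stayed empty while name_len remained non-zero, printing the op
// would try to read name_len bytes that do not exist -- the suspicion above.
int main() {
  FakeXattrOp op = make_cmpxattr("user.rgw.idtag", "sometag");
  return op.xattr.name_len == 14 ? 0 : 1;  // name bytes present and counted
}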

Actions #2

Updated by Guang Yang over 9 years ago

The crash happened with radosgw as the client, so I guess it is formed by the Objecter - https://github.com/ceph/ceph/blob/master/src/osdc/Objecter.h#L136 ?

Actions #3

Updated by Guang Yang over 9 years ago

Hi Yehuda,
After taking a look at the rgw code, I failed to find which (HTTP) request would need the CEPH_OSD_OP_SRC_CMPXATTR op; maybe I missed something obvious?

Thanks,
Guang

Actions #4

Updated by Wenjun Huang over 9 years ago

Sage Weil wrote:

how is the OSDOp being formed? this looks like a bug on the client side to me. the attr ops should have name_len bytes in the indata bufferlist with the attr name ...

Our code base is ceph version 0.80.4 (7c241cfaa6c8c068bc9da8578ca00b9f4fc7567f)
Here is an updated core dump stack for this bug:

#0  0x0000003bd200f5db in raise () from /lib64/libpthread.so.0
#1  0x00000000009af6c7 in reraise_fatal (signum=6) at global/signal_handler.cc:59
#2  handle_fatal_signal (signum=6) at global/signal_handler.cc:105
#3  <signal handler called>
#4  0x0000003bd1832925 in raise () from /lib64/libc.so.6
#5  0x0000003bd1834105 in abort () from /lib64/libc.so.6
#6  0x0000003bd3cbea5d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6
#7  0x0000003bd3cbcbe6 in ?? () from /usr/lib64/libstdc++.so.6
#8  0x0000003bd3cbcc13 in std::terminate() () from /usr/lib64/libstdc++.so.6
#9  0x0000003bd3cbcd0e in __cxa_throw () from /usr/lib64/libstdc++.so.6
#10 0x0000000000a36fc9 in ceph::__ceph_assert_fail (assertion=0x355a230 "\001", file=0x4054d520 "Pt`8", line=574, func=0xc54120 "ceph::buffer::ptr::ptr(const ceph::buffer::ptr&, unsigned int, unsigned int)") at common/assert.cc:77
#11 0x0000000000aca133 in ceph::buffer::ptr::ptr (this=<value optimized out>, p=<value optimized out>, o=<value optimized out>, l=<value optimized out>) at common/buffer.cc:574
#12 0x0000000000acbc84 in ceph::buffer::list::substr_of (this=0x7f11b43201d0, other=<value optimized out>, off=<value optimized out>, len=<value optimized out>) at common/buffer.cc:1357
#13 0x0000000000acbe61 in ceph::buffer::list::write (this=0xe8bc038, off=0, len=14, out=...) at common/buffer.cc:1441
#14 0x00000000006cddc8 in operator<< (out=..., op=...) at osd/osd_types.cc:4447
#15 0x000000000071e51f in operator<< <OSDOp> (this=0x4812afc0, out=...) at ./include/types.h:99
#16 MOSDOp::print (this=0x4812afc0, out=...) at ./messages/MOSDOp.h:365
#17 0x00000000006ba68a in operator<< (this=0x357cd20, warning_vector=std::vector of length 3, capacity 6 = {...}) at ./msg/Message.h:535
#18 OpTracker::check_ops_in_flight (this=0x357cd20, warning_vector=std::vector of length 3, capacity 6 = {...}) at common/TrackedOp.cc:175
#19 0x000000000060767d in OSD::check_ops_in_flight (this=0x357c000) at osd/OSD.cc:3324
#20 0x000000000065f956 in OSD::tick (this=0x357c000) at osd/OSD.cc:3316
#21 0x0000000000660d69 in Context::complete (this=0x418cf220, r=<value optimized out>) at ./include/Context.h:64
#22 0x00000000009d1823 in SafeTimer::timer_thread (this=0x357c070) at common/Timer.cc:105
#23 0x00000000009d39dd in SafeTimerThread::entry (this=<value optimized out>) at common/Timer.cc:38
#24 0x0000003bd20079d1 in start_thread () from /lib64/libpthread.so.0
#25 0x0000003bd18e8b5d in clone () from /lib64/libc.so.6

We can see that the bug originates from OSD::tick(), not from the client. Diving into the core dump, I found some strange data changes.
In frame 15:
template<class A>
inline ostream& operator<<(ostream& out, const vector<A>& v) {
  out << "[";
  for (typename vector<A>::const_iterator p = v.begin(); p != v.end(); ++p) {
    if (p != v.begin()) out << ",";
    out << *p;
  }
  out << "]";
  return out;
}

I can see that v's length is 0, but it still enters the loop. Then in frame 14, I can see that op.indata is:
(gdb) p op.indata
$26 = {_buffers = std::list = {[0] = {_raw = 0x0, _off = 0, _len = 38}, [1] = {_raw = 0x0, _off = 0, _len = 3039}, [2] = {_raw = 0x0, _off = 0, _len = 77824}, [3] = {_raw = 0xffffffff, _off = 1768189551, _len = 27507}, [4] = {
      _raw = 0x0, _off = 0, _len = 264}, [5] = {_raw = 0x0, _off = 2732, _len = 133}, [6] = {_raw = 0x0, _off = 2886, _len = 25}, [7] = {_raw = 0x0, _off = 2924, _len = 33}, [8] = {_raw = 0x0, _off = 52, _len = 24}, [9] = {_raw = 0x0,
      _off = 2250, _len = 470}, [10] = {_raw = 0x0, _off = 2983, _len = 56}, [11] = {_raw = 0xb740d78ffffffff, _off = 1885433459, _len = 7628147}, [12] = {_raw = 0x0, _off = 0, _len = 31}, [13] = {_raw = 0xfb9f080, _off = 1768161375,
      _len = 27507}, [14] = {_raw = 0xffffffff, _off = 1886678633, _len = 29300}, [15] = {_raw = 0xffffffff, _off = 1885433459, _len = 7628147}, [16] = {_raw = 0x0, _off = 0, _len = 31}, [17] = {_raw = 0xffffffff, _off = 95, _len = 56},
    [18] = {_raw = 0x0, _off = 0, _len = 264}, [19] = {_raw = 0x0, _off = 2983, _len = 56}, [20] = {_raw = 0x0, _off = 2924, _len = 33}, [21] = {_raw = 0x0, _off = 2886, _len = 25}, [22] = {_raw = 0x0, _off = 2732, _len = 133}, [23] = {
      _raw = 0x0, _off = 2250, _len = 470}, [24] = {_raw = 0x0, _off = 52, _len = 24}, [25] = {_raw = 0x0, _off = 0, _len = 2233}, [26] = {_raw = 0x0, _off = 76, _len = 77748}, [27] = {_raw = 0x0, _off = 52, _len = 24}, [28] = {
      _raw = 0x59e56a8, _off = 0, _len = 31}, [29] = {_raw = 0x629cf50, _off = 0, _len = 4096}, [30] = {_raw = 0xffffffff, _off = 1769235297, _len = 25974}, [31] = <error reading variable>

But in frame 13, in buffer::list, I can see:
(gdb) p *this
$31 = {_buffers = empty std::list, _len = 0, append_buffer = {_raw = 0x0, _off = 0, _len = 0}, last_p = {bl = 0x7f11b43201d0, ls = 0x7f11b43201d0, off = 0, p = {_raw = , _off = 0, _len = 0}, p_off = 0}}

So across these two data-passing steps, the data changes unexpectedly:
1. In include/types.h:

template<class A>
inline ostream& operator<<(ostream& out, const vector<A>& v) {
  out << "[";
  for (typename vector<A>::const_iterator p = v.begin(); p != v.end(); ++p) {
    if (p != v.begin()) out << ",";
    out << *p;
  }
  out << "]";
  return out;
}

gdb prints v as empty, but the loop still executes.

2. In osd/osd_types.cc:
In some cases it executes:

op.indata.write(0, op.op.xattr.name_len, out);

And then in common/buffer.cc:
void buffer::list::write(int off, int len, std::ostream& out) const
{
  list s;
  s.substr_of(*this, off, len);
  for (std::list<ptr>::const_iterator it = s._buffers.begin(); 
     it != s._buffers.end(); 
     ++it)
    if (it->length())
    out.write(it->c_str(), it->length());
  /*iterator p(this, off);
    while (len > 0 && !p.end()) {
    int l = p.left_in_this_buf();
    if (l > len)
    l = len;
    out.write(p.c_str(), l);
    len -= l;
    }*/
}

It invokes the buffer::list::write(...) method, but when stepping into the method, *this is not the same as op.indata.

I think there may be some multithreading problem in the code flow.

Actions #5

Updated by Wenjun Huang over 9 years ago

In file osd/OSD.cc:
OSD::_dispatch(Message *m) method:

  default:
    {
      OpRequestRef op = op_tracker.create_request<OpRequest, Message*>(m);
      op->mark_event("waiting_for_osdmap");
      // no map?  starting up?
      if (!osdmap) {
        dout(7) << "no OSDMap, not booted" << dendl;
        waiting_for_osdmap.push_back(op);
        break;
      }

      // need OSDMap
      dispatch_op(op);
    }

The op goes down two branches: one to op_tracker for monitoring the op, and the other to dispatch_op(op) to handle the op.
So the tracker prints the op by invoking the MOSDOp::print() method.
In the printing flow, the ops may be changed concurrently by the dispatch_op flow, which causes the issue while traversing the ops; see the sketch below.
I have opened pull request https://github.com/ceph/ceph/pull/3117
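
A minimal sketch (plain C++, not Ceph code) of the race pattern described above: one thread prints the shared ops under a lock while another thread clears their data without taking that lock, which is undefined behavior and can crash just like the traces in this ticket.

#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

// Not Ceph code: a toy model of the pattern described above.
struct FakeOp { std::vector<char> indata = {'a', 'b', 'c'}; };

std::vector<FakeOp> ops_in_flight(8);
std::mutex ops_in_flight_lock;

// analogue of check_ops_in_flight(): prints the ops under the lock
void tracker_thread() {
  std::lock_guard<std::mutex> l(ops_in_flight_lock);
  for (const auto &op : ops_in_flight)
    std::cout << op.indata.size() << ' ';
  std::cout << '\n';
}

// analogue of the "done" path: clears op data BEFORE taking the lock,
// so it races with the printer above (undefined behavior)
void done_thread() {
  for (auto &op : ops_in_flight)
    op.indata.clear();                       // unprotected clear: the bug pattern
  std::lock_guard<std::mutex> l(ops_in_flight_lock);
  // ... unregister the op from the in-flight list here ...
}

int main() {
  std::thread a(tracker_thread), b(done_thread);
  a.join();
  b.join();
  return 0;
}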

Actions #6

Updated by Wenjun Huang over 9 years ago

Sorry for the duplication of #5 and #6, caused by a network problem.

Actions #7

Updated by Sage Weil over 9 years ago

just saw this on another cluster:

   -22> 2014-12-14 10:08:38.610096 7f3bd0593700  1 -- 1.2.68.105:6836/13539 --> 1.2.68.102:6789/0 -- pg_stats(13 pgs tid 15715 v 0) v1 -- ?+0 0x2c8ca080 con 0x12653c80
   -21> 2014-12-14 10:08:38.610289 7f3bc0b1a700  1 -- 1.2.68.105:6837/13539 <== osd.39 1.2.3.115:0/10730 24944 ==== osd_ping(ping e73183 stamp 2014-12-14 10:08:38.343103) v2 ==== 
47+0+0 (3810113608 0 0) 0x1c938a80 con 0x142c56a0
   -20> 2014-12-14 10:08:38.610330 7f3bc0b1a700  1 -- 1.2.68.105:6837/13539 --> 1.2.3.115:0/10730 -- osd_ping(ping_reply e73183 stamp 2014-12-14 10:08:38.343103) v2 -- ?+0 0x2ec9a
d80 con 0x142c56a0
   -19> 2014-12-14 10:08:38.616211 7f3bc0b1a700  1 -- 1.2.68.105:6837/13539 <== osd.33 1.2.3.115:0/17791 25002 ==== osd_ping(ping e73183 stamp 2014-12-14 10:08:38.254878) v2 ==== 
47+0+0 (2978887605 0 0) 0x1d59d240 con 0x142c4360
   -18> 2014-12-14 10:08:38.616278 7f3bc0b1a700  1 -- 1.2.68.105:6837/13539 --> 1.2.3.115:0/17791 -- osd_ping(ping_reply e73183 stamp 2014-12-14 10:08:38.254878) v2 -- ?+0 0x1c938
a80 con 0x142c4360
   -17> 2014-12-14 10:08:38.616315 7f3bc0b1a700  1 -- 1.2.68.105:6837/13539 <== osd.46 1.2.3.115:0/9312 25134 ==== osd_ping(ping e73183 stamp 2014-12-14 10:08:37.974745) v2 ==== 4
7+0+0 (2704133161 0 0) 0x2a2ce900 con 0x14384a40
   -16> 2014-12-14 10:08:38.616337 7f3bc0b1a700  1 -- 1.2.68.105:6837/13539 --> 1.2.3.115:0/9312 -- osd_ping(ping_reply e73183 stamp 2014-12-14 10:08:37.974745) v2 -- ?+0 0x1d59d2
40 con 0x14384a40
   -15> 2014-12-14 10:08:38.616370 7f3bc0b1a700  1 -- 1.2.68.105:6837/13539 <== osd.30 1.2.3.115:0/20341 24994 ==== osd_ping(ping e73183 stamp 2014-12-14 10:08:37.239113) v2 ==== 
47+0+0 (2127826468 0 0) 0x291d8a80 con 0x14384780
   -14> 2014-12-14 10:08:38.616385 7f3bc0b1a700  1 -- 1.2.68.105:6837/13539 --> 1.2.3.115:0/20341 -- osd_ping(ping_reply e73183 stamp 2014-12-14 10:08:37.239113) v2 -- ?+0 0x2a2ce
900 con 0x14384780
   -13> 2014-12-14 10:08:38.616418 7f3bc0b1a700  1 -- 1.2.68.105:6837/13539 <== osd.53 1.2.3.114:0/6878 25036 ==== osd_ping(ping e73183 stamp 2014-12-14 10:08:38.071152) v2 ==== 4
7+0+0 (1828542634 0 0) 0x1fcf0fc0 con 0x14315c20
   -12> 2014-12-14 10:08:38.616432 7f3bc0b1a700  1 -- 1.2.68.105:6837/13539 --> 1.2.3.114:0/6878 -- osd_ping(ping_reply e73183 stamp 2014-12-14 10:08:38.071152) v2 -- ?+0 0x291d8a
80 con 0x14315c20
   -11> 2014-12-14 10:08:38.617036 7f3bbfb18700  1 -- 1.2.3.105:6825/13539 <== osd.65 1.2.68.114:0/18358 25053 ==== osd_ping(ping e73183 stamp 2014-12-14 10:08:38.610883) v2 ==== 
47+0+0 (969273065 0 0) 0x16bc6580 con 0x14315800
   -10> 2014-12-14 10:08:38.623682 7f3bc0b1a700  1 -- 1.2.68.105:6837/13539 <== osd.31 1.2.3.115:0/14423 25056 ==== osd_ping(ping e73183 stamp 2014-12-14 10:08:38.126331) v2 ==== 
47+0+0 (2730504651 0 0) 0x1713ec80 con 0x14315960
    -9> 2014-12-14 10:08:38.623743 7f3bbfb18700  1 -- 1.2.3.105:6825/13539 --> 1.2.68.114:0/18358 -- osd_ping(ping_reply e73183 stamp 2014-12-14 10:08:38.610883) v2 -- ?+0 0x21996
e40 con 0x14315800
    -8> 2014-12-14 10:08:38.623678 7f3bc7be7700  1 -- 1.2.3.105:6824/13539 --> osd.59 1.2.3.114:6844/23266 -- osd_sub_op_reply(client.8564947.0:709059 3.2d1 d35552d1/rbd_data.204
9bc14917142.00000000000002f8/head//3 [] ondisk, result = 0) v2 -- ?+0 0x2ed92580
    -7> 2014-12-14 10:08:38.646460 7f3bb8309700  1 -- 1.2.3.105:6824/13539 --> 1.2.3.115:6810/7283 -- osd_sub_op(unknown.0.0:0 3.4dd 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] sn
apc=0=[]) v10 -- ?+2067 0x19d55800 con 0x12691b80
    -6> 2014-12-14 10:08:38.696854 7f3bbfb18700  1 -- 1.2.3.105:6825/13539 <== osd.43 1.2.3.115:0/16262 25058 ==== osd_ping(ping e73183 stamp 2014-12-14 10:08:38.696662) v2 ==== 
47+0+0 (778527481 0 0) 0x23a8c0c0 con 0x14339a20
    -5> 2014-12-14 10:08:38.757048 7f3b737db700  2 -- 1.2.68.105:6836/13539 >> 1.2.68.71:0/2021866 pipe(0x15e74b00 sd=81 :6836 s=2 pgs=206482 cs=1 l=1 c=0x7f80420).reader couldn't 
read tag, (0) Success
    -4> 2014-12-14 10:08:38.757158 7f3b737db700  2 -- 1.2.68.105:6836/13539 >> 1.2.68.71:0/2021866 pipe(0x15e74b00 sd=81 :6836 s=2 pgs=206482 cs=1 l=1 c=0x7f80420).fault (0) Succes
s
    -3> 2014-12-14 10:08:38.757382 7f3bc331f700  1 osd.4 73183 ms_handle_reset con 0x7f80420 session 0x1afcb560
    -2> 2014-12-14 10:08:38.761998 7f3b737db700  1 -- 1.2.68.105:6836/13539 >> :/0 pipe(0x1a8a4d80 sd=81 :6836 s=0 pgs=0 cs=0 l=0 c=0x912e9e0).accept sd=81 1.2.68.71:49159/0
    -1> 2014-12-14 10:08:38.889330 7f3bc331f700  1 -- 1.2.68.105:6836/13539 <== mon.1 1.2.68.102:6789/0 18082 ==== pg_stats_ack(13 pgs tid 15715) v1 ==== 381+0+0 (904121805 0 0) 0x
2d74c7c0 con 0x12653c80
     0> 2014-12-14 10:08:40.221316 7f3bd0593700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f3bd0593700

 ceph version 0.80.5-22-g17dc605 (17dc605483640ce8c23b5ae71cb984264cb296bf)
 1: /usr/bin/ceph-osd() [0x98eaf6]
 2: (()+0xf130) [0x7f3bd55af130]
 3: (std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long)+0x72) [0x7f3bd4b26482]
 4: (std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)+0x5c) [0x7f3bd4b26afc]
 5: (OpTracker::check_ops_in_flight(std::vector<std::string, std::allocator<std::string> >&)+0xe2c) [0x6a9cfc]
 6: (OSD::check_ops_in_flight()+0x51) [0x5f5541]
 7: (OSD::tick()+0x15f) [0x64a0af]
 8: (Context::complete(int)+0x9) [0x652599]
 9: (SafeTimer::timer_thread()+0x104) [0xa631f4]
 10: (SafeTimerThread::entry()+0xd) [0xa6428d]
 11: (()+0x7df3) [0x7f3bd55a7df3]

Actions #8

Updated by Sage Weil over 9 years ago

  • Subject changed from OSD assertion failure when printing CEPH_OSD_OP_SRC_CMPXATTR op to osd: crash in check_ops_in_flight
  • Status changed from Need More Info to 12
Actions #9

Updated by Dong Lei over 9 years ago

I think I found the problem.

Firstly, some clarification about the strange data changes Wenjun Huang found.

1. Vector size is 0 but the loop still executes

gdb does not work well with std::vector; we need "p v._M_impl._M_start" to see the vector's contents.

2. The "this" pointer appears to change

We mixed up frame 13 with frame 12; the this pointer did not actually change.

Secondly, we broke at printing CEPH_OSD_OP_CMPXATTR, not at CEPH_OSD_OP_SRC_CMPXATTR. Sorry for our carelessness.

Here is my analysis:
1. From the core dump, I found that the data of this op is already corrupted just after entering check_ops_in_flight.

2. From the log, I see two lines about the op. One line says this op is applied:

119> 2014-11-06 05:42:00.738922 7f11adb14700  5 - op tracker -- , seq: 38064197, time: 2014-11-06 05:42:00.738921, event: sub_op_applied, request: osd_op(client.128435.0:95025845 default.12488.236_2465159741_f33b486caf_o.jpg [cmpxattr user.rgw.idtag (24) op 1 mode 1,create 0~0,delete,setxattr user.rgw.idtag (24),writefull 0~79981,setxattr user.rgw.manifest (470),setxattr user.rgw.acl (133),setxattr user.rgw.content_type (25),setxattr user.rgw.etag (33),setxattr user.rgw.x-amz-meta-origin (56)] 3.cc09cb01 ondisk+write e24684) v4

Another line says it's done:

54> 2014-11-06 05:42:00.761210 7f11a06e0700 5 - op tracker -- , seq: 38064197, time: 2014-11-06 05:42:00.761209, event: done, request: osd_op(client.128435.0:95025845 default.12488.236_2465159741_f33b486caf_o.jpg [cmpxattr user.rgw.idtag (24) op 1 mode 1,create 0~0,delete,setxattr user.rgw.idtag (24),writefull 0~79981,setxattr user.rgw.manifest (470),setxattr user.rgw.acl (133),setxattr user.rgw.content_type (25),setxattr user.rgw.etag (33),setxattr user.rgw.x-amz-meta-origin (56)] 3.cc09cb01 ondisk+write e24684) v4

So this op is marked done but still gets checked in check_ops_in_flight.

BTW: I'm sure this op is the one being printed (verified by comparing against the core dump and the timestamps).

3. I checked the logic of marking it "done". The flow is:
mark it done
clear_data
clear_payload
// the problem is here
// start to unregister the op
take the ops_in_flight lock
remove the op from the list
release the lock

So the data is cleared before the op is unregistered, and the clear is not protected by the lock.

4. I confirmed that the thread that unregisters the op and the thread that runs check_ops_in_flight are different.

5. I confirmed that the clear_data() function destructs the vector<OSDOp> ops field, and this is what produces the null pointers. Accessing a std::vector from multiple threads causes undefined behavior; when it prints through a null pointer, it crashes.

Actions #10

Updated by Dong Lei over 9 years ago

I think this has been fixed in master:
https://github.com/ceph/ceph/commit/11082f7aa02f3f41c7c74c5930b67216dba88ea1

I'm using firefly; the code is:

void OpTracker::unregister_inflight_op(TrackedOp *i) {
  // caller checks;
  assert(tracking_enabled);

  // moving the following two lines to the bottom of the function would fix it
  i->request->clear_data();
  i->request->clear_payload();
  //
  Mutex::Locker locker(ops_in_flight_lock);
  assert(i->xitem.get_list() == &ops_in_flight);
  utime_t now = ceph_clock_now(cct);
  i->xitem.remove_myself();
  history.insert(now, TrackedOpRef(i));
}
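
For reference, a sketch of what the reordering suggested above could look like (assuming the same firefly-era members; the actual upstream fix is the master commit linked earlier in this update):

// Sketch only, not the actual upstream patch: the clears are moved to the
// bottom so they only happen after the op has been removed from the
// in-flight list under the lock.
void OpTracker::unregister_inflight_op(TrackedOp *i) {
  // caller checks;
  assert(tracking_enabled);

  {
    Mutex::Locker locker(ops_in_flight_lock);
    assert(i->xitem.get_list() == &ops_in_flight);
    utime_t now = ceph_clock_now(cct);
    i->xitem.remove_myself();
    history.insert(now, TrackedOpRef(i));
  }

  // by the time the data is cleared, check_ops_in_flight() can no longer
  // find this op on the in-flight list, so it will not print it mid-clear
  i->request->clear_data();
  i->request->clear_payload();
}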
Actions #12

Updated by Loïc Dachary over 9 years ago

  • Backport set to firefly
Actions #13

Updated by Samuel Just over 9 years ago

  • Status changed from 12 to Resolved