Bug #3715 (closed)

Crash during 0.55 -> 0.56 upgrade

Added by Faidon Liambotis over 11 years ago. Updated over 11 years ago.

Status: Duplicate
Priority: High
Assignee:
Category: OSD
Target version: -
% Done: 0%
Source: Community (user)

Description

I started upgrading my 0.55.1 cluster to 0.56, and partway through the upgrade all of the remaining 0.55.1 OSDs started crashing at the same time. Restarting them didn't fix it, but upgrading them to 0.56 as well did. I didn't get a chance to capture debug logs, but I do have backtraces. The platform is Ubuntu 12.04 LTS with the ceph.com binary packages.

4 6900.3__shadow__x20Y7D5BxFlJ-prC9UGtn-T1fwKU9j1_1 [??? refcount.put] 3.4863bd4b) v4
-15> 2013-01-02 15:05:33.036824 7ffa4301b700 -1 ./messages/MOSDOp.h: In function 'bool MOSDOp::check_rmw(int)' thread 7ffa4301b700 time 2013-01-02 15:05:33.035794
./messages/MOSDOp.h: 57: FAILED assert(rmw_flags)
ceph version 0.55.1 (8e25c8d984f9258644389a18997ec6bdef8e056b)
1: (OSD::handle_op(std::tr1::shared_ptr<OpRequest>)+0x12aa) [0x6174da]
2: (OSD::dispatch_op(std::tr1::shared_ptr<OpRequest>)+0xe9) [0x61ef69]
3: (OSD::_dispatch(Message*)+0x26e) [0x626fbe]
4: (OSD::ms_dispatch(Message*)+0x1ba) [0x62772a]
5: (DispatchQueue::entry()+0x349) [0x8ae079]
6: (DispatchQueue::DispatchThread::entry()+0xd) [0x8071fd]
7: (()+0x7e9a) [0x7ffa4f625e9a]
8: (clone()+0x6d) [0x7ffa4e0a9cbd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
-14> 2013-01-02 15:05:33.039943 7ffa42019700  1 -- 10.64.0.176:6866/16116 <== osd.11 10.64.0.173:6834/10428 119 ==== pg_info(1 pgs e10603:3.2eea) v3 ==== 512+0+0 (1285696962 0 0) 0x1afa2c40 con 0x1aabb160
-13> 2013-01-02 15:05:33.262252 7ffa40816700  1 -- 10.64.0.176:6867/16116 <== osd.16 10.64.0.174:0/3734 231 ==== osd_ping(ping e10603 stamp 2013-01-02 15:05:33.261347) v2 ==== 47+0+0 (1605449094 0 0) 0x27f67c40 con 0x19b5f840
[...]
--- end dump of recent events ---
2013-01-02 15:05:33.743016 7ffa4301b700 -1 *** Caught signal (Aborted) **
in thread 7ffa4301b700
ceph version 0.55.1 (8e25c8d984f9258644389a18997ec6bdef8e056b)
1: /usr/bin/ceph-osd() [0x771c2a]
2: (()+0xfcb0) [0x7ffa4f62dcb0]
3: (gsignal()+0x35) [0x7ffa4dfec425]
4: (abort()+0x17b) [0x7ffa4dfefb8b]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7ffa4e93e69d]
6: (()+0xb5846) [0x7ffa4e93c846]
7: (()+0xb5873) [0x7ffa4e93c873]
8: (()+0xb596e) [0x7ffa4e93c96e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1df) [0x81bbbf]
10: (OSD::handle_op(std::tr1::shared_ptr<OpRequest>)+0x12aa) [0x6174da]
11: (OSD::dispatch_op(std::tr1::shared_ptr<OpRequest>)+0xe9) [0x61ef69]
12: (OSD::_dispatch(Message*)+0x26e) [0x626fbe]
13: (OSD::ms_dispatch(Message*)+0x1ba) [0x62772a]
14: (DispatchQueue::entry()+0x349) [0x8ae079]
15: (DispatchQueue::DispatchThread::entry()+0xd) [0x8071fd]
16: (()+0x7e9a) [0x7ffa4f625e9a]
17: (clone()+0x6d) [0x7ffa4e0a9cbd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Related issues: 1 (0 open, 1 closed)

Is duplicate of Ceph - Bug #3731: rados.h: recent change to CEPH_OSD_OP_CALL constitutes an incompatible protocol change (Resolved, 01/04/2013)

Actions #1

Updated by Sage Weil over 11 years ago

  • Status changed from New to 12

Is someone sending an MOSDOp that has no ops? init_op_flags() is called before can_*(), so this sounds like an empty message.

(11:06:25 PM) paravoid: I think the crashes started when I upgraded radosgw
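
To illustrate the failure mode Sage describes above, here is a minimal, self-contained C++ sketch. It is not Ceph code: MiniOp, MiniOSDOp, the flag constants, and the opcode values are hypothetical stand-ins for MOSDOp, init_op_flags(), and check_rmw() in messages/MOSDOp.h. The point is only that when a message carries no ops, or only ops the (older) OSD cannot classify as read or write, the derived rmw_flags stay 0 and the assert in the backtrace fires.

    // Minimal sketch of the rmw_flags failure mode (NOT actual Ceph code).
    #include <cassert>
    #include <cstdint>
    #include <vector>

    enum : uint32_t { FLAG_READ = 1, FLAG_WRITE = 2 };

    struct MiniOp { int opcode; };

    struct MiniOSDOp {
      std::vector<MiniOp> ops;
      uint32_t rmw_flags = 0;

      // Analogous to init_op_flags(): derive read/write flags from the ops
      // the client sent. An opcode this (older) version does not recognize
      // contributes no flags at all.
      void init_op_flags() {
        for (const auto& op : ops) {
          if (op.opcode == 1)      rmw_flags |= FLAG_READ;   // "read"-class op
          else if (op.opcode == 2) rmw_flags |= FLAG_WRITE;  // "write"-class op
          // unknown opcode: nothing ORed in
        }
      }

      // Analogous to check_rmw(): by dispatch time every op is expected to
      // have been classified as a read and/or a write.
      bool may_read() const {
        assert(rmw_flags);              // the FAILED assert(rmw_flags) above
        return rmw_flags & FLAG_READ;
      }
    };

    int main() {
      MiniOSDOp ok;
      ok.ops.push_back({1});            // recognized read op
      ok.init_op_flags();
      ok.may_read();                    // fine, rmw_flags != 0

      MiniOSDOp bad;
      bad.ops.push_back({42});          // opcode this "old" OSD knows nothing about
      bad.init_op_flags();              // leaves rmw_flags == 0
      bad.may_read();                   // aborts, mirroring the crash in this report
    }

This also matches the duplicate, #3731: if a newer client (here radosgw) encodes an op the 0.55.1 OSD cannot map to read/write flags, the result is the same as an empty message from the older OSD's point of view.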

Actions #2

Updated by Ian Colle over 11 years ago

  • Assignee set to caleb miles
Actions #3

Updated by Sage Weil over 11 years ago

  • Status changed from 12 to Duplicate

This was #3731.

