Bug #16714

async messenger process_events osd segfault

Added by Samuel Just almost 4 years ago. Updated almost 4 years ago.

Status: Resolved
Priority: Immediate
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

-20> 2016-07-16 08:52:02.307392 7ff7cccf8700 10 osd.4 68 new session (outgoing) 0xa666000 con=0xa411800 addr=172.21.15.24:6812/27740
-19> 2016-07-16 08:52:02.307414 7ff7cccf8700 10 osd.4 68 OSD::ms_get_authorizer type=mon
-18> 2016-07-16 08:52:02.307607 7ff7cbcf6700 10 osd.4 68 ms_handle_connect on mon
-17> 2016-07-16 08:52:02.307615 7ff7cbcf6700 10 osd.4 68 send_alive up_thru currently 36 want 0
-16> 2016-07-16 08:52:02.307618 7ff7cbcf6700 10 osd.4 68 requeue_pg_temp 0 + 0 -> 0
-15> 2016-07-16 08:52:02.307621 7ff7cbcf6700 10 osd.4 68 requeue_failures 0 + 0 -> 0
-14> 2016-07-16 08:52:02.307624 7ff7cbcf6700 20 osd.4 68 send_pg_stats
-13> 2016-07-16 08:52:02.307772 7ff7cccf8700 1 -- 172.21.15.24:6812/27740 >> 172.21.15.2:6789/0 conn(0xa653000 sd=32 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=81 cs=1 l=1). rx mon.0 seq 1 0xa40f840 mon_map magic: 0 v1
-12> 2016-07-16 08:52:02.307803 7ff7cbcf6700 1 -- 172.21.15.24:6812/27740 <== mon.0 172.21.15.2:6789/0 1 ==== mon_map magic: 0 v1 ==== 473+0+0 (1147783039 0 0) 0xa40f840 con 0xa653000
-11> 2016-07-16 08:52:02.307818 7ff7cbcf6700 10 monclient(hunting): handle_monmap mon_map magic: 0 v1
-10> 2016-07-16 08:52:02.307813 7ff7cccf8700 1 -- 172.21.15.24:6812/27740 >> 172.21.15.2:6789/0 conn(0xa653000 sd=32 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=81 cs=1 l=1). rx mon.0 seq 2 0xa66e000 auth_reply(proto 2 0 (0) Success) v1
-9> 2016-07-16 08:52:02.307828 7ff7cbcf6700 10 monclient(hunting): got monmap 1, mon.a is now rank 0
-8> 2016-07-16 08:52:02.307830 7ff7cbcf6700 10 monclient(hunting): dump:
epoch 1
fsid 9bd582ee-a353-4bd2-aa1e-b8b10c2612c3
last_changed 2016-07-16 08:49:07.272531
created 2016-07-16 08:49:07.272531
0: 172.21.15.2:6789/0 mon.a
1: 172.21.15.24:6789/0 mon.b
2: 172.21.15.2:6790/0 mon.c

-7> 2016-07-16 08:52:02.307846 7ff7cbcf6700 1 -- 172.21.15.24:6812/27740 <== mon.0 172.21.15.2:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (3111445792 0 0) 0xa66e000 con 0xa653000
-6> 2016-07-16 08:52:02.307860 7ff7da891700 5 asok(0xa3d23c0) AdminSocket: request 'get_command_descriptions' '' to 0xa3341f0 returned 1574 bytes
-5> 2016-07-16 08:52:02.307861 7ff7cbcf6700 10 monclient(hunting): my global_id is 4287
-4> 2016-07-16 08:52:02.307933 7ff7cbcf6700 10 monclient(hunting): _send_mon_message to mon.a at 172.21.15.2:6789/0
-3> 2016-07-16 08:52:02.307938 7ff7cbcf6700 1 -- 172.21.15.24:6812/27740 >> 172.21.15.2:6789/0 conn(0xa653000 sd=32 :-1 s=STATE_OPEN pgs=81 cs=1 l=1). tx 0xa40d8c0 auth(proto 2 32 bytes epoch 0) v1
-2> 2016-07-16 08:52:02.308197 7ff7cccf8700 1 -- 172.21.15.24:6812/27740 >> 172.21.15.2:6789/0 conn(0xa653000 sd=32 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=81 cs=1 l=1). rx mon.0 seq 3 0xa66e240 auth_reply(proto 2 0 (0) Success) v1
-1> 2016-07-16 08:52:02.308219 7ff7cbcf6700 1 -- 172.21.15.24:6812/27740 <== mon.0 172.21.15.2:6789/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 206+0+0 (4082438654 0 0) 0xa66e240 con 0xa653000
0> 2016-07-16 08:52:02.308228 7ff7cc4f7700 -1 *** Caught signal (Segmentation fault) **
in thread 7ff7cc4f7700 thread_name:ms_async_worker

ceph version v11.0.0-632-g57264b9 (57264b9bf31cf6f77974fd7025775e963b0349e3)
1: ceph-osd() [0xbfe80a]
2: (()+0xf100) [0x7ff7e099c100]
3: ceph-osd() [0xdf6321]
4: (EventCenter::process_events(int)+0x85a) [0xe03d8a]
5: (Worker::entry()+0x1f8) [0xdf6b28]
6: (()+0x7dc5) [0x7ff7e0994dc5]
7: (clone()+0x6d) [0x7ff7de88f28d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
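
As the NOTE says, frames that show only `ceph-osd() [0x...]` have no symbol and need the executable to interpret. A minimal sketch of pulling those addresses out of the backtrace follows; the helper names `parse_frames` and `unresolved` are hypothetical and not part of Ceph.

```python
import re

# Matches frames such as " 4: (EventCenter::process_events(int)+0x85a) [0xe03d8a]"
# and " 3: ceph-osd() [0xdf6321]".
FRAME_RE = re.compile(r"^\s*(\d+):\s+(.*?)\s+\[(0x[0-9a-f]+)\]\s*$")


def parse_frames(text):
    """Return (frame_number, location, address) tuples from a backtrace."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            frames.append((int(m.group(1)), m.group(2), int(m.group(3), 16)))
    return frames


def unresolved(frames, binary="ceph-osd"):
    """Addresses of frames that name only the binary, with no symbol."""
    return [addr for _, loc, addr in frames if loc == binary + "()"]


# Backtrace copied from the description above.
BACKTRACE = """\
 1: ceph-osd() [0xbfe80a]
 2: (()+0xf100) [0x7ff7e099c100]
 3: ceph-osd() [0xdf6321]
 4: (EventCenter::process_events(int)+0x85a) [0xe03d8a]
 5: (Worker::entry()+0x1f8) [0xdf6b28]
 6: (()+0x7dc5) [0x7ff7e0994dc5]
 7: (clone()+0x6d) [0x7ff7de88f28d]
"""

frames = parse_frames(BACKTRACE)
addrs = unresolved(frames)
print(" ".join(hex(a) for a in addrs))
```

Assuming a copy of the exact build with debug symbols is available, the printed addresses could then be passed to `addr2line -Cfe <executable> <address>` to identify the function behind frame 3.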

sjust@teuthology:/a/samuelj-2016-07-14_14:34:22-rados-wip-ec-pg-log-distro-basic-smithi/314564/remote$

History

#1 Updated by Samuel Just almost 4 years ago

  • Priority changed from Normal to Immediate

sjust@teuthology:/a/samuelj-2016-07-14_14:34:22-rados-wip-ec-pg-log-distro-basic-smithi/314514/remote$

#2 Updated by Samuel Just almost 4 years ago

sjust@teuthology:/a/samuelj-2016-07-14_14:34:22-rados-wip-ec-pg-log-distro-basic-smithi/314374$

#3 Updated by Samuel Just almost 4 years ago

sjust@teuthology:/a/samuelj-2016-07-14_14:34:22-rados-wip-ec-pg-log-distro-basic-smithi/314437$

#4 Updated by Kefu Chai almost 4 years ago

This also happens in ceph-mon; see /a/kchai-2016-07-24_21:25:48-rados-wip-16801---basic-mira/331678

   -21> 2016-07-24 17:19:41.943341 7f1ac8c194c0  1 Event(0xb255ac8 nevent=5000 time_id=1).wakeup
   -20> 2016-07-24 17:19:41.943345 7f1ac8c194c0  1 -- 172.21.4.110:6791/0 >> 172.21.4.110:6790/0 conn(0xb4f4000 sd=-1 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=0). == tx == 0xb3cc080 mon_probe(probe 4df6755b-b1bf-483f-9d13-ed9d5e8bff5b name h) v6
   -19> 2016-07-24 17:19:41.943351 7f1ac8c194c0  1 Event(0xb255ac8 nevent=5000 time_id=1).wakeup
   -18> 2016-07-24 17:19:41.943355 7f1ac8c194c0  1 -- 172.21.4.110:6791/0 _send_message--> mon.4 172.21.4.112:6790/0 -- mon_probe(probe 4df6755b-b1bf-483f-9d13-ed9d5e8bff5b name h) v6 -- ?+0 0xb3cbe00
   -17> 2016-07-24 17:19:41.943363 7f1ac8c194c0  1 Event(0xb255e08 nevent=5000 time_id=1).wakeup
   -16> 2016-07-24 17:19:41.943368 7f1ac8c194c0  1 -- 172.21.4.110:6791/0 >> 172.21.4.112:6790/0 conn(0xb50a800 sd=-1 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=0). == tx == 0xb3cbe00 mon_probe(probe 4df6755b-b1bf-483f-9d13-ed9d5e8bff5b name h) v6
   -15> 2016-07-24 17:19:41.943373 7f1ac8c194c0  1 Event(0xb255e08 nevent=5000 time_id=1).wakeup
   -14> 2016-07-24 17:19:41.943377 7f1ac8c194c0  1 -- 172.21.4.110:6791/0 _send_message--> mon.5 172.21.5.112:6790/0 -- mon_probe(probe 4df6755b-b1bf-483f-9d13-ed9d5e8bff5b name h) v6 -- ?+0 0xb3cbb80
   -13> 2016-07-24 17:19:41.943385 7f1ac8c194c0  1 Event(0xb255c68 nevent=5000 time_id=1).wakeup
   -12> 2016-07-24 17:19:41.943389 7f1ac8c194c0  1 -- 172.21.4.110:6791/0 >> 172.21.5.112:6790/0 conn(0xb509000 sd=-1 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=0). == tx == 0xb3cbb80 mon_probe(probe 4df6755b-b1bf-483f-9d13-ed9d5e8bff5b name h) v6
   -11> 2016-07-24 17:19:41.943394 7f1ac8c194c0  1 Event(0xb255c68 nevent=5000 time_id=1).wakeup
   -10> 2016-07-24 17:19:41.943398 7f1ac8c194c0  1 -- 172.21.4.110:6791/0 _send_message--> mon.7 172.21.4.112:6791/0 -- mon_probe(probe 4df6755b-b1bf-483f-9d13-ed9d5e8bff5b name h) v6 -- ?+0 0xb3cd480
    -9> 2016-07-24 17:19:41.943406 7f1ac8c194c0  1 Event(0xb255ac8 nevent=5000 time_id=1).wakeup
    -8> 2016-07-24 17:19:41.943410 7f1ac8c194c0  1 -- 172.21.4.110:6791/0 >> 172.21.4.112:6791/0 conn(0xb507800 sd=-1 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=0). == tx == 0xb3cd480 mon_probe(probe 4df6755b-b1bf-483f-9d13-ed9d5e8bff5b name h) v6
    -7> 2016-07-24 17:19:41.943416 7f1ac8c194c0  1 Event(0xb255ac8 nevent=5000 time_id=1).wakeup
    -6> 2016-07-24 17:19:41.943420 7f1ac8c194c0  1 -- 172.21.4.110:6791/0 _send_message--> mon.8 172.21.5.112:6791/0 -- mon_probe(probe 4df6755b-b1bf-483f-9d13-ed9d5e8bff5b name h) v6 -- ?+0 0xb3cd200
    -5> 2016-07-24 17:19:41.943427 7f1ac8c194c0  1 Event(0xb255e08 nevent=5000 time_id=1).wakeup
    -4> 2016-07-24 17:19:41.943431 7f1ac8c194c0  1 -- 172.21.4.110:6791/0 >> 172.21.5.112:6791/0 conn(0xb506000 sd=-1 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=0). == tx == 0xb3cd200 mon_probe(probe 4df6755b-b1bf-483f-9d13-ed9d5e8bff5b name h) v6
    -3> 2016-07-24 17:19:41.943436 7f1ac8c194c0  1 Event(0xb255e08 nevent=5000 time_id=1).wakeup
    -2> 2016-07-24 17:19:41.943826 7f1ac2185700  1 -- 172.21.4.110:6791/0 >> 172.21.4.110:6789/0 conn(0xb4f8800 sd=35 :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=0)._process_connection reconnect failed
    -1> 2016-07-24 17:19:41.944490 7f1ac2185700  1 -- 172.21.4.110:6791/0 >> 172.21.4.112:6791/0 conn(0xb507800 sd=36 :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=0)._process_connection reconnect failed
     0> 2016-07-24 17:19:41.945431 7f1ac2986700 -1 *** Caught signal (Segmentation fault) **

 in thread 7f1ac2986700 thread_name:ms_async_worker
ceph version v11.0.0-901-ge1c3b1e (e1c3b1ebd3a5dd73daaf6f9e875849800d163eab)
 1: ceph-mon() [0xa1c5e2]
 2: (()+0x10330) [0x7f1ac6947330]
 3: ceph-mon() [0x8c5001]
 4: (EventCenter::process_events(int)+0xa77) [0x8d2d27]
 5: (Worker::entry()+0x1e8) [0x8c57a8]
 6: (()+0x8184) [0x7f1ac693f184]
 7: (clone()+0x6d) [0x7f1ac5e4c37d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

#5 Updated by Haomai Wang almost 4 years ago

  • Status changed from New to 7

#6 Updated by Haomai Wang almost 4 years ago

  • Status changed from 7 to Resolved
