Bug #9224

closed

osd: segv in dlopen

Added by Sage Weil over 9 years ago. Updated over 9 years ago.

Status: Can't reproduce
Priority: High
Assignee:
Category: OSD
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Thread 1 (Thread 0x7f80a208a700 (LWP 37102)):
#0  0x00007f80b8a4df07 in _dl_map_object_deps (map=map@entry=0x7f80b8c4e4e8, preloads=preloads@entry=0x0, 
    npreloads=npreloads@entry=0, trace_mode=trace_mode@entry=0, open_mode=open_mode@entry=-2147483648) at dl-deps.c:528
#1  0x00007f80b8a54aab in dl_open_worker (a=a@entry=0x7f80a20881e8) at dl-open.c:272
#2  0x00007f80b8a4fff4 in _dl_catch_error (objname=objname@entry=0x7f80a20881d8, errstring=errstring@entry=0x7f80a20881e0, 
    mallocedp=mallocedp@entry=0x7f80a20881d0, operate=operate@entry=0x7f80b8a549a0 <dl_open_worker>, 
    args=args@entry=0x7f80a20881e8) at dl-error.c:187
#3  0x00007f80b8a543bb in _dl_open (file=0x7f80b69664de "em != ((void *)0)", mode=-2147483647, caller_dlopen=<optimized out>, 
    nsid=-2, argc=4, argv=0x7fff17991b68, env=0x347c000) at dl-open.c:661
#4  0x00007f80b6920002 in __GI___libc_dlopen_mode (name=0x7f80b8c4e4e8 "", mode=6628400) at dl-libc.c:157
#5  0x00007f80a2088400 in ?? ()
#6  0x00007f80b8c641c8 in _r_debug ()
#7  0x00007f80a20883e0 in ?? ()
#8  0x00007f80a20883f0 in ?? ()
#9  0x00007f80a20883d0 in ?? ()
#10 0x00007f80a20882d4 in ?? ()
#11 0x0000000000000000 in ?? ()

The last few things the crashing thread (7f80a208a700) did, interleaved with log lines from other threads:

2014-08-24 04:13:45.843641 7f80a208a700 10 osd.2 407 dequeue_op 0x44d2ee0 prio 127 cost 0 latency 0.000297 MOSDECSubOpReadReply(3.37s0 407 ECSubReadReply(tid=1915, attrs_read=2)) v1 pg pg[3.37s0( v 397'180 (0'0,397'180] local-les=407 n=8 ec=8 les/c 407/401 406/406/400) [2,5,3] r=0 lpr=406 pi=400-405/1 rops=5 crt=397'180 mlcod 0'0 active+recovering]
2014-08-24 04:13:45.843656 7f80a7895700 10 osd.2 pg_epoch: 407 pg[3.50s2( v 293'171 (0'0,293'171] lb 0//0//-1 local-les=403 n=4 ec=8 les/c 403/403 406/406/406) [2,5,3] r=-1 lpr=406 pi=324-405/10 luod=0'0 crt=293'171 active] on_change
2014-08-24 04:13:45.843665 7f80a208a700 10 osd.2 pg_epoch: 407 pg[3.37s0( v 397'180 (0'0,397'180] local-les=407 n=8 ec=8 les/c 407/401 406/406/400) [2,5,3] r=0 lpr=406 pi=400-405/1 rops=5 crt=397'180 mlcod 0'0 active+recovering] handle_message: MOSDECSubOpReadReply(3.37s0 407 ECSubReadReply(tid=1915, attrs_read=2)) v1
2014-08-24 04:13:45.843668 7f80a7895700 10 osd.2 pg_epoch: 407 pg[3.50s2( v 293'171 (0'0,293'171] lb 0//0//-1 local-les=403 n=4 ec=8 les/c 403/403 406/406/406) [2,5,3] r=-1 lpr=406 pi=324-405/10 luod=0'0 crt=293'171 active] clear_primary_state
2014-08-24 04:13:45.843676 7f80a208a700 10 osd.2 pg_epoch: 407 pg[3.37s0( v 397'180 (0'0,397'180] local-les=407 n=8 ec=8 les/c 407/401 406/406/400) [2,5,3] r=0 lpr=406 pi=400-405/1 rops=5 crt=397'180 mlcod 0'0 active+recovering] handle_sub_read_reply: reply ECSubReadReply(tid=1915, attrs_read=2)
2014-08-24 04:13:45.843678 7f80a7895700 20 osd.2 pg_epoch: 407 pg[3.50s2( v 293'171 (0'0,293'171] lb 0//0//-1 local-les=403 n=4 ec=8 les/c 403/403 406/406/406) [2,5,3] r=-1 lpr=406 pi=324-405/10 luod=0'0 crt=293'171 active] agent_stop
2014-08-24 04:13:45.843687 7f80a7895700 10 osd.2 pg_epoch: 407 pg[3.50s2( v 293'171 (0'0,293'171] lb 0//0//-1 local-les=403 n=4 ec=8 les/c 403/403 406/406/406) [2,5,3] r=-1 lpr=406 pi=324-405/10 luod=0'0 crt=293'171 active] cancel_recovery
2014-08-24 04:13:45.843694 7f80a7895700 10 osd.2 pg_epoch: 407 pg[3.50s2( v 293'171 (0'0,293'171] lb 0//0//-1 local-les=403 n=4 ec=8 les/c 403/403 406/406/406) [2,5,3] r=-1 lpr=406 pi=324-405/10 luod=0'0 crt=293'171 active] clear_recovery_state
2014-08-24 04:13:45.843707 7f80a7895700  5 filestore(/var/lib/ceph/osd/ceph-2) queue_transactions existing osr(3.50s2 0x3d8f2c0)/0x3d8f2c0
2014-08-24 04:13:45.843711 7f80a7895700 10 journal op_submit_start 33344
2014-08-24 04:13:45.843713 7f80a7895700  5 filestore(/var/lib/ceph/osd/ceph-2) queue_transactions (parallel) 33344 0x4057840
2014-08-24 04:13:45.843714 7f80a7895700 10 journal op_journal_transactions 33344 0x4057840
2014-08-24 04:13:45.843694 7f80a208a700 10 osd.2 pg_epoch: 407 pg[3.37s0( v 397'180 (0'0,397'180] local-les=407 n=8 ec=8 les/c 407/401 406/406/400) [2,5,3] r=0 lpr=406 pi=400-405/1 rops=5 crt=397'180 mlcod 0'0 active+recovering] handle_sub_read_reply readop not complete: ReadOp(tid=1915, to_read={69f54b37/burnupi5634160-393/head//3=read_request_t(to_read=[0,1052672], need=2(0),5(1), want_attrs=1),114a8db7/burnupi5634160-166/head//3=read_request_t(to_read=[0,1052672], need=2(0),5(1), want_attrs=1)}, complete={69f54b37/burnupi5634160-393/head//3=read_result_t(r=0, errors={}, attrs=1, returned=(0, 1052672, [2(0),526336]),114a8db7/burnupi5634160-166/head//3=read_result_t(r=0, errors={}, attrs=1, returned=(0, 1052672, [2(0),526336])}, priority=10, obj_to_source={69f54b37/burnupi5634160-393/head//3=2(0),5(1),114a8db7/burnupi5634160-166/head//3=2(0),5(1)}, source_to_obj={2(0)=69f54b37/burnupi5634160-393/head//3,114a8db7/burnupi5634160-166/head//3,5(1)=69f54b37/burnupi5634160-393/head//3,114a8db7/burnupi5634160-166/head//3}, in_progress=5(1))
2014-08-24 04:13:45.843718 7f80a7895700  5 journal submit_entry seq 33344 len 1501 (0x75c6580)
2014-08-24 04:13:45.843720 7f80a208a700  5 filestore(/var/lib/ceph/osd/ceph-2) queue_transactions existing osr(3.37s0 0x4cf6150)/0x4cf6150
2014-08-24 04:13:45.843726 7f80a7895700  5 filestore(/var/lib/ceph/osd/ceph-2) queue_op 0x5b5a870 seq 33344 osr(3.50s2 0x3d8f2c0) 1495 bytes   (queue has 3 ops and 1055978 bytes)
2014-08-24 04:13:45.843735 7f80a7895700 10 journal op_submit_finish 33344
2014-08-24 04:13:45.843744 7f80ac9df700 10 journal op_apply_start 33344 open_ops 1 -> 2
2014-08-24 04:13:45.843746 7f80ac9df700  5 filestore(/var/lib/ceph/osd/ceph-2) _do_op 0x5b5a870 seq 33344 osr(3.50s2 0x3d8f2c0)/0x3d8f2c0 start
2014-08-24 04:13:45.843746 7f80a208a700 10 journal op_submit_start 33345
2014-08-24 04:13:45.843750 7f80ac9df700 10 filestore(/var/lib/ceph/osd/ceph-2) _do_transaction on 0x4057840
2014-08-24 04:13:45.843750 7f80a208a700  5 filestore(/var/lib/ceph/osd/ceph-2) queue_transactions (parallel) 33345 0x4166f40
2014-08-24 04:13:45.843753 7f80a208a700 10 journal op_journal_transactions 33345 0x4166f40
2014-08-24 04:13:45.843756 7f80a208a700  5 journal submit_entry seq 33345 len 39 (0x75c7a80)
2014-08-24 04:13:45.843757 7f80ac9df700 10 filestore(/var/lib/ceph/osd/ceph-2) collection_setattr /var/lib/ceph/osd/ceph-2/current/3.50s2_head 'info' len 1
2014-08-24 04:13:45.843759 7f80a7895700 10 osd.2 407 do_waiters -- start
2014-08-24 04:13:45.843760 7f80a7895700 10 osd.2 407 do_waiters -- finish
2014-08-24 04:13:45.843761 7f80a208a700  5 filestore(/var/lib/ceph/osd/ceph-2) queue_op 0x4d81700 seq 33345 osr(3.37s0 0x4cf6150) 33 bytes   (queue has 3 ops and 1055978 bytes)
2014-08-24 04:13:45.843764 7f80a208a700 10 journal op_submit_finish 33345
2014-08-24 04:13:45.843766 7f80a208a700 10 osd.2 407 dequeue_op 0x44d2ee0 finish
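
Because the excerpt above interleaves several threads, and at least one line appears out of timestamp order, it can help to isolate the crashing thread before reading it. The following is a minimal Python sketch of that filtering, not part of the original report. It assumes each log line has the layout "date time thread-id level message" seen in the excerpt; the names LINE_RE, SAMPLE, and lines_for_thread are hypothetical. Lines that do not start with a timestamp (for example, wrapped continuations) are skipped.

```python
import re

# Assumed layout, based on the excerpt: "<date> <time> <thread-id> <level> <message>"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}\.\d+) ([0-9a-f]+)\s+(-?\d+) (.*)$"
)

# A few short lines copied from the excerpt, deliberately out of order.
SAMPLE = [
    "2014-08-24 04:13:45.843711 7f80a7895700 10 journal op_submit_start 33344",
    "2014-08-24 04:13:45.843766 7f80a208a700 10 osd.2 407 dequeue_op 0x44d2ee0 finish",
    "2014-08-24 04:13:45.843746 7f80a208a700 10 journal op_submit_start 33345",
    "2014-08-24 04:13:45.843744 7f80ac9df700 10 journal op_apply_start 33344 open_ops 1 -> 2",
    "2014-08-24 04:13:45.843764 7f80a208a700 10 journal op_submit_finish 33345",
]


def lines_for_thread(lines, thread_id):
    """Return (time, message) pairs logged by thread_id, ordered by timestamp."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(3) == thread_id:
            date, time, _tid, _level, message = match.groups()
            hits.append((date, time, message))
    # Fixed-width date and time strings sort correctly as plain text.
    hits.sort(key=lambda hit: (hit[0], hit[1]))
    return [(time, message) for _date, time, message in hits]


if __name__ == "__main__":
    for time, message in lines_for_thread(SAMPLE, "7f80a208a700"):
        print(time, message)
```

Passing the thread id from the backtrace header (7f80a208a700) keeps only that thread's entries, so the sequence leading up to "dequeue_op 0x44d2ee0 finish" reads in order.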

#1

Updated by Loïc Dachary over 9 years ago

  • Assignee set to Loïc Dachary

#2

Updated by Loïc Dachary over 9 years ago

Will re-run a grep on flab.front.sepia.ceph.com over the OSD logs of 2014-08-23 and 2014-08-25, because the crash does not show up in the runs started 2014-08-24.

#3

Updated by Loïc Dachary over 9 years ago

 for file in /ceph/teuthology-archive/*2014-08-2[35]*/*/remote/*/log/ceph-osd.*.log.gz ; do zgrep _dl_map_object_deps "$file" ; done

running in a screen on
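
The search loop above could also be sketched in Python, which makes it easy to report the file and line number of each hit. This is only an illustrative sketch under stated assumptions, not the command that was actually run: the function name find_matches is hypothetical, and it uses only the standard glob and gzip modules.

```python
import glob
import gzip


def find_matches(pattern, needle):
    """Yield (path, line_number, line) for every line containing needle.

    pattern is a shell-style glob for gzip-compressed log files.
    """
    for path in sorted(glob.glob(pattern)):
        # Open as text; replace undecodable bytes instead of failing mid-file.
        with gzip.open(path, "rt", errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                if needle in line:
                    yield path, line_number, line.rstrip("\n")
```

Calling find_matches("/ceph/teuthology-archive/*2014-08-2[35]*/*/remote/*/log/ceph-osd.*.log.gz", "_dl_map_object_deps") would walk the same set of files as the loop; an empty result means no log mentions the symbol.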

#4

Updated by Loïc Dachary over 9 years ago

grep running

#5

Updated by Loïc Dachary over 9 years ago

  • Status changed from New to Need More Info

A grep of the ceph-osd logs from the runs of August 23, 24, and 25 found no match for _dl_map_object_deps. I'm unable to find a lead in the logs from the description. Hopefully the problem will show up again and I'll dig further.

#6

Updated by Loïc Dachary over 9 years ago

  • Status changed from Need More Info to Can't reproduce