https://tracker.ceph.com/
https://tracker.ceph.com/favicon.ico
2011-08-09T11:09:37Z
Ceph
Ceph - Bug #1382: kclient: crash on resending osd ops
https://tracker.ceph.com/issues/1382?journal_id=5361
2011-08-09T11:09:37Z
Brian Chrisman
brchrisman@gmail.com
<p>It looks like the issue stems from several OSDs going out.<br />I'm not certain why these OSDs fail, but it is worth noting that in this setup the rbd client and the OSD servers run on the same nodes.</p>
<p>[brianchrisman ~] $ grep libceph crashmsgs<br />Aug 9 01:19:14 10.200.98.109 libceph<br />Aug 9 01:30:48 10.200.98.109 libceph<br />Aug 9 01:31:10 10.200.98.109 libceph: osd7 192.168.98.110:6810 socket closed<br />Aug 9 01:31:10 10.200.98.109 libceph: osd7 192.168.98.110:6810 connection failed<br />Aug 9 01:31:11 10.200.98.109 libceph: osd7 192.168.98.110:6810 connection failed<br />Aug 9 01:31:12 10.200.98.109 libceph: osd7 192.168.98.110:6810 connection failed<br />Aug 9 01:31:14 10.200.98.109 libceph: osd7 192.168.98.110:6810 connection failed<br />Aug 9 01:31:18 10.200.98.109 libceph: osd7 192.168.98.110:6810 connection failed<br />Aug 9 01:31:26 10.200.98.109 libceph: osd7 192.168.98.110:6810 connection failed<br />Aug 9 01:31:30 10.200.98.109 libceph: osd7 down<br />Aug 9 01:31:31 10.200.98.109 libceph: get_reply unknown tid 5852851 from osd11<br />Aug 9 01:31:31 10.200.98.109 libceph: get_reply unknown tid 5852850 from osd11<br />Aug 9 01:31:31 10.200.98.109 libceph: get_reply unknown tid 5852849 from osd11<br />Aug 9 01:31:31 10.200.98.109 libceph: get_reply unknown tid 5852848 from osd11<br />Aug 9 01:31:31 10.200.98.109 libceph: get_reply unknown tid 5852847 from osd11<br />Aug 9 01:31:32 10.200.98.109 libceph: get_reply unknown tid 5852846 from osd11<br />Aug 9 01:31:32 10.200.98.109 libceph: get_reply unknown tid 5852845 from osd11<br />Aug 9 01:31:36 10.200.98.109 libceph: get_reply unknown tid 5852841 from osd11<br />Aug 9 01:31:41 10.200.98.109 libceph: get_reply unknown tid 5852840 from osd11<br />Aug 9 01:31:46 10.200.98.109 libceph: get_reply unknown tid 5852839 from osd11<br />Aug 9 01:31:51 10.200.98.109 libceph: get_reply unknown tid 5852838 from osd11<br />Aug 9 01:31:57 10.200.98.109 libceph: get_reply unknown tid 5852837 from osd11<br />Aug 9 01:32:02 10.200.98.109 libceph: get_reply unknown tid 5852836 from osd11<br />Aug 9 01:32:07 10.200.98.109 libceph: get_reply unknown tid 5852835 from osd11<br />Aug 9 01:32:12 10.200.98.109 libceph: 
get_reply unknown tid 5852834 from osd11<br />Aug 9 01:32:17 10.200.98.109 libceph: get_reply unknown tid 5852833 from osd11<br />Aug 9 01:32:22 10.200.98.109 libceph: get_reply unknown tid 5852832 from osd11<br />Aug 9 01:32:27 10.200.98.109 libceph: get_reply unknown tid 5852831 from osd11<br />Aug 9 01:32:32 10.200.98.109 libceph: tid 5852892 timed out on osd11, will reset osd<br />Aug 9 01:34:27 10.200.98.109 libceph: tid 5862269 timed out on osd8, will reset osd<br />Aug 9 01:34:32 10.200.98.109 libceph: tid 5862490 timed out on osd6, will reset osd<br />Aug 9 01:34:32 10.200.98.109 libceph: tid 5862853 timed out on osd0, will reset osd<br />Aug 9 01:35:27 10.200.98.109 libceph: tid 5862876 timed out on osd8, will reset osd<br />Aug 9 01:36:27 10.200.98.109 libceph: tid 5862269 timed out on osd8, will reset osd<br />Aug 9 01:36:31 10.200.98.109 libceph: osd7 weight 0x0 (out)</p>
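<p>To see at a glance which OSDs are involved, the grep output above can be tallied per OSD and per message kind. This is only a rough sketch: the regular expressions and the <code>summarize</code> helper are illustrative names written for this log excerpt, not part of any Ceph tooling, and they cover only the message shapes that appear above.</p>

```python
import re
from collections import Counter

# Hypothetical patterns, written against the message shapes in the log above.
PATTERNS = {
    "socket_closed": re.compile(r"libceph: (osd\d+) \S+ socket closed"),
    "connection_failed": re.compile(r"libceph: (osd\d+) \S+ connection failed"),
    "unknown_tid": re.compile(r"libceph: get_reply unknown tid \d+ from (osd\d+)"),
    "timed_out": re.compile(r"libceph: tid \d+ timed out on (osd\d+)"),
}


def summarize(lines):
    """Count libceph messages per (osd, kind); unmatched lines are ignored."""
    counts = Counter()
    for line in lines:
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(match.group(1), kind)] += 1
                break
    return counts


# A few lines copied from the log above.
sample = [
    "Aug 9 01:31:10 10.200.98.109 libceph: osd7 192.168.98.110:6810 socket closed",
    "Aug 9 01:31:10 10.200.98.109 libceph: osd7 192.168.98.110:6810 connection failed",
    "Aug 9 01:31:30 10.200.98.109 libceph: osd7 down",
    "Aug 9 01:31:31 10.200.98.109 libceph: get_reply unknown tid 5852851 from osd11",
    "Aug 9 01:32:32 10.200.98.109 libceph: tid 5852892 timed out on osd11, will reset osd",
    "Aug 9 01:34:27 10.200.98.109 libceph: tid 5862269 timed out on osd8, will reset osd",
]

counts = summarize(sample)
for (osd, kind), n in sorted(counts.items()):
    print(osd, kind, n)
```

<p>Run against the full crashmsgs file, a tally like this should make it easier to tell the OSD that actually went down (osd7) from the ones that only show late replies or timeouts (osd11, osd8, osd6, osd0).</p>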
Ceph - Bug #1382: kclient: crash on resending osd ops
https://tracker.ceph.com/issues/1382?journal_id=5362
2011-08-09T11:30:19Z
Brian Chrisman
brchrisman@gmail.com
<ul><li><strong>File</strong> <a href="/attachments/download/284/objdump_libceph_ko">objdump_libceph_ko</a> added</li></ul>
Ceph - Bug #1382: kclient: crash on resending osd ops
https://tracker.ceph.com/issues/1382?journal_id=5418
2011-08-15T09:53:18Z
Sage Weil
sage@newdream.net
<ul><li><strong>Assignee</strong> set to <i>Sage Weil</i></li><li><strong>Target version</strong> set to <i>v0.34</i></li></ul>
Ceph - Bug #1382: kclient: crash on resending osd ops
https://tracker.ceph.com/issues/1382?journal_id=5523
2011-08-20T11:07:56Z
Sage Weil
sage@newdream.net
<ul><li><strong>Target version</strong> changed from <i>v0.34</i> to <i>v0.35</i></li></ul>
Ceph - Bug #1382: kclient: crash on resending osd ops
https://tracker.ceph.com/issues/1382?journal_id=5667
2011-08-28T21:46:31Z
Sage Weil
sage@newdream.net
<ul><li><strong>Priority</strong> changed from <i>Normal</i> to <i>High</i></li></ul><p>Need to set up a teuthology job with rbd + thrasher and a suitable long-running workload.</p>
Ceph - Bug #1382: kclient: crash on resending osd ops
https://tracker.ceph.com/issues/1382?journal_id=6107
2011-09-06T21:49:16Z
Sage Weil
sage@newdream.net
<ul><li><strong>Target version</strong> changed from <i>v0.35</i> to <i>v0.36</i></li></ul>
Ceph - Bug #1382: kclient: crash on resending osd ops
https://tracker.ceph.com/issues/1382?journal_id=6340
2011-09-10T21:04:59Z
Sage Weil
sage@newdream.net
<p>Martin Mailand is also hitting this (see ceph-devel):<br /><pre>
[ 182.721180] libceph: osd2 192.168.42.114:6800 socket closed
[ 182.732642] libceph: osd2 192.168.42.114:6800 connection failed
[ 183.040233] libceph: osd2 192.168.42.114:6800 connection failed
[ 184.040204] libceph: osd2 192.168.42.114:6800 connection failed
[ 186.040244] libceph: osd2 192.168.42.114:6800 connection failed
[ 190.060233] libceph: osd2 192.168.42.114:6800 connection failed
[ 198.060214] libceph: osd2 192.168.42.114:6800 connection failed
[ 213.964994] ------------[ cut here ]------------
[ 213.974288] kernel BUG at net/ceph/messenger.c:2193!
[ 213.974470] invalid opcode: 0000 [#1] SMP
[ 213.974470] CPU 0
[ 213.974470] Modules linked in: rbd libceph libcrc32c ip6table_filter ip6_tables iptable_filter ip_tables x_tables nv_tco bridge stp kvm_amd kvm radeon lp psmouse shpchp parport i2c_nforce2 amd64_edac_mod ttm drm_kms_helper drm edac_core i2c_algo_bit edac_mce_amd serio_raw k10temp ses enclosure aacraid forcedeth
[ 213.974470]
[ 213.974470] Pid: 10, comm: kworker/0:1 Not tainted 3.1.0-rc5-custom #3 Supermicro H8DM8-2/H8DM8-2
[ 213.974470] RIP: 0010:[<ffffffffa02cf3f1>] [<ffffffffa02cf3f1>] ceph_con_send+0x111/0x120 [libceph]
[ 213.974470] RSP: 0018:ffff880405cddbd0 EFLAGS: 00010283
[ 213.974470] RAX: ffff880403e93c78 RBX: ffff880803f97030 RCX: ffff8808034d2e50
[ 213.974470] RDX: ffff880405cddfd8 RSI: ffff880403e93c00 RDI: ffff880803f971a8
[ 213.974470] RBP: ffff880405cddbf0 R08: ffff88040fc0de40 R09: 000000000000fffb
[ 213.974470] R10: 0000000000000000 R11: 0000000000000001 R12: ffff880803f971a8
[ 213.974470] R13: ffff880403e93c00 R14: ffff8808034d2e60 R15: ffff8808034d2e50
[ 213.974470] FS: 00007f5909978720(0000) GS:ffff88040fc00000(0000) knlGS:0000000000000000
[ 213.974470] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 213.974470] CR2: ffffffffff600400 CR3: 0000000404e6f000 CR4: 00000000000006f0
[ 213.974470] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 213.974470] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 213.974470] Process kworker/0:1 (pid: 10, threadinfo ffff880405cdc000, task ffff880405cb5bc0)
[ 213.974470] Stack:
[ 213.974470] ffff880405cddbf0 ffff880403e0ac00 ffff8808034d2e30 ffff8808034d2da8
[ 213.974470] ffff880405cddc40 ffffffffa02d490d ffff8808034d2c80 ffff8808034d2e00
[ 213.974470] ffff880405cddc40 ffff8804041d1c91 ffff8808034d2da8 0000000000000000
[ 213.974470] Call Trace:
[ 213.974470] [<ffffffffa02d490d>] send_queued+0xed/0x130 [libceph]
[ 213.974470] [<ffffffffa02d6d91>] ceph_osdc_handle_map+0x261/0x3b0 [libceph]
[ 213.974470] [<ffffffffa02d331f>] dispatch+0x10f/0x580 [libceph]
[ 213.974470] [<ffffffffa02d154f>] con_work+0x214f/0x21d0 [libceph]
[ 213.974470] [<ffffffffa02cf400>] ? ceph_con_send+0x120/0x120 [libceph]
[ 213.974470] [<ffffffff8108110d>] process_one_work+0x11d/0x430
[ 213.974470] [<ffffffff81081c69>] worker_thread+0x169/0x360
[ 213.974470] [<ffffffff81081b00>] ? manage_workers.clone.21+0x240/0x240
[ 213.974470] [<ffffffff81086496>] kthread+0x96/0xa0
[ 213.974470] [<ffffffff815e5bb4>] kernel_thread_helper+0x4/0x10
[ 213.974470] [<ffffffff81086400>] ? flush_kthread_worker+0xb0/0xb0
[ 213.974470] [<ffffffff815e5bb0>] ? gs_change+0x13/0x13
[ 213.974470] Code: 65 f0 4c 8b 6d f8 c9 c3 66 90 48 8d be 88 00 00 00 48 c7 c6 70 18 2d a0 e8 dd 2c 01 e1 48 8b 5d e8 4c 8b 65 f0 4c 8b 6d f8 c9 c3 <0f> 0b 0f 0b 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57
[ 213.974470] RIP [<ffffffffa02cf3f1>] ceph_con_send+0x111/0x120 [libceph]
[ 213.974470] RSP <ffff880405cddbd0>
[ 214.640753] ---[ end trace 837698aee31a73fc ]---
</pre></p>
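<p>The gaps between the "connection failed" messages for osd2 in the trace above roughly double each time, which is consistent with an exponential reconnect backoff running until the new osdmap arrives and the resend path hits the BUG. A small sketch that computes those gaps from the timestamps in the trace; <code>retry_gaps</code> is a hypothetical helper for illustration, not a libceph function:</p>

```python
def retry_gaps(timestamps):
    """Return the delay in seconds between consecutive timestamps."""
    return [round(later - earlier, 2)
            for earlier, later in zip(timestamps, timestamps[1:])]


# dmesg timestamps of the "osd2 ... connection failed" lines in the trace above.
failed_at = [182.732642, 183.040233, 184.040204, 186.040244, 190.060233, 198.060214]

gaps = retry_gaps(failed_at)
print(gaps)

# Ratio of each gap to the previous one, skipping the first (partial) interval.
ratios = [b / a for a, b in zip(gaps[1:], gaps[2:])]
print(ratios)
```

<p>The first gap is shorter because the initial failure follows directly after the socket close; the remaining gaps are close to 1, 2, 4 and 8 seconds.</p>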
Ceph - Bug #1382: kclient: crash on resending osd ops
https://tracker.ceph.com/issues/1382?journal_id=6341
2011-09-10T21:05:34Z
Sage Weil
sage@newdream.net
<ul><li><strong>Subject</strong> changed from <i>RBD messenger error</i> to <i>libceph: crash on resending osd ops</i></li></ul>
Ceph - Bug #1382: kclient: crash on resending osd ops
https://tracker.ceph.com/issues/1382?journal_id=6342
2011-09-10T21:05:46Z
Sage Weil
sage@newdream.net
<ul><li><strong>Subject</strong> changed from <i>libceph: crash on resending osd ops</i> to <i>kclient: crash on resending osd ops</i></li></ul>
Ceph - Bug #1382: kclient: crash on resending osd ops
https://tracker.ceph.com/issues/1382?journal_id=6365
2011-09-15T15:51:30Z
Sage Weil
sage@newdream.net
<p>Possibly the same crash, hit by Martin Mailand on ceph-devel: <a class="external" href="http://pastebin.com/9CNJk0Pw">http://pastebin.com/9CNJk0Pw</a></p>
Ceph - Bug #1382: kclient: crash on resending osd ops
https://tracker.ceph.com/issues/1382?journal_id=6377
2011-09-16T21:39:02Z
Sage Weil
sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Resolved</i></li></ul>