Bug #8859

closed

krbd crash while serving linux-lio iscsi: rbd_assert(img_request != NULL);

Added by Walter Huf almost 10 years ago. Updated over 9 years ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We use Linux-HA to configure a pair of nodes as highly available iSCSI targets with Linux-LIO. It maps the RBD on all the nodes and has a roving iSCSI resource that moves around the cluster.
At some point every night, the node hosting the iSCSI resource kernel panics and shows a stack trace, with varying degrees of hard death. This morning the node was able to log some messages to kern.log, which I've attached to this report.

The Ceph cluster is running Emperor (0.72.2-1precise). The iSCSI target hosts run Ubuntu 12.04, and I've tried a few different kernels to see whether any of them alleviated the issue. This report is from the server running the Ubuntu 12.04 package linux-image-generic-lts-trusty 3.13.0.30.26, which installs kernel version 3.13.0-30.


Files

kern.20140717.log (1.76 MB), Kernel log of the crash, Walter Huf, 07/17/2014 07:10 AM
Actions #1

Updated by Ian Colle over 9 years ago

  • Project changed from Ceph to rbd
Actions #2

Updated by Ilya Dryomov over 9 years ago

  • Status changed from New to Closed

Jul 17 12:00:31 cephmon2 kernel: [251372.911335] Assertion failure in rbd_img_obj_callback() at line 2127:
Jul 17 12:00:31 cephmon2 kernel: [251372.911335] 
Jul 17 12:00:31 cephmon2 kernel: [251372.911335]     rbd_assert(img_request != NULL);
Jul 17 12:00:31 cephmon2 kernel: [251372.911335] 
Jul 17 12:00:31 cephmon2 kernel: [251373.078772] ------------[ cut here ]------------
Jul 17 12:00:35 cephmon2 kernel: [251373.110912] kernel BUG at /build/buildd/linux-lts-trusty-3.13.0/drivers/block/rbd.c:2127!
Jul 17 12:00:35 cephmon2 kernel: [251373.175242] invalid opcode: 0000 [#1] SMP 
Jul 17 12:00:35 cephmon2 kernel: [251373.207086] Modules linked in: ib_iser rdma_cm iw_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ib_srpt ib_cm ib_sa ib_mad ib_core ib_addr tcm_loop tcm_fc libfc tcm_qla2xxx qla2xxx scsi_transport_fc scsi_tgt iscsi_target_mod target_core_pscsi target_core_file target_core_iblock rbd ceph libceph fscache xt_multiport iptable_filter target_core_mod configfs xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat ip_tables x_tables ip_vs_rr ip_vs nf_conntrack bonding 8021q mrp garp stp llc mei_me gpio_ich wmi sb_edac mei edac_core dcdbas ipmi_si joydev lpc_ich shpchp mac_hid lp acpi_power_meter parport ses enclosure hid_generic ixgbe btrfs tg3 dca usbhid ahci raid6_pq ptp libahci hid megaraid_sas pps_core mdio xor libcrc32c
Jul 17 12:00:35 cephmon2 kernel: [251373.569884] CPU: 2 PID: 5329 Comm: kworker/2:4 Not tainted 3.13.0-30-generic #55~precise1-Ubuntu
Jul 17 12:00:35 cephmon2 kernel: [251373.657892] Hardware name: Dell Inc. PowerEdge R420/072XWF, BIOS 1.4.6 10/26/2012
Jul 17 12:00:35 cephmon2 kernel: [251373.747093] Workqueue: ceph-msgr con_work [libceph]
Jul 17 12:00:35 cephmon2 kernel: [251373.792127] task: ffff8801159817f0 ti: ffff880222cc2000 task.ti: ffff880222cc2000
Jul 17 12:00:35 cephmon2 kernel: [251373.883542] RIP: 0010:[<ffffffffa03fd1c8>]  [<ffffffffa03fd1c8>] rbd_img_obj_callback+0x308/0x380 [rbd]
Jul 17 12:00:35 cephmon2 kernel: [251373.982705] RSP: 0018:ffff880222cc3b58  EFLAGS: 00010296
Jul 17 12:00:35 cephmon2 kernel: [251374.034718] RAX: 000000000000005e RBX: ffff88008aa96420 RCX: 0000000000000000
Jul 17 12:00:35 cephmon2 kernel: [251374.138819] RDX: ffff880227c2fff0 RSI: ffff880227c2e3d8 RDI: 0000000000000246
Jul 17 12:00:35 cephmon2 kernel: [251374.245904] RBP: ffff880222cc3b98 R08: 0000000000000000 R09: 0000000000000001
Jul 17 12:00:35 cephmon2 kernel: [251374.357428] R10: 0000000000002dd3 R11: 0000000000005000 R12: 0000000000000000
Jul 17 12:00:35 cephmon2 kernel: [251374.470093] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000004
Jul 17 12:00:35 cephmon2 kernel: [251374.584815] FS:  0000000000000000(0000) GS:ffff880227c20000(0000) knlGS:0000000000000000
Jul 17 12:00:35 cephmon2 kernel: [251374.699986] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 17 12:00:35 cephmon2 kernel: [251374.757584] CR2: 000000000690af80 CR3: 0000000001c0d000 CR4: 00000000000407e0
Jul 17 12:00:35 cephmon2 kernel: [251374.870691] Stack:
Jul 17 12:00:35 cephmon2 kernel: [251374.925257]  ffffffffa03c3de1 ffff88008aa96420 ffff880012a468e8 ffff88008aa96420
Jul 17 12:00:35 cephmon2 kernel: [251375.035211]  ffff88010b921b40 0000000000f76a4e 0000000000000000 0000000000000004
Jul 17 12:00:35 cephmon2 kernel: [251375.145303]  ffff880222cc3bb8 ffffffffa03f779b 0000000000f76a4e ffff88008aa96420
Jul 17 12:00:35 cephmon2 kernel: [251375.255683] Call Trace:
Jul 17 12:00:35 cephmon2 kernel: [251375.309421]  [<ffffffffa03c3de1>] ? ceph_msg_revoke+0xc1/0x1b0 [libceph]
Jul 17 12:00:35 cephmon2 kernel: [251375.364732]  [<ffffffffa03f779b>] rbd_obj_request_complete+0x2b/0x80 [rbd]
Jul 17 12:00:35 cephmon2 kernel: [251375.419643]  [<ffffffffa03fc3e8>] rbd_osd_req_callback+0xd8/0x2e0 [rbd]
Jul 17 12:00:35 cephmon2 kernel: [251375.473600]  [<ffffffffa03cb7a2>] handle_reply.isra.31+0x3f2/0x670 [libceph]
Jul 17 12:00:35 cephmon2 kernel: [251375.526999]  [<ffffffffa03cc0d5>] dispatch+0xa5/0xc0 [libceph]
Jul 17 12:00:35 cephmon2 kernel: [251375.579332]  [<ffffffffa03beaf5>] process_message+0x95/0x190 [libceph]
Jul 17 12:00:35 cephmon2 kernel: [251375.631561]  [<ffffffffa03c2ce0>] ? read_partial_message+0x170/0x4e0 [libceph]
Jul 17 12:00:35 cephmon2 kernel: [251375.733733]  [<ffffffff811181ac>] ? acct_account_cputime+0x1c/0x20
Jul 17 12:00:35 cephmon2 kernel: [251375.785501]  [<ffffffff810a264e>] ? account_system_time+0xae/0x1a0
Jul 17 12:00:35 cephmon2 kernel: [251375.836164]  [<ffffffff8101d043>] ? native_sched_clock+0x13/0x80
Jul 17 12:00:35 cephmon2 kernel: [251375.885883]  [<ffffffffa03c3329>] try_read+0x2d9/0x5a0 [libceph]
Jul 17 12:00:35 cephmon2 kernel: [251375.934783]  [<ffffffffa03c3921>] con_work+0x91/0x290 [libceph]
Jul 17 12:00:35 cephmon2 kernel: [251375.982529]  [<ffffffff810877ff>] process_one_work+0x17f/0x4c0
Jul 17 12:00:35 cephmon2 kernel: [251376.029389]  [<ffffffff81088a2b>] worker_thread+0x11b/0x3d0
Jul 17 12:00:35 cephmon2 kernel: [251376.075157]  [<ffffffff81088910>] ? manage_workers.isra.21+0x190/0x190
Jul 17 12:00:35 cephmon2 kernel: [251376.120418]  [<ffffffff8108f9a9>] kthread+0xc9/0xe0
Jul 17 12:00:35 cephmon2 kernel: [251376.164350]  [<ffffffff8108f8e0>] ? flush_kthread_worker+0xb0/0xb0
Jul 17 12:00:35 cephmon2 kernel: [251376.207746]  [<ffffffff8176667c>] ret_from_fork+0x7c/0xb0
Jul 17 12:00:35 cephmon2 kernel: [251376.250553]  [<ffffffff8108f8e0>] ? flush_kthread_worker+0xb0/0xb0
Jul 17 12:00:35 cephmon2 kernel: [251376.293321] Code: a0 31 c0 e8 d1 7f 34 e1 0f 0b 48 c7 c1 db f6 3f a0 ba 4f 08 00 00 48 c7 c6 d0 15 40 a0 48 c7 c7 08 fc 3f a0 31 c0 e8 ae 7f 34 e1 <0f> 0b 49 89 f8 4c 89 e1 48 c7 c2 d0 15 40 a0 48 c7 c6 f8 f7 3f 
Jul 17 12:00:35 cephmon2 kernel: [251376.424913] RIP  [<ffffffffa03fd1c8>] rbd_img_obj_callback+0x308/0x380 [rbd]
Jul 17 12:00:35 cephmon2 kernel: [251376.468753]  RSP <ffff880222cc3b58>
Jul 17 12:00:35 cephmon2 kernel: [251376.577152] ---[ end trace c439f5118d695808 ]---

This bug was (mostly) fixed in 3.16-rc1, commit 0f2d5be792b0 ("rbd: use reference counts for image requests").
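
The commit title names reference counting as the remedy. The block below is a minimal user-space sketch of that general pattern, not the kernel patch itself. Every struct and function name in it (img_request, obj_request, obj_request_attach, obj_request_callback, and so on) is a hypothetical stand-in, and it assumes the failure mode is an image request being torn down while an object request callback still needs it. Each object request takes its own counted reference on the parent image request, so the parent cannot go away before the last callback has finished with it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel's rbd structures. */
struct img_request {
    atomic_int refs;     /* one reference for the submitter plus one per object request */
    atomic_int pending;  /* object requests that have not completed yet */
    bool *freed_flag;    /* illustration hook: set to true when the struct is released */
};

struct obj_request {
    struct img_request *img;  /* counted reference, valid until the callback drops it */
};

/* Allocate an image request that expects nr_objs object requests. */
static struct img_request *img_request_create(int nr_objs, bool *freed_flag)
{
    struct img_request *img = malloc(sizeof(*img));

    if (!img)
        return NULL;
    atomic_init(&img->refs, 1);          /* the submitter's reference */
    atomic_init(&img->pending, nr_objs);
    img->freed_flag = freed_flag;
    return img;
}

static void img_request_get(struct img_request *img)
{
    atomic_fetch_add(&img->refs, 1);
}

/* Drop one reference; the holder of the last reference frees the request. */
static void img_request_put(struct img_request *img)
{
    if (atomic_fetch_sub(&img->refs, 1) == 1) {
        if (img->freed_flag)
            *img->freed_flag = true;
        free(img);
    }
}

/* Link an object request to its parent and pin the parent with a reference. */
static void obj_request_attach(struct obj_request *obj, struct img_request *img)
{
    img_request_get(img);
    obj->img = img;
}

/*
 * Completion callback for one object request. Returns true if this call
 * completed the last outstanding object request of the image request.
 */
static bool obj_request_callback(struct obj_request *obj)
{
    struct img_request *img = obj->img;
    bool last;

    assert(img != NULL);  /* same kind of check as the rbd_assert() in the log */
    last = atomic_fetch_sub(&img->pending, 1) == 1;
    obj->img = NULL;
    img_request_put(img); /* may free img; it must not be touched afterwards */
    return last;
}
```

In this sketch the submitter can call img_request_put() as soon as the object requests are attached. The image request then stays allocated until the final obj_request_callback() drops the last reference.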
