Bug #8378

closed

krbd: Kernel oops in rbd_img_obj_callback

Added by Rain Li almost 10 years ago. Updated almost 10 years ago.

Status: Resolved
Priority: High
Assignee:
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description


 Assertion failure in rbd_img_obj_callback() at line 2143:

   rbd_assert(more ^ (which == img_request->obj_request_count));

 ------------[ cut here ]------------
 kernel BUG at /build/buildd/linux-3.11.0/drivers/block/rbd.c:2143!
 invalid opcode: 0000 [#1] SMP 
 Modules linked in: vxlan ip_tunnel cls_u32 ebt_mark ebtable_filter cls_fw ebt_arp ebt_ip vhost_net vhost macvtap macvlan rbd libceph sch_cbq ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables nbd 8021q garp mrp vesafb coretemp kvm_intel bridge stp llc kvm joydev hid_generic gpio_ich dcdbas dm_multipath scsi_dh microcode psmouse serio_raw usbhid hid lpc_ich i7core_edac edac_core wmi mac_hid acpi_power_meter bonding lp parport btrfs xor zlib_deflate raid6_pq libcrc32c ses enclosure megaraid_sas bnx2
 CPU: 6 PID: 6340 Comm: kworker/6:4 Tainted: G          I  3.11.0-20-generic #35-Ubuntu
 Hardware name: Dell Inc. PowerEdge R610/0YF3T8, BIOS 6.4.0 07/23/2013
 Workqueue: ceph-msgr con_work [libceph]
 task: ffff88104b6b5dc0 ti: ffff880f72d52000 task.ti: ffff880f72d52000
 RIP: 0010:[<ffffffffa03b66d0>]  [<ffffffffa03b66d0>] rbd_img_obj_callback+0x530/0x540 [rbd]
 RSP: 0018:ffff880f72d53be8  EFLAGS: 00010082
 RAX: 000000000000007b RBX: ffff8807892cd7b0 RCX: 0000000000000000
 RDX: ffff88107fc70000 RSI: ffff88107fc6e428 RDI: 0000000000000046
 RBP: ffff880f72d53c30 R08: 0000000000000000 R09: 0000000000001400
 R10: 0000000000000050 R11: ffffc90017a2bff8 R12: 0000000000000001
 R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff88107fc60000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 00002b628abb3000 CR3: 0000000001c0e000 CR4: 00000000000027e0
 Stack:
  ffff8807892cd7bc ffffffffa036bb18 ffff8807892cd7e0 ffff8807892cd780
  ffff880793cdbe40 ffff88078a91e8a0 0000000000000000 0000000000000004
  ffff8806ad06a000 ffff880f72d53c48 ffffffffa03b48b7 ffff880793cdbe40
 Call Trace:
  [<ffffffffa036bb18>] ? ceph_tcp_recvmsg+0x48/0x60 [libceph]
  [<ffffffffa03b48b7>] rbd_obj_request_complete+0x27/0x70 [rbd]
  [<ffffffffa03b7d6f>] rbd_osd_req_callback+0xdf/0x4e0 [rbd]
  [<ffffffffa0378049>] dispatch+0x489/0x8e0 [libceph]
  [<ffffffffa036e98b>] try_read+0x4ab/0x10a0 [libceph]
  [<ffffffffa0370829>] con_work+0xb9/0x630 [libceph]
  [<ffffffff8107d10c>] process_one_work+0x17c/0x430
  [<ffffffff8107e0cc>] worker_thread+0x11c/0x3c0
  [<ffffffff8107dfb0>] ? manage_workers.isra.25+0x2a0/0x2a0
  [<ffffffff810848f0>] kthread+0xc0/0xd0
  [<ffffffff816edea9>] ? schedule+0x29/0x70
  [<ffffffff81084830>] ? kthread_create_on_node+0x120/0x120
  [<ffffffff816f8a6c>] ret_from_fork+0x7c/0xb0
  [<ffffffff81084830>] ? kthread_create_on_node+0x120/0x120
 Code: a0 31 c0 e8 3f cb 32 e1 0f 0b 48 c7 c1 10 cf 3b a0 ba 5f 08 00 00 48 c7 c6 50 e8 3b a0 48 c7 c7 d0 ca 3b a0 31 c0 e8 1c cb 32 e1 <0f> 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 
 RIP  [<ffffffffa03b66d0>] rbd_img_obj_callback+0x530/0x540 [rbd]
  RSP <ffff880f72d53be8>
 ---[ end trace 260af37d80202466 ]---

The Ubuntu Linux kernel 3.11.0-20 is rebased onto kernel 3.11.10. I read Bug #5647, and the kernel call trace there looks almost the same, but kernel 3.11.10 should already include the patch for Bug #5647.

Our Ceph cluster runs v0.72.2.
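
For readers unfamiliar with the assertion quoted at the top of this report, here is a minimal user-space sketch of the condition it appears to enforce. It assumes that `more` means "the image request still has work outstanding" and that `which` is the index of the next object request to complete. The function name and signature are hypothetical and are not part of rbd.c.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the check, not kernel code.
 * The assertion requires exactly one of the following to be true:
 *   - more:                       work is still outstanding
 *   - which == obj_request_count: every object request was consumed
 * If both or neither hold, the kernel hits rbd_assert() and oopses.
 */
static bool completion_invariant_holds(bool more, unsigned int which,
                                       unsigned int obj_request_count)
{
    return more ^ (which == obj_request_count);
}
```

With two object requests, `(more = true, which = 1)` and `(more = false, which = 2)` satisfy the check, while `(more = true, which = 2)` and `(more = false, which = 1)` would trip it.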


Related issues 1 (0 open, 1 closed)

Is duplicate of rbd - Bug #5876: Assertion failure in rbd_img_obj_callback() : rbd_assert(which >= img_request->next_completion); | Resolved | Ilya Dryomov | 08/05/2013

Actions #1

Updated by Ian Colle almost 10 years ago

  • Assignee set to Ilya Dryomov
Actions #2

Updated by Ilya Dryomov almost 10 years ago

Hi Rain,

We have a patch pending that I hope will fix this, but it needs more testing.
I'll notify you when it's ready.

Actions #3

Updated by Rain Li almost 10 years ago

Hi Ilya,

I would like to know more details about this patch. Will it be applied to the RBD kernel module or to the OSD server? Could you show me some of the code from this patch?

Actions #4

Updated by Ilya Dryomov almost 10 years ago

  • Status changed from New to Resolved

Should be fixed by commit 0f2d5be792b0 ("rbd: use reference counts for image requests"), which went into 3.16-rc1.
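
The reply above identifies the fix only by its commit title. As a rough illustration of the general technique that title names (reference counting, so that an object is freed only when its last user is done with it), here is a minimal user-space sketch using C11 atomics. It is not the code from the commit: `struct img_req` and the `img_req_*` helpers are hypothetical names chosen for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an image request; not the kernel's struct. */
struct img_req {
    atomic_int refs;    /* number of holders still using this request */
};

/* Allocate a request; the creator holds the first reference. */
static struct img_req *img_req_create(void)
{
    struct img_req *req = malloc(sizeof(*req));

    if (req)
        atomic_init(&req->refs, 1);
    return req;
}

/* Take an additional reference before using the request elsewhere. */
static void img_req_get(struct img_req *req)
{
    atomic_fetch_add(&req->refs, 1);
}

/* Drop one reference; returns true if this call freed the request. */
static bool img_req_put(struct img_req *req)
{
    if (atomic_fetch_sub(&req->refs, 1) == 1) {
        free(req);
        return true;
    }
    return false;
}
```

In this sketch, a completion path that calls `img_req_get()` before touching the request and `img_req_put()` afterwards cannot have the request freed underneath it by another context dropping its own reference.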
