Bug #5876

closed

Assertion failure in rbd_img_obj_callback() : rbd_assert(which >= img_request->next_completion);

Added by Olivier Bonvalet over 10 years ago. Updated almost 10 years ago.

Status: Resolved
Priority: High
Target version: -
% Done: 0%
Source: Community (user)
Severity: 3 - minor

Description

Hi,

With Cuttlefish and the RBD kernel client (from Linux 3.9.11), I get this kernel BUG:

Aug  3 03:51:35 murmillia kernel: [772458.641942] Assertion failure in rbd_img_obj_callback() at line 1708:
Aug  3 03:51:35 murmillia kernel: [772458.641942] 
Aug  3 03:51:35 murmillia kernel: [772458.641942]     rbd_assert(which >= img_request->next_completion);
Aug  3 03:51:35 murmillia kernel: [772458.641942] 
Aug  3 03:51:35 murmillia kernel: [772458.642022] ------------[ cut here ]------------
Aug  3 03:51:35 murmillia kernel: [772458.642038] kernel BUG at drivers/block/rbd.c:1708!
Aug  3 03:51:35 murmillia kernel: [772458.642054] invalid opcode: 0000 [#1] SMP 
Aug  3 03:51:35 murmillia kernel: [772458.642103] Modules linked in: xt_physdev iptable_filter ip_tables x_tables cbc rbd libceph libcrc32c loop xen_gntdev bridge coretemp ghash_clmulni_intel aesni_intel aes_x86_64 xts lrw gf128mul ablk_helper cryptd iTCO_wdt gpio_ich iTCO_vendor_support microcode serio_raw sb_edac edac_core evdev i2c_i801 lpc_ich mfd_core ioatdma shpchp wmi ac button dm_mod hid_generic usbhid hid sg sd_mod crc_t10dif crc32c_intel isci megaraid_sas ahci libsas libahci ehci_pci libata ehci_hcd scsi_transport_sas usbcore scsi_mod usb_common igb i2c_algo_bit i2c_core ixgbe dca ptp pps_core mdio
Aug  3 03:51:35 murmillia kernel: [772458.642687] CPU 2 
Aug  3 03:51:35 murmillia kernel: [772458.642698] Pid: 20090, comm: kworker/2:2 Not tainted 3.9-dae-dom0 #1 Supermicro X9DRW-7TPF+/X9DRW-7TPF+
Aug  3 03:51:35 murmillia kernel: [772458.642794] RIP: e030:[<ffffffffa020f1d3>]  [<ffffffffa020f1d3>] rbd_img_obj_callback+0x103/0x29a [rbd]
Aug  3 03:51:35 murmillia kernel: [772458.642876] RSP: e02b:ffff880015e43cf8  EFLAGS: 00010282
Aug  3 03:51:35 murmillia kernel: [772458.642916] RAX: 0000000000000070 RBX: ffff88001586f5c0 RCX: 0000000000000000
Aug  3 03:51:35 murmillia kernel: [772458.642982] RDX: ffff88003f84e8f0 RSI: ffff88003f84dea8 RDI: ffff880015e402b8
Aug  3 03:51:35 murmillia kernel: [772458.643048] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
Aug  3 03:51:35 murmillia kernel: [772458.643113] R10: 0000000000000000 R11: 00000000000002d5 R12: ffff880010ca7540
Aug  3 03:51:35 murmillia kernel: [772458.643179] R13: ffff88001c49e030 R14: 0000000000000000 R15: ffff880001c78720
Aug  3 03:51:35 murmillia kernel: [772458.643247] FS:  00007f4764e28700(0000) GS:ffff88003f840000(0000) knlGS:0000000000000000
Aug  3 03:51:35 murmillia kernel: [772458.643314] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug  3 03:51:35 murmillia kernel: [772458.643356] CR2: 00007f4764e319b8 CR3: 000000000160c000 CR4: 0000000000042660
Aug  3 03:51:35 murmillia kernel: [772458.643421] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug  3 03:51:35 murmillia kernel: [772458.643485] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug  3 03:51:35 murmillia kernel: [772458.643551] Process kworker/2:2 (pid: 20090, threadinfo ffff880015e42000, task ffff880015e44fe0)
Aug  3 03:51:35 murmillia kernel: [772458.643618] Stack:
Aug  3 03:51:35 murmillia kernel: [772458.643649]  000000000000a000 ffff88001486b480 ffff8800360d9400 ffff8800360d9408
Aug  3 03:51:35 murmillia kernel: [772458.643730]  ffff88001c49e030 0000000000000000 ffff880001c78720 ffffffffa029fd72
Aug  3 03:51:35 murmillia kernel: [772458.643811]  0000000000000015 ffff880001c78778 0020917c00000001 0001da5e00000000
Aug  3 03:51:35 murmillia kernel: [772458.643893] Call Trace:
Aug  3 03:51:35 murmillia kernel: [772458.643933]  [<ffffffffa029fd72>] ? dispatch+0x424/0x590 [libceph]
Aug  3 03:51:35 murmillia kernel: [772458.643980]  [<ffffffffa029ac2f>] ? con_work+0x104d/0x1d05 [libceph]
Aug  3 03:51:35 murmillia kernel: [772458.644026]  [<ffffffff8100721f>] ? __switch_to+0x13e/0x3c0
Aug  3 03:51:35 murmillia kernel: [772458.644069]  [<ffffffff8104b2a6>] ? mmdrop+0xd/0x1c
Aug  3 03:51:35 murmillia kernel: [772458.644109]  [<ffffffff8104be3a>] ? finish_task_switch+0x50/0x8a
Aug  3 03:51:35 murmillia kernel: [772458.644152]  [<ffffffff810410ce>] ? process_one_work+0x156/0x208
Aug  3 03:51:35 murmillia kernel: [772458.644195]  [<ffffffff810427a0>] ? worker_thread+0x114/0x1bb
Aug  3 03:51:35 murmillia kernel: [772458.644237]  [<ffffffff8104268c>] ? manage_workers+0x202/0x202
Aug  3 03:51:35 murmillia kernel: [772458.644279]  [<ffffffff81045711>] ? kthread+0x7d/0x85
Aug  3 03:51:35 murmillia kernel: [772458.644319]  [<ffffffff81045694>] ? __kthread_parkme+0x59/0x59
Aug  3 03:51:35 murmillia kernel: [772458.644364]  [<ffffffff81356e3c>] ? ret_from_fork+0x7c/0xb0
Aug  3 03:51:35 murmillia kernel: [772458.644405]  [<ffffffff81045694>] ? __kthread_parkme+0x59/0x59
Aug  3 03:51:35 murmillia kernel: [772458.644447] Code: d7 13 e1 0f 0b 3b 6b 34 73 23 48 c7 c1 4a 34 21 a0 ba ac 06 00 00 31 c0 48 c7 c6 b0 3f 21 a0 48 c7 c7 bf 30 21 a0 e8 4e d7 13 e1 <0f> 0b 4c 8d 73 30 41 b5 01 4c 89 f7 e8 f9 2d 14 e1 3b 6b 34 0f 
Aug  3 03:51:35 murmillia kernel: [772458.644851] RIP  [<ffffffffa020f1d3>] rbd_img_obj_callback+0x103/0x29a [rbd]
Aug  3 03:51:35 murmillia kernel: [772458.644901]  RSP <ffff880015e43cf8>
Aug  3 03:51:35 murmillia kernel: [772458.645328] ---[ end trace 2a2a66811d33dc9e ]---

followed by:

Aug  3 03:51:35 murmillia kernel: [772458.648317] BUG: unable to handle kernel paging request at ffffffffffffffd8
Aug  3 03:52:45 murmillia kernel: [772458.648485] IP: [<ffffffff81045a20>] kthread_data+0x7/0xc
Aug  3 03:52:45 murmillia kernel: [772458.648611] PGD 160f067 PUD 1611067 PMD 0 
Aug  3 03:52:45 murmillia kernel: [772458.648812] Oops: 0000 [#2] SMP 
Aug  3 03:52:45 murmillia kernel: [772458.648967] Modules linked in: xt_physdev iptable_filter ip_tables x_tables cbc rbd libceph libcrc32c loop xen_gntdev bridge coretemp ghash_clmulni_intel aesni_intel aes_x86_64 xts lrw gf128mul ablk_helper cryptd iTCO_wdt gpio_ich iTCO_vendor_support microcode serio_raw sb_edac edac_core evdev i2c_i801 lpc_ich mfd_core ioatdma shpchp wmi ac button dm_mod hid_generic usbhid hid sg sd_mod crc_t10dif crc32c_intel isci megaraid_sas ahci libsas libahci ehci_pci libata ehci_hcd scsi_transport_sas usbcore scsi_mod usb_common igb i2c_algo_bit i2c_core ixgbe dca ptp pps_core mdio
Aug  3 03:52:45 murmillia kernel: [772458.652063] CPU 2 
Aug  3 03:52:45 murmillia kernel: [772458.652125] Pid: 20090, comm: kworker/2:2 Tainted: G      D      3.9-dae-dom0 #1 Supermicro X9DRW-7TPF+/X9DRW-7TPF+
Aug  3 03:52:45 murmillia kernel: [772458.652307] RIP: e030:[<ffffffff81045a20>]  [<ffffffff81045a20>] kthread_data+0x7/0xc
Aug  3 03:52:45 murmillia kernel: [772458.652458] RSP: e02b:ffff880015e43ab0  EFLAGS: 00010002
Aug  3 03:52:45 murmillia kernel: [772458.652535] RAX: 0000000000000000 RBX: ffff88003f852b00 RCX: ffff88003f852b70
Aug  3 03:52:45 murmillia kernel: [772458.652637] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff880015e44fe0
Aug  3 03:52:45 murmillia kernel: [772458.652743] RBP: 0000000000000002 R08: ffffffff817b5910 R09: 0000000000000002
Aug  3 03:52:45 murmillia kernel: [772458.652850] R10: 000000000000b7ec R11: ffff880015e44fe0 R12: ffff880015e45300
Aug  3 03:52:45 murmillia kernel: [772458.652964] R13: ffff88003a349510 R14: 0000000000000002 R15: ffff880015e44fd0
Aug  3 03:52:45 murmillia kernel: [772458.653073] FS:  00007f4764e28700(0000) GS:ffff88003f840000(0000) knlGS:0000000000000000
Aug  3 03:52:45 murmillia kernel: [772458.653176] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug  3 03:52:45 murmillia kernel: [772458.653261] CR2: ffffffffffffffd8 CR3: 000000000160c000 CR4: 0000000000042660
Aug  3 03:52:45 murmillia kernel: [772458.653370] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug  3 03:52:45 murmillia kernel: [772458.653478] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug  3 03:52:45 murmillia kernel: [772458.653580] Process kworker/2:2 (pid: 20090, threadinfo ffff880015e42000, task ffff880015e44fe0)
Aug  3 03:52:45 murmillia kernel: [772458.653686] Stack:
Aug  3 03:52:45 murmillia kernel: [772458.653755]  ffffffff810428a1 ffff88003f852b00 ffff880015e44fe0 ffffffff81351294
Aug  3 03:52:45 murmillia kernel: [772458.654031]  0000000000012b00 ffff880015e43fd8 ffff880015e43fd8 ffff880015e44fe0
Aug  3 03:52:45 murmillia kernel: [772458.654315]  ffff880015e45520 0000000000000001 ffff88003a349510 ffff880015e45208
Aug  3 03:52:45 murmillia kernel: [772458.654585] Call Trace:
Aug  3 03:52:45 murmillia kernel: [772458.654660]  [<ffffffff810428a1>] ? wq_worker_sleeping+0x9/0x58
Aug  3 03:52:45 murmillia kernel: [772458.654743]  [<ffffffff81351294>] ? __schedule+0x109/0x47c
Aug  3 03:52:45 murmillia kernel: [772458.654823]  [<ffffffff81032c9b>] ? do_exit+0x8e2/0x8e4
Aug  3 03:52:45 murmillia kernel: [772458.654901]  [<ffffffff81352fd8>] ? oops_end+0x96/0x99
Aug  3 03:52:45 murmillia kernel: [772458.654987]  [<ffffffff8100864f>] ? do_invalid_op+0x84/0x8b
Aug  3 03:52:45 murmillia kernel: [772458.655077]  [<ffffffffa020f1d3>] ? rbd_img_obj_callback+0x103/0x29a [rbd]
Aug  3 03:52:45 murmillia kernel: [772458.655170]  [<ffffffff81005952>] ? check_events+0x12/0x20
Aug  3 03:52:45 murmillia kernel: [772458.655249]  [<ffffffff8100593f>] ? xen_restore_fl_direct_reloc+0x4/0x4
Aug  3 03:52:45 murmillia kernel: [772458.655332]  [<ffffffff8102f3d9>] ? arch_local_irq_restore+0x7/0x8
Aug  3 03:52:45 murmillia kernel: [772458.655412]  [<ffffffff81030ba8>] ? vprintk_emit+0x364/0x388
Aug  3 03:52:45 murmillia kernel: [772458.655492]  [<ffffffff8135801e>] ? invalid_op+0x1e/0x30
Aug  3 03:52:45 murmillia kernel: [772458.655572]  [<ffffffffa020f1d3>] ? rbd_img_obj_callback+0x103/0x29a [rbd]
Aug  3 03:52:45 murmillia kernel: [772458.655657]  [<ffffffffa020f1d3>] ? rbd_img_obj_callback+0x103/0x29a [rbd]
Aug  3 03:52:45 murmillia kernel: [772458.655749]  [<ffffffffa029fd72>] ? dispatch+0x424/0x590 [libceph]
Aug  3 03:52:45 murmillia kernel: [772458.655843]  [<ffffffffa029ac2f>] ? con_work+0x104d/0x1d05 [libceph]
Aug  3 03:52:45 murmillia kernel: [772458.655938]  [<ffffffff8100721f>] ? __switch_to+0x13e/0x3c0
Aug  3 03:52:45 murmillia kernel: [772458.656022]  [<ffffffff8104b2a6>] ? mmdrop+0xd/0x1c
Aug  3 03:52:45 murmillia kernel: [772458.656099]  [<ffffffff8104be3a>] ? finish_task_switch+0x50/0x8a
Aug  3 03:52:45 murmillia kernel: [772458.656182]  [<ffffffff810410ce>] ? process_one_work+0x156/0x208
Aug  3 03:52:45 murmillia kernel: [772458.656266]  [<ffffffff810427a0>] ? worker_thread+0x114/0x1bb
Aug  3 03:52:45 murmillia kernel: [772458.656347]  [<ffffffff8104268c>] ? manage_workers+0x202/0x202
Aug  3 03:52:45 murmillia kernel: [772458.656427]  [<ffffffff81045711>] ? kthread+0x7d/0x85
Aug  3 03:52:45 murmillia kernel: [772458.656504]  [<ffffffff81045694>] ? __kthread_parkme+0x59/0x59
Aug  3 03:52:45 murmillia kernel: [772458.656585]  [<ffffffff81356e3c>] ? ret_from_fork+0x7c/0xb0
Aug  3 03:52:45 murmillia kernel: [772458.656663]  [<ffffffff81045694>] ? __kthread_parkme+0x59/0x59
Aug  3 03:52:45 murmillia kernel: [772458.656741] Code: 78 5b 5d 41 5c 41 5d c3 65 48 8b 04 25 80 c7 00 00 48 8b 80 c8 02 00 00 48 8b 40 c8 48 c1 e8 02 83 e0 01 c3 48 8b 87 c8 02 00 00 <48> 8b 40 d8 c3 65 48 8b 04 25 80 c7 00 00 48 8b b8 c8 02 00 00 
Aug  3 03:52:45 murmillia kernel: [772458.659681] RIP  [<ffffffff81045a20>] kthread_data+0x7/0xc
Aug  3 03:52:45 murmillia kernel: [772458.659805]  RSP <ffff880015e43ab0>
Aug  3 03:52:45 murmillia kernel: [772458.659877] CR2: ffffffffffffffd8
Aug  3 03:52:45 murmillia kernel: [772458.659950] ---[ end trace 2a2a66811d33dc9f ]---
Aug  3 03:52:45 murmillia kernel: [772458.661770] Fixing recursive fault but reboot is needed!
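For context, the failing check enforces in-order retirement of an image request's object requests: each object request carries an index `which`, and `next_completion` tracks the lowest index not yet retired. Under that reading, a callback with `which` below `next_completion` suggests an object request being completed a second time, or after it was already retired. The sketch below is a simplified user-space model built on that assumption. The struct layout, the `done[]` array, `MAX_OBJ_REQUESTS` and `obj_callback()` are hypothetical and are not the actual drivers/block/rbd.c code; only the names `which` and `next_completion` come from the assertion above.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical model of one image request split into
 * object requests that may complete in any order but are retired
 * strictly in index order. Not the real drivers/block/rbd.c. */

#define MAX_OBJ_REQUESTS 8

struct img_request {
    unsigned int obj_request_count;   /* number of object requests */
    unsigned int next_completion;     /* lowest index not yet retired */
    bool done[MAX_OBJ_REQUESTS];      /* a completion was seen for this index */
};

/* Handle the completion of object request `which`.
 * Returns false for an out-of-range index, or in the situation that
 * rbd_assert(which >= img_request->next_completion) turns into a BUG. */
static bool obj_callback(struct img_request *img, unsigned int which)
{
    if (which >= img->obj_request_count || which >= MAX_OBJ_REQUESTS)
        return false;
    if (which < img->next_completion)
        return false;                 /* already retired: duplicate or late completion */

    img->done[which] = true;

    /* Retire the run of completed requests starting at next_completion;
     * completions ahead of a still-pending request just wait here. */
    while (img->next_completion < img->obj_request_count &&
           img->done[img->next_completion])
        img->next_completion++;

    return true;
}
```

For example, with three object requests, completing index 1 first leaves `next_completion` at 0; completing index 0 then retires both 0 and 1 (`next_completion` becomes 2); a repeated completion of index 0 is then rejected, which is the case the kernel turns into the BUG shown above.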


Files

rbd.patch (12.5 KB), Olivier Bonvalet, 10/28/2013 04:42 AM

Related issues: 2 (0 open, 2 closed)

Has duplicate rbd - Bug #7125: Assertion failure in rbd_img_obj_callback() (Resolved, Ilya Dryomov, 01/09/2014)
Has duplicate rbd - Bug #8378: krbd: Kernel oops in rbd_img_obj_callback (Resolved, Ilya Dryomov, 05/16/2014)

#1

Updated by Sage Weil over 10 years ago

  • Priority changed from Normal to High
#2

Updated by Sage Weil over 10 years ago

  • Status changed from New to Resolved
#3

Updated by Olivier Bonvalet over 10 years ago

@Sage Weil: in which kernel version can I find this fix, please?

#4

Updated by Sage Weil over 10 years ago

  • Status changed from Resolved to Pending Backport

The fixes just went upstream to Linus's tree and are not in a released kernel yet. As soon as they appear there, I will send them to stable@ for the next stable kernel updates.

#5

Updated by Olivier Bonvalet over 10 years ago

Hi,

I just hit this bug again (see below), running a 3.10.16 kernel with the patches from your Git tree (see the attached file) and Ceph 0.67.4.

Did I do something wrong?

Oct 28 12:14:26 rurkh kernel: [703845.331581] Assertion failure in rbd_img_obj_callback() at line 2125:
Oct 28 12:14:26 rurkh kernel: [703845.331581]
Oct 28 12:14:26 rurkh kernel: [703845.331581] rbd_assert(which >= img_request->next_completion);
Oct 28 12:14:26 rurkh kernel: [703845.331581]
Oct 28 12:14:26 rurkh kernel: [703845.331924] ------------[ cut here ]------------
Oct 28 12:14:26 rurkh kernel: [703845.331964] kernel BUG at drivers/block/rbd.c:2125!
Oct 28 12:14:26 rurkh kernel: [703845.332003] invalid opcode: 0000 [#1] SMP
Oct 28 12:14:26 rurkh kernel: [703845.332047] Modules linked in: cbc rbd libceph xen_gntdev xt_physdev iptable_filter ip_tables x_tables xfs libcrc32c bridge loop coretemp ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd gpio_ich iTCO_wdt iTCO_vendor_support microcode serio_raw sb_edac evdev edac_core i2c_i801 lpc_ich mfd_core ioatdma shpchp wmi ac button dm_mod hid_generic usbhid hid sg sd_mod crc_t10dif crc32c_intel isci ahci megaraid_sas libsas libahci ehci_pci ehci_hcd libata usbcore scsi_transport_sas igb i2c_algo_bit scsi_mod usb_common ixgbe i2c_core dca ptp pps_core mdio
Oct 28 12:14:26 rurkh kernel: [703845.332611] CPU: 0 PID: 24901 Comm: kworker/0:1 Not tainted 3.10-dae-dom0 #1
Oct 28 12:14:26 rurkh kernel: [703845.332653] Hardware name: Supermicro X9DRW-7TPF+/X9DRW-7TPF+, BIOS 2.0a 03/11/2013
Oct 28 12:14:26 rurkh kernel: [703845.332723] Workqueue: ceph-msgr con_work [libceph]
Oct 28 12:14:26 rurkh kernel: [703845.332763] task: ffff8804408f27c0 ti: ffff8804675ba000 task.ti: ffff8804675ba000
Oct 28 12:14:26 rurkh kernel: [703845.332825] RIP: e030:[<ffffffffa021a3c7>] [<ffffffffa021a3c7>] rbd_img_obj_callback+0x10b/0x3cb [rbd]
Oct 28 12:14:26 rurkh kernel: [703845.332897] RSP: e02b:ffff8804675bbcf8 EFLAGS: 00010282
Oct 28 12:14:26 rurkh kernel: [703845.332938] RAX: 0000000000000070 RBX: ffff88046226e608 RCX: 0000000000000000
Oct 28 12:14:26 rurkh kernel: [703845.333000] RDX: ffff88047dc0e8c0 RSI: ffff88047dc0de68 RDI: ffff8804675b02b8
Oct 28 12:14:26 rurkh kernel: [703845.333061] RBP: ffff8804657a2e48 R08: 0000000000000000 R09: 0000000000000000
Oct 28 12:14:26 rurkh kernel: [703845.333123] R10: 0000000000000000 R11: 000000000000013b R12: 0000000000000001
Oct 28 12:14:26 rurkh kernel: [703845.333184] R13: ffff88045b10ec25 R14: 0000000000000000 R15: ffff88045b2e2718
Oct 28 12:14:26 rurkh kernel: [703845.333249] FS: 00007fb0576e2700(0000) GS:ffff88047dc00000(0000) knlGS:0000000000000000
Oct 28 12:14:26 rurkh kernel: [703845.333313] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 28 12:14:26 rurkh kernel: [703845.333351] CR2: 00007f25bbdd1000 CR3: 0000000462fcf000 CR4: 0000000000042660
Oct 28 12:14:26 rurkh kernel: [703845.333414] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 28 12:14:26 rurkh kernel: [703845.333475] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Oct 28 12:14:26 rurkh kernel: [703845.333537] Stack:
Oct 28 12:14:26 rurkh kernel: [703845.333566] ffffffffffffffff ffffffff81285ec9 ffffffff81191fcd ffff8804596595e8
Oct 28 12:14:26 rurkh kernel: [703845.333640] ffff880464468b78 ffff880464468b80 ffff88045b10ec25 0000000000000000
Oct 28 12:14:26 rurkh kernel: [703845.333715] ffff88045b2e2718 ffffffffa029e4ec 0000000000000025 ffff88045b2e2770
Oct 28 12:14:26 rurkh kernel: [703845.333790] Call Trace:
Oct 28 12:14:26 rurkh kernel: [703845.333827] [<ffffffff81285ec9>] ? kernel_recvmsg+0x30/0x3a
Oct 28 12:14:26 rurkh kernel: [703845.333876] [<ffffffff81191fcd>] ? rb_erase+0x156/0x28f
Oct 28 12:14:26 rurkh kernel: [703845.333917] [<ffffffffa029e4ec>] ? dispatch+0x3da/0x535 [libceph]
Oct 28 12:14:26 rurkh kernel: [703845.333959] [<ffffffffa02990da>] ? con_work+0xf6e/0x1a65 [libceph]
Oct 28 12:14:26 rurkh kernel: [703845.334002] [<ffffffff810026fa>] ? xen_end_context_switch+0xa/0x14
Oct 28 12:14:26 rurkh kernel: [703845.334046] [<ffffffff810435df>] ? process_one_work+0x15a/0x215
Oct 28 12:14:26 rurkh kernel: [703845.334087] [<ffffffff81043a64>] ? worker_thread+0x139/0x1de
Oct 28 12:14:26 rurkh kernel: [703845.334130] [<ffffffff8104392b>] ? rescuer_thread+0x26e/0x26e
Oct 28 12:14:26 rurkh kernel: [703845.334173] [<ffffffff81047b74>] ? kthread+0x7d/0x85
Oct 28 12:14:26 rurkh kernel: [703845.334213] [<ffffffff81047af7>] ? __kthread_parkme+0x59/0x59
Oct 28 12:14:26 rurkh kernel: [703845.334258] [<ffffffff8135bcfc>] ? ret_from_fork+0x7c/0xb0
Oct 28 12:14:26 rurkh kernel: [703845.334300] [<ffffffff81047af7>] ? __kthread_parkme+0x59/0x59
Oct 28 12:14:26 rurkh kernel: [703845.334341] Code: 13 e1 0f 0b 44 3b 65 40 73 23 48 c7 c1 5b da 21 a0 ba 4d 08 00 00 31 c0 48 c7 c6 80 e9 21 a0 48 c7 c7 1f d1 21 a0 e8 15 77 13 e1 <0f> 0b 48 8d 45 3c 41 b5 01 48 89 c7 48 89 04 24 e8 6c cb 13 e1
Oct 28 12:14:26 rurkh kernel: [703845.334688] RIP [<ffffffffa021a3c7>] rbd_img_obj_callback+0x10b/0x3cb [rbd]
Oct 28 12:14:26 rurkh kernel: [703845.334737] RSP <ffff8804675bbcf8>
Oct 28 12:14:26 rurkh kernel: [703845.335099] ---[ end trace e8df3a7ff854054f ]---

#6

Updated by Olivier Bonvalet about 10 years ago

Hi,

I hit this error on a 3.10.27 kernel. The fix is included in this kernel, right?

Jan 29 04:49:59 alg kernel: [213517.787595] Assertion failure in rbd_img_obj_callback() at line 2137:
Jan 29 04:49:59 alg kernel: [213517.787595] 
Jan 29 04:49:59 alg kernel: [213517.787595]     rbd_assert(which >= img_request->next_completion);
Jan 29 04:49:59 alg kernel: [213517.787595] 
Jan 29 04:49:59 alg kernel: [213517.787881] ------------[ cut here ]------------
Jan 29 04:49:59 alg kernel: [213517.787941] kernel BUG at drivers/block/rbd.c:2137!
Jan 29 04:49:59 alg kernel: [213517.787989] invalid opcode: 0000 [#1] SMP 
Jan 29 04:49:59 alg kernel: [213517.788069] Modules linked in: cbc rbd libceph xen_gntdev xt_physdev iptable_filter ip_tables x_tables xfs libcrc32c bridge loop coretemp ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support microcode sb_edac edac_core lpc_ich evdev i2c_i801 mfd_core ioatdma wmi button dm_mod hid_generic usbhid hid sg sd_mod ses enclosure crc_t10dif crc32c_intel ahci isci megaraid_sas libahci libsas libata ehci_pci ehci_hcd scsi_transport_sas scsi_mod usbcore usb_common ixgbe mdio igb i2c_algo_bit i2c_core dca ptp pps_core
Jan 29 04:49:59 alg kernel: [213517.788923] CPU: 8 PID: 5197 Comm: kworker/8:0 Not tainted 3.10-dae-dom0 #13
Jan 29 04:49:59 alg kernel: [213517.788974] Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.01.03.0002.062020121504 06/20/2012
Jan 29 04:49:59 alg kernel: [213517.789085] Workqueue: ceph-msgr con_work [libceph]
Jan 29 04:49:59 alg kernel: [213517.789141] task: ffff88021f14f080 ti: ffff88022fef2000 task.ti: ffff88022fef2000
Jan 29 04:49:59 alg kernel: [213517.789231] RIP: e030:[<ffffffffa025a449>]  [<ffffffffa025a449>] rbd_img_obj_callback+0x10b/0x3cb [rbd]
Jan 29 04:49:59 alg kernel: [213517.789333] RSP: e02b:ffff88022fef3ce8  EFLAGS: 00010282
Jan 29 04:49:59 alg kernel: [213517.789397] RAX: 0000000000000070 RBX: ffff88023036c8c8 RCX: 0000000000000000
Jan 29 04:49:59 alg kernel: [213517.789476] RDX: ffff880237f0e8c0 RSI: ffff880237f0de68 RDI: ffff88022fef02a8
Jan 29 04:49:59 alg kernel: [213517.789580] RBP: ffff88021e2d1380 R08: 0000000000000000 R09: 0000000000000000
Jan 29 04:49:59 alg kernel: [213517.789690] R10: 0000000000000000 R11: 0000000000001245 R12: 0000000000000001
Jan 29 04:49:59 alg kernel: [213517.789779] R13: 0000000000000000 R14: ffff880231f615a0 R15: 0000000000000000
Jan 29 04:49:59 alg kernel: [213517.789874] FS:  00007fba3e0fe700(0000) GS:ffff880237f00000(0000) knlGS:0000000000000000
Jan 29 04:49:59 alg kernel: [213517.789973] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 29 04:49:59 alg kernel: [213517.790027] CR2: 00007fc78c56f000 CR3: 000000022ef51000 CR4: 0000000000042660
Jan 29 04:49:59 alg kernel: [213517.790124] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 29 04:49:59 alg kernel: [213517.790213] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 29 04:49:59 alg kernel: [213517.790310] Stack:
Jan 29 04:49:59 alg kernel: [213517.790355]  000000000000000d ffff88022c18a9bd ffffffffffffffff ffff88022c18a988
Jan 29 04:49:59 alg kernel: [213517.790464]  ffff880231f61598 ffff8802316cb718 0000000000000000 ffff880231f615a0
Jan 29 04:49:59 alg kernel: [213517.790582]  0000000000000000 ffffffffa028551c 0000000000000025 ffff8802316cb770
Jan 29 04:49:59 alg kernel: [213517.790694] Call Trace:
Jan 29 04:49:59 alg kernel: [213517.790746]  [<ffffffffa028551c>] ? dispatch+0x3e4/0x55e [libceph]
Jan 29 04:49:59 alg kernel: [213517.790810]  [<ffffffffa02800d9>] ? con_work+0xf6e/0x1a65 [libceph]
Jan 29 04:49:59 alg kernel: [213517.790879]  [<ffffffff81003327>] ? arch_local_irq_restore+0x7/0x8
Jan 29 04:49:59 alg kernel: [213517.790937]  [<ffffffff810026fa>] ? xen_end_context_switch+0xa/0x14
Jan 29 04:49:59 alg kernel: [213517.790997]  [<ffffffff810070ba>] ? __switch_to+0x13e/0x3c2
Jan 29 04:49:59 alg kernel: [213517.791065]  [<ffffffff8104d6de>] ? mmdrop+0xd/0x1c
Jan 29 04:49:59 alg kernel: [213517.791114]  [<ffffffff8104dfdf>] ? finish_task_switch+0x4d/0x83
Jan 29 04:49:59 alg kernel: [213517.791177]  [<ffffffff81043589>] ? process_one_work+0x15a/0x215
Jan 29 04:49:59 alg kernel: [213517.791239]  [<ffffffff81043a0e>] ? worker_thread+0x139/0x1de
Jan 29 04:49:59 alg kernel: [213517.791306]  [<ffffffff810438d5>] ? rescuer_thread+0x26e/0x26e
Jan 29 04:49:59 alg kernel: [213517.791355]  [<ffffffff81047b78>] ? kthread+0x7d/0x85
Jan 29 04:49:59 alg kernel: [213517.791411]  [<ffffffff81047afb>] ? __kthread_parkme+0x59/0x59
Jan 29 04:49:59 alg kernel: [213517.791473]  [<ffffffff8135b67c>] ? ret_from_fork+0x7c/0xb0
Jan 29 04:49:59 alg kernel: [213517.791530]  [<ffffffff81047afb>] ? __kthread_parkme+0x59/0x59
Jan 29 04:49:59 alg kernel: [213517.791586] Code: 0f e1 0f 0b 44 3b 65 40 73 23 48 c7 c1 5b da 25 a0 ba 59 08 00 00 31 c0 48 c7 c6 60 e9 25 a0 48 c7 c7 1f d1 25 a0 e8 f7 6f 0f e1 <0f> 0b 48 8d 45 3c 41 b5 01 48 89 c7 48 89 04 24 e8 84 c4 0f e1 
Jan 29 04:49:59 alg kernel: [213517.792247] RIP  [<ffffffffa025a449>] rbd_img_obj_callback+0x10b/0x3cb [rbd]
Jan 29 04:49:59 alg kernel: [213517.792335]  RSP <ffff88022fef3ce8>
Jan 29 04:49:59 alg kernel: [213517.793057] ---[ end trace 9bf9554b68cf1b19 ]---
#7

Updated by Ian Colle about 10 years ago

  • Status changed from Pending Backport to New
#8

Updated by Olivier Bonvalet about 10 years ago

Hi,

Not sure whether it's the same bug:

Assertion failure in rbd_img_obj_callback() at line 2133
rbd_assert(img_request != NULL);

Mar  3 02:40:56 alg kernel: [259891.212399] 
Mar  3 02:40:56 alg kernel: [259891.212399] Assertion failure in rbd_img_obj_callback() at line 2133:
Mar  3 02:40:56 alg kernel: [259891.212399] 
Mar  3 02:40:56 alg kernel: [259891.212399]     rbd_assert(img_request != NULL);
Mar  3 02:40:56 alg kernel: [259891.212399] 
Mar  3 02:40:56 alg kernel: [259891.212665] ------------[ cut here ]------------
Mar  3 02:40:56 alg kernel: [259891.212713] kernel BUG at drivers/block/rbd.c:2133!
Mar  3 02:40:56 alg kernel: [259891.212761] invalid opcode: 0000 [#1] SMP 
Mar  3 02:40:56 alg kernel: [259891.212816] Modules linked in: cbc rbd libceph xen_gntdev xt_physdev iptable_filter ip_tables x_tables xfs libcrc32c bridge loop coretemp ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support microcode sb_edac edac_core lpc_ich mfd_core i2c_i801 evdev wmi button dm_mod hid_generic usbhid hid sg sd_mod ses enclosure crc_t10dif crc32c_intel isci megaraid_sas ahci libsas libahci libata scsi_transport_sas ehci_pci ehci_hcd scsi_mod usbcore ixgbe usb_common igb i2c_algo_bit mdio i2c_core ptp pps_core
Mar  3 02:40:56 alg kernel: [259891.213456] CPU: 3 PID: 9707 Comm: kworker/3:3 Not tainted 3.10-dae-dom0 #19
Mar  3 02:40:56 alg kernel: [259891.213512] Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.01.03.0002.062020121504 06/20/2012
Mar  3 02:40:56 alg kernel: [259891.213610] Workqueue: ceph-msgr con_work [libceph]
Mar  3 02:40:56 alg kernel: [259891.213662] task: ffff880232c3f080 ti: ffff880097e20000 task.ti: ffff880097e20000
Mar  3 02:40:56 alg kernel: [259891.213744] RIP: e030:[<ffffffffa028a3a7>]  [<ffffffffa028a3a7>] rbd_img_obj_callback+0x69/0x3cb [rbd]
Mar  3 02:40:56 alg kernel: [259891.213836] RSP: e02b:ffff880097e21ce8  EFLAGS: 00010282
Mar  3 02:40:56 alg kernel: [259891.213885] RAX: 000000000000005e RBX: ffff880106e5d768 RCX: 0000000000000000
Mar  3 02:40:56 alg kernel: [259891.213967] RDX: ffff880237e6e8c0 RSI: ffff880237e6de68 RDI: ffff880097e202a8
Mar  3 02:40:56 alg kernel: [259891.214048] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
Mar  3 02:40:56 alg kernel: [259891.214130] R10: 0000000000000000 R11: 000000000000153d R12: 0000000000000001
Mar  3 02:40:56 alg kernel: [259891.214211] R13: 0000000000000000 R14: ffff8801022438c0 R15: 0000000000000000
Mar  3 02:40:56 alg kernel: [259891.214296] FS:  00007f08a30cd720(0000) GS:ffff880237e60000(0000) knlGS:0000000000000000
Mar  3 02:40:56 alg kernel: [259891.214380] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar  3 02:40:56 alg kernel: [259891.214431] CR2: 00007f08a1d64140 CR3: 000000000160c000 CR4: 0000000000042660
Mar  3 02:40:56 alg kernel: [259891.214513] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar  3 02:40:56 alg kernel: [259891.214595] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar  3 02:40:56 alg kernel: [259891.214676] Stack:
Mar  3 02:40:56 alg kernel: [259891.214714]  000000000000000d ffff8800bd346705 ffffffffffffffff ffff8800bd3466d0
Mar  3 02:40:56 alg kernel: [259891.214813]  ffff8801022438b8 ffff880232c27718 0000000000000000 ffff8801022438c0
Mar  3 02:40:56 alg kernel: [259891.214913]  0000000000000000 ffffffffa024650e 0000000000000015 ffff880232c27770
Mar  3 02:40:56 alg kernel: [259891.215012] Call Trace:
Mar  3 02:40:56 alg kernel: [259891.215057]  [<ffffffffa024650e>] ? dispatch+0x3e4/0x55e [libceph]
Mar  3 02:40:56 alg kernel: [259891.215111]  [<ffffffffa02410d9>] ? con_work+0xf6e/0x1a65 [libceph]
Mar  3 02:40:56 alg kernel: [259891.215169]  [<ffffffff8104d8a5>] ? mmdrop+0xd/0x1c
Mar  3 02:40:56 alg kernel: [259891.215219]  [<ffffffff8104e1a6>] ? finish_task_switch+0x4d/0x83
Mar  3 02:40:56 alg kernel: [259891.215273]  [<ffffffff8104374b>] ? process_one_work+0x15a/0x215
Mar  3 02:40:56 alg kernel: [259891.215326]  [<ffffffff81043bd0>] ? worker_thread+0x139/0x1de
Mar  3 02:40:56 alg kernel: [259891.215377]  [<ffffffff81043a97>] ? rescuer_thread+0x26e/0x26e
Mar  3 02:40:56 alg kernel: [259891.215430]  [<ffffffff81047d3a>] ? kthread+0x7d/0x85
Mar  3 02:40:56 alg kernel: [259891.215479]  [<ffffffff81047cbd>] ? __kthread_parkme+0x59/0x59
Mar  3 02:40:56 alg kernel: [259891.215534]  [<ffffffff8136217c>] ? ret_from_fork+0x7c/0xb0
Mar  3 02:40:56 alg kernel: [259891.215585]  [<ffffffff81047cbd>] ? __kthread_parkme+0x59/0x59
Mar  3 02:40:56 alg kernel: [259891.215635] Code: 0b 48 8b 6b 20 48 85 ed 75 23 48 c7 c1 75 d3 28 a0 ba 55 08 00 00 31 c0 48 c7 c6 60 e9 28 a0 48 c7 c7 1f d1 28 a0 e8 59 db 0c e1 <0f> 0b 8b 45 5c 85 c0 75 21 48 c7 c1 66 d8 28 a0 ba 56 08 00 00 
Mar  3 02:40:56 alg kernel: [259891.216071] RIP  [<ffffffffa028a3a7>] rbd_img_obj_callback+0x69/0x3cb [rbd]
Mar  3 02:40:56 alg kernel: [259891.216129]  RSP <ffff880097e21ce8>
Mar  3 02:40:56 alg kernel: [259891.216565] ---[ end trace 9e0cd662b1f708da ]---
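This assertion fires earlier in rbd_img_obj_callback() than the original one (line 2133 here, versus line 2137 for the `which >= next_completion` check in the previous 3.10 report): the completed object request no longer points at any image request. One plausible reading, offered as an assumption rather than a diagnosis, is that the object request had already been detached from its image request when the reply was delivered, which would fit the duplicate or late completion suggested by the first assertion. The minimal model below illustrates that lifecycle; `struct obj_request`, `img_obj_request_add()`, `img_obj_request_del()` and `obj_callback_ok()` are hypothetical names, not the real drivers/block/rbd.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; not the real drivers/block/rbd.c. */
struct img_request {
    unsigned int obj_request_count;   /* object requests still attached */
};

struct obj_request {
    struct img_request *img_request;  /* back-pointer, cleared on detach */
};

/* Attach an object request to its image request. */
static void img_obj_request_add(struct img_request *img, struct obj_request *obj)
{
    obj->img_request = img;
    img->obj_request_count++;
}

/* Detach it again, e.g. when the image request is torn down. */
static void img_obj_request_del(struct img_request *img, struct obj_request *obj)
{
    obj->img_request = NULL;
    img->obj_request_count--;
}

/* Returns false in the situation where the kernel would hit
 * rbd_assert(img_request != NULL): a completion arriving for an
 * object request that is no longer attached to any image request. */
static bool obj_callback_ok(const struct obj_request *obj)
{
    return obj->img_request != NULL;
}
```

Attaching a request, detaching it, and only then running the callback makes `obj_callback_ok()` return false, which corresponds to the BUG in the log above.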
#9

Updated by Ian Colle about 10 years ago

  • Assignee set to Ilya Dryomov
#10

Updated by Olivier Bonvalet about 10 years ago

Hi,

Same thing with a 3.13.5 kernel:

Mar 18 16:31:19 murmillia kernel: [88362.548693] 
Mar 18 16:31:19 murmillia kernel: [88362.548693] Assertion failure in rbd_img_obj_callback() at line 2127:
Mar 18 16:31:19 murmillia kernel: [88362.548693] 
Mar 18 16:31:19 murmillia kernel: [88362.548693]        rbd_assert(img_request != NULL);
Mar 18 16:31:19 murmillia kernel: [88362.548693] 
Mar 18 16:31:19 murmillia kernel: [88362.549066] ------------[ cut here ]------------
Mar 18 16:31:19 murmillia kernel: [88362.549106] kernel BUG at drivers/block/rbd.c:2127!
Mar 18 16:31:19 murmillia kernel: [88362.549145] invalid opcode: 0000 [#1] SMP 
Mar 18 16:31:19 murmillia kernel: [88362.549208] Modules linked in: cbc rbd libceph xen_gntdev ip6t_REJECT ip6table_filter ip6table_mangle ip6_tables xt_LOG xt_physdev ipt_REJECT iptable_filter xt_tcpudp xt_DSCP iptable_mangle ip_tables x_tables xfs libcrc32c bridge loop iTCO_wdt gpio_ich iTCO_vendor_support serio_raw sb_edac edac_core evdev i2c_i801 lpc_ich mfd_core ioatdma ipmi_si ipmi_msghandler wmi ac button shpchp dm_mod hid_generic usbhid hid sg sd_mod crc_t10dif crct10dif_common megaraid_sas isci ahci libahci libsas libata igb scsi_transport_sas ehci_pci ehci_hcd scsi_mod ixgbe i2c_algo_bit usbcore i2c_core usb_common dca ptp pps_core mdio
Mar 18 16:31:19 murmillia kernel: [88362.549908] CPU: 0 PID: 21511 Comm: kworker/0:1 Not tainted 3.13-dae-dom0 #1
Mar 18 16:31:19 murmillia kernel: [88362.549955] Hardware name: Supermicro X9DRW-7TPF+/X9DRW-7TPF+, BIOS 2.0a 03/11/2013
Mar 18 16:31:19 murmillia kernel: [88362.550034] Workqueue: ceph-msgr con_work [libceph]
Mar 18 16:31:19 murmillia kernel: [88362.550087] task: ffff880073c91150 ti: ffff880241c98000 task.ti: ffff880241c98000
Mar 18 16:31:19 murmillia kernel: [88362.550156] RIP: e030:[<ffffffffa033aac0>]  [<ffffffffa033aac0>] rbd_img_obj_callback+0x69/0x3cb [rbd]
Mar 18 16:31:19 murmillia kernel: [88362.550240] RSP: e02b:ffff880241c99ce8  EFLAGS: 00010282
Mar 18 16:31:19 murmillia kernel: [88362.550282] RAX: 000000000000005e RBX: ffff88024e1506c8 RCX: 0000000000000000
Mar 18 16:31:19 murmillia kernel: [88362.550329] RDX: ffff88027fc0eb50 RSI: ffff88027fc0e1a8 RDI: ffff880241c902a8
Mar 18 16:31:19 murmillia kernel: [88362.550375] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
Mar 18 16:31:19 murmillia kernel: [88362.550432] R10: 0000000000000000 R11: 00000000000003c8 R12: 00000000ffffffff
Mar 18 16:31:19 murmillia kernel: [88362.550475] R13: 0000000000000000 R14: ffff880273196410 R15: 0000000000000000
Mar 18 16:31:19 murmillia kernel: [88362.550541] FS:  00007fd2081a1700(0000) GS:ffff88027fc00000(0000) knlGS:0000000000000000
Mar 18 16:31:19 murmillia kernel: [88362.550623] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 18 16:31:19 murmillia kernel: [88362.550676] CR2: 00007f54d655f000 CR3: 00000002661e4000 CR4: 0000000000042660
Mar 18 16:31:19 murmillia kernel: [88362.550725] Stack:
Mar 18 16:31:19 murmillia kernel: [88362.550762]  ffffffffa033aa57 ffff880273196640 0000000000002201 ffff88025bd06788
Mar 18 16:31:19 murmillia kernel: [88362.550875]  ffff880273196408 ffff8802662ce718 0000000000000000 ffff880273196410
Mar 18 16:31:19 murmillia kernel: [88362.550980]  0000000000000000 ffffffffa0319594 0000000000000025 ffff8802662ce770
Mar 18 16:31:19 murmillia kernel: [88362.551088] Call Trace:
Mar 18 16:31:19 murmillia kernel: [88362.551139]  [<ffffffffa033aa57>] ? rbd_watch_cb+0xc1/0xc1 [rbd]
Mar 18 16:31:19 murmillia kernel: [88362.551200]  [<ffffffffa0319594>] ? dispatch+0x3e4/0x55e [libceph]
Mar 18 16:31:19 murmillia kernel: [88362.551258]  [<ffffffffa03140fc>] ? con_work+0xf6e/0x1a65 [libceph]
Mar 18 16:31:19 murmillia kernel: [88362.551320]  [<ffffffff81005f00>] ? xen_timer_resume+0x4f/0x4f
Mar 18 16:31:19 murmillia kernel: [88362.551383]  [<ffffffff81051f26>] ? mmdrop+0xd/0x1c
Mar 18 16:31:19 murmillia kernel: [88362.551442]  [<ffffffff81052601>] ? finish_task_switch+0x4d/0x83
Mar 18 16:31:19 murmillia kernel: [88362.551497]  [<ffffffff8104847a>] ? process_one_work+0x15a/0x214
Mar 18 16:31:19 murmillia kernel: [88362.551553]  [<ffffffff810488fe>] ? worker_thread+0x139/0x1de
Mar 18 16:31:19 murmillia kernel: [88362.551606]  [<ffffffff810487c5>] ? rescuer_thread+0x26e/0x26e
Mar 18 16:31:19 murmillia kernel: [88362.551669]  [<ffffffff8104cf99>] ? kthread+0x9e/0xa6
Mar 18 16:31:19 murmillia kernel: [88362.551718]  [<ffffffff8104cefb>] ? __kthread_parkme+0x55/0x55
Mar 18 16:31:19 murmillia kernel: [88362.551780]  [<ffffffff8137200c>] ? ret_from_fork+0x7c/0xb0
Mar 18 16:31:19 murmillia kernel: [88362.551829]  [<ffffffff8104cefb>] ? __kthread_parkme+0x55/0x55
Mar 18 16:31:19 murmillia kernel: [88362.551877] Code: 0b 48 8b 6b 20 48 85 ed 75 23 48 c7 c1 5e d3 33 a0 ba 4f 08 00 00 31 c0 48 c7 c6 80 e9 33 a0 48 c7 c7 1f d1 33 a0 e8 43 d0 02 e1 <0f> 0b 8b 45 5c 85 c0 75 21 48 c7 c1 66 d8 33 a0 ba 50 08 00 00 
Mar 18 16:31:19 murmillia kernel: [88362.552517] RIP  [<ffffffffa033aac0>] rbd_img_obj_callback+0x69/0x3cb [rbd]
Mar 18 16:31:19 murmillia kernel: [88362.552589]  RSP <ffff880241c99ce8>
Mar 18 16:31:19 murmillia kernel: [88362.554220] ---[ end trace 6935c08c0172485b ]---

#11

Updated by Olivier Bonvalet about 10 years ago

And in a 3.13.5 kernel too:

Mar 18 02:00:01 rurkh kernel: [1231536.338315] xen:balloon: reserve_additional_memory: add_memory() failed: -17
Mar 18 05:46:52 rurkh kernel: [1245147.503380] 
Mar 18 05:46:52 rurkh kernel: [1245147.503380] Assertion failure in rbd_img_obj_callback() at line 2131:
Mar 18 05:46:52 rurkh kernel: [1245147.503380] 
Mar 18 05:46:52 rurkh kernel: [1245147.503380]  rbd_assert(which >= img_request->next_completion);
Mar 18 05:46:52 rurkh kernel: [1245147.503380] 
Mar 18 05:46:52 rurkh kernel: [1245147.503659] ------------[ cut here ]------------
Mar 18 05:46:52 rurkh kernel: [1245147.503699] kernel BUG at drivers/block/rbd.c:2131!
Mar 18 05:46:52 rurkh kernel: [1245147.503740] invalid opcode: 0000 [#1] SMP 
Mar 18 05:46:52 rurkh kernel: [1245147.503785] Modules linked in: cbc rbd libceph xen_gntdev xt_physdev iptable_filter ip_tables x_tables xfs libcrc32c bridge loop iTCO_wdt gpio_ich iTCO_vendor_support serio_raw sb_edac edac_core i2c_i801 lpc_ich evdev mfd_core ioatdma shpchp ipmi_si ipmi_msghandler wmi ac button dm_mod hid_generic usbhid hid sg sd_mod crc_t10dif crct10dif_common ahci isci libahci megaraid_sas libsas libata ehci_pci scsi_transport_sas ehci_hcd igb usbcore i2c_algo_bit scsi_mod ixgbe i2c_core usb_common dca ptp pps_core mdio
Mar 18 05:46:52 rurkh kernel: [1245147.504283] CPU: 0 PID: 25525 Comm: kworker/0:1 Not tainted 3.13-dae-dom0 #1
Mar 18 05:46:52 rurkh kernel: [1245147.504348] Hardware name: Supermicro X9DRW-7TPF+/X9DRW-7TPF+, BIOS 3.0 07/24/2013
Mar 18 05:46:52 rurkh kernel: [1245147.504420] Workqueue: ceph-msgr con_work [libceph]
Mar 18 05:46:52 rurkh kernel: [1245147.504462] task: ffff8802575320d0 ti: ffff8802401a0000 task.ti: ffff8802401a0000
Mar 18 05:46:52 rurkh kernel: [1245147.504528] RIP: e030:[<ffffffffa02ffb62>]  [<ffffffffa02ffb62>] rbd_img_obj_callback+0x10b/0x3cb [rbd]
Mar 18 05:46:52 rurkh kernel: [1245147.504604] RSP: e02b:ffff8802401a1ce8  EFLAGS: 00010282
Mar 18 05:46:52 rurkh kernel: [1245147.504644] RAX: 0000000000000070 RBX: ffff88006e7ba108 RCX: 0000000000000000
Mar 18 05:46:52 rurkh kernel: [1245147.504709] RDX: ffff88027fe0eb50 RSI: ffff88027fe0e1a8 RDI: ffff8802401a02a8
Mar 18 05:46:52 rurkh kernel: [1245147.504773] RBP: ffff88023c8f41f0 R08: 0000000000000000 R09: 0000000000000000
Mar 18 05:46:52 rurkh kernel: [1245147.504838] R10: 0000000000000000 R11: 0000000000000bbc R12: 0000000000000001
Mar 18 05:46:52 rurkh kernel: [1245147.504902] R13: 0000000000000000 R14: ffff880250951bf0 R15: 0000000000000000
Mar 18 05:46:52 rurkh kernel: [1245147.504972] FS:  00007fb72ad39700(0000) GS:ffff88027fe00000(0000) knlGS:0000000000000000
Mar 18 05:46:52 rurkh kernel: [1245147.505039] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 18 05:46:52 rurkh kernel: [1245147.505080] CR2: 00007fc8a15e7000 CR3: 00000000028b0000 CR4: 0000000000042660
Mar 18 05:46:52 rurkh kernel: [1245147.505145] Stack:
Mar 18 05:46:52 rurkh kernel: [1245147.505176]  000000000000000d ffff88007b7b1efd ffffffffffffffff ffff88007b7b1ec8
Mar 18 05:46:52 rurkh kernel: [1245147.505257]  ffff880250951be8 ffff8802730db718 0000000000000000 ffff880250951bf0
Mar 18 05:46:52 rurkh kernel: [1245147.505337]  0000000000000000 ffffffffa02de594 0000000000000015 ffff8802730db770
Mar 18 05:46:52 rurkh kernel: [1245147.505416] Call Trace:
Mar 18 05:46:52 rurkh kernel: [1245147.505456]  [<ffffffffa02de594>] ? dispatch+0x3e4/0x55e [libceph]
Mar 18 05:46:52 rurkh kernel: [1245147.505503]  [<ffffffffa02d90fc>] ? con_work+0xf6e/0x1a65 [libceph]
Mar 18 05:46:52 rurkh kernel: [1245147.505549]  [<ffffffff81051f26>] ? mmdrop+0xd/0x1c
Mar 18 05:46:52 rurkh kernel: [1245147.505590]  [<ffffffff81052601>] ? finish_task_switch+0x4d/0x83
Mar 18 05:46:52 rurkh kernel: [1245147.505635]  [<ffffffff8104847a>] ? process_one_work+0x15a/0x214
Mar 18 05:46:52 rurkh kernel: [1245147.505678]  [<ffffffff810488fe>] ? worker_thread+0x139/0x1de
Mar 18 05:46:52 rurkh kernel: [1245147.505720]  [<ffffffff810487c5>] ? rescuer_thread+0x26e/0x26e
Mar 18 05:46:52 rurkh kernel: [1245147.505764]  [<ffffffff8104cf99>] ? kthread+0x9e/0xa6
Mar 18 05:46:52 rurkh kernel: [1245147.505804]  [<ffffffff8104cefb>] ? __kthread_parkme+0x55/0x55
Mar 18 05:46:52 rurkh kernel: [1245147.505847]  [<ffffffff8137200c>] ? ret_from_fork+0x7c/0xb0
Mar 18 05:46:52 rurkh kernel: [1245147.505890]  [<ffffffff8104cefb>] ? __kthread_parkme+0x55/0x55
Mar 18 05:46:52 rurkh kernel: [1245147.505930] Code: 06 e1 0f 0b 44 3b 65 40 73 23 48 c7 c1 2e 2c 30 a0 ba 53 08 00 00 31 c0 48 c7 c6 80 39 30 a0 48 c7 c7 1f 21 30 a0 e8 a1 7f 06 e1 <0f> 0b 48 8d 45 3c 41 b5 01 48 89 c7 48 89 04 24 e8 9a d7 06 e1 
Mar 18 05:46:52 rurkh kernel: [1245147.506317] RIP  [<ffffffffa02ffb62>] rbd_img_obj_callback+0x10b/0x3cb [rbd]
Mar 18 05:46:52 rurkh kernel: [1245147.506386]  RSP <ffff8802401a1ce8>
Mar 18 05:46:52 rurkh kernel: [1245147.506859] ---[ end trace 6fa8b6ece62ae5c0 ]---

#12

Updated by Ilya Dryomov about 10 years ago

Hi Olivier,

Can you attach the entire dmesg or at least a few minutes worth of
messages prior to the assertion failure splat in each case? Also, can
you describe your workload and what exactly was happening at that time?

#13

Updated by Olivier Bonvalet about 10 years ago

Hi,

Well, there is nothing else in dmesg, except for messages from several hours before this hang.

These servers each run about 30 VMs (Xen paravirtualisation), each VM using 5 RBD images, so about 150 RBD images are mapped on each server. As far as I can tell, nothing "special" happens at these times: snapshots are created and removed only at night, and I get hangs at all hours.

These VMs are essentially small LAMP servers.

In case it helps, the cluster stats are: pgmap v39011655: 5168 pgs: 5168 active+clean; 14347 GB data, 51970 GB used, 28264 GB / 80235 GB avail; 4013KB/s rd, 67873KB/s wr, 2679op/s

Note that servers "murmillia" and "rurkh" are also MON servers.

#14

Updated by Ian Colle about 10 years ago

  • Status changed from New to In Progress
#15

Updated by Olivier Bonvalet about 10 years ago

I haven't had this problem again; it seems really stable for me now. Thanks!

I think the issue can be marked as resolved.

#16

Updated by Ilya Dryomov about 10 years ago

Just in case, the fix you are running with is now in 3.14. However we
are still working on a better fix, so we'll keep this open for a while.

#17

Updated by Ilya Dryomov almost 10 years ago

  • Status changed from In Progress to Resolved

Should be fixed by commit 0f2d5be792b0 ("rbd: use reference counts for image requests"), which went into 3.16-rc1.

#18

Updated by sean redmond almost 10 years ago

Olivier Bonvalet wrote:

I haven't had this problem again; it seems really stable for me now. Thanks!

I think the issue can be marked as resolved.

How did you get around this bug? By applying the patch to 3.13.5, or by upgrading to a newer kernel?

#19

Updated by sean redmond almost 10 years ago

Ilya Dryomov wrote:

Just in case, the fix you are running with is now in 3.14. However we
are still working on a better fix, so we'll keep this open for a while.

Which 3.14 release is this fix applied to, please? I was not able to see anything in the changelogs.

#20

Updated by sean redmond almost 10 years ago

sean redmond wrote:

Ilya Dryomov wrote:

Just in case, the fix you are running with is now in 3.14. However we
are still working on a better fix, so we'll keep this open for a while.

Which 3.14 release is this fix applied to, please? I was not able to see anything in the changelogs.

Sorry, please ignore this; I can see the patch applied in 3.14.8 at least.
