Bug #16963

closed

rbd_assert(rbd_image_format_valid(rbd_dev->image_format)) on OSD death

Added by Ilya Dryomov over 7 years ago. Updated 2 months ago.

Status:
Can't reproduce
Priority:
High
Assignee:
Category:
rbd
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

On Mon, Aug 8, 2016 at 9:57 PM, Victor Payno <vpayno@gaikai.com> wrote:
> We have another problem where an RBD client was killed when an OSD was
> killed by the OOM on a server. The servers have 4.4.16 kernels.
>
> ams2 login: [789881.620147] ------------[ cut here ]------------
> [789881.625094] kernel BUG at drivers/block/rbd.c:4638!
> [789881.630311] invalid opcode: 0000 [#1] SMP
> [789881.634650] Modules linked in: rbd libceph sg rpcsec_gss_krb5
> xt_nat xt_UDPLB(O) xt_multiport xt_addrtype iptable_mangle iptable_raw
> iptable_nat nf_nat_ipv4 nf_nat ext4 jbd2 mbcache x86_pkg_temp_thermal
> gkuart(O) usbserial ie31200_edac edac_core tpm_tis raid1 crc32c_intel
> [789881.661718] CPU: 4 PID: 4111 Comm: kworker/u16:0 Tainted: G
>    O    4.7.0-vanilla-ams-3 #1
> [789881.671091] Hardware name: Quanta T6BC-S1N/T6BC, BIOS T6BC2A01 03/26/2014
> [789881.678212] Workqueue: ceph-watch-notify do_watch_notify [libceph]
> [789881.684814] task: ffff88032069ea00 ti: ffff8803f0c90000 task.ti:
> ffff8803f0c90000
> [789881.692802] RIP: 0010:[<ffffffffa016d1c9>]  [<ffffffffa016d1c9>]
> rbd_dev_header_info+0x5a9/0x940 [rbd]
> [789881.702702] RSP: 0018:ffff8803f0c93d30  EFLAGS: 00010286
> [789881.708344] RAX: 0000000000000077 RBX: ffff8802a6a63800 RCX:
> 0000000000000000
> [789881.715985] RDX: 0000000000000077 RSI: ffff88041fd0dd08 RDI:
> ffff88041fd0dd08
> [789881.723625] RBP: ffff8803f0c93d98 R08: 0000000000000030 R09:
> 0000000000000000
> [789881.731261] R10: 0000000000000000 R11: 0000000000004479 R12:
> ffff8800d6eaf000
> [789881.738899] R13: ffff8802a6a639b0 R14: 0000000000000000 R15:
> ffff880327e6e780
> [789881.746533] FS:  0000000000000000(0000) GS:ffff88041fd00000(0000)
> knlGS:0000000000000000
> [789881.755120] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [789881.761197] CR2: 00007fbb18242838 CR3: 0000000001e07000 CR4:
> 00000000001406e0
> [789881.768846] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [789881.776482] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [789881.784118] Stack:
> [789881.786457]  ffffffff8113a91a ffff88032069ea00 ffff88041fd17ef0
> ffff88041fd17ef0
> [789881.794713]  ffff88041fd17ef0 0000000000030289 ffff8803f0c93dd8
> ffffffff8113d968
> [789881.802965]  ffff8802a6a63800 ffff8800d6eaf000 ffff8802a6a639b0
> 0000000000000000
> [789881.811207] Call Trace:
> [789881.813988]  [<ffffffff8113a91a>] ? update_curr+0x8a/0x110
> [789881.819810]  [<ffffffff8113d968>] ? dequeue_task_fair+0x618/0x1150
> [789881.826321]  [<ffffffffa016d591>] rbd_dev_refresh+0x31/0xf0 [rbd]
> [789881.832760]  [<ffffffffa016d719>] rbd_watch_cb+0x29/0xa0 [rbd]
> [789881.838930]  [<ffffffffa0138fdc>] do_watch_notify+0x4c/0x80 [libceph]
> [789881.845706]  [<ffffffff811258e9>] process_one_work+0x149/0x3c0
> [789881.856639]  [<ffffffff81125bae>] worker_thread+0x4e/0x490
> [789881.862453]  [<ffffffff81125b60>] ? process_one_work+0x3c0/0x3c0
> [789881.868823]  [<ffffffff8112b1e9>] kthread+0xc9/0xe0
> [789881.874033]  [<ffffffff8185e4ff>] ret_from_fork+0x1f/0x40
> [789881.879764]  [<ffffffff8112b120>] ? kthread_create_on_node+0x170/0x170
> [789881.886618] Code: 0b 44 8b 6d b8 e9 1d ff ff ff 48 c7 c1 f0 00 17
> a0 ba 1e 12 00 00 48 c7 c6 90 0e 17 a0 48 c7 c7 20 f8 16 a0 31 c0 e8
> 8a 5d 08 e1 <0f> 0b 75 14 49 8b 7f 68 41 bd 92 ff ff ff e8 d4 e0 fc ff
> e9 dc
> [789881.911744] RIP  [<ffffffffa016d1c9>] rbd_dev_header_info+0x5a9/0x940 [rbd]
> [789881.919116]  RSP <ffff8803f0c93d30>
> [789881.922989] ---[ end trace 12b8d1c2ed74d6c1 ]---

See also http://marc.info/?l=ceph-devel&m=147022089813864&w=2.

Actions #1

Updated by Ilya Dryomov 2 months ago

  • Status changed from New to Can't reproduce

This assert is still present in the code, but the failure hasn't been seen anywhere else.
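For readers unfamiliar with the check named in the title, here is a minimal userspace sketch of what an assertion of this shape does. It is not the driver source: `sketch_assert`, `image_format_valid`, `struct sketch_dev` and `sketch_header_info` are hypothetical names, and the rule that only formats 1 and 2 are valid is an assumption about the driver, not something stated in this report. The kernel would hit BUG() where the sketch calls abort(), which matches the "kernel BUG at drivers/block/rbd.c:4638!" line in the trace.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's rbd_assert(): report, then abort. */
#define sketch_assert(expr)                                               \
	do {                                                              \
		if (!(expr)) {                                            \
			fprintf(stderr,                                   \
				"Assertion failure in %s() at line %d: %s\n", \
				__func__, __LINE__, #expr);               \
			abort(); /* the kernel would hit BUG() here */    \
		}                                                         \
	} while (0)

/* Assumption: only image formats 1 and 2 are considered valid. */
static bool image_format_valid(uint32_t image_format)
{
	return image_format == 1 || image_format == 2;
}

/* Hypothetical device struct holding only the field the check reads. */
struct sketch_dev {
	uint32_t image_format;
};

/*
 * Mirrors the shape of the failing path: validate the format before
 * using it. Returns the format that would select the header reader.
 */
static uint32_t sketch_header_info(const struct sketch_dev *dev)
{
	sketch_assert(image_format_valid(dev->image_format));
	return dev->image_format;
}
```

Under these assumptions, calling `sketch_header_info()` on a device whose `image_format` is anything other than 1 or 2 (for example, a zeroed field) would abort the process, the userspace analogue of the oops above.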
