Bug #6267
krbd: null deref in __kick_osd_requests+0x15e/0x1b0 (closed)
Description
[639680.982539] BUG: unable to handle kernel NULL pointer dereference at 0000000000000498
[639680.986988] IP: [<ffffffffa01cf1ae>] __kick_osd_requests+0x15e/0x1b0 [libceph]
[639680.989983] PGD 10d2139067 PUD 10d546c067 PMD 0
[639680.989983] Oops: 0000 [#1] SMP
[639680.989983] Modules linked in: xt_nat xt_mark iptable_mangle dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag ipt_REJECT xt_LOG xt_limit xt_recent xt_state xt_REDIRECT xt_tcpudp iptable_filter xt_addrtype veth aufs ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables ebt_ip ebtable_broute ebtable_nat ebtable_filter ebtables x_tables bridge stp llc ip_gre gre dm_crypt rbd libceph psmouse libcrc32c microcode joydev i2c_piix4 mac_hid serio_raw virtio_balloon acpiphp hid_generic usbhid hid crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 xts lrw gf128mul ablk_helper floppy cryptd cirrus syscopyarea sysfillrect sysimgblt ttm drm_kms_helper drm
[639681.016598] CPU 4
[639681.016598] Pid: 27857, comm: kworker/4:1 Tainted: G W 3.9.00generic #4userns5 OpenStack Foundation OpenStack Nova
[639681.016598] RIP: 0010:[<ffffffffa01cf1ae>] [<ffffffffa01cf1ae>] __kick_osd_requests+0x15e/0x1b0 [libceph]
[639681.016598] RSP: 0018:ffff880eb4fc9d28 EFLAGS: 00010206
[639681.016598] RAX: 0000000000000000 RBX: ffff8810d5ff0738 RCX: ffff880ffc440db0
[639681.016598] RDX: ffff8810d5ff07f0 RSI: ffff880ffc440d80 RDI: ffff8810d5ff0738
[639681.016598] RBP: ffff880eb4fc9d78 R08: 000000000000000a R09: 0000000000000000
[639681.045466] R10: 00000000000298f7 R11: 00000000000298f6 R12: ffff880ffc440d80
[639681.045466] R13: ffff880e90298ca0 R14: ffff880ffc440d80 R15: ffff880ffc440da0
[639681.045466] FS: 0000000000000000(0000) GS:ffff881139700000(0000) knlGS:0000000000000000
[639681.045466] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[639681.045466] CR2: 0000000000000498 CR3: 00000010d2122000 CR4: 00000000000006e0
[639681.065324] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[639681.065324] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[639681.065324] Process kworker/4:1 (pid: 27857, threadinfo ffff880eb4fc8000, task ffff880ec5e10000)
[639681.065324] Stack:
[639681.065324] 0000000000000000 ffff8810d5ff07f0 ffff880ebec37640 ffff88103d8072e0
[639681.065324] ffff880eb4fc9d68 ffff8810d5ff0738 ffff8810d5ff0748 ffff8810d5ff0790
[639681.065324] ffff880e90298800 0000000000000000 ffff880eb4fc9da8 ffffffffa01cf24c
[639681.065324] Call Trace:
[639681.065324] [<ffffffffa01cf24c>] osd_reset+0x4c/0x80 [libceph]
[639681.065324] [<ffffffffa01ca576>] con_work+0x126/0x240 [libceph]
[639681.065324] [<ffffffff8107a5db>] process_one_work+0x16b/0x400
[639681.065324] [<ffffffff8107b298>] worker_thread+0x118/0x350
[639681.065324] [<ffffffff8107b180>] ? manage_workers+0x120/0x120
[639681.109419] [<ffffffff810808e0>] kthread+0xc0/0xd0
[639681.109419] [<ffffffff81080820>] ? flush_kthread_worker+0xb0/0xb0
[639681.109419] [<ffffffff8171816c>] ret_from_fork+0x7c/0xb0
[639681.109419] [<ffffffff81080820>] ? flush_kthread_worker+0xb0/0xb0
[639681.125066] Code: 00 00 00 48 8b 55 b8 49 8d 4c 24 30 4c 89 bb c0 00 00 00 4c 89 e6 48 89 df 49 89 44 24 28 49 89 54 24 20 4c 89 38 49 8b 44 24 60 <48> 8b 90 98 04 00 00 48 89 88 98 04 00 00 48 05 90 04 00 00 49
[639681.125066] RIP [<ffffffffa01cf1ae>] __kick_osd_requests+0x15e/0x1b0 [libceph]
[639681.137488] RSP <ffff880eb4fc9d28>
[639681.137488] CR2: 0000000000000498
[639681.167212] ---[ end trace f707d561f7dc2318 ]---
[639681.171811] BUG: unable to handle kernel paging request at ffffffffffffffd8
[639681.174826] IP: [<ffffffff81080d10>] kthread_data+0x10/0x20
[639681.174826] PGD 1c10067 PUD 1c12067 PMD 0
[639681.174826] Oops: 0000 [#2] SMP
[639681.174826] Modules linked in: xt_nat xt_mark iptable_mangle dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag ipt_REJECT xt_LOG xt_limit xt_recent xt_state xt_REDIRECT xt_tcpudp iptable_filter xt_addrtype veth aufs ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables ebt_ip ebtable_broute ebtable_nat ebtable_filter ebtables x_tables bridge stp llc ip_gre gre dm_crypt rbd libceph psmouse libcrc32c microcode joydev i2c_piix4 mac_hid serio_raw virtio_balloon acpiphp hid_generic usbhid hid crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 xts lrw gf128mul ablk_helper floppy cryptd cirrus syscopyarea sysfillrect sysimgblt ttm drm_kms_helper drm
[639681.174826] CPU 4
[639681.174826] Pid: 27857, comm: kworker/4:1 Tainted: G D W 3.9.00generic #4userns5 OpenStack Foundation OpenStack Nova
[639681.174826] RIP: 0010:[<ffffffff81080d10>] [<ffffffff81080d10>] kthread_data+0x10/0x20
[639681.174826] RSP: 0018:ffff880eb4fc9918 EFLAGS: 00010096
[639681.174826] RAX: 0000000000000000 RBX: 0000000000000004 RCX: ffffffff81ebf020
[639681.174826] RDX: 0000000000000009 RSI: 0000000000000004 RDI: ffff880ec5e10000
[639681.174826] RBP: ffff880eb4fc9918 R08: 0000000000000001 R09: ffffea003729f800
[639681.174826] R10: 0000000000000000 R11: ffff8810d26f1f20 R12: 0000000000000004
[639681.174826] R13: ffff880ec5e10400 R14: 0000000000000001 R15: 0000000000000246
[639681.174826] FS: 0000000000000000(0000) GS:ffff881139700000(0000) knlGS:0000000000000000
[639681.174826] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[639681.174826] CR2: ffffffffffffffd8 CR3: 0000000d5a786000 CR4: 00000000000006e0
[639681.174826] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[639681.174826] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[639681.174826] Process kworker/4:1 (pid: 27857, threadinfo ffff880eb4fc8000, task ffff880ec5e10000)
[639681.174826] Stack:
[639681.174826] ffff880eb4fc9938 ffffffff8107bdd5 ffff880eb4fc9938 ffff881139714240
[639681.174826] ffff880eb4fc99b8 ffffffff8170d6af 0000000000000003 ffffea003729fa00
[639681.174826] ffff880eb4fc9fd8 ffff880eb4fc9fd8 ffff880eb4fc9fd8 0000000000014240
[639681.174826] Call Trace:
[639681.174826] [<ffffffff8107bdd5>] wq_worker_sleeping+0x15/0x80
[639681.174826] [<ffffffff8170d6af>] __schedule+0x5ef/0x6d0
[639681.174826] [<ffffffff8170e269>] schedule+0x29/0x70
[639681.174826] [<ffffffff81060e0b>] do_exit+0x2bb/0x480
[639681.174826] [<ffffffff81710669>] oops_end+0xb9/0x100
[639681.174826] [<ffffffff816f56c6>] no_context+0x1ab/0x1ba
[639681.174826] [<ffffffff816f58a8>] __bad_area_nosemaphore+0x1d3/0x1f2
[639681.174826] [<ffffffff8105b3a7>] ? print_time.part.5+0x67/0x90
[639681.174826] [<ffffffff8105b447>] ? print_prefix+0x77/0xc0
[639681.174826] [<ffffffff816f58da>] bad_area_nosemaphore+0x13/0x15
[639681.174826] [<ffffffff817135e2>] __do_page_fault+0x3c2/0x560
[639681.174826] [<ffffffff8105ce04>] ? wake_up_klogd+0x34/0x40
[639681.174826] [<ffffffff8105d057>] ? console_unlock.part.9+0x247/0x270
[639681.174826] [<ffffffff8105d09a>] ? console_unlock+0x1a/0x30
[639681.174826] [<ffffffff817137ab>] do_page_fault+0x2b/0x50
[639681.174826] [<ffffffff81712e45>] do_async_page_fault+0x35/0xa0
[639681.174826] [<ffffffff8170fa48>] async_page_fault+0x28/0x30
[639681.174826] [<ffffffffa01cf1ae>] ? __kick_osd_requests+0x15e/0x1b0 [libceph]
[639681.174826] [<ffffffffa01cf17f>] ? __kick_osd_requests+0x12f/0x1b0 [libceph]
[639681.174826] [<ffffffffa01cf24c>] osd_reset+0x4c/0x80 [libceph]
[639681.174826] [<ffffffffa01ca576>] con_work+0x126/0x240 [libceph]
[639681.174826] [<ffffffff8107a5db>] process_one_work+0x16b/0x400
[639681.174826] [<ffffffff8107b298>] worker_thread+0x118/0x350
[639681.174826] [<ffffffff8107b180>] ? manage_workers+0x120/0x120
[639681.174826] [<ffffffff810808e0>] kthread+0xc0/0xd0
[639681.174826] [<ffffffff81080820>] ? flush_kthread_worker+0xb0/0xb0
[639681.174826] [<ffffffff8171816c>] ret_from_fork+0x7c/0xb0
[639681.174826] [<ffffffff81080820>] ? flush_kthread_worker+0xb0/0xb0
[639681.174826] Code: 00 48 89 e5 5d 48 8b 40 c8 48 c1 e8 02 83 e0 01 c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 48 8b 87 a0 03 00 00 55 48 89 e5 <48> 8b 40 d8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
[639681.174826] RIP [<ffffffff81080d10>] kthread_data+0x10/0x20
[639681.174826] RSP <ffff880eb4fc9918>
[639681.174826] CR2: ffffffffffffffd8
[639681.174826] ---[ end trace f707d561f7dc2319 ]---
[639681.174826] Fixing recursive fault but reboot is needed!
This is Ubuntu 13.04 with kernel 3.9.00, patched to enable user namespaces.
This has happened twice today under moderate load, with about 150 RBDs mapped.
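For anyone reading the traces above, here is a rough sketch of how the "Code:" lines can be cross-checked against CR2. It is not part of the original report. The helper names (`bytes_at_fault`, `decode_load`, `fault_address`) are made up for illustration, and the decoder only handles the one instruction form that appears at the fault marker in these two oopses (REX.W `8b` load with a plain base register and an 8- or 32-bit displacement). RAX is 0 in both register dumps, so that value is passed in as the base.

```python
# Tail of each "Code:" line from the two oopses; <..> marks the faulting byte.
OOPS1_CODE = "Code: 49 8b 44 24 60 <48> 8b 90 98 04 00 00 48 89 88 98 04 00 00"
OOPS2_CODE = "Code: 48 8b 87 a0 03 00 00 55 48 89 e5 <48> 8b 40 d8 5d c3"

REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def bytes_at_fault(code_line):
    """Return the instruction bytes starting at the <..> marker."""
    tokens = code_line.split("Code:", 1)[1].split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return bytes(int(tok.strip("<>"), 16) for tok in tokens[start:])


def decode_load(insn):
    """Decode 'mov disp(%base), %dest' encoded as 48 8b /r (no SIB byte)."""
    if insn[0] != 0x48 or insn[1] != 0x8B:
        raise ValueError("not a REX.W mov load")
    modrm = insn[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if rm == 4 or mod not in (1, 2):
        raise ValueError("addressing form not handled by this sketch")
    size = 1 if mod == 1 else 4  # disp8 or disp32
    disp = int.from_bytes(insn[3:3 + size], "little", signed=True)
    return REGS[reg], REGS[rm], disp


def fault_address(code_line, base_value):
    """Return (dest, base, address the faulting load tried to read)."""
    dest, base, disp = decode_load(bytes_at_fault(code_line))
    return dest, base, (base_value + disp) % (1 << 64)


for name, line in (("oops 1", OOPS1_CODE), ("oops 2", OOPS2_CODE)):
    dest, base, addr = fault_address(line, 0)  # RAX was 0 in both dumps
    print(f"{name}: load into {dest} via {base}, address {addr:#018x}")
```

With RAX = 0, the computed addresses match the CR2 values in the two oopses (0x498 and 0xffffffffffffffd8). In the first oops, the instruction just before the marker (`49 8b 44 24 60`) loads RAX from offset 0x60 of the object R12 points to, so that pointer appears to have been NULL. Which structure fields these offsets correspond to cannot be determined from the log alone.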
Updated by Sage Weil over 10 years ago
- Subject changed from Kernel null pointer leading to panic to krbd: null deref in __kick_osd_requests+0x15e/0x1b0
- Status changed from New to Need More Info
Can you try a 3.10 kernel? There was at least one locking fix during that interval that could explain this. (Also, the 3.9 kernel is EOL.)
Updated by geraint jones over 10 years ago
Sage Weil wrote:
Can you try a 3.10 kernel? There was at least one locking fix during that interval that could explain this. (Also, the 3.9 kernel is EOL.)
I will do my best; these boxes are production :)
Give me a few days and I will do the update and see if it's still happening.
Updated by Sage Weil over 10 years ago
- Status changed from Need More Info to Resolved