Bug #15490
closed
rbd map vs notify race
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):
Description
[60185.245893] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[60185.246636] IP: [<ffffffffa050828a>] rbd_dev_refresh+0xfa/0x180 [rbd]
[60185.247216] PGD a778b067 PUD a7789067 PMD 0
[60185.247626] Oops: 0002 [#1] SMP
[60185.247969] Modules linked in: ext4 mbcache jbd2 rbd libceph dns_resolver xt_statistic xt_nat xt_mark veth xt_comment xt_multiport vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter nf_nat nf_conntrack bridge stp llc dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio crc32_pclmul ghash_clmulni_intel aesni_intel ppdev lrw virtio_balloon gf128mul glue_helper i2c_piix4 ablk_helper cryptd parport_pc parport pcspkr nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net virtio_blk cirrus syscopyarea sysfillrect sysimgblt crct10dif_pclmul crct10dif_common drm_kms_helper ttm crc32c_intel serio_raw ata_piix drm libata
[60185.254483] floppy virtio_pci virtio_ring i2c_core virtio dm_mirror dm_region_hash dm_log dm_mod
[60185.255186] CPU: 6 PID: 5462 Comm: kworker/u16:0 Not tainted 3.10.0-327.13.1.el7.x86_64 #1
[60185.255856] Hardware name: Red Hat OpenStack Compute, BIOS seabios-1.7.5-8.el7 04/01/2014
[60185.256567] Workqueue: ceph-watch-notify do_event_work [libceph]
[60185.257111] task: ffff880756980000 ti: ffff8805ca560000 task.ti: ffff8805ca560000
[60185.257717] RIP: 0010:[<ffffffffa050828a>] [<ffffffffa050828a>] rbd_dev_refresh+0xfa/0x180 [rbd]
[60185.258476] RSP: 0018:ffff8805ca563d90 EFLAGS: 00010246
[60185.258926] RAX: 0000000000000000 RBX: ffff880700a33800 RCX: 0000000000000000
[60185.259504] RDX: 0000000000000000 RSI: 0000000000020000 RDI: ffff880700a33848
[60185.260089] RBP: ffff8805ca563db0 R08: 0000000000017620 R09: ffff88083a397620
[60185.260669] R10: ffffea001be22200 R11: ffffffffa0502798 R12: 0000000000000000
[60185.261256] R13: 0000000000400000 R14: ffff880700a33848 R15: 0000000000000001
[60185.261840] FS: 0000000000000000(0000) GS:ffff88083a380000(0000) knlGS:0000000000000000
[60185.262514] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[60185.263000] CR2: 0000000000000050 CR3: 000000009e6a9000 CR4: 00000000001406e0
[60185.263603] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[60185.264187] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[60185.264770] Stack:
[60185.264959] ffff880700a33800 ffff88075690d180 000026c70000006a 000026c70000006a
[60185.265607] ffff8805ca563de0 ffffffffa0508344 ffff8805bf803c60 ffff88075690d180
[60185.266286] 00000000000074e4 000026c70000006a ffff8805ca563e18 ffffffffa04bd290
[60185.266944] Call Trace:
[60185.267205] [<ffffffffa0508344>] rbd_watch_cb+0x34/0x180 [rbd]
[60185.267694] [<ffffffffa04bd290>] do_event_work+0x40/0xb0 [libceph]
[60185.268230] [<ffffffff8109d5db>] process_one_work+0x17b/0x470
[60185.268709] [<ffffffff8109e3ab>] worker_thread+0x11b/0x400
[60185.269198] [<ffffffff8109e290>] ? rescuer_thread+0x400/0x400
[60185.269704] [<ffffffff810a5acf>] kthread+0xcf/0xe0
[60185.270123] [<ffffffff810b41b3>] ? finish_task_switch+0x53/0x170
[60185.270624] [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
[60185.271173] [<ffffffff81645dd8>] ret_from_fork+0x58/0x90
[60185.271621] [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
[60185.272167] Code: 43 48 02 fb 66 66 90 66 66 90 4d 85 ed 0f 85 77 ff ff ff 4c 8b ab 90 01 00 00 49 c1 ed 09 f6 05 f6 58 00 00 04 75 67 48 8b 43 10 <4c> 89 68 50 48 8b 7b 10 e8 e9 16 d1 e0 5b 44 89 e0 41 5c 41 5d
[60185.274539] RIP [<ffffffffa050828a>] rbd_dev_refresh+0xfa/0x180 [rbd]
[60185.275129] RSP <ffff8805ca563d90>
[60185.276772] CR2: 0000000000000050
So this is a NULL pointer dereference on rbd_device::disk. I couldn't fully confirm the "rbd map" part (the provided vmcore had been filtered to remove user pages), but it's the most plausible explanation.
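For readers who want to check the NULL-dereference reading against the register dump, here is a small Python sketch. It is only an illustration: the helper name parse_registers and the three-line excerpt are assumptions made for this example, with the values copied from the oops above. The faulting bytes `<4c> 89 68 50` decode to `mov %r13,0x50(%rax)`, and the preceding `48 8b 43 10` loads RAX from offset 0x10 of RBX. So a zero RAX with CR2 equal to 0x50 is consistent with a write through a NULL pointer loaded from the structure RBX points to, presumably the rbd_device.

```python
import re

# Excerpt of the oops above; values are copied verbatim from the log.
OOPS_EXCERPT = """
[60185.245893] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[60185.258926] RAX: 0000000000000000 RBX: ffff880700a33800 RCX: 0000000000000000
[60185.276772] CR2: 0000000000000050
"""


def parse_registers(text):
    """Hypothetical helper: map register names (RAX, CR2, ...) to integers."""
    regs = {}
    # A register entry looks like "RAX: 0000000000000000" (16 hex digits).
    for name, value in re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", text):
        regs[name] = int(value, 16)
    return regs


regs = parse_registers(OOPS_EXCERPT)

# CR2 holds the faulting address; RAX is the base register of the faulting store.
fault_offset = regs["CR2"] - regs["RAX"]

print("base pointer (RAX):", hex(regs["RAX"]))
print("offset written to: ", hex(fault_offset))
```

A zero base pointer together with an offset that matches the displacement in the faulting instruction is what one would expect if the notify callback ran before the disk pointer was set up.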
Updated by Ilya Dryomov about 8 years ago
- Status changed from In Progress to Fix Under Review
Updated by Ilya Dryomov almost 8 years ago
- Status changed from Fix Under Review to Resolved
Fixed by "rbd: fix rbd map vs notify races" in 4.6-rc6.