Bug #15490

closed

rbd map vs notify race

Added by Ilya Dryomov about 8 years ago. Updated almost 8 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: rbd
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

[60185.245893] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[60185.246636] IP: [<ffffffffa050828a>] rbd_dev_refresh+0xfa/0x180 [rbd]
[60185.247216] PGD a778b067 PUD a7789067 PMD 0 
[60185.247626] Oops: 0002 [#1] SMP 
[60185.247969] Modules linked in: ext4 mbcache jbd2 rbd libceph dns_resolver xt_statistic xt_nat xt_mark veth xt_comment xt_multiport vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter nf_nat nf_conntrack bridge stp llc dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio crc32_pclmul ghash_clmulni_intel aesni_intel ppdev lrw virtio_balloon gf128mul glue_helper i2c_piix4 ablk_helper cryptd parport_pc parport pcspkr nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net virtio_blk cirrus syscopyarea sysfillrect sysimgblt crct10dif_pclmul crct10dif_common drm_kms_helper ttm crc32c_intel serio_raw ata_piix drm libata
[60185.254483]  floppy virtio_pci virtio_ring i2c_core virtio dm_mirror dm_region_hash dm_log dm_mod
[60185.255186] CPU: 6 PID: 5462 Comm: kworker/u16:0 Not tainted 3.10.0-327.13.1.el7.x86_64 #1
[60185.255856] Hardware name: Red Hat OpenStack Compute, BIOS seabios-1.7.5-8.el7 04/01/2014
[60185.256567] Workqueue: ceph-watch-notify do_event_work [libceph]
[60185.257111] task: ffff880756980000 ti: ffff8805ca560000 task.ti: ffff8805ca560000
[60185.257717] RIP: 0010:[<ffffffffa050828a>]  [<ffffffffa050828a>] rbd_dev_refresh+0xfa/0x180 [rbd]
[60185.258476] RSP: 0018:ffff8805ca563d90  EFLAGS: 00010246
[60185.258926] RAX: 0000000000000000 RBX: ffff880700a33800 RCX: 0000000000000000
[60185.259504] RDX: 0000000000000000 RSI: 0000000000020000 RDI: ffff880700a33848
[60185.260089] RBP: ffff8805ca563db0 R08: 0000000000017620 R09: ffff88083a397620
[60185.260669] R10: ffffea001be22200 R11: ffffffffa0502798 R12: 0000000000000000
[60185.261256] R13: 0000000000400000 R14: ffff880700a33848 R15: 0000000000000001
[60185.261840] FS:  0000000000000000(0000) GS:ffff88083a380000(0000) knlGS:0000000000000000
[60185.262514] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[60185.263000] CR2: 0000000000000050 CR3: 000000009e6a9000 CR4: 00000000001406e0
[60185.263603] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[60185.264187] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[60185.264770] Stack:
[60185.264959]  ffff880700a33800 ffff88075690d180 000026c70000006a 000026c70000006a
[60185.265607]  ffff8805ca563de0 ffffffffa0508344 ffff8805bf803c60 ffff88075690d180
[60185.266286]  00000000000074e4 000026c70000006a ffff8805ca563e18 ffffffffa04bd290
[60185.266944] Call Trace:
[60185.267205]  [<ffffffffa0508344>] rbd_watch_cb+0x34/0x180 [rbd]
[60185.267694]  [<ffffffffa04bd290>] do_event_work+0x40/0xb0 [libceph]
[60185.268230]  [<ffffffff8109d5db>] process_one_work+0x17b/0x470
[60185.268709]  [<ffffffff8109e3ab>] worker_thread+0x11b/0x400
[60185.269198]  [<ffffffff8109e290>] ? rescuer_thread+0x400/0x400
[60185.269704]  [<ffffffff810a5acf>] kthread+0xcf/0xe0
[60185.270123]  [<ffffffff810b41b3>] ? finish_task_switch+0x53/0x170
[60185.270624]  [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
[60185.271173]  [<ffffffff81645dd8>] ret_from_fork+0x58/0x90
[60185.271621]  [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
[60185.272167] Code: 43 48 02 fb 66 66 90 66 66 90 4d 85 ed 0f 85 77 ff ff ff 4c 8b ab 90 01 00 00 49 c1 ed 09 f6 05 f6 58 00 00 04 75 67 48 8b 43 10 <4c> 89 68 50 48 8b 7b 10 e8 e9 16 d1 e0 5b 44 89 e0 41 5c 41 5d 
[60185.274539] RIP  [<ffffffffa050828a>] rbd_dev_refresh+0xfa/0x180 [rbd]
[60185.275129]  RSP <ffff8805ca563d90>
[60185.276772] CR2: 0000000000000050

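So this is a NULL pointer dereference of rbd_device::disk: the faulting instruction stores through RAX, which is 0, hence CR2 = 0x50. I couldn't fully confirm the "rbd map" part (the provided vmcore had been filtered to remove user pages), but it's the most plausible explanation.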

#1

Updated by Ilya Dryomov about 8 years ago

  • Category set to rbd
#2

Updated by Ilya Dryomov about 8 years ago

  • Status changed from In Progress to Fix Under Review
#3

Updated by Ilya Dryomov almost 8 years ago

  • Status changed from Fix Under Review to Resolved

"rbd: fix rbd map vs notify races" in 4.6-rc6.
