Project

General

Profile

Bug #2478

krbd: unmap on 3.4.0: scheduling while atomic...

Added by Andrey Korolyov almost 12 years ago. Updated over 11 years ago.

Status:
Can't reproduce
Priority:
Urgent
Assignee:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

On 3.4.0 on the 'rbd unmap' call kernel rbd driver produces following trace(same for vanilla):

[ 8660.786269] BUG: scheduling while atomic: blkid/20559/0x00000002
[ 8660.786272] Modules linked in: rbd libceph vhost_net bridge stp llc ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables nfs lockd auth_rpcgss nfs_acl sunrpc hdaps(O) tp_smapi(O) thinkpad_ec(O) ipv6 joydev kvm_intel kvm loop tun hid_logitech_dj nvidia(PO) uvcvideo videobuf2_core qcserial videodev usb_wwan btusb usbserial videobuf2_vmalloc videobuf2_memops bluetooth thinkpad_acpi psmouse coretemp sdhci_pci hwmon video iwlwifi sdhci i2c_i801 backlight mmc_core nvram firewire_ohci mac80211 firewire_core cfg80211 xhci_hcd rfkill agpgart e1000e i2c_core
[ 8660.786327] Pid: 20559, comm: blkid Tainted: P O 3.4.0-gentoo #1
[ 8660.786330] Call Trace:
[ 8660.786334] [<ffffffff814563e8>] ? __schedule+0x96/0x625
[ 8660.786339] [<ffffffff81457579>] ? _raw_spin_unlock_irq+0x5/0x2e
[ 8660.786344] [<ffffffff81069d8a>] ? finish_task_switch+0x4c/0xc1
[ 8660.786348] [<ffffffff814568db>] ? __schedule+0x589/0x625
[ 8660.786355] [<ffffffff8120417a>] ? ip4_string+0x5a/0xc6
[ 8660.786360] [<ffffffff81204983>] ? string.isra.3+0x39/0x9f
[ 8660.786365] [<ffffffff81204af9>] ? ip4_addr_string.isra.5+0x5a/0x76
[ 8660.786371] [<ffffffff81204840>] ? number.isra.1+0x10e/0x218
[ 8660.786375] [<ffffffff81455258>] ? schedule_timeout+0x21/0x1af
[ 8660.786380] [<ffffffff81456159>] ? wait_for_common+0xc5/0x143
[ 8660.786385] [<ffffffff8106d3f1>] ? try_to_wake_up+0x217/0x217
[ 8660.786390] [<ffffffff8105f040>] ? flush_workqueue+0x130/0x2a5
[ 8660.786398] [<ffffffffa0cf33b9>] ? ceph_destroy_client+0x15/0x44 [libceph]
[ 8660.786403] [<ffffffffa002c2ae>] ? rbd_client_release+0x38/0x4b [rbd]
[ 8660.786409] [<ffffffffa002c719>] ? rbd_put_client.isra.10+0x28/0x3d [rbd]
[ 8660.786414] [<ffffffffa002d09d>] ? rbd_dev_release+0xc3/0x157 [rbd]
[ 8660.786419] [<ffffffff81114a43>] ? buffer_cpu_notify+0x75/0x75
[ 8660.786425] [<ffffffff81283c67>] ? device_release+0x41/0x72
[ 8660.786429] [<ffffffff811fe869>] ? kobject_release+0x4e/0x6a
[ 8660.786434] [<ffffffffa002c69f>] ? rbd_release+0x15/0x1a [rbd]
[ 8660.786439] [<ffffffff8111907f>] ? __blkdev_put+0xa6/0x14d
[ 8660.786444] [<ffffffff81119115>] ? __blkdev_put+0x13c/0x14d
[ 8660.786450] [<ffffffff810f3763>] ? fput+0xed/0x203
[ 8660.786455] [<ffffffff810f0d2a>] ? filp_close+0x62/0x6a
[ 8660.786460] [<ffffffff810f0db7>] ? sys_close+0x85/0xc6
[ 8660.786465] [<ffffffff8145beb9>] ? system_call_fastpath+0x16/0x1b

This stack appears asynchronously(with slow connection to monitor server there are two or three seconds after unmap call returned 0 in the shell), 0x00000002 address is always the same for different systems/compilers. Also I have managed to broke enumeration reuse once, probably by intensive guestfish/map operations, but unfortunately that one is unreproducible - new mappings has used /dev/rbdN+1 even if /dev/ and /sys shows no sights of previously mapped images. Any suggestions on further trace collection are welcome.

History

#1 Updated by Sage Weil almost 12 years ago

  • Status changed from New to In Progress

commit:cd9d9f5df6098c50726200d4185e9e8da32785b3 already fixes this; we just neglected to send it upstream for 3.4. I'll go in monday, and we'll send it to stable@ so that it gets into any 3.4 point releases.

leave bug open so we don't forget to do that!

#2 Updated by Greg Farnum almost 12 years ago

  • Assignee changed from Greg Farnum to Sage Weil

#3 Updated by Andrey Korolyov almost 12 years ago

with patch above, this trace appears after ~15 sec from unmap operation

[ 2453.203365] ------------[ cut here ]------------
[ 2453.203375] WARNING: at include/linux/kref.h:41 kobject_get+0x21/0x2a()
[ 2453.203378] Hardware name: System Product Name
[ 2453.203380] Modules linked in: btrfs ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse nfsd nfs nfs_acl auth_rpcgss lockd sunrpc kvm_intel kvm bridge stp llc ipv6 rbd libceph loop coretemp pcspkr psmouse hwmon firewire_ohci firewire_core
[ 2453.203417] Pid: 4933, comm: blkid Not tainted 3.4.0 #4
[ 2453.203419] Call Trace:
[ 2453.203426] [<ffffffff8104e1bc>] ? warn_slowpath_common+0x78/0x8c
[ 2453.203430] [<ffffffff81202a6d>] ? kobject_get+0x21/0x2a
[ 2453.203436] [<ffffffff8128d36f>] ? get_device+0x11/0x17
[ 2453.203442] [<ffffffffa00666e0>] ? rbd_open+0x24/0x4d [rbd]
[ 2453.203447] [<ffffffff8111f84d>] ? __blkdev_get+0xfb/0x3ce
[ 2453.203451] [<ffffffff8111fdda>] ? blkdev_get+0x2ba/0x2ba
[ 2453.203454] [<ffffffff8111fcf0>] ? blkdev_get+0x1d0/0x2ba
[ 2453.203465] [<ffffffff8146c963>] ? _raw_spin_unlock+0x5/0x27
[ 2453.203468] [<ffffffff8111fdda>] ? blkdev_get+0x2ba/0x2ba
[ 2453.203472] [<ffffffff810f6e56>] ? __dentry_open+0x188/0x2c4
[ 2453.203476] [<ffffffff8146c963>] ? _raw_spin_unlock+0x5/0x27
[ 2453.203481] [<ffffffff8110455a>] ? do_last+0x7a1/0x7b5
[ 2453.203484] [<ffffffff8110472f>] ? path_openat+0xcf/0x385
[ 2453.203488] [<ffffffff81104aa7>] ? do_filp_open+0x2a/0x6e
[ 2453.203492] [<ffffffff8146c963>] ? _raw_spin_unlock+0x5/0x27
[ 2453.203496] [<ffffffff8110e31f>] ? alloc_fd+0xe6/0xf9
[ 2453.203500] [<ffffffff810f7e59>] ? do_sys_open+0xf8/0x17f
[ 2453.203504] [<ffffffff814712f9>] ? system_call_fastpath+0x16/0x1b
[ 2453.203507] ---[ end trace 759012b630f6cbae ]---
[ 2453.232895] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 2453.232934] IP: [<ffffffffa0066ffc>] rbd_dev_release+0x22/0x157 [rbd]
[ 2453.232961] PGD 101949067 PUD 101948067 PMD 0
[ 2453.232988] Oops: 0000 [#1] PREEMPT SMP
[ 2453.233014] CPU 0
[ 2453.233019] Modules linked in: btrfs ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse nfsd nfs nfs_acl auth_rpcgss lockd sunrpc kvm_intel kvm bridge stp llc ipv6 rbd libceph loop coretemp pcspkr psmouse hwmon firewire_ohci firewire_core
[ 2453.233055]
[ 2453.233055] Pid: 4933, comm: blkid Tainted: G W 3.4.0 #4 System manufacturer System Product Name/P5N-D
[ 2453.233055] RIP: 0010:[<ffffffffa0066ffc>] [<ffffffffa0066ffc>] rbd_dev_release+0x22/0x157 [rbd]
[ 2453.233055] RSP: 0018:ffff880112bd9e08 EFLAGS: 00010286
[ 2453.233055] RAX: 0000000000000000 RBX: ffff88011454fa38 RCX: 0000000000000000
[ 2453.233055] RDX: ffff88011454fa48 RSI: ffff880016d89c00 RDI: ffff88011454fa38
[ 2453.233055] RBP: ffff88011454f800 R08: 0000000000000000 R09: 0000000000000000
[ 2453.233055] R10: 0000000000000000 R11: ffff880122706400 R12: ffff880128ba2908
[ 2453.233055] R13: ffffffff81674be0 R14: ffff880016d8bc00 R15: ffff88012ac616d8
[ 2453.233055] FS: 00007f849e004740(0000) GS:ffff88012fc00000(0000) knlGS:0000000000000000
[ 2453.233055] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2453.233055] CR2: 0000000000000000 CR3: 0000000101894000 CR4: 00000000000027f0
[ 2453.233055] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2453.233055] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2453.233055] Process blkid (pid: 4933, threadinfo ffff880112bd8000, task ffff88010181d1c0)
[ 2453.233055] Stack:
[ 2453.233055] ffff88012ac0c180 ffff88012ac0b600 0000000300313030 0000000000000000
[ 2453.233055] 0000000000000000 00000000ffffff9c ffff880108c5f6c0 ffff88011454fa48
[ 2453.233055] ffff880128ba2908 ffffffff8128d66f ffff88011454fa80 ffffffff81202b95
[ 2453.233055] Call Trace:
[ 2453.233055] [<ffffffff8128d66f>] ? device_release+0x41/0x72
[ 2453.233055] [<ffffffff81202b95>] ? kobject_release+0x4e/0x6a
[ 2453.233055] [<ffffffffa00666b7>] ? rbd_release+0x15/0x1a [rbd]
[ 2453.233055] [<ffffffff8111f58b>] ? __blkdev_put+0xa6/0x14d
[ 2453.233055] [<ffffffff810f9c0f>] ? fput+0xed/0x203
[ 2453.233055] [<ffffffff810f71d6>] ? filp_close+0x62/0x6a
[ 2453.233055] [<ffffffff810f7263>] ? sys_close+0x85/0xc6
[ 2453.233055] [<ffffffff814712f9>] ? system_call_fastpath+0x16/0x1b
[ 2453.233055] Code: 41 5c 41 5d 41 5e 41 5f c3 41 54 55 48 8d af c8 fd ff ff 53 48 89 fb 48 83 ec 30 48 8b 77 90 48 85 f6 74 16 48 8b 87 e0 fd ff ff <48> 8b 38 48 81 c7 90 01 00 00 e8 48 6a fe ff 48 83 7b 88 00 74
[ 2453.233055] RIP [<ffffffffa0066ffc>] rbd_dev_release+0x22/0x157 [rbd]
[ 2453.233055] RSP <ffff880112bd9e08>
[ 2453.233055] CR2: 0000000000000000
[ 2453.234501] ---[ end trace 759012b630f6cbaf ]---

#4 Updated by Sage Weil over 11 years ago

  • Project changed from Ceph to Linux kernel client

#5 Updated by Sage Weil over 11 years ago

  • Status changed from In Progress to New
  • Priority changed from Normal to Urgent

#6 Updated by Sage Weil over 11 years ago

  • Project changed from Linux kernel client to rbd

#7 Updated by Sage Weil over 11 years ago

  • Assignee deleted (Sage Weil)

#8 Updated by Josh Durgin over 11 years ago

  • Subject changed from rbd unmap on 3.4.0: scheduling while atomic... to krbd: unmap on 3.4.0: scheduling while atomic...

#9 Updated by Sage Weil over 11 years ago

  • Status changed from New to Can't reproduce

Also available in: Atom PDF