Project

General

Profile

Actions

Bug #3427

closed

krbd: unmap does not remove block device properly

Added by Maciej Galkiewicz over 11 years ago. Updated about 11 years ago.

Status:
Resolved
Priority:
High
Assignee:
Target version:
-
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Problem caused by unmapping rbd volume:

# rbd map redis -p devise-test --user admin --secret /tmp/secret
# mkfs.xfs /dev/rbd/devise-test/redis
# rbd unmap /dev/rbd/devise-test/redis 

# ls /dev/rbd* -lR
brw-rw---T 1 root disk 252, 0 Oct 30 10:06 /dev/rbd0

/dev/rbd:
total 0
drwxr-xr-x 2 root root 60 Oct 30 10:06 devise-test

/dev/rbd/devise-test:
total 0
lrwxrwxrwx 1 root root 10 Oct 30 10:06 redis -> ../../rbd0
# rbd showmapped 
id      pool    image   snap    device
Oct 30 10:06:18 cc2 kernel: [32556.256155] libceph: client0 fsid d875aad2-f19d-4ea5-aa0e-d397a27b53f6
Oct 30 10:06:18 cc2 kernel: [32556.258906] libceph: mon0 10.1.128.1:6789 session established
Oct 30 10:06:18 cc2 kernel: [32556.262503]  rbd0: unknown partition table
Oct 30 10:06:18 cc2 kernel: [32556.263816] rbd: rbd0: added with size 0x140000000
Oct 30 10:06:24 cc2 kernel: [32561.834339] ------------[ cut here ]------------
Oct 30 10:06:24 cc2 kernel: [32561.835850] WARNING: at /build/buildd-linux_3.2.23-1-amd64-zj7gxu/linux-3.2.23/lib/kref.c:34 kref_get+0x1b/0x21()
Oct 30 10:06:24 cc2 kernel: [32561.837080] Hardware name: System Product Name
Oct 30 10:06:24 cc2 kernel: [32561.838158] Modules linked in: rbd libceph act_police cls_basic cls_flow cls_fw cls_u32 sch_tbf sch_prio sch_hfsc sch_htb sch_ingress sch_sfq xt_statistic xt_CT xt_connlimit xt_realm xt_addrtype iptable_raw xt_comment xt_recent ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip ip6_queue nf_nat_pptp nf_nat_proto_gre nf_nat_irc xt_set nf_nat_h323 nf_nat_ftp nf_nat_amanda ip_set nf_conntrack_tftp nf_conntrack_sane nf_conntrack_sip nf_conntrack_proto_udplite nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast ts_kmp nf_conntrack_irc nf_conntrack_amanda nf_conntrack_h323 nf_conntrack_ftp xt_time xt_TPROXY nf_tproxy_core xt_TCPMSS xt_tcpmss xt_NFLOG xt_sctp xt_policy xt_pkttype xt_physdev xt_owner xt_NFQUEUE nfnetlink_log xt_multiport xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit 
Oct 30 10:06:24 cc2 kernel: xt_DSCP xt_dscp xt_dccp xt_connmark xt_AUDIT xt_CLASSIFY ipt_LOG ip6t_LOG ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 xt_state nf_defrag_ipv6 iptable_nat nf_nat xt_conntrack nf_conntrack_ipv4 nf_defrag_ipv4 iptable_mangle ip6table_raw ip6table_mangle nfnetlink iptable_filter ip_tables ip6table_filter ip6_tables x_tables rmd160 sha1_ssse3 sha1_generic hmac crypto_null camellia lzo cast6 cast5 deflate cts ctr gcm ccm serpent blowfish_generic blowfish_x86_64 blowfish_common twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common ecb xcbc cbc sha256_generic sha512_generic des_generic xfrm_user ah6 ah4 esp6 esp4 xfrm4_mode_beet xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_ro xfrm6_mode_beet xfrm6_mode_tunnel ipcomp ipcomp6 xfrm6_tunnel tunnel6 xfrm_ipcomp rng_core af_key acpi_cpufreq cpufreq_stats cpufreq_userspace cpufreq_powersave cpufreq_conservative ext4 crc16 jbd2 ext3 mbcache jbd nf_conntrack loop coretemp crc32c_intel snd_pcm ghas
Oct 30 10:06:24 cc2 kernel: h_clmulni_intel aesni_intel snd_page_alloc i915 i2c_i801 snd_timer snd aes_x86_64 soundcore drm_kms_helper aes_generic eeepc_wmi drm asus_wmi sparse_keymap rfkill pcspkr i2c_algo_bit cryptd evdev video i2c_core iTCO_wdt iTCO_vendor_support button wmi xfs btrfs crc32c libcrc32c zlib_deflate dm_mod mperf aacraid 3w_9xxx 3w_xxxx raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 md_mod sata_nv sata_sil sata_via sg sd_mod crc_t10dif ehci_hcd e1000e ahci libahci libata xhci_hcd scsi_mod r8169 mii usbcore thermal fan usb_common processor thermal_sys [last unloaded: scsi_wait_scan]
Oct 30 10:06:24 cc2 kernel: [32561.868233] Pid: 8557, comm: blkid Not tainted 3.2.0-3-amd64 #1
Oct 30 10:06:24 cc2 kernel: [32561.869643] Call Trace:
Oct 30 10:06:24 cc2 kernel: [32561.870986]  [<ffffffff81046901>] ? warn_slowpath_common+0x78/0x8c
Oct 30 10:06:24 cc2 kernel: [32561.872364]  [<ffffffff811ab6a6>] ? kref_get+0x1b/0x21
Oct 30 10:06:24 cc2 kernel: [32561.873708]  [<ffffffff811aa579>] ? kobject_get+0x12/0x17
Oct 30 10:06:24 cc2 kernel: [32561.875177]  [<ffffffff8124b9b7>] ? get_device+0x11/0x17
Oct 30 10:06:24 cc2 kernel: [32561.876589]  [<ffffffffa089f6ea>] ? rbd_open+0x24/0x4d [rbd]
Oct 30 10:06:24 cc2 kernel: [32561.877900]  [<ffffffff811215b7>] ? __blkdev_get+0xec/0x3a9
Oct 30 10:06:24 cc2 kernel: [32561.879273]  [<ffffffff81121b1b>] ? blkdev_get+0x2a7/0x2a7
Oct 30 10:06:24 cc2 kernel: [32561.880643]  [<ffffffff81121a3b>] ? blkdev_get+0x1c7/0x2a7
Oct 30 10:06:24 cc2 kernel: [32561.881943]  [<ffffffff81036457>] ? should_resched+0x5/0x23
Oct 30 10:06:24 cc2 kernel: [32561.883202]  [<ffffffff81121b1b>] ? blkdev_get+0x2a7/0x2a7
Oct 30 10:06:24 cc2 kernel: [32561.884464]  [<ffffffff810f7c66>] ? __dentry_open+0x19c/0x2b3
Oct 30 10:06:24 cc2 kernel: [32561.885692]  [<ffffffff811010c8>] ? dget+0x12/0x1e
Oct 30 10:06:24 cc2 kernel: [32561.886909]  [<ffffffff81104239>] ? do_last+0x553/0x58d
Oct 30 10:06:24 cc2 kernel: [32561.888160]  [<ffffffff8110486b>] ? path_openat+0xce/0x32a
Oct 30 10:06:24 cc2 kernel: [32561.889393]  [<ffffffff810cd779>] ? pte_offset_kernel+0x16/0x35
Oct 30 10:06:24 cc2 kernel: [32561.890700]  [<ffffffff81104b89>] ? do_filp_open+0x2a/0x6e
Oct 30 10:06:24 cc2 kernel: [32561.892176]  [<ffffffff813499cf>] ? _cond_resched+0x7/0x1c
Oct 30 10:06:24 cc2 kernel: [32561.893588]  [<ffffffff811b29d9>] ? __strncpy_from_user+0x18/0x48
Oct 30 10:06:24 cc2 kernel: [32561.894932]  [<ffffffff8110d8bf>] ? alloc_fd+0x64/0x109
Oct 30 10:06:24 cc2 kernel: [32561.896313]  [<ffffffff810f8b3b>] ? do_sys_open+0x5e/0xe5
Oct 30 10:06:24 cc2 kernel: [32561.897561]  [<ffffffff8134fc92>] ? system_call_fastpath+0x16/0x1b
Oct 30 10:06:24 cc2 kernel: [32561.898754] ---[ end trace dea810ce5dc00ee3 ]---
Oct 30 10:06:24 cc2 kernel: [32561.903958] ------------[ cut here ]------------
Oct 30 10:06:24 cc2 kernel: [32561.905281] kernel BUG at /build/buildd-linux_3.2.23-1-amd64-zj7gxu/linux-3.2.23/net/ceph/osd_client.c:682!
Oct 30 10:06:24 cc2 kernel: [32561.906636] invalid opcode: 0000 [#1] SMP 
Oct 30 10:06:24 cc2 kernel: [32561.907927] CPU 0 
Oct 30 10:06:24 cc2 kernel: [32561.907938] Modules linked in: rbd libceph act_police cls_basic cls_flow cls_fw cls_u32 sch_tbf sch_prio sch_hfsc sch_htb sch_ingress sch_sfq xt_statistic xt_CT xt_connlimit xt_realm xt_addrtype iptable_raw xt_comment xt_recent ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip ip6_queue nf_nat_pptp nf_nat_proto_gre nf_nat_irc xt_set nf_nat_h323 nf_nat_ftp nf_nat_amanda ip_set nf_conntrack_tftp nf_conntrack_sane nf_conntrack_sip nf_conntrack_proto_udplite nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast ts_kmp nf_conntrack_irc nf_conntrack_amanda nf_conntrack_h323 nf_conntrack_ftp xt_time xt_TPROXY nf_tproxy_core xt_TCPMSS xt_tcpmss xt_NFLOG xt_sctp xt_policy xt_pkttype xt_physdev xt_owner xt_NFQUEUE nfnetlink_log xt_multiport xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit 
Oct 30 10:06:24 cc2 kernel: xt_DSCP xt_dscp xt_dccp xt_connmark xt_AUDIT xt_CLASSIFY ipt_LOG ip6t_LOG ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 xt_state nf_defrag_ipv6 iptable_nat nf_nat xt_conntrack nf_conntrack_ipv4 nf_defrag_ipv4 iptable_mangle ip6table_raw ip6table_mangle nfnetlink iptable_filter ip_tables ip6table_filter ip6_tables x_tables rmd160 sha1_ssse3 sha1_generic hmac crypto_null camellia lzo cast6 cast5 deflate cts ctr gcm ccm serpent blowfish_generic blowfish_x86_64 blowfish_common twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common ecb xcbc cbc sha256_generic sha512_generic des_generic xfrm_user ah6 ah4 esp6 esp4 xfrm4_mode_beet xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_ro xfrm6_mode_beet xfrm6_mode_tunnel ipcomp ipcomp6 xfrm6_tunnel tunnel6 xfrm_ipcomp rng_core af_key acpi_cpufreq cpufreq_stats cpufreq_userspace cpufreq_powersave cpufreq_conservative ext4 crc16 jbd2 ext3 mbcache jbd nf_conntrack loop coretemp crc32c_intel snd_pcm ghas
Oct 30 10:06:24 cc2 kernel: h_clmulni_intel aesni_intel snd_page_alloc i915 i2c_i801 snd_timer snd aes_x86_64 soundcore drm_kms_helper aes_generic eeepc_wmi drm asus_wmi sparse_keymap rfkill pcspkr i2c_algo_bit cryptd evdev video i2c_core iTCO_wdt iTCO_vendor_support button wmi xfs btrfs crc32c libcrc32c zlib_deflate dm_mod mperf aacraid 3w_9xxx 3w_xxxx raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 md_mod sata_nv sata_sil sata_via sg sd_mod crc_t10dif ehci_hcd e1000e ahci libahci libata xhci_hcd scsi_mod r8169 mii usbcore thermal fan usb_common processor thermal_sys [last unloaded: scsi_wait_scan]
Oct 30 10:06:24 cc2 kernel: [32561.934702] 
Oct 30 10:06:24 cc2 kernel: [32561.935831] Pid: 8532, comm: rbd Tainted: G        W    3.2.0-3-amd64 #1 System manufacturer System Product Name/P8H77-M PRO
Oct 30 10:06:24 cc2 kernel: [32561.937014] RIP: 0010:[<ffffffffa0882a4e>]  [<ffffffffa0882a4e>] __remove_osd+0x14/0x41 [libceph]
Oct 30 10:06:24 cc2 kernel: [32561.938219] RSP: 0018:ffff880560877d68  EFLAGS: 00010293
Oct 30 10:06:24 cc2 kernel: [32561.939456] RAX: ffff8807fe1f0c98 RBX: ffff8807fe1f0800 RCX: 0000000000000000
Oct 30 10:06:24 cc2 kernel: [32561.940703] RDX: 0000000000000000 RSI: ffff8807fe1f0800 RDI: ffff88071d1f51a8
Oct 30 10:06:24 cc2 kernel: [32561.942076] RBP: ffff88071d1f5200 R08: 0000000000000000 R09: ffffffff8168da00
Oct 30 10:06:24 cc2 kernel: [32561.943531] R10: ffff880593bee1c0 R11: ffff880593bee1c0 R12: ffff88071d1f5220
Oct 30 10:06:24 cc2 kernel: [32561.944912] R13: ffffffff81653710 R14: ffff8806f1eeb018 R15: ffff88056f80abc0
Oct 30 10:06:24 cc2 kernel: [32561.946333] FS:  00007f94fa29c760(0000) GS:ffff88081fa00000(0000) knlGS:0000000000000000
Oct 30 10:06:24 cc2 kernel: [32561.947717] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 30 10:06:24 cc2 kernel: [32561.948999] CR2: 00007fbc62483000 CR3: 00000005e3427000 CR4: 00000000001406f0
Oct 30 10:06:24 cc2 kernel: [32561.950280] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 30 10:06:24 cc2 kernel: [32561.951567] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Oct 30 10:06:24 cc2 kernel: [32561.952881] Process rbd (pid: 8532, threadinfo ffff880560876000, task ffff880593bee1c0)
Oct 30 10:06:24 cc2 kernel: [32561.954186] Stack:
Oct 30 10:06:24 cc2 kernel: [32561.955516]  ffff88071d1f51a8 ffffffffa0882aee ffff88071d1f5000 ffffffffa089f1aa
Oct 30 10:06:24 cc2 kernel: [32561.956822]  ffff8806e5f0bc80 ffffffffa087c3b4 ffff880582253c50 ffffffffa089f1f5
Oct 30 10:06:24 cc2 kernel: [32561.958135]  ffff880582253c50 ffffffff811ab682 0000000000000000 ffff8805aa1dda40
Oct 30 10:06:24 cc2 kernel: [32561.959444] Call Trace:
Oct 30 10:06:24 cc2 kernel: [32561.960731]  [<ffffffffa0882aee>] ? ceph_osdc_stop+0x73/0xaa [libceph]
Oct 30 10:06:24 cc2 kernel: [32561.962055]  [<ffffffffa089f1aa>] ? kzalloc+0xb/0xb [rbd]
Oct 30 10:06:24 cc2 kernel: [32561.963386]  [<ffffffffa087c3b4>] ? ceph_destroy_client+0x10/0x44 [libceph]
Oct 30 10:06:24 cc2 kernel: [32561.964859]  [<ffffffffa089f1f5>] ? rbd_client_release+0x4b/0x5e [rbd]
Oct 30 10:06:24 cc2 kernel: [32561.966341]  [<ffffffff811ab682>] ? kref_put+0x3e/0x47
Oct 30 10:06:24 cc2 kernel: [32561.967775]  [<ffffffffa089ffe9>] ? rbd_dev_release+0xcb/0x16d [rbd]
Oct 30 10:06:24 cc2 kernel: [32561.969205]  [<ffffffff8114f378>] ? sysfs_remove_dir+0x9f/0xaa
Oct 30 10:06:24 cc2 kernel: [32561.970536]  [<ffffffff8124bb18>] ? device_release+0x41/0x72
Oct 30 10:06:24 cc2 kernel: [32561.971937]  [<ffffffff811aa54a>] ? kobject_release+0x4f/0x6c
Oct 30 10:06:24 cc2 kernel: [32561.973300]  [<ffffffff811aa4fb>] ? kobject_del+0x2d/0x2d
Oct 30 10:06:24 cc2 kernel: [32561.974612]  [<ffffffff811ab682>] ? kref_put+0x3e/0x47
Oct 30 10:06:24 cc2 kernel: [32561.975807]  [<ffffffffa089f163>] ? rbd_remove+0x104/0x11e [rbd]
Oct 30 10:06:24 cc2 kernel: [32561.976995]  [<ffffffff81036457>] ? should_resched+0x5/0x23
Oct 30 10:06:24 cc2 kernel: [32561.978282]  [<ffffffff8114dca7>] ? sysfs_write_file+0xe0/0x11c
Oct 30 10:06:24 cc2 kernel: [32561.979521]  [<ffffffff810f9529>] ? vfs_write+0xa2/0xe9
Oct 30 10:06:24 cc2 kernel: [32561.980626]  [<ffffffff810f9706>] ? sys_write+0x45/0x6b
Oct 30 10:06:24 cc2 kernel: [32561.981696]  [<ffffffff8134fc92>] ? system_call_fastpath+0x16/0x1b
Oct 30 10:06:24 cc2 kernel: [32561.982761] Code: 48 8b 47 10 ff 50 38 48 89 df 5b e9 fa 7d 86 e0 5b c3 48 8b 3f eb bd 53 48 8d 86 98 04 00 00 48 39 86 98 04 00 00 48 89 f3 74 02 <0f> 0b 48 8d 77 78 48 8d 7b 18 e8 9e b4 92 e0 48 8d bb b8 04 00 
Oct 30 10:06:24 cc2 kernel: [32561.984974] RIP  [<ffffffffa0882a4e>] __remove_osd+0x14/0x41 [libceph]
Oct 30 10:06:24 cc2 kernel: [32561.986003]  RSP <ffff880560877d68>
Oct 30 10:06:24 cc2 kernel: [32561.998180] ---[ end trace dea810ce5dc00ee4 ]---
Oct 30 10:07:25 cc2 udevd[619]: timeout '/sbin/blkid -o udev -p /dev/rbd0'
Oct 30 10:07:26 cc2 udevd[619]: timeout: killing '/sbin/blkid -o udev -p /dev/rbd0' [8557]
Oct 30 10:07:27 cc2 udevd[619]: timeout: killing '/sbin/blkid -o udev -p /dev/rbd0' [8557]
Oct 30 10:07:28 cc2 udevd[619]: timeout: killing '/sbin/blkid -o udev -p /dev/rbd0' [8557]
Oct 30 10:07:29 cc2 udevd[619]: timeout: killing '/sbin/blkid -o udev -p /dev/rbd0' [8557]
Oct 30 10:07:30 cc2 udevd[619]: timeout: killing '/sbin/blkid -o udev -p /dev/rbd0' [8557]
Oct 30 10:07:31 cc2 udevd[619]: timeout: killing '/sbin/blkid -o udev -p /dev/rbd0' [8557]
Oct 30 10:07:32 cc2 udevd[619]: timeout: killing '/sbin/blkid -o udev -p /dev/rbd0' [8557]
Oct 30 10:07:33 cc2 udevd[619]: timeout: killing '/sbin/blkid -o udev -p /dev/rbd0' [8557]
Oct 30 10:07:34 cc2 udevd[619]: timeout: killing '/sbin/blkid -o udev -p /dev/rbd0' [8557]
Oct 30 10:07:35 cc2 udevd[619]: timeout: killing '/sbin/blkid -o udev -p /dev/rbd0' [8557]
Oct 30 10:07:36 cc2 udevd[619]: timeout: killing '/sbin/blkid -o udev -p /dev/rbd0' [8557]
Oct 30 10:07:37 cc2 udevd[619]: timeout: killing '/sbin/blkid -o udev -p /dev/rbd0' [8557]
Oct 30 10:07:38 cc2 udevd[619]: timeout: killing '/sbin/blkid -o udev -p /dev/rbd0' [8557]
Oct 30 10:07:39 cc2 udevd[619]: timeout: killing '/sbin/blkid -o udev -p /dev/rbd0' [8557]
Oct 30 10:07:40 cc2 udevd[619]: timeout: killing '/sbin/blkid -o udev -p /dev/rbd0' [8557]
Actions #1

Updated by Alex Elder over 11 years ago

This appears to be using kernel 3.2.23. Lots of work has gone into
rbd in the last year, and I don't believe any of it has be ported
back to the 3.2 kernel.

A case could be made we could invest in doing this back port, but
I don't believe it will happen otherwise.

Actions #2

Updated by Nikola Kotur over 11 years ago

The same bug hit me in kernel 3.6.7.

Ceph version: 0.54

Kernel log:

[14716531.284184] ------------[ cut here ]------------
[14716531.284202] WARNING: at include/linux/kref.h:42 kobject_get+0x32/0x40()
[14716531.284208] Modules linked in: nfsd nfs_acl auth_rpcgss exportfs nfs fscache lockd sunrpc rbd libceph ghash_clmulni_intel aesni_intel ablk_helper cryptd aes_x86_64 ext4 crc16 jbd2 mbcache btrfs libcrc32c lzo_compress zlib_deflate
[14716531.284239] Pid: 3655, comm: blkid Not tainted 3.6.7-ec2-nm #53
[14716531.284244] Call Trace:
[14716531.284252]  [<ffffffff8103b8bb>] ? warn_slowpath_common+0x7b/0xc0
[14716531.284258]  [<ffffffff81192c32>] ? kobject_get+0x32/0x40
[14716531.284267]  [<ffffffff811dabb4>] ? get_device+0x14/0x20
[14716531.284280]  [<ffffffffa017e4f0>] ? rbd_open+0x30/0x70 [rbd]
[14716531.284288]  [<ffffffff8111b160>] ? __blkdev_get+0xd0/0x490
[14716531.284294]  [<ffffffff8111b56b>] ? blkdev_get+0x4b/0x330
[14716531.284304]  [<ffffffff810b1de8>] ? __alloc_pages_nodemask+0x168/0x840
[14716531.284310]  [<ffffffff8111b850>] ? blkdev_get+0x330/0x330
[14716531.284318]  [<ffffffff810e7e85>] ? do_dentry_open+0x245/0x2c0
[14716531.284324]  [<ffffffff810e8018>] ? finish_open+0x28/0x40
[14716531.284330]  [<ffffffff810f8732>] ? do_last+0x2b2/0xd60
[14716531.284337]  [<ffffffff810f92a1>] ? path_openat+0xc1/0x500
[14716531.284345]  [<ffffffff810c93ac>] ? handle_mm_fault+0x10c/0x280
[14716531.284351]  [<ffffffff810f980d>] ? do_filp_open+0x4d/0xc0
[14716531.284359]  [<ffffffff811049ab>] ? alloc_fd+0x4b/0x120
[14716531.284365]  [<ffffffff810e79a8>] ? do_sys_open+0x108/0x200
[14716531.284372]  [<ffffffff8129fdf9>] ? system_call_fastpath+0x16/0x1b
[14716531.284379] ---[ end trace 68d2affc4a081b02 ]---
[14716531.288046] ------------[ cut here ]------------
[14716531.288054] kernel BUG at net/ceph/osd_client.c:690!
[14716531.288059] invalid opcode: 0000 [#1] SMP 
[14716531.288064] Modules linked in: nfsd nfs_acl auth_rpcgss exportfs nfs fscache lockd sunrpc rbd libceph ghash_clmulni_intel aesni_intel ablk_helper cryptd aes_x86_64 ext4 crc16 jbd2 mbcache btrfs libcrc32c lzo_compress zlib_deflate
[14716531.288089] CPU 0 
[14716531.288093] Pid: 3647, comm: rbd Tainted: G        W    3.6.7-ec2-nm #53  
[14716531.288099] RIP: e030:[<ffffffffa01600dd>]  [<ffffffffa01600dd>] __remove_osd+0x5d/0x70 [libceph]
[14716531.288111] RSP: e02b:ffff880056221da8  EFLAGS: 00010293
[14716531.288116] RAX: ffff88005bdae488 RBX: ffff88005bdae000 RCX: 00000000000074d0
[14716531.288121] RDX: 0000000000000000 RSI: ffff88005bdae000 RDI: ffff880059749708
[14716531.288126] RBP: ffff880059749780 R08: 0000000000014940 R09: ffff880067802900
[14716531.288131] R10: ffffffffa016016c R11: 0000000000130000 R12: ffff880059749760
[14716531.288137] R13: ffff880066a55910 R14: ffff880066c3da18 R15: ffff880066866f00
[14716531.288146] FS:  00007f0292597760(0000) GS:ffff88006a000000(0000) knlGS:0000000000000000
[14716531.288151] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[14716531.288156] CR2: 00007f02905b7e50 CR3: 0000000056230000 CR4: 0000000000002660
[14716531.288161] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[14716531.288167] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[14716531.288173] Process rbd (pid: 3647, threadinfo ffff880056220000, task ffff88005b22a990)
[14716531.288178] Stack:
[14716531.288180]  ffff880059749708 ffffffffa016016c ffff880059749000 ffff880066e3bd50
[14716531.288188]  ffff880066863480 ffffffffa01594da ffff880066e3bd40 ffffffffa017d7dc
[14716531.288195]  ffff880066a55800 ffff880066a55800 ffff880066a55800 ffffffffa017d81d
[14716531.288202] Call Trace:
[14716531.288209]  [<ffffffffa016016c>] ? ceph_osdc_stop+0x7c/0xc0 [libceph]
[14716531.288216]  [<ffffffffa01594da>] ? ceph_destroy_client+0x1a/0x40 [libceph]
[14716531.288224]  [<ffffffffa017d7dc>] ? rbd_client_release+0x5c/0x80 [rbd]
[14716531.288230]  [<ffffffffa017d81d>] ? rbd_put_client+0x1d/0x30 [rbd]
[14716531.288235]  [<ffffffffa017e7a2>] ? rbd_dev_release+0xb2/0x160 [rbd]
[14716531.288242]  [<ffffffff811db36a>] ? device_release+0x1a/0x80
[14716531.288248]  [<ffffffff81192b93>] ? kobject_release+0x43/0xb0
[14716531.288253]  [<ffffffffa017d23a>] ? rbd_remove+0x15a/0x190 [rbd]
[14716531.288261]  [<ffffffff8114f472>] ? sysfs_write_file+0xd2/0x160
[14716531.288268]  [<ffffffff810e9d58>] ? vfs_write+0xc8/0x170
[14716531.288273]  [<ffffffff810e9efe>] ? sys_write+0x4e/0x90
[14716531.288278]  [<ffffffff8129fdf9>] ? system_call_fastpath+0x16/0x1b
[14716531.288283] Code: a8 04 00 00 48 89 51 08 48 8d 7b 30 48 89 0a 48 89 83 a8 04 00 00 48 89 83 b0 04 00 00 e8 dc d9 ff ff 48 89 df 5b e9 43 ff ff ff <0f> 0b eb fe 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 41 54 
[14716531.288326] RIP  [<ffffffffa01600dd>] __remove_osd+0x5d/0x70 [libceph]
[14716531.288334]  RSP <ffff880056221da8>
[14716531.288342] ---[ end trace 68d2affc4a081b03 ]---
Actions #3

Updated by Alex Elder over 11 years ago

  • Priority changed from Normal to High
Actions #4

Updated by Alex Elder over 11 years ago

  • Status changed from New to In Progress
  • Assignee set to Alex Elder
This latest report (based on looking at a fairly recent version
of the code at this point) is I think hitting this BUG_ON():

static void __remove_osd(struct ceph_osd_client *osdc, struct ceph_osd *osd)
{
        dout("__remove_osd %p\n", osd);
        BUG_ON(!list_empty(&osd->o_requests));       <--
        rb_erase(&osd->o_node, &osdc->osds);
        list_del_init(&osd->o_osd_lru);
        ceph_con_close(&osd->o_con);
        put_osd(osd);
}

And I think the end of the call path looks like:
    rbd_client_release(kref)
     ceph_destroy_client(rbdc->client)
      ceph_osdc_stop(&client->osdc)
       remove_all_osds(osdc)
        __remove_osd(osdc, osd)
         BUG_ON(!list_empty(&osd->o_requests));

I believe this is a different problem from what's originally
reported here, but I'm not going to split it off to another
bug until I can look verify that.
Actions #5

Updated by Nikola Kotur over 11 years ago

Indeed, that's where this is from. So, there are running, or lingering, requests on the OSD. Shouldn't it try to kick_osd_requests() if there are any, or log error and leave image mapped?

Actions #6

Updated by Alex Elder over 11 years ago

I haven't spent time on this in almost a month so wanted to just
provide an update. We have been looking at and trying to fix
another problem that could be closely connected to this (at least
to the report Nikola Kotur supplied). With the recently developed
fixes the underlying problem here may go away here too.

I'll take another look shortly and will update again if I conclude
anything new.

Actions #7

Updated by Alex Elder over 11 years ago

It looks to me like the osd client code has nothing in place
to protect itself from one of its users (ceph client, meaning
ceph fs or rbd) requesting a stop of an osd client while there
are requests to that osd in flight.

It's possible this latest crash could be a symptom of that.

Actions #8

Updated by Alex Elder over 11 years ago

I also note, having taken a little closer look at Nikola Kotur's
kernel log that both an open and a close appear to be happening
at the same time. This is evidenced by the back-to-back messages
in the log: a WARN_ON() in kref_get() called from get_device() in
rbd_open(); and a BUG_ON() in __remove_osd(), called, eventually,
via rbd_remove().

Actions #9

Updated by Sage Weil over 11 years ago

For cephfs, the vfs normally handles that.

For RBD, wasn't the use_count something we just added? Would it cover this situation?

Actions #10

Updated by Alex Elder over 11 years ago

For RBD, wasn't the use_count something we just added? Would it cover this situation?

No.

The first warning in the report is from a kref_get(), which finds the
reference count is 0 before it grabs another reference.

That is happening during rbd_open(), using this path:
rbd_open()
rbd_get_dev()
get_device()
kobject_get()
This is the first thing that happens within rbd_open(). So
that open call will be successful getting a reference to the
device.

The second message--the BUG_ON()--is occurring while attempting
to destroy an osd client. It may well be that activity due to
this open request is the reason an active request was found
while trying to tear down the client.

The use counting you're referring to prevents an active rbd device
from being removed, but it doesn't prevent a racing open from
occurring after deletion of an osd has been initiated.

Actions #11

Updated by Alex Elder over 11 years ago

I am going to assume that the racing open is the cause of
the problem reported by Nikola Kotur.

To fix it, I will add a flag to the rbd_device structure
that will be set in rbd_remove(), once it has determined
that the remove can proceed because nothing still holds
the device open.

It will be checked in rbd_open(), after rejecting attempts
to open a read-only mapped image for writing.

To do this I think I'll create a rdb_dev->flags field
whose bits will be used to represent this "removing"
flag, as well as the mapping existence flag now represented
using the rbd_dev->exists field.

Actions #12

Updated by Ian Colle over 11 years ago

  • Project changed from Ceph to rbd
Actions #13

Updated by Alex Elder over 11 years ago

Implementing the change I described now.

Actions #14

Updated by Alex Elder over 11 years ago

Patch posted for review. I'm not sure I'll be able to test
the scenario very well but hopefully it can be seen by
inspection it should prevent this problem.

Actions #15

Updated by Alex Elder over 11 years ago

  • Subject changed from rbd unmap does not remove block device properly to krbd: unmap does not remove block device properly
Actions #16

Updated by Alex Elder about 11 years ago

We had some discussion about the whether an atomic bit
operation for this was sufficient, or whether a memory
barrier was required, and in the end I concluded a
spinlock was going to be necessary. So that's my plan.
I'll still use a bit flag but I'll use a spinlock to
protect it.

Actions #17

Updated by Alex Elder about 11 years ago

  • Status changed from In Progress to Fix Under Review

I have posted two patches for review, the second of which
should fix this problem. I have not actually reproduced
the original problem yet, but I think the problem and this
fix can be understood by inspection.

I have an idea for a test that might reproduce it and I'll
give that a try later today or tomorrow.

Actions #18

Updated by Alex Elder about 11 years ago

I have opened this to cover developing that test script
http://tracker.ceph.com/issues/3949

Actions #19

Updated by Alex Elder about 11 years ago

  • Status changed from Fix Under Review to 7

Reviewed and ready to push to the ceph-client "testing" branch.

I'm going to wait a day or two before pushing this (and some
other commits) there. The "testing" branch has recently been
updated to be based on Linux 3.8-rc5 and I want to establish
a baseline for nightly test results using that.

Actions #20

Updated by Alex Elder about 11 years ago

  • Status changed from 7 to Resolved

commit bc7a62ee52cffc735cb8383b6d26648883f1a01e
Author: Alex Elder <>
Date: Mon Jan 14 12:43:31 2013 -0600

rbd: prevent open for image being removed
Actions

Also available in: Atom PDF