Bug #5429
closed
libceph: rcu stall, null deref in osd_reset->__reset_osd->__remove_osd
Added by Sage Weil almost 11 years ago.
Updated over 9 years ago.
Description
<1>[19828.585548] BUG: unable to handle kernel NULL pointer dereference at (null)
<1>[19828.593437] IP: [<ffffffff813185cb>] rb_erase+0x1bb/0x370
<4>[19828.598865] PGD 0
<4>[19828.600899] Oops: 0002 [#1] SMP
[dumpcommon]kdb> -bt
Stack traceback for pid 29967
0xffff88020dd03f20 29967 2 1 4 R 0xffff88020dd043a8 *kworker/4:1
ffff88020b257b48 0000000000000018 0000000000000000 ffff88020b257b68
ffffffffa05487bc ffff8802204e4000 ffff880224ec7950 ffff88020b257b98
ffffffffa0548abf ffff8802204e4030 ffff880224ec7950 0000000000000000
Call Trace:
[<ffffffffa05487bc>] ? __remove_osd+0x3c/0xa0 [libceph]
[<ffffffffa0548abf>] ? __reset_osd+0x12f/0x170 [libceph]
[<ffffffffa054a6de>] ? osd_reset+0x7e/0x2b0 [libceph]
[<ffffffffa0541e21>] ? con_work+0x571/0x2d50 [libceph]
[<ffffffff81080bb3>] ? idle_balance+0x133/0x180
[<ffffffff81071b78>] ? finish_task_switch+0x48/0x110
[<ffffffff81071b78>] ? finish_task_switch+0x48/0x110
[<ffffffff8105f36f>] ? process_one_work+0x16f/0x540
[<ffffffff8105f3da>] ? process_one_work+0x1da/0x540
[<ffffffff8105f36f>] ? process_one_work+0x16f/0x540
[<ffffffff810605bc>] ? worker_thread+0x11c/0x370
[<ffffffff810604a0>] ? manage_workers.isra.20+0x2e0/0x2e0
[<ffffffff8106727a>] ? kthread+0xea/0xf0
[<ffffffff81067190>] ? flush_kthread_worker+0x150/0x150
[<ffffffff8163ff9c>] ? ret_from_fork+0x7c/0xb0
[<ffffffff81067190>] ? flush_kthread_worker+0x150/0x150
[dumpall]kdb> -bta
but preceded 7 seconds earlier by
<4>[19778.015116] libceph: osd0 10.214.132.16:6801 socket closed (con state CONNECTING)
<3>[19799.355399] INFO: rcu_sched self-detected stall on CPU { 6} (t=2100 jiffies g=245350 c=245349 q=2640)
<4>[19799.364789] CPU: 6 PID: 19284 Comm: kworker/6:2 Tainted: G W 3.10.0-rc6-ceph-00091-g2dd322b #1
<4>[19799.374303] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
<3>[19799.375424] INFO: rcu_sched detected stalls on CPUs/tasks: { 6} (detected by 0, t=2102 jiffies, g=245350, c=245349, q=2640)
<6>[19799.375425] Task dump for CPU 6:
<6>[19799.375429] kworker/6:2 R running task 0 19284 2 0x00000000
<6>[19799.375442] Workqueue: ceph-msgr con_work [libceph]
<4>[19799.375445] ffff880125ff7de8 ffffffff8105f3da ffffffff8105f36f ffff8802272d3a00
<4>[19799.375447] 0000000000000000 00000006272d2f98 ffff880125ff7fd8 ffff8802272d2f80
<4>[19799.375450] ffffffffa05666d0 0000000000000000 0000000000000000 ffffffffa055940e
<4>[19799.375451] Call Trace:
<4>[19799.375457] [<ffffffff8105f3da>] ? process_one_work+0x1da/0x540
<4>[19799.375459] [<ffffffff8105f36f>] ? process_one_work+0x16f/0x540
<4>[19799.375462] [<ffffffff810605bc>] worker_thread+0x11c/0x370
<4>[19799.375464] [<ffffffff810604a0>] ? manage_workers.isra.20+0x2e0/0x2e0
<4>[19799.375468] [<ffffffff8106727a>] kthread+0xea/0xf0
<4>[19799.375471] [<ffffffff81067190>] ? flush_kthread_worker+0x150/0x150
<4>[19799.375476] [<ffffffff8163ff9c>] ret_from_fork+0x7c/0xb0
<4>[19799.375478] [<ffffffff81067190>] ? flush_kthread_worker+0x150/0x150
<4>[19799.480150] Workqueue: ceph-msgr con_work [libceph]
<4>[19799.485054] ffffffff81c4ca00 ffff8802272c3db8 ffffffff81630b82 ffff8802272c3e38
<4>[19799.492523] ffffffff810e285a 0000000000000006 ffff8802272cd4e0 ffff8802272c3de8
<4>[19799.499995] ffffffff810e644c 0000000000000086 0000000000000001 0000000000000006
<4>[19799.507469] Call Trace:
<4>[19799.509929] <IRQ> [<ffffffff81630b82>] dump_stack+0x19/0x1b
<4>[19799.515718] [<ffffffff810e285a>] rcu_check_callbacks+0x21a/0x710
<4>[19799.521831] [<ffffffff810e644c>] ? acct_account_cputime+0x1c/0x20
<4>[19799.528034] [<ffffffff81050f68>] update_process_times+0x48/0x80
<4>[19799.534062] [<ffffffff8109b616>] tick_sched_handle.isra.10+0x36/0x50
<4>[19799.540524] [<ffffffff8109b71c>] tick_sched_timer+0x4c/0x80
<4>[19799.546203] [<ffffffff8106a841>] __run_hrtimer+0x81/0x1e0
<4>[19799.551709] [<ffffffff8109b6d0>] ? tick_nohz_handler+0xa0/0xa0
<4>[19799.557647] [<ffffffff8106b147>] hrtimer_interrupt+0x107/0x260
<4>[19799.563588] [<ffffffff81641b69>] smp_apic_timer_interrupt+0x69/0x99
<4>[19799.569964] [<ffffffff81640caf>] apic_timer_interrupt+0x6f/0x80
<4>[19799.575987] <EOI> [<ffffffff8112edec>] ? shrink_inactive_list+0x18c/0x400
<4>[19799.582995] [<ffffffff81637590>] ? _raw_spin_unlock_irq+0x30/0x40
<4>[19799.589195] [<ffffffff81637595>] ? _raw_spin_unlock_irq+0x35/0x40
<4>[19799.595397] [<ffffffff8112edec>] shrink_inactive_list+0x18c/0x400
<4>[19799.601595] [<ffffffff8112f66d>] shrink_lruvec+0x2cd/0x4d0
<4>[19799.607187] [<ffffffff8119855b>] ? bdi_queue_work+0x8b/0xf0
<4>[19799.612869] [<ffffffff8112fc1c>] do_try_to_free_pages+0x11c/0x3a0
<4>[19799.619068] [<ffffffff81130066>] try_to_free_pages+0xd6/0x1b0
<4>[19799.624922] [<ffffffff811375b0>] ? next_zone+0x30/0x40
<4>[19799.630165] [<ffffffff81125406>] __alloc_pages_nodemask+0x596/0x8f0
<4>[19799.636541] [<ffffffff8115bb1a>] alloc_pages_current+0xba/0x170
<4>[19799.642569] [<ffffffff81516d3e>] sk_page_frag_refill+0x7e/0x130
<4>[19799.648593] [<ffffffff8156f5a5>] tcp_sendmsg+0x305/0xe50
<4>[19799.654010] [<ffffffff8159af99>] inet_sendmsg+0xb9/0xf0
<4>[19799.659339] [<ffffffff8159aee5>] ? inet_sendmsg+0x5/0xf0
<4>[19799.664760] [<ffffffff81510de2>] sock_sendmsg+0xc2/0xe0
<4>[19799.670090] [<ffffffff812ee35b>] ? chksum_update+0x1b/0x30
<4>[19799.675686] [<ffffffff812ea1e8>] ? crypto_shash_update+0x18/0x30
<4>[19799.681814] [<ffffffffa0000056>] ? crc32c+0x56/0x7c [libcrc32c]
<4>[19799.687842] [<ffffffff81510e40>] kernel_sendmsg+0x40/0x60
<4>[19799.693353] [<ffffffffa05424d8>] con_work+0xc28/0x2d50 [libceph]
<4>[19799.699468] [<ffffffff81080bb3>] ? idle_balance+0x133/0x180
<4>[19799.705145] [<ffffffff81071b78>] ? finish_task_switch+0x48/0x110
<4>[19799.711257] [<ffffffff81071b78>] ? finish_task_switch+0x48/0x110
<4>[19799.717371] [<ffffffff8105f3da>] process_one_work+0x1da/0x540
<4>[19799.723220] [<ffffffff8105f36f>] ? process_one_work+0x16f/0x540
<4>[19799.729247] [<ffffffff810605bc>] worker_thread+0x11c/0x370
<4>[19799.734840] [<ffffffff810604a0>] ? manage_workers.isra.20+0x2e0/0x2e0
<4>[19799.741388] [<ffffffff8106727a>] kthread+0xea/0xf0
<4>[19799.746283] [<ffffffff81067190>] ? flush_kthread_worker+0x150/0x150
<4>[19799.752658] [<ffffffff8163ff9c>] ret_from_fork+0x7c/0xb0
<4>[19799.758076] [<ffffffff81067190>] ? flush_kthread_worker+0x150/0x150
<3>[19820.788280] INFO: rcu_sched self-detected stall on CPU
<3>[19820.788286] INFO: rcu_sched self-detected stall on CP
<4>[19820.788287]
job was
ubuntu@teuthology:/a/teuthology-2013-06-22_01:00:51-kernel-next-testing-basic/42857$ cat orig.config.yaml
kernel:
  kdb: true
  sha1: 2dd322b42d608a37f3e5beed57a8fbc673da6e32
machine_type: plana
nuke-on-error: true
overrides:
  admin_socket:
    branch: next
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
      osd:
        filestore flush min: 0
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 94eada40460cc6010be23110ef8ce0e3d92691af
  install:
    ceph:
      sha1: 94eada40460cc6010be23110ef8ce0e3d92691af
  s3tests:
    branch: next
  workunit:
    sha1: 94eada40460cc6010be23110ef8ce0e3d92691af
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds: null
- kclient: null
- workunit:
    clients:
      all:
      - suites/ffsb.sh
- Assignee set to Josh Durgin
hit this again, ubuntu@teuthology:/a/teuthology-2013-06-28_01:01:07-kernel-master-testing-basic/48683
plana72 still sitting in kdb.
- Priority changed from Urgent to High
hit this again, ubuntu@teuthology:/a/teuthology-2013-08-14_01:01:26-kcephfs-next-testing-basic-plana/106215
it was here:
static void __remove_osd(struct ceph_osd_client *osdc, struct ceph_osd *osd)
{
a78d: 48 89 e5 mov %rsp,%rbp
a790: 41 54 push %r12
a792: 49 89 fc mov %rdi,%r12
a795: 53 push %rbx
a796: 48 89 f3 mov %rsi,%rbx
dout("__remove_osd %p\n", osd);
a799: 75 61 jne a7fc <__remove_osd+0x7c>
BUG_ON(!list_empty(&osd->o_requests));
a79b: 48 8d 83 38 05 00 00 lea 0x538(%rbx),%rax
a7a2: 48 39 83 38 05 00 00 cmp %rax,0x538(%rbx)
a7a9: 75 6b jne a816 <__remove_osd+0x96>
rb_erase(&osd->o_node, &osdc->osds);
a7ab: 49 8d b4 24 60 01 00 lea 0x160(%r12),%rsi
a7b2: 00
a7b3: 48 8d 7b 18 lea 0x18(%rbx),%rdi
a7b7: e8 00 00 00 00 callq a7bc <__remove_osd+0x3c>
a7b8: R_X86_64_PC32 rb_erase+0xfffffffffffffffc
* in an undefined state.
*/
#ifndef CONFIG_DEBUG_LIST
static inline void __list_del_entry(struct list_head *entry)
{
__list_del(entry->prev, entry->next);
a7bc: 48 8b 8b 58 05 00 00 mov 0x558(%rbx),%rcx
^^^^^^^^^^^^^^^^^
a7c3: 48 8b 93 60 05 00 00 mov 0x560(%rbx),%rdx
list_del_init(&osd->o_osd_lru);
a7ca: 48 8d 83 58 05 00 00 lea 0x558(%rbx),%rax
ceph_con_close(&osd->o_con);
a7d1: 48 8d 7b 30 lea 0x30(%rbx),%rdi
* This is only for internal list manipulation where we know
* the prev/next entries already!
*/
static inline void __list_del(struct list_head * prev, struct list_head * next)
<6>[17485.734714] rbd1: unknown partition table
<4>[17485.735740] libceph: mon2 10.214.132.4:6790 socket closed (con state OPEN)
<6>[17485.735759] libceph: mon2 10.214.132.4:6790 session lost, hunting for new mon
<6>[17485.737794] libceph: mon2 10.214.132.4:6790 session established
<6>[17485.759013] rbd: rbd1: added with size 0x40000000
<4>[17485.858921] libceph: osd2 10.214.132.4:6808 socket closed (con state OPEN)
<6>[17485.966118] libceph: client4411 fsid 1af5918f-c950-454f-9769-f3b857fac855
<6>[17485.974997] libceph: mon1 10.214.132.38:6789 session established
<6>[17486.017942] rbd1: unknown partition table
<6>[17486.022331] rbd: rbd1: added with size 0x40000000
<4>[17486.199275] libceph: osd3 10.214.132.38:6809 socket closed (con state OPEN)
<6>[17486.233630] libceph: client4445 fsid 1af5918f-c950-454f-9769-f3b857fac855
<6>[17486.242439] libceph: mon2 10.214.132.4:6790 session established
<6>[17486.277232] rbd1: unknown partition table
<6>[17486.281544] rbd: rbd1: added with size 0x40000000
<4>[17486.331466] libceph: osd2 10.214.132.4:6808 socket closed (con state OPEN)
<4>[17486.381566] libceph: osd2 10.214.132.4:6808 socket closed (con state OPEN)
...
[2]kdb> bt
Stack traceback for pid 25803
0xffff8802238dbf20 25803 13496 1 2 R 0xffff8802238dc3a8 *rbd
ffff88020d87bd68 0000000000000018 ffff8801b5ff7950 ffff88020d87bd88
ffffffffa05fe7bc ffff8801b5ff7950 ffff8801b5ff7ab0 ffff88020d87bdb8
ffffffffa0602d24 ffff88012c8f16c0 ffff8801b5ff7000 ffff88012c8f16c0
Call Trace:
[<ffffffffa05fe7bc>] ? __remove_osd+0x3c/0xa0 [libceph]
[<ffffffffa0602d24>] ? ceph_osdc_stop+0xa4/0x110 [libceph]
[<ffffffffa05f4790>] ? ceph_destroy_client+0x30/0xa0 [libceph]
[<ffffffffa022fb41>] ? rbd_client_release+0x71/0xb0 [rbd]
[<ffffffffa0230798>] ? rbd_put_client+0x28/0x30 [rbd]
[<ffffffffa02307ba>] ? rbd_dev_destroy+0x1a/0x40 [rbd]
[<ffffffffa023083b>] ? rbd_dev_image_release+0x5b/0x70 [rbd]
[<ffffffffa0231095>] ? rbd_remove+0x155/0x180 [rbd]
[<ffffffff81407187>] ? bus_attr_store+0x27/0x30
[<ffffffff811f2d66>] ? sysfs_write_file+0xe6/0x170
[<ffffffff8117feae>] ? vfs_write+0xce/0x200
[<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
[<ffffffff811803b5>] ? SyS_write+0x55/0xa0
[<ffffffff81653782>] ? system_call_fastpath+0x16/0x1b
ubuntu@teuthology:/a/teuthology-2013-09-02_01:01:32-krbd-master-testing-basic-plana/17253$ cat orig.config.yaml
kernel:
  kdb: true
  sha1: 263cbbcaf605e359a46e30889595d82629f82080
machine_type: plana
nuke-on-error: true
os_type: ubuntu
overrides:
  admin_socket:
    branch: master
  ceph:
    conf:
      global:
        ms inject socket failures: 500
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 1c5e58a85ef7f26b2c617ecb6c08de5632bb0fe3
  ceph-deploy:
    branch:
      dev: master
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
  install:
    ceph:
      sha1: 1c5e58a85ef7f26b2c617ecb6c08de5632bb0fe3
  s3tests:
    branch: master
  workunit:
    sha1: 1c5e58a85ef7f26b2c617ecb6c08de5632bb0fe3
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph: null
- workunit:
    clients:
      all:
      - rbd/map-unmap.sh
teuthology_branch: master
ubuntu@teuthology:/a/teuthology-2013-09-02_01:01:32-krbd-master-testing-basic-plana/17253$
- Status changed from New to Duplicate
I think/hope this is a duplicate of the async notify racing with shutdown.
- Status changed from Duplicate to 12
- Project changed from rbd to Linux kernel client
- Category set to rbd
- Assignee deleted (Josh Durgin)
- Assignee set to Ilya Dryomov
I bet there is another trace of this somewhere, no rcu stall, just plain NULL deref in rb_erase(). Will try to investigate.
Is there anything which needs to be gathered from the cluster currently displaying this issue which could help out?
If it's crashed again, a full dmesg and a tail (say, last 5-10 minutes before the crash) of osd/messenger logs would help.
And if it hasn't, the same (or at least a full dmesg) from the previous crash won't hurt, if you still have it around.
- Status changed from 12 to Resolved
What Josh got a report of was not the referenced trace, but the
following (pulled off the vmcore):
[197102.902802] ------------[ cut here ]------------
[197102.903670] kernel BUG at /builddir/build/BUILD/ceph-3.10-dc9ac62/net/ceph//osd_client.c:1003!
[197102.904553] invalid opcode: 0000 [#1] SMP
[197102.905393] Modules linked in: fuse btrfs zlib_deflate raid6_pq xor vfat msdos fat xfs bridge stp llc xt_nat xt_REDIRECT rbd(OF) libceph(OF) ip6table_filter ip6_tables sg openvswitch vxlan ip_tunnel gre libcrc32c ipt_REJECT xt_comment xt_conntrack xt_multiport iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle ip_tables iTCO_wdt iTCO_vendor_support ipmi_devintf coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel nfsd aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr sb_edac edac_core lpc_ich mfd_core shpchp wmi ipmi_si ipmi_msghandler mperf acpi_power_meter auth_rpcgss nfs_acl lockd sunrpc binfmt_misc dm_multipath ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect
[197102.911118] sysimgblt i2c_algo_bit drm_kms_helper ttm drm i2c_core enic megaraid_sas dm_mirror dm_region_hash dm_log dm_mod
[197102.913242] CPU: 4 PID: 18929 Comm: rbd Tainted: GF O-------------- 3.10.0-123.8.1.el7.x86_64 #1
[197102.914487] Hardware name: Cisco Systems Inc UCSB-B200-M3/UCSB-B200-M3, BIOS B200M3.2.2.2.0.042820141643 04/28/2014
[197102.915959] task: ffff882f75fe38e0 ti: ffff882f60e24000 task.ti: ffff882f60e24000
[197102.917434] RIP: 0010:[<ffffffffa0448dc9>] [<ffffffffa0448dc9>] __remove_osd+0x89/0x90 [libceph]
[197102.918961] RSP: 0018:ffff882f60e25da0 EFLAGS: 00010206
[197102.920408] RAX: ffff885ea5043ca0 RBX: ffff885ea5043800 RCX: 0000000180190011
[197102.921488] RDX: 0000000000000000 RSI: ffff885ea5043800 RDI: ffff880036837768
[197102.922561] RBP: ffff882f60e25db0 R08: ffff882de51caf80 R09: 0000000180190011
[197102.923636] R10: ffffffff814b65af R11: ffffea00b7947200 R12: ffff880036837768
[197102.924709] R13: ffff8800368377c0 R14: 0000000000000000 R15: 0000000000000000
[197102.925782] FS: 00007fae849447c0(0000) GS:ffff882fbfc80000(0000) knlGS:0000000000000000
[197102.926863] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[197102.927936] CR2: 00007f3af4745f20 CR3: 0000002f61bdf000 CR4: 00000000001407e0
[197102.929038] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[197102.930075] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[197102.931133] Stack:
[197102.932198] ffff880036837768 ffff8800368377e8 ffff882f60e25dd8 ffffffffa044d2a4
[197102.933306] ffff880036837000 ffff885ecd83ef00 0000000000000001 ffff882f60e25df0
[197102.934390] ffffffffa043e82c ffff885ecd83ef08 ffff882f60e25e10 ffffffffa048d3e6
[197102.935474] Call Trace:
[197102.936547] [<ffffffffa044d2a4>] ceph_osdc_stop+0x94/0x100 [libceph]
[197102.937633] [<ffffffffa043e82c>] ceph_destroy_client+0x2c/0xa0 [libceph]
[197102.938709] [<ffffffffa048d3e6>] rbd_client_release+0x46/0x80 [rbd]
[197102.939809] [<ffffffffa048e705>] rbd_dev_destroy+0x65/0x70 [rbd]
[197102.940875] [<ffffffffa048e9a7>] rbd_dev_image_release+0x57/0x60 [rbd]
[197102.941946] [<ffffffffa048fe43>] do_rbd_remove.isra.33+0x163/0x1f0 [rbd]
[197102.943050] [<ffffffffa048ff14>] rbd_remove+0x24/0x30 [rbd]
[197102.944110] [<ffffffff813b41a7>] bus_attr_store+0x27/0x30
[197102.945166] [<ffffffff81225286>] sysfs_write_file+0xc6/0x140
[197102.946232] [<ffffffff811af6dd>] vfs_write+0xbd/0x1e0
[197102.947346] [<ffffffff811b0128>] SyS_write+0x58/0xb0
[197102.948420] [<ffffffff815f2a59>] system_call_fastpath+0x16/0x1b
[197102.949473] Code: 2e 97 ff ff 48 89 df e8 06 ff ff ff 5b 41 5c 5d c3 48 89 f2 48 c7 c7 f8 7f 46 a0 48 c7 c6 be c6 45 a0 31 c0 e8 b9 fe e8 e0 eb 92 <0f> 0b 0f 1f 44 00 00 0f 1f 44 00 00 55 f6 05 8d f2 01 00 04 48
[197102.951679] RIP [<ffffffffa0448dc9>] __remove_osd+0x89/0x90 [libceph]
[197102.952791] RSP <ffff882f60e25da0>
This is a
BUG_ON(!list_empty(&osd->o_requests));
in __remove_osd() in our original rhel7 kmod (dc9ac62e1e1a, rhel7
branch @ github).
vmcore showed that o_requests had a single entry on it, which turned out
to be a lingering request that had been requeued due to a connection
reset and half resent. The request structures were completely messed up
because rbd unmap unregistered the requeued request with
__unregister_linger_request().
This (request cancellation) was fixed upstream a while ago and the
fixes are also in the updated kmod.
During some tests, I stumbled upon this bug in rb_erase, triggered via osd_reset() -> __reset_osd() -> __remove_osd(). However, I was not using rbd but cephfs, with kernel v3.14.28 plus the patches mentioned in #10449 and #10450. The bug was triggered by restarting all OSDs of our cluster simultaneously.
This ticket has it mixed up with another issue; we are tracking the rb_erase() crash in #8087.
I'll post your comment and reply there.