Bug #3151

closed

krbd: possible circular locking dependency (sysfs_lock and ctl_mutex) (testing branch)

Added by Josh Durgin over 11 years ago. Updated over 11 years ago.

Status:
Resolved
Priority:
High
Assignee:
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Detected when unmapping an image:

2012-09-13T23:10:31.361742-07:00 plana85 kernel: [14561.681430] ======================================================
2012-09-13T23:10:31.361748-07:00 plana85 kernel: [14561.710723] [ INFO: possible circular locking dependency detected ]
2012-09-13T23:10:31.361749-07:00 plana85 kernel: [14561.740845] 3.6.0-rc5-ceph-00063-gbac97e1 #1 Not tainted
2012-09-13T23:10:31.361759-07:00 plana85 kernel: [14561.770428] -------------------------------------------------------
2012-09-13T23:10:31.361761-07:00 plana85 kernel: [14561.801738] rbd/21089 is trying to acquire lock:
2012-09-13T23:10:31.361763-07:00 plana85 kernel: [14561.831482]  (&eq->sysfs_lock){+.+...}, at: [<ffffffff812f5a46>] elevator_exit+0x26/0x60
2012-09-13T23:10:31.361764-07:00 plana85 kernel: [14561.890992] 
2012-09-13T23:10:31.361765-07:00 plana85 kernel: [14561.890992] but task is already holding lock:
2012-09-13T23:10:31.361768-07:00 plana85 kernel: [14561.945218]  (ctl_mutex/1){+.+.+.}, at: [<ffffffffa02300ee>] rbd_remove+0x5e/0x1a0 [rbd]
2012-09-13T23:10:31.361769-07:00 plana85 kernel: [14562.002043] 
2012-09-13T23:10:31.361770-07:00 plana85 kernel: [14562.002043] which lock already depends on the new lock.
2012-09-13T23:10:31.361770-07:00 plana85 kernel: [14562.002043] 
2012-09-13T23:10:31.361771-07:00 plana85 kernel: [14562.082281] 
2012-09-13T23:10:31.361772-07:00 plana85 kernel: [14562.082281] the existing dependency chain (in reverse order) is:
2012-09-13T23:10:31.361773-07:00 plana85 kernel: [14562.138280] 
2012-09-13T23:10:31.361774-07:00 plana85 kernel: [14562.138280] -> #3 (ctl_mutex/1){+.+.+.}:
2012-09-13T23:10:31.361776-07:00 plana85 kernel: [14562.190024]        [<ffffffff810b2ef2>] lock_acquire+0xa2/0x140
2012-09-13T23:10:31.361778-07:00 plana85 kernel: [14562.221198]        [<ffffffff81631f2b>] mutex_lock_nested+0x4b/0x320
2012-09-13T23:10:31.361779-07:00 plana85 kernel: [14562.252819]        [<ffffffffa0233301>] rbd_add+0x3f1/0xd84 [rbd]
2012-09-13T23:10:31.361780-07:00 plana85 kernel: [14562.284162]        [<ffffffff813f8a17>] bus_attr_store+0x27/0x30
2012-09-13T23:10:31.361782-07:00 plana85 kernel: [14562.315594]        [<ffffffff811ec5f6>] sysfs_write_file+0xe6/0x170
2012-09-13T23:10:31.361783-07:00 plana85 kernel: [14562.347322]        [<ffffffff8117bea3>] vfs_write+0xb3/0x180
2012-09-13T23:10:31.361784-07:00 plana85 kernel: [14562.378272]        [<ffffffff8117c1ca>] sys_write+0x4a/0x90
2012-09-13T23:10:31.361785-07:00 plana85 kernel: [14562.409037]        [<ffffffff8163dba9>] system_call_fastpath+0x16/0x1b
2012-09-13T23:10:31.361787-07:00 plana85 kernel: [14562.441699] 
2012-09-13T23:10:31.361789-07:00 plana85 kernel: [14562.441699] -> #2 (&rbd_dev->header_rwsem){+++++.}:
2012-09-13T23:10:31.361790-07:00 plana85 kernel: [14562.498330]        [<ffffffff810b2ef2>] lock_acquire+0xa2/0x140
2012-09-13T23:10:31.361791-07:00 plana85 kernel: [14562.531443]        [<ffffffff816331a9>] down_read+0x39/0x4e
2012-09-13T23:10:31.361792-07:00 plana85 kernel: [14562.563633]        [<ffffffffa0231e82>] rbd_rq_fn+0xd2/0x6a0 [rbd]
2012-09-13T23:10:31.361793-07:00 plana85 kernel: [14562.595968]        [<ffffffff812f711e>] __blk_run_queue+0x1e/0x20
2012-09-13T23:10:31.361795-07:00 plana85 kernel: [14562.627963]        [<ffffffff813138e8>] cfq_kick_queue+0x38/0x50
2012-09-13T23:10:31.361797-07:00 plana85 kernel: [14562.659242]        [<ffffffff810718b0>] process_one_work+0x1a0/0x5f0
2012-09-13T23:10:31.361799-07:00 plana85 kernel: [14562.690515]        [<ffffffff8107367d>] worker_thread+0x18d/0x4c0
2012-09-13T23:10:31.361800-07:00 plana85 kernel: [14562.720838]        [<ffffffff8107924e>] kthread+0xae/0xc0
2012-09-13T23:10:31.361801-07:00 plana85 kernel: [14562.749512]        [<ffffffff8163ed84>] kernel_thread_helper+0x4/0x10
2012-09-13T23:10:31.361802-07:00 plana85 kernel: [14562.779921] 
2012-09-13T23:10:31.361803-07:00 plana85 kernel: [14562.779921] -> #1 ((&cfqd->unplug_work)){+.+...}:
2012-09-13T23:10:31.361804-07:00 plana85 kernel: [14562.831367]        [<ffffffff810b2ef2>] lock_acquire+0xa2/0x140
2012-09-13T23:10:31.361805-07:00 plana85 kernel: [14562.860808]        [<ffffffff81072111>] wait_on_work+0x41/0x160
2012-09-13T23:10:31.361808-07:00 plana85 kernel: [14562.890409]        [<ffffffff810723a3>] __cancel_work_timer+0x83/0x140
2012-09-13T23:10:31.361809-07:00 plana85 kernel: [14562.920749]        [<ffffffff81072490>] cancel_work_sync+0x10/0x20
2012-09-13T23:10:31.361810-07:00 plana85 kernel: [14562.950645]        [<ffffffff81315b3b>] cfq_exit_queue+0x3b/0xe0
2012-09-13T23:10:31.361811-07:00 plana85 kernel: [14562.980871]        [<ffffffff812f5a5a>] elevator_exit+0x3a/0x60
2012-09-13T23:10:31.361812-07:00 plana85 kernel: [14563.010740]        [<ffffffff812ff382>] blk_release_queue+0x52/0xb0
2012-09-13T23:10:31.361814-07:00 plana85 kernel: [14563.040778]        [<ffffffff8131b51b>] kobject_release+0x8b/0x1d0
2012-09-13T23:10:31.361815-07:00 plana85 kernel: [14563.070846]        [<ffffffff8131b39c>] kobject_put+0x2c/0x60
2012-09-13T23:10:31.361817-07:00 plana85 kernel: [14563.099611]        [<ffffffff812f92e5>] blk_put_queue+0x15/0x20
2012-09-13T23:10:31.361818-07:00 plana85 kernel: [14563.128616]        [<ffffffff8130542b>] disk_release+0x8b/0xc0
2012-09-13T23:10:31.361820-07:00 plana85 kernel: [14563.157250]        [<ffffffff813f6377>] device_release+0x27/0xa0
2012-09-13T23:10:31.361821-07:00 plana85 kernel: [14563.186011]        [<ffffffff8131b51b>] kobject_release+0x8b/0x1d0
2012-09-13T23:10:31.361822-07:00 plana85 kernel: [14563.214957]        [<ffffffff8131b39c>] kobject_put+0x2c/0x60
2012-09-13T23:10:31.361823-07:00 plana85 kernel: [14563.243205]        [<ffffffff813057b7>] put_disk+0x17/0x20
2012-09-13T23:10:31.361824-07:00 plana85 kernel: [14563.271429]        [<ffffffffa01f5c08>] 0xffffffffa01f5c08
2012-09-13T23:10:31.361825-07:00 plana85 kernel: [14563.299500]        [<ffffffffa01f6af7>] 0xffffffffa01f6af7
2012-09-13T23:10:31.361828-07:00 plana85 kernel: [14563.326760]        [<ffffffff813f6377>] device_release+0x27/0xa0
2012-09-13T23:10:31.361829-07:00 plana85 kernel: [14563.354556]        [<ffffffff8131b51b>] kobject_release+0x8b/0x1d0
2012-09-13T23:10:31.361830-07:00 plana85 kernel: [14563.382006]        [<ffffffff8131b39c>] kobject_put+0x2c/0x60
2012-09-13T23:10:31.361832-07:00 plana85 kernel: [14563.408772]        [<ffffffff813f6127>] put_device+0x17/0x20
2012-09-13T23:10:31.361833-07:00 plana85 kernel: [14563.435472]        [<ffffffff813f731a>] device_unregister+0x2a/0x60
2012-09-13T23:10:31.361834-07:00 plana85 kernel: [14563.462830]        [<ffffffffa01f521f>] 0xffffffffa01f521f
2012-09-13T23:10:31.361835-07:00 plana85 kernel: [14563.489322]        [<ffffffff813f8a17>] bus_attr_store+0x27/0x30
2012-09-13T23:10:31.361838-07:00 plana85 kernel: [14563.516408]        [<ffffffff811ec5f6>] sysfs_write_file+0xe6/0x170
2012-09-13T23:10:31.361839-07:00 plana85 kernel: [14563.543852]        [<ffffffff8117bea3>] vfs_write+0xb3/0x180
2012-09-13T23:10:31.361883-07:00 plana85 kernel: [14563.570610]        [<ffffffff8117c1ca>] sys_write+0x4a/0x90
2012-09-13T23:10:31.361884-07:00 plana85 kernel: [14563.596840]        [<ffffffff8163dba9>] system_call_fastpath+0x16/0x1b
2012-09-13T23:10:31.361885-07:00 plana85 kernel: [14563.624070] 
2012-09-13T23:10:31.361886-07:00 plana85 kernel: [14563.624070] -> #0 (&eq->sysfs_lock){+.+...}:
2012-09-13T23:10:31.361887-07:00 plana85 kernel: [14563.667812]        [<ffffffff810b2858>] __lock_acquire+0x1ac8/0x1b90
2012-09-13T23:10:31.361888-07:00 plana85 kernel: [14563.695381]        [<ffffffff810b2ef2>] lock_acquire+0xa2/0x140
2012-09-13T23:10:31.361891-07:00 plana85 kernel: [14563.722343]        [<ffffffff81631f2b>] mutex_lock_nested+0x4b/0x320
2012-09-13T23:10:31.361892-07:00 plana85 kernel: [14563.749862]        [<ffffffff812f5a46>] elevator_exit+0x26/0x60
2012-09-13T23:10:31.361894-07:00 plana85 kernel: [14563.776977]        [<ffffffff812ff382>] blk_release_queue+0x52/0xb0
2012-09-13T23:10:31.361895-07:00 plana85 kernel: [14563.804569]        [<ffffffff8131b51b>] kobject_release+0x8b/0x1d0
2012-09-13T23:10:31.361896-07:00 plana85 kernel: [14563.832092]        [<ffffffff8131b39c>] kobject_put+0x2c/0x60
2012-09-13T23:10:31.361897-07:00 plana85 kernel: [14563.858917]        [<ffffffff812f92e5>] blk_put_queue+0x15/0x20
2012-09-13T23:10:31.361898-07:00 plana85 kernel: [14563.886066]        [<ffffffff8130542b>] disk_release+0x8b/0xc0
2012-09-13T23:10:31.361901-07:00 plana85 kernel: [14563.913169]        [<ffffffff813f6377>] device_release+0x27/0xa0
2012-09-13T23:10:31.361902-07:00 plana85 kernel: [14563.940465]        [<ffffffff8131b51b>] kobject_release+0x8b/0x1d0
2012-09-13T23:10:31.361903-07:00 plana85 kernel: [14563.968029]        [<ffffffff8131b39c>] kobject_put+0x2c/0x60
2012-09-13T23:10:31.361905-07:00 plana85 kernel: [14563.994878]        [<ffffffff813057b7>] put_disk+0x17/0x20
2012-09-13T23:10:31.361906-07:00 plana85 kernel: [14564.021460]        [<ffffffffa0230c08>] rbd_free_disk.isra.18+0x38/0x50 [rbd]
2012-09-13T23:10:31.361907-07:00 plana85 kernel: [14564.050446]        [<ffffffffa0231af7>] rbd_dev_release+0xf7/0x180 [rbd]
2012-09-13T23:10:31.361908-07:00 plana85 kernel: [14564.079113]        [<ffffffff813f6377>] device_release+0x27/0xa0
2012-09-13T23:10:31.361910-07:00 plana85 kernel: [14564.107490]        [<ffffffff8131b51b>] kobject_release+0x8b/0x1d0
2012-09-13T23:10:31.361912-07:00 plana85 kernel: [14564.136294]        [<ffffffff8131b39c>] kobject_put+0x2c/0x60
2012-09-13T23:10:31.361913-07:00 plana85 kernel: [14564.164199]        [<ffffffff813f6127>] put_device+0x17/0x20
2012-09-13T23:10:31.361915-07:00 plana85 kernel: [14564.192093]        [<ffffffff813f731a>] device_unregister+0x2a/0x60
2012-09-13T23:10:31.361916-07:00 plana85 kernel: [14564.220769]        [<ffffffffa023021f>] rbd_remove+0x18f/0x1a0 [rbd]
2012-09-13T23:10:31.361917-07:00 plana85 kernel: [14564.249689]        [<ffffffff813f8a17>] bus_attr_store+0x27/0x30
2012-09-13T23:10:31.361919-07:00 plana85 kernel: [14564.278175]        [<ffffffff811ec5f6>] sysfs_write_file+0xe6/0x170
2012-09-13T23:10:31.361920-07:00 plana85 kernel: [14564.306996]        [<ffffffff8117bea3>] vfs_write+0xb3/0x180
2012-09-13T23:10:31.361922-07:00 plana85 kernel: [14564.335081]        [<ffffffff8117c1ca>] sys_write+0x4a/0x90
2012-09-13T23:10:31.361924-07:00 plana85 kernel: [14564.362676]        [<ffffffff8163dba9>] system_call_fastpath+0x16/0x1b
2012-09-13T23:10:31.361924-07:00 plana85 kernel: [14564.391836] 
2012-09-13T23:10:31.361925-07:00 plana85 kernel: [14564.391836] other info that might help us debug this:
2012-09-13T23:10:31.361926-07:00 plana85 kernel: [14564.391836] 
2012-09-13T23:10:31.361927-07:00 plana85 kernel: [14564.465379] Chain exists of:
2012-09-13T23:10:31.361928-07:00 plana85 kernel: [14564.465379]   &eq->sysfs_lock --> &rbd_dev->header_rwsem --> ctl_mutex/1
2012-09-13T23:10:31.361929-07:00 plana85 kernel: [14564.465379] 
2012-09-13T23:10:31.361932-07:00 plana85 kernel: [14564.537646]  Possible unsafe locking scenario:
2012-09-13T23:10:31.361933-07:00 plana85 kernel: [14564.537646] 
2012-09-13T23:10:31.361934-07:00 plana85 kernel: [14564.586885]        CPU0                    CPU1
2012-09-13T23:10:31.361935-07:00 plana85 kernel: [14564.612893]        ----                    ----
2012-09-13T23:10:31.361935-07:00 plana85 kernel: [14564.638827]   lock(ctl_mutex/1);
2012-09-13T23:10:31.361936-07:00 plana85 kernel: [14564.662982]                                lock(&rbd_dev->header_rwsem);
2012-09-13T23:10:31.361937-07:00 plana85 kernel: [14564.691711]                                lock(ctl_mutex/1);
2012-09-13T23:10:31.361940-07:00 plana85 kernel: [14564.719241]   lock(&eq->sysfs_lock);
2012-09-13T23:10:31.361940-07:00 plana85 kernel: [14564.744279] 
2012-09-13T23:10:31.361941-07:00 plana85 kernel: [14564.744279]  *** DEADLOCK ***
2012-09-13T23:10:31.361942-07:00 plana85 kernel: [14564.744279] 
2012-09-13T23:10:31.361943-07:00 plana85 kernel: [14564.812034] 3 locks held by rbd/21089:
2012-09-13T23:10:31.361944-07:00 plana85 kernel: [14564.835976]  #0:  (&buffer->mutex){+.+.+.}, at: [<ffffffff811ec554>] sysfs_write_file+0x44/0x170
2012-09-13T23:10:31.361946-07:00 plana85 kernel: [14564.887304]  #1:  (s_active#146){.+.+.+}, at: [<ffffffff811ec5dd>] sysfs_write_file+0xcd/0x170
2012-09-13T23:10:31.361947-07:00 plana85 kernel: [14564.941919]  #2:  (ctl_mutex/1){+.+.+.}, at: [<ffffffffa02300ee>] rbd_remove+0x5e/0x1a0 [rbd]
2012-09-13T23:10:31.361950-07:00 plana85 kernel: [14564.999542] 
2012-09-13T23:10:31.361950-07:00 plana85 kernel: [14564.999542] stack backtrace:
2012-09-13T23:10:31.361952-07:00 plana85 kernel: [14565.051303] Pid: 21089, comm: rbd Not tainted 3.6.0-rc5-ceph-00063-gbac97e1 #1
2012-09-13T23:10:31.361952-07:00 plana85 kernel: [14565.108416] Call Trace:
2012-09-13T23:10:31.361954-07:00 plana85 kernel: [14565.135993]  [<ffffffff8162ac90>] print_circular_bug+0x1fb/0x20c
2012-09-13T23:10:31.361955-07:00 plana85 kernel: [14565.167790]  [<ffffffff810b2858>] __lock_acquire+0x1ac8/0x1b90
2012-09-13T23:10:31.361956-07:00 plana85 kernel: [14565.199468]  [<ffffffff8102230f>] ? save_stack_trace+0x2f/0x50
2012-09-13T23:10:31.361959-07:00 plana85 kernel: [14565.231162]  [<ffffffff810b1a0c>] ? __lock_acquire+0xc7c/0x1b90
2012-09-13T23:10:31.361960-07:00 plana85 kernel: [14565.263206]  [<ffffffff812f5a46>] ? elevator_exit+0x26/0x60
2012-09-13T23:10:31.361961-07:00 plana85 kernel: [14565.294404]  [<ffffffff810b2ef2>] lock_acquire+0xa2/0x140
2012-09-13T23:10:31.361962-07:00 plana85 kernel: [14565.325008]  [<ffffffff812f5a46>] ? elevator_exit+0x26/0x60
2012-09-13T23:10:31.361963-07:00 plana85 kernel: [14565.355652]  [<ffffffff81631f2b>] mutex_lock_nested+0x4b/0x320
2012-09-13T23:10:31.361964-07:00 plana85 kernel: [14565.386725]  [<ffffffff812f5a46>] ? elevator_exit+0x26/0x60
2012-09-13T23:10:31.361966-07:00 plana85 kernel: [14565.417952]  [<ffffffff810b37b6>] ? mark_held_locks+0x86/0x140
2012-09-13T23:10:31.361967-07:00 plana85 kernel: [14565.449956]  [<ffffffff81635270>] ? _raw_spin_unlock_irq+0x30/0x40
2012-09-13T23:10:31.361987-07:00 plana85 kernel: [14565.482556]  [<ffffffff812f5a46>] elevator_exit+0x26/0x60
2012-09-13T23:10:31.361989-07:00 plana85 kernel: [14565.513948]  [<ffffffff812ff382>] blk_release_queue+0x52/0xb0
2012-09-13T23:10:31.361990-07:00 plana85 kernel: [14565.545898]  [<ffffffff8131b51b>] kobject_release+0x8b/0x1d0
2012-09-13T23:10:31.361991-07:00 plana85 kernel: [14565.577573]  [<ffffffff8131b39c>] kobject_put+0x2c/0x60
2012-09-13T23:10:31.361992-07:00 plana85 kernel: [14565.608713]  [<ffffffff812f92e5>] blk_put_queue+0x15/0x20
2012-09-13T23:10:31.361993-07:00 plana85 kernel: [14565.640566]  [<ffffffff8130542b>] disk_release+0x8b/0xc0
2012-09-13T23:10:31.361995-07:00 plana85 kernel: [14565.672531]  [<ffffffff813f6377>] device_release+0x27/0xa0
2012-09-13T23:10:31.361997-07:00 plana85 kernel: [14565.704770]  [<ffffffff8131b51b>] kobject_release+0x8b/0x1d0
2012-09-13T23:10:31.361998-07:00 plana85 kernel: [14565.736638]  [<ffffffff8131b39c>] kobject_put+0x2c/0x60
2012-09-13T23:10:31.361999-07:00 plana85 kernel: [14565.767041]  [<ffffffff813057b7>] put_disk+0x17/0x20
2012-09-13T23:10:31.362001-07:00 plana85 kernel: [14565.796278]  [<ffffffffa0230c08>] rbd_free_disk.isra.18+0x38/0x50 [rbd]
2012-09-13T23:10:31.362002-07:00 plana85 kernel: [14565.827001]  [<ffffffffa0231af7>] rbd_dev_release+0xf7/0x180 [rbd]
2012-09-13T23:10:31.362003-07:00 plana85 kernel: [14565.856618]  [<ffffffff813f6377>] device_release+0x27/0xa0
2012-09-13T23:10:31.362004-07:00 plana85 kernel: [14565.884718]  [<ffffffff8131b51b>] kobject_release+0x8b/0x1d0
2012-09-13T23:10:31.362005-07:00 plana85 kernel: [14565.913376]  [<ffffffff8131b39c>] kobject_put+0x2c/0x60
2012-09-13T23:10:31.362007-07:00 plana85 kernel: [14565.941729]  [<ffffffff813f6127>] put_device+0x17/0x20
2012-09-13T23:10:31.362008-07:00 plana85 kernel: [14565.969282]  [<ffffffff813f731a>] device_unregister+0x2a/0x60
2012-09-13T23:10:31.362010-07:00 plana85 kernel: [14565.996876]  [<ffffffffa023021f>] rbd_remove+0x18f/0x1a0 [rbd]
2012-09-13T23:10:31.362011-07:00 plana85 kernel: [14566.024852]  [<ffffffff813f8a17>] bus_attr_store+0x27/0x30
2012-09-13T23:10:31.362012-07:00 plana85 kernel: [14566.052098]  [<ffffffff811ec5f6>] sysfs_write_file+0xe6/0x170
2012-09-13T23:10:31.362013-07:00 plana85 kernel: [14566.079606]  [<ffffffff8117bea3>] vfs_write+0xb3/0x180
2012-09-13T23:10:31.362014-07:00 plana85 kernel: [14566.106395]  [<ffffffff8117c1ca>] sys_write+0x4a/0x90
2012-09-13T23:10:31.362015-07:00 plana85 kernel: [14566.132936]  [<ffffffff8163dba9>] system_call_fastpath+0x16/0x1b
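
To read the dependency chain without scanning every stack frame, the "-> #N (lock){...}:" header lines can be pulled out of the report. The following is a minimal sketch, not part of the original report: the function name `chain_entries` and the regular expression are illustrative assumptions, and the sample lines are copied from the log above.

```python
import re

# Matches lockdep chain headers such as "-> #2 (&rbd_dev->header_rwsem){+++++.}:".
# The pattern is an assumption based on the lines in this report only.
CHAIN_RE = re.compile(r"-> #(\d+) \((.+)\)\{[^}]*\}:\s*$")


def chain_entries(lines):
    """Return (index, lock name) pairs for every chain header line, sorted by index."""
    entries = []
    for line in lines:
        match = CHAIN_RE.search(line)
        if match:
            entries.append((int(match.group(1)), match.group(2)))
    return sorted(entries)


# Header lines copied from the report, plus one stack frame that must be ignored.
SAMPLE = [
    "2012-09-13T23:10:31.361774-07:00 plana85 kernel: [14562.138280] -> #3 (ctl_mutex/1){+.+.+.}:",
    "2012-09-13T23:10:31.361776-07:00 plana85 kernel: [14562.190024]        [<ffffffff810b2ef2>] lock_acquire+0xa2/0x140",
    "2012-09-13T23:10:31.361789-07:00 plana85 kernel: [14562.441699] -> #2 (&rbd_dev->header_rwsem){+++++.}:",
    "2012-09-13T23:10:31.361803-07:00 plana85 kernel: [14562.779921] -> #1 ((&cfqd->unplug_work)){+.+...}:",
    "2012-09-13T23:10:31.361886-07:00 plana85 kernel: [14563.624070] -> #0 (&eq->sysfs_lock){+.+...}:",
]

if __name__ == "__main__":
    for index, name in chain_entries(SAMPLE):
        print(f"#{index}: {name}")
```

Entry #0 is the lock the task is trying to acquire, and the highest-numbered entry is the lock it already holds.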
Actions #1

Updated by Josh Durgin over 11 years ago

  • Priority changed from Normal to High
Actions #2

Updated by Josh Durgin over 11 years ago

  • Assignee set to Alex Elder
Actions #3

Updated by Alex Elder over 11 years ago

FINALLY. After going through a bunch of different tests
to try to narrow it down, I have found that simply running
workunit rbd/copy.sh reproduces the problem. I have a
couple of simple experiments to try that might make it
go away. Hopefully I can have a fix verified by tomorrow
noon.

Actions #4

Updated by Alex Elder over 11 years ago

Strike that. I have now learned that the "copy.sh" script
was buggy (which I've now fixed) and have been unable to
get the problem to reproduce again with it after several
attempts.

Onward and upward. I need to be able to reproduce it to
know it's fixed.

Actions #5

Updated by Alex Elder over 11 years ago

  • Status changed from New to Resolved

This has been resolved. The culprit was the committed patch
entitled "rbd: expand lock protection in rbd_add()", which caused
rbd_dev->header_rwsem to be held over a much longer period than it
had been before. That allowed code paths that acquire ctl_mutex to
be traversed while header_rwsem was held.

I fixed it by removing that patch and adjusting/rebasing all
committed patches that followed it accordingly.
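
The reasoning behind the fix can be sketched as a reachability check on the lock-dependency graph, in the spirit of what lockdep reports. This is an illustration under stated assumptions, not the kernel's implementation: the function `creates_cycle` and the edge lists are hypothetical, the edges are read off the report above, and it is assumed that dropping the patch removes the header_rwsem to ctl_mutex dependency.

```python
from collections import defaultdict


def creates_cycle(edges, held, wanted):
    """Return True if taking `wanted` while holding `held` would close a cycle.

    An edge (a, b) means "b was acquired while a was held". A cycle appears
    when `held` is already reachable from `wanted` through existing edges.
    """
    graph = defaultdict(set)
    for before, after in edges:
        graph[before].add(after)

    stack, seen = [wanted], set()
    while stack:
        node = stack.pop()
        if node == held:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph[node])
    return False


# Dependencies recorded in the report (chain entries #0 through #3).
EDGES_WITH_PATCH = [
    ("&eq->sysfs_lock", "(&cfqd->unplug_work)"),
    ("(&cfqd->unplug_work)", "&rbd_dev->header_rwsem"),
    ("&rbd_dev->header_rwsem", "ctl_mutex/1"),  # assumed to come from the patch
]
# Assumption: without the patch, the last dependency is never recorded.
EDGES_WITHOUT_PATCH = EDGES_WITH_PATCH[:2]

if __name__ == "__main__":
    # rbd_remove() holds ctl_mutex/1 and reaches elevator_exit(), which takes sysfs_lock.
    print(creates_cycle(EDGES_WITH_PATCH, "ctl_mutex/1", "&eq->sysfs_lock"))
    print(creates_cycle(EDGES_WITHOUT_PATCH, "ctl_mutex/1", "&eq->sysfs_lock"))
```

With the patch's dependency present, the check reports a cycle for the rbd_remove() path; with that single edge removed, the same acquisition no longer closes a loop.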
