Bug #1762
i_lock vs s_cap_lock vs inodes_with_caps_lock lock ordering
Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
fs/ceph
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):
Description
Reported by Amon Ott on the mailing list.
Nov 30 10:40:21 tgpro1 kernel: =======================================================
Nov 30 10:40:21 tgpro1 kernel: [ INFO: possible circular locking dependency detected ]
Nov 30 10:40:21 tgpro1 kernel: 3.1.3-rsbac #1
Nov 30 10:40:21 tgpro1 kernel: -------------------------------------------------------
Nov 30 10:40:21 tgpro1 kernel: kworker/0:0/8787 is trying to acquire lock:
Nov 30 10:40:21 tgpro1 kernel: (&sb->s_type->i_lock_key#16){+.+...}, at: [<000f7176>] igrab+0x11/0x41
Nov 30 10:40:21 tgpro1 kernel:
Nov 30 10:40:21 tgpro1 kernel: but task is already holding lock:
Nov 30 10:40:21 tgpro1 kernel: (&(&s->s_cap_lock)->rlock){+.+...}, at: [<006e0de4>] iterate_session_caps+0x40/0x17d [ceph]
Nov 30 10:40:21 tgpro1 kernel:
Nov 30 10:40:21 tgpro1 kernel: which lock already depends on the new lock.
Nov 30 10:40:21 tgpro1 kernel:
Nov 30 10:40:21 tgpro1 kernel:
Nov 30 10:40:21 tgpro1 kernel: the existing dependency chain (in reverse order) is:
Nov 30 10:40:21 tgpro1 kernel:
Nov 30 10:40:21 tgpro1 kernel: -> #1 (&(&s->s_cap_lock)->rlock){+.+...}:
Nov 30 10:40:21 tgpro1 kernel: [<0009038d>] lock_acquire+0x42/0x59
Nov 30 10:40:21 tgpro1 kernel: [<0052f167>] _raw_spin_lock+0x24/0x33
Nov 30 10:40:21 tgpro1 kernel: [<006dc0bc>] ceph_add_cap+0x302/0x551 [ceph]
Nov 30 10:40:21 tgpro1 kernel: [<006cfbfd>] fill_inode+0x5ef/0x72e [ceph]
Nov 30 10:40:21 tgpro1 kernel: [<006d0d6b>] ceph_fill_trace+0x663/0x6c5 [ceph]
Nov 30 10:40:21 tgpro1 kernel: [<006e4c6e>] dispatch+0xafe/0x10e7 [ceph]
Nov 30 10:40:21 tgpro1 kernel: [<006a444d>] con_work+0x14f7/0x16c6 [libceph]
Nov 30 10:40:21 tgpro1 kernel: [<0007a24b>] process_one_work+0x223/0x391
Nov 30 10:40:21 tgpro1 kernel: [<0007a747>] worker_thread+0x182/0x2d7
Nov 30 10:40:21 tgpro1 kernel: [<0007d767>] kthread+0x62/0x67
Nov 30 10:40:21 tgpro1 kernel: [<00530b26>] kernel_thread_helper+0x6/0xd
Nov 30 10:40:21 tgpro1 kernel:
Nov 30 10:40:21 tgpro1 kernel: -> #0 (&sb->s_type->i_lock_key#16){+.+...}:
Nov 30 10:40:21 tgpro1 kernel: [<0008fa3e>] __lock_acquire+0xe79/0x1425
Nov 30 10:40:21 tgpro1 kernel: [<0009038d>] lock_acquire+0x42/0x59
Nov 30 10:40:21 tgpro1 kernel: [<0052f167>] _raw_spin_lock+0x24/0x33
Nov 30 10:40:21 tgpro1 kernel: [<000f7176>] igrab+0x11/0x41
Nov 30 10:40:21 tgpro1 kernel: [<006e0e14>] iterate_session_caps+0x70/0x17d [ceph]
Nov 30 10:40:21 tgpro1 kernel: [<006e3bca>] send_mds_reconnect+0x319/0x46a [ceph]
Nov 30 10:40:21 tgpro1 kernel: [<006e4035>] ceph_mdsc_handle_map+0x2f2/0x42d [ceph]
Nov 30 10:40:21 tgpro1 kernel: [<006ce00d>] extra_mon_dispatch+0x18/0x1f [ceph]
Nov 30 10:40:21 tgpro1 kernel: [<006a63dd>] dispatch+0x4f7/0x52a [libceph]
Nov 30 10:40:21 tgpro1 kernel: [<006a444d>] con_work+0x14f7/0x16c6 [libceph]
Nov 30 10:40:21 tgpro1 kernel: [<0007a24b>] process_one_work+0x223/0x391
Nov 30 10:40:21 tgpro1 kernel: [<0007a747>] worker_thread+0x182/0x2d7
Nov 30 10:40:21 tgpro1 kernel: [<0007d767>] kthread+0x62/0x67
Nov 30 10:40:21 tgpro1 kernel: [<00530b26>] kernel_thread_helper+0x6/0xd
Nov 30 10:40:21 tgpro1 kernel:
Nov 30 10:40:21 tgpro1 kernel: other info that might help us debug this:
Nov 30 10:40:21 tgpro1 kernel:
Nov 30 10:40:21 tgpro1 kernel: Possible unsafe locking scenario:
Nov 30 10:40:21 tgpro1 kernel:
Nov 30 10:40:21 tgpro1 kernel:        CPU0                    CPU1
Nov 30 10:40:21 tgpro1 kernel:        ----                    ----
Nov 30 10:40:21 tgpro1 kernel:   lock(&(&s->s_cap_lock)->rlock);
Nov 30 10:40:21 tgpro1 kernel:                                lock(&sb->s_type->i_lock_key);
Nov 30 10:40:21 tgpro1 kernel:                                lock(&(&s->s_cap_lock)->rlock);
Nov 30 10:40:21 tgpro1 kernel:   lock(&sb->s_type->i_lock_key);
Nov 30 10:40:21 tgpro1 kernel:
Nov 30 10:40:21 tgpro1 kernel: *** DEADLOCK ***
Nov 30 10:40:21 tgpro1 kernel:
Nov 30 10:40:21 tgpro1 kernel: 5 locks held by kworker/0:0/8787:
Nov 30 10:40:21 tgpro1 kernel: #0: (ceph-msgr){.+.+.+}, at: [<0007a1ff>] process_one_work+0x1d7/0x391
Nov 30 10:40:21 tgpro1 kernel: #1: ((&(&con->work)->work)){+.+.+.}, at: [<0007a1ff>] process_one_work+0x1d7/0x391
Nov 30 10:40:21 tgpro1 kernel: #2: (&s->s_mutex){+.+.+.}, at: [<006e3954>] send_mds_reconnect+0xa3/0x46a [ceph]
Nov 30 10:40:21 tgpro1 kernel: #3: (&mdsc->snap_rwsem){+++++.}, at: [<006e3a4c>] send_mds_reconnect+0x19b/0x46a [ceph]
Nov 30 10:40:21 tgpro1 kernel: #4: (&(&s->s_cap_lock)->rlock){+.+...}, at: [<006e0de4>] iterate_session_caps+0x40/0x17d [ceph]
Nov 30 10:40:21 tgpro1 kernel:
Nov 30 10:40:21 tgpro1 kernel: stack backtrace:
Nov 30 10:40:21 tgpro1 kernel: Pid: 8787, comm: kworker/0:0 Tainted: G W 3.1.3-rsbac #1
Nov 30 10:40:21 tgpro1 kernel: Call Trace:
Nov 30 10:40:21 tgpro1 kernel: [<0008e776>] print_circular_bug+0x21a/0x227
Nov 30 10:40:21 tgpro1 kernel: [<0008fa3e>] __lock_acquire+0xe79/0x1425
Nov 30 10:40:21 tgpro1 kernel: [<0009038d>] lock_acquire+0x42/0x59
Nov 30 10:40:21 tgpro1 kernel: [<000f7176>] ? igrab+0x11/0x41
Nov 30 10:40:21 tgpro1 kernel: [<0052f167>] _raw_spin_lock+0x24/0x33
Nov 30 10:40:21 tgpro1 kernel: [<000f7176>] ? igrab+0x11/0x41
Nov 30 10:40:21 tgpro1 kernel: [<000f7176>] igrab+0x11/0x41
Nov 30 10:40:21 tgpro1 kernel: [<006e0e14>] iterate_session_caps+0x70/0x17d [ceph]
Nov 30 10:40:21 tgpro1 kernel: [<006e5488>] ? ceph_mdsc_submit_request+0x55/0x55 [ceph]
Nov 30 10:40:21 tgpro1 kernel: [<006a4d03>] ? ceph_pagelist_append+0xbc/0xf9 [libceph]
Nov 30 10:40:21 tgpro1 kernel: [<006e3bca>] send_mds_reconnect+0x319/0x46a [ceph]
Nov 30 10:40:21 tgpro1 kernel: [<0008daa1>] ? trace_hardirqs_on_caller+0x10b/0x13c
Nov 30 10:40:21 tgpro1 kernel: [<0008dadd>] ? trace_hardirqs_on+0xb/0xd
Nov 30 10:40:21 tgpro1 kernel: [<006e4035>] ceph_mdsc_handle_map+0x2f2/0x42d [ceph]
Nov 30 10:40:21 tgpro1 kernel: [<000d7cd3>] ? __kmalloc+0x10d/0x137
Nov 30 10:40:21 tgpro1 kernel: [<00c40015>] ? 0xc40014
Nov 30 10:40:21 tgpro1 kernel: [<006ce00d>] extra_mon_dispatch+0x18/0x1f [ceph]
Nov 30 10:40:21 tgpro1 kernel: [<006a63dd>] dispatch+0x4f7/0x52a [libceph]
Nov 30 10:40:21 tgpro1 kernel: [<0008dadd>] ? trace_hardirqs_on+0xb/0xd
Nov 30 10:40:21 tgpro1 kernel: [<006a444d>] con_work+0x14f7/0x16c6 [libceph]
Nov 30 10:40:21 tgpro1 kernel: [<0007a24b>] process_one_work+0x223/0x391
Nov 30 10:40:21 tgpro1 kernel: [<0007a1ff>] ? process_one_work+0x1d7/0x391
Nov 30 10:40:21 tgpro1 kernel: [<006a2f56>] ? ceph_fault+0x262/0x262 [libceph]
Nov 30 10:40:21 tgpro1 kernel: [<0007a747>] worker_thread+0x182/0x2d7
Nov 30 10:40:21 tgpro1 kernel: [<00002a00>] ? __show_regs+0x6f/0x6f
Nov 30 10:40:21 tgpro1 kernel: [<00002a00>] ? __show_regs+0x6f/0x6f
Nov 30 10:40:21 tgpro1 kernel: [<0007a5c5>] ? rescuer_thread+0x20c/0x20c
Nov 30 10:40:21 tgpro1 kernel: [<0007d767>] kthread+0x62/0x67
Nov 30 10:40:21 tgpro1 kernel: [<0007d705>] ? __init_kthread_worker+0x42/0x42
Nov 30 10:40:21 tgpro1 kernel: [<00530b26>] kernel_thread_helper+0x6/0xd
Nov 30 10:40:26 tgpro1 kernel: ceph: mds0 reconnect success
Nov 30 10:40:40 tgpro1 kernel: ceph: mds0 recovery completed
Nov 30 10:50:53 tgpro1 kernel: libceph: loaded (mon/osd proto 15/24, osdmap 5/6 5/6)
Nov 30 10:50:53 tgpro1 kernel: ceph: loaded (mds proto 32)
Nov 30 10:50:53 tgpro1 kernel: libceph: client6597 fsid 520649ba-50cc-2c58-7db3-4b30d3bb97d3
Nov 30 10:50:53 tgpro1 kernel: libceph: mon0 192.168.111.1:6789 session established
Nov 30 10:56:43 tgpro1 kernel: ceph: mds0 caps stale
Nov 30 10:57:03 tgpro1 kernel: ceph: mds0 caps stale
Nov 30 10:58:24 tgpro1 kernel: ceph: mds0 reconnect start
Nov 30 10:58:24 tgpro1 kernel:
Nov 30 10:58:24 tgpro1 kernel: =======================================================
Nov 30 10:58:24 tgpro1 kernel: [ INFO: possible circular locking dependency detected ]
Nov 30 10:58:24 tgpro1 kernel: 3.1.4-rsbac #1
Nov 30 10:58:24 tgpro1 kernel: -------------------------------------------------------
Nov 30 10:58:24 tgpro1 kernel: kworker/0:0/4 is trying to acquire lock:
Nov 30 10:58:24 tgpro1 kernel: (&sb->s_type->i_lock_key#16){+.+...}, at: [<000f7166>] igrab+0x11/0x41
Nov 30 10:58:24 tgpro1 kernel:
Nov 30 10:58:24 tgpro1 kernel: but task is already holding lock:
Nov 30 10:58:24 tgpro1 kernel: (&(&s->s_cap_lock)->rlock){+.+...}, at: [<006b2de4>] iterate_session_caps+0x40/0x17d [ceph]
Nov 30 10:58:24 tgpro1 kernel:
Nov 30 10:58:24 tgpro1 kernel: which lock already depends on the new lock.
Nov 30 10:58:24 tgpro1 kernel:
Nov 30 10:58:24 tgpro1 kernel:
Nov 30 10:58:24 tgpro1 kernel: the existing dependency chain (in reverse order) is:
Nov 30 10:58:24 tgpro1 kernel:
Nov 30 10:58:24 tgpro1 kernel: -> #1 (&(&s->s_cap_lock)->rlock){+.+...}:
Nov 30 10:58:24 tgpro1 kernel: [<0009038d>] lock_acquire+0x42/0x59
Nov 30 10:58:24 tgpro1 kernel: [<0052f147>] _raw_spin_lock+0x24/0x33
Nov 30 10:58:24 tgpro1 kernel: [<006ae0bc>] ceph_add_cap+0x302/0x551 [ceph]
Nov 30 10:58:24 tgpro1 kernel: [<006a1bfd>] fill_inode+0x5ef/0x72e [ceph]
Nov 30 10:58:24 tgpro1 kernel: [<006a2d6b>] ceph_fill_trace+0x663/0x6c5 [ceph]
Nov 30 10:58:24 tgpro1 kernel: [<006b6c6e>] dispatch+0xafe/0x10e7 [ceph]
Nov 30 10:58:24 tgpro1 kernel: [<0067644d>] con_work+0x14f7/0x16c6 [libceph]
Nov 30 10:58:24 tgpro1 kernel: [<0007a24b>] process_one_work+0x223/0x391
Nov 30 10:58:24 tgpro1 kernel: [<0007a747>] worker_thread+0x182/0x2d7
Nov 30 10:58:24 tgpro1 kernel: [<0007d767>] kthread+0x62/0x67
Nov 30 10:58:24 tgpro1 kernel: [<00530b06>] kernel_thread_helper+0x6/0xd
Nov 30 10:58:24 tgpro1 kernel:
Nov 30 10:58:24 tgpro1 kernel: -> #0 (&sb->s_type->i_lock_key#16){+.+...}:
Nov 30 10:58:24 tgpro1 kernel: [<0008fa3e>] __lock_acquire+0xe79/0x1425
Nov 30 10:58:24 tgpro1 kernel: [<0009038d>] lock_acquire+0x42/0x59
Nov 30 10:58:24 tgpro1 kernel: [<0052f147>] _raw_spin_lock+0x24/0x33
Nov 30 10:58:24 tgpro1 kernel: [<000f7166>] igrab+0x11/0x41
Nov 30 10:58:24 tgpro1 kernel: [<006b2e14>] iterate_session_caps+0x70/0x17d [ceph]
Nov 30 10:58:24 tgpro1 kernel: [<006b5bca>] send_mds_reconnect+0x319/0x46a [ceph]
Nov 30 10:58:24 tgpro1 kernel: [<006b6035>] ceph_mdsc_handle_map+0x2f2/0x42d [ceph]
Nov 30 10:58:24 tgpro1 kernel: [<006a000d>] extra_mon_dispatch+0x18/0x1f [ceph]
Nov 30 10:58:24 tgpro1 kernel: [<006783dd>] dispatch+0x4f7/0x52a [libceph]
Nov 30 10:58:24 tgpro1 kernel: [<0067644d>] con_work+0x14f7/0x16c6 [libceph]
Nov 30 10:58:24 tgpro1 kernel: [<0007a24b>] process_one_work+0x223/0x391
Nov 30 10:58:24 tgpro1 kernel: [<0007a747>] worker_thread+0x182/0x2d7
Nov 30 10:58:25 tgpro1 kernel: [<0007d767>] kthread+0x62/0x67
Nov 30 10:58:25 tgpro1 kernel: [<00530b06>] kernel_thread_helper+0x6/0xd
Nov 30 10:58:25 tgpro1 kernel:
Nov 30 10:58:25 tgpro1 kernel: other info that might help us debug this:
Nov 30 10:58:25 tgpro1 kernel:
Nov 30 10:58:25 tgpro1 kernel: Possible unsafe locking scenario:
Nov 30 10:58:25 tgpro1 kernel:
Nov 30 10:58:25 tgpro1 kernel:        CPU0                    CPU1
Nov 30 10:58:25 tgpro1 kernel:        ----                    ----
Nov 30 10:58:25 tgpro1 kernel:   lock(&(&s->s_cap_lock)->rlock);
Nov 30 10:58:25 tgpro1 kernel:                                lock(&sb->s_type->i_lock_key);
Nov 30 10:58:25 tgpro1 kernel:                                lock(&(&s->s_cap_lock)->rlock);
Nov 30 10:58:25 tgpro1 kernel:   lock(&sb->s_type->i_lock_key);
Nov 30 10:58:25 tgpro1 kernel:
Nov 30 10:58:25 tgpro1 kernel: *** DEADLOCK ***
Nov 30 10:58:25 tgpro1 kernel:
Nov 30 10:58:25 tgpro1 kernel: 5 locks held by kworker/0:0/4:
Nov 30 10:58:25 tgpro1 kernel: #0: (ceph-msgr){.+.+.+}, at: [<0007a1ff>] process_one_work+0x1d7/0x391
Nov 30 10:58:25 tgpro1 kernel: #1: ((&(&con->work)->work)){+.+.+.}, at: [<0007a1ff>] process_one_work+0x1d7/0x391
Nov 30 10:58:25 tgpro1 kernel: #2: (&s->s_mutex){+.+.+.}, at: [<006b5954>] send_mds_reconnect+0xa3/0x46a [ceph]
Nov 30 10:58:25 tgpro1 kernel: #3: (&mdsc->snap_rwsem){+++++.}, at: [<006b5a4c>] send_mds_reconnect+0x19b/0x46a [ceph]
Nov 30 10:58:25 tgpro1 kernel: #4: (&(&s->s_cap_lock)->rlock){+.+...}, at: [<006b2de4>] iterate_session_caps+0x40/0x17d [ceph]
Nov 30 10:58:25 tgpro1 kernel:
Nov 30 10:58:25 tgpro1 kernel: stack backtrace:
Nov 30 10:58:25 tgpro1 kernel: Pid: 4, comm: kworker/0:0 Tainted: G W 3.1.4-rsbac #1
Nov 30 10:58:25 tgpro1 kernel: Call Trace:
Nov 30 10:58:25 tgpro1 kernel: [<0008e776>] print_circular_bug+0x21a/0x227
Nov 30 10:58:25 tgpro1 kernel: [<0008fa3e>] __lock_acquire+0xe79/0x1425
Nov 30 10:58:25 tgpro1 kernel: [<0009038d>] lock_acquire+0x42/0x59
Nov 30 10:58:25 tgpro1 kernel: [<000f7166>] ? igrab+0x11/0x41
Nov 30 10:58:25 tgpro1 kernel: [<0052f147>] _raw_spin_lock+0x24/0x33
Nov 30 10:58:25 tgpro1 kernel: [<000f7166>] ? igrab+0x11/0x41
Nov 30 10:58:25 tgpro1 kernel: [<000f7166>] igrab+0x11/0x41
Nov 30 10:58:25 tgpro1 kernel: [<006b2e14>] iterate_session_caps+0x70/0x17d [ceph]
Nov 30 10:58:25 tgpro1 kernel: [<006b7488>] ? ceph_mdsc_submit_request+0x55/0x55 [ceph]
Nov 30 10:58:25 tgpro1 kernel: [<00676d03>] ? ceph_pagelist_append+0xbc/0xf9 [libceph]
Nov 30 10:58:25 tgpro1 kernel: [<006b5bca>] send_mds_reconnect+0x319/0x46a [ceph]
Nov 30 10:58:25 tgpro1 kernel: [<0008daa1>] ? trace_hardirqs_on_caller+0x10b/0x13c
Nov 30 10:58:25 tgpro1 kernel: [<0008dadd>] ? trace_hardirqs_on+0xb/0xd
Nov 30 10:58:25 tgpro1 kernel: [<006b6035>] ceph_mdsc_handle_map+0x2f2/0x42d [ceph]
Nov 30 10:58:25 tgpro1 kernel: [<000d7cd3>] ? __kmalloc+0x10d/0x137
Nov 30 10:58:25 tgpro1 kernel: [<00c40015>] ? 0xc40014
Nov 30 10:58:25 tgpro1 kernel: [<006a000d>] extra_mon_dispatch+0x18/0x1f [ceph]
Nov 30 10:58:25 tgpro1 kernel: [<006783dd>] dispatch+0x4f7/0x52a [libceph]
Nov 30 10:58:25 tgpro1 kernel: [<0008dadd>] ? trace_hardirqs_on+0xb/0xd
Nov 30 10:58:25 tgpro1 kernel: [<0067644d>] con_work+0x14f7/0x16c6 [libceph]
Nov 30 10:58:25 tgpro1 kernel: [<0007a24b>] process_one_work+0x223/0x391
Nov 30 10:58:25 tgpro1 kernel: [<0007a1ff>] ? process_one_work+0x1d7/0x391
Nov 30 10:58:25 tgpro1 kernel: [<00674f56>] ? ceph_fault+0x262/0x262 [libceph]
Nov 30 10:58:25 tgpro1 kernel: [<0007a747>] worker_thread+0x182/0x2d7
Nov 30 10:58:25 tgpro1 kernel: [<00002a00>] ? __show_regs+0x6f/0x6f
Nov 30 10:58:25 tgpro1 kernel: [<00002a00>] ? __show_regs+0x6f/0x6f
Nov 30 10:58:25 tgpro1 kernel: [<0007a5c5>] ? rescuer_thread+0x20c/0x20c
Nov 30 10:58:25 tgpro1 kernel: [<0007d767>] kthread+0x62/0x67
Nov 30 10:58:25 tgpro1 kernel: [<0007d705>] ? __init_kthread_worker+0x42/0x42
Nov 30 10:58:25 tgpro1 kernel: [<00530b06>] kernel_thread_helper+0x6/0xd
Nov 30 10:58:26 tgpro1 kernel: ceph: mds0 reconnect success
Nov 30 10:58:30 tgpro1 kernel: ceph: mds0 recovery completed
Nov 30 13:02:05 tgpro1 kernel:
Nov 30 13:02:05 tgpro1 kernel: =======================================================
Nov 30 13:02:05 tgpro1 kernel: [ INFO: possible circular locking dependency detected ]
Nov 30 13:02:05 tgpro1 kernel: 3.1.4-rsbac #1
Nov 30 13:02:05 tgpro1 kernel: -------------------------------------------------------
Nov 30 13:02:05 tgpro1 kernel: kworker/0:1/22099 is trying to acquire lock:
Nov 30 13:02:05 tgpro1 kernel: (&sb->s_type->i_lock_key#16){+.+...}, at: [<000ff721>] igrab+0x11/0x41
Nov 30 13:02:05 tgpro1 kernel:
Nov 30 13:02:05 tgpro1 kernel: but task is already holding lock:
Nov 30 13:02:05 tgpro1 kernel: (&(&realm->inodes_with_caps_lock)->rlock){+.+...}, at: [<006ce9b4>] ceph_update_snap_trace+0x258/0x3bc [ceph]
Nov 30 13:02:05 tgpro1 kernel:
Nov 30 13:02:05 tgpro1 kernel: which lock already depends on the new lock.
Nov 30 13:02:05 tgpro1 kernel:
Nov 30 13:02:05 tgpro1 kernel:
Nov 30 13:02:05 tgpro1 kernel: the existing dependency chain (in reverse order) is:
Nov 30 13:02:05 tgpro1 kernel:
Nov 30 13:02:05 tgpro1 kernel: -> #1 (&(&realm->inodes_with_caps_lock)->rlock){+.+...}:
Nov 30 13:02:05 tgpro1 kernel: [<00091456>] lock_acquire+0x42/0x59
Nov 30 13:02:05 tgpro1 kernel: [<0053f187>] _raw_spin_lock+0x24/0x33
Nov 30 13:02:05 tgpro1 kernel: [<006cc0a3>] ceph_add_cap+0x37d/0x554 [ceph]
Nov 30 13:02:05 tgpro1 kernel: [<006bfbf7>] fill_inode+0x5ee/0x72d [ceph]
Nov 30 13:02:05 tgpro1 kernel: [<006c0d61>] ceph_fill_trace+0x663/0x6c5 [ceph]
Nov 30 13:02:05 tgpro1 kernel: [<006d4bd4>] dispatch+0xaff/0x10e9 [ceph]
Nov 30 13:02:05 tgpro1 kernel: [<00694459>] con_work+0x14f4/0x16c3 [libceph]
Nov 30 13:02:05 tgpro1 kernel: [<0007b331>] process_one_work+0x229/0x397
Nov 30 13:02:05 tgpro1 kernel: [<0007b80e>] worker_thread+0x182/0x2d8
Nov 30 13:02:05 tgpro1 kernel: [<0007e843>] kthread+0x62/0x67
Nov 30 13:02:05 tgpro1 kernel: [<00540c86>] kernel_thread_helper+0x6/0xd
Nov 30 13:02:05 tgpro1 kernel:
Nov 30 13:02:05 tgpro1 kernel: -> #0 (&sb->s_type->i_lock_key#16){+.+...}:
Nov 30 13:02:05 tgpro1 kernel: [<00090b06>] __lock_acquire+0xe7b/0x1428
Nov 30 13:02:05 tgpro1 kernel: [<00091456>] lock_acquire+0x42/0x59
Nov 30 13:02:05 tgpro1 kernel: [<0053f187>] _raw_spin_lock+0x24/0x33
Nov 30 13:02:05 tgpro1 kernel: [<000ff721>] igrab+0x11/0x41
Nov 30 13:02:05 tgpro1 kernel: [<006ce9d3>] ceph_update_snap_trace+0x277/0x3bc [ceph]
Nov 30 13:02:05 tgpro1 kernel: [<006cee4e>] ceph_handle_snap+0x336/0x458 [ceph]
Nov 30 13:02:05 tgpro1 kernel: [<006d4e7d>] dispatch+0xda8/0x10e9 [ceph]
Nov 30 13:02:05 tgpro1 kernel: [<00694459>] con_work+0x14f4/0x16c3 [libceph]
Nov 30 13:02:05 tgpro1 kernel: [<0007b331>] process_one_work+0x229/0x397
Nov 30 13:02:05 tgpro1 kernel: [<0007b80e>] worker_thread+0x182/0x2d8
Nov 30 13:02:05 tgpro1 kernel: [<0007e843>] kthread+0x62/0x67
Nov 30 13:02:05 tgpro1 kernel: [<00540c86>] kernel_thread_helper+0x6/0xd
Nov 30 13:02:05 tgpro1 kernel:
Nov 30 13:02:05 tgpro1 kernel: other info that might help us debug this:
Nov 30 13:02:05 tgpro1 kernel:
Nov 30 13:02:05 tgpro1 kernel: Possible unsafe locking scenario:
Nov 30 13:02:05 tgpro1 kernel:
Nov 30 13:02:05 tgpro1 kernel:        CPU0                    CPU1
Nov 30 13:02:05 tgpro1 kernel:        ----                    ----
Nov 30 13:02:05 tgpro1 kernel:   lock(&(&realm->inodes_with_caps_lock)->rlock);
Nov 30 13:02:05 tgpro1 kernel:                                lock(&sb->s_type->i_lock_key);
Nov 30 13:02:05 tgpro1 kernel:                                lock(&(&realm->inodes_with_caps_lock)->rlock);
Nov 30 13:02:05 tgpro1 kernel:   lock(&sb->s_type->i_lock_key);
Nov 30 13:02:05 tgpro1 kernel:
Nov 30 13:02:05 tgpro1 kernel: *** DEADLOCK ***
Nov 30 13:02:05 tgpro1 kernel:
Nov 30 13:02:05 tgpro1 kernel: 4 locks held by kworker/0:1/22099:
Nov 30 13:02:05 tgpro1 kernel: #0: (ceph-msgr){++++.+}, at: [<0007b2e5>] process_one_work+0x1dd/0x397
Nov 30 13:02:05 tgpro1 kernel: #1: ((&(&con->work)->work)){+.+.+.}, at: [<0007b2e5>] process_one_work+0x1dd/0x397
Nov 30 13:02:05 tgpro1 kernel: #2: (&mdsc->snap_rwsem){+++++.}, at: [<006cebe5>] ceph_handle_snap+0xcd/0x458 [ceph]
Nov 30 13:02:05 tgpro1 kernel: #3: (&(&realm->inodes_with_caps_lock)->rlock){+.+...}, at: [<006ce9b4>] ceph_update_snap_trace+0x258/0x3bc [ceph]
Nov 30 13:02:05 tgpro1 kernel:
Nov 30 13:02:05 tgpro1 kernel: stack backtrace:
Nov 30 13:02:05 tgpro1 kernel: Pid: 22099, comm: kworker/0:1 Tainted: G W 3.1.4-rsbac #1
Nov 30 13:02:05 tgpro1 kernel: Call Trace:
Nov 30 13:02:05 tgpro1 kernel: [<0008f840>] print_circular_bug+0x219/0x226
Nov 30 13:02:05 tgpro1 kernel: [<00090b06>] __lock_acquire+0xe7b/0x1428
Nov 30 13:02:05 tgpro1 kernel: [<00090b54>] ? __lock_acquire+0xec9/0x1428
Nov 30 13:02:05 tgpro1 kernel: [<00091456>] lock_acquire+0x42/0x59
Nov 30 13:02:05 tgpro1 kernel: [<000ff721>] ? igrab+0x11/0x41
Nov 30 13:02:05 tgpro1 kernel: [<0053f187>] _raw_spin_lock+0x24/0x33
Nov 30 13:02:05 tgpro1 kernel: [<000ff721>] ? igrab+0x11/0x41
Nov 30 13:02:05 tgpro1 kernel: [<000ff721>] igrab+0x11/0x41
Nov 30 13:02:05 tgpro1 kernel: [<006ce9d3>] ceph_update_snap_trace+0x277/0x3bc [ceph]
Nov 30 13:02:05 tgpro1 kernel: [<006cee4e>] ceph_handle_snap+0x336/0x458 [ceph]
Nov 30 13:02:05 tgpro1 kernel: [<007f0312>] ? 0x7f0311
Nov 30 13:02:05 tgpro1 kernel: [<006d4e7d>] dispatch+0xda8/0x10e9 [ceph]
Nov 30 13:02:05 tgpro1 kernel: [<0048738c>] ? kernel_recvmsg+0x2a/0x34
Nov 30 13:02:05 tgpro1 kernel: [<0053dbb6>] ? __mutex_unlock_slowpath+0xdf/0xf8
Nov 30 13:02:05 tgpro1 kernel: [<0008eb6f>] ? trace_hardirqs_on_caller+0x10b/0x13c
Nov 30 13:02:05 tgpro1 kernel: [<0008ebab>] ? trace_hardirqs_on+0xb/0xd
Nov 30 13:02:05 tgpro1 kernel: [<00694459>] con_work+0x14f4/0x16c3 [libceph]
Nov 30 13:02:05 tgpro1 kernel: [<0000784e>] ? text_poke+0x79/0x95
Nov 30 13:02:05 tgpro1 kernel: [<0007b331>] process_one_work+0x229/0x397
Nov 30 13:02:05 tgpro1 kernel: [<0007b2e5>] ? process_one_work+0x1dd/0x397
Nov 30 13:02:05 tgpro1 kernel: [<00692f65>] ? ceph_fault+0x262/0x262 [libceph]
Nov 30 13:02:05 tgpro1 kernel: [<0007b80e>] worker_thread+0x182/0x2d8
Nov 30 13:02:05 tgpro1 kernel: [<00002b40>] ? setup_sigcontext+0x24/0x24
Nov 30 13:02:05 tgpro1 kernel: [<00002b40>] ? setup_sigcontext+0x24/0x24
Nov 30 13:02:05 tgpro1 kernel: [<0007b68c>] ? rescuer_thread+0x1ed/0x1ed
Nov 30 13:02:05 tgpro1 kernel: [<0007e843>] kthread+0x62/0x67
Nov 30 13:02:05 tgpro1 kernel: [<0007e7e1>] ? __init_kthread_worker+0x42/0x42
Nov 30 13:02:05 tgpro1 kernel: [<00540c86>] kernel_thread_helper+0x6/0xd
History
#1 Updated by Sage Weil almost 12 years ago
Argh, this is a real pain. igrab() requires i_lock, which we use extensively to protect complicated changes. In the case of inodes_with_caps_lock and s_cap_lock, we put the inode on a list and need to be able to safely iterate over that list later. Because the list lock is an inner lock (taken while i_lock is already held, as in the ceph_add_cap() path in the traces above), we can't igrab() under it to iterate safely, and it would be very painful to reorder it outside of i_lock.
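To make the inversion concrete, here is a minimal user-space analogue (pthread mutexes standing in for the kernel spinlocks; the two thread bodies are hypothetical simplifications of the ceph_add_cap() and iterate_session_caps() paths shown in the lockdep report, not the real kernel code; build with cc -pthread):

/* i_lock stands in for the VFS inode spinlock taken by igrab(),
 * s_cap_lock for the per-session cap list lock. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t i_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t s_cap_lock = PTHREAD_MUTEX_INITIALIZER;

/* ceph_add_cap()-like ordering: i_lock is already held, then the list
 * lock is taken to link the new cap onto the session's cap list. */
static void *add_cap_path(void *arg)
{
	pthread_mutex_lock(&i_lock);
	usleep(1000);				/* widen the race window */
	pthread_mutex_lock(&s_cap_lock);	/* i_lock -> s_cap_lock */
	/* ... add cap to the session cap list ... */
	pthread_mutex_unlock(&s_cap_lock);
	pthread_mutex_unlock(&i_lock);
	return NULL;
}

/* iterate_session_caps()-like ordering: the list lock is held to walk
 * the list, and igrab() then takes i_lock on each inode found there. */
static void *iterate_caps_path(void *arg)
{
	pthread_mutex_lock(&s_cap_lock);
	usleep(1000);
	pthread_mutex_lock(&i_lock);		/* s_cap_lock -> i_lock: inverted */
	/* ... igrab(inode) so the callback can use it safely ... */
	pthread_mutex_unlock(&i_lock);
	pthread_mutex_unlock(&s_cap_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	/* Running both paths concurrently with opposite acquisition orders
	 * is exactly the AB-BA pattern lockdep warns about; this program
	 * will usually hang right here. */
	pthread_create(&a, NULL, add_cap_path, NULL);
	pthread_create(&b, NULL, iterate_caps_path, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	puts("no deadlock this time");
	return 0;
}

The same AB-BA shape shows up with inodes_with_caps_lock in place of s_cap_lock in the third trace above.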
#2 Updated by Amon Ott almost 12 years ago
Seems to be fixed here now with git commits be655596b3de5873f994ddbe205751a5ffb4de39 (for-linus) and 1a2fe05d296a35da46febbd57b06e86c193746ff (wip-d-lock).
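If those commits take the approach that ended up in mainline (an assumption here: ceph-private inode state moves off the VFS i_lock and onto a new per-inode i_ceph_lock spinlock), the cycle disappears because the add-cap path no longer holds i_lock when it takes the list lock. A hypothetical continuation of the sketch in comment #1:

/* Give the inode its own ceph-private lock so the add-cap path never
 * holds the VFS i_lock while taking the session list lock. With the
 * i_lock -> s_cap_lock edge gone, calling igrab() (which takes i_lock)
 * under s_cap_lock no longer forms a cycle. */
static pthread_mutex_t i_ceph_lock = PTHREAD_MUTEX_INITIALIZER;

static void *add_cap_path_fixed(void *arg)
{
	pthread_mutex_lock(&i_ceph_lock);	/* ceph-private state lock */
	pthread_mutex_lock(&s_cap_lock);	/* i_ceph_lock -> s_cap_lock */
	/* ... add cap to the session cap list ... */
	pthread_mutex_unlock(&s_cap_lock);
	pthread_mutex_unlock(&i_ceph_lock);
	return NULL;
}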
#3 Updated by Sage Weil almost 12 years ago
- Target version deleted (v3.2)
#4 Updated by Sage Weil almost 12 years ago
- Position set to 1
#5 Updated by Sage Weil almost 12 years ago
- Status changed from New to Resolved
Merged into mainline for 3.2.