Bug #17003
possible circular locking dependency detected
0%
Description
http://pulpito.ceph.com/loic-2016-08-10_06:54:41-rados-jewel-backports---basic-smithi/357421/
smithi012 kernel: [ INFO: possible circular locking dependency detected ] ' in syslog
this happened in dozens of jobs yesterday.
History
#1 Updated by Zack Cerza over 7 years ago
- Status changed from New to Need More Info
Is this in the wrong project?
#2 Updated by Loïc Dachary over 7 years ago
I suspect this is infrastructure noise, reason why I added it to the sepia project.
#3 Updated by David Galloway over 7 years ago
- Project changed from sepia to teuthology
- Subject changed from smithi: possible circular locking dependency detected to possible circular locking dependency detected
- Status changed from Need More Info to Fix Under Review
- Assignee set to David Galloway
#4 Updated by David Galloway over 7 years ago
<sjusthm> loicd: iirc, it's an xfs bug <sjusthm> whitelist it <dgalloway> sjusthm, i that something that can be whitelisted on the dev/test end or are you saying it should be added to the teuthology whitelist? <sjusthm> dgalloway: whatever is required to make it stop failing <dgalloway> k <sjusthm> dgalloway loicd: we don't need to fix the bug, we just need to stop it from throwing up failures <sjusthm> I think it's the teuthology whitelist which throws up these failures, right <sjusthm> ? <dgalloway> pretty sure, yeah. <sjusthm> yep, that one <sjusthm> basically, that error signifies nothing useful for the test <sjusthm> so we can ignore it <sjusthm> unless <sjusthm> cephfs cares about it for their tests <sjusthm> don't know about that <sjusthm> still, best to whitelist it for now
#5 Updated by Zack Cerza over 7 years ago
Based on the above, it seems like we should confirm with the fs team before whitelisting this...
#6 Updated by David Galloway over 7 years ago
Sounds like we want to only whitelist this if it's xfs-related.
Here's the full error from kern.log. I'll see if only grep -v
it if it's due to xfs.
2016-08-11T10:49:11.264086+00:00 smithi012 kernel: 2016-08-11T10:49:11.264138+00:00 smithi012 kernel: ====================================================== 2016-08-11T10:49:11.264162+00:00 smithi012 kernel: [ INFO: possible circular locking dependency detected ] 2016-08-11T10:49:11.264180+00:00 smithi012 kernel: 4.8.0-rc1-ceph-00009-gc149a93 #1 Not tainted 2016-08-11T10:49:11.264197+00:00 smithi012 kernel: ------------------------------------------------------- 2016-08-11T10:49:11.264212+00:00 smithi012 kernel: tp_fstore_op/24906 is trying to acquire lock: 2016-08-11T10:49:11.264228+00:00 smithi012 kernel: (&sb->s_type->i_mutex_key#17){+.+.+.}, at: [<ffffffffa05162c4>] xfs_file_buffered_aio_write+0x64/0x300 [xfs] 2016-08-11T10:49:11.264246+00:00 smithi012 kernel: #012but task is already holding lock: 2016-08-11T10:49:11.264262+00:00 smithi012 kernel: (&pipe->mutex/1){+.+.+.}, at: [<ffffffff8127a11d>] pipe_lock+0x1d/0x20 2016-08-11T10:49:11.264278+00:00 smithi012 kernel: #012which lock already depends on the new lock. 2016-08-11T10:49:11.264302+00:00 smithi012 kernel: #012the existing dependency chain (in reverse order) is: 2016-08-11T10:49:11.264319+00:00 smithi012 kernel: #012-> #2 (&pipe->mutex/1){+.+.+.}: 2016-08-11T10:49:11.264335+00:00 smithi012 kernel: [<ffffffff811077a7>] lock_acquire+0x197/0x1f0 2016-08-11T10:49:11.264345+00:00 smithi012 kernel: [<ffffffff8180fb19>] mutex_lock_nested+0x79/0x370 2016-08-11T10:49:11.264354+00:00 smithi012 kernel: [<ffffffff8127a11d>] pipe_lock+0x1d/0x20 2016-08-11T10:49:11.264362+00:00 smithi012 kernel: [<ffffffff812a8842>] splice_to_pipe+0x32/0x260 2016-08-11T10:49:11.264370+00:00 smithi012 kernel: [<ffffffff812a9eb9>] __generic_file_splice_read+0x4f9/0x620 2016-08-11T10:49:11.264379+00:00 smithi012 kernel: [<ffffffff812aa01e>] generic_file_splice_read+0x3e/0x70 2016-08-11T10:49:11.264387+00:00 smithi012 kernel: [<ffffffffa051527b>] xfs_file_splice_read+0x19b/0x220 [xfs] 2016-08-11T10:49:11.264398+00:00 smithi012 kernel: [<ffffffff812a8e17>] do_splice_to+0x77/0x90 2016-08-11T10:49:11.264406+00:00 smithi012 kernel: [<ffffffff812ab021>] SyS_splice+0x6c1/0x720 2016-08-11T10:49:11.264415+00:00 smithi012 kernel: [<ffffffff81003b4a>] do_syscall_64+0x7a/0x1c0 2016-08-11T10:49:11.264423+00:00 smithi012 kernel: [<ffffffff81813c3f>] return_from_SYSCALL_64+0x0/0x7a 2016-08-11T10:49:11.264450+00:00 smithi012 kernel: #012-> #1 (&(&ip->i_iolock)->mr_lock){++++++}: 2016-08-11T10:49:11.264471+00:00 smithi012 kernel: [<ffffffff811077a7>] lock_acquire+0x197/0x1f0 2016-08-11T10:49:11.264489+00:00 smithi012 kernel: [<ffffffff81101ccf>] down_write_nested+0x4f/0x80 2016-08-11T10:49:11.264507+00:00 smithi012 kernel: [<ffffffffa052421b>] xfs_ilock+0x11b/0x1d0 [xfs] 2016-08-11T10:49:11.264524+00:00 smithi012 kernel: [<ffffffffa0522600>] xfs_vn_setattr+0x30/0x70 [xfs] 2016-08-11T10:49:11.264542+00:00 smithi012 kernel: [<ffffffff81292233>] notify_change+0x263/0x3a0 2016-08-11T10:49:11.264598+00:00 smithi012 kernel: [<ffffffff8126ebfa>] do_truncate+0x7a/0xc0 2016-08-11T10:49:11.264632+00:00 smithi012 kernel: [<ffffffff8126f003>] do_sys_ftruncate.constprop.12+0x153/0x170 2016-08-11T10:49:11.264654+00:00 smithi012 kernel: [<ffffffff8126f05e>] SyS_ftruncate+0xe/0x10 2016-08-11T10:49:11.264671+00:00 smithi012 kernel: [<ffffffff81003b4a>] do_syscall_64+0x7a/0x1c0 2016-08-11T10:49:11.264689+00:00 smithi012 kernel: [<ffffffff81813c3f>] return_from_SYSCALL_64+0x0/0x7a 2016-08-11T10:49:11.264722+00:00 smithi012 kernel: #012-> #0 (&sb->s_type->i_mutex_key#17){+.+.+.}: 2016-08-11T10:49:11.264741+00:00 smithi012 kernel: [<ffffffff81106ab3>] __lock_acquire+0xe73/0x1680 2016-08-11T10:49:11.264760+00:00 smithi012 kernel: [<ffffffff811077a7>] lock_acquire+0x197/0x1f0 2016-08-11T10:49:11.264778+00:00 smithi012 kernel: [<ffffffff81810e19>] down_write+0x49/0x80 2016-08-11T10:49:11.264796+00:00 smithi012 kernel: [<ffffffffa05162c4>] xfs_file_buffered_aio_write+0x64/0x300 [xfs] 2016-08-11T10:49:11.264814+00:00 smithi012 kernel: [<ffffffffa05165ec>] xfs_file_write_iter+0x8c/0x110 [xfs] 2016-08-11T10:49:11.264832+00:00 smithi012 kernel: [<ffffffff81270e7a>] vfs_iter_write+0xaa/0xf0 2016-08-11T10:49:11.264850+00:00 smithi012 kernel: [<ffffffff812a97f1>] iter_file_splice_write+0x281/0x3a0 2016-08-11T10:49:11.264868+00:00 smithi012 kernel: [<ffffffff812aae3a>] SyS_splice+0x4da/0x720 2016-08-11T10:49:11.264889+00:00 smithi012 kernel: [<ffffffff81003b4a>] do_syscall_64+0x7a/0x1c0 2016-08-11T10:49:11.264907+00:00 smithi012 kernel: [<ffffffff81813c3f>] return_from_SYSCALL_64+0x0/0x7a 2016-08-11T10:49:11.264925+00:00 smithi012 kernel: #012other info that might help us debug this: 2016-08-11T10:49:11.264944+00:00 smithi012 kernel: Chain exists of:#012 &sb->s_type->i_mutex_key#17 --> &(&ip->i_iolock)->mr_lock --> &pipe->mutex/1 2016-08-11T10:49:11.264962+00:00 smithi012 kernel: Possible unsafe locking scenario: 2016-08-11T10:49:11.264980+00:00 smithi012 kernel: CPU0 CPU1 2016-08-11T10:49:11.264998+00:00 smithi012 kernel: ---- ---- 2016-08-11T10:49:11.265016+00:00 smithi012 kernel: lock(&pipe->mutex/1); 2016-08-11T10:49:11.265039+00:00 smithi012 kernel: lock(&(&ip->i_iolock)->mr_lock); 2016-08-11T10:49:11.265057+00:00 smithi012 kernel: lock(&pipe->mutex/1); 2016-08-11T10:49:11.265079+00:00 smithi012 kernel: lock(&sb->s_type->i_mutex_key#17); 2016-08-11T10:49:11.265102+00:00 smithi012 kernel: #012 *** DEADLOCK *** 2016-08-11T10:49:11.265132+00:00 smithi012 kernel: 2 locks held by tp_fstore_op/24906: 2016-08-11T10:49:11.265152+00:00 smithi012 kernel: #0: (sb_writers#14){.+.+.+}, at: [<ffffffff8127470d>] __sb_start_write+0xad/0xe0 2016-08-11T10:49:11.265170+00:00 smithi012 kernel: #1: (&pipe->mutex/1){+.+.+.}, at: [<ffffffff8127a11d>] pipe_lock+0x1d/0x20 2016-08-11T10:49:11.265203+00:00 smithi012 kernel: #012stack backtrace: 2016-08-11T10:49:11.265222+00:00 smithi012 kernel: CPU: 3 PID: 24906 Comm: tp_fstore_op Not tainted 4.8.0-rc1-ceph-00009-gc149a93 #1 2016-08-11T10:49:11.265240+00:00 smithi012 kernel: Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c 09/07/2015 2016-08-11T10:49:11.265258+00:00 smithi012 kernel: 0000000000000086 0000000095b383ec ffff88084753fb40 ffffffff81413ee1 2016-08-11T10:49:11.265276+00:00 smithi012 kernel: ffffffff828f0460 ffffffff828ef530 ffff88084753fb80 ffffffff811d40dc 2016-08-11T10:49:11.265298+00:00 smithi012 kernel: ffff88084753fbc0 ffff88081ca208d8 ffff88081ca20000 0000000000000002 2016-08-11T10:49:11.265317+00:00 smithi012 kernel: Call Trace: 2016-08-11T10:49:11.265335+00:00 smithi012 kernel: [<ffffffff81413ee1>] dump_stack+0x85/0xc4 2016-08-11T10:49:11.265352+00:00 smithi012 kernel: [<ffffffff811d40dc>] print_circular_bug+0x200/0x20e 2016-08-11T10:49:11.265370+00:00 smithi012 kernel: [<ffffffff81106ab3>] __lock_acquire+0xe73/0x1680 2016-08-11T10:49:11.265388+00:00 smithi012 kernel: [<ffffffff811077a7>] lock_acquire+0x197/0x1f0 2016-08-11T10:49:11.265405+00:00 smithi012 kernel: [<ffffffffa05162c4>] ? xfs_file_buffered_aio_write+0x64/0x300 [xfs] 2016-08-11T10:49:11.265423+00:00 smithi012 kernel: [<ffffffff81810e19>] down_write+0x49/0x80 2016-08-11T10:49:11.265441+00:00 smithi012 kernel: [<ffffffffa05162c4>] ? xfs_file_buffered_aio_write+0x64/0x300 [xfs] 2016-08-11T10:49:11.265462+00:00 smithi012 kernel: [<ffffffffa05162c4>] xfs_file_buffered_aio_write+0x64/0x300 [xfs] 2016-08-11T10:49:11.265480+00:00 smithi012 kernel: [<ffffffff81105863>] ? trace_hardirqs_on_caller+0x163/0x1d0 2016-08-11T10:49:11.265498+00:00 smithi012 kernel: [<ffffffffa05165ec>] xfs_file_write_iter+0x8c/0x110 [xfs] 2016-08-11T10:49:11.265515+00:00 smithi012 kernel: [<ffffffff81270e7a>] vfs_iter_write+0xaa/0xf0 2016-08-11T10:49:11.265533+00:00 smithi012 kernel: [<ffffffff812a97f1>] iter_file_splice_write+0x281/0x3a0 2016-08-11T10:49:11.265562+00:00 smithi012 kernel: [<ffffffff812aae3a>] SyS_splice+0x4da/0x720 2016-08-11T10:49:11.265588+00:00 smithi012 kernel: [<ffffffff81003713>] ? syscall_trace_enter+0x2f3/0x370 2016-08-11T10:49:11.265605+00:00 smithi012 kernel: [<ffffffff81003b4a>] do_syscall_64+0x7a/0x1c0 2016-08-11T10:49:11.265623+00:00 smithi012 kernel: [<ffffffff81813c3f>] entry_SYSCALL64_slow_path+0x25/0x25
#7 Updated by Ilya Dryomov over 7 years ago
No need to whitelist. This is not a k* run (krbd, kcephfs, etc), so it should have been scheduled with a -k distro option. lockdep, which emits these warnings, is enabled on our testing kernels; distro kernel have it disabled.
#8 Updated by Loïc Dachary over 7 years ago
Oh, I get it now ! I thought the -k distro was the default but the default behavior is to keep whatever kernel has been installed last, hence the transient errors. Is that right ?
#9 Updated by Ilya Dryomov over 7 years ago
That's right. crontab runs all use -k distro where appropriate (or so I think - we've been over this before with Yuri).
#10 Updated by Loïc Dachary over 7 years ago
- Status changed from Fix Under Review to Rejected
#11 Updated by Jeff Layton over 7 years ago
FWIW, this is a known problem, still in mainline kenrels with how splice() handles locking. See:
http://oss.sgi.com/pipermail/xfs/2016-March/047694.html