Project

General

Profile

Bug #17003

possible circular locking dependency detected

Added by Loïc Dachary over 7 years ago. Updated over 7 years ago.

Status:
Rejected
Priority:
Normal
Category:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

http://pulpito.ceph.com/loic-2016-08-10_06:54:41-rados-jewel-backports---basic-smithi/357421/

smithi012 kernel: [ INFO: possible circular locking dependency detected ] ' in syslog 

this happened in dozens of jobs yesterday.

History

#1 Updated by Zack Cerza over 7 years ago

  • Status changed from New to Need More Info

Is this in the wrong project?

#2 Updated by Loïc Dachary over 7 years ago

I suspect this is infrastructure noise, reason why I added it to the sepia project.

#3 Updated by David Galloway over 7 years ago

  • Project changed from sepia to teuthology
  • Subject changed from smithi: possible circular locking dependency detected to possible circular locking dependency detected
  • Status changed from Need More Info to Fix Under Review
  • Assignee set to David Galloway

#4 Updated by David Galloway over 7 years ago

<sjusthm> loicd: iirc, it's an xfs bug
<sjusthm> whitelist it
<dgalloway> sjusthm, i that something that can be whitelisted on the dev/test end or are you saying it should be added to the teuthology whitelist?
<sjusthm> dgalloway: whatever is required to make it stop failing
<dgalloway> k
<sjusthm> dgalloway loicd: we don't need to fix the bug, we just need to stop it from throwing up failures
<sjusthm> I think it's the teuthology whitelist which throws up these failures, right
<sjusthm> ?
<dgalloway> pretty sure, yeah.
<sjusthm> yep, that one
<sjusthm> basically, that error signifies nothing useful for the test
<sjusthm> so we can ignore it
<sjusthm> unless
<sjusthm> cephfs cares about it for their tests
<sjusthm> don't know about that
<sjusthm> still, best to whitelist it for now

#5 Updated by Zack Cerza over 7 years ago

Based on the above, it seems like we should confirm with the fs team before whitelisting this...

#6 Updated by David Galloway over 7 years ago

Sounds like we want to only whitelist this if it's xfs-related.

Here's the full error from kern.log. I'll see if only grep -v it if it's due to xfs.

2016-08-11T10:49:11.264086+00:00 smithi012 kernel: 
2016-08-11T10:49:11.264138+00:00 smithi012 kernel: ======================================================
2016-08-11T10:49:11.264162+00:00 smithi012 kernel: [ INFO: possible circular locking dependency detected ]
2016-08-11T10:49:11.264180+00:00 smithi012 kernel: 4.8.0-rc1-ceph-00009-gc149a93 #1 Not tainted
2016-08-11T10:49:11.264197+00:00 smithi012 kernel: -------------------------------------------------------
2016-08-11T10:49:11.264212+00:00 smithi012 kernel: tp_fstore_op/24906 is trying to acquire lock:
2016-08-11T10:49:11.264228+00:00 smithi012 kernel: (&sb->s_type->i_mutex_key#17){+.+.+.}, at: [<ffffffffa05162c4>] xfs_file_buffered_aio_write+0x64/0x300 [xfs]
2016-08-11T10:49:11.264246+00:00 smithi012 kernel: #012but task is already holding lock:
2016-08-11T10:49:11.264262+00:00 smithi012 kernel: (&pipe->mutex/1){+.+.+.}, at: [<ffffffff8127a11d>] pipe_lock+0x1d/0x20
2016-08-11T10:49:11.264278+00:00 smithi012 kernel: #012which lock already depends on the new lock.
2016-08-11T10:49:11.264302+00:00 smithi012 kernel: #012the existing dependency chain (in reverse order) is:
2016-08-11T10:49:11.264319+00:00 smithi012 kernel: #012-> #2 (&pipe->mutex/1){+.+.+.}:
2016-08-11T10:49:11.264335+00:00 smithi012 kernel:       [<ffffffff811077a7>] lock_acquire+0x197/0x1f0
2016-08-11T10:49:11.264345+00:00 smithi012 kernel:       [<ffffffff8180fb19>] mutex_lock_nested+0x79/0x370
2016-08-11T10:49:11.264354+00:00 smithi012 kernel:       [<ffffffff8127a11d>] pipe_lock+0x1d/0x20
2016-08-11T10:49:11.264362+00:00 smithi012 kernel:       [<ffffffff812a8842>] splice_to_pipe+0x32/0x260
2016-08-11T10:49:11.264370+00:00 smithi012 kernel:       [<ffffffff812a9eb9>] __generic_file_splice_read+0x4f9/0x620
2016-08-11T10:49:11.264379+00:00 smithi012 kernel:       [<ffffffff812aa01e>] generic_file_splice_read+0x3e/0x70
2016-08-11T10:49:11.264387+00:00 smithi012 kernel:       [<ffffffffa051527b>] xfs_file_splice_read+0x19b/0x220 [xfs]
2016-08-11T10:49:11.264398+00:00 smithi012 kernel:       [<ffffffff812a8e17>] do_splice_to+0x77/0x90
2016-08-11T10:49:11.264406+00:00 smithi012 kernel:       [<ffffffff812ab021>] SyS_splice+0x6c1/0x720
2016-08-11T10:49:11.264415+00:00 smithi012 kernel:       [<ffffffff81003b4a>] do_syscall_64+0x7a/0x1c0
2016-08-11T10:49:11.264423+00:00 smithi012 kernel:       [<ffffffff81813c3f>] return_from_SYSCALL_64+0x0/0x7a
2016-08-11T10:49:11.264450+00:00 smithi012 kernel: #012-> #1 (&(&ip->i_iolock)->mr_lock){++++++}:
2016-08-11T10:49:11.264471+00:00 smithi012 kernel:       [<ffffffff811077a7>] lock_acquire+0x197/0x1f0
2016-08-11T10:49:11.264489+00:00 smithi012 kernel:       [<ffffffff81101ccf>] down_write_nested+0x4f/0x80
2016-08-11T10:49:11.264507+00:00 smithi012 kernel:       [<ffffffffa052421b>] xfs_ilock+0x11b/0x1d0 [xfs]
2016-08-11T10:49:11.264524+00:00 smithi012 kernel:       [<ffffffffa0522600>] xfs_vn_setattr+0x30/0x70 [xfs]
2016-08-11T10:49:11.264542+00:00 smithi012 kernel:       [<ffffffff81292233>] notify_change+0x263/0x3a0
2016-08-11T10:49:11.264598+00:00 smithi012 kernel:       [<ffffffff8126ebfa>] do_truncate+0x7a/0xc0
2016-08-11T10:49:11.264632+00:00 smithi012 kernel:       [<ffffffff8126f003>] do_sys_ftruncate.constprop.12+0x153/0x170
2016-08-11T10:49:11.264654+00:00 smithi012 kernel:       [<ffffffff8126f05e>] SyS_ftruncate+0xe/0x10
2016-08-11T10:49:11.264671+00:00 smithi012 kernel:       [<ffffffff81003b4a>] do_syscall_64+0x7a/0x1c0
2016-08-11T10:49:11.264689+00:00 smithi012 kernel:       [<ffffffff81813c3f>] return_from_SYSCALL_64+0x0/0x7a
2016-08-11T10:49:11.264722+00:00 smithi012 kernel: #012-> #0 (&sb->s_type->i_mutex_key#17){+.+.+.}:
2016-08-11T10:49:11.264741+00:00 smithi012 kernel:       [<ffffffff81106ab3>] __lock_acquire+0xe73/0x1680
2016-08-11T10:49:11.264760+00:00 smithi012 kernel:       [<ffffffff811077a7>] lock_acquire+0x197/0x1f0
2016-08-11T10:49:11.264778+00:00 smithi012 kernel:       [<ffffffff81810e19>] down_write+0x49/0x80
2016-08-11T10:49:11.264796+00:00 smithi012 kernel:       [<ffffffffa05162c4>] xfs_file_buffered_aio_write+0x64/0x300 [xfs]
2016-08-11T10:49:11.264814+00:00 smithi012 kernel:       [<ffffffffa05165ec>] xfs_file_write_iter+0x8c/0x110 [xfs]
2016-08-11T10:49:11.264832+00:00 smithi012 kernel:       [<ffffffff81270e7a>] vfs_iter_write+0xaa/0xf0
2016-08-11T10:49:11.264850+00:00 smithi012 kernel:       [<ffffffff812a97f1>] iter_file_splice_write+0x281/0x3a0
2016-08-11T10:49:11.264868+00:00 smithi012 kernel:       [<ffffffff812aae3a>] SyS_splice+0x4da/0x720
2016-08-11T10:49:11.264889+00:00 smithi012 kernel:       [<ffffffff81003b4a>] do_syscall_64+0x7a/0x1c0
2016-08-11T10:49:11.264907+00:00 smithi012 kernel:       [<ffffffff81813c3f>] return_from_SYSCALL_64+0x0/0x7a
2016-08-11T10:49:11.264925+00:00 smithi012 kernel: #012other info that might help us debug this:
2016-08-11T10:49:11.264944+00:00 smithi012 kernel: Chain exists of:#012  &sb->s_type->i_mutex_key#17 --> &(&ip->i_iolock)->mr_lock --> &pipe->mutex/1
2016-08-11T10:49:11.264962+00:00 smithi012 kernel: Possible unsafe locking scenario:
2016-08-11T10:49:11.264980+00:00 smithi012 kernel:       CPU0                    CPU1
2016-08-11T10:49:11.264998+00:00 smithi012 kernel:       ----                    ----
2016-08-11T10:49:11.265016+00:00 smithi012 kernel:  lock(&pipe->mutex/1);
2016-08-11T10:49:11.265039+00:00 smithi012 kernel:                               lock(&(&ip->i_iolock)->mr_lock);
2016-08-11T10:49:11.265057+00:00 smithi012 kernel:                               lock(&pipe->mutex/1);
2016-08-11T10:49:11.265079+00:00 smithi012 kernel:  lock(&sb->s_type->i_mutex_key#17);
2016-08-11T10:49:11.265102+00:00 smithi012 kernel: #012 *** DEADLOCK ***
2016-08-11T10:49:11.265132+00:00 smithi012 kernel: 2 locks held by tp_fstore_op/24906:
2016-08-11T10:49:11.265152+00:00 smithi012 kernel: #0:  (sb_writers#14){.+.+.+}, at: [<ffffffff8127470d>] __sb_start_write+0xad/0xe0
2016-08-11T10:49:11.265170+00:00 smithi012 kernel: #1:  (&pipe->mutex/1){+.+.+.}, at: [<ffffffff8127a11d>] pipe_lock+0x1d/0x20
2016-08-11T10:49:11.265203+00:00 smithi012 kernel: #012stack backtrace:
2016-08-11T10:49:11.265222+00:00 smithi012 kernel: CPU: 3 PID: 24906 Comm: tp_fstore_op Not tainted 4.8.0-rc1-ceph-00009-gc149a93 #1
2016-08-11T10:49:11.265240+00:00 smithi012 kernel: Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c 09/07/2015
2016-08-11T10:49:11.265258+00:00 smithi012 kernel: 0000000000000086 0000000095b383ec ffff88084753fb40 ffffffff81413ee1
2016-08-11T10:49:11.265276+00:00 smithi012 kernel: ffffffff828f0460 ffffffff828ef530 ffff88084753fb80 ffffffff811d40dc
2016-08-11T10:49:11.265298+00:00 smithi012 kernel: ffff88084753fbc0 ffff88081ca208d8 ffff88081ca20000 0000000000000002
2016-08-11T10:49:11.265317+00:00 smithi012 kernel: Call Trace:
2016-08-11T10:49:11.265335+00:00 smithi012 kernel: [<ffffffff81413ee1>] dump_stack+0x85/0xc4
2016-08-11T10:49:11.265352+00:00 smithi012 kernel: [<ffffffff811d40dc>] print_circular_bug+0x200/0x20e
2016-08-11T10:49:11.265370+00:00 smithi012 kernel: [<ffffffff81106ab3>] __lock_acquire+0xe73/0x1680
2016-08-11T10:49:11.265388+00:00 smithi012 kernel: [<ffffffff811077a7>] lock_acquire+0x197/0x1f0
2016-08-11T10:49:11.265405+00:00 smithi012 kernel: [<ffffffffa05162c4>] ? xfs_file_buffered_aio_write+0x64/0x300 [xfs]
2016-08-11T10:49:11.265423+00:00 smithi012 kernel: [<ffffffff81810e19>] down_write+0x49/0x80
2016-08-11T10:49:11.265441+00:00 smithi012 kernel: [<ffffffffa05162c4>] ? xfs_file_buffered_aio_write+0x64/0x300 [xfs]
2016-08-11T10:49:11.265462+00:00 smithi012 kernel: [<ffffffffa05162c4>] xfs_file_buffered_aio_write+0x64/0x300 [xfs]
2016-08-11T10:49:11.265480+00:00 smithi012 kernel: [<ffffffff81105863>] ? trace_hardirqs_on_caller+0x163/0x1d0
2016-08-11T10:49:11.265498+00:00 smithi012 kernel: [<ffffffffa05165ec>] xfs_file_write_iter+0x8c/0x110 [xfs]
2016-08-11T10:49:11.265515+00:00 smithi012 kernel: [<ffffffff81270e7a>] vfs_iter_write+0xaa/0xf0
2016-08-11T10:49:11.265533+00:00 smithi012 kernel: [<ffffffff812a97f1>] iter_file_splice_write+0x281/0x3a0
2016-08-11T10:49:11.265562+00:00 smithi012 kernel: [<ffffffff812aae3a>] SyS_splice+0x4da/0x720
2016-08-11T10:49:11.265588+00:00 smithi012 kernel: [<ffffffff81003713>] ? syscall_trace_enter+0x2f3/0x370
2016-08-11T10:49:11.265605+00:00 smithi012 kernel: [<ffffffff81003b4a>] do_syscall_64+0x7a/0x1c0
2016-08-11T10:49:11.265623+00:00 smithi012 kernel: [<ffffffff81813c3f>] entry_SYSCALL64_slow_path+0x25/0x25

#7 Updated by Ilya Dryomov over 7 years ago

No need to whitelist. This is not a k* run (krbd, kcephfs, etc), so it should have been scheduled with a -k distro option. lockdep, which emits these warnings, is enabled on our testing kernels; distro kernel have it disabled.

#8 Updated by Loïc Dachary over 7 years ago

Oh, I get it now ! I thought the -k distro was the default but the default behavior is to keep whatever kernel has been installed last, hence the transient errors. Is that right ?

#9 Updated by Ilya Dryomov over 7 years ago

That's right. crontab runs all use -k distro where appropriate (or so I think - we've been over this before with Yuri).

#10 Updated by Loïc Dachary over 7 years ago

  • Status changed from Fix Under Review to Rejected

#11 Updated by Jeff Layton over 7 years ago

FWIW, this is a known problem, still in mainline kenrels with how splice() handles locking. See:

http://oss.sgi.com/pipermail/xfs/2016-March/047694.html

Also available in: Atom PDF