Feature #9477

Handle kclient shutdown with dead network more gracefully

Added by John Spray over 9 years ago. Updated over 4 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
Introspection/Control
Target version:
-
% Done:
0%

Source:
other
Tags:
Backport:
Reviewed:
Affected Versions:
Component(FS):
kceph
Labels (FS):
Pull request ID:

Description

[ 2281.086454] INFO: task sync:3132 blocked for more than 120 seconds.
[ 2281.092778]       Tainted: G            E  3.17.0-rc5-ceph-00011-g177c131 #1
[ 2281.099911] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2281.107836] sync            D ffff8804256ab900     0  3132   3125 0x00000000
[ 2281.115087]  ffff8800369cfc18 0000000000000046 ffff8800369cfbc8 ffff8800369a0000
[ 2281.122730]  0000000000013b80 ffff8800369cffd8 ffff8800369cfbd8 0000000000013b80
[ 2281.130393]  ffff88042965c3a0 ffff8800369a0000 ffffffff81118503 ffff88043fc944f8
[ 2281.137983] Call Trace:
[ 2281.140515]  [<ffffffff81118503>] ? __delayacct_blkio_start+0x23/0x30
[ 2281.147043]  [<ffffffff8172c9c0>] ? bit_wait+0x50/0x50
[ 2281.152264]  [<ffffffff8172c149>] schedule+0x29/0x70
[ 2281.157309]  [<ffffffff8172c21f>] io_schedule+0x8f/0xd0
[ 2281.162623]  [<ffffffff8172c9eb>] bit_wait_io+0x2b/0x50
[ 2281.167934]  [<ffffffff8172c8b5>] __wait_on_bit+0x65/0x90
[ 2281.173428]  [<ffffffff8115e8fe>] ? find_get_pages_tag+0x2e/0x1e0
[ 2281.179605]  [<ffffffff8115dcc7>] wait_on_page_bit+0xc7/0xd0
[ 2281.185358]  [<ffffffff8109a260>] ? wake_atomic_t_function+0x40/0x40
[ 2281.191802]  [<ffffffff8115e234>] filemap_fdatawait_range+0xf4/0x190
[ 2281.198288]  [<ffffffff811fb905>] ? sync_inodes_sb+0x1a5/0x280
[ 2281.204206]  [<ffffffff811fb905>] ? sync_inodes_sb+0x1a5/0x280
[ 2281.210135]  [<ffffffff8115e2fb>] filemap_fdatawait+0x2b/0x30
[ 2281.215966]  [<ffffffff811fb949>] sync_inodes_sb+0x1e9/0x280
[ 2281.221712]  [<ffffffff81203100>] ? fdatawrite_one_bdev+0x20/0x20
[ 2281.227896]  [<ffffffff81203100>] ? fdatawrite_one_bdev+0x20/0x20
[ 2281.234070]  [<ffffffff8120311d>] sync_inodes_one_sb+0x1d/0x30
[ 2281.239983]  [<ffffffff811d43a1>] iterate_supers+0xf1/0x100
[ 2281.245639]  [<ffffffff81203285>] sys_sync+0x35/0x90
[ 2281.250691]  [<ffffffff81732096>] system_call_fastpath+0x1a/0x1f
[ 2281.256778] 1 lock held by sync/3132:
[ 2281.260522]  #0:  (&type->s_umount_key#28){+++++.}, at: [<ffffffff811d4338>] iterate_supers+0x88/0x100
[ 2355.114904] libceph: mds0 10.214.132.128:6816 socket closed (con state CONNECTING)
Reproduce:
  • Mount a kernel cephfs client
  • Block all ceph traffic (I'm using an iptables rule)
  • Run shutdown -r now
Result:
  • Host becomes unreachable over ssh (the openssh server has already shut down), but the shutdown never completes
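
The reproduction steps above can be sketched as a small shell script. This is a hypothetical sketch, not the reporter's actual commands: the monitor address, mount point, secret file path, and the exact iptables rules are all assumptions (the report says only that an iptables rule was used). The ports are Ceph's default monitor port 6789 and the default 6800-7300 range used by OSD and MDS daemons; the MDS in the log above listens on 6816, which falls inside that range. The script only prints the commands, so they can be reviewed before being run as root on a disposable test host.

```shell
#!/bin/sh
# Hypothetical values, not taken from the report: adjust for your cluster.
MON_ADDR="${MON_ADDR:-192.0.2.10}"                   # assumed monitor address
MNT="${MNT:-/mnt/cephfs}"                            # assumed mount point
SECRET="${SECRET:-/etc/ceph/admin.secret}"           # assumed secret file path

# Print the reproduction commands in order without executing them.
repro_commands() {
    # 1. Mount a kernel cephfs client.
    echo "mount -t ceph ${MON_ADDR}:6789:/ ${MNT} -o name=admin,secretfile=${SECRET}"
    # 2. Block all ceph traffic in both directions (assumed rules; default ports).
    echo "iptables -A OUTPUT -p tcp -m multiport --dports 6789,6800:7300 -j DROP"
    echo "iptables -A INPUT -p tcp -m multiport --sports 6789,6800:7300 -j DROP"
    # 3. Reboot; with the network dead, the shutdown is expected to hang.
    echo "shutdown -r now"
}

repro_commands
```

To actually run the steps, pipe the output to a root shell (for example `sh repro.sh | sudo sh`, where `repro.sh` is whatever name the script was saved under), and only on a machine you can power-cycle out of band, since the point of the test is that the reboot does not complete.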