Project

General

Profile

Actions

Feature #9477

closed

Handle kclient shutdown with dead network more gracefully

Added by John Spray over 9 years ago. Updated over 4 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
Introspection/Control
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Reviewed:
Affected Versions:
Component(FS):
kceph
Labels (FS):
Pull request ID:

Description

[ 2281.086454] INFO: task sync:3132 blocked for more than 120 seconds.
[ 2281.092778]       Tainted: G            E  3.17.0-rc5-ceph-00011-g177c131 #1
[ 2281.099911] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2281.107836] sync            D ffff8804256ab900     0  3132   3125 0x00000000
[ 2281.115087]  ffff8800369cfc18 0000000000000046 ffff8800369cfbc8 ffff8800369a0000
[ 2281.122730]  0000000000013b80 ffff8800369cffd8 ffff8800369cfbd8 0000000000013b80
[ 2281.130393]  ffff88042965c3a0 ffff8800369a0000 ffffffff81118503 ffff88043fc944f8
[ 2281.137983] Call Trace:
[ 2281.140515]  [<ffffffff81118503>] ? __delayacct_blkio_start+0x23/0x30
[ 2281.147043]  [<ffffffff8172c9c0>] ? bit_wait+0x50/0x50
[ 2281.152264]  [<ffffffff8172c149>] schedule+0x29/0x70
[ 2281.157309]  [<ffffffff8172c21f>] io_schedule+0x8f/0xd0
[ 2281.162623]  [<ffffffff8172c9eb>] bit_wait_io+0x2b/0x50
[ 2281.167934]  [<ffffffff8172c8b5>] __wait_on_bit+0x65/0x90
[ 2281.173428]  [<ffffffff8115e8fe>] ? find_get_pages_tag+0x2e/0x1e0
[ 2281.179605]  [<ffffffff8115dcc7>] wait_on_page_bit+0xc7/0xd0
[ 2281.185358]  [<ffffffff8109a260>] ? wake_atomic_t_function+0x40/0x40
[ 2281.191802]  [<ffffffff8115e234>] filemap_fdatawait_range+0xf4/0x190
[ 2281.198288]  [<ffffffff811fb905>] ? sync_inodes_sb+0x1a5/0x280
[ 2281.204206]  [<ffffffff811fb905>] ? sync_inodes_sb+0x1a5/0x280
[ 2281.210135]  [<ffffffff8115e2fb>] filemap_fdatawait+0x2b/0x30
[ 2281.215966]  [<ffffffff811fb949>] sync_inodes_sb+0x1e9/0x280
[ 2281.221712]  [<ffffffff81203100>] ? fdatawrite_one_bdev+0x20/0x20
[ 2281.227896]  [<ffffffff81203100>] ? fdatawrite_one_bdev+0x20/0x20
[ 2281.234070]  [<ffffffff8120311d>] sync_inodes_one_sb+0x1d/0x30
[ 2281.239983]  [<ffffffff811d43a1>] iterate_supers+0xf1/0x100
[ 2281.245639]  [<ffffffff81203285>] sys_sync+0x35/0x90
[ 2281.250691]  [<ffffffff81732096>] system_call_fastpath+0x1a/0x1f
[ 2281.256778] 1 lock held by sync/3132:
[ 2281.260522]  #0:  (&type->s_umount_key#28){+++++.}, at: [<ffffffff811d4338>] iterate_supers+0x88/0x100
[ 2355.114904] libceph: mds0 10.214.132.128:6816 socket closed (con state CONNECTING)
Reproduce:
  • Mount a kernel cephfs client
  • Block all ceph traffic (I'm using an iptables rule)
  • shutdown -r now
Result:
  • Host becomes un-sshable (openssh server has shut down), but does not complete shutdown
Actions #1

Updated by John Spray over 9 years ago

  • Subject changed from kclient prevents reboot when network is dead to kclient prevents shutdown/reboot when network is dead
  • Description updated (diff)
Actions #2

Updated by John Spray over 9 years ago

Ah, this only happens if I have some dirty state from userspace at the time. In this instance it's my Mount.open_background routine from ceph-qa-suite/tasks/filesystem/mount.py:

            import time

            f = open("{path}", 'w')
            f.write('content')
            f.flush()
            f.write('content2')
            while True:
                time.sleep(1)

...aaand here's a slightly more complete console log. After this it repeats the "blocked for more than..." dump.

[ 1102.303850] libceph: client4134 fsid 86b8eeb8-25a3-46de-9aa3-a3119ca78128
[ 1102.313948] libceph: mon0 10.214.132.128:6789 session established
[ 1106.379213] ip_tables: (C) 2000-2006 Netfilter Core Team
[ 1107.504720] init: idmapd main process (830) killed by TERM signal
[ 1107.513234] init: tty4 main process (1114) killed by TERM signal
[ 1107.519597] init: tty5 main process (1120) killed by TERM signal
[ 1107.526005] init: tty2 main process (1127) killed by TERM signal
[ 1107.532507] init: tty3 main process (1128) killed by TERM signal
[ 1107.538940] init: tty6 main process (1130) killed by TERM signal
[ 1107.545384] init: cron main process (1145) killed by TERM signal
[ 1107.552122] init: irqbalance main process (1162) killed by TERM signal
[ 1107.559293] init: ttyS2 main process (1518) killed by TERM signal
[ 1107.565893] init: tty1 main process (1519) killed by TERM signal
[ 1107.572364] init: ttyS1 main process (1520) killed by TERM signal
 * Stopping Dist[ 1107.579808] init: plymouth-upstart-bridge main process (3204) terminated with status 1
ributed Compiler Daemon: distccd                                         [ OK ]
                                                                         [ OK ] * Stopping nagios-nrpe nagios-nrpe
                                                                               Stopping PKCS#11 slot daemon: pkcsslotd.
                                                                                                                       No /usr/bin/radosgw found running; none killed.
                                        * Stopping System status server r[ OK ]
                                                                         [ OK ] * Stopping Trusted Computing daemon tcsd
                                                                                * Unmounted debugfs from /sys/kernel/debug
                                                                                                                           * Stopping quota service rpc.rquotad                                         [ OK ]
                                                                         [ OK ] * Stopping NFS kernel daemon
                                                                               [ 1107.833517] nfsd: last server has exited, flushing export cache
 * Unexporting directories for NFS kernel daemon...                      [ OK ]
                                                                         [ OK ] * Turning off quotas...
                                                                               [ 1109.031451] libceph: osd0 10.214.132.128:6800 socket closed (con state CONNECTING)
[ 1110.386623] libceph: osd0 10.214.132.128:6800 socket closed (con state CONNECTING)
[ 1112.385543] libceph: osd0 10.214.132.128:6800 socket closed (con state CONNECTING)
[ 1115.388049] libceph: osd0 10.214.132.128:6800 socket closed (con state CONNECTING)
[ 1120.389271] libceph: osd0 10.214.132.128:6800 socket closed (con state CONNECTING)
[ 1129.384467] libceph: osd0 10.214.132.128:6800 socket closed (con state CONNECTING)
[ 1146.399377] libceph: osd0 10.214.132.128:6800 socket closed (con state CONNECTING)
[ 1172.393523] ceph: mds0 caps stale
[ 1179.405782] libceph: osd0 10.214.132.128:6800 socket closed (con state CONNECTING)
[ 1244.395107] libceph: osd0 10.214.132.128:6800 socket closed (con state CONNECTING)
[ 1319.962782] INFO: task sync:3382 blocked for more than 120 seconds.
[ 1319.969152]       Tainted: G            E  3.17.0-rc5-ceph-00011-g177c131 #1
[ 1319.976283] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1319.984210] sync            D ffff880425bde3c0     0  3382   3375 0x00000000
[ 1319.991392]  ffff880427447c18 0000000000000046 ffff880427447bc8 ffff880426698000
[ 1319.999040]  0000000000013b80 ffff880427447fd8 ffff880427447bd8 0000000000013b80
[ 1320.006684]  ffff88042966c3a0 ffff880426698000 ffffffff81118503 ffff88043fd544f8
[ 1320.014269] Call Trace:
[ 1320.016807]  [<ffffffff81118503>] ? __delayacct_blkio_start+0x23/0x30
[ 1320.023339]  [<ffffffff8172c9c0>] ? bit_wait+0x50/0x50
[ 1320.028565]  [<ffffffff8172c149>] schedule+0x29/0x70
[ 1320.033612]  [<ffffffff8172c21f>] io_schedule+0x8f/0xd0
[ 1320.038923]  [<ffffffff8172c9eb>] bit_wait_io+0x2b/0x50
[ 1320.044233]  [<ffffffff8172c8b5>] __wait_on_bit+0x65/0x90
[ 1320.049720]  [<ffffffff8115e8fe>] ? find_get_pages_tag+0x2e/0x1e0
[ 1320.055898]  [<ffffffff8115dcc7>] wait_on_page_bit+0xc7/0xd0
[ 1320.061644]  [<ffffffff8109a260>] ? wake_atomic_t_function+0x40/0x40
[ 1320.068086]  [<ffffffff8115e234>] filemap_fdatawait_range+0xf4/0x190
[ 1320.074533]  [<ffffffff811fb905>] ? sync_inodes_sb+0x1a5/0x280
[ 1320.080450]  [<ffffffff811fb905>] ? sync_inodes_sb+0x1a5/0x280
[ 1320.086366]  [<ffffffff8115e2fb>] filemap_fdatawait+0x2b/0x30
[ 1320.092199]  [<ffffffff811fb949>] sync_inodes_sb+0x1e9/0x280
[ 1320.097943]  [<ffffffff81203100>] ? fdatawrite_one_bdev+0x20/0x20
[ 1320.104116]  [<ffffffff81203100>] ? fdatawrite_one_bdev+0x20/0x20
[ 1320.110297]  [<ffffffff8120311d>] sync_inodes_one_sb+0x1d/0x30
[ 1320.116216]  [<ffffffff811d43a1>] iterate_supers+0xf1/0x100
[ 1320.121871]  [<ffffffff81203285>] sys_sync+0x35/0x90
[ 1320.126926]  [<ffffffff81732096>] system_call_fastpath+0x1a/0x1f
[ 1320.133012] 1 lock held by sync/3382:
[ 1320.136753]  #0:  (&type->s_umount_key#28){+++++.}, at: [<ffffffff811d4338>] iterate_supers+0x88/0x100
[ 1373.606150] libceph: osd0 10.214.132.128:6800 socket closed (con state CONNECTING)
[ 1402.270808] ceph: mds0 hung
Actions #3

Updated by John Spray over 9 years ago

  • Project changed from Ceph to CephFS
  • Category set to 53
Actions #4

Updated by John Spray over 9 years ago

  • Tracker changed from Bug to Feature
  • Subject changed from kclient prevents shutdown/reboot when network is dead to Handle kclient shutdown with dead network more gracefully

In the general case (e.g. root filesystem is cephfs) there's nothing we can do: the system can't shut down until the filesystem data can be flushed. But for typical distributed filesystem use cases, there are things we can do to be more helpful:

  • Hook into userspace init system to try remounting cephfs as RO before shutting down the SSH daemon
  • Detect if the network has been shut down in userspace and assume it will be unavailable thereafter (so "crash" the filesystem in response)
  • Log to syslog before shutting down userspace services to indicate that we are going into shutdown with dirty data in the filesystem
  • Print a more descriptive message to the console while shutdown is stuck, like "Cannot complete shutdown because Ceph client caches are dirty, waiting..."
Actions #5

Updated by Vasu Kulkarni almost 8 years ago

All the above 4 options look interesting, but if we can print the 4 option in the meantime that would be very useful.

Actions #6

Updated by Greg Farnum almost 8 years ago

  • Component(FS) kceph added
Actions #7

Updated by Greg Farnum almost 8 years ago

  • Category changed from 53 to Introspection/Control
Actions #8

Updated by Zheng Yan over 4 years ago

  • Status changed from New to Closed

this can be handled by 'umount -f'

Actions

Also available in: Atom PDF