Feature #9477: Handle kclient shutdown with dead network more gracefully - CephFS - Ceph

Actions

Copy link

Feature #9477

closed

Handle kclient shutdown with dead network more gracefully

Added by John Spray over 9 years ago. Updated over 4 years ago.

Status:

Closed

Priority:

Normal

Assignee:

Category:

Introspection/Control

Target version:

% Done:

Source:

other

Tags:

Backport:

Reviewed:

Affected Versions:

Component(FS):

kceph

Labels (FS):

Pull request ID:

Description

[ 2281.086454] INFO: task sync:3132 blocked for more than 120 seconds.
[ 2281.092778]       Tainted: G            E  3.17.0-rc5-ceph-00011-g177c131 #1
[ 2281.099911] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2281.107836] sync            D ffff8804256ab900     0  3132   3125 0x00000000
[ 2281.115087]  ffff8800369cfc18 0000000000000046 ffff8800369cfbc8 ffff8800369a0000
[ 2281.122730]  0000000000013b80 ffff8800369cffd8 ffff8800369cfbd8 0000000000013b80
[ 2281.130393]  ffff88042965c3a0 ffff8800369a0000 ffffffff81118503 ffff88043fc944f8
[ 2281.137983] Call Trace:
[ 2281.140515]  [<ffffffff81118503>] ? __delayacct_blkio_start+0x23/0x30
[ 2281.147043]  [<ffffffff8172c9c0>] ? bit_wait+0x50/0x50
[ 2281.152264]  [<ffffffff8172c149>] schedule+0x29/0x70
[ 2281.157309]  [<ffffffff8172c21f>] io_schedule+0x8f/0xd0
[ 2281.162623]  [<ffffffff8172c9eb>] bit_wait_io+0x2b/0x50
[ 2281.167934]  [<ffffffff8172c8b5>] __wait_on_bit+0x65/0x90
[ 2281.173428]  [<ffffffff8115e8fe>] ? find_get_pages_tag+0x2e/0x1e0
[ 2281.179605]  [<ffffffff8115dcc7>] wait_on_page_bit+0xc7/0xd0
[ 2281.185358]  [<ffffffff8109a260>] ? wake_atomic_t_function+0x40/0x40
[ 2281.191802]  [<ffffffff8115e234>] filemap_fdatawait_range+0xf4/0x190
[ 2281.198288]  [<ffffffff811fb905>] ? sync_inodes_sb+0x1a5/0x280
[ 2281.204206]  [<ffffffff811fb905>] ? sync_inodes_sb+0x1a5/0x280
[ 2281.210135]  [<ffffffff8115e2fb>] filemap_fdatawait+0x2b/0x30
[ 2281.215966]  [<ffffffff811fb949>] sync_inodes_sb+0x1e9/0x280
[ 2281.221712]  [<ffffffff81203100>] ? fdatawrite_one_bdev+0x20/0x20
[ 2281.227896]  [<ffffffff81203100>] ? fdatawrite_one_bdev+0x20/0x20
[ 2281.234070]  [<ffffffff8120311d>] sync_inodes_one_sb+0x1d/0x30
[ 2281.239983]  [<ffffffff811d43a1>] iterate_supers+0xf1/0x100
[ 2281.245639]  [<ffffffff81203285>] sys_sync+0x35/0x90
[ 2281.250691]  [<ffffffff81732096>] system_call_fastpath+0x1a/0x1f
[ 2281.256778] 1 lock held by sync/3132:
[ 2281.260522]  #0:  (&type->s_umount_key#28){+++++.}, at: [<ffffffff811d4338>] iterate_supers+0x88/0x100
[ 2355.114904] libceph: mds0 10.214.132.128:6816 socket closed (con state CONNECTING)

Reproduce:

Mount a kernel cephfs client
Block all ceph traffic (I'm using an iptables rule)
shutdown -r now

Result:

Host becomes un-sshable (openssh server has shut down), but does not complete shutdown

Actions

Copy link

Updated by John Spray over 9 years ago

Subject changed from kclient prevents reboot when network is dead to kclient prevents shutdown/reboot when network is dead
Description updated (diff)

Actions

Copy link

Updated by John Spray over 9 years ago

Ah, this only happens if I have some dirty state from userspace at the time. In this instance it's my Mount.open_background routine from ceph-qa-suite/tasks/filesystem/mount.py:

            import time

            f = open("{path}", 'w')
            f.write('content')
            f.flush()
            f.write('content2')
            while True:
                time.sleep(1)

...aaand here's a slightly more complete console log. After this it repeats the "blocked for more than..." dump.

[ 1102.303850] libceph: client4134 fsid 86b8eeb8-25a3-46de-9aa3-a3119ca78128
[ 1102.313948] libceph: mon0 10.214.132.128:6789 session established
[ 1106.379213] ip_tables: (C) 2000-2006 Netfilter Core Team
[ 1107.504720] init: idmapd main process (830) killed by TERM signal
[ 1107.513234] init: tty4 main process (1114) killed by TERM signal
[ 1107.519597] init: tty5 main process (1120) killed by TERM signal
[ 1107.526005] init: tty2 main process (1127) killed by TERM signal
[ 1107.532507] init: tty3 main process (1128) killed by TERM signal
[ 1107.538940] init: tty6 main process (1130) killed by TERM signal
[ 1107.545384] init: cron main process (1145) killed by TERM signal
[ 1107.552122] init: irqbalance main process (1162) killed by TERM signal
[ 1107.559293] init: ttyS2 main process (1518) killed by TERM signal
[ 1107.565893] init: tty1 main process (1519) killed by TERM signal
[ 1107.572364] init: ttyS1 main process (1520) killed by TERM signal
 * Stopping Dist[ 1107.579808] init: plymouth-upstart-bridge main process (3204) terminated with status 1
ributed Compiler Daemon: distccd                                         [ OK ]
                                                                         [ OK ] * Stopping nagios-nrpe nagios-nrpe
                                                                               Stopping PKCS#11 slot daemon: pkcsslotd.
                                                                                                                       No /usr/bin/radosgw found running; none killed.
                                        * Stopping System status server r[ OK ]
                                                                         [ OK ] * Stopping Trusted Computing daemon tcsd
                                                                                * Unmounted debugfs from /sys/kernel/debug
                                                                                                                           * Stopping quota service rpc.rquotad                                         [ OK ]
                                                                         [ OK ] * Stopping NFS kernel daemon
                                                                               [ 1107.833517] nfsd: last server has exited, flushing export cache
 * Unexporting directories for NFS kernel daemon...                      [ OK ]
                                                                         [ OK ] * Turning off quotas...
                                                                               [ 1109.031451] libceph: osd0 10.214.132.128:6800 socket closed (con state CONNECTING)
[ 1110.386623] libceph: osd0 10.214.132.128:6800 socket closed (con state CONNECTING)
[ 1112.385543] libceph: osd0 10.214.132.128:6800 socket closed (con state CONNECTING)
[ 1115.388049] libceph: osd0 10.214.132.128:6800 socket closed (con state CONNECTING)
[ 1120.389271] libceph: osd0 10.214.132.128:6800 socket closed (con state CONNECTING)
[ 1129.384467] libceph: osd0 10.214.132.128:6800 socket closed (con state CONNECTING)
[ 1146.399377] libceph: osd0 10.214.132.128:6800 socket closed (con state CONNECTING)
[ 1172.393523] ceph: mds0 caps stale
[ 1179.405782] libceph: osd0 10.214.132.128:6800 socket closed (con state CONNECTING)
[ 1244.395107] libceph: osd0 10.214.132.128:6800 socket closed (con state CONNECTING)
[ 1319.962782] INFO: task sync:3382 blocked for more than 120 seconds.
[ 1319.969152]       Tainted: G            E  3.17.0-rc5-ceph-00011-g177c131 #1
[ 1319.976283] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1319.984210] sync            D ffff880425bde3c0     0  3382   3375 0x00000000
[ 1319.991392]  ffff880427447c18 0000000000000046 ffff880427447bc8 ffff880426698000
[ 1319.999040]  0000000000013b80 ffff880427447fd8 ffff880427447bd8 0000000000013b80
[ 1320.006684]  ffff88042966c3a0 ffff880426698000 ffffffff81118503 ffff88043fd544f8
[ 1320.014269] Call Trace:
[ 1320.016807]  [<ffffffff81118503>] ? __delayacct_blkio_start+0x23/0x30
[ 1320.023339]  [<ffffffff8172c9c0>] ? bit_wait+0x50/0x50
[ 1320.028565]  [<ffffffff8172c149>] schedule+0x29/0x70
[ 1320.033612]  [<ffffffff8172c21f>] io_schedule+0x8f/0xd0
[ 1320.038923]  [<ffffffff8172c9eb>] bit_wait_io+0x2b/0x50
[ 1320.044233]  [<ffffffff8172c8b5>] __wait_on_bit+0x65/0x90
[ 1320.049720]  [<ffffffff8115e8fe>] ? find_get_pages_tag+0x2e/0x1e0
[ 1320.055898]  [<ffffffff8115dcc7>] wait_on_page_bit+0xc7/0xd0
[ 1320.061644]  [<ffffffff8109a260>] ? wake_atomic_t_function+0x40/0x40
[ 1320.068086]  [<ffffffff8115e234>] filemap_fdatawait_range+0xf4/0x190
[ 1320.074533]  [<ffffffff811fb905>] ? sync_inodes_sb+0x1a5/0x280
[ 1320.080450]  [<ffffffff811fb905>] ? sync_inodes_sb+0x1a5/0x280
[ 1320.086366]  [<ffffffff8115e2fb>] filemap_fdatawait+0x2b/0x30
[ 1320.092199]  [<ffffffff811fb949>] sync_inodes_sb+0x1e9/0x280
[ 1320.097943]  [<ffffffff81203100>] ? fdatawrite_one_bdev+0x20/0x20
[ 1320.104116]  [<ffffffff81203100>] ? fdatawrite_one_bdev+0x20/0x20
[ 1320.110297]  [<ffffffff8120311d>] sync_inodes_one_sb+0x1d/0x30
[ 1320.116216]  [<ffffffff811d43a1>] iterate_supers+0xf1/0x100
[ 1320.121871]  [<ffffffff81203285>] sys_sync+0x35/0x90
[ 1320.126926]  [<ffffffff81732096>] system_call_fastpath+0x1a/0x1f
[ 1320.133012] 1 lock held by sync/3382:
[ 1320.136753]  #0:  (&type->s_umount_key#28){+++++.}, at: [<ffffffff811d4338>] iterate_supers+0x88/0x100
[ 1373.606150] libceph: osd0 10.214.132.128:6800 socket closed (con state CONNECTING)
[ 1402.270808] ceph: mds0 hung

Actions

Copy link

Updated by John Spray over 9 years ago

Project changed from Ceph to CephFS
Category set to 53

Actions

Copy link

Updated by John Spray over 9 years ago

Tracker changed from Bug to Feature
Subject changed from kclient prevents shutdown/reboot when network is dead to Handle kclient shutdown with dead network more gracefully

In the general case (e.g. root filesystem is cephfs) there's nothing we can do: the system can't shut down until the filesystem data can be flushed. But for typical distributed filesystem use cases, there are things we can do to be more helpful:

Hook into userspace init system to try remounting cephfs as RO before shutting down the SSH daemon
Detect if the network has been shut down in userspace and assume it will be unavailable thereafter (so "crash" the filesystem in response)
Log to syslog before shutting down userspace services to indicate that we are going into shutdown with dirty data in the filesystem
Print a more descriptive message to the console while shutdown is stuck, like "Cannot complete shutdown because Ceph client caches are dirty, waiting..."

Actions

Copy link