Bug #61843

open

kclient: was evicted without responding for 300 seconds

Added by Xiubo Li 11 months ago. Updated 11 months ago.

Status:
Fix Under Review
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Venky reported it in another tracker a few days ago: https://tracker.ceph.com/issues/61394#note-11

The QA teuthology test:

https://pulpito.ceph.com/vshankar-2023-06-20_10:07:44-fs-wip-vshankar-testing-20230620.052303-testing-default-smithi/7308836/

It failed due to:

failure_reason: '"2023-06-20T14:05:30.304943+0000 mds.c (mds.0) 1 : cluster [WRN]
  evicting unresponsive client smithi202 (6105), after 305.619 seconds" in cluster
  log'

The kclient client.6105 was blocklisted.

From remote/smithi202/syslog/kern.log.gz, we can see that client.6105's session was closed by the MDS and its subsequent reconnect was denied:

Jun 20 14:00:24 smithi202 kernel: libceph: mon0 (1)172.21.15.33:6789 session established
Jun 20 14:00:24 smithi202 kernel: libceph: client6105 fsid 054ad731-bfaf-4587-9674-7b36e070bce0
Jun 20 14:00:24 smithi202 kernel: ceph: test_dummy_encryption mode enabled
Jun 20 14:00:25 smithi202 sudo[185381]:   ubuntu : PWD=/home/ubuntu ; USER=root ; COMMAND=/usr/sbin/logrotate /etc/logrotate.d/ceph-test.conf
Jun 20 14:00:25 smithi202 sudo[185381]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
Jun 20 14:00:25 smithi202 sudo[185381]: pam_unix(sudo:session): session closed for user root
Jun 20 14:00:25 smithi202 systemd[1]: fstests-generic-460.scope: Deactivated successfully.
Jun 20 14:00:25 smithi202 systemd[1]: fstests-generic-460.scope: Consumed 1.351s CPU time.
Jun 20 14:00:25 smithi202 systemd[1]: tmp-tmp.NXtTPrQZ4vtest.mount: Deactivated successfully.
Jun 20 14:00:25 smithi202 systemd[1]: tmp-tmp.UR7IGvggwFscratch.mount: Deactivated successfully.
Jun 20 14:00:55 smithi202 sudo[185389]:   ubuntu : PWD=/home/ubuntu ; USER=root ; COMMAND=/usr/sbin/logrotate /etc/logrotate.d/ceph-test.conf
Jun 20 14:00:55 smithi202 sudo[185389]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
Jun 20 14:00:55 smithi202 sudo[185389]: pam_unix(sudo:session): session closed for user root
Jun 20 14:04:52 smithi202 sudo[185395]:   ubuntu : PWD=/home/ubuntu ; USER=root ; COMMAND=/usr/sbin/logrotate /etc/logrotate.d/ceph-test.conf
Jun 20 14:04:52 smithi202 sudo[185395]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
Jun 20 14:04:52 smithi202 sudo[185395]: pam_unix(sudo:session): session closed for user root
Jun 20 14:05:01 smithi202 CRON[185398]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 20 14:05:01 smithi202 CRON[185399]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jun 20 14:05:01 smithi202 CRON[185398]: pam_unix(cron:session): session closed for user root
Jun 20 14:05:30 smithi202 kernel: libceph: mds0 (1)172.21.15.33:6837 socket closed (con state OPEN)
Jun 20 14:05:30 smithi202 kernel: ceph: [054ad731-bfaf-4587-9674-7b36e070bce0 6105] check_session_state: mds0 hung
Jun 20 14:05:30 smithi202 kernel: libceph: mds0 (1)172.21.15.33:6837 session reset
Jun 20 14:05:30 smithi202 kernel: ceph: [054ad731-bfaf-4587-9674-7b36e070bce0 6105] mds_peer_reset: mds0 closed our session
Jun 20 14:05:30 smithi202 kernel: ceph: [054ad731-bfaf-4587-9674-7b36e070bce0 6105] send_mds_reconnect: mds0 reconnect start
Jun 20 14:05:30 smithi202 kernel: ceph: [054ad731-bfaf-4587-9674-7b36e070bce0 6105] handle_session: mds0 reconnect denied
Jun 20 14:05:30 smithi202 root[185434]: run xfstest generic/462

The xfstests-dev test case should be generic/460, and the xfstests-dev tool had successfully deactivated the scratch mount, but it seems the kclient didn't finish unmounting in time.

From the teuthology.log we can see that generic/460 succeeded, but it took 306 seconds:

2023-06-20T14:00:20.460 INFO:teuthology.orchestra.run.smithi202.stdout:generic/451        31s
2023-06-20T14:00:22.305 INFO:teuthology.orchestra.run.smithi202.stdout:generic/452        2s
2023-06-20T14:00:22.678 INFO:teuthology.orchestra.run.smithi202.stdout:generic/453       [not run] ceph does not allow unrestricted byte streams for names
2023-06-20T14:00:23.022 INFO:teuthology.orchestra.run.smithi202.stdout:generic/454       [not run] ceph does not allow unrestricted byte streams for names
2023-06-20T14:00:23.361 INFO:teuthology.orchestra.run.smithi202.stdout:generic/456       [not run] xfs_io falloc  failed (old kernel/wrong fs?)
2023-06-20T14:00:24.217 INFO:teuthology.orchestra.run.smithi202.stdout:generic/458       [not run] Reflink not supported by scratch filesystem type: ceph
2023-06-20T14:00:25.160 DEBUG:teuthology.orchestra.run.smithi033:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2023-06-20T14:00:25.163 DEBUG:teuthology.orchestra.run.smithi045:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2023-06-20T14:00:25.165 DEBUG:teuthology.orchestra.run.smithi202:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2023-06-20T14:00:55.249 DEBUG:teuthology.orchestra.run.smithi033:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2023-06-20T14:00:55.251 DEBUG:teuthology.orchestra.run.smithi045:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2023-06-20T14:00:55.254 DEBUG:teuthology.orchestra.run.smithi202:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2023-06-20T14:00:55.303 INFO:tasks.ceph.osd.2.smithi033.stderr:2023-06-20T14:00:55.302+0000 7fc573687640 -1 received  signal: Hangup from killall ceph-osd -1  (PID: 47901) UID: 0
2023-06-20T14:00:55.303 INFO:tasks.ceph.osd.0.smithi033.stderr:2023-06-20T14:00:55.302+0000 7ff9a4b52640 -1 received  signal: Hangup from killall ceph-osd -1  (PID: 47901) UID: 0
2023-06-20T14:00:55.304 INFO:tasks.ceph.osd.3.smithi033.stderr:2023-06-20T14:00:55.302+0000 7fcc53edf640 -1 received  signal: Hangup from killall ceph-osd -1  (PID: 47901) UID: 0
2023-06-20T14:00:55.305 INFO:tasks.ceph.osd.1.smithi033.stderr:2023-06-20T14:00:55.302+0000 7fc2e95e0640 -1 received  signal: Hangup from killall ceph-osd -1  (PID: 47901) UID: 0
2023-06-20T14:04:52.099 DEBUG:teuthology.orchestra.run.smithi033:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2023-06-20T14:04:52.102 DEBUG:teuthology.orchestra.run.smithi045:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2023-06-20T14:04:52.104 DEBUG:teuthology.orchestra.run.smithi202:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2023-06-20T14:04:52.116 INFO:tasks.ceph.osd.6.smithi045.stderr:2023-06-20T14:04:52.114+0000 7f9fa0df4640 -1 received  signal: Hangup from killall ceph-osd -1  (PID: 43452) UID: 0
2023-06-20T14:04:52.116 INFO:tasks.ceph.osd.7.smithi045.stderr:2023-06-20T14:04:52.114+0000 7f9307c81640 -1 received  signal: Hangup from killall ceph-osd -1  (PID: 43452) UID: 0
2023-06-20T14:04:52.117 INFO:tasks.ceph.osd.5.smithi045.stderr:2023-06-20T14:04:52.114+0000 7f5dc749b640 -1 received  signal: Hangup from killall ceph-osd -1  (PID: 43452) UID: 0
2023-06-20T14:04:52.117 INFO:tasks.ceph.osd.4.smithi045.stderr:2023-06-20T14:04:52.114+0000 7f7008e46640 -1 received  signal: Hangup from killall ceph-osd -1  (PID: 43452) UID: 0
2023-06-20T14:04:52.128 INFO:tasks.ceph.osd.1.smithi033.stderr:2023-06-20T14:04:52.127+0000 7fc2e95e0640 -1 received  signal: Hangup from killall ceph-osd -1  (PID: 47916) UID: 0
2023-06-20T14:04:52.129 INFO:tasks.ceph.osd.0.smithi033.stderr:2023-06-20T14:04:52.127+0000 7ff9a4b52640 -1 received  signal: Hangup from killall ceph-osd -1  (PID: 47916) UID: 0
2023-06-20T14:04:52.129 INFO:tasks.ceph.osd.3.smithi033.stderr:2023-06-20T14:04:52.127+0000 7fcc53edf640 -1 received  signal: Hangup from killall ceph-osd -1  (PID: 47916) UID: 0
2023-06-20T14:04:52.130 INFO:tasks.ceph.osd.2.smithi033.stderr:2023-06-20T14:04:52.127+0000 7fc573687640 -1 received  signal: Hangup from killall ceph-osd -1  (PID: 47916) UID: 0
2023-06-20T14:05:30.945 INFO:teuthology.orchestra.run.smithi202.stdout:generic/460        306s
2023-06-20T14:05:31.476 INFO:teuthology.orchestra.run.smithi202.stdout:generic/462       [not run] mount 172.21.15.33:6789:/scratch with dax failed
2023-06-20T14:05:31.908 INFO:teuthology.orchestra.run.smithi202.stdout:generic/463       [not run] Reflink not supported by test filesystem type: ceph

There are no kclient logs, so we can't say directly from the logs what happened.

However, from the remote/smithi033/log/ceph-mds.c.log.4.gz logs, we can see that just after the scratch mount was deactivated, the MDS didn't receive any ping or request from client.6105 for around 300 seconds:

2023-06-20T14:00:25.931+0000 7fa2d6058640  1 -- [v2:172.21.15.33:6834/1215596373,v1:172.21.15.33:6837/1215596373] <== client.6105 v1:172.21.15.202:0/2228992724 20 ==== client_caps(update ino 0x1000000b7f6 1 seq 15 tid 16 caps=pAsxXsxFsxcrwb dirty=Fw wanted=pAsxXsxFsxcrwb follows 1 size 1073741824/0 mtime 2023-06-20T14:00:25.703292+0000 ctime 2023-06-20T14:00:25.703292+0000 change_attr 1024 xattrs(v=4171204432 l=0)) v12 ==== 316+0+0 (unknown 3999766560 0 0) 0x5606c4a66380 con 0x5606bd6fd800
...
2023-06-20T14:05:30.842+0000 7fa2d6058640 10 mds.c  new session 0x5606c10a6500 for client.6105 v1:172.21.15.202:0/2228992724 con 0x5606bd6fd800
2023-06-20T14:05:30.842+0000 7fa2d6058640 10 mds.c parse_caps: parsing auth_cap_str='allow *'
2023-06-20T14:05:30.842+0000 7fa2d6058640 10 mds.c ms_handle_accept v1:172.21.15.202:0/2228992724 con 0x5606bd6fd800 session 0x5606c10a6500
2023-06-20T14:05:30.842+0000 7fa2d6058640  1 -- [v2:172.21.15.33:6834/1215596373,v1:172.21.15.33:6837/1215596373] <== client.6105 v1:172.21.15.202:0/2228992724 1 ==== client_reconnect(1 caps 1 realms ) v3 ==== 0+0+98 (unknown 0 0 3797099533) 0x5606bd4aaea0 con 0x5606bd6fd800

While reading the kclient code I found that during unmount it sets mdsc->stopping to CEPH_MDSC_STOPPING_BEGIN or CEPH_MDSC_STOPPING_FLUSHED, and the tick work is then stopped. This should be why mds.0 didn't receive any renew caps request for 300 seconds.

Just after setting mdsc->stopping, the kclient tries to flush the dirty buffers to RADOS before closing all the MDS sessions. That means if RADOS can't respond in time, the kclient stops pinging the MDS while keeping the sessions open.

This is a bug in the kclient and should be fixed.
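
To make the ordering concrete, below is a simplified sketch of the unmount path as I read it, paraphrased from fs/ceph/super.c and fs/ceph/mds_client.c (an abbreviated illustration, not the exact upstream code):

/* Simplified sketch, paraphrased from fs/ceph/mds_client.c: the periodic
 * "tick" that sends keepalives/renewcaps to the MDS sessions. */
static void delayed_work(struct work_struct *work)
{
    struct ceph_mds_client *mdsc =
        container_of(work, struct ceph_mds_client, delayed_work.work);

    if (mdsc->stopping)
        return;    /* umount has started: no more keepalives are sent */

    /* otherwise: send keepalives/renewcaps to each session and
     * re-arm the delayed work */
}

/* Simplified sketch, paraphrased from fs/ceph/super.c */
static void ceph_kill_sb(struct super_block *s)
{
    struct ceph_fs_client *fsc = ceph_sb_to_client(s);

    ceph_mdsc_pre_umount(fsc->mdsc);    /* stopping = CEPH_MDSC_STOPPING_BEGIN,
                                           so the tick above is already dead */

    sync_filesystem(s);                 /* flush dirty buffers to RADOS; this can
                                           take minutes if the OSDs are slow */

    fsc->mdsc->stopping = CEPH_MDSC_STOPPING_FLUSHED;

    kill_anon_super(s);                 /* only here are the MDS sessions
                                           finally closed */
}

So everything between ceph_mdsc_pre_umount() and the session close runs with the tick stopped, which is exactly the window in which the MDS saw no pings.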

#1

Updated by Xiubo Li 11 months ago

Flushing the dirty buffers may take a long time if RADOS is overloaded or there is a network issue, so we should keep pinging the MDSs to keep the sessions alive; otherwise the MDS will blocklist us.
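
A minimal sketch of that idea (an illustration, not the actual patch under review): keep the tick running through the CEPH_MDSC_STOPPING_BEGIN/flush phase and stop it only once the dirty data has been flushed, just before the sessions are closed.

/* Illustration only: keep the keepalive tick alive while the dirty buffers
 * are still being flushed, and stop it only once flushing has finished and
 * the sessions are about to be closed. */
static void delayed_work(struct work_struct *work)
{
    struct ceph_mds_client *mdsc =
        container_of(work, struct ceph_mds_client, delayed_work.work);

    /* was: if (mdsc->stopping) return;  -- bails out already at
     * CEPH_MDSC_STOPPING_BEGIN, i.e. for the whole flush phase */
    if (mdsc->stopping >= CEPH_MDSC_STOPPING_FLUSHED)
        return;

    /* send keepalives/renewcaps to each session and re-arm the
     * delayed work as usual */
}

With that, the MDS keeps receiving renewcaps while the flush is blocked on RADOS, so the session no longer looks unresponsive for 300+ seconds.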

The following is one case when running the ffsb stress test where the dirty-buffer flush blocked for more than two minutes on my VM, which is a bit low on disk space:

[xiubli@ceph modules]$ sudo cat /proc/kmsg 
<4>[54094.501840] hrtimer: interrupt took 2449200 ns
<3>[55176.534510] INFO: task ffsb:13911 blocked for more than 122 seconds.
<3>[55176.534521]       Tainted: G        W          6.4.0+ #8
<3>[55176.534526] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[55176.534529] task:ffsb            state:D stack:0     pid:13911 ppid:10993  flags:0x00004002
<6>[55176.534537] Call Trace:
<6>[55176.534541]  <TASK>
<6>[55176.534547]  __schedule+0x50e/0xde0
<6>[55176.534558]  ? __pfx___schedule+0x10/0x10
<6>[55176.534565]  ? rcu_is_watching+0x1f/0x50
<6>[55176.534572]  ? lock_acquire+0xc7/0x3e0
<6>[55176.534579]  ? __pfx_lock_acquire+0x10/0x10
<6>[55176.534585]  ? do_raw_spin_trylock+0xd0/0x120
<6>[55176.534592]  ? __pfx_do_raw_spin_trylock+0x10/0x10
<6>[55176.534598]  ? rcu_is_watching+0x1f/0x50
<6>[55176.534605]  schedule+0x8e/0x120
<6>[55176.534612]  io_schedule+0x6d/0xa0
<6>[55176.534617]  folio_wait_bit_common+0x243/0x590
<6>[55176.534635]  ? __pfx_folio_wait_bit_common+0x10/0x10
<6>[55176.534641]  ? filemap_get_folios_tag+0x4a1/0x5b0
<6>[55176.534649]  ? __pfx_wake_page_function+0x10/0x10
<6>[55176.534656]  ? rcu_is_watching+0x1f/0x50
<6>[55176.534662]  ? lockdep_hardirqs_on_prepare+0xc/0x60
<6>[55176.534669]  folio_wait_writeback+0x1b/0x110
<6>[55176.534677]  __filemap_fdatawait_range+0xcc/0x170
<6>[55176.534683]  ? __pfx___filemap_fdatawait_range+0x10/0x10
<6>[55176.534689]  ? wait_sb_inodes+0xb9/0x520
<6>[55176.534696]  ? rcu_is_watching+0x1f/0x50
<6>[55176.534701]  ? lock_acquire+0xc7/0x3e0
<6>[55176.534710]  ? __list_del_entry_valid+0x66/0x100
<6>[55176.534722]  filemap_fdatawait_keep_errors+0x1a/0x30
<6>[55176.534729]  wait_sb_inodes+0x2de/0x520
<6>[55176.534736]  ? __pfx_wait_sb_inodes+0x10/0x10
<6>[55176.534741]  ? __pfx_autoremove_wake_function+0x10/0x10
<6>[55176.534749]  ? __up_write+0x184/0x2d0
<6>[55176.534757]  sync_inodes_sb+0x14d/0x1a0
<6>[55176.534764]  ? __pfx_sync_inodes_sb+0x10/0x10
<6>[55176.534770]  ? rcu_is_watching+0x1f/0x50
<6>[55176.534780]  ? __pfx_sync_inodes_one_sb+0x10/0x10
<6>[55176.534785]  iterate_supers+0xbe/0x140
<6>[55176.534793]  ksys_sync+0x84/0x110
<6>[55176.534799]  ? __pfx_ksys_sync+0x10/0x10
<6>[55176.534804]  ? ktime_get_coarse_real_ts64+0x100/0x110
<6>[55176.534812]  ? syscall_trace_enter.constprop.0+0x153/0x1d0
<6>[55176.534819]  __do_sys_sync+0xa/0x20
<6>[55176.534824]  do_syscall_64+0x5c/0x90
<6>[55176.534830]  ? rcu_is_watching+0x1f/0x50
<6>[55176.534836]  ? trace_hardirqs_on_prepare+0xc9/0xe0
<6>[55176.534841]  ? lockdep_hardirqs_on_prepare+0xc/0x60
<6>[55176.534847]  ? do_syscall_64+0x69/0x90
<6>[55176.534852]  ? rcu_is_watching+0x1f/0x50
<6>[55176.534858]  ? trace_hardirqs_off+0x61/0xe0
<6>[55176.534862]  ? rcu_is_watching+0x1f/0x50
<6>[55176.534868]  ? trace_hardirqs_on_prepare+0xc9/0xe0
<6>[55176.534873]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
<6>[55176.534880] RIP: 0033:0x7fd0b5c3ed5b
<6>[55176.534885] RSP: 002b:00007ffc05a3fe38 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2
<6>[55176.534892] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fd0b5c3ed5b
<6>[55176.534896] RDX: 000000000051b722 RSI: 00000000649cf749 RDI: 0000000000000001
<6>[55176.534900] RBP: 00007ffc05a40460 R08: 0000000000000000 R09: 00000000ffffffff
<6>[55176.534904] R10: 00007ffc05a3fe40 R11: 0000000000000246 R12: 00000000012599a8
<6>[55176.534907] R13: 00007ffc05a3ffc0 R14: 0000000001259970 R15: 0000000000000001
<6>[55176.534917]  </TASK>
<4>[55176.534934] INFO: lockdep is turned off.
<3>[55299.414106] INFO: task ffsb:13911 blocked for more than 245 seconds.
<3>[55299.414117]       Tainted: G        W          6.4.0+ #8
<3>[55299.414122] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[55299.414125] task:ffsb            state:D stack:0     pid:13911 ppid:10993  flags:0x00004002
<6>[55299.414132] Call Trace:
<6>[55299.414136]  <TASK>
<6>[55299.414141]  __schedule+0x50e/0xde0
<6>[55299.414153]  ? __pfx___schedule+0x10/0x10
<6>[55299.414158]  ? rcu_is_watching+0x1f/0x50
<6>[55299.414165]  ? lock_acquire+0xc7/0x3e0
<6>[55299.414172]  ? __pfx_lock_acquire+0x10/0x10
<6>[55299.414179]  ? do_raw_spin_trylock+0xd0/0x120
<6>[55299.414185]  ? __pfx_do_raw_spin_trylock+0x10/0x10
<6>[55299.414191]  ? rcu_is_watching+0x1f/0x50
<6>[55299.414198]  schedule+0x8e/0x120
<6>[55299.414206]  io_schedule+0x6d/0xa0
<6>[55299.414215]  folio_wait_bit_common+0x243/0x590
<6>[55299.414233]  ? __pfx_folio_wait_bit_common+0x10/0x10
<6>[55299.414242]  ? filemap_get_folios_tag+0x4a1/0x5b0
<6>[55299.414257]  ? __pfx_wake_page_function+0x10/0x10
<6>[55299.414269]  ? rcu_is_watching+0x1f/0x50
<6>[55299.414278]  ? lockdep_hardirqs_on_prepare+0xc/0x60
<6>[55299.414290]  folio_wait_writeback+0x1b/0x110
<6>[55299.414300]  __filemap_fdatawait_range+0xcc/0x170
<6>[55299.414310]  ? __pfx___filemap_fdatawait_range+0x10/0x10
<6>[55299.414316]  ? wait_sb_inodes+0xb9/0x520
<6>[55299.414322]  ? rcu_is_watching+0x1f/0x50
<6>[55299.414328]  ? lock_acquire+0xc7/0x3e0
<6>[55299.414337]  ? __list_del_entry_valid+0x66/0x100
<6>[55299.414345]  filemap_fdatawait_keep_errors+0x1a/0x30
<6>[55299.414352]  wait_sb_inodes+0x2de/0x520
<6>[55299.414359]  ? __pfx_wait_sb_inodes+0x10/0x10
<6>[55299.414368]  ? __pfx_autoremove_wake_function+0x10/0x10
<6>[55299.414381]  ? __up_write+0x184/0x2d0
<6>[55299.414391]  sync_inodes_sb+0x14d/0x1a0
<6>[55299.414400]  ? __pfx_sync_inodes_sb+0x10/0x10
<6>[55299.414408]  ? rcu_is_watching+0x1f/0x50
<6>[55299.414418]  ? __pfx_sync_inodes_one_sb+0x10/0x10
<6>[55299.414423]  iterate_supers+0xbe/0x140
<6>[55299.414433]  ksys_sync+0x84/0x110
<6>[55299.414442]  ? __pfx_ksys_sync+0x10/0x10
<6>[55299.414450]  ? ktime_get_coarse_real_ts64+0x100/0x110
<6>[55299.414462]  ? syscall_trace_enter.constprop.0+0x153/0x1d0
<6>[55299.414472]  __do_sys_sync+0xa/0x20
<6>[55299.414479]  do_syscall_64+0x5c/0x90
<6>[55299.414484]  ? rcu_is_watching+0x1f/0x50
<6>[55299.414490]  ? trace_hardirqs_on_prepare+0xc9/0xe0
<6>[55299.414495]  ? lockdep_hardirqs_on_prepare+0xc/0x60
<6>[55299.414501]  ? do_syscall_64+0x69/0x90
<6>[55299.414506]  ? rcu_is_watching+0x1f/0x50
<6>[55299.414511]  ? trace_hardirqs_off+0x61/0xe0
<6>[55299.414516]  ? rcu_is_watching+0x1f/0x50
<6>[55299.414521]  ? trace_hardirqs_on_prepare+0xc9/0xe0
<6>[55299.414526]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
<6>[55299.414533] RIP: 0033:0x7fd0b5c3ed5b
<6>[55299.414538] RSP: 002b:00007ffc05a3fe38 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2
<6>[55299.414546] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fd0b5c3ed5b
<6>[55299.414551] RDX: 000000000051b722 RSI: 00000000649cf749 RDI: 0000000000000001
<6>[55299.414554] RBP: 00007ffc05a40460 R08: 0000000000000000 R09: 00000000ffffffff
<6>[55299.414558] R10: 00007ffc05a3fe40 R11: 0000000000000246 R12: 00000000012599a8
<6>[55299.414597] R13: 00007ffc05a3ffc0 R14: 0000000001259970 R15: 0000000000000001
<6>[55299.414612]  </TASK>
<4>[55299.414623] INFO: lockdep is turned off.

#2

Updated by Xiubo Li 11 months ago

  • Status changed from In Progress to Fix Under Review