Bug #61591

[RHEL stock] kclient appears to wrongly believe it has been blocklisted

Added by Patrick Donnelly 11 months ago. Updated 11 months ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

2023-05-28T19:57:05.892 DEBUG:teuthology.orchestra.run.smithi062:> (cd /home/ubuntu/cephtest/mnt.0 && exec bash -c 'sudo dd if=/sys/kernel/debug/ceph/7e93fa48-fd90-11ed-9b1d-001a4aab830c.client15495/status')
2023-05-28T19:57:05.973 INFO:teuthology.orchestra.run.smithi062.stdout:instance: client.15495 (3)192.168.0.1:0/3891638898
2023-05-28T19:57:05.973 INFO:teuthology.orchestra.run.smithi062.stdout:blocklisted: false
...
2023-05-28T20:16:04.912 DEBUG:teuthology.orchestra.run.smithi062:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd blocklist ls
...
2023-05-28T20:16:05.653 INFO:teuthology.orchestra.run.smithi062.stdout:172.21.15.62:0/3903573374 2023-05-29T19:57:15.421701+0000
2023-05-28T20:16:05.653 INFO:teuthology.orchestra.run.smithi062.stdout:172.21.15.62:0/3810945376 2023-05-29T19:57:15.421701+0000
2023-05-28T20:16:05.653 INFO:teuthology.orchestra.run.smithi062.stdout:172.21.15.62:0/532471957 2023-05-29T19:57:15.421701+0000
2023-05-28T20:16:05.654 INFO:teuthology.orchestra.run.smithi062.stdout:172.21.15.62:0/3996750006 2023-05-29T19:49:17.588828+0000
2023-05-28T20:16:05.654 INFO:teuthology.orchestra.run.smithi062.stdout:172.21.15.62:6800/1518238550 2023-05-29T19:49:17.588828+0000
2023-05-28T20:16:05.654 INFO:teuthology.orchestra.run.smithi062.stdout:172.21.15.62:0/325104692 2023-05-29T19:49:17.588828+0000
2023-05-28T20:16:05.655 INFO:teuthology.orchestra.run.smithi062.stdout:172.21.15.62:6800/43495430 2023-05-29T19:48:51.176696+0000
2023-05-28T20:16:05.655 INFO:teuthology.orchestra.run.smithi062.stdout:172.21.15.62:6800/2478711060 2023-05-29T19:57:15.421701+0000
2023-05-28T20:16:05.655 INFO:teuthology.orchestra.run.smithi062.stdout:172.21.15.62:0/4180666335 2023-05-29T19:48:51.176696+0000
2023-05-28T20:16:05.656 INFO:teuthology.orchestra.run.smithi062.stdout:172.21.15.62:6801/2478711060 2023-05-29T19:57:15.421701+0000
2023-05-28T20:16:05.656 INFO:teuthology.orchestra.run.smithi062.stdout:172.21.15.62:0/3239137281 2023-05-29T19:48:51.176696+0000
2023-05-28T20:16:05.656 INFO:teuthology.orchestra.run.smithi062.stdout:172.21.15.62:6801/43495430 2023-05-29T19:48:51.176696+0000
2023-05-28T20:16:05.656 INFO:teuthology.orchestra.run.smithi062.stdout:172.21.15.62:0/3171002779 2023-05-29T19:57:15.421701+0000
2023-05-28T20:16:05.657 INFO:teuthology.orchestra.run.smithi062.stdout:172.21.15.62:6801/1518238550 2023-05-29T19:49:17.588828+0000
2023-05-28T20:16:05.657 INFO:teuthology.orchestra.run.smithi062.stdout:172.21.15.62:0/781602504 2023-05-29T19:48:51.176696+0000
2023-05-28T20:16:05.657 INFO:teuthology.orchestra.run.smithi062.stdout:172.21.15.62:0/4265251340 2023-05-29T19:49:17.588828+0000
...
2023-05-28T20:16:05.676 INFO:teuthology.orchestra.run.smithi062.stderr:listed 16 entries
2023-05-28T20:16:05.700 DEBUG:tasks.cephfs.kernel_mount:Unmounting client client.0...
2023-05-28T20:16:05.701 INFO:teuthology.orchestra.run:Running command with timeout 300
2023-05-28T20:16:05.701 DEBUG:teuthology.orchestra.run.smithi062:> sudo umount /home/ubuntu/cephtest/mnt.0
...
2023-05-28T20:21:05.754 ERROR:teuthology:Uncaught exception (Hub)
...
2023-05-28T20:21:09.947 INFO:journalctl@ceph.mon.a.smithi062.stdout:May 28 20:21:09 smithi062 ceph-mon[94025]: evicting unresponsive client smithi062:0 (15495), after 361.407 seconds
2023-05-28T20:21:09.947 INFO:journalctl@ceph.mon.a.smithi062.stdout:May 28 20:21:09 smithi062 ceph-mon[94025]: Evicting (and blocklisting) client session 15495 (v1:192.168.0.1:0/3891638898)
2023-05-28T20:21:09.947 INFO:journalctl@ceph.mon.a.smithi062.stdout:May 28 20:21:09 smithi062 ceph-mon[94025]: from='mds.? [v2:172.21.15.176:6838/3335453750,v1:172.21.15.176:6839/3335453750]' entity='mds.i' cmd=[{"prefix":"osd blocklist", "blocklistop":"add","addr":"v1:192.168.0.1:0/3891638898"}]: dispatch
2023-05-28T20:21:09.947 INFO:journalctl@ceph.mon.a.smithi062.stdout:May 28 20:21:09 smithi062 ceph-mon[94025]: from='mds.? ' entity='mds.i' cmd=[{"prefix":"osd blocklist", "blocklistop":"add","addr":"v1:192.168.0.1:0/3891638898"}]: dispatch
2023-05-28T20:21:10.036 INFO:journalctl@ceph.mon.b.smithi157.stdout:May 28 20:21:09 smithi157 ceph-mon[95431]: pgmap v764: 129 pgs: 129 active+clean; 1.6 GiB data, 16 GiB used, 1.0 TiB / 1.0 TiB avail; 16 KiB/s rd, 53 KiB/s wr, 27 op/s
2023-05-28T20:21:10.037 INFO:journalctl@ceph.mon.b.smithi157.stdout:May 28 20:21:09 smithi157 ceph-mon[95431]: evicting unresponsive client smithi062:0 (15495), after 361.407 seconds
2023-05-28T20:21:10.037 INFO:journalctl@ceph.mon.b.smithi157.stdout:May 28 20:21:09 smithi157 ceph-mon[95431]: Evicting (and blocklisting) client session 15495 (v1:192.168.0.1:0/3891638898)
2023-05-28T20:21:10.037 INFO:journalctl@ceph.mon.b.smithi157.stdout:May 28 20:21:09 smithi157 ceph-mon[95431]: from='mds.? [v2:172.21.15.176:6838/3335453750,v1:172.21.15.176:6839/3335453750]' entity='mds.i' cmd=[{"prefix":"osd blocklist", "blocklistop":"add","addr":"v1:192.168.0.1:0/3891638898"}]: dispatch
2023-05-28T20:21:10.038 INFO:journalctl@ceph.mon.b.smithi157.stdout:May 28 20:21:09 smithi157 ceph-mon[95431]: from='mds.? ' entity='mds.i' cmd=[{"prefix":"osd blocklist", "blocklistop":"add","addr":"v1:192.168.0.1:0/3891638898"}]: dispatch
2023-05-28T20:21:10.117 INFO:journalctl@ceph.mon.c.smithi176.stdout:May 28 20:21:09 smithi176 ceph-mon[95827]: pgmap v764: 129 pgs: 129 active+clean; 1.6 GiB data, 16 GiB used, 1.0 TiB / 1.0 TiB avail; 16 KiB/s rd, 53 KiB/s wr, 27 op/s
2023-05-28T20:21:10.117 INFO:journalctl@ceph.mon.c.smithi176.stdout:May 28 20:21:09 smithi176 ceph-mon[95827]: evicting unresponsive client smithi062:0 (15495), after 361.407 seconds
2023-05-28T20:21:10.117 INFO:journalctl@ceph.mon.c.smithi176.stdout:May 28 20:21:09 smithi176 ceph-mon[95827]: Evicting (and blocklisting) client session 15495 (v1:192.168.0.1:0/3891638898)
2023-05-28T20:21:10.118 INFO:journalctl@ceph.mon.c.smithi176.stdout:May 28 20:21:09 smithi176 ceph-mon[95827]: from='mds.? [v2:172.21.15.176:6838/3335453750,v1:172.21.15.176:6839/3335453750]' entity='mds.i' cmd=[{"prefix":"osd blocklist", "blocklistop":"add","addr":"v1:192.168.0.1:0/3891638898"}]: dispatch
2023-05-28T20:21:10.118 INFO:journalctl@ceph.mon.c.smithi176.stdout:May 28 20:21:09 smithi176 ceph-mon[95827]: from='mds.? ' entity='mds.i' cmd=[{"prefix":"osd blocklist", "blocklistop":"add","addr":"v1:192.168.0.1:0/3891638898"}]: dispatch
2023-05-28T20:21:10.946 INFO:journalctl@ceph.mon.a.smithi062.stdout:May 28 20:21:10 smithi062 ceph-mon[94025]: from='mds.? ' entity='mds.i' cmd='[{"prefix":"osd blocklist", "blocklistop":"add","addr":"v1:192.168.0.1:0/3891638898"}]': finished
2023-05-28T20:21:10.947 INFO:journalctl@ceph.mon.a.smithi062.stdout:May 28 20:21:10 smithi062 ceph-mon[94025]: osdmap e135: 12 total, 12 up, 12 in
2023-05-28T20:21:11.036 INFO:journalctl@ceph.mon.b.smithi157.stdout:May 28 20:21:10 smithi157 ceph-mon[95431]: from='mds.? ' entity='mds.i' cmd='[{"prefix":"osd blocklist", "blocklistop":"add","addr":"v1:192.168.0.1:0/3891638898"}]': finished
2023-05-28T20:21:11.036 INFO:journalctl@ceph.mon.b.smithi157.stdout:May 28 20:21:10 smithi157 ceph-mon[95431]: osdmap e135: 12 total, 12 up, 12 in
2023-05-28T20:21:11.117 INFO:journalctl@ceph.mon.c.smithi176.stdout:May 28 20:21:10 smithi176 ceph-mon[95827]: from='mds.? ' entity='mds.i' cmd='[{"prefix":"osd blocklist", "blocklistop":"add","addr":"v1:192.168.0.1:0/3891638898"}]': finished

From: /teuthology/yuriw-2023-05-28_14:46:14-fs-reef-release-distro-default-smithi/7288964/teuthology.log

2023-05-28T19:57:05.499690+00:00 smithi062 kernel: ceph: loaded (mds proto 32)
2023-05-28T19:57:05.516681+00:00 smithi062 kernel: ceph: device name is missing path (no : separator in 0@7e93fa48-fd90-11ed-9b1d-001a4aab830c.cephfs=/volumes/_nogroup/sv_1/73b32d9a-d1cb-4d38-ac72-222cea3557ab)
2023-05-28T19:57:05.527684+00:00 smithi062 kernel: libceph: mon0 (1)172.21.15.62:6789 session established
2023-05-28T19:57:05.527774+00:00 smithi062 kernel: libceph: client15495 fsid 7e93fa48-fd90-11ed-9b1d-001a4aab830c
2023-05-28T19:57:05.543697+00:00 smithi062 kernel: ceph: mds3 session blocklisted
2023-05-28T19:57:05.561687+00:00 smithi062 kernel: ceph: mds0 session blocklisted
2023-05-28T19:57:17.951686+00:00 smithi062 kernel: ceph: mds1 session blocklisted
2023-05-28T19:57:17.975685+00:00 smithi062 kernel: ceph: mds4 session blocklisted
2023-05-28T19:58:01.123723+00:00 smithi062 kernel: ceph: mds2 session blocklisted
2023-05-28T20:19:09.319869+00:00 smithi062 kernel: INFO: task umount:154689 blocked for more than 120 seconds.
2023-05-28T20:19:09.320011+00:00 smithi062 kernel:      Not tainted 4.18.0-372.9.1.el8.x86_64 #1
2023-05-28T20:19:09.320054+00:00 smithi062 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2023-05-28T20:19:09.336919+00:00 smithi062 kernel: task:umount          state:D stack:    0 pid:154689 ppid:154669 flags:0x00004080
2023-05-28T20:19:09.336980+00:00 smithi062 kernel: Call Trace:
2023-05-28T20:19:09.343691+00:00 smithi062 kernel: __schedule+0x2d1/0x830
2023-05-28T20:19:09.343742+00:00 smithi062 kernel: schedule+0x35/0xa0
2023-05-28T20:19:09.352093+00:00 smithi062 kernel: ceph_mdsc_sync+0x2f1/0x350 [ceph]
2023-05-28T20:19:09.352144+00:00 smithi062 kernel: ? finish_wait+0x80/0x80
2023-05-28T20:19:09.356071+00:00 smithi062 kernel: ceph_sync_fs+0x2f/0xb0 [ceph]
2023-05-28T20:19:09.364710+00:00 smithi062 kernel: sync_filesystem+0x71/0x90
2023-05-28T20:19:09.364761+00:00 smithi062 kernel: generic_shutdown_super+0x22/0x100
2023-05-28T20:19:09.369549+00:00 smithi062 kernel: kill_anon_super+0x14/0x30
2023-05-28T20:19:09.373670+00:00 smithi062 kernel: ceph_kill_sb+0x39/0x70 [ceph]
2023-05-28T20:19:09.382918+00:00 smithi062 kernel: deactivate_locked_super+0x34/0x70
2023-05-28T20:19:09.382971+00:00 smithi062 kernel: cleanup_mnt+0x3b/0x70
2023-05-28T20:19:09.390667+00:00 smithi062 kernel: task_work_run+0x8a/0xb0
2023-05-28T20:19:09.390714+00:00 smithi062 kernel: exit_to_usermode_loop+0xeb/0xf0
2023-05-28T20:19:09.395294+00:00 smithi062 kernel: do_syscall_64+0x198/0x1a0
2023-05-28T20:19:09.399409+00:00 smithi062 kernel: entry_SYSCALL_64_after_hwframe+0x65/0xca
2023-05-28T20:19:09.408761+00:00 smithi062 kernel: RIP: 0033:0x7f82e4991dab
2023-05-28T20:19:09.408807+00:00 smithi062 kernel: Code: Unable to access opcode bytes at RIP 0x7f82e4991d81.
2023-05-28T20:19:09.415675+00:00 smithi062 kernel: RSP: 002b:00007ffe0ada7a78 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
2023-05-28T20:19:09.423626+00:00 smithi062 kernel: RAX: 0000000000000000 RBX: 00005603ed4ce5d0 RCX: 00007f82e4991dab
2023-05-28T20:19:09.431142+00:00 smithi062 kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00005603ed4ce7b0
2023-05-28T20:19:09.438654+00:00 smithi062 kernel: RBP: 0000000000000000 R08: 00005603ed4ce7e0 R09: 00007f82e4ad8820
2023-05-28T20:19:09.446163+00:00 smithi062 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00005603ed4ce7b0
2023-05-28T20:19:09.453681+00:00 smithi062 kernel: R13: 00007f82e5802184 R14: 0000000000000000 R15: 00000000ffffffff
2023-05-28T20:21:09.680697+00:00 smithi062 kernel: libceph: mds1 (1)172.21.15.176:6839 socket closed (con state OPEN)
2023-05-28T20:21:09.680835+00:00 smithi062 kernel: libceph: mds0 (1)172.21.15.157:6839 socket closed (con state OPEN)
2023-05-28T20:21:09.680871+00:00 smithi062 kernel: libceph: mds3 (1)172.21.15.176:6837 socket closed (con state OPEN)
2023-05-28T20:21:09.802654+00:00 smithi062 kernel: libceph: mds2 (1)172.21.15.157:6837 socket closed (con state OPEN)
2023-05-28T20:21:09.871354+00:00 smithi062 kernel: libceph: mds4 (1)172.21.15.176:6835 socket closed (con state OPEN)

From: /teuthology/yuriw-2023-05-28_14:46:14-fs-reef-release-distro-default-smithi/7288964/remote/smithi062/syslog/kern.log.gz

The client was not actually blocklisted at ~19:57, when the kclient first reported it. The blocklisting happened much later, after umount timed out (~5 minutes) and the MDS evicted the client for being unresponsive (for 361 seconds).

First signs of trouble are:

./remote/smithi157/log/7e93fa48-fd90-11ed-9b1d-001a4aab830c/ceph-mds.b.log-20230529.gz:2023-05-28T19:57:05.826+0000 7faf32c58700 20 mds.0.server laggiest active session is client.15495 v1:192.168.0.1:0/3891638898 and renewed caps recently (0.278995s ago)
Actions #1

Updated by Xiubo Li 11 months ago

  • Assignee set to Xiubo Li
Actions #2

Updated by Xiubo Li 11 months ago

  • Status changed from New to Resolved

From the mds.b.log, I can see the mds0 session was successfully opened. If the client had really been blocklisted, we would see the reject ... logs on both the MDS and kclient sides, but there are none:

2023-05-28T19:57:05.545+0000 7faf3445b700  1 -- [v2:172.21.15.157:6838/3324018403,v1:172.21.15.157:6839/3324018403] <== client.15495 v1:192.168.0.1:0/3891638898 1 ==== client_session(request_open) v4 ==== 223+0+0 (unknown 1689163983 0 0) 0x560af4eb0380 con 0x560af4e8e000
2023-05-28T19:57:05.545+0000 7faf3445b700 20 mds.0.14 get_session have 0x560af4d8b400 client.15495 v1:192.168.0.1:0/3891638898 state closed
2023-05-28T19:57:05.545+0000 7faf3445b700  3 mds.0.server handle_client_session client_session(request_open) v4 from client.15495
2023-05-28T19:57:05.545+0000 7faf3445b700 20 mds.0.server handle_client_session CEPH_SESSION_REQUEST_OPEN metadata entries:
2023-05-28T19:57:05.545+0000 7faf3445b700 20 mds.0.server  features: '0x0000000000007bff'
2023-05-28T19:57:05.545+0000 7faf3445b700 20 mds.0.server  metric specification: [{metric_flags: '0x00000000000003ff'}]
2023-05-28T19:57:05.545+0000 7faf3445b700 20 mds.0.server   entity_id: 0
2023-05-28T19:57:05.545+0000 7faf3445b700 20 mds.0.server   hostname: smithi062
2023-05-28T19:57:05.545+0000 7faf3445b700 20 mds.0.server   kernel_version: 4.18.0-372.9.1.el8.x86_64
2023-05-28T19:57:05.545+0000 7faf3445b700 20 mds.0.server   root: /volumes/_nogroup/sv_1/73b32d9a-d1cb-4d38-ac72-222cea3557ab
2023-05-28T19:57:05.545+0000 7faf3445b700 10 mds.0.sessionmap add_session s=0x560af4d8b400 name=client.15495
2023-05-28T19:57:05.545+0000 7faf3445b700 20 mds.0.sessionmap mark_projected s=0x560af4d8b400 name=client.15495 pv=15 -> 16
2023-05-28T19:57:05.545+0000 7faf3445b700 10 mds.0.sessionmap touch_session s=0x560af4d8b400 name=client.15495
2023-05-28T19:57:05.545+0000 7faf2dc4e700  5 mds.0.log _submit_thread 4252432~298 : ESession client.15495 v1:192.168.0.1:0/3891638898 open cmapv 16
2023-05-28T19:57:05.545+0000 7faf2dc4e700  1 -- [v2:172.21.15.157:6838/3324018403,v1:172.21.15.157:6839/3324018403] --> [v2:172.21.15.157:6824/1792092911,v1:172.21.15.157:6825/1792092911] -- osd_op(unknown.0.14:569 2.14 2:2e2fa760:::200.00000001:head [write 58128~318 in=318b] snapc 0=[] ondisk+write+known_if_redirected+full_force+supports_pool_eio e88) v8 -- 0x560af5045c00 con 0x560af4db0000
2023-05-28T19:57:05.545+0000 7faf2dc4e700  1 -- [v2:172.21.15.157:6838/3324018403,v1:172.21.15.157:6839/3324018403] --> [v2:172.21.15.157:6824/1792092911,v1:172.21.15.157:6825/1792092911] -- osd_op(unknown.0.14:570 2.14 2:292cf221:::200.00000000:head [writefull 0~90 in=90b] snapc 0=[] ondisk+write+known_if_redirected+full_force+supports_pool_eio e88) v8 -- 0x560af4ede700 con 0x560af4db0000
2023-05-28T19:57:05.547+0000 7faf37c62700  1 -- [v2:172.21.15.157:6838/3324018403,v1:172.21.15.157:6839/3324018403] <== osd.10 v2:172.21.15.157:6824/1792092911 67 ==== osd_op_reply(569 200.00000001 [write 58128~318] v88'37 uv37 ondisk = 0) v8 ==== 156+0+0 (crc 0 0 0) 0x560af4da1440 con 0x560af4db0000
2023-05-28T19:57:05.547+0000 7faf2e44f700 10 MDSIOContextBase::complete: 20C_MDS_session_finish
2023-05-28T19:57:05.547+0000 7faf2e44f700 10 MDSContext::complete: 20C_MDS_session_finish
2023-05-28T19:57:05.547+0000 7faf2e44f700 10 mds.0.server _session_logged client.15495 v1:192.168.0.1:0/3891638898 state_seq 1 open 16 inos_to_free [] inotablev 0 inos_to_purge []
2023-05-28T19:57:05.547+0000 7faf2e44f700 20 mds.0.sessionmap mark_dirty s=0x560af4d8b400 name=client.15495 v=15
2023-05-28T19:57:05.547+0000 7faf2e44f700 10 mds.0.sessionmap touch_session s=0x560af4d8b400 name=client.15495
2023-05-28T19:57:05.547+0000 7faf2e44f700 10 add_session: mds.metrics: session=0x560af4d8b400, client=client.15495 v1:192.168.0.1:0/3891638898
2023-05-28T19:57:05.547+0000 7faf2e44f700 20 add_session: mds.metrics: metrics=[update_type=0, metrics={cap_hit_metric={hits=0, misses=0}, read_latency={latency=0.000000, avg_latency=0.000000, sq_sum=0, count=0}, write_latency={latency=0.000000, avg_latency=0.000000, sq_sum=0, count=0}, metadata_latency={latency=0.000000, avg_latency=0.000000, sq_sum=0, count=0}, dentry_lease={hits=0, misses=0}, opened_files_metric={opened_files=0, total_inodes=0}, pinned_icaps_metric={pinned_icaps=0, total_inodes=0}, opened_inodes_metric={opened_inodes=0, total_inodes=0}, read_io_sizes_metric={total_ops=0, total_size=0}, write_io_sizes_metric={total_ops=0, total_size=0}}]
2023-05-28T19:57:05.547+0000 7faf2e44f700 10 mds.0.14 send_message_client client.15495 v1:192.168.0.1:0/3891638898 client_session(open) v5
2023-05-28T19:57:05.547+0000 7faf2e44f700  1 -- [v2:172.21.15.157:6838/3324018403,v1:172.21.15.157:6839/3324018403] --> v1:192.168.0.1:0/3891638898 -- client_session(open) v5 -- 0x560af4e0d880 con 0x560af4e8e000
2023-05-28T19:57:05.547+0000 7faf2e44f700  2 mds.0.server New client session: addr="v1:192.168.0.1:0/3891638898",elapsed=0.002000,throttled=0.000014,status="ACCEPTED",root="/volumes/_nogroup/sv_1/73b32d9a-d1cb-4d38-ac72-222cea3557ab" 

So at first it looked as though the message had been corrupted for some reason, but I didn't see any issue in the mds logs for the session message. Then I checked the kclient decoding code, and it is buggy:

3379         if (msg_version >= 5) {
3380                 u32 flags;
3381                 /* version >= 4, struct_v, struct_cv, len, metric_spec */
3382                 ceph_decode_skip_n(&p, end, 2 + sizeof(u32) * 2, bad);
3383                 /* version >= 5, flags   */
3384                 ceph_decode_32_safe(&p, end, flags, bad);
3385                 if (flags & CEPH_SESSION_BLOCKLISTED) {
3386                         pr_warn("mds%d session blocklisted\n", session->s_mds);                                                                                                                                                   
3387                         blocklisted = true;
3388                 }
3389         }

Line 3382 always assumes the size of the _vec[] in metric_spec is 1, which is incorrect: it skips a fixed number of bytes instead of reading the encoded length and skipping that many.

This is a bug in the kclient. Luís fixed it upstream some time ago, but the fix has not been backported downstream yet:

commit ea16567f11018e2f58e72b667b0c803ff92b8153
Author: Luís Henriques <lhenriques@suse.de>
Date:   Mon May 23 17:09:51 2022 +0100

    ceph: fix decoding of client session messages flags

    The cephfs kernel client started to show  the message:

     ceph: mds0 session blocklisted

    when mounting a filesystem.  This is due to the fact that the session
    messages are being incorrectly decoded: the skip needs to take into
    account the 'len'.

    While there, fixed some whitespaces too.

    Cc: stable@vger.kernel.org
    Fixes: e1c9788cb397 ("ceph: don't rely on error_string to validate blocklisted session.")
    Signed-off-by: Luís Henriques <lhenriques@suse.de>
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

I will backport it downstream soon.
