Bug #43599
kclient: corrupt message failure on RHEL8 distribution kernel (Status: Closed)
Description
2020-01-13T09:20:06.126283+00:00 smithi142 kernel: libceph: mon1 172.21.15.120:6789 session established
2020-01-13T09:20:06.134352+00:00 smithi142 kernel: libceph: client4621 fsid 48ce1088-e239-461d-897a-70a91ffd5fc6
2020-01-13T09:20:06.142310+00:00 smithi142 kernel: ceph: mdsc_handle_session corrupt message mds0 len 38
2020-01-13T09:20:06.142350+00:00 smithi142 kernel: header: 00000000: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
2020-01-13T09:20:06.142369+00:00 smithi142 kernel: header: 00000010: 16 00 7f 00 01 00 26 00 00 00 00 00 00 00 00 00 ......&.........
2020-01-13T09:20:06.142386+00:00 smithi142 kernel: header: 00000020: 00 00 00 00 02 00 00 00 00 00 00 00 00 01 00 00 ................
2020-01-13T09:20:06.142402+00:00 smithi142 kernel: header: 00000030: 00 06 f4 9c 93 .....
2020-01-13T09:20:06.142421+00:00 smithi142 kernel: front: 00000000: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
2020-01-13T09:20:06.142438+00:00 smithi142 kernel: front: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 01 01 04 00 ................
2020-01-13T09:20:06.142454+00:00 smithi142 kernel: front: 00000020: 00 00 00 00 00 00 ......
2020-01-13T09:20:06.142472+00:00 smithi142 kernel: footer: 00000000: a0 8f fa 5d 00 00 00 00 00 00 00 00 5d 5f 89 eb ...]........]_..
2020-01-13T09:20:06.142489+00:00 smithi142 kernel: footer: 00000010: a3 8b f2 88 05 .....
2020-01-13T09:21:10.061315+00:00 smithi142 kernel: ceph: mdsc_handle_session corrupt message mds0 len 38
2020-01-13T09:21:10.061384+00:00 smithi142 kernel: header: 00000000: 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
2020-01-13T09:21:10.061408+00:00 smithi142 kernel: header: 00000010: 16 00 7f 00 01 00 26 00 00 00 00 00 00 00 00 00 ......&.........
2020-01-13T09:21:10.061425+00:00 smithi142 kernel: header: 00000020: 00 00 00 00 02 00 00 00 00 00 00 00 00 01 00 00 ................
2020-01-13T09:21:10.061442+00:00 smithi142 kernel: header: 00000030: 00 f1 40 fb de ..@..
2020-01-13T09:21:10.061459+00:00 smithi142 kernel: front: 00000000: 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
2020-01-13T09:21:10.061475+00:00 smithi142 kernel: front: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 01 01 04 00 ................
2020-01-13T09:21:10.061492+00:00 smithi142 kernel: front: 00000020: 00 00 00 00 00 00 ......
2020-01-13T09:21:10.061508+00:00 smithi142 kernel: footer: 00000000: e0 cf e5 da 00 00 00 00 00 00 00 00 43 18 24 12 ............C.$.
2020-01-13T09:21:10.061525+00:00 smithi142 kernel: footer: 00000010: dc f6 a6 a1 05 .....
From: /ceph/teuthology-archive/pdonnell-2020-01-13_01:55:25-kcephfs-wip-pdonnell-testing-20200112.224135-distro-basic-smithi/4661268/remote/smithi142/syslog/kern.log.gz
All of these failures:
Failure: Command failed on smithi124 with status 32: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage /bin/mount -t ceph :/ /home/ubuntu/cephtest/mnt.3 -v -o name=3,norequire_active_mds,conf=/etc/ceph/ceph.conf'

124 jobs

suites intersection: ['conf/{client.yaml', 'log-config.yaml', 'mds.yaml', 'mon.yaml', 'osd-asserts.yaml', 'osd.yaml}', 'overrides/{frag_enable.yaml', 'rhel_8.yaml}', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}']

suites union: ['clusters/1-mds-1-client.yaml', 'clusters/1-mds-2-client.yaml', 'clusters/1-mds-4-client.yaml', 'conf/{client.yaml', 'inline/no.yaml', 'inline/yes.yaml', 'kcephfs/cephfs/{begin.yaml', 'kcephfs/mixed-clients/{begin.yaml', 'kcephfs/recovery/{begin.yaml', 'kcephfs/thrash/{begin.yaml', 'kclient-overrides/{distro/rhel/{k-distro.yaml', 'kclient/{mount.yaml', 'log-config.yaml', 'mds.yaml', 'mon.yaml', 'ms-die-on-skipped.yaml}', 'ms-die-on-skipped.yaml}}', 'objectstore-ec/bluestore-bitmap.yaml', 'objectstore-ec/bluestore-comp-ec-root.yaml', 'objectstore-ec/bluestore-comp.yaml', 'objectstore-ec/bluestore-ec-root.yaml', 'objectstore-ec/filestore-xfs.yaml', 'osd-asserts.yaml', 'osd.yaml}', 'overrides/{distro/rhel/{k-distro.yaml', 'overrides/{frag_enable.yaml', 'rhel_8.yaml}', 'tasks/acls-kernel-client.yaml}', 'tasks/auto-repair.yaml}', 'tasks/client-limits.yaml}', 'tasks/client-recovery.yaml}', 'tasks/data-scan.yaml}', 'tasks/failover.yaml}', 'tasks/journal-repair.yaml}', 'tasks/kclient_workunit_direct_io.yaml}', 'tasks/kclient_workunit_kernel_untar_build.yaml}', 'tasks/kclient_workunit_misc.yaml}', 'tasks/kclient_workunit_o_trunc.yaml}', 'tasks/kclient_workunit_snaps.yaml}', 'tasks/kclient_workunit_suites_dbench.yaml}', 'tasks/kclient_workunit_suites_ffsb.yaml}', 'tasks/kclient_workunit_suites_fsstress.yaml}', 'tasks/kclient_workunit_suites_fsx.yaml}', 'tasks/kclient_workunit_suites_fsync.yaml}', 'tasks/kclient_workunit_suites_iozone.yaml}', 'tasks/kclient_workunit_suites_pjd.yaml}', 'tasks/kclient_workunit_trivial_sync.yaml}', 'tasks/kernel_cfuse_workunits_untarbuild_blogbench.yaml}', 'tasks/mds-flush.yaml}', 'tasks/pool-perm.yaml}', 'tasks/sessionmap.yaml}', 'tasks/volume-client.yaml}', 'thrash-health-whitelist.yaml', 'thrashers/default.yaml', 'thrashers/mds.yaml', 'thrashers/mon.yaml', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}', 'workloads/kclient_workunit_suites_ffsb.yaml}', 'workloads/kclient_workunit_suites_iozone.yaml}']
The message looks like:
2020-01-13T09:20:06.134+0000 7fe9db520700 1 -- [v2:172.21.15.120:6834/3149285072,v1:172.21.15.120:6835/3149285072] --> v1:172.21.15.142:0/2769041657 -- client_session(open) v4 -- 0x5589aec98b40 con 0x5589adf24900
Updated by Patrick Donnelly over 4 years ago
Jeff Layton wrote:
What kernel is this?
2020-01-13T09:09:32.013 INFO:teuthology.orchestra.run.smithi166:> uname -r
2020-01-13T09:09:32.029 INFO:teuthology.orchestra.run.smithi166.stdout:4.18.0-80.11.2.el8_0.x86_64
From: /ceph/teuthology-archive/pdonnell-2020-01-13_01:55:25-kcephfs-wip-pdonnell-testing-20200112.224135-distro-basic-smithi/4661268/teuthology.log
Updated by Patrick Donnelly over 4 years ago
- Assignee changed from Patrick Donnelly to Jeff Layton
Updated by Jeff Layton over 4 years ago
This is just one of those places where the kernel client never expected the struct to be extended. I suspect this is something already fixed for RHEL 8.2. handle_session() has this:

/* decode */
if (msg->front.iov_len != sizeof(*h))
        goto bad;
...and the struct version got extended in mainline ceph.
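To illustrate the failure mode, here is a minimal standalone sketch of why that strict equality check breaks when the sender appends new fields. The struct layout and field names below are illustrative stand-ins, not the real struct ceph_mds_session_head, and the "tolerant" variant is just one possible forward-compatible alternative:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of a v1 session-message body; illustrative
 * only, not the real kernel struct layout. */
struct session_head_v1 {
	uint32_t op;
	uint64_t seq;
} __attribute__((packed));

/* Mimics the strict check in handle_session(): any payload whose
 * length differs from sizeof(*h) -- e.g. one extended with trailing
 * fields by a newer MDS -- takes the "corrupt message" path. */
static int strict_parse(size_t front_len)
{
	if (front_len != sizeof(struct session_head_v1))
		return -1;	/* corrupt message mds0 len 38 */
	return 0;
}

/* A forward-compatible variant would only require the fields it
 * knows about and ignore trailing bytes it does not understand. */
static int tolerant_parse(size_t front_len)
{
	if (front_len < sizeof(struct session_head_v1))
		return -1;
	return 0;
}
```

With an extended payload (like the len 38 message above against a smaller expected struct), strict_parse() rejects what tolerant_parse() would accept.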
This may be a regression in the MDS in 55d8fdef68740c4f0a83d55a04ac7a10ff4db15b. Should the presentation of metric_spec also be gated on something to ensure the client can parse it?
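The gating question above can be sketched as follows: only append the newer payload when the peer has advertised support for it, so older clients keep receiving the exact v1-sized body they expect. Everything here is a toy model: the feature flag name, field sizes, and buffer layout are invented for illustration and are not the real Ceph wire format or feature bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FEATURE_METRIC_COLLECT (1ULL << 0)	/* made-up flag bit */

struct encode_buf {
	uint8_t data[64];
	size_t len;
};

static void put_bytes(struct encode_buf *b, const void *p, size_t n)
{
	memcpy(b->data + b->len, p, n);
	b->len += n;
}

/* Encode a session-open body; the extension is gated on the peer's
 * advertised features, so a legacy client never sees extra bytes. */
static void encode_session_open(struct encode_buf *b, uint64_t peer_features)
{
	uint32_t op = 1;	/* illustrative v1 fields */
	uint64_t seq = 0;

	put_bytes(b, &op, sizeof(op));
	put_bytes(b, &seq, sizeof(seq));

	if (peer_features & FEATURE_METRIC_COLLECT) {
		uint8_t metric_spec[6] = {0};	/* stand-in payload */
		put_bytes(b, metric_spec, sizeof(metric_spec));
	}
}
```

A client without the flag gets the 12-byte v1 body; one with the flag gets the 18-byte extended body, which matches the direction the fix under PR 32659 takes (gating the new encoding rather than relaxing old clients).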
Updated by Patrick Donnelly over 4 years ago
- Status changed from New to In Progress
- Assignee changed from Jeff Layton to Patrick Donnelly
Updated by Patrick Donnelly over 4 years ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 32659
Updated by Patrick Donnelly over 4 years ago
- Has duplicate Bug #43664: mds: metric_spec is encoded into version 1 MClientSession added
Updated by Patrick Donnelly over 4 years ago
- Status changed from Fix Under Review to Resolved