Bug #38652

mds|kclient: MDS_CLIENT_LATE_RELEASE warning caused by inline bug on RHEL 7.5

Added by Patrick Donnelly 3 months ago. Updated 2 months ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
Start date:
Due date:
% Done:

0%

Source:
Q/A
Tags:
Backport:
nautilus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS, kceph
Labels (FS):
Pull request ID:

Description

Failure: "2019-03-07 20:19:55.015557 mon.b (mon.0) 310 : cluster [WRN] Health check failed: 1 clients failing to respond to capability release (MDS_CLIENT_LATE_RELEASE)" in cluster log
23 jobs: ['3679196', '3679259', '3679211', '3679328', '3679383', '3679117', '3679469', '3679375', '3679414', '3679274', '3679156', '3679289', '3679282', '3679297', '3679242', '3679159', '3679070', '3679446', '3679461', '3679125', '3679417', '3679110', '3679312']
suites intersection: ['conf/{client.yaml', 'fuse-default-perm-no.yaml}', 'mds.yaml', 'mon.yaml', 'mount/kclient/{mount.yaml', 'ms-die-on-skipped.yaml}}', 'multimds/basic/{begin.yaml', 'osd.yaml}', 'overrides/{basic/{frag_enable.yaml', 'q_check_counter/check_counter.yaml', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}']
suites union: ['clusters/3-mds.yaml', 'clusters/9-mds.yaml', 'conf/{client.yaml', 'fuse-default-perm-no.yaml}', 'inline/no.yaml', 'inline/yes.yaml', 'k-distro.yaml}', 'mds.yaml', 'mon.yaml', 'mount/kclient/{mount.yaml', 'ms-die-on-skipped.yaml}}', 'multimds/basic/{begin.yaml', 'objectstore-ec/bluestore-bitmap.yaml', 'objectstore-ec/bluestore-comp-ec-root.yaml', 'objectstore-ec/bluestore-comp.yaml', 'objectstore-ec/bluestore-ec-root.yaml', 'objectstore-ec/filestore-xfs.yaml', 'osd.yaml}', 'overrides/{basic/{frag_enable.yaml', 'overrides/{distro/random/{k-testing.yaml', 'overrides/{distro/rhel/{7.5.yaml', 'q_check_counter/check_counter.yaml', 'supported$/{rhel_latest.yaml}}', 'supported$/{ubuntu_16.04.yaml}}', 'tasks/cfuse_workunit_kernel_untar_build.yaml}', 'tasks/cfuse_workunit_misc.yaml}', 'tasks/cfuse_workunit_suites_blogbench.yaml}', 'tasks/cfuse_workunit_suites_dbench.yaml}', 'tasks/cfuse_workunit_suites_ffsb.yaml}', 'tasks/cfuse_workunit_suites_fsstress.yaml}', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}']

Ignore k-testing and inline/yes.yaml above; the scrape tool merged this test failure with an unrelated one: http://pulpito.ceph.com/pdonnell-2019-03-07_15:13:09-multimds-wip-pdonnell-testing-20190307.041917-distro-basic-smithi/3679312/

From: http://pulpito.ceph.com/pdonnell-2019-03-07_15:13:09-multimds-wip-pdonnell-testing-20190307.041917-distro-basic-smithi/

Zheng suggested it might be related to inline data:

2019-03-07 16:00:35.377 7fde44e5e700  7 mds.1.locker issue_caps loner client.4591 allowed=pAsxLsXsxFsxcrwb, xlocker allowed=pAsxLsXsxFsxcrwb, others allowed=pLs on [inode 0x100000001f0 [2,head] /client.0/tmp/blogbench-1.0/src/blogtest_in/blog-2/article-59.xml auth{0=1} v190 dirtyparent s=0 n(v0 rc2019-03-07 16:00:35.311945 1=1+0)/n(v0 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={4591=0-4194304@1} caps={4591=pAsxLsXsxFsxcrwb/pAsxXsxFxcwb@1},l=4591 | importingcaps=1 caps=1 dirtyrstat=1 dirtyparent=1 replicated=1 dirty=1 0x5599cb630000]
2019-03-07 16:00:35.377 7fde44e5e700 20 mds.1.locker  client.4591 pending pAsxLsXsxFsxcrwb allowed pAsxLsXsxFsxcb wanted pAsxXsxFxcwb
2019-03-07 16:00:35.377 7fde44e5e700  7 mds.1.locker    sending MClientCaps to client.4591 seq 2 new pending pAsxLsXsxFsxcb was pAsxLsXsxFsxcrwb
2019-03-07 16:00:35.377 7fde44e5e700 20 mds.1.cache.ino(0x100000001f0) encode_cap_message pfile 1 pauth 0 plink 0 pxattr 0 ctime 2019-03-07 16:00:35.311945
2019-03-07 16:00:35.377 7fde44e5e700 10 mds.1.7 send_message_client_counted client.4591 seq 117 client_caps(revoke ino 0x100000001f0 151 seq 2 caps=pAsxLsXsxFsxcb dirty=- wanted=pAsxXsxFxcwb follows 0 mseq 1 size 0/4194304 ts 1/18446744073709551615 mtime 2019-03-07 16:00:35.306945) v11

"Inline data related bug. You can see Frw magically disappeared. I suspect session->get_connection() is null." -Zheng


Related issues

Duplicated by fs - Bug #38636: Inline data compatibly check in Locker::issue_caps is buggy Duplicate 03/08/2019
Copied to fs - Backport #39225: nautilus: mds|kclient: MDS_CLIENT_LATE_RELEASE warning caused by inline bug on RHEL 7.5 Resolved

History

#1 Updated by Patrick Donnelly 3 months ago

  • Duplicated by Bug #38636: Inline data compatibly check in Locker::issue_caps is buggy added

#2 Updated by Patrick Donnelly 3 months ago

See also Zheng's analysis in the ticket he opened: #38636

#3 Updated by Zheng Yan 3 months ago

  • Status changed from Verified to Need Review
  • Pull request ID set to 26811

#4 Updated by Zheng Yan 3 months ago

New issue that can cause this warning: the file lock becomes sync state while Fcb is still issued.

/ceph/teuthology-archive/pdonnell-2019-03-16_00:19:15-multimds-wip-pdonnell-testing-20190315.213331-distro-basic-smithi/3730992/


2019-03-17 13:21:55.857 1e672700 20 mds.2.migrator  did replicate_relax_locks, now [inode 0x2000000039a [2,head] /client.0/tmp/clients/client9/~dmtmp/PARADOX/COURSES.DB auth v103 dirtyparent s=260096 n(v0 rc2019-03-17 13:19:01.074278 b260096 1=1+0) (iauth excl) (ixattr excl) (iversion lock) cr={4564=0-4194304@1} caps={4564=pAsxLsXsxFcb/pAsxXsxFsxcrwb@5},l=4564 | ptrwaiter=2 lock=0 caps=1 dirtyparent=1 replicated=0 dirty=1 waiter=0 authpin=0 0x210b9c90]
2019-03-17 13:21:55.858 1e672700 20 mds.2.migrator encode_export_inode_caps [inode 0x2000000039a [2,head] /client.0/tmp/clients/client9/~dmtmp/PARADOX/COURSES.DB auth v103 dirtyparent s=260096 n(v0 rc2019-03-17 13:19:01.074278 b260096 1=1+0) (iauth excl) (ixattr excl) (iversion lock) cr={4564=0-419
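
A rough C++ sketch of the invariant Zheng describes (the lock-state-to-caps mapping here is a simplified model, not the actual Locker tables): every cap issued to a client must be allowed by the current lock state, and a filelock relaxed to sync does not allow buffered writes (Fb), so `Fcb` remaining issued violates the invariant:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of filelock states and the file caps each allows.
enum class LockState { SYNC, EXCL };

enum : uint32_t {
  Fs = 1 << 0, Fx = 1 << 1, Fc = 1 << 2,
  Fr = 1 << 3, Fw = 1 << 4, Fb = 1 << 5,
};

// In sync state only shared/read-side caps are allowed; buffered
// writes (Fb) require an exclusive-type lock state.
uint32_t caps_allowed(LockState s) {
  switch (s) {
    case LockState::SYNC: return Fs | Fc | Fr;
    case LockState::EXCL: return Fs | Fx | Fc | Fr | Fw | Fb;
  }
  return 0;
}

// Invariant the bug violates: no issued cap may exceed what the
// current lock state allows. In the log above, Fcb stays issued to
// client.4564 while replicate_relax_locks drops the lock toward sync.
bool invariant_holds(uint32_t issued, LockState s) {
  return (issued & ~caps_allowed(s)) == 0;
}
```

When the invariant breaks this way, the MDS must revoke the now-disallowed caps, and a client that does not release them promptly trips the same MDS_CLIENT_LATE_RELEASE health check.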

#5 Updated by Patrick Donnelly 2 months ago

  • Status changed from Need Review to Pending Backport
  • Backport changed from nautilus,mimic,luminous to nautilus

#6 Updated by Nathan Cutler 2 months ago

  • Copied to Backport #39225: nautilus: mds|kclient: MDS_CLIENT_LATE_RELEASE warning caused by inline bug on RHEL 7.5 added

#7 Updated by Nathan Cutler 2 months ago

  • Pull request ID changed from 26811 to 26881

#8 Updated by Nathan Cutler 2 months ago

  • Status changed from Pending Backport to Resolved
