Project

General

Profile

Actions

Bug #47565

closed

qa: "client.4606 isn't responding to mclientcaps(revoke), ino 0x200000007d5 pending pAsLsXsFscr issued pAsLsXsFsxcrwb, sent 60.889494 seconds ago"

Added by Patrick Donnelly over 3 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
Testing
Target version:
% Done:

0%

Source:
Q/A
Tags:
Backport:
octopus,nautilus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Client, MDS
Labels (FS):
qa, qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

failure_reason: '"2020-09-20T16:27:35.946549+0000 mds.b (mds.0) 1 : cluster [WRN]
  client.4606 isn''t responding to mclientcaps(revoke), ino 0x200000007d5 pending
  pAsLsXsFscr issued pAsLsXsFsxcrwb, sent 60.889494 seconds ago" in cluster log'

From: /ceph/teuthology-archive/pdonnell-2020-09-20_07:13:47-multimds-wip-pdonnell-testing-20200920.040823-distro-basic-smithi/5451587/teuthology.log

and

Failure: "2020-09-20T16:27:35.946549+0000 mds.b (mds.0) 1 : cluster [WRN] client.4606 isn't responding to mclientcaps(revoke), ino 0x200000007d5 pending pAsLsXsFscr issued pAsLsXsFsxcrwb, sent 60.889494 seconds ago" in cluster log
7 jobs: ['5451587', '5451424', '5451680', '5451482', '5451488', '5451475', '5451715']
suites intersection: ['conf/{client', 'mds', 'mon', 'osd}', 'whitelist_health']
suites union: ['begin', 'centos_latest', 'clusters/3-mds', 'clusters/9-mds', 'conf/{client', 'fuse-default-perm-no}', 'inline/no', 'inline/yes', 'k-testing}', 'mds', 'mon', 'mon-debug', 'mount', 'mount/fuse', 'mount/kclient/{k-testing', 'mount/kclient/{mount', 'ms-die-on-skipped}', 'ms-die-on-skipped}}', 'multimds/basic/{0-supported-random-distro$/{centos_8}', 'multimds/basic/{0-supported-random-distro$/{rhel_8}', 'multimds/basic/{0-supported-random-distro$/{ubuntu_latest}', 'multimds/verify/{begin', 'objectstore-ec/bluestore-bitmap', 'objectstore-ec/bluestore-comp', 'objectstore-ec/bluestore-comp-ec-root', 'objectstore-ec/bluestore-ec-root', 'osd}', 'overrides/{basic/{frag_enable', 'overrides/{distro/testing/{flavor/ubuntu_latest', 'overrides/{fuse-default-perm-no', 'q_check_counter/check_counter', 'tasks/cfuse_workunit_suites_blogbench}', 'tasks/cfuse_workunit_suites_ffsb}', 'tasks/cfuse_workunit_suites_fsstress', 'validater/valgrind}', 'verify/{frag_enable', 'whitelist_health', 'whitelist_wrongly_marked_down}', 'whitelist_wrongly_marked_down}}']

This looks like some kind of multimds bug with a client having caps with 2+ MDS. Here's the relevant MDS log for 5451587:

./remote/smithi120/log/ceph-mds.b.log.gz:2020-09-20T16:26:35.055+0000 7f47c6334700 10 mds.0.4 send_message_client_counted client.4606 seq 1751 client_caps(revoke ino 0x200000007d5 182 seq 11 caps=pAsLsXsFscr dirty=- wanted=Fxcb follows 0 mseq 1 size 104857600/213909504 ts 1/18446744073709551615 mtime 2020-09-20T16:18:26.669617+0000) v11
./remote/smithi120/log/ceph-mds.b.log.gz:2020-09-20T16:26:35.055+0000 7f47c6334700  1 -- [v2:172.21.15.120:6835/1535229919,v1:172.21.15.120:6837/1535229919] --> 172.21.15.110:0/2458736152 -- client_caps(revoke ino 0x200000007d5 182 seq 11 caps=pAsLsXsFscr dirty=- wanted=Fxcb follows 0 mseq 1 size 104857600/213909504 ts 1/18446744073709551615 mtime 2020-09-20T16:18:26.669617+0000) v11 -- 0x55a9729bd200 con 0x55a972739800
./remote/smithi120/log/ceph-mds.b.log.gz:2020-09-20T16:26:35.055+0000 7f47c6334700 10 mds.0.cache.ino(0x200000007d5) auth_pin by 0x55a972d41a10 on [inode 0x200000007d5 [2,head] /client.1/tmp/tmp/data/datafile116 auth{1=1} v2245 ap=2 dirtyparent s=104857600 n(v0 rc2020-09-20T16:18:26.669617+0000 b104857600 1=1+0) (ifile excl->sync) (iversion lock) cr={4606=0-213909504@1} caps={4606=pAsLsXsFscr/pAsLsXsFsxcrwb/Fxcb@11},l=4606 | ptrwaiter=0 request=0 lock=0 importingcaps=0 caps=1 dirtyparent=1 replicated=1 dirty=1 waiter=0 authpin=1 0x55a972d41700] now 2
./remote/smithi120/log/ceph-mds.b.log.gz:2020-09-20T16:26:35.944+0000 7f47c4b31700 20 mds.0.locker caps_tick age = 0.888669 client.4606.0x200000007d5
./remote/smithi120/log/ceph-mds.b.log.gz:2020-09-20T16:26:40.944+0000 7f47c4b31700 20 mds.0.locker caps_tick age = 5.888904 client.4606.0x200000007d5
./remote/smithi120/log/ceph-mds.b.log.gz:2020-09-20T16:26:45.944+0000 7f47c4b31700 20 mds.0.locker caps_tick age = 10.889030 client.4606.0x200000007d5
./remote/smithi120/log/ceph-mds.b.log.gz:2020-09-20T16:26:50.944+0000 7f47c4b31700 20 mds.0.locker caps_tick age = 15.888847 client.4606.0x200000007d5
./remote/smithi120/log/ceph-mds.b.log.gz:2020-09-20T16:26:55.944+0000 7f47c4b31700 20 mds.0.locker caps_tick age = 20.888905 client.4606.0x200000007d5
./remote/smithi120/log/ceph-mds.b.log.gz:2020-09-20T16:27:00.944+0000 7f47c4b31700 20 mds.0.locker caps_tick age = 25.889039 client.4606.0x200000007d5
./remote/smithi120/log/ceph-mds.b.log.gz:2020-09-20T16:27:05.944+0000 7f47c4b31700 20 mds.0.locker caps_tick age = 30.889069 client.4606.0x200000007d5
./remote/smithi120/log/ceph-mds.b.log.gz:2020-09-20T16:27:10.945+0000 7f47c4b31700 20 mds.0.locker caps_tick age = 35.889172 client.4606.0x200000007d5
./remote/smithi120/log/ceph-mds.b.log.gz:2020-09-20T16:27:15.946+0000 7f47c4b31700 20 mds.0.locker caps_tick age = 40.889251 client.4606.0x200000007d5
./remote/smithi120/log/ceph-mds.b.log.gz:2020-09-20T16:27:20.945+0000 7f47c4b31700 20 mds.0.locker caps_tick age = 45.889279 client.4606.0x200000007d5
./remote/smithi120/log/ceph-mds.b.log.gz:2020-09-20T16:27:25.945+0000 7f47c4b31700 20 mds.0.locker caps_tick age = 50.889367 client.4606.0x200000007d5
./remote/smithi120/log/ceph-mds.b.log.gz:2020-09-20T16:27:30.945+0000 7f47c4b31700 20 mds.0.locker caps_tick age = 55.889374 client.4606.0x200000007d5
./remote/smithi120/log/ceph-mds.b.log.gz:2020-09-20T16:27:35.945+0000 7f47c4b31700 20 mds.0.locker caps_tick age = 60.889494 client.4606.0x200000007d5
./remote/smithi120/log/ceph-mds.b.log.gz:2020-09-20T16:27:35.945+0000 7f47c4b31700  0 log_channel(cluster) log [WRN] : client.4606 isn't responding to mclientcaps(revoke), ino 0x200000007d5 pending pAsLsXsFscr issued pAsLsXsFsxcrwb, sent 60.889494 seconds ago
./remote/smithi120/log/ceph-mds.b.log.gz:2020-09-20T16:27:35.945+0000 7f47c4b31700 20 mds.0.locker caps_tick client.4606 isn't responding to mclientcaps(revoke), ino 0x200000007d5 pending pAsLsXsFscr issued pAsLsXsFsxcrwb, sent 60.889494 seconds ago

For debugging this, I suggest looking at the messages going to each MDS from the client. Fortunately this is a ceph-fuse client so the logs are available for what it's doing. This does happen on the kernel client too. For example:

/ceph/teuthology-archive/pdonnell-2020-09-20_07:13:47-multimds-wip-pdonnell-testing-20200920.040823-distro-basic-smithi/5451482/teuthology.log


Related issues 2 (0 open2 closed)

Copied to CephFS - Backport #47990: nautilus: qa: "client.4606 isn't responding to mclientcaps(revoke), ino 0x200000007d5 pending pAsLsXsFscr issued pAsLsXsFsxcrwb, sent 60.889494 seconds ago"ResolvedNathan CutlerActions
Copied to CephFS - Backport #47991: octopus: qa: "client.4606 isn't responding to mclientcaps(revoke), ino 0x200000007d5 pending pAsLsXsFscr issued pAsLsXsFsxcrwb, sent 60.889494 seconds ago"ResolvedNathan CutlerActions
Actions

Also available in: Atom PDF