Bug #40305

qa: spurious unresponsive client causes eviction due to valgrind/multimds

Added by Patrick Donnelly 2 months ago. Updated about 2 months ago.

Status: New
Priority: Normal
Assignee:
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Source: Q/A
Tags:
Backport: nautilus,mimic
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): qa-suite
Labels (FS): qa
Pull request ID:

Description

Failure: Command failed (workunit test suites/fsstress.sh) on smithi160 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.1/client.1/tmp && cd -- /home/ubuntu/cephtest/mnt.1/client.1/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=5b90a6db883d8ede253330f79ce5766715a7fb9f TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="1" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.1 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.1 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 6h /home/ubuntu/cephtest/clone.client.1/qa/workunits/suites/fsstress.sh'
4 jobs: ['4020865', '4020801', '4020822', '4020844']
suites intersection: ['clusters/3-mds.yaml', 'conf/{client.yaml', 'mds.yaml', 'mon-debug.yaml', 'mon.yaml', 'mount/fuse.yaml', 'multimds/verify/{begin.yaml', 'osd.yaml}', 'overrides/{fuse-default-perm-no.yaml', 'tasks/cfuse_workunit_suites_fsstress.yaml', 'validater/valgrind.yaml}', 'verify/{frag_enable.yaml', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}}']
suites union: ['clusters/3-mds.yaml', 'conf/{client.yaml', 'mds.yaml', 'mon-debug.yaml', 'mon.yaml', 'mount/fuse.yaml', 'multimds/verify/{begin.yaml', 'objectstore-ec/bluestore-bitmap.yaml', 'objectstore-ec/bluestore-comp-ec-root.yaml', 'objectstore-ec/bluestore-comp.yaml', 'objectstore-ec/bluestore-ec-root.yaml', 'osd.yaml}', 'overrides/{fuse-default-perm-no.yaml', 'tasks/cfuse_workunit_suites_fsstress.yaml', 'validater/valgrind.yaml}', 'verify/{frag_enable.yaml', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}}']

From: /ceph/teuthology-archive/pdonnell-2019-06-11_01:14:05-multimds-wip-pdonnell-testing-20190610.220401-distro-basic-smithi/4020865/

It looks like mds.1 spammed the client with client_caps import messages, and the client never got around to renewing its caps in time. This is likely just due to valgrind slowing everything down. I'm going to suggest raising the session timeout for these valgrind workload tests.
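
As a rough sketch of what a longer session timeout could look like when applied by hand, the snippet below builds a `ceph fs set ... session_timeout` command. The filesystem name `cephfs` and the value of 300 seconds are assumptions chosen for illustration, not values taken from these runs, and the `RUN` switch is a hypothetical convenience so the command can be inspected before it touches a cluster. How the QA suite itself would carry this setting is not decided here.

```shell
#!/bin/sh
# Assumed values: neither the filesystem name nor the timeout comes from the failing jobs.
FS_NAME="${FS_NAME:-cephfs}"
SESSION_TIMEOUT="${SESSION_TIMEOUT:-300}"   # seconds; chosen to give a valgrind-slowed client more headroom

# Build the command that raises the MDS session timeout for this filesystem.
cmd="ceph fs set ${FS_NAME} session_timeout ${SESSION_TIMEOUT}"

# Dry run by default: print the command. Set RUN=1 to execute it against a live cluster.
if [ "${RUN:-0}" = "1" ]; then
    $cmd
else
    echo "$cmd"
fi
```

Running it without `RUN=1` only prints the command, which makes it easy to check the filesystem name and value first.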

History

#1 Updated by Patrick Donnelly 2 months ago

  • Assignee changed from Patrick Donnelly to Zheng Yan

Zheng is going to take a look.

#2 Updated by Patrick Donnelly about 2 months ago

/ceph/teuthology-archive/pdonnell-2019-06-21_01:51:23-multimds-wip-pdonnell-testing-20190620.220400-distro-basic-smithi/4054061/teuthology.log

/ceph/teuthology-archive/pdonnell-2019-06-21_01:51:23-multimds-wip-pdonnell-testing-20190620.220400-distro-basic-smithi/4054037/teuthology.log
