Bug #40305

qa: spurious unresponsive client causes eviction due to valgrind/multimds

Added by Patrick Donnelly almost 5 years ago. Updated over 3 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport: nautilus,mimic
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): qa-suite
Labels (FS): qa
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Failure: Command failed (workunit test suites/fsstress.sh) on smithi160 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.1/client.1/tmp && cd -- /home/ubuntu/cephtest/mnt.1/client.1/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=5b90a6db883d8ede253330f79ce5766715a7fb9f TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="1" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.1 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.1 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 6h /home/ubuntu/cephtest/clone.client.1/qa/workunits/suites/fsstress.sh'
4 jobs: ['4020865', '4020801', '4020822', '4020844']
suites intersection: ['clusters/3-mds.yaml', 'conf/{client.yaml', 'mds.yaml', 'mon-debug.yaml', 'mon.yaml', 'mount/fuse.yaml', 'multimds/verify/{begin.yaml', 'osd.yaml}', 'overrides/{fuse-default-perm-no.yaml', 'tasks/cfuse_workunit_suites_fsstress.yaml', 'validater/valgrind.yaml}', 'verify/{frag_enable.yaml', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}}']
suites union: ['clusters/3-mds.yaml', 'conf/{client.yaml', 'mds.yaml', 'mon-debug.yaml', 'mon.yaml', 'mount/fuse.yaml', 'multimds/verify/{begin.yaml', 'objectstore-ec/bluestore-bitmap.yaml', 'objectstore-ec/bluestore-comp-ec-root.yaml', 'objectstore-ec/bluestore-comp.yaml', 'objectstore-ec/bluestore-ec-root.yaml', 'osd.yaml}', 'overrides/{fuse-default-perm-no.yaml', 'tasks/cfuse_workunit_suites_fsstress.yaml', 'validater/valgrind.yaml}', 'verify/{frag_enable.yaml', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}}']

From: /ceph/teuthology-archive/pdonnell-2019-06-11_01:14:05-multimds-wip-pdonnell-testing-20190610.220401-distro-basic-smithi/4020865/

Looks like mds.1 spammed the client with client_caps import messages and the client never got around to renewing its caps in time. This is likely just due to valgrind slowing everything down. I'm going to suggest raising the session timeout for these valgrind workload tests.
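
To make the reasoning concrete, here is a rough back-of-the-envelope sketch in Python. It is not Ceph code: the function names, the renewal interval, the slowdown factor, and the timeout values are all hypothetical placeholders chosen for illustration, not numbers measured from these runs. It only shows why a renewal that is comfortably on time at normal speed can miss the deadline once the client is slowed down, and how a larger timeout restores the margin.

```python
def renewal_misses_timeout(renew_interval_s, slowdown, session_timeout_s):
    """Return True if a slowed-down client would renew later than the timeout.

    renew_interval_s:  how often the client renews at normal speed (assumed).
    slowdown:          factor by which the client is slowed (assumed, e.g. valgrind).
    session_timeout_s: how long the server waits before treating the client
                       as unresponsive (assumed).
    """
    effective_interval_s = renew_interval_s * slowdown
    return effective_interval_s > session_timeout_s


def suggested_timeout(renew_interval_s, slowdown, margin=2.0):
    """Return a timeout that leaves `margin` times the slowed renewal interval."""
    return renew_interval_s * slowdown * margin


# Hypothetical numbers, for illustration only.
RENEW_INTERVAL_S = 20
SESSION_TIMEOUT_S = 60
VALGRIND_SLOWDOWN = 10

# At normal speed the renewal arrives well inside the timeout.
normal_miss = renewal_misses_timeout(RENEW_INTERVAL_S, 1, SESSION_TIMEOUT_S)

# With the assumed slowdown the same renewal arrives too late.
slow_miss = renewal_misses_timeout(
    RENEW_INTERVAL_S, VALGRIND_SLOWDOWN, SESSION_TIMEOUT_S
)

# Raising the timeout restores the margin for the slowed client.
raised = suggested_timeout(RENEW_INTERVAL_S, VALGRIND_SLOWDOWN)
raised_miss = renewal_misses_timeout(RENEW_INTERVAL_S, VALGRIND_SLOWDOWN, raised)

print(normal_miss, slow_miss, raised, raised_miss)
```

The model ignores the extra delay from the client working through the burst of cap import messages, so a real timeout for these tests would probably need more headroom than this suggests.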

History

#1 Updated by Patrick Donnelly almost 5 years ago

  • Assignee changed from Patrick Donnelly to Zheng Yan

Zheng is going to take a look.

#2 Updated by Patrick Donnelly over 4 years ago

/ceph/teuthology-archive/pdonnell-2019-06-21_01:51:23-multimds-wip-pdonnell-testing-20190620.220400-distro-basic-smithi/4054061/teuthology.log

/ceph/teuthology-archive/pdonnell-2019-06-21_01:51:23-multimds-wip-pdonnell-testing-20190620.220400-distro-basic-smithi/4054037/teuthology.log

#3 Updated by Patrick Donnelly about 4 years ago

  • Target version deleted (v15.0.0)

#4 Updated by Patrick Donnelly over 3 years ago

  • Assignee deleted (Zheng Yan)
