Bug #22886

closed

kclient: Test failure: test_full_same_file (tasks.cephfs.test_full.TestClusterFull)

Added by Patrick Donnelly about 6 years ago. Updated about 6 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): kceph
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

From: http://pulpito.ceph.com/pdonnell-2018-01-30_23:38:56-kcephfs-wip-pdonnell-i22627-testing-basic-smithi/2129601/

The kclient is slow to release caps, or does not release them at all (I think?), and this leads to a test timeout while waiting for the purge.

/ceph/teuthology-archive/pdonnell-2018-01-30_23:38:56-kcephfs-wip-pdonnell-i22627-testing-basic-smithi/2129751/remote/smithi052/log/ceph-mds.a-s.log.gz

The file is not purged until after the client unmounts.
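
For readers unfamiliar with this kind of failure, here is a minimal sketch of a poll-with-timeout wait and of how it times out when the purge only happens after unmount. The names `wait_until_true`, `FakeFile`, and `is_purged` are hypothetical and written for illustration; this is not the actual code in tasks.cephfs.test_full.

```python
import time


def wait_until_true(condition, timeout, period=1.0):
    """Poll condition() until it returns True; raise if timeout seconds pass first."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise RuntimeError(f"timed out after {timeout}s waiting for condition")
        time.sleep(period)


class FakeFile:
    """Hypothetical stand-in: the file is purged only once the client has unmounted."""

    def __init__(self):
        self.client_mounted = True

    def is_purged(self):
        return not self.client_mounted


if __name__ == "__main__":
    f = FakeFile()
    try:
        # The client is still mounted, so the purge never completes in time.
        wait_until_true(f.is_purged, timeout=0.05, period=0.01)
    except RuntimeError as err:
        print("test would fail:", err)

    f.client_mounted = False  # simulate the unmount
    wait_until_true(f.is_purged, timeout=0.05, period=0.01)
    print("purged after unmount")
```

The sketch only models the symptom described above (the wait expires because the purge is gated on the unmount); it says nothing about why the kernel client holds on to its caps.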

This is with my branch updating the kcephfs suite: https://github.com/ceph/ceph-ci/tree/wip-pdonnell-i22627


Related issues 1 (0 open, 1 closed)

Copied to CephFS - Backport #22966: luminous: kclient: Test failure: test_full_same_file (tasks.cephfs.test_full.TestClusterFull), Resolved, Prashant D
Actions #1

Updated by Patrick Donnelly about 6 years ago

These may be related:

Failure: Test failure: test_purge_queue_op_rate (tasks.cephfs.test_strays.TestStrays)
3 jobs: ['2129710', '2129760', '2129610']
suites intersection: ['debug/mds_client.yaml', 'dirfrag/frag_enable.yaml', 'frag_enable.yaml', 'kcephfs/recovery/{clusters/1-mds-4-client.yaml', 'log-config.yaml', 'mounts/kmounts.yaml', 'osd-asserts.yaml', 'overrides/{debug.yaml', 'tasks/strays.yaml', 'whitelist_health.yaml', 'whitelist_health.yaml}', 'whitelist_wrongly_marked_down.yaml}']
suites union: ['debug/mds_client.yaml', 'dirfrag/frag_enable.yaml', 'frag_enable.yaml', 'kcephfs/recovery/{clusters/1-mds-4-client.yaml', 'log-config.yaml', 'mounts/kmounts.yaml', 'objectstore-ec/bluestore-comp.yaml', 'objectstore-ec/bluestore-ec-root.yaml', 'objectstore-ec/filestore-xfs.yaml', 'osd-asserts.yaml', 'overrides/{debug.yaml', 'tasks/strays.yaml', 'whitelist_health.yaml', 'whitelist_health.yaml}', 'whitelist_wrongly_marked_down.yaml}']
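
The "suites intersection" and "suites union" lines above are plain set operations over the YAML fragments of each failing job. A minimal sketch follows, assuming a hypothetical mapping from job ID to fragment list; which objectstore fragment belongs to which job is invented here for illustration and is not taken from the run.

```python
def summarize_suites(job_fragments):
    """Return (intersection, union) of the fragment lists, each sorted."""
    sets = [set(fragments) for fragments in job_fragments.values()]
    if not sets:
        return [], []
    intersection = sorted(set.intersection(*sets))
    union = sorted(set.union(*sets))
    return intersection, union


if __name__ == "__main__":
    # Assumed job -> fragment mapping; only the fragment names come from the listing above.
    jobs = {
        "2129710": ["mounts/kmounts.yaml", "tasks/strays.yaml",
                    "objectstore-ec/bluestore-comp.yaml"],
        "2129760": ["mounts/kmounts.yaml", "tasks/strays.yaml",
                    "objectstore-ec/bluestore-ec-root.yaml"],
        "2129610": ["mounts/kmounts.yaml", "tasks/strays.yaml",
                    "objectstore-ec/filestore-xfs.yaml"],
    }
    common, seen = summarize_suites(jobs)
    print("suites intersection:", common)
    print("suites union:", seen)
```

Fragments that appear in the union but not in the intersection (here, the objectstore variants) differ between the failing jobs, so they are less likely to be the trigger than the fragments all jobs share.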
Actions #2

Updated by Zheng Yan about 6 years ago

It seems to be caused by delayed writeback of dirty metadata.

Actions #4

Updated by Zheng Yan about 6 years ago

This patch https://github.com/ceph/ceph-ci/commit/2fff0eb4c491f04803debec7c0f5de66e3825ee7 seems to make the full tests pass on the kclient.

http://pulpito.ceph.com/zyan-2018-02-02_13:57:27-kcephfs-wip-pdonnell-i22627-testing-basic-mira/2141813/

However, the run still failed with this error:

{client.0-kernel-sha1: b9e5d03b6e64972164bff45ae3adb64a23e7568a, client.1-kernel-sha1: b9e5d03b6e64972164bff45ae3adb64a23e7568a,
  client.2-kernel-sha1: b9e5d03b6e64972164bff45ae3adb64a23e7568a, client.3-kernel-sha1: b9e5d03b6e64972164bff45ae3adb64a23e7568a,
  description: 'kcephfs/recovery/{clusters/1-mds-4-client.yaml debug/mds_client.yaml
    dirfrag/frag_enable.yaml mounts/kmounts.yaml objectstore-ec/bluestore-comp.yaml
    overrides/{debug.yaml frag_enable.yaml log-config.yaml osd-asserts.yaml whitelist_health.yaml
    whitelist_wrongly_marked_down.yaml} tasks/mds-full.yaml whitelist_health.yaml}',
  duration: 1410.9979951381683, failure_reason: '"2018-02-02 14:19:25.023368 mon.a
    mon.0 172.21.4.108:6789/0 230 : cluster [WRN] Health check failed: pauserd,pausewr
    flag(s) set (OSDMAP_FLAGS)" in cluster log', flavor: basic, mon.a-kernel-sha1: b9e5d03b6e64972164bff45ae3adb64a23e7568a,
  mon.b-kernel-sha1: b9e5d03b6e64972164bff45ae3adb64a23e7568a, owner: scheduled_zyan@teuthology,
  success: false}
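
The failure_reason above has the form '"<log line>" in cluster log', i.e. a warning in the cluster log that no whitelist entry covered. As a rough sketch of that kind of check (not teuthology's actual implementation), the following scans log lines for WRN/ERR/SEC entries and reports the first one that matches none of the whitelist regexes. The function name and the whitelist entries are hypothetical.

```python
import re

# Severity tags treated as failures in this sketch (an assumption).
SEVERITY = re.compile(r"\[(WRN|ERR|SEC)\]")


def first_unexpected_line(log_lines, whitelist):
    """Return the first WRN/ERR/SEC line not matched by any whitelist regex, else None."""
    patterns = [re.compile(p) for p in whitelist]
    for line in log_lines:
        if not SEVERITY.search(line):
            continue  # informational line, ignore
        if any(p.search(line) for p in patterns):
            continue  # explicitly tolerated
        return line
    return None


if __name__ == "__main__":
    log = [
        "2018-02-02 14:19:25.023368 mon.a mon.0 172.21.4.108:6789/0 230 : "
        "cluster [WRN] Health check failed: pauserd,pausewr flag(s) set (OSDMAP_FLAGS)",
    ]
    # Hypothetical whitelist that does not cover OSDMAP_FLAGS: the line is reported.
    print(first_unexpected_line(log, [r"\(MDS_TRIM\)"]))
    # With a matching (still hypothetical) entry the log is considered clean.
    print(first_unexpected_line(log, [r"\(MDS_TRIM\)", r"\(OSDMAP_FLAGS\)"]))
```

Whether tolerating this warning is the right remedy for the mds-full tests is not something this thread establishes; the second call only shows how a matching entry changes the outcome of the check.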

Actions #5

Updated by Patrick Donnelly about 6 years ago

  • Status changed from New to In Progress

Yes, that error has been happening for the mds-full tests now with and without kclient. I'll look into that today. Thanks Zheng!

Actions #6

Updated by Patrick Donnelly about 6 years ago

  • Status changed from In Progress to Pending Backport
  • Backport set to luminous
Actions #8

Updated by Nathan Cutler about 6 years ago

  • Copied to Backport #22966: luminous: kclient: Test failure: test_full_same_file (tasks.cephfs.test_full.TestClusterFull) added
Actions #9

Updated by Nathan Cutler about 6 years ago

  • Status changed from Pending Backport to Resolved