Bug #40867

closed

mgr: failover during in qa testing causes unresponsive client warnings

Added by Patrick Donnelly almost 5 years ago. Updated about 4 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport: nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): mgr/volumes, qa-suite
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

"2019-07-19T23:47:47.246664+0000 mds.c (mds.0) 1 : cluster [WRN] evicting unresponsive client smithi114:x (4576), after 300.508 seconds" in cluster log

From: http://pulpito.ceph.com/sage-2019-07-19_21:25:20-rados-master-distro-basic-smithi/4130627/

This warning either needs to be whitelisted or (preferably) the volumes plugin should clean up its libcephfs connections on a terminating signal.
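A minimal Python sketch of the preferred approach, assuming a standalone Python process: a hypothetical `HandleRegistry` tracks every open libcephfs handle, and a signal handler releases them before the process exits, so the MDS does not have to wait out the session timeout (the roughly 300-second delay in the log line above) and then evict the client. The `unmount()`/`shutdown()` methods are modeled on the `cephfs.LibCephFS` Python binding; `HandleRegistry`, `install_cleanup_on_signal` and `RecordingHandle` are illustrative names, not Ceph code. Inside ceph-mgr the C++ daemon owns signal handling, so a real fix would more likely hook the module's own shutdown path than install a Python signal handler.

```python
import signal
import threading
from typing import List, Protocol


class FsHandle(Protocol):
    """Assumed interface, modeled on cephfs.LibCephFS.unmount()/shutdown()."""

    def unmount(self) -> None: ...
    def shutdown(self) -> None: ...


class HandleRegistry:
    """Hypothetical registry of the libcephfs handles a module has opened."""

    def __init__(self) -> None:
        # Re-entrant so a handler running on the main thread cannot
        # deadlock against a register() call it interrupted.
        self._lock = threading.RLock()
        self._handles: List[FsHandle] = []

    def register(self, handle: FsHandle) -> FsHandle:
        with self._lock:
            self._handles.append(handle)
        return handle

    def cleanup_all(self) -> int:
        """Unmount and shut down every handle; return how many were shut down."""
        with self._lock:
            handles, self._handles = self._handles, []
        released = 0
        for h in handles:
            try:
                h.unmount()
            except Exception:
                pass  # one broken handle must not block cleanup of the rest
            try:
                h.shutdown()
                released += 1
            except Exception:
                pass
        return released


def install_cleanup_on_signal(registry: HandleRegistry,
                              signum: int = signal.SIGTERM) -> None:
    """One-shot handler: clean up, then re-deliver under the old disposition."""
    previous = signal.getsignal(signum)
    if previous is None:  # handler was not installed from Python
        previous = signal.SIG_DFL

    def _handler(sig, frame):
        registry.cleanup_all()
        signal.signal(sig, previous)
        signal.raise_signal(sig)  # Python 3.8+; keeps normal termination

    signal.signal(signum, _handler)


class RecordingHandle:
    """Stand-in for a real libcephfs handle, for demonstration only."""

    def __init__(self) -> None:
        self.unmounted = False
        self.shut_down = False

    def unmount(self) -> None:
        self.unmounted = True

    def shutdown(self) -> None:
        self.shut_down = True
```

Re-raising the signal under its previous disposition preserves the process's normal termination behavior; the handler only puts the cleanup in front of it.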


Related issues: 2 (0 open, 2 closed)

Related to CephFS - Bug #43943: qa: "[WRN] evicting unresponsive client smithi131:z (6314), after 304.461 seconds" (Resolved, Venky Shankar)

Copied to CephFS - Backport #40944: nautilus: mgr: failover during in qa testing causes unresponsive client warnings (Resolved, Prashant D)
#1

Updated by Patrick Donnelly almost 5 years ago

Sage's whitelist PR: https://github.com/ceph/ceph/pull/29169
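For the whitelist option, a rough Python approximation of what a cluster-log whitelist does: the QA run scans the cluster log for warning and error lines and fails on any line that no whitelist pattern matches. This is not teuthology's code; the severity regex, the function name `unexpected_warnings`, and the example pattern are assumptions, and the pattern actually added in the PR above is not reproduced here.

```python
import re
from typing import Iterable, List

# Assumed severity tags that make a cluster-log line count as a failure.
SEVERITY = re.compile(r"\[(?:WRN|ERR)\]")


def unexpected_warnings(lines: Iterable[str], whitelist: Iterable[str]) -> List[str]:
    """Return the warning/error lines that no whitelist regex matches."""
    patterns = [re.compile(p) for p in whitelist]
    flagged = []
    for line in lines:
        if not SEVERITY.search(line):
            continue  # informational lines never fail the run
        if any(p.search(line) for p in patterns):
            continue  # explicitly tolerated by the whitelist
        flagged.append(line)
    return flagged


if __name__ == "__main__":
    log = [
        "2019-07-19T23:47:47.246664+0000 mds.c (mds.0) 1 : cluster [WRN] "
        "evicting unresponsive client smithi114:x (4576), after 300.508 seconds",
    ]
    print(unexpected_warnings(log, []))                                 # the line is flagged
    print(unexpected_warnings(log, [r"evicting unresponsive client"]))  # []
```

The trade-off is that a broad whitelist pattern also hides genuine evictions in every test that shares it, whereas cleaning up the handles removes the cause.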

As I said in the issue description, I'd prefer that we clean up the libcephfs handles after a fatal signal, if possible.

#2

Updated by Venky Shankar almost 5 years ago

Patrick Donnelly wrote:

Sage's whitelist PR: https://github.com/ceph/ceph/pull/29169

As I said in the issue description, I'd prefer that we clean up the libcephfs handles after a fatal signal, if possible.

ACK.

#3

Updated by Sage Weil almost 5 years ago

  • Status changed from New to Pending Backport
#4

Updated by Nathan Cutler almost 5 years ago

  • Copied to Backport #40944: nautilus: mgr: failover during in qa testing causes unresponsive client warnings added
#5

Updated by Nathan Cutler over 4 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

#6

Updated by Sage Weil about 4 years ago

Another instance of this on master:

2020-01-28T18:16:42.687 INFO:teuthology.orchestra.run.smithi118.stdout:2020-01-28T18:15:31.406372+0000 mds.a (mds.0) 1 : cluster [WRN] evicting unresponsive client smithi131:z (6314), after 304.461 seconds

/a/sage-2020-01-28_03:52:05-rados-wip-sage2-testing-2020-01-27-1839-distro-basic-smithi/4713589
description: rados/mgr/{clusters/{2-node-mgr.yaml} debug/mgr.yaml objectstore/bluestore-bitmap.yaml supported-random-distro$/{ubuntu_latest.yaml} tasks/module_selftest.yaml}

#7

Updated by Venky Shankar about 4 years ago

Sage Weil wrote:

Another instance of this on master:
[...]
/a/sage-2020-01-28_03:52:05-rados-wip-sage2-testing-2020-01-27-1839-distro-basic-smithi/4713589
description: rados/mgr/{clusters/{2-node-mgr.yaml} debug/mgr.yaml objectstore/bluestore-bitmap.yaml supported-random-distro$/{ubuntu_latest.yaml} tasks/module_selftest.yaml}

I'll take a look.

#8

Updated by Sage Weil about 4 years ago

  • Status changed from Resolved to In Progress

Another one:
/a/sage-2020-01-30_22:27:29-rados-wip-sage-testing-2020-01-30-1230-distro-basic-smithi/4719492

#9

Updated by Sage Weil about 4 years ago

  • Priority changed from Immediate to Urgent
#10

Updated by Patrick Donnelly about 4 years ago

  • Status changed from In Progress to Resolved

Moving this back to Resolved. Opened #43943 for the new occurrences.

#11

Updated by Patrick Donnelly about 4 years ago

  • Related to Bug #43943: qa: "[WRN] evicting unresponsive client smithi131:z (6314), after 304.461 seconds" added