Bug #40867

mgr: failover during in qa testing causes unresponsive client warnings

Added by Patrick Donnelly about 1 year ago. Updated 8 months ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport: nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): mgr/volumes, qa-suite
Labels (FS):
Pull request ID:
Crash signature:

Description

"2019-07-19T23:47:47.246664+0000 mds.c (mds.0) 1 : cluster [WRN] evicting unresponsive client smithi114:x (4576), after 300.508 seconds" in cluster log

From: http://pulpito.ceph.com/sage-2019-07-19_21:25:20-rados-master-distro-basic-smithi/4130627/

This warning either needs to be whitelisted or, preferably, the volumes plugin should clean up its libcephfs connections when it receives a fatal signal.
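A minimal sketch of the preferred cleanup, in Python (the language ceph-mgr modules are written in). It assumes the process is stopped with SIGTERM and that each open connection exposes a `shutdown()` method, as the `cephfs.LibCephFS` binding does. `HandleRegistry` and `install_cleanup_handler` are hypothetical names, not part of the volumes plugin. ceph-mgr is a C++ daemon that embeds Python, so signal dispatch there may not reach a module-level handler like this; calling `shutdown_all()` from the module's own shutdown path may be the more realistic wiring, which is not shown here.

```python
import os
import signal
import threading


class HandleRegistry:
    """Tracks open client handles so they can be shut down before exit.

    Hypothetical helper: any object with a shutdown() method can be
    registered, which is the shape of cephfs.LibCephFS.
    """

    def __init__(self):
        # RLock rather than Lock: a Python signal handler runs on the main
        # thread and may interrupt register() while the lock is held.
        self._lock = threading.RLock()
        self._handles = []

    def register(self, handle):
        with self._lock:
            self._handles.append(handle)

    def shutdown_all(self):
        """Shut down each registered handle once; return how many there were."""
        with self._lock:
            handles, self._handles = self._handles, []
        for handle in handles:
            try:
                handle.shutdown()
            except Exception:
                # Best effort: one failing handle must not block the others.
                pass
        return len(handles)


def install_cleanup_handler(registry, signum=signal.SIGTERM):
    """On `signum`, close every handle, then let the signal act as before."""
    previous = signal.getsignal(signum)

    def handler(sig, frame):
        registry.shutdown_all()
        # Restore the old disposition (SIG_DFL if it was set outside Python)
        # and re-deliver the signal, so the process still terminates the
        # way it would have without this handler.
        signal.signal(sig, previous if previous is not None else signal.SIG_DFL)
        os.kill(os.getpid(), sig)

    signal.signal(signum, handler)
```

Re-sending the signal after restoring the old disposition keeps the exit status the same as an unhandled SIGTERM, so whatever supervises the daemon still sees it die from that signal, while the MDS sees the client sessions closed instead of waiting to evict them.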


Related issues

Related to fs - Bug #43943: qa: "[WRN] evicting unresponsive client smithi131:z (6314), after 304.461 seconds" Resolved
Copied to fs - Backport #40944: nautilus: mgr: failover during in qa testing causes unresponsive client warnings Resolved

History

#1 Updated by Patrick Donnelly about 1 year ago

Sage's whitelist PR: https://github.com/ceph/ceph/pull/29169
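Roughly how the whitelist alternative works, assuming the QA harness scans the cluster log for `[ERR]`, `[WRN]` and `[SEC]` lines and fails the run on any line that matches none of the whitelisted regexes. The function name and the pattern used below are illustrative, not taken from PR 29169.

```python
import re

# Severity tags the harness is assumed to treat as failures.
FLAGGED = re.compile(r"\[(ERR|WRN|SEC)\]")


def unexpected_lines(log_lines, whitelist):
    """Return flagged cluster-log lines that match no whitelist regex."""
    patterns = [re.compile(p) for p in whitelist]
    return [
        line
        for line in log_lines
        if FLAGGED.search(line) and not any(p.search(line) for p in patterns)
    ]
```

With a pattern such as `evicting unresponsive client` whitelisted, the warning from the description no longer fails the run, but the stale sessions are still left for the MDS to evict.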

As I said in the issue description, I'd prefer that we clean up the libcephfs handles after a fatal signal, if possible.

#2 Updated by Venky Shankar about 1 year ago

Patrick Donnelly wrote:

Sage's whitelist PR: https://github.com/ceph/ceph/pull/29169

As I said in the issue description, I'd prefer that we clean up the libcephfs handles after a fatal signal, if possible.

ACK.

#3 Updated by Sage Weil about 1 year ago

  • Status changed from New to Pending Backport

#4 Updated by Nathan Cutler about 1 year ago

  • Copied to Backport #40944: nautilus: mgr: failover during in qa testing causes unresponsive client warnings added

#5 Updated by Nathan Cutler 10 months ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

#6 Updated by Sage Weil 8 months ago

another instance of this on master,

2020-01-28T18:16:42.687 INFO:teuthology.orchestra.run.smithi118.stdout:2020-01-28T18:15:31.406372+0000 mds.a (mds.0) 1 : cluster [WRN] evicting unresponsive client smithi131:z (6314), after 304.461 seconds

/a/sage-2020-01-28_03:52:05-rados-wip-sage2-testing-2020-01-27-1839-distro-basic-smithi/4713589
description: rados/mgr/{clusters/{2-node-mgr.yaml} debug/mgr.yaml objectstore/bluestore-bitmap.yaml supported-random-distro$/{ubuntu_latest.yaml} tasks/module_selftest.yaml}

#7 Updated by Venky Shankar 8 months ago

Sage Weil wrote:

another instance of this on master,
[...]
/a/sage-2020-01-28_03:52:05-rados-wip-sage2-testing-2020-01-27-1839-distro-basic-smithi/4713589
description: rados/mgr/{clusters/{2-node-mgr.yaml} debug/mgr.yaml objectstore/bluestore-bitmap.yaml supported-random-distro$/{ubuntu_latest.yaml} tasks/module_selftest.yaml}

I'll take a look.

#8 Updated by Sage Weil 8 months ago

  • Status changed from Resolved to In Progress

Another one:
/a/sage-2020-01-30_22:27:29-rados-wip-sage-testing-2020-01-30-1230-distro-basic-smithi/4719492

#9 Updated by Sage Weil 8 months ago

  • Priority changed from Immediate to Urgent

#10 Updated by Patrick Donnelly 8 months ago

  • Status changed from In Progress to Resolved

Moving this back to Resolved. Opened #43943 for the new instance.

#11 Updated by Patrick Donnelly 8 months ago

  • Related to Bug #43943: qa: "[WRN] evicting unresponsive client smithi131:z (6314), after 304.461 seconds" added
