Bug #47140

closed

mgr/volumes: unresponsive Client::abort_conn() when cleaning stale libcephfs handle

Added by Venky Shankar over 3 years ago. Updated over 3 years ago.

Status: Duplicate
Priority: High
Assignee: -
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): mgr/volumes
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The libcephfs connection pool in mgr (mgr_util) identifies stale filesystem handles and cleans them up by calling abort_conn(). In certain teuthology runs, abort_conn() never returns, causing the test to time out and fail.

Sample run: https://pulpito.ceph.com/vshankar-2020-08-26_05:34:12-fs-wip-pdonnell-testing-20200826.032941-distro-basic-smithi/5376678/

From manager logs:

2020-08-26T05:56:16.163+0000 7fab865a3700  1 -- [v2:172.21.15.142:6800/15025,v1:172.21.15.142:6801/15025] <== client.7542 172.21.15.142:0/1077893343 1 ==== mgr_command(tid 0: {"prefix": "fs subvolume create", "vol_name": "cephfs", "sub_name": "subvolume_0000000000761393", "target": ["mon-mgr", ""]}) v1 ==== 148+0+0 (secure 0 0 0) 0x55f674b95e40 con 0x55f674be9400
2020-08-26T05:56:16.167+0000 7fab865a3700  0 log_channel(audit) log [DBG] : from='client.7542 -' entity='client.admin' cmd=[{"prefix": "fs subvolume create", "vol_name": "cephfs", "sub_name": "subvolume_0000000000761393", "target": ["mon-mgr", ""]}]: dispatch
2020-08-26T05:56:16.167+0000 7fab86da4700  0 [volumes INFO volumes.module] Starting _cmd_fs_subvolume_create(prefix:fs subvolume create, sub_name:subvolume_0000000000761393, target:['mon-mgr', ''], vol_name:cephfs) < "" 
2020-08-26T05:56:16.167+0000 7fab86da4700  0 [volumes DEBUG mgr_util] self.fs_id=2, fs_id=3
2020-08-26T05:56:16.167+0000 7fab86da4700  0 [volumes DEBUG mgr_util] self.fs_id=2, fs_id=3
2020-08-26T05:56:16.167+0000 7fab86da4700  0 [volumes INFO mgr_util] aborting connection from cephfs 'cephfs'
2020-08-26T05:56:16.167+0000 7fab86da4700  2 client.5220 unmounting (abort)

No subsequent log messages are seen indicating that the abort returned, in particular this log message: https://github.com/ceph/ceph/blob/master/src/pybind/mgr/mgr_util.py#L138
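
For illustration only, here is a minimal Python sketch of how a hang like this could be surfaced instead of blocking the caller indefinitely. It is not the mgr_util code and not the fix tracked in the duplicate bug. FakeHandle, is_stale and abort_with_watchdog are hypothetical names, and the staleness check is an assumption inferred from the "self.fs_id=2, fs_id=3" lines in the log above. The sketch runs abort_conn() in a daemon thread and reports whether it finished within a timeout.

```python
import threading


class FakeHandle:
    """Hypothetical stand-in for a libcephfs handle; not the real binding."""

    def __init__(self, fs_id, hang=False):
        self.fs_id = fs_id
        self._hang = hang
        self._never = threading.Event()  # never set; simulates a stuck abort
        self.aborted = False

    def abort_conn(self):
        if self._hang:
            self._never.wait()  # blocks forever, like the reported hang
        self.aborted = True


def is_stale(handle, current_fs_id):
    # Assumption: a handle is stale when its fs_id differs from the current one.
    return handle.fs_id != current_fs_id


def abort_with_watchdog(handle, timeout):
    """Run abort_conn() in a daemon thread; return True if it finished in time."""
    worker = threading.Thread(target=handle.abort_conn, daemon=True)
    worker.start()
    worker.join(timeout)
    return not worker.is_alive()
```

A watchdog like this only detects the hang and lets the caller log it or fail fast. The stuck thread keeps running, so the underlying problem in the abort path still has to be fixed.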


Related issues 1 (0 open, 1 closed)

Is duplicate of CephFS - Bug #46882: client: mount abort hangs: [volumes INFO mgr_util] aborting connection from cephfs 'cephfs' - Resolved - Xiubo Li
