Bug #47140
mgr/volumes: unresponsive Client::abort_conn() when cleaning stale libcephfs handle
Status: closed
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): mgr/volumes
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
The libcephfs connection pool in mgr (mgr_util) identifies stale filesystem handles and cleans them up by calling abort_conn(). In certain teuthology runs, abort_conn() never returns, causing the test to fail with a timeout.
From manager logs:
2020-08-26T05:56:16.163+0000 7fab865a3700 1 -- [v2:172.21.15.142:6800/15025,v1:172.21.15.142:6801/15025] <== client.7542 172.21.15.142:0/1077893343 1 ==== mgr_command(tid 0: {"prefix": "fs subvolume create", "vol_name": "cephfs", "sub_name": "subvolume_0000000000761393", "target": ["mon-mgr", ""]}) v1 ==== 148+0+0 (secure 0 0 0) 0x55f674b95e40 con 0x55f674be9400
2020-08-26T05:56:16.167+0000 7fab865a3700 0 log_channel(audit) log [DBG] : from='client.7542 -' entity='client.admin' cmd=[{"prefix": "fs subvolume create", "vol_name": "cephfs", "sub_name": "subvolume_0000000000761393", "target": ["mon-mgr", ""]}]: dispatch
2020-08-26T05:56:16.167+0000 7fab86da4700 0 [volumes INFO volumes.module] Starting _cmd_fs_subvolume_create(prefix:fs subvolume create, sub_name:subvolume_0000000000761393, target:['mon-mgr', ''], vol_name:cephfs) < ""
2020-08-26T05:56:16.167+0000 7fab86da4700 0 [volumes DEBUG mgr_util] self.fs_id=2, fs_id=3
2020-08-26T05:56:16.167+0000 7fab86da4700 0 [volumes DEBUG mgr_util] self.fs_id=2, fs_id=3
2020-08-26T05:56:16.167+0000 7fab86da4700 0 [volumes INFO mgr_util] aborting connection from cephfs 'cephfs'
2020-08-26T05:56:16.167+0000 7fab86da4700 2 client.5220 unmounting (abort)
No subsequent log messages are seen indicating that the abort returned, especially the message logged here: https://github.com/ceph/ceph/blob/master/src/pybind/mgr/mgr_util.py#L138
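For illustration only, here is a minimal Python sketch of the pattern described above. It is not the mgr_util code. It assumes, based on the `self.fs_id=2, fs_id=3` debug lines, that a handle is considered stale when its filesystem ID no longer matches the current one. `FakeHandle`, `is_stale`, `abort_with_timeout` and `clean_stale` are hypothetical names, and the timeout wrapper is only a way to observe a hung abort_conn(), not a proposed fix.

```python
import threading


class FakeHandle:
    """Hypothetical stand-in for a libcephfs handle; not the real binding."""

    def __init__(self, fs_id, hang=False):
        self.fs_id = fs_id
        self.hang = hang
        self.aborted = False
        self._never = threading.Event()  # never set; simulates a hang

    def abort_conn(self):
        if self.hang:
            self._never.wait()  # blocks forever, like the reported behaviour
        self.aborted = True


def is_stale(handle, current_fs_id):
    # Assumption: a handle is stale when the filesystem ID it was opened
    # against differs from the ID currently in use.
    return handle.fs_id != current_fs_id


def abort_with_timeout(handle, timeout):
    """Run abort_conn() in a daemon thread; True if it returned in time."""
    worker = threading.Thread(target=handle.abort_conn, daemon=True)
    worker.start()
    worker.join(timeout)
    return not worker.is_alive()


def clean_stale(handles, current_fs_id, timeout=1.0):
    """Abort every stale handle and return (cleaned, stuck) lists."""
    cleaned, stuck = [], []
    for handle in handles:
        if not is_stale(handle, current_fs_id):
            continue
        if abort_with_timeout(handle, timeout):
            cleaned.append(handle)
        else:
            stuck.append(handle)
    return cleaned, stuck
```

With a handle whose abort_conn() blocks, `clean_stale` reports it in the `stuck` list instead of hanging the caller, which mirrors the symptom in this report: the "aborting connection" message is logged and nothing follows.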