Bug #44276

pybind/mgr/volumes: cleanup stale connection hang

Added by Patrick Donnelly about 4 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
% Done: 0%
Source: Development
Tags:
Backport: octopus, nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): mgr/volumes
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2020-02-24 13:59:12.933 7fa5f7aa5700  4 mgr[volumes] scanning for idle connections..
2020-02-24 13:59:12.933 7fa5f7aa5700  4 mgr[volumes] cleaning up connection for 'cephfs'
2020-02-24 13:59:12.933 7fa5f7aa5700 20 mgr[volumes] self.fs_id=43, fs_id=43
2020-02-24 13:59:12.933 7fa5f7aa5700  4 mgr[volumes] disconnecting from cephfs 'cephfs'
2020-02-24 13:59:12.933 7fa5f7aa5700  1 -- 172.21.15.16:0/1494282980 --> [v2:172.21.15.61:6832/333244281,v1:172.21.15.61:6833/333244281] -- client_session(request_close seq 11) v3 -- 0x560b4af0dd40 con 0x560b4b415400

From: /ceph/teuthology-archive/vshankar-2020-02-24_12:33:54-fs-wip-vshankar-testing-testing-basic-smithi/4798102/remote/smithi016/log/ceph-mgr.x.log.gz

We don't see thread 7fa5f7aa5700 again for the rest of the log. It is either stuck disconnecting the handle or hit an exception (as Venky mentioned in comment 2).
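
To make the suspected failure mode concrete, here is a minimal sketch (a simplified model, not the actual mgr/volumes code; FSHandle and the method names are hypothetical) of how a disconnect that blocks inside the idle-connection scan can wedge every later operation on the pool lock:

import threading
import time


class FSHandle(object):
    """Stand-in for a libcephfs mount handle (hypothetical)."""

    def is_idle(self):
        return True

    def disconnect(self):
        # In the log above, the disconnect presumably never returned
        # after client_session(request_close seq 11) was sent.
        time.sleep(3600)  # simulate a shutdown that never completes


class ConnectionPool(object):
    """Simplified model of the mgr/volumes connection pool."""

    def __init__(self):
        self.lock = threading.Lock()
        self.connections = {'cephfs': FSHandle()}

    def cleanup_connections(self):
        # Runs on the timer thread (7fa5f7aa5700 in the log above).
        with self.lock:
            for fs_name, handle in list(self.connections.items()):
                if handle.is_idle():
                    handle.disconnect()  # blocks while holding the lock
                    del self.connections[fs_name]

    def get_fs_handle(self, fs_name):
        # Any later command that needs a handle, e.g. "fs subvolume
        # create", blocks here while cleanup_connections() holds the lock.
        with self.lock:
            return self.connections.get(fs_name)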

Later on, an "fs subvolume create" command hangs:

2020-02-24 13:59:31.481 7fa6032fc700  1 -- [v2:172.21.15.16:6800/25634,v1:172.21.15.16:6801/25634] <== client.31354 172.21.15.61:0/1231806590 1 ==== command(tid 0: {"prefix": "fs subvolume create", "vol_name": "cephfs", "sub_name": "subvolume_6955", "target": ["mgr", ""], "size": 20971520}) v1 ==== 150+0+0 (crc 0 0 0) 0x560b4ae290e0 con 0x560b4cd1d800
2020-02-24 13:59:31.481 7fa6032fc700  4 mgr.server _handle_command decoded 5
2020-02-24 13:59:31.481 7fa6032fc700  4 mgr.server _handle_command prefix=fs subvolume create
2020-02-24 13:59:31.482 7fa6032fc700 20 is_capable service=py module=volumes command=fs subvolume create read write addr - on cap allow *
2020-02-24 13:59:31.482 7fa6032fc700 20  allow so far , doing grant allow *
2020-02-24 13:59:31.482 7fa6032fc700 20  allow all
2020-02-24 13:59:31.482 7fa6032fc700 10 mgr.server _allowed_command  client.admin capable
2020-02-24 13:59:31.482 7fa6032fc700  0 log_channel(audit) log [DBG] : from='client.31354 -' entity='client.admin' cmd=[{"prefix": "fs subvolume create", "vol_name": "cephfs", "sub_name": "subvolume_6955", "target": ["mgr", ""], "size": 20971520}]: dispatch
2020-02-24 13:59:31.482 7fa6032fc700 10 mgr.server _handle_command passing through 5
2020-02-24 13:59:31.482 7fa603afd700 20 mgr Gil Switched to new thread state 0x560b4b16a2c0

This was observed while testing with the backport of https://github.com/ceph/ceph/pull/33413.
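
One common way to avoid this class of hang (sketched below as an assumption about the general technique, not necessarily the fix that eventually landed) is to detach stale handles while holding the lock but perform the potentially blocking disconnect only after releasing it. Revising cleanup_connections() from the sketch above:

    def cleanup_connections(self):
        stale = []
        with self.lock:
            # Only the bookkeeping happens under the lock.
            for fs_name, handle in list(self.connections.items()):
                if handle.is_idle():
                    stale.append(handle)
                    del self.connections[fs_name]
        # The blocking shutdown now runs outside the pool lock, so a
        # concurrent "fs subvolume create" can still obtain a handle
        # even if disconnect() never returns.
        for handle in stale:
            handle.disconnect()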


Related issues (4): 0 open, 4 closed

Related to CephFS - Bug #44207: mgr/volumes: deadlock when trying to purge large number of trash entries (Resolved; Venky Shankar)

Has duplicate CephFS - Bug #44281: pybind/mgr/volumes: cleanup stale connection hang (Duplicate; Venky Shankar)

Copied to CephFS - Backport #46388: nautilus: pybind/mgr/volumes: cleanup stale connection hang (Resolved; Venky Shankar)

Copied to CephFS - Backport #46389: octopus: pybind/mgr/volumes: cleanup stale connection hang (Resolved; Nathan Cutler)
