Bug #44276

pybind/mgr/volumes: cleanup stale connection hang

Added by Patrick Donnelly about 4 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
% Done: 0%
Source: Development
Tags:
Backport: octopus, nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): mgr/volumes
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2020-02-24 13:59:12.933 7fa5f7aa5700  4 mgr[volumes] scanning for idle connections..
2020-02-24 13:59:12.933 7fa5f7aa5700  4 mgr[volumes] cleaning up connection for 'cephfs'
2020-02-24 13:59:12.933 7fa5f7aa5700 20 mgr[volumes] self.fs_id=43, fs_id=43
2020-02-24 13:59:12.933 7fa5f7aa5700  4 mgr[volumes] disconnecting from cephfs 'cephfs'
2020-02-24 13:59:12.933 7fa5f7aa5700  1 -- 172.21.15.16:0/1494282980 --> [v2:172.21.15.61:6832/333244281,v1:172.21.15.61:6833/333244281] -- client_session(request_close seq 11) v3 -- 0x560b4af0dd40 con 0x560b4b415400

From: /ceph/teuthology-archive/vshankar-2020-02-24_12:33:54-fs-wip-vshankar-testing-testing-basic-smithi/4798102/remote/smithi016/log/ceph-mgr.x.log.gz

We don't see thread 7fa5f7aa5700 again for the rest of the log. It is either stuck disconnecting the handle or hit an exception (as Venky mentioned in comment 2).
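
To make the suspected failure mode concrete, here is a minimal sketch (a simplified model, not the actual mgr/volumes code; FSHandle and the method names are hypothetical) of how a disconnect that blocks inside the idle-connection scan can wedge every later operation on the pool lock:

import threading
import time


class FSHandle(object):
    """Stand-in for a libcephfs mount handle (hypothetical)."""

    def is_idle(self):
        return True

    def disconnect(self):
        # In the log above, the disconnect presumably never returned
        # after client_session(request_close seq 11) was sent.
        time.sleep(3600)  # simulate a shutdown that never completes


class ConnectionPool(object):
    """Simplified model of the mgr/volumes connection pool."""

    def __init__(self):
        self.lock = threading.Lock()
        self.connections = {'cephfs': FSHandle()}

    def cleanup_connections(self):
        # Runs on the timer thread (7fa5f7aa5700 in the log above).
        with self.lock:
            for fs_name, handle in list(self.connections.items()):
                if handle.is_idle():
                    handle.disconnect()  # blocks while holding the lock
                    del self.connections[fs_name]

    def get_fs_handle(self, fs_name):
        # Any later command that needs a handle, e.g. "fs subvolume
        # create", blocks here while cleanup_connections() holds the lock.
        with self.lock:
            return self.connections.get(fs_name)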

Later on, an "fs subvolume create" command hangs:

2020-02-24 13:59:31.481 7fa6032fc700  1 -- [v2:172.21.15.16:6800/25634,v1:172.21.15.16:6801/25634] <== client.31354 172.21.15.61:0/1231806590 1 ==== command(tid 0: {"prefix": "fs subvolume create", "vol_name": "cephfs", "sub_name": "subvolume_6955", "target": ["mgr", ""], "size": 20971520}) v1 ==== 150+0+0 (crc 0 0 0) 0x560b4ae290e0 con 0x560b4cd1d800
2020-02-24 13:59:31.481 7fa6032fc700  4 mgr.server _handle_command decoded 5
2020-02-24 13:59:31.481 7fa6032fc700  4 mgr.server _handle_command prefix=fs subvolume create
2020-02-24 13:59:31.482 7fa6032fc700 20 is_capable service=py module=volumes command=fs subvolume create read write addr - on cap allow *
2020-02-24 13:59:31.482 7fa6032fc700 20  allow so far , doing grant allow *
2020-02-24 13:59:31.482 7fa6032fc700 20  allow all
2020-02-24 13:59:31.482 7fa6032fc700 10 mgr.server _allowed_command  client.admin capable
2020-02-24 13:59:31.482 7fa6032fc700  0 log_channel(audit) log [DBG] : from='client.31354 -' entity='client.admin' cmd=[{"prefix": "fs subvolume create", "vol_name": "cephfs", "sub_name": "subvolume_6955", "target": ["mgr", ""], "size": 20971520}]: dispatch
2020-02-24 13:59:31.482 7fa6032fc700 10 mgr.server _handle_command passing through 5
2020-02-24 13:59:31.482 7fa603afd700 20 mgr Gil Switched to new thread state 0x560b4b16a2c0

This was observed while testing with the backport of https://github.com/ceph/ceph/pull/33413.
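
One common way to avoid this class of hang (sketched below as an assumption about the general technique, not necessarily the fix that eventually landed) is to detach stale handles while holding the lock but perform the potentially blocking disconnect only after releasing it. Revising cleanup_connections() from the sketch above:

    def cleanup_connections(self):
        stale = []
        with self.lock:
            # Only the bookkeeping happens under the lock.
            for fs_name, handle in list(self.connections.items()):
                if handle.is_idle():
                    stale.append(handle)
                    del self.connections[fs_name]
        # The blocking shutdown now runs outside the pool lock, so a
        # concurrent "fs subvolume create" can still obtain a handle
        # even if disconnect() never returns.
        for handle in stale:
            handle.disconnect()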


Related issues (4): 0 open, 4 closed

Related to CephFS - Bug #44207: mgr/volumes: deadlock when trying to purge large number of trash entries (Resolved; Venky Shankar)

Has duplicate CephFS - Bug #44281: pybind/mgr/volumes: cleanup stale connection hang (Duplicate; Venky Shankar)

Copied to CephFS - Backport #46388: nautilus: pybind/mgr/volumes: cleanup stale connection hang (Resolved; Venky Shankar)

Copied to CephFS - Backport #46389: octopus: pybind/mgr/volumes: cleanup stale connection hang (Resolved; Nathan Cutler)
