Fix #51177

closed

pybind/mgr/volumes: investigate moving calls which may block on libcephfs into another thread

Added by Patrick Donnelly almost 3 years ago. Updated 8 months ago.

Status: Resolved
Priority: Urgent
Category: -
Target version:
% Done: 0%
Source: Development
Tags: backport_processed
Backport: reef,quincy,pacific
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): mgr/volumes
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The goal is to avoid blocking the ceph-mgr finisher thread on any call out to CephFS. A blocked finisher thread can have disastrous consequences, because the mgr then stops functioning.

(I think it may need changes to ceph-mgr to support this. Namely, the core ceph-mgr code handles the return from the command and creates a reply message. We need that reply message to be constructed within the module and sent out.)


Related issues 3 (0 open, 3 closed)

Copied to CephFS - Backport #59415: reef: pybind/mgr/volumes: investigate moving calls which may block on libcephfs into another thread (Resolved, Kotresh Hiremath Ravishankar)
Copied to CephFS - Backport #59416: quincy: pybind/mgr/volumes: investigate moving calls which may block on libcephfs into another thread (Resolved, Kotresh Hiremath Ravishankar)
Copied to CephFS - Backport #59417: pacific: pybind/mgr/volumes: investigate moving calls which may block on libcephfs into another thread (Resolved, Kotresh Hiremath Ravishankar)
Actions #1

Updated by Patrick Donnelly almost 2 years ago

  • Target version deleted (v17.0.0)
Actions #3

Updated by Venky Shankar over 1 year ago

  • Assignee set to Kotresh Hiremath Ravishankar
  • Target version set to v18.0.0
  • Backport changed from pacific to pacific,quincy

Kotresh, please take a look at this.

Actions #4

Updated by Venky Shankar over 1 year ago

Spoke to Kotresh today. We may want to introduce an async command execution interface in plugins: the finisher thread would call it and hand over the request, and the plugin itself would send the reply. Plugins can choose to implement this execution "mode" or keep the default blocking call made by the finisher thread.
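
For illustration only, here is a minimal Python sketch of the opt-in handover idea described above. It is not the ceph-mgr API: `supports_async`, `handle_command_async`, `dispatch`, and the `reply` callback are hypothetical names, and the `(retval, stdout, stderr)` tuple is assumed as the reply shape. The point is that the dispatching thread returns immediately for modules that opt in, while other modules keep the blocking call.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple

# Assumed reply shape: (retval, stdout, stderr).
Reply = Tuple[int, str, str]


class BlockingModule:
    """Default mode: the dispatching thread runs the command and sends the reply."""
    supports_async = False  # hypothetical opt-in flag

    def handle_command(self, cmd: dict) -> Reply:
        return 0, f"done {cmd['prefix']}", ""


class AsyncModule:
    """Opt-in mode: the module takes over the request and replies by itself."""
    supports_async = True  # hypothetical opt-in flag

    def __init__(self) -> None:
        self._pool = ThreadPoolExecutor(max_workers=2)

    def handle_command_async(self, cmd: dict,
                             reply: Callable[[Reply], None]) -> None:
        # Return right away; work that may block runs on the module's own thread,
        # which calls reply() when it finishes.
        self._pool.submit(lambda: reply((0, f"done {cmd['prefix']}", "")))


def dispatch(module, cmd: dict, reply: Callable[[Reply], None]) -> None:
    """Stand-in for the finisher thread's dispatch step (hypothetical)."""
    if getattr(module, "supports_async", False):
        module.handle_command_async(cmd, reply)  # hand over, do not wait
    else:
        reply(module.handle_command(cmd))        # blocks the calling thread
```

In this sketch a module that never calls `reply` leaves the client without an answer, so a real design would still need a timeout or error path on the caller's side.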

Actions #5

Updated by Kotresh Hiremath Ravishankar over 1 year ago

  • Pull request ID set to 47893

Discussion Summary with Patrick

1. Have a thread for each module to execute module commands. Since the finisher thread infrastructure is already in place, it's better to use one finisher thread per module. Currently, with the draft PR, there is one finisher thread per module and one generic finisher thread through which everything else, such as config and notify, is done. This differs from the approach in comment 4. Both have their pros and cons. With this approach, if a command is stuck in a Python module, only that module is affected (its subsequent commands wait) and other modules' commands go through. This is comparatively easy to implement. The approach in comment 4 needs changes in every Python module, and the effect of the asynchronous behavior would have to be tested, since all module commands would be asynchronous.

2. Add a warning if the finisher thread's queue is growing or if a single command takes more than 15 seconds.

3. Add extensive performance counters for osdmap and mdsmap changes and for commands processed (op time, command count).
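
As a rough illustration of points 1 to 3, here is a self-contained Python sketch of one worker thread and queue per module, with a queue-depth warning, a slow-command warning, and simple counters. It is not the ceph-mgr Finisher (which is C++). `ModuleFinisher`, `queue_command`, `wait_idle`, `max_depth`, and the counter names are hypothetical. Only the 15-second threshold comes from the discussion.

```python
import logging
import queue
import threading
import time
from typing import Callable, Dict

log = logging.getLogger("finisher-sketch")


class ModuleFinisher:
    """One worker thread and FIFO queue per module (hypothetical stand-in)."""

    def __init__(self, name: str, slow_secs: float = 15.0,
                 max_depth: int = 100) -> None:
        self.name = name
        self.slow_secs = slow_secs  # warn when one command takes longer than this
        self.max_depth = max_depth  # assumed threshold for "queue is growing"
        self.counters: Dict[str, float] = {
            "commands": 0, "total_secs": 0.0, "slow": 0}
        self._q: "queue.Queue[Callable[[], object]]" = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def queue_command(self, fn: Callable[[], object]) -> None:
        depth = self._q.qsize()
        if depth >= self.max_depth:
            log.warning("%s: finisher queue depth is %d", self.name, depth)
        self._q.put(fn)

    def wait_idle(self) -> None:
        """Block until every queued command has finished."""
        self._q.join()

    def _run(self) -> None:
        while True:
            fn = self._q.get()
            start = time.monotonic()
            try:
                fn()
            except Exception:
                log.exception("%s: command failed", self.name)
            finally:
                elapsed = time.monotonic() - start
                self.counters["commands"] += 1
                self.counters["total_secs"] += elapsed
                if elapsed > self.slow_secs:
                    self.counters["slow"] += 1
                    log.warning("%s: command took %.1f secs",
                                self.name, elapsed)
                self._q.task_done()


# Usage: a stuck command in one module does not hold up another module.
if __name__ == "__main__":
    stuck = threading.Event()
    volumes, other = ModuleFinisher("volumes"), ModuleFinisher("other")
    volumes.queue_command(stuck.wait)              # blocks the volumes worker only
    other.queue_command(lambda: print("other module still responds"))
    other.wait_idle()
    stuck.set()
    volumes.wait_idle()
```

Note that this sketch only detects a slow command after it returns. A command that never returns would need a separate watchdog that checks how long the current command has been running.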

Actions #6

Updated by Kotresh Hiremath Ravishankar over 1 year ago

  • Status changed from New to In Progress
Actions #7

Updated by Venky Shankar over 1 year ago

  • Status changed from In Progress to Fix Under Review
Actions #8

Updated by Venky Shankar about 1 year ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport changed from pacific,quincy to reef,quincy,pacific
Actions #9

Updated by Backport Bot about 1 year ago

  • Copied to Backport #59415: reef: pybind/mgr/volumes: investigate moving calls which may block on libcephfs into another thread added
Actions #10

Updated by Backport Bot about 1 year ago

  • Copied to Backport #59416: quincy: pybind/mgr/volumes: investigate moving calls which may block on libcephfs into another thread added
Actions #11

Updated by Backport Bot about 1 year ago

  • Copied to Backport #59417: pacific: pybind/mgr/volumes: investigate moving calls which may block on libcephfs into another thread added
Actions #12

Updated by Backport Bot about 1 year ago

  • Tags set to backport_processed
Actions #13

Updated by Patrick Donnelly 11 months ago

  • Target version changed from v18.0.0 to v19.0.0
Actions #14

Updated by Patrick Donnelly 8 months ago

  • Status changed from Pending Backport to Resolved