Bug #65669

open

QuiesceDB responds with a misleading error to a quiesce-await of a terminated set.

Added by Leonid Usov 9 days ago. Updated 4 days ago.

Status: Fix Under Review
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Development
Tags:
Backport: squid
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS, quiesce
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This design decision turns out to be counterintuitive now that it has been seen in the wild.

Here the await (`--await-for 42.0`) was sent with a delay and must have found the set already timed out. However, the return code suggested that the set was still active when the await command was received and only timed out afterwards:

2024-04-25T20:36:15.770 DEBUG:tasks.quiescer.fs.[cephfs]:Running ceph command: 'tell mds.24479 quiesce db --set-id d960ac51 --await-for 42.0'
...
2024-04-25T20:36:46.709 ERROR:tasks.quiescer.fs.[cephfs]:Couldn't quiesce root with rc: 110 (ETIMEDOUT), stdout:
{
    "epoch": 33,
    "leader": 24479,
    "set_version": 6,
    "sets": {
        "d960ac51": {
            "version": 6,
            "age_ref": 86.5,
            "state": {
                "name": "TIMEDOUT",
                "age": 0.0
            },
            "timeout": 60.0,
            "expiration": 86.2,
            "members": {
                "file:/": {
                    "excluded": false,
                    "state": {
                        "name": "QUIESCING",
                        "age": 60.0
                    }
                }
            }
        }
    }
}

It would be less surprising to receive EPERM in this case, which would indicate that the caller has a misconception about the current state of the set.
It would also be consistent with how `--release --await` behaves: it returns EPERM for a `QS_EXPIRED` set, while a pending release-await that began in `QS_RELEASING` reports ETIMEDOUT if the set fails to release before it expires.
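
The proposed return-code rule can be sketched as a small Python model. This is only an illustration of the behavior described above, not the QuiesceDB implementation: the `SetState` enum is a simplified subset of states, and `TERMINAL_STATES` and `await_result` are hypothetical names introduced for this example.

```python
import errno
from enum import Enum, auto


class SetState(Enum):
    # Simplified, illustrative subset of quiesce set states (assumption).
    QUIESCING = auto()
    QUIESCED = auto()
    RELEASING = auto()
    TIMEDOUT = auto()
    EXPIRED = auto()


# States from which a set can no longer make progress (assumption for this sketch).
TERMINAL_STATES = {SetState.TIMEDOUT, SetState.EXPIRED}


def await_result(state_on_arrival, state_on_completion):
    """Return the errno a hypothetical await would report.

    state_on_arrival: set state when the await request was received.
    state_on_completion: set state when the await stopped waiting.
    """
    if state_on_arrival in TERMINAL_STATES:
        # The set had already terminated before the await arrived:
        # report a caller-side misconception rather than a timeout.
        return errno.EPERM
    if state_on_completion in TERMINAL_STATES:
        # The set was live when the await began and terminated while
        # the await was pending.
        return errno.ETIMEDOUT
    return 0
```

Under this model, the scenario in the log above (an await arriving for a set that is already `TIMEDOUT`) would yield EPERM, while an await that was pending when the set timed out would still yield ETIMEDOUT.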

Actions #1

Updated by Leonid Usov 9 days ago

  • Status changed from New to In Progress
  • Pull request ID set to 57099
Actions #2

Updated by Leonid Usov 9 days ago

  • Status changed from In Progress to Fix Under Review
Actions #3

Updated by Patrick Donnelly 4 days ago

  • Category set to Correctness/Safety
  • Target version set to v20.0.0
  • Source set to Development
Actions #4

Updated by Leonid Usov 4 days ago

  • Component(FS) MDS added