Bug #65545

open

Quiesce may fail randomly with EBADF due to the same root submitted to the MDCache multiple times under the same quiesce request

Added by Leonid Usov about 1 month ago. Updated 19 days ago.

Status: Pending Backport
Priority: High
Assignee:
Category: -
Target version:
% Done: 0%
Source:
Tags: backport_processed
Backport: squid
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS, quiesce
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Reported by the QE team at https://bugzilla.redhat.com/show_bug.cgi?id=2275459

2024-04-17T07:26:33.666+0000 7fa0a7c0b640 10 quiesce.mgr.44189 <sanitize_roots> Normalized root '/volumes/_nogroup/sv_def_6/c37e6c79-a83a-4b1d-96e8-16584f440626' to 'file:/volumes/_nogroup/sv_def_6/c37e6c79-a83a-4b1d-96e8-16584f440626'
...
2024-04-17T07:26:33.666+0000 7fa0a840c640 10 quiesce.mds.0 <operator()> submit_request: value:file:/volumes/_nogroup/sv_def_6/c37e6c79-a83a-4b1d-96e8-16584f440626
2024-04-17T07:26:33.667+0000 7fa0a840c640 10 quiesce.agt <agent_thread_main> got request handle < mds.0:3431> for 'file:/volumes/_nogroup/sv_def_6/c37e6c79-a83a-4b1d-96e8-16584f440626'
...
2024-04-17T07:26:33.669+0000 7fa0a840c640 10 quiesce.mds.0 <operator()> submit_request: value:file:/volumes/_nogroup/sv_def_6/c37e6c79-a83a-4b1d-96e8-16584f440626
...
2024-04-17T07:26:33.670+0000 7fa0a840c640 10 quiesce.agt <agent_thread_main> got request handle < mds.0:3437> for 'file:/volumes/_nogroup/sv_def_6/c37e6c79-a83a-4b1d-96e8-16584f440626'
...
2024-04-17T07:26:33.674+0000 7fa0a7c0b640  5 quiesce.mgr.44189 <leader_upkeep_set> [cg_test1_p00@106,file:/volumes/_nogroup/sv_def_6/c37e6c79-a83a-4b1d-96e8-16584f440626] reported by at least one peer as: QS_FAILED (6)

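The log shows the same root submitted to the MDCache twice under one quiesce request, 3 ms apart (request handles mds.0:3431 and mds.0:3437), after which at least one peer reports the root as QS_FAILED.
This is caused by a race condition that appears when multiple DB updates are posted to the agent in rapid succession.
Once new roots begin processing but before they are added to the currently tracked set, there is a window in which the next update containing the same roots treats them as new and submits them again.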
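
The race described above can be illustrated with a toy model. This is a minimal Python sketch, not the Ceph code: the class `ToyAgent`, its methods, the `track_pending` switch and the root path are all hypothetical names invented for illustration. It only demonstrates the general pattern: if "new" roots are computed against the tracked set alone, a second update that arrives before the first one is recorded submits the same root again. Also consulting an in-flight set is one possible way to close the window, and is not necessarily the fix applied upstream.

```python
class ToyAgent:
    """Hypothetical stand-in for the quiesce agent; NOT the Ceph implementation."""

    def __init__(self, track_pending):
        self.track_pending = track_pending
        self.current = set()    # roots already recorded as tracked
        self.pending = set()    # roots submitted but not yet recorded in `current`
        self.submitted = []     # every (root, handle) pair sent to the toy cache
        self._next_handle = 0

    def on_db_update(self, roots):
        # Decide which roots look "new" for this update.
        if self.track_pending:
            known = self.current | self.pending
        else:
            known = self.current  # racy: in-flight roots are invisible here
        for root in sorted(set(roots) - known):
            self._next_handle += 1
            self.submitted.append((root, self._next_handle))
            self.pending.add(root)

    def finish_processing(self):
        # Only now do in-flight roots enter the tracked set.
        self.current |= self.pending
        self.pending.clear()


def count_submissions(track_pending):
    agent = ToyAgent(track_pending)
    root = "file:/volumes/example_root"  # hypothetical root, not from the log
    agent.on_db_update([root])   # first update starts processing the root
    agent.on_db_update([root])   # second update arrives before it is tracked
    agent.finish_processing()
    return sum(1 for r, _ in agent.submitted if r == root)


racy_submissions = count_submissions(track_pending=False)
guarded_submissions = count_submissions(track_pending=True)
print("racy:", racy_submissions, "guarded:", guarded_submissions)
```

In the racy configuration the root receives two distinct handles, which mirrors the two `got request handle` lines in the log; with the in-flight set consulted, the second update is a no-op for that root.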


Related issues 1 (1 open, 0 closed)

Copied to CephFS - Backport #65570: squid: Quiesce may fail randomly with EBADF due to the same root submitted to the MDCache multiple times under the same quiesce request (Fix Under Review, Leonid Usov)