Bug #65545

open

Quiesce may fail randomly with EBADF due to the same root submitted to the MDCache multiple times under the same quiesce request

Added by Leonid Usov 13 days ago. Updated 12 days ago.

Status: Pending Backport
Priority: High
Assignee:
Category: -
Target version:
% Done: 0%
Source:
Tags: backport_processed
Backport: squid
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): quiesce
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Reported by the QE team at https://bugzilla.redhat.com/show_bug.cgi?id=2275459

2024-04-17T07:26:33.666+0000 7fa0a7c0b640 10 quiesce.mgr.44189 <sanitize_roots> Normalized root '/volumes/_nogroup/sv_def_6/c37e6c79-a83a-4b1d-96e8-16584f440626' to 'file:/volumes/_nogroup/sv_def_6/c37e6c79-a83a-4b1d-96e8-16584f440626'
...
2024-04-17T07:26:33.666+0000 7fa0a840c640 10 quiesce.mds.0 <operator()> submit_request: value:file:/volumes/_nogroup/sv_def_6/c37e6c79-a83a-4b1d-96e8-16584f440626
2024-04-17T07:26:33.667+0000 7fa0a840c640 10 quiesce.agt <agent_thread_main> got request handle < mds.0:3431> for 'file:/volumes/_nogroup/sv_def_6/c37e6c79-a83a-4b1d-96e8-16584f440626'
...
2024-04-17T07:26:33.669+0000 7fa0a840c640 10 quiesce.mds.0 <operator()> submit_request: value:file:/volumes/_nogroup/sv_def_6/c37e6c79-a83a-4b1d-96e8-16584f440626
...
2024-04-17T07:26:33.670+0000 7fa0a840c640 10 quiesce.agt <agent_thread_main> got request handle < mds.0:3437> for 'file:/volumes/_nogroup/sv_def_6/c37e6c79-a83a-4b1d-96e8-16584f440626'
...
2024-04-17T07:26:33.674+0000 7fa0a7c0b640  5 quiesce.mgr.44189 <leader_upkeep_set> [cg_test1_p00@106,file:/volumes/_nogroup/sv_def_6/c37e6c79-a83a-4b1d-96e8-16584f440626] reported by at least one peer as: QS_FAILED (6)

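The failure is caused by a race condition that appears when multiple db updates are posted to the agent in quick succession.
When new roots have begun processing but have not yet been added to the currently tracked set, there is a window in which the next update containing the same roots treats them as new. The same root is then submitted a second time under the same quiesce request, as the two submit_request entries for one root in the log above show.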
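
The window described above can be illustrated with a small deterministic model. This is only a sketch under stated assumptions, not the Ceph agent code and not necessarily the approach taken in the actual fix: the class `AgentSketch`, its fields `tracked` and `in_flight`, the `dedupe_in_flight` switch, and the root path are all hypothetical names introduced for illustration. It shows that diffing an incoming update against the tracked set alone submits the same root twice, while also consulting the roots that are still being processed submits it once.

```python
class AgentSketch:
    """Toy model of an agent that receives db updates listing quiesce roots.

    Hypothetical illustration only; names do not correspond to Ceph internals.
    """

    def __init__(self, dedupe_in_flight):
        self.dedupe_in_flight = dedupe_in_flight
        self.tracked = set()     # roots already recorded as currently tracked
        self.in_flight = set()   # roots picked up but not yet in `tracked`
        self.submissions = []    # every root handed to the (hypothetical) cache

    def db_update(self, roots):
        # Decide which roots in this update count as new.
        if self.dedupe_in_flight:
            known = self.tracked | self.in_flight
        else:
            known = self.tracked  # racy variant: ignores roots still in flight
        new_roots = [r for r in roots if r not in known]
        for root in new_roots:
            self.in_flight.add(root)
            self.submissions.append(root)
        return new_roots

    def finish_processing(self):
        # Only at this point do in-flight roots join the tracked set.
        self.tracked |= self.in_flight
        self.in_flight.clear()


def run(dedupe_in_flight):
    agent = AgentSketch(dedupe_in_flight)
    root = "file:/volumes/example-root"  # hypothetical root, not from the log
    agent.db_update([root])    # first update starts processing the root
    agent.db_update([root])    # second update arrives inside the window
    agent.finish_processing()
    return agent.submissions.count(root)


if __name__ == "__main__":
    print("submissions without in-flight check:", run(False))
    print("submissions with in-flight check:", run(True))
```

In this model the racy variant records two submissions of the same root, which corresponds to the pair of request handles seen for one root in the log, and the variant that checks in-flight roots records one.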


Related issues 1 (1 open, 0 closed)

Copied to CephFS - Backport #65570: squid: Quiesce may fail randomly with EBADF due to the same root submitted to the MDCache multiple times under the same quiesce request - Fix Under Review - Leonid Usov
#1

Updated by Leonid Usov 13 days ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 56956
#2

Updated by Leonid Usov 13 days ago

  • Description updated (diff)
#3

Updated by Leonid Usov 13 days ago

  • Component(FS) quiesce added
#4

Updated by Leonid Usov 13 days ago

  • Description updated (diff)
#5

Updated by Leonid Usov 13 days ago

  • Backport set to squid
#6

Updated by Leonid Usov 12 days ago

  • Status changed from Fix Under Review to Pending Backport
#7

Updated by Backport Bot 12 days ago

  • Copied to Backport #65570: squid: Quiesce may fail randomly with EBADF due to the same root submitted to the MDCache multiple times under the same quiesce request added
#8

Updated by Backport Bot 12 days ago

  • Tags set to backport_processed