Bug #17568

multisite: race between ReadSyncStatus and InitSyncStatus leads to EIO errors

Added by Casey Bodley 5 months ago. Updated about 2 months ago.

Status: Resolved
Priority: High
Assignee:
Target version: -
Start date: 10/13/2016
Due date:
% Done: 0%
Source: other
Tags:
Backport: jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Needs Doc: No

Description

RGWInitSyncStatusCoroutine locks the mdlog.sync-status object, queries log positions from the remote, writes shard marker objects, then writes the mdlog.sync-status object. The cls lock operation will actually create mdlog.sync-status, so there's a window where RGWReadSyncStatusCoroutine can read an empty buffer instead of getting ENOENT (before RGWInitSyncStatusCoroutine starts) or valid data (after RGWInitSyncStatusCoroutine completes). When RGWReadSyncStatusCoroutine sees a successful read with an empty buffer, RGWSimpleRadosReadCR tries to decode that into rgw_meta_sync_info and fails with EIO.
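
The window described above can be reproduced with a small in-memory model. This is only a sketch: FakeStore, decode_sync_info, read_sync_status and read_sync_status_tolerant are hypothetical names invented for illustration, not Ceph or librados APIs. The one behaviour taken from the description is that taking the lock creates the object with an empty body. The "tolerant" reader shows one possible mitigation (treating an empty buffer like ENOENT); that is an assumption and not necessarily the fix that was merged.

```python
import errno

OID = "mdlog.sync-status"


class FakeStore:
    """In-memory stand-in for an object pool (hypothetical, not a Ceph API)."""

    def __init__(self):
        self.objects = {}

    def lock(self, oid):
        # As in the description: the lock operation creates the object,
        # leaving it with an empty body until the first real write.
        self.objects.setdefault(oid, b"")

    def write(self, oid, data):
        self.objects[oid] = data

    def read(self, oid):
        if oid not in self.objects:
            raise OSError(errno.ENOENT, "no such object")
        return self.objects[oid]


def decode_sync_info(buf):
    """Toy decoder standing in for decoding rgw_meta_sync_info."""
    if not buf:
        raise OSError(errno.EIO, "cannot decode empty buffer")
    return buf.decode()


def read_sync_status(store, oid=OID):
    """Naive reader: decodes whatever a successful read returns."""
    try:
        return 0, decode_sync_info(store.read(oid))
    except OSError as e:
        return e.errno, None


def read_sync_status_tolerant(store, oid=OID):
    """Assumed mitigation: report an empty object as if it did not exist."""
    try:
        buf = store.read(oid)
    except OSError as e:
        return e.errno, None
    if not buf:
        return errno.ENOENT, None
    return 0, decode_sync_info(buf)


def demo():
    """Return the naive reader's result before, during, and after init."""
    store = FakeStore()
    before = read_sync_status(store)   # object absent
    store.lock(OID)                    # init has locked but not yet written
    during = read_sync_status(store)   # the race window
    store.write(OID, b"state=init")    # init completes
    after = read_sync_status(store)
    return before, during, after
```

In this model demo() reports ENOENT before init starts, EIO inside the window, and a decoded value once init has written the object, while read_sync_status_tolerant reports ENOENT inside the window so a caller can handle it the same way as a missing object.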


Related issues

Related to Bug #17569: multisite: assertion in RGWRados::wakeup_data_sync_shards Pending Backport 10/13/2016
Related to Bug #17570: multisite: segfault on shutdown after failure to start meta_sync_processor_thread New 10/13/2016
Related to Bug #17571: multisite: coroutine deadlock assertion on error in FetchAllMetaCR Resolved 10/13/2016
Copied to Backport #17710: jewel: multisite: race between ReadSyncStatus and InitSyncStatus leads to EIO errors Resolved

History

#1 Updated by Casey Bodley 5 months ago

  • Related to Bug #17569: multisite: assertion in RGWRados::wakeup_data_sync_shards added

#2 Updated by Casey Bodley 5 months ago

  • Related to Bug #17570: multisite: segfault on shutdown after failure to start meta_sync_processor_thread added

#3 Updated by Casey Bodley 5 months ago

  • Related to Bug #17571: multisite: coroutine deadlock assertion on error in FetchAllMetaCR added

#4 Updated by Casey Bodley 5 months ago

  • Status changed from New to Need Review
  • Backport set to jewel

#5 Updated by Casey Bodley 5 months ago

  • Priority changed from Normal to High

#6 Updated by Yehuda Sadeh 5 months ago

  • Status changed from Need Review to Pending Backport

#7 Updated by Loic Dachary 5 months ago

  • Copied to Backport #17710: jewel: multisite: race between ReadSyncStatus and InitSyncStatus leads to EIO errors added

#8 Updated by Nathan Cutler about 2 months ago

  • Status changed from Pending Backport to Resolved
