Bug #17568

closed

multisite: race between ReadSyncStatus and InitSyncStatus leads to EIO errors

Added by Casey Bodley over 7 years ago. Updated about 7 years ago.

Status: Resolved
Priority: High
Assignee:
Target version: -
% Done: 0%
Source: other
Tags:
Backport: jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

RGWInitSyncStatusCoroutine locks the mdlog.sync-status object, queries log positions from the remote, writes shard marker objects, then writes the mdlog.sync-status object. The cls lock operation will actually create mdlog.sync-status, so there's a window where RGWReadSyncStatusCoroutine can read an empty buffer instead of getting ENOENT (before RGWInitSyncStatusCoroutine starts) or valid data (after RGWInitSyncStatusCoroutine completes). When RGWReadSyncStatusCoroutine sees a successful read with an empty buffer, RGWSimpleRadosReadCR tries to decode that into rgw_meta_sync_info and fails with EIO.
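
The three states a reader can observe (object missing, object created by the lock but still empty, object fully written) can be modeled with a small standalone sketch. Nothing here is actual rgw code: `SyncInfo`, `FakeObject`, `decode_sync_info`, `read_strict` and `read_tolerant` are hypothetical stand-ins, and the decimal-text encoding is a toy format chosen only for illustration. `read_strict` mirrors the failure described above, while `read_tolerant` shows one possible way to handle the window by treating a successful empty read as "not initialized yet".

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <exception>
#include <string>

// Hypothetical stand-in for rgw_meta_sync_info (the real type has more fields).
struct SyncInfo {
  uint32_t num_shards = 0;
};

// Hypothetical stand-in for the mdlog.sync-status object.
struct FakeObject {
  bool exists = false;  // false: a read would return ENOENT
  std::string data;     // raw contents; empty right after the lock creates it
};

// Toy decoder: parses a decimal number, throws on an empty or non-numeric buffer.
SyncInfo decode_sync_info(const std::string& buf) {
  SyncInfo info;
  info.num_shards = static_cast<uint32_t>(std::stoul(buf));
  return info;
}

// Strict read: every successful read must decode, so an empty buffer gives -EIO.
int read_strict(const FakeObject& obj, SyncInfo* out) {
  if (!obj.exists) return -ENOENT;
  try {
    *out = decode_sync_info(obj.data);
  } catch (const std::exception&) {
    return -EIO;
  }
  return 0;
}

// Tolerant read (assumed mitigation): an empty buffer yields a default value.
int read_tolerant(const FakeObject& obj, SyncInfo* out) {
  if (!obj.exists) return -ENOENT;
  if (obj.data.empty()) {
    *out = SyncInfo();
    return 0;
  }
  try {
    *out = decode_sync_info(obj.data);
  } catch (const std::exception&) {
    return -EIO;
  }
  return 0;
}

// Walks through the three states a concurrent reader can observe.
void demonstrate_race_window() {
  SyncInfo info;
  FakeObject obj;                       // before init: object does not exist
  assert(read_strict(obj, &info) == -ENOENT);

  obj.exists = true;                    // lock taken: object created, no data yet
  assert(read_strict(obj, &info) == -EIO);
  assert(read_tolerant(obj, &info) == 0 && info.num_shards == 0);

  obj.data = "64";                      // init finished: valid status written
  assert(read_strict(obj, &info) == 0 && info.num_shards == 64);
}
```

Calling `demonstrate_race_window()` from a `main` exercises all three states. The middle state is the one that matters here: the strict reader turns a harmless "not initialized yet" condition into a hard error.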


Related issues 4 (1 open, 3 closed)

Related to rgw - Bug #17569: multisite: assertion in RGWRados::wakeup_data_sync_shards (Resolved, Casey Bodley, 10/13/2016)
Related to rgw - Bug #17570: rgw: segfault on shutdown after failure to start meta_sync_processor_thread (In Progress, Casey Bodley, 10/13/2016)
Related to rgw - Bug #17571: multisite: coroutine deadlock assertion on error in FetchAllMetaCR (Resolved, Casey Bodley, 10/13/2016)
Copied to rgw - Backport #17710: jewel: multisite: race between ReadSyncStatus and InitSyncStatus leads to EIO errors (Resolved, Loïc Dachary)

#1

Updated by Casey Bodley over 7 years ago

  • Related to Bug #17569: multisite: assertion in RGWRados::wakeup_data_sync_shards added
#2

Updated by Casey Bodley over 7 years ago

  • Related to Bug #17570: rgw: segfault on shutdown after failure to start meta_sync_processor_thread added
#3

Updated by Casey Bodley over 7 years ago

  • Related to Bug #17571: multisite: coroutine deadlock assertion on error in FetchAllMetaCR added
#4

Updated by Casey Bodley over 7 years ago

  • Status changed from New to Fix Under Review
  • Backport set to jewel
#5

Updated by Casey Bodley over 7 years ago

  • Priority changed from Normal to High
#6

Updated by Yehuda Sadeh over 7 years ago

  • Status changed from Fix Under Review to Pending Backport
#7

Updated by Loïc Dachary over 7 years ago

  • Copied to Backport #17710: jewel: multisite: race between ReadSyncStatus and InitSyncStatus leads to EIO errors added
#8

Updated by Nathan Cutler about 7 years ago

  • Status changed from Pending Backport to Resolved