Bug #39033 (closed)

multisite: data sync loops back to the start of the datalog after reaching the end

Added by Casey Bodley about 5 years ago. Updated over 4 years ago.

Status: Resolved
Priority: High
Target version: -
% Done: 0%
Tags: multisite
Backport: luminous, mimic, nautilus
Regression: No
Severity: 3 - minor

Description

RGWDataSyncShardCR was changed in https://github.com/ceph/ceph/pull/23596 to no longer call RGWReadRemoteDataLogShardInfoCR for the remote's max datalog marker, relying instead on the truncated flag from RGWReadRemoteDataLogShardCR to detect the end of the log. However, when RGWReadRemoteDataLogShardCR reaches the end of the listing, it returns an empty string for the 'next marker'. This causes RGWDataSyncShardCR to restart listing from the beginning of the remote datalog, retrying bucket sync on each entry.
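A minimal sketch of the buggy loop, assuming a simplified list_result type (illustrative only, not the actual coroutine code in rgw_data_sync.cc):

#include <string>
#include <vector>

// Result of one RGWReadRemoteDataLogShardCR-style listing (simplified).
struct list_result {
  std::vector<std::string> entries;
  std::string next_marker;  // empty once the listing reaches the end
  bool truncated = false;
};

// Buggy behavior: the shard position is overwritten with whatever next
// marker the listing returned. At end-of-log that marker is empty, so the
// position resets and the next request starts from the beginning of the
// remote datalog, retrying bucket sync on every entry.
void process_listing(std::string& sync_marker, const list_result& r) {
  for (const auto& entry : r.entries) {
    // spawn bucket sync for each datalog entry ...
  }
  sync_marker = r.next_marker;  // clears sync_marker at end-of-log
}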


Related issues (3 total: 0 open, 3 closed)

Copied to rgw - Backport #39073: luminous: multisite: data sync loops back to the start of the datalog after reaching the end (Resolved, Casey Bodley)
Copied to rgw - Backport #39074: mimic: multisite: data sync loops back to the start of the datalog after reaching the end (Resolved, Nathan Cutler)
Copied to rgw - Backport #39075: nautilus: multisite: data sync loops back to the start of the datalog after reaching the end (Resolved, Nathan Cutler)

#1 - Updated by Casey Bodley about 5 years ago

This leads to the following behavior after multisite tests:

$ grep 'shard_id=0 sync_marker=' run/c1/out/radosgw.8000.log
...
2019-03-29 10:44:39.220 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553870679.138859_149.1 truncated=0
2019-03-29 10:44:40.254 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553870679.138859_149.1
2019-03-29 10:44:40.258 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker= truncated=0
2019-03-29 10:44:40.431 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=
2019-03-29 10:44:40.445 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553870679.138859_149.1 truncated=0
2019-03-29 10:44:42.663 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553870679.138859_149.1
2019-03-29 10:44:42.669 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553870681.641617_152.1 truncated=0
2019-03-29 10:44:52.075 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553870681.641617_152.1
2019-03-29 10:44:52.091 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553870690.980602_156.3 truncated=0
2019-03-29 10:45:00.447 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553870690.980602_156.3
2019-03-29 10:45:00.452 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker= truncated=0
2019-03-29 10:45:02.671 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=
2019-03-29 10:45:02.690 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553870690.980602_156.3 truncated=0
2019-03-29 10:45:12.091 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553870690.980602_156.3
2019-03-29 10:45:12.096 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker= truncated=0
2019-03-29 10:45:20.452 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=
2019-03-29 10:45:20.468 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553870690.980602_156.3 truncated=0
2019-03-29 10:45:22.690 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553870690.980602_156.3
2019-03-29 10:45:22.694 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker= truncated=0
2019-03-29 10:45:32.096 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=
...

Each time "shard_id=0 sync_marker=" appears with an empty value, the shard started back at 1_1553870446.435778_121.1 and replayed sync from there.

With the fix applied, the sync_marker always has a value and increases steadily (see the sketch after the log excerpt):

2019-03-29 11:03:22.177 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871781.588926_91.4
2019-03-29 11:03:22.182 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871781.588926_91.4 next_marker= truncated=0
2019-03-29 11:03:23.937 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871781.588926_91.4
2019-03-29 11:03:23.946 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871781.588926_91.4 next_marker=1_1553871803.598457_92.2 truncated=0
2019-03-29 11:03:27.128 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871803.598457_92.2
2019-03-29 11:03:27.133 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871803.598457_92.2 next_marker= truncated=0
2019-03-29 11:03:30.246 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871803.598457_92.2
2019-03-29 11:03:30.252 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871803.598457_92.2 next_marker=1_1553871809.143526_93.1 truncated=0
2019-03-29 11:03:33.451 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871809.143526_93.1
2019-03-29 11:03:33.457 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871809.143526_93.1 next_marker=1_1553871812.437323_94.1 truncated=0
2019-03-29 11:03:42.064 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871812.437323_94.1
2019-03-29 11:03:42.070 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871812.437323_94.1 next_marker=1_1553871820.995030_98.1 truncated=0
2019-03-29 11:03:47.133 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871820.995030_98.1
2019-03-29 11:03:47.138 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871820.995030_98.1 next_marker=1_1553871825.605422_104.2 truncated=0
2019-03-29 11:03:50.253 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871825.605422_104.2
2019-03-29 11:03:50.261 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871825.605422_104.2 next_marker= truncated=0
2019-03-29 11:03:51.882 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871825.605422_104.2
2019-03-29 11:03:51.886 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871825.605422_104.2 next_marker=1_1553871830.734200_107.1 truncated=0
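
The fix implied by this log is to treat an empty next_marker as "caught up" rather than as a new position. A minimal sketch of that idea, reusing the list_result type from the sketch above (illustrative only, not the code from pull request 27276):

void process_listing_fixed(std::string& sync_marker, const list_result& r) {
  for (const auto& entry : r.entries) {
    // spawn bucket sync for each datalog entry ...
  }
  // Advance only when the listing made real progress. When next_marker is
  // empty and truncated is false, the shard is caught up: sync_marker stays
  // put and the next listing resumes from the current position instead of
  // rewinding to the start of the datalog.
  if (!r.next_marker.empty()) {
    sync_marker = r.next_marker;
  }
}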

#2 - Updated by Casey Bodley about 5 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 27276

#3 - Updated by Casey Bodley about 5 years ago

  • Status changed from Fix Under Review to Pending Backport

#4 - Updated by Casey Bodley about 5 years ago

  • Copied to Backport #39073: luminous: multisite: data sync loops back to the start of the datalog after reaching the end added

#5 - Updated by Casey Bodley about 5 years ago

  • Copied to Backport #39074: mimic: multisite: data sync loops back to the start of the datalog after reaching the end added

#6 - Updated by Casey Bodley about 5 years ago

  • Copied to Backport #39075: nautilus: multisite: data sync loops back to the start of the datalog after reaching the end added

#7 - Updated by Nathan Cutler over 4 years ago

  • Status changed from Pending Backport to Resolved