Bug #39033

multisite: data sync loops back to the start of the datalog after reaching the end

Added by Casey Bodley 4 months ago. Updated 4 months ago.

Status:
Pending Backport
Priority:
High
Assignee:
Target version:
-
Start date:
03/29/2019
Due date:
% Done:

0%

Source:
Tags:
multisite
Backport:
luminous mimic nautilus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

RGWDataSyncShardCR was changed in https://github.com/ceph/ceph/pull/23596 to stop calling RGWReadRemoteDataLogShardInfoCR for the remote's max datalog marker, relying instead on the truncated flag from RGWReadRemoteDataLogShardCR to detect the end of the log. However, when RGWReadRemoteDataLogShardCR reaches the end of the listing, it returns an empty string as the 'next marker'. This empty marker causes RGWDataSyncShardCR to restart listing from the beginning of the remote datalog, retrying bucket sync on each entry.
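A minimal sketch of the marker-handling pattern (hypothetical names; the real code is the C++ coroutines RGWDataSyncShardCR and RGWReadRemoteDataLogShardCR): the bug is that the empty next_marker overwrites the stored sync marker, so the next list request starts from the beginning of the shard; the fix is to keep the current marker whenever the remote returns an empty one.

```python
# Illustrative sketch, not the actual RGW code. A datalog listing
# returns (entries, next_marker, truncated); at the end of the log,
# next_marker is the empty string.

def advance_marker_buggy(cur_marker, next_marker):
    # Buggy: unconditionally stores next_marker. At the end of the log
    # the remote returns "", so the following listing restarts from
    # the beginning of the datalog shard.
    return next_marker

def advance_marker_fixed(cur_marker, next_marker):
    # Fixed: keep the current marker when the remote returns an empty
    # next_marker, so sync resumes where it left off.
    return next_marker if next_marker else cur_marker

cur = "1_1553870679.138859_149.1"
print(advance_marker_buggy(cur, ""))  # empty -> loops back to start
print(advance_marker_fixed(cur, ""))  # marker unchanged -> stays put
```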


Related issues

Copied to rgw - Backport #39073: luminous: multisite: data sync loops back to the start of the datalog after reaching the end Resolved
Copied to rgw - Backport #39074: mimic: multisite: data sync loops back to the start of the datalog after reaching the end In Progress
Copied to rgw - Backport #39075: nautilus: multisite: data sync loops back to the start of the datalog after reaching the end Resolved

History

#1 Updated by Casey Bodley 4 months ago

This leads to the following behavior after multisite tests:

$ grep 'shard_id=0 sync_marker=' run/c1/out/radosgw.8000.log
...
2019-03-29 10:44:39.220 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553870679.138859_149.1 truncated=0
2019-03-29 10:44:40.254 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553870679.138859_149.1
2019-03-29 10:44:40.258 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker= truncated=0
2019-03-29 10:44:40.431 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=
2019-03-29 10:44:40.445 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553870679.138859_149.1 truncated=0
2019-03-29 10:44:42.663 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553870679.138859_149.1
2019-03-29 10:44:42.669 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553870681.641617_152.1 truncated=0
2019-03-29 10:44:52.075 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553870681.641617_152.1
2019-03-29 10:44:52.091 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553870690.980602_156.3 truncated=0
2019-03-29 10:45:00.447 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553870690.980602_156.3
2019-03-29 10:45:00.452 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker= truncated=0
2019-03-29 10:45:02.671 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=
2019-03-29 10:45:02.690 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553870690.980602_156.3 truncated=0
2019-03-29 10:45:12.091 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553870690.980602_156.3
2019-03-29 10:45:12.096 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker= truncated=0
2019-03-29 10:45:20.452 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=
2019-03-29 10:45:20.468 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553870690.980602_156.3 truncated=0
2019-03-29 10:45:22.690 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553870690.980602_156.3
2019-03-29 10:45:22.694 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker= truncated=0
2019-03-29 10:45:32.096 7fb4e960c700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=
...

Each time "shard_id=0 sync_marker=" appears with an empty value, sync started back at 1_1553870446.435778_121.1 and replayed from there.
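The resets are easy to spot mechanically. An illustrative filter over log lines in the format above (hypothetical helper, assuming space-separated key=value fields) counts how often sync_marker goes empty:

```python
# Count empty sync_marker resets in RGW-SYNC log lines of the form
# "... shard_id=0 sync_marker=<marker> [truncated=0]".
def empty_marker_resets(lines):
    resets = 0
    for line in lines:
        # Parse space-separated key=value fields; a bare "sync_marker="
        # token yields an empty-string value.
        fields = dict(f.split("=", 1) for f in line.split() if "=" in f)
        if fields.get("sync_marker") == "":
            resets += 1
    return resets

sample = [
    "RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553870679.138859_149.1 truncated=0",
    "RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker= truncated=0",
    "RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=",
]
print(empty_marker_resets(sample))  # -> 2
```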

With the fix applied, sync_marker always has a value and advances steadily:

2019-03-29 11:03:22.177 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871781.588926_91.4
2019-03-29 11:03:22.182 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871781.588926_91.4 next_marker= truncated=0
2019-03-29 11:03:23.937 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871781.588926_91.4
2019-03-29 11:03:23.946 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871781.588926_91.4 next_marker=1_1553871803.598457_92.2 truncated=0
2019-03-29 11:03:27.128 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871803.598457_92.2
2019-03-29 11:03:27.133 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871803.598457_92.2 next_marker= truncated=0
2019-03-29 11:03:30.246 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871803.598457_92.2
2019-03-29 11:03:30.252 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871803.598457_92.2 next_marker=1_1553871809.143526_93.1 truncated=0
2019-03-29 11:03:33.451 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871809.143526_93.1
2019-03-29 11:03:33.457 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871809.143526_93.1 next_marker=1_1553871812.437323_94.1 truncated=0
2019-03-29 11:03:42.064 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871812.437323_94.1
2019-03-29 11:03:42.070 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871812.437323_94.1 next_marker=1_1553871820.995030_98.1 truncated=0
2019-03-29 11:03:47.133 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871820.995030_98.1
2019-03-29 11:03:47.138 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871820.995030_98.1 next_marker=1_1553871825.605422_104.2 truncated=0
2019-03-29 11:03:50.253 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871825.605422_104.2
2019-03-29 11:03:50.261 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871825.605422_104.2 next_marker= truncated=0
2019-03-29 11:03:51.882 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871825.605422_104.2
2019-03-29 11:03:51.886 7f03b769e700 20 RGW-SYNC:data:sync:shard[0]: shard_id=0 sync_marker=1_1553871825.605422_104.2 next_marker=1_1553871830.734200_107.1 truncated=0

#2 Updated by Casey Bodley 4 months ago

  • Status changed from In Progress to Need Review
  • Pull request ID set to 27276

#3 Updated by Casey Bodley 4 months ago

  • Status changed from Need Review to Pending Backport

#4 Updated by Casey Bodley 4 months ago

  • Copied to Backport #39073: luminous: multisite: data sync loops back to the start of the datalog after reaching the end added

#5 Updated by Casey Bodley 4 months ago

  • Copied to Backport #39074: mimic: multisite: data sync loops back to the start of the datalog after reaching the end added

#6 Updated by Casey Bodley 4 months ago

  • Copied to Backport #39075: nautilus: multisite: data sync loops back to the start of the datalog after reaching the end added
