Bug #17047

closed

rgw multisite: doesn't retry RGWFetchAllMetaCR on failed lease

Added by Casey Bodley over 7 years ago. Updated about 7 years ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source: other
Tags:
Backport: jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When multiple gateways are started in the same non-master zone, they race for the lease during StateBuildingFullSyncMaps. The gateway that loses the race logs RGWFetchAllMetaCR failing with r=-16 (-EBUSY):

2016-08-16 11:10:27.530819 7f3d7d48e700 20 cr:s=0xbc20b40:op=0xb7e3500:17RGWFetchAllMetaCR: operate()
2016-08-16 11:10:27.530823 7f3d7d48e700  5 rgw meta sync: lease cr failed, done early
2016-08-16 11:10:27.530828 7f3d7d48e700 20 cr:s=0xbc20b40:op=0xb7e3500:17RGWFetchAllMetaCR: operate() returned r=-16
2016-08-16 11:10:27.530837 7f3d7d48e700 20 stack->operate() returned ret=-16
2016-08-16 11:10:27.530838 7f3d7d48e700 20 run: stack=0xbc20b40 is done

However, RGWRemoteMetaLog::run_sync() relies on 'r = run(new RGWFetchAllMetaCR(...));' to get the error code, but RGWCoroutinesManager::run() returns success instead. As a result, the gateway advances to StateSync, where RGWMetaSyncCR fails because it can't find the full sync maps.
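
The failure mode described above can be illustrated with a minimal, self-contained sketch. This is not the Ceph code: `Op`, `run_swallowing`, `run_propagating`, `next_state`, and `build_maps_with_retry` are hypothetical names invented for illustration, and the retry loop is only one possible way a caller could react to a lost lease, assuming the error code actually reaches it.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>

// Hypothetical stand-in for the result of a coroutine stack's operate();
// returns 0 on success or a negative errno value such as -EBUSY.
using Op = std::function<int()>;

// Shape of the bug: the operation runs, but its result is dropped and the
// caller is always told that it succeeded.
int run_swallowing(const Op& op) {
  int ret = op();
  (void)ret;  // error code is lost here
  return 0;
}

// Shape the caller expects: the operation's result is passed through.
int run_propagating(const Op& op) {
  return op();
}

// Simplified view of the two sync states named in the report.
enum class SyncState { BuildingFullSyncMaps, Sync };

// Advance to Sync only when building the full sync maps succeeded.
SyncState next_state(int r) {
  return r == 0 ? SyncState::Sync : SyncState::BuildingFullSyncMaps;
}

// Illustrative retry: repeat while the lease is held elsewhere (-EBUSY),
// up to max_attempts tries. Any other result ends the loop immediately.
int build_maps_with_retry(const Op& fetch_all, int max_attempts) {
  assert(max_attempts > 0);
  int r = -EBUSY;
  for (int i = 0; i < max_attempts && r == -EBUSY; ++i) {
    r = run_propagating(fetch_all);
  }
  return r;
}
```

With an operation that always returns -EBUSY, `next_state(run_swallowing(op))` yields `SyncState::Sync`, which mirrors the premature advance to StateSync, while `next_state(run_propagating(op))` stays in `SyncState::BuildingFullSyncMaps`.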


Related issues 1 (0 open, 1 closed)

Copied to rgw - Backport #17162: jewel: rgw multisite: doesn't retry RGWFetchAllMetaCR on failed lease (Resolved, Loïc Dachary)
#1

Updated by Casey Bodley over 7 years ago

  • Status changed from New to Fix Under Review
#2

Updated by Yehuda Sadeh over 7 years ago

  • Assignee changed from Casey Bodley to Yehuda Sadeh
#3

Updated by Yehuda Sadeh over 7 years ago

  • Status changed from Fix Under Review to Pending Backport
#4

Updated by Nathan Cutler over 7 years ago

  • Copied to Backport #17162: jewel: rgw multisite: doesn't retry RGWFetchAllMetaCR on failed lease added
#5

Updated by Nathan Cutler about 7 years ago

  • Status changed from Pending Backport to Resolved