Bug #17047
rgw multisite: doesn't retry RGWFetchAllMetaCR on failed lease
Status: Closed
% Done: 0%
Source: other
Backport: jewel
Regression: No
Severity: 3 - minor
Description
When multiple gateways are started in the same non-master zone, they race for the lease during StateBuildingFullSyncMaps. The gateway that loses the race logs that RGWFetchAllMetaCR failed with r=-16 (-EBUSY):
    2016-08-16 11:10:27.530819 7f3d7d48e700 20 cr:s=0xbc20b40:op=0xb7e3500:17RGWFetchAllMetaCR: operate()
    2016-08-16 11:10:27.530823 7f3d7d48e700 5 rgw meta sync: lease cr failed, done early
    2016-08-16 11:10:27.530828 7f3d7d48e700 20 cr:s=0xbc20b40:op=0xb7e3500:17RGWFetchAllMetaCR: operate() returned r=-16
    2016-08-16 11:10:27.530837 7f3d7d48e700 20 stack->operate() returned ret=-16
    2016-08-16 11:10:27.530838 7f3d7d48e700 20 run: stack=0xbc20b40 is done
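To illustrate the contention behind this log, here is a minimal sketch of an exclusive lease where the first gateway to ask wins and every other gateway gets -EBUSY (the r=-16 seen above). SyncLease, try_acquire() and release() are hypothetical names; this is an in-process stand-in, not how rgw actually stores or takes its sync lease.

```cpp
#include <cassert>
#include <cerrno>
#include <mutex>
#include <optional>
#include <string>

// Hypothetical in-process stand-in for the shared metadata sync lease.
// Only one owner may hold it at a time; other callers get -EBUSY.
class SyncLease {
 public:
  int try_acquire(const std::string& owner) {
    assert(!owner.empty());
    std::lock_guard<std::mutex> guard(mu_);
    if (holder_ && *holder_ != owner) {
      return -EBUSY;  // lost the race: another gateway holds the lease
    }
    holder_ = owner;  // first caller (or the current holder) wins
    return 0;
  }

  void release(const std::string& owner) {
    std::lock_guard<std::mutex> guard(mu_);
    if (holder_ && *holder_ == owner) {
      holder_.reset();  // lease becomes available to the next gateway
    }
  }

 private:
  std::mutex mu_;
  std::optional<std::string> holder_;
};
```

With two gateways "gw1" and "gw2", whichever calls try_acquire() second receives -EBUSY until the holder releases the lease. That second gateway is the one that has to retry rather than treat the failure as success.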
However, RGWRemoteMetaLog::run_sync() uses 'r = run(new RGWFetchAllMetaCR(...));' to get the error code, and RGWCoroutinesManager::run() returns success instead of the coroutine's error. As a result, the gateway advances to StateSync, where RGWMetaSyncCR fails because it cannot find the full sync maps.
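A minimal sketch of the behavior the report asks for, under stated assumptions: the runner returns the coroutine's own result, and the caller only advances to the sync state once the full sync maps were built, retrying when the lease is busy. run_coroutine(), build_full_sync_maps(), SyncState and the max_attempts parameter are hypothetical and are not the actual RGWCoroutinesManager or RGWRemoteMetaLog code; the real fix may differ.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>

// Hypothetical states named after the ones in the report.
enum class SyncState { BuildingFullSyncMaps, Sync };

// Hypothetical stand-in for a coroutine runner. Unlike the behavior
// described in the report, it returns the coroutine's own result.
int run_coroutine(const std::function<int()>& coroutine) {
  return coroutine();  // e.g. propagates r=-16 to the caller
}

// Hypothetical sketch of the full-sync-map step of a run_sync()-style loop.
// Advances to Sync only when fetching succeeded; on -EBUSY (lease held by
// another gateway) it retries, and on any other error it stops without
// advancing. The last result is reported through *last_error.
SyncState build_full_sync_maps(const std::function<int()>& fetch_all_meta,
                               int max_attempts, int* last_error) {
  assert(max_attempts > 0);
  assert(last_error != nullptr);
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    int r = run_coroutine(fetch_all_meta);
    *last_error = r;
    if (r == 0) {
      return SyncState::Sync;  // full sync maps exist, safe to start syncing
    }
    if (r != -EBUSY) {
      break;  // unexpected error: do not advance
    }
    // Lost the lease race; a real implementation would back off here.
  }
  return SyncState::BuildingFullSyncMaps;
}
```

The key point is the ordering: the state only changes after a successful return code, so a gateway that loses the lease can never reach the sync state without the full sync maps.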