Bug #39547
multisite: metadata consistency on non-master zones
Status: closed
% Done: 100%
Description
RGW users and buckets are referred to as metadata, and are controlled by the RGWMetadataManager and its various RGWMetadataHandlers. In multisite, we require stronger consistency on metadata than we do for data (i.e. objects in buckets). The master zone of the master zonegroup is designated as the 'metadata master zone', and all other zones sync metadata changes from that metadata master zone only. This means that, in order to guarantee a consistent view of metadata across all zones/zonegroups, any metadata mutation (for example, creation of a user or bucket) must first be applied on the metadata master zone.
S3/Swift APIs
Most operations that modify buckets, when sent to a non-master zone, are automatically forwarded to the metadata master zone with forward_request_to_master(). If they succeed against the master zone, they are then applied locally before responding to the client.
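The forward-then-apply ordering described above can be sketched as follows. This is a minimal in-memory model, not RGW code: `Zone`, `MetadataError`, `apply_local`, and `create_bucket` are hypothetical names chosen for illustration, and the call to `master.apply_local()` only stands in for what forward_request_to_master() does over HTTP.

```python
from dataclasses import dataclass, field


class MetadataError(Exception):
    """Raised when a zone rejects a metadata write (hypothetical error type)."""


@dataclass
class Zone:
    # Hypothetical in-memory stand-in for an RGW zone; not a Ceph type.
    name: str
    is_meta_master: bool = False
    buckets: dict = field(default_factory=dict)

    def apply_local(self, bucket: str, owner: str) -> None:
        """Record the bucket in this zone, rejecting duplicates."""
        if bucket in self.buckets:
            raise MetadataError(f"bucket {bucket!r} already exists in {self.name}")
        self.buckets[bucket] = owner


def create_bucket(local: Zone, master: Zone, bucket: str, owner: str) -> None:
    """Create a bucket via `local`, applying it on `master` first if needed."""
    if not local.is_meta_master:
        # Stand-in for forward_request_to_master(): if the master rejects the
        # write, the exception propagates and the local zone stays untouched.
        master.apply_local(bucket, owner)
    # Only reached when the master accepted the write (or we are the master).
    local.apply_local(bucket, owner)
```

In this model a request that the master rejects never reaches the local zone's state, which is the property the forwarding is meant to provide.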
One gap here is for PUT Bucket Lifecycle (RGWPutLC), with an associated bug report in http://tracker.ceph.com/issues/22648. The interaction between multisite sync and lifecycle processing (both expiration and transitions) needs more thought.
Admin APIs
There are several admin APIs that can modify user/bucket metadata. Some are documented in http://docs.ceph.com/docs/master/radosgw/adminops/. Most fall under rgw_rest_bucket.*, rgw_rest_metadata.*, and rgw_rest_user.*. All of these ops are currently applied on the local zone only; any that write metadata should instead call forward_request_to_master() first when sent to a non-master zone.
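One way to picture retrofitting this onto existing local-only handlers is a wrapper that forwards before the handler runs. The sketch below is illustrative only: the `ctx` dictionary, its keys, `forwards_metadata_writes`, and `admin_create_user` are all hypothetical, and `ctx["forward"]` merely plays the role of forward_request_to_master().

```python
import functools


def forwards_metadata_writes(handler):
    """Wrap a metadata-writing admin op so it is forwarded before applying."""

    @functools.wraps(handler)
    def wrapper(ctx, *args, **kwargs):
        if not ctx["is_meta_master"]:
            # Hypothetical forwarding hook; it is expected to raise on failure,
            # in which case the local handler below never runs.
            ctx["forward"](handler.__name__, *args, **kwargs)
        return handler(ctx, *args, **kwargs)

    return wrapper


@forwards_metadata_writes
def admin_create_user(ctx, uid):
    # Local-only application, as the admin ops behave today in this model.
    ctx["users"].add(uid)
    return uid
```

Read-only ops would simply be left undecorated, so only ops that write metadata pay the cost of a round trip to the master zone.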
radosgw-admin commands
radosgw-admin exposes many commands that create or modify users and buckets, but all of the modifications apply to the local zone. When run on a non-master zone, these commands could potentially issue HTTP requests against the master zone's admin APIs. However, our admin APIs don't have full coverage of the radosgw-admin commands and options, so this strategy would be difficult and error-prone. These commands should instead fail with an error message saying to run the command on the master zone (accepting the --yes-i-really-mean-it flag to override).
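The proposed guard amounts to a small check before any metadata-writing command runs. The sketch below models that decision only; `NotMetadataMasterError`, `check_metadata_write_allowed`, and the wording of the error message are assumptions, and only the --yes-i-really-mean-it flag comes from the proposal itself.

```python
class NotMetadataMasterError(Exception):
    """Raised when a metadata write is attempted on a non-master zone."""


def check_metadata_write_allowed(is_meta_master: bool,
                                 yes_i_really_mean_it: bool = False) -> None:
    """Refuse metadata writes on a non-master zone unless overridden."""
    if is_meta_master or yes_i_really_mean_it:
        return
    # Hypothetical message text; the real wording is not given in this report.
    raise NotMetadataMasterError(
        "Please run the command on the master zone, or pass "
        "--yes-i-really-mean-it to override."
    )
```

A command that writes metadata would call this check once, before touching any state, so a refused command has no partial effect on the local zone.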