Bucket lifecycle breaks when the metadata master zone changes or the period is updated
If the multisite metadata master moves to another zone (or the period is updated), lifecycle processing stops completely. No objects expire anywhere in the cluster, and
radosgw-admin lc list returns an empty list. The workaround is to set the bucket lifecycle policy again on every bucket, even though the policy is still present and can be retrieved.
Found in version 12.2.12
How to reproduce:
1. Deploy 2 Ceph clusters and set up 2 multisite zones, e.g. master zone A and secondary zone B
2. Create a bucket on zone A
3. Put a bucket lifecycle policy
4. Run radosgw-admin lc list on metadata master zone A
5. See the bucket in the list # Lifecycle working
6. Change metadata master zone from A to B
7. Change metadata master zone back from B to A
8. Wait a few days, run radosgw-admin lc list again and see an empty list # Lifecycle not working
9. Put exactly the same bucket lifecycle policy again
10. See the bucket in the list # Lifecycle working
#3 Updated by Casey Bodley over 1 year ago
- Status changed from New to Triaged
- Tags set to lifecycle multisite
Looking at RGWLC::set_bucket_config(), it first calls set_bucket_instance_attrs() to store the lifecycle policy (RGW_ATTR_LC) in the bucket instance metadata, and then calls cls_rgw_lc_set_entry() to add this bucket to the lifecycle processing queue.
In multisite, metadata sync only replicates the changes to the bucket instance metadata; it does not touch the lifecycle processing queue. We need an extra step in metadata sync that updates the lifecycle processing queue accordingly.
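To make the gap concrete, here is a toy C++ model of the two-step write in RGWLC::set_bucket_config() and of metadata sync as described above. Zone, set_bucket_config(), metadata_sync() and the kAttrLC placeholder are hypothetical stand-ins, not RGW types or APIs; this is only a sketch under those simplifying assumptions.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical, heavily simplified model of one zone's RGW state.
// None of these names are real RGW types or functions.
using Attrs = std::map<std::string, std::string>;

// Placeholder standing in for the RGW_ATTR_LC attribute key.
const std::string kAttrLC = "lc-policy";

struct Zone {
  std::map<std::string, Attrs> bucket_instances;  // bucket instance metadata
  std::set<std::string> lc_queue;                 // lifecycle processing queue
};

// Models the two steps described for RGWLC::set_bucket_config().
void set_bucket_config(Zone& zone, const std::string& bucket,
                       const std::string& policy) {
  // Step 1: store the policy in the bucket instance metadata
  // (the role of set_bucket_instance_attrs()).
  zone.bucket_instances[bucket][kAttrLC] = policy;
  // Step 2: add the bucket to this zone's lifecycle queue
  // (the role of cls_rgw_lc_set_entry()).
  zone.lc_queue.insert(bucket);
}

// Models metadata sync as described: only bucket instance metadata is
// copied; the destination zone's lifecycle queue is never touched.
void metadata_sync(const Zone& src, Zone& dst) {
  for (const auto& [bucket, attrs] : src.bucket_instances) {
    dst.bucket_instances[bucket] = attrs;
  }
}
```

After set_bucket_config(a, "bucket1", policy) followed by metadata_sync(a, b), zone b holds the policy attribute but b.lc_queue is still empty: the policy is visible, yet nothing would schedule the bucket for lifecycle processing on that zone.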
We have a RGWMetadataHandlerPut_BucketInstance that processes writes to bucket instance metadata (whether via set_bucket_instance_attrs() or metadata sync). We should be able to add some logic there that detects when RGW_ATTR_LC is added or removed, and update the lifecycle processing queue accordingly.
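A minimal sketch of that direction, in the same toy C++ model: both local writes and synced writes go through one put_bucket_instance() path, which compares the old and new attributes and updates the lifecycle queue. Zone, put_bucket_instance(), metadata_sync() and kAttrLC are hypothetical names; this is not the actual RGWMetadataHandlerPut_BucketInstance code.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified model; not the real RGW types or signatures.
using Attrs = std::map<std::string, std::string>;

// Placeholder standing in for the RGW_ATTR_LC attribute key.
const std::string kAttrLC = "lc-policy";

struct Zone {
  std::map<std::string, Attrs> bucket_instances;  // bucket instance metadata
  std::set<std::string> lc_queue;                 // lifecycle processing queue

  // Single write path for bucket instance metadata, taken by both local
  // writes and metadata sync (the role RGWMetadataHandlerPut_BucketInstance
  // plays in the proposal). It keeps lc_queue in step with the LC attribute.
  void put_bucket_instance(const std::string& bucket, const Attrs& new_attrs) {
    auto it = bucket_instances.find(bucket);
    const bool had_lc =
        it != bucket_instances.end() && it->second.count(kAttrLC) > 0;
    const bool has_lc = new_attrs.count(kAttrLC) > 0;

    bucket_instances[bucket] = new_attrs;

    if (has_lc) {
      // Attribute present: make sure the bucket is queued. std::set::insert
      // is idempotent, so this also repairs a missing queue entry.
      lc_queue.insert(bucket);
    } else if (had_lc) {
      // Attribute removed: drop the bucket from the queue.
      lc_queue.erase(bucket);
    }
  }
};

// Metadata sync now goes through the same write path as local writes.
void metadata_sync(const Zone& src, Zone& dst) {
  for (const auto& [bucket, attrs] : src.bucket_instances) {
    dst.put_bucket_instance(bucket, attrs);
  }
}
```

The comment only asks to detect when RGW_ATTR_LC is added or removed. This sketch enqueues whenever the attribute is present, which covers the "added" case and also re-creates a queue entry that went missing; whether the real fix should do that is an open design choice.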