Bug #47655
AWS put-bucket-lifecycle command fails on the latest minor Octopus release
Description
We are rebuilding some servers, and reinstalling Ceph using ceph-ansible also updated a RADOS gateway to Octopus version 15.2.5. The other gateways are running v15.2.4 and v15.2.2.
Since that update, requests to `put-bucket-lifecycle` which are received by the v15.2.5 backend fail:
```
$ cat lifecycle.json
{
"Rules": [
{
"Expiration": {
"Days": 7
},
"Prefix": "",
"Status": "Enabled"
}
]
}
$ aws s3api --endpoint-url=https://REDACTED put-bucket-lifecycle --bucket bucketname --lifecycle-configuration file://lifecycle.json
An error occurred (InvalidArgument) when calling the PutBucketLifecycle operation: Unknown
```
In the RGW logs we see:
```
root@REDACTED:/var/log/ceph# tail -f ceph-rgw-REDACTED.rgw0.log
2020-09-25T22:26:04.692+0200 7f7e767ec700 1 ====== starting new request req=0x7f7f7469f680 =====
2020-09-25T22:26:04.704+0200 7f7e767ec700 0 RGWLC::RGWPutLC() failed to set entry on lc.6, ret=-22
2020-09-25T22:26:04.708+0200 7f7e767ec700 1 ====== req done req=0x7f7f7469f680 op status=-22 http_status=400 latency=0.016000314s ======
2020-09-25T22:26:04.708+0200 7f7e767ec700 1 beast: 0x7f7f7469f680: 10.206.10.2 - - [2020-09-25T22:26:04.708887+0200] "PUT /bucketname?lifecycle HTTP/1.1" 400 240 - "aws-cli/2.0.38 Python/3.8.5 Darwin/19.2.0 source/x86_64 command/s3api.put-bucket-lifecycle" -
```
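For anyone reading the log above: as far as I understand, Ceph reports failures as negative errno values, so `ret=-22` and `op status=-22` would correspond to EINVAL, which lines up with the `InvalidArgument` / HTTP 400 seen by the client. A minimal sketch for decoding such a value from a log line follows; the helper name `decode_ret` is made up for illustration and is not part of any Ceph tooling.

```python
import errno
import re

# Log line copied from the RGW log above.
LOG_LINE = (
    "2020-09-25T22:26:04.704+0200 7f7e767ec700 0 "
    "RGWLC::RGWPutLC() failed to set entry on lc.6, ret=-22"
)


def decode_ret(line):
    """Extract `ret=<n>` from a log line and map |n| to its errno name.

    Hypothetical helper: assumes the value is a negated errno.
    Returns (code, name), or None if the line has no `ret=` field.
    """
    match = re.search(r"ret=(-?\d+)", line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, errno.errorcode.get(abs(code), "UNKNOWN")


print(decode_ret(LOG_LINE))
```

This only names the error class; it does not say which argument the backend rejected.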
The same request succeeds on the older versions.
This is causing issues with some of our Ansible playbooks which use the `s3_lifecycle` module.
I reviewed https://docs.ceph.com/en/latest/releases/octopus/ and found two changes which touch the LC (lifecycle) codebase:
- https://github.com/ceph/ceph/pull/36085
- https://github.com/ceph/ceph/pull/36018
I have minimal C coding experience - could either of these be responsible?
Updated by Matt Benjamin over 3 years ago
Thanks, Niko.
Neither of those commits would seem able to cause this. Will try your lifecycle policy and update.
Matt
Updated by Matt Benjamin over 3 years ago
no, but could you try using an alternate lifecycle document?
e.g., try:
<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Rule>
<ID>delete-7-days</ID>
<Filter>
<Prefix></Prefix>
</Filter>
<Status>Enabled</Status>
<Expiration>
<Days>7</Days>
</Expiration>
</Rule>
</LifecycleConfiguration>
We accept a <Prefix> element at the top level, as well as in a filter. I think the problem might be that you didn't send an <ID>.
Matt
Updated by Niko Smeds over 3 years ago
Okay - so I do have an update: while both client and server report errors, the lifecycle policies are still being updated.
i.e. if I specify the ID (or change the expiration days) in the JSON file, the change is stored by Ceph even though the command returns an error.
Matt, I tried your example XML but might be doing something wrong:
```
$ aws s3api --endpoint-url=https://REDACTED put-bucket-lifecycle --bucket bucketname --lifecycle-configuration file://lifecycle-upstream.xml
Error parsing parameter '--lifecycle-configuration': Expected: '=', received: '<' for input:
<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
```
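The parse error is consistent with the AWS CLI expecting JSON (or its shorthand syntax) for `--lifecycle-configuration` rather than a raw XML document. Below is a rough sketch, using only the Python standard library, that translates the XML rule suggested above into the JSON shape the CLI reads. It handles just the elements present in that example (`ID`, `Filter/Prefix`, `Status`, `Expiration/Days`), and the function name `xml_to_cli_json` is invented for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Namespace used by the S3 lifecycle XML document.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"

# XML copied from the earlier comment in this thread.
LIFECYCLE_XML = """<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Rule>
<ID>delete-7-days</ID>
<Filter>
<Prefix></Prefix>
</Filter>
<Status>Enabled</Status>
<Expiration>
<Days>7</Days>
</Expiration>
</Rule>
</LifecycleConfiguration>"""


def xml_to_cli_json(xml_text):
    """Convert a simple lifecycle XML document into a CLI-style dict.

    Hypothetical helper: only covers the fields used in this thread.
    """
    root = ET.fromstring(xml_text)
    rules = []
    for rule in root.findall(S3_NS + "Rule"):
        out = {}
        rule_id = rule.findtext(S3_NS + "ID")
        if rule_id is not None:
            out["ID"] = rule_id
        # findtext returns "" for an empty <Prefix></Prefix>, None if absent.
        prefix = rule.findtext(S3_NS + "Filter/" + S3_NS + "Prefix")
        if prefix is not None:
            out["Filter"] = {"Prefix": prefix}
        status = rule.findtext(S3_NS + "Status")
        if status is not None:
            out["Status"] = status
        days = rule.findtext(S3_NS + "Expiration/" + S3_NS + "Days")
        if days is not None:
            out["Expiration"] = {"Days": int(days)}
        rules.append(out)
    return {"Rules": rules}


print(json.dumps(xml_to_cli_json(LIFECYCLE_XML), indent=2))
```

The printed JSON can be saved to a file and passed with `file://`. Note that the `Filter` element belongs to the newer `put-bucket-lifecycle-configuration` subcommand, while the older `put-bucket-lifecycle` takes a top-level `Prefix` as in the original `lifecycle.json`. Whether either form avoids the error on 15.2.5 is untested here.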
We also updated all three Ceph RADOS gateways to the same latest version:
```
$ radosgw-admin --version
ceph version 15.2.5 (2c93eff00150f0cc5f106a559557a58d3d7b6f1f) octopus (stable)
```
Oddly enough, we now experience the issue with two of the three.
Still uncertain if this is an issue on our side or an issue with Ceph.
Updated by Niko Smeds over 3 years ago
I don't believe the issue is related to the format of the policy file. While testing I also ran `delete-bucket-lifecycle` multiple times to remove policies from the test bucket and experienced the same error.
Updated by lei cao over 3 years ago
Maybe you can raise the RGW log level to 20 and then try again; that would help locate the problem.
Updated by Casey Bodley over 3 years ago
Have you upgraded the osds to match? I would expect these errors to go away when all rgws and osds are running the latest octopus. Sorry for the inconvenience!
Updated by Niko Smeds over 3 years ago
Casey Bodley wrote:
Have you upgraded the osds to match? I would expect these errors to go away when all rgws and osds are running the latest octopus. Sorry for the inconvenience!
Sorry for the slow reply - I'm actually blocked by https://github.com/ceph/ceph-ansible/issues/5916 on updating the OSDs.
Right now the OSDs are a mixed bag.
$ sudo ceph tell osd.* version | grep version | awk '{print $2}' | sort | uniq -c
12 "15.2.2",
6 "15.2.5",
After resolving the above issue and updating all OSDs to 15.2.5 I'll try again.
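To keep an eye on this during the rolling update, here is a small sketch that parses the `sort | uniq -c` tally shown above and reports whether the OSDs are still on mixed versions. The input is the tally text exactly as pasted in this comment; `parse_tally` and `is_mixed` are hypothetical helper names, not Ceph commands.

```python
import re

# Tally copied from the `sort | uniq -c` output in this comment.
TALLY = '''12 "15.2.2",
6 "15.2.5",'''


def parse_tally(text):
    """Parse lines like `12 "15.2.2",` into {version: count}."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r'\s*(\d+)\s+"([^"]+)"', line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def is_mixed(counts):
    """True when more than one distinct version is reported."""
    return len(counts) > 1


counts = parse_tally(TALLY)
print(counts, "mixed" if is_mixed(counts) else "uniform")
```

With the tally above this reports a mixed cluster (12 OSDs on 15.2.2 and 6 on 15.2.5), which is the state the lifecycle errors were being observed in.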
Updated by Niko Smeds over 3 years ago
Casey Bodley wrote:
Have you upgraded the osds to match? I would expect these errors to go away when all rgws and osds are running the latest octopus. Sorry for the inconvenience!
Sorry for the slow progress here - we were blocked by https://github.com/ceph/ceph-ansible/issues/5819 for performing updates.
All OSDs are now running v15.2.8 and I can confirm this issue is resolved :) Thanks everyone.
(How does one close an issue, or is that action restricted to admins here?)