Bug #47655: AWS put-bucket-lifecycle command fails on the latest minor Octopus release - rgw - Ceph

open

Added by Niko Smeds over 3 years ago. Updated over 3 years ago.

- Status: Triaged
- Priority: Normal
- Target version: Ceph - v15.2.5
- Severity: 3 - minor
- ceph-qa-suite: ceph-ansible

Description

We are rebuilding some servers, and reinstalling Ceph using ceph-ansible also updated a RADOS gateway to Octopus version 15.2.5. The other gateways are running v15.2.4 and v15.2.2.

Since that update, requests to `put-bucket-lifecycle` which are received by the v15.2.5 backend fail:

```
$ cat lifecycle.json
{
  "Rules": [
    {
      "Expiration": {
        "Days": 7
      },
      "Prefix": "",
      "Status": "Enabled"
    }
  ]
}
$ aws s3api --endpoint-url=https://REDACTED put-bucket-lifecycle --bucket bucketname --lifecycle-configuration file://lifecycle.json

An error occurred (InvalidArgument) when calling the PutBucketLifecycle operation: Unknown
```

In the RGW logs we see:

```
root@REDACTED:/var/log/ceph# tail -f ceph-rgw-REDACTED.rgw0.log
2020-09-25T22:26:04.692+0200 7f7e767ec700 1 ====== starting new request req=0x7f7f7469f680 =====
2020-09-25T22:26:04.704+0200 7f7e767ec700 0 RGWLC::RGWPutLC() failed to set entry on lc.6, ret=-22
2020-09-25T22:26:04.708+0200 7f7e767ec700 1 ====== req done req=0x7f7f7469f680 op status=-22 http_status=400 latency=0.016000314s ======
2020-09-25T22:26:04.708+0200 7f7e767ec700 1 beast: 0x7f7f7469f680: 10.206.10.2 - - [2020-09-25T22:26:04.708887+0200] "PUT /bucketname?lifecycle HTTP/1.1" 400 240 - "aws-cli/2.0.38 Python/3.8.5 Darwin/19.2.0 source/x86_64 command/s3api.put-bucket-lifecycle" -
```
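For anyone reading the log above: `ret=-22` and `op status=-22` are negated errno values, and 22 is `EINVAL`, which lines up with the `InvalidArgument` / HTTP 400 that the client reports. A minimal Python sketch of that lookup follows. The `decode_ret` helper name is made up for illustration, and the sample line is the message portion of the log entry above.

```python
import errno
import os
import re


def decode_ret(log_line):
    """Return (code, errno name, message) for the first 'ret=' or
    'op status=' value found in a log line, or None if there is none."""
    match = re.search(r"(?:ret|op status)=(-?\d+)", log_line)
    if match is None:
        return None
    code = int(match.group(1))
    # Ceph reports failures as negative errno values, so look up abs(code).
    name = errno.errorcode.get(abs(code), "UNKNOWN")
    return code, name, os.strerror(abs(code))


line = "RGWLC::RGWPutLC() failed to set entry on lc.6, ret=-22"
print(decode_ret(line))
```

This only names the error class. It does not say which argument was rejected, so the higher log level suggested later in the thread is still the way to find the actual cause.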

The same request succeeds on the older versions.

This is causing issues with some of our Ansible playbooks which use the `s3_lifecycle` module.

I reviewed https://docs.ceph.com/en/latest/releases/octopus/ and found two changes which touch the LC (lifecycle) codebase:

- https://github.com/ceph/ceph/pull/36085
- https://github.com/ceph/ceph/pull/36018

I have minimal C coding experience - could either of these be responsible?

Updated by Matt Benjamin over 3 years ago

Thanks, Niko.

Neither of those commits would seem able to cause this. Will try your lifecycle policy and update.

Matt

Updated by Matt Benjamin over 3 years ago

no, but could you try using an alternate lifecycle document?

e.g., try:

    <LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
      <Rule>
        <ID>delete-7-days</ID>
        <Filter>
          <Prefix></Prefix>
        </Filter>
        <Status>Enabled</Status>
        <Expiration>
          <Days>7</Days>
        </Expiration>
      </Rule>
    </LifecycleConfiguration>

We accept a <Prefix> element at the top level, as well as in a filter. I think the problem might be that you didn't send an <ID>.

Matt

Updated by Niko Smeds over 3 years ago

Okay - so I do have an update: while both client and server report errors, the lifecycle policies are still being updated.

i.e. if I specify the ID (or change the expiration days) in the JSON file, the change is stored by Ceph even though the command returns an error.

Matt, I tried your example XML but might be doing something wrong:

```
$ aws s3api --endpoint-url=https://REDACTED put-bucket-lifecycle --bucket bucketname --lifecycle-configuration file://lifecycle-upstream.xml

Error parsing parameter '--lifecycle-configuration': Expected: '=', received: '<' for input:
<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
```
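The parse error above comes from the AWS CLI rather than from Ceph. `--lifecycle-configuration` takes JSON (or the CLI shorthand syntax), so an XML file is rejected before any request is sent. A rough Python sketch of turning the suggested XML into the equivalent JSON structure is below. It assumes only the elements in that example (`ID`, `Filter/Prefix`, `Status`, `Expiration/Days`), and the `xml_to_cli_json` name is made up for illustration. As far as I know, the `Filter` form is accepted by `put-bucket-lifecycle-configuration` rather than by the older `put-bucket-lifecycle`.

```python
import json
import xml.etree.ElementTree as ET

# Namespace used by S3 lifecycle documents, in ElementTree's {uri}tag form.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"

XML_DOC = """<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Rule>
    <ID>delete-7-days</ID>
    <Filter>
      <Prefix></Prefix>
    </Filter>
    <Status>Enabled</Status>
    <Expiration>
      <Days>7</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>"""


def xml_to_cli_json(xml_text):
    """Convert a simple lifecycle XML document into the dict shape the
    AWS CLI expects. Handles only the fields used in the example."""
    root = ET.fromstring(xml_text)
    rules = []
    for rule in root.findall(S3_NS + "Rule"):
        rules.append({
            "ID": rule.findtext(S3_NS + "ID"),
            # An empty <Prefix></Prefix> means "match every object".
            "Filter": {"Prefix": rule.findtext(f"{S3_NS}Filter/{S3_NS}Prefix") or ""},
            "Status": rule.findtext(S3_NS + "Status"),
            "Expiration": {"Days": int(rule.findtext(f"{S3_NS}Expiration/{S3_NS}Days"))},
        })
    return {"Rules": rules}


print(json.dumps(xml_to_cli_json(XML_DOC), indent=2))
```

The printed JSON can be saved to a file and passed with `file://`, the same way `lifecycle.json` was passed earlier.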

We also updated all three Ceph RADOS gateways to the same latest version:

```
$ radosgw-admin --version
ceph version 15.2.5 (2c93eff00150f0cc5f106a559557a58d3d7b6f1f) octopus (stable)
```

Oddly enough, we now experience the issue with two of the three.

Still uncertain if this is an issue on our side or an issue with Ceph.

Updated by Niko Smeds over 3 years ago

I don't believe the issue is related to format of the policy file. While testing I also ran `delete-bucket-lifecycle` multiple times to remove policies from the test bucket and experienced the same error.

Updated by lei cao over 3 years ago

Maybe you can raise the RGW log level to 20 and then try again; that will help locate the problem.

Updated by Casey Bodley over 3 years ago

Have you upgraded the osds to match? I would expect these errors to go away when all rgws and osds are running the latest octopus. Sorry for the inconvenience!

Updated by Matt Benjamin over 3 years ago

Status changed from New to Triaged

Updated by Niko Smeds over 3 years ago

Casey Bodley wrote:

> Have you upgraded the osds to match? I would expect these errors to go away when all rgws and osds are running the latest octopus. Sorry for the inconvenience!

Sorry for the slow reply - I'm actually blocked by https://github.com/ceph/ceph-ansible/issues/5916 on updating the OSDs.

Right now the OSDs are a mixed bag.

    $ sudo ceph tell osd.* version | grep version | awk '{print $2}' | sort | uniq -c
         12 "15.2.2",
          6 "15.2.5",

After resolving the above issue and updating all OSDs to 15.2.5 I'll try again.
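Since the suggestion in this thread is that mixed RGW and OSD versions are behind the errors, a quick way to spot a mixed cluster is to look at the JSON printed by `ceph versions`. The Python sketch below assumes that output shape (daemon type mapped to version string mapped to count). The sample data is illustrative only: the OSD counts mirror the ones above, the version strings are abbreviated, and the `mixed_versions` name is made up.

```python
import json
import re

# Illustrative sample only: abbreviated version strings, counts taken
# from the OSD tally in this thread plus three RGWs on 15.2.5.
SAMPLE = """
{
  "osd": {"ceph version 15.2.2 octopus (stable)": 12,
          "ceph version 15.2.5 octopus (stable)": 6},
  "rgw": {"ceph version 15.2.5 octopus (stable)": 3},
  "overall": {"ceph version 15.2.2 octopus (stable)": 12,
              "ceph version 15.2.5 octopus (stable)": 9}
}
"""


def mixed_versions(versions_json):
    """Return {daemon type: sorted release numbers} for every daemon
    type that is running more than one release."""
    report = json.loads(versions_json)
    mixed = {}
    for daemon_type, counts in report.items():
        if daemon_type == "overall":
            continue  # summary entry, not a daemon type
        releases = set()
        for version_string in counts:
            match = re.search(r"\d+\.\d+\.\d+", version_string)
            if match:
                releases.add(match.group(0))
        if len(releases) > 1:
            mixed[daemon_type] = sorted(releases)
    return mixed


print(mixed_versions(SAMPLE))
```

An empty result means every daemon type is on a single release. It does not by itself confirm that RGWs and OSDs match each other, so the per-type entries are still worth comparing by eye.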

Updated by Niko Smeds over 3 years ago

Casey Bodley wrote:

> Have you upgraded the osds to match? I would expect these errors to go away when all rgws and osds are running the latest octopus. Sorry for the inconvenience!

Sorry for the slow progress here - we were blocked by https://github.com/ceph/ceph-ansible/issues/5819 for performing updates.

All OSDs are now running v15.2.8 and I can confirm this issue is resolved :) Thanks everyone.

(How does one close an issue, or is that action restricted to admins here?)
