Bug #47451
Status: open
RGW appends control character to etags in bucket index
Description
We've encountered a semi-rare bug where rgw appends a control character to objects' etag fields in the bucket index. When s3 clients try to list such an object, the control character invalidates the xml response. We are running both nautilus and luminous clusters, and this only seems to be happening on nautilus (14.2.5 and 14.2.8).
As far as mitigation goes, `radosgw-admin bucket check --fix` removes the control character from the bucket index entry. However, I don't know how to reproduce this condition, as it seems to affect at most 1 in a million objects.
Example hex of the etag before:
00000090 25 0f 23 00 00 00 37 63 39 65 32 62 39 36 63 63 |%.#...7c9e2b96cc|
000000a0 37 65 63 36 31 35 39 37 38 65 35 66 63 61 39 33 |7ec615978e5fca93|
000000b0 35 31 35 61 33 61 2d 31 0e 07 00 00 00 36 34 34 |515a3a-1.....644|
And after running bucket check:
00000090 25 0f 22 00 00 00 37 63 39 65 32 62 39 36 63 63 |%."...7c9e2b96cc|
000000a0 37 65 63 36 31 35 39 37 38 65 35 66 63 61 39 33 |7ec615978e5fca93|
000000b0 35 31 35 61 33 61 2d 31 07 00 00 00 36 34 34 39 |515a3a-1....6449|
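The difference between the two dumps is easier to see when the etag field is decoded. The following is a minimal sketch in Python, assuming the field is a 4-byte little-endian length followed by the raw string bytes, which is what the dumps suggest (0x23 = 35 bytes before, 0x22 = 34 bytes after). The helper names `read_string` and `control_bytes` are hypothetical and not part of Ceph.

```python
import struct

def read_string(buf: bytes, offset: int = 0):
    """Decode one length-prefixed string (assumed layout: u32 LE length + bytes)."""
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    return buf[start:start + length], start + length

def control_bytes(value: bytes):
    """Return the ASCII control bytes (0x00-0x1f, 0x7f) found in value."""
    return [b for b in value if b < 0x20 or b == 0x7F]

# Etag field bytes copied from the dump taken before `bucket check --fix`.
before = bytes.fromhex(
    "23 00 00 00 37 63 39 65 32 62 39 36 63 63"
    " 37 65 63 36 31 35 39 37 38 65 35 66 63 61 39 33"
    " 35 31 35 61 33 61 2d 31 0e"
)
# Etag field bytes copied from the dump taken after the fix.
after = bytes.fromhex(
    "22 00 00 00 37 63 39 65 32 62 39 36 63 63"
    " 37 65 63 36 31 35 39 37 38 65 35 66 63 61 39 33"
    " 35 31 35 61 33 61 2d 31"
)

etag_before, _ = read_string(before)
etag_after, _ = read_string(after)

print(len(etag_before), control_bytes(etag_before))
print(len(etag_after), control_bytes(etag_after))
```

Under that assumption, the only change is the trailing 0x0e byte and the length prefix that accounts for it; the rest of the etag is identical.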
This report about a luminous bug, now closed, seems like it might be related: https://tracker.ceph.com/issues/23188
Updated by André Cruz almost 3 years ago
Was this issue ever fixed?
I have encountered this while trying to upgrade a Luminous cluster to Nautilus. We noticed it when we introduced a Nautilus OSD and RGW. The problem seems to have gone away after we disabled the Nautilus RGW, but kept the Nautilus OSD.
There is also a reference to the same issue in the mailing list: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/OWVCUXPO6U6EWKHBSGW7W5DQ6ANXT6GM/
Updated by Nick Janus almost 3 years ago
Andre, I haven't seen this issue for some time since upgrading. I suspect there was some backwards incompatibility during the upgrade that inserted these control characters. I don't know if a fix has been implemented, as I haven't spent much time digging into the root cause.
Updated by André Cruz almost 3 years ago
Hey Nick.
Originally you mentioned that you were seeing the issue on Nautilus. Which upgrade ended up fixing the issue?
Thanks.
Updated by Nick Janus almost 3 years ago
We stayed on those versions of Nautilus for 6+ months in various clusters, but our users stopped reporting the issue a couple weeks after the upgrades completed. Given the timing, I'm guessing the control characters were only written during the upgrade.
Updated by Ilsoo Byun over 2 years ago
I had the same issue. I found that getting the 'user.rgw.etag' xattr from the rados object returns the exact 32-character etag. The control character was appended only when listing a bucket.
Updated by André Cruz over 2 years ago
Ilsoo Byun wrote:
I had the same issue. I found that getting the 'user.rgw.etag' xattr from the rados object returns the exact 32-character etag. The control character was appended only when listing a bucket.
I am still having this issue when I introduce a Nautilus RGW (beast or civetweb) into a cluster that already has Nautilus OSDs, MGRs and MONs (v14.2.16).
The object metadata returned by radosgw does not show anything out of the ordinary, but listing the bucket using the goamz client library fails due to the invalid character in the etag. This only happens on the one Nautilus RGW (albeit rarely) and never on Luminous RGWs.
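To illustrate why a strict client rejects the listing, here is a minimal sketch in Python. It wraps an etag in a simplified, hypothetical fragment of a list response and checks whether a standard XML parser accepts it. The element layout and the helper name `parses_as_xml` are assumptions for illustration only; goamz itself is a Go library and its exact error is not reproduced here.

```python
import xml.etree.ElementTree as ET

def parses_as_xml(etag: str) -> bool:
    """Return True if a simplified listing entry containing etag is well-formed XML."""
    doc = f"<Contents><ETag>{etag}</ETag></Contents>"
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        # XML 1.0 does not allow most control characters, such as 0x0e.
        return False

good = "7c9e2b96cc7ec615978e5fca93515a3a-1"
bad = good + "\x0e"  # the trailing byte seen in the bucket index dump

print(parses_as_xml(good))
print(parses_as_xml(bad))
```

A single stray control byte in one etag is enough to make the whole response unparseable, which matches the observation that listing fails while per-object metadata looks normal.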
Updated by Casey Bodley over 2 years ago
- Assignee set to Marcus Watts
- Tags set to etag
Updated by André Cruz over 2 years ago
I just want to add that the issue only happened while there were Luminous and Nautilus RGWs coexisting on the same cluster. We were upgrading the cluster and were doing it in phases. We ended up switching the RGW version to Nautilus on all of them at once and the issue hasn't happened since.