Bug #50415
closed
directories with names starting with a non-ascii character disappear after reshard
Description
If a directory name contains only traditional Chinese characters, then after a reshard the directory and its contents no longer appear in listings. The objects are still accessible with "get". If the directory name has at least one ASCII character at the beginning, it does not disappear after the reshard.
Steps to reproduce:
Deploy a vstart.sh cluster with rgw, create s3cmd config and make a test bucket:
adonis:~/ceph/ceph.ci/build% RGW=1 ../src/vstart.sh -n ...
adonis:~/ceph/ceph.ci/build% cat >s3config <<EOF
[default]
host_base = localhost:8000
access_key = 0555b35654ad1656d804
secret_key = h7GhxuBLTrlhVUyxSPUKUV8r/2EI4ngqJxD7iBdBYLhwluN30JaT3Q==
bucket_location = us-east-1
check_ssl_certificate = True
check_ssl_hostname = True
default_mime_type = binary/octet-stream
delete_removed = False
dry_run = False
enable_multipart = True
encoding = UTF-8
encrypt = False
follow_symlinks = False
force = False
guess_mime_type = True
host_bucket = anything.with.three.dots
multipart_chunk_size_mb = 15
multipart_max_chunks = 10000
recursive = False
recv_chunk = 65536
send_chunk = 65536
signature_v2 = False
socket_timeout = 300
use_https = False
use_mime_magic = True
verbosity = WARNING
EOF
adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config mb s3://s3cmd-demo
Put files in a directory with ascii chars:
adonis:~/ceph/ceph.ci/build% mkdir /tmp/test
adonis:~/ceph/ceph.ci/build% echo 1 > /tmp/test/1
adonis:~/ceph/ceph.ci/build% echo 2 > /tmp/test/2
adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config put --recursive /tmp/test/ s3://s3cmd-demo/test/
upload: '/tmp/test/1' -> 's3://s3cmd-demo/test/1' [1 of 2]
 2 of 2 100% in 1s 1.03 B/s done
upload: '/tmp/test/2' -> 's3://s3cmd-demo/test/2' [2 of 2]
 2 of 2 100% in 0s 43.98 B/s done
Put the files in a directory with Chinese chars:
adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config put --recursive /tmp/test/ s3://s3cmd-demo/衛照存放區/
upload: '/tmp/test/1' -> 's3://s3cmd-demo/衛照存放區/1' [1 of 2]
 2 of 2 100% in 0s 38.68 B/s done
upload: '/tmp/test/2' -> 's3://s3cmd-demo/衛照存放區/2' [2 of 2]
 2 of 2 100% in 0s 47.38 B/s done
Check that you can see the directory and its content in the listing:
adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config ls s3://s3cmd-demo/
 DIR s3://s3cmd-demo/test/
 DIR s3://s3cmd-demo/衛照存放區/
adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config ls s3://s3cmd-demo/test/
2021-04-19 03:21 2 s3://s3cmd-demo/test/1
2021-04-19 03:21 2 s3://s3cmd-demo/test/2
adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config ls s3://s3cmd-demo/衛照存放區/
2021-04-19 03:21 2 s3://s3cmd-demo/衛照存放區/1
2021-04-19 03:21 2 s3://s3cmd-demo/衛照存放區/2
Reshard the bucket:
adonis:~/ceph/ceph.ci/build% radosgw-admin bucket reshard --bucket s3cmd-demo --num-shards 100
tenant:
bucket name: s3cmd-demo
old bucket instance id: 2d508e23-f38c-47ed-bd68-7ebb21d7d7c5.4411.1
new bucket instance id: 2d508e23-f38c-47ed-bd68-7ebb21d7d7c5.4441.1
total entries: 2
2021-04-19T04:23:58.428+0100 7fcacf75fb80 1 execute INFO: reshard of bucket "s3cmd-demo" from "s3cmd-demo:2d508e23-f38c-47ed-bd68-7ebb21d7d7c5.4411.1" to "s3cmd-demo:2d508e23-f38c-47ed-bd68-7ebb21d7d7c5.4441.1" completed successfully
Observe that the directory with Chinese characters only is not seen in the listing any more:
adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config ls s3://s3cmd-demo/
 DIR s3://s3cmd-demo/test/
adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config ls s3://s3cmd-demo/test/
2021-04-19 03:21 2 s3://s3cmd-demo/test/1
2021-04-19 03:21 2 s3://s3cmd-demo/test/2
adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config ls s3://s3cmd-demo/衛照存放區/
adonis:~/ceph/ceph.ci/build%
You can still access the objects with "get":
adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config get s3://s3cmd-demo/衛照存放區/1
download: 's3://s3cmd-demo/衛照存放區/1' -> './1' [1 of 1]
 2 of 2 100% in 0s 60.56 B/s done
Updated by Mykola Golub about 3 years ago
If the directory name starts with an ASCII character, it is not affected:
adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config put --recursive /tmp/test/ s3://s3cmd-demo/衛照存放區/
upload: '/tmp/test/1' -> 's3://s3cmd-demo/衛照存放區/1' [1 of 2]
 2 of 2 100% in 0s 27.20 B/s done
upload: '/tmp/test/2' -> 's3://s3cmd-demo/衛照存放區/2' [2 of 2]
 2 of 2 100% in 0s 33.90 B/s done
adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config put --recursive /tmp/test/ s3://s3cmd-demo/1衛照存放區/
upload: '/tmp/test/1' -> 's3://s3cmd-demo/1衛照存放區/1' [1 of 2]
 2 of 2 100% in 0s 38.06 B/s done
upload: '/tmp/test/2' -> 's3://s3cmd-demo/1衛照存放區/2' [2 of 2]
 2 of 2 100% in 0s 41.88 B/s done
adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config put --recursive /tmp/test/ s3://s3cmd-demo/a衛照存放區/
upload: '/tmp/test/1' -> 's3://s3cmd-demo/a衛照存放區/1' [1 of 2]
 2 of 2 100% in 0s 37.87 B/s done
upload: '/tmp/test/2' -> 's3://s3cmd-demo/a衛照存放區/2' [2 of 2]
 2 of 2 100% in 0s 41.51 B/s done
adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config put --recursive /tmp/test/ s3://s3cmd-demo/衛照存放區b/
upload: '/tmp/test/1' -> 's3://s3cmd-demo/衛照存放區b/1' [1 of 2]
 2 of 2 100% in 0s 38.94 B/s done
upload: '/tmp/test/2' -> 's3://s3cmd-demo/衛照存放區b/2' [2 of 2]
 2 of 2 100% in 0s 43.17 B/s done
adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config ls s3://s3cmd-demo/
 DIR s3://s3cmd-demo/1衛照存放區/
 DIR s3://s3cmd-demo/a衛照存放區/
 DIR s3://s3cmd-demo/test/
 DIR s3://s3cmd-demo/衛照存放區/
 DIR s3://s3cmd-demo/衛照存放區b/
adonis:~/ceph/ceph.ci/build% radosgw-admin bucket reshard --bucket s3cmd-demo --num-shards 110
tenant:
bucket name: s3cmd-demo
old bucket instance id: 2d508e23-f38c-47ed-bd68-7ebb21d7d7c5.4441.1
new bucket instance id: 2d508e23-f38c-47ed-bd68-7ebb21d7d7c5.4455.1
total entries: 6
2021-04-19T04:35:22.872+0100 7f0e5065ab80 1 execute INFO: reshard of bucket "s3cmd-demo" from "s3cmd-demo:2d508e23-f38c-47ed-bd68-7ebb21d7d7c5.4441.1" to "s3cmd-demo:2d508e23-f38c-47ed-bd68-7ebb21d7d7c5.4455.1" completed successfully
adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config ls s3://s3cmd-demo/
 DIR s3://s3cmd-demo/1衛照存放區/
 DIR s3://s3cmd-demo/a衛照存放區/
 DIR s3://s3cmd-demo/test/
Updated by lei cao almost 3 years ago
It's because "bi list" can't return object entries whose directory name contains only traditional Chinese characters, but "bucket list" can. It should be an OSD bug!
Updated by Mykola Golub almost 3 years ago
I think the issue is with the cls_rgw.cc:list_plain_entries function. It assumes that all plain entry names sort below the "bi prefix" region, whose keys start with BI_PREFIX_CHAR (0x80). This is not the case for plain entries that start with a non-ASCII character: they are located above the "bi prefixed" region.
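The ordering described above can be illustrated with a small Python model. It is only a sketch: it assumes index keys are compared bytewise and that object names are stored as UTF-8, and the "0_example" suffix is a hypothetical prefixed entry made up for the illustration. Only BI_PREFIX_CHAR (0x80) comes from the comment.

```python
# Toy model: treat bucket index keys as raw bytes compared bytewise.
# Assumption: names are UTF-8 encoded, so any non-ASCII first character
# yields a first byte >= 0x80.
BI_PREFIX_CHAR = b"\x80"  # value quoted in the comment above

keys = [
    b"test/1",
    "a衛照存放區/1".encode("utf-8"),
    "衛照存放區/1".encode("utf-8"),
    BI_PREFIX_CHAR + b"0_example",  # hypothetical "bi prefixed" entry
]

ordered = sorted(keys)
for key in ordered:
    side = "below" if key < BI_PREFIX_CHAR else "at or above"
    print(f"first byte 0x{key[0]:02x} sorts {side} BI_PREFIX_CHAR")
```

In this model the name that starts with a Chinese character sorts last, after the prefixed entry, so a scan that stops at BI_PREFIX_CHAR never reaches it.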
It looks like, to make list_plain_entries work, it should call cls_cxx_map_get_vals twice: first for the ASCII region, as it does currently, and then for the non-ASCII region, with start_after_key = string(BI_PREFIX_CHAR) + "9999_".
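A minimal Python sketch of the two-pass idea, under stated assumptions: get_vals is a toy stand-in for cls_cxx_map_get_vals (not its real signature), the index is modelled as a sorted list of byte strings, and the "0_example" key is hypothetical. Markers, pagination and error handling, which a real fix would need, are left out.

```python
import bisect

BI_PREFIX_CHAR = b"\x80"
# start_after_key for the second pass, as proposed in the comment above
BI_REGION_END = BI_PREFIX_CHAR + b"9999_"


def get_vals(sorted_keys, start_after, max_entries):
    """Toy stand-in: return up to max_entries keys strictly after start_after."""
    i = bisect.bisect_right(sorted_keys, start_after)
    return sorted_keys[i:i + max_entries]


def list_plain_entries(sorted_keys, max_entries=1000):
    # Pass 1: the ASCII region, i.e. keys that sort below BI_PREFIX_CHAR.
    low = [k for k in get_vals(sorted_keys, b"", max_entries)
           if k < BI_PREFIX_CHAR]
    # Pass 2: the non-ASCII region, which starts after the prefixed region.
    high = get_vals(sorted_keys, BI_REGION_END, max_entries - len(low))
    return low + high


keys = sorted([
    b"test/1",
    "a衛照存放區/1".encode("utf-8"),
    "衛照存放區/1".encode("utf-8"),
    BI_PREFIX_CHAR + b"0_example",  # hypothetical prefixed entry, must be skipped
])
listed = list_plain_entries(keys)
print([k.decode("utf-8") for k in listed])
```

With both passes, the plain entry starting with a non-ASCII character is returned and the prefixed entry is skipped.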
Updated by Mykola Golub almost 3 years ago
- Subject changed from directories with names in traditional Chinese characters only disappear after reshard to directories with names starting with a non-ascii character disappear after reshard
- Status changed from New to In Progress
- Assignee set to Mykola Golub
Updated by Mykola Golub almost 3 years ago
- Backport set to pacific,octopus,nautilus
- Pull request ID set to 40975
Updated by Mykola Golub almost 3 years ago
- Status changed from In Progress to Fix Under Review
Updated by J. Eric Ivancich almost 3 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot almost 3 years ago
- Copied to Backport #51142: octopus: directories with names starting with a non-ascii character disappear after reshard added
Updated by Backport Bot almost 3 years ago
- Copied to Backport #51143: pacific: directories with names starting with a non-ascii character disappear after reshard added
Updated by Backport Bot almost 3 years ago
- Copied to Backport #51144: nautilus: directories with names starting with a non-ascii character disappear after reshard added
Updated by Loïc Dachary almost 3 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".