Bug #50415

directories with names starting with a non-ascii character disappear after reshard

Added by Mykola Golub 2 months ago. Updated 12 days ago.

Status: Pending Backport
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific,octopus,nautilus
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

If a directory name contains only traditional Chinese characters, the directory and its contents no longer appear in listings after a reshard. The objects are still accessible with "get". If the directory name starts with an ASCII character, it does not disappear after the reshard.

Steps to reproduce:

Deploy a vstart.sh cluster with RGW, create an s3cmd config, and make a test bucket:

adonis:~/ceph/ceph.ci/build% RGW=1 ../src/vstart.sh -n
...
adonis:~/ceph/ceph.ci/build% cat >s3config <<EOF
[default]
host_base = localhost:8000
access_key = 0555b35654ad1656d804
secret_key = h7GhxuBLTrlhVUyxSPUKUV8r/2EI4ngqJxD7iBdBYLhwluN30JaT3Q==
bucket_location = us-east-1
check_ssl_certificate = True
check_ssl_hostname = True
default_mime_type = binary/octet-stream
delete_removed = False
dry_run = False
enable_multipart = True
encoding = UTF-8
encrypt = False
follow_symlinks = False
force = False
guess_mime_type = True
host_bucket = anything.with.three.dots
multipart_chunk_size_mb = 15
multipart_max_chunks = 10000
recursive = False
recv_chunk = 65536
send_chunk = 65536
signature_v2 = False
socket_timeout = 300
use_https = False
use_mime_magic = True
verbosity = WARNING
EOF

adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config mb s3://s3cmd-demo

Put files into a directory whose name uses ASCII characters:

adonis:~/ceph/ceph.ci/build% mkdir /tmp/test
adonis:~/ceph/ceph.ci/build% echo 1 > /tmp/test/1
adonis:~/ceph/ceph.ci/build% echo 2 > /tmp/test/2
adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config put --recursive /tmp/test/ s3://s3cmd-demo/test/
upload: '/tmp/test/1' -> 's3://s3cmd-demo/test/1'  [1 of 2]
 2 of 2   100% in    1s     1.03 B/s  done
upload: '/tmp/test/2' -> 's3://s3cmd-demo/test/2'  [2 of 2]
 2 of 2   100% in    0s    43.98 B/s  done

Put the same files into a directory whose name uses only Chinese characters:

adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config put --recursive /tmp/test/ s3://s3cmd-demo/衛照存放區/
upload: '/tmp/test/1' -> 's3://s3cmd-demo/衛照存放區/1'  [1 of 2]
 2 of 2   100% in    0s    38.68 B/s  done
upload: '/tmp/test/2' -> 's3://s3cmd-demo/衛照存放區/2'  [2 of 2]
 2 of 2   100% in    0s    47.38 B/s  done

Check that you can see the directory and its content in the listing:

adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config ls s3://s3cmd-demo/
                       DIR   s3://s3cmd-demo/test/
                       DIR   s3://s3cmd-demo/衛照存放區/
adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config ls s3://s3cmd-demo/test/
2021-04-19 03:21         2   s3://s3cmd-demo/test/1
2021-04-19 03:21         2   s3://s3cmd-demo/test/2
adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config ls s3://s3cmd-demo/衛照存放區/
2021-04-19 03:21         2   s3://s3cmd-demo/衛照存放區/1
2021-04-19 03:21         2   s3://s3cmd-demo/衛照存放區/2

Reshard the bucket:

adonis:~/ceph/ceph.ci/build% radosgw-admin  bucket reshard --bucket s3cmd-demo --num-shards 100
tenant: 
bucket name: s3cmd-demo
old bucket instance id: 2d508e23-f38c-47ed-bd68-7ebb21d7d7c5.4411.1
new bucket instance id: 2d508e23-f38c-47ed-bd68-7ebb21d7d7c5.4441.1
total entries: 2
2021-04-19T04:23:58.428+0100 7fcacf75fb80  1 execute INFO: reshard of bucket "s3cmd-demo" from "s3cmd-demo:2d508e23-f38c-47ed-bd68-7ebb21d7d7c5.4411.1" to "s3cmd-demo:2d508e23-f38c-47ed-bd68-7ebb21d7d7c5.4441.1" completed successfully

Observe that the directory named only with Chinese characters no longer appears in the listing:

adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config ls s3://s3cmd-demo/
                       DIR   s3://s3cmd-demo/test/
adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config ls s3://s3cmd-demo/test/
2021-04-19 03:21         2   s3://s3cmd-demo/test/1
2021-04-19 03:21         2   s3://s3cmd-demo/test/2
adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config ls s3://s3cmd-demo/衛照存放區/
adonis:~/ceph/ceph.ci/build% 

The objects can still be accessed with "get":

adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config get s3://s3cmd-demo/衛照存放區/1
download: 's3://s3cmd-demo/衛照存放區/1' -> './1'  [1 of 1]
 2 of 2   100% in    0s    60.56 B/s  done


Related issues

Copied to rgw - Backport #51142: octopus: directories with names starting with a non-ascii character disappear after reshard In Progress
Copied to rgw - Backport #51143: pacific: directories with names starting with a non-ascii character disappear after reshard In Progress
Copied to rgw - Backport #51144: nautilus: directories with names starting with a non-ascii character disappear after reshard Resolved

History

#1 Updated by Mykola Golub 2 months ago

If the directory name starts with an ASCII character, it is not affected:

adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config put --recursive /tmp/test/ s3://s3cmd-demo/衛照存放區/
upload: '/tmp/test/1' -> 's3://s3cmd-demo/衛照存放區/1'  [1 of 2]
 2 of 2   100% in    0s    27.20 B/s  done
upload: '/tmp/test/2' -> 's3://s3cmd-demo/衛照存放區/2'  [2 of 2]
 2 of 2   100% in    0s    33.90 B/s  done
adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config put --recursive /tmp/test/ s3://s3cmd-demo/1衛照存放區/
upload: '/tmp/test/1' -> 's3://s3cmd-demo/1衛照存放區/1'  [1 of 2]
 2 of 2   100% in    0s    38.06 B/s  done
upload: '/tmp/test/2' -> 's3://s3cmd-demo/1衛照存放區/2'  [2 of 2]
 2 of 2   100% in    0s    41.88 B/s  done
adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config put --recursive /tmp/test/ s3://s3cmd-demo/a衛照存放區/
upload: '/tmp/test/1' -> 's3://s3cmd-demo/a衛照存放區/1'  [1 of 2]
 2 of 2   100% in    0s    37.87 B/s  done
upload: '/tmp/test/2' -> 's3://s3cmd-demo/a衛照存放區/2'  [2 of 2]
 2 of 2   100% in    0s    41.51 B/s  done
adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config put --recursive /tmp/test/ s3://s3cmd-demo/衛照存放區b/
upload: '/tmp/test/1' -> 's3://s3cmd-demo/衛照存放區b/1'  [1 of 2]
 2 of 2   100% in    0s    38.94 B/s  done
upload: '/tmp/test/2' -> 's3://s3cmd-demo/衛照存放區b/2'  [2 of 2]
 2 of 2   100% in    0s    43.17 B/s  done
adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config ls s3://s3cmd-demo/
                       DIR   s3://s3cmd-demo/1衛照存放區/
                       DIR   s3://s3cmd-demo/a衛照存放區/
                       DIR   s3://s3cmd-demo/test/
                       DIR   s3://s3cmd-demo/衛照存放區/
                       DIR   s3://s3cmd-demo/衛照存放區b/
adonis:~/ceph/ceph.ci/build% radosgw-admin  bucket reshard --bucket s3cmd-demo --num-shards 110
tenant: 
bucket name: s3cmd-demo
old bucket instance id: 2d508e23-f38c-47ed-bd68-7ebb21d7d7c5.4441.1
new bucket instance id: 2d508e23-f38c-47ed-bd68-7ebb21d7d7c5.4455.1
total entries: 6
2021-04-19T04:35:22.872+0100 7f0e5065ab80  1 execute INFO: reshard of bucket "s3cmd-demo" from "s3cmd-demo:2d508e23-f38c-47ed-bd68-7ebb21d7d7c5.4441.1" to "s3cmd-demo:2d508e23-f38c-47ed-bd68-7ebb21d7d7c5.4455.1" completed successfully
adonis:~/ceph/ceph.ci/build% s3cmd --config=s3config ls s3://s3cmd-demo/
                       DIR   s3://s3cmd-demo/1衛照存放區/
                       DIR   s3://s3cmd-demo/a衛照存放區/
                       DIR   s3://s3cmd-demo/test/
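The entries that survive and the entries that vanish in the listings above split cleanly on the first byte of each name's UTF-8 encoding. The short Python check below is only an illustration, not Ceph code; the 0x80 threshold is the BI_PREFIX_CHAR value quoted in comment #3. Every name whose first byte is below 0x80 is still listed after the reshard, and both names starting with a Chinese character (a multi-byte UTF-8 sequence whose first byte is 0x80 or higher) are gone.

```python
# Directory names listed before the second reshard above.
names_before = ["1衛照存放區/", "a衛照存放區/", "test/", "衛照存放區/", "衛照存放區b/"]

# Directory names still listed after the reshard.
names_after = {"1衛照存放區/", "a衛照存放區/", "test/"}

BI_PREFIX_CHAR = 0x80  # value quoted from cls_rgw.cc in comment #3


def first_byte(name: str) -> int:
    """Return the first byte of the name's UTF-8 encoding."""
    return name.encode("utf-8")[0]


for name in names_before:
    b = first_byte(name)
    predicted = "kept" if b < BI_PREFIX_CHAR else "lost"
    observed = "kept" if name in names_after else "lost"
    print(f"{name!r:20} first byte 0x{b:02x}  predicted {predicted}  observed {observed}")
```

In this data the prediction from the first byte matches the observed listing for every name.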

#2 Updated by lei cao 2 months ago

It's because "bi list" can't return object entries whose directory name contains only traditional Chinese characters, but "bucket list" can. It should be an OSD bug!

#3 Updated by Mykola Golub 2 months ago

I think the issue is with the list_plain_entries function in cls_rgw.cc. It assumes that all plain entry names are less than the "bi prefix", which starts with BI_PREFIX_CHAR (0x80). This is not the case for plain entries that start with a non-ASCII character: their UTF-8 encoding begins with a byte of 0xC2 or higher, so they are located above the "bi prefixed" region.
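The ordering problem can be reproduced with plain byte strings. The Python model below is a simplified sketch, not Ceph code: it assumes omap keys are ordered byte-wise (which is what a 0x80 boundary relies on), and the instance-index key layout (BI_PREFIX_CHAR followed by "1000_" and the object name) is an assumption for illustration only. Sorting the keys shows the Chinese-named objects landing after the whole 0x80-prefixed region, so a scan that stops at the first key not below 0x80 never reaches them.

```python
BI_PREFIX = b"\x80"  # BI_PREFIX_CHAR (0x80), as quoted in comment #3

# Hypothetical, simplified index keys: plain entries are the UTF-8 object
# names; the "1000_" instance-index form is assumed for illustration only.
keys = sorted([
    "test/1".encode("utf-8"),
    "test/2".encode("utf-8"),
    "衛照存放區/1".encode("utf-8"),
    "衛照存放區/2".encode("utf-8"),
    BI_PREFIX + b"1000_test/1",
    BI_PREFIX + b"1000_test/2",
])


def plain_entries_below_prefix(sorted_keys):
    """Model of the single-pass assumption: stop at the first key that is
    not below the bi prefix and treat everything after it as non-plain."""
    out = []
    for k in sorted_keys:
        if k >= BI_PREFIX:
            break
        out.append(k.decode("utf-8"))
    return out


# Show the first byte of each key in sorted order, then the scan result.
for k in keys:
    print(k[:1].hex(), k)
print(plain_entries_below_prefix(keys))
```

The scan returns only the two "test/" entries, even though the Chinese-named keys are present in the sorted key space.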

It looks like, to make list_plain_entries work, it should call cls_cxx_map_get_vals twice: first for the ASCII region, as it does currently, and then for the non-ASCII region, with start_after_key = string(BI_PREFIX_CHAR) + "9999_".
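The two-call approach can be modelled in a few lines of Python. This is a hedged sketch, not the patch from pull request 40975: get_vals is a hypothetical stand-in for cls_cxx_map_get_vals that returns up to max_entries keys strictly after start_after, the key layout is a simplified assumption, and marker-based pagination is omitted. It also assumes, as the comment implies, that "9999_" is the highest bi index prefix, so BI_PREFIX_CHAR + "9999_" sits above every bi-prefixed key and below every non-ASCII UTF-8 name.

```python
import bisect

BI_PREFIX = b"\x80"                  # BI_PREFIX_CHAR, quoted in comment #3
BI_LAST = BI_PREFIX + b"9999_"       # start_after_key for the second pass

# Hypothetical, simplified omap contents, sorted byte-wise.
OMAP = sorted([
    b"test/1",
    b"test/2",
    BI_PREFIX + b"1000_test/1",      # assumed instance-index key layout
    "衛照存放區/1".encode("utf-8"),
    "衛照存放區/2".encode("utf-8"),
])


def get_vals(start_after: bytes, max_entries: int) -> list:
    """Hypothetical stand-in for cls_cxx_map_get_vals: up to max_entries
    keys strictly greater than start_after."""
    i = bisect.bisect_right(OMAP, start_after)
    return OMAP[i:i + max_entries]


def list_plain_entries(max_entries: int = 1000) -> list:
    out = []
    # Pass 1: the ASCII region, i.e. keys below the bi prefix.
    for k in get_vals(b"", max_entries):
        if k >= BI_PREFIX:
            break
        out.append(k)
    # Pass 2: the non-ASCII region, i.e. keys after the last bi index prefix.
    remaining = max_entries - len(out)
    if remaining > 0:
        out.extend(get_vals(BI_LAST, remaining))
    return [k.decode("utf-8") for k in out]


print(list_plain_entries())
```

With both passes, the Chinese-named entries are returned alongside the ASCII ones, while the 0x80-prefixed index key is still skipped.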

#4 Updated by Mykola Golub about 2 months ago

  • Subject changed from "directories with names in traditional Chinese characters only disappear after reshard" to "directories with names starting with a non-ascii character disappear after reshard"
  • Status changed from New to In Progress
  • Assignee set to Mykola Golub

#5 Updated by Mykola Golub about 2 months ago

  • Backport set to pacific,octopus,nautilus
  • Pull request ID set to 40975

#6 Updated by Mykola Golub about 2 months ago

  • Status changed from In Progress to Fix Under Review

#7 Updated by J. Eric Ivancich 12 days ago

  • Status changed from Fix Under Review to Pending Backport

#8 Updated by Backport Bot 12 days ago

  • Copied to Backport #51142: octopus: directories with names starting with a non-ascii character disappear after reshard added

#9 Updated by Backport Bot 12 days ago

  • Copied to Backport #51143: pacific: directories with names starting with a non-ascii character disappear after reshard added

#10 Updated by Backport Bot 12 days ago

  • Copied to Backport #51144: nautilus: directories with names starting with a non-ascii character disappear after reshard added
