Bug #17040
Files incorrectly uploaded with Swift API
Description
Hello,
I am trying to upload a file of size 55042267 bytes to an S3 bucket backed by an EC pool, using the Swift API. I have approximately 5TB used and around 250TB available; ceph/rgw version is 0.94.7. This particular file size is not the issue; I've seen it happen to other files of similar size (30-500MB).
The file upload goes through without any issues, but I cannot download the file I just uploaded.
I am getting the following error messages in the logs:
ERROR: got unexpected error when trying to read object: -2
I've done a bit of digging, and it turns out that the last stripe of the file is larger than the maximum stripe size of RGW. How can that possibly happen?
It is also a bit suspicious that the head size is slightly smaller than 512kB (I saw it at exactly 512kB on a different installation). As this is a production installation, what I've done so far is write a script to recover damaged files. The script downloads the object's metadata, parses it, downloads the last stripe, splits it into two parts, [max_stripe_size] and [remainder], and re-uploads them using the rados CLI tool. The file downloads correctly after that and the MD5 sums match.
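To illustrate the repair, here is a minimal Python sketch of the split-and-reupload step. The pool and object names, temp paths, and the `repair` helper are hypothetical; the real stripe object names come from the manifest prefix shown by `radosgw-admin object stat`, and the stripe size comes from the manifest's `stripe_max_size`.

```python
# Sketch of the recovery approach described above: split an oversized
# tail stripe at the max stripe boundary and re-upload both halves.
import subprocess

MAX_STRIPE_SIZE = 4194304  # stripe_max_size from the object manifest


def split_last_stripe(data: bytes, max_stripe: int = MAX_STRIPE_SIZE):
    """Split an oversized tail stripe into a full stripe and the remainder."""
    return data[:max_stripe], data[max_stripe:]


def repair(pool: str, last_stripe: str, next_stripe: str) -> None:
    """Hypothetical helper: fetch, split, and re-upload via the rados CLI."""
    # Download the oversized last stripe object.
    subprocess.run(["rados", "-p", pool, "get", last_stripe, "/tmp/last"],
                   check=True)
    with open("/tmp/last", "rb") as f:
        data = f.read()
    full, remainder = split_last_stripe(data)
    with open("/tmp/full", "wb") as f:
        f.write(full)
    with open("/tmp/rest", "wb") as f:
        f.write(remainder)
    # Overwrite the last stripe with the correctly sized part, and write
    # the remainder as the next stripe in the sequence.
    subprocess.run(["rados", "-p", pool, "put", last_stripe, "/tmp/full"],
                   check=True)
    subprocess.run(["rados", "-p", pool, "put", next_stripe, "/tmp/rest"],
                   check=True)
```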
I tried setting up an extra RGW and downgrading it to 0.94.5, but that did not help. I also tried standing up another Ceph 0.94.7 cluster in a separate environment with an EC pool or a replicated pool for .rgw.buckets, but could not reproduce the issue there.
Interestingly, it does not happen for all files:
$ ls -l 456 457
-rw-r--r-- 1 eugene eugene 55042267 Aug 10 16:00 456
-rw-r--r-- 1 eugene eugene 55054267 Aug 11 17:16 457
$ swift -A http://ceph-rgw-1/auth -U evgeny:swift -K *** upload evgeny-test 456
456
$ swift -A http://ceph-rgw-1/auth -U evgeny:swift -K *** upload evgeny-test 457
457
$ s3cmd get s3://evgeny-test/456
s3://evgeny-test/456 -> ./456  [1 of 1]
  4702208 of 55042267     8% in    0s    18.46 MB/s
# GETS STUCK FOREVER and I have to CTRL+C
$ s3cmd get s3://evgeny-test/457
s3://evgeny-test/457 -> ./457  [1 of 1]
 55054267 of 55054267   100% in    2s    21.87 MB/s  done
# ALL GOOD
Failed object radosgw-admin object stat output:
{
  "name": "456",
  "size": 55042267,
  "policy": { ... irrelevant ... },
  "etag": "ec3a295eeae0197f43a46885cd3fdcf4\u0000",
  "tag": "default.547935.122010\u0000",
  "manifest": {
    "objs": [],
    "obj_size": 55042267,
    "explicit_objs": "false",
    "head_obj": {
      "bucket": {
        "name": "evgeny-test",
        "pool": ".rgw.buckets",
        "data_extra_pool": ".rgw.buckets.extra",
        "index_pool": ".rgw.buckets.index",
        "marker": "default.474762.1",
        "bucket_id": "default.474762.1"
      },
      "key": "",
      "ns": "",
      "object": "456",
      "instance": ""
    },
    "head_size": 510880,
    "max_head_size": 510880,
    "prefix": ".3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_",
    "tail_bucket": {
      "name": "evgeny-test",
      "pool": ".rgw.buckets",
      "data_extra_pool": ".rgw.buckets.extra",
      "index_pool": ".rgw.buckets.index",
      "marker": "default.474762.1",
      "bucket_id": "default.474762.1"
    },
    "rules": [
      {
        "key": 0,
        "val": {
          "start_part_num": 0,
          "start_ofs": 510880,
          "part_size": 0,
          "stripe_max_size": 4194304,
          "override_prefix": ""
        }
      }
    ]
  },
  "attrs": {
    "user.rgw.x-amz-meta-mtime": "1470808854.547306\u0000"
  }
}
Rados CLI tool stats (look at stripe #13):
# head stat
$ rados -p .rgw.buckets stat default.474762.1_456
.rgw.buckets/default.474762.1_456 mtime 2016-08-16 18:20:11.000000, size 510880

# stripes stat
$ i=1; while rados -p .rgw.buckets stat default.474762.1__shadow_.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_$i; do i=$[i+1]; done
.rgw.buckets/default.474762.1__shadow_.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_1 mtime 2016-08-16 18:20:11.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_2 mtime 2016-08-16 18:20:11.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_3 mtime 2016-08-16 18:20:11.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_4 mtime 2016-08-16 18:20:11.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_5 mtime 2016-08-16 18:20:11.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_6 mtime 2016-08-16 18:20:11.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_7 mtime 2016-08-16 18:20:11.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_8 mtime 2016-08-16 18:20:11.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_9 mtime 2016-08-16 18:20:11.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_10 mtime 2016-08-16 18:20:11.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_11 mtime 2016-08-16 18:20:11.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_12 mtime 2016-08-16 18:20:11.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_13 mtime 2016-08-16 18:20:11.000000, size 4199739
error stat-ing .rgw.buckets/default.474762.1__shadow_.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_14: (2) No such file or directory
Just out of interest, the 457 file that works fine has the following stats:
{
  ... skipped irrelevant data ...
  "etag": "b70585007ca1a95ddf89fe95550ccd85\u0000",
  "tag": "default.547935.122023\u0000",
  "manifest": {
    "objs": [],
    "obj_size": 55054267,
    "explicit_objs": "false",
    "head_obj": {
      "bucket": {
        "name": "evgeny-test",
        "pool": ".rgw.buckets",
        "data_extra_pool": ".rgw.buckets.extra",
        "index_pool": ".rgw.buckets.index",
        "marker": "default.474762.1",
        "bucket_id": "default.474762.1"
      },
      "key": "",
      "ns": "",
      "object": "457",
      "instance": ""
    },
    "head_size": 510880,
    "max_head_size": 510880,
    "prefix": ".ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_",
    "tail_bucket": {
      "name": "evgeny-test",
      "pool": ".rgw.buckets",
      "data_extra_pool": ".rgw.buckets.extra",
      "index_pool": ".rgw.buckets.index",
      "marker": "default.474762.1",
      "bucket_id": "default.474762.1"
    },
    "rules": [
      {
        "key": 0,
        "val": {
          "start_part_num": 0,
          "start_ofs": 510880,
          "part_size": 0,
          "stripe_max_size": 4194304,
          "override_prefix": ""
        }
      }
    ]
  },
  "attrs": {
    "user.rgw.x-amz-meta-mtime": "1470899785.795187\u0000"
  }
}

# head
$ rados -p .rgw.buckets stat default.474762.1_457
.rgw.buckets/default.474762.1_457 mtime 2016-08-16 18:20:36.000000, size 510880
# all good here.
$ i=1; while rados -p .rgw.buckets stat default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_$i; do i=$[i+1]; done
.rgw.buckets/default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_1 mtime 2016-08-16 18:20:36.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_2 mtime 2016-08-16 18:20:36.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_3 mtime 2016-08-16 18:20:36.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_4 mtime 2016-08-16 18:20:36.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_5 mtime 2016-08-16 18:20:36.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_6 mtime 2016-08-16 18:20:36.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_7 mtime 2016-08-16 18:20:36.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_8 mtime 2016-08-16 18:20:36.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_9 mtime 2016-08-16 18:20:36.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_10 mtime 2016-08-16 18:20:36.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_11 mtime 2016-08-16 18:20:36.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_12 mtime 2016-08-16 18:20:36.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_13 mtime 2016-08-16 18:20:36.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_14 mtime 2016-08-16 18:20:36.000000, size 17435
error stat-ing .rgw.buckets/default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_15: (2) No such file or directory
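The two manifests can be cross-checked arithmetically. This sketch uses only the numbers taken from the stat output above; it shows that object 456's 13th stripe contains an extra 5435-byte tail that, by the pattern of the healthy object 457, should have been written as a separate 14th stripe.

```python
# Cross-check the observed stripe sizes against the manifests above.
HEAD_SIZE = 510880    # head_size in both manifests
STRIPE_MAX = 4194304  # stripe_max_size in both manifests

# Broken object 456: twelve full stripes plus an oversized 13th stripe.
sizes_456 = [STRIPE_MAX] * 12 + [4199739]
assert HEAD_SIZE + sum(sizes_456) == 55042267  # matches obj_size

# The 13th stripe is exactly one full stripe plus a 5435-byte tail,
# i.e. the would-be 14th stripe got merged into stripe 13.
assert 4199739 == STRIPE_MAX + 5435

# Healthy object 457: thirteen full stripes plus a small 14th stripe.
sizes_457 = [STRIPE_MAX] * 13 + [17435]
assert HEAD_SIZE + sum(sizes_457) == 55054267  # matches obj_size
```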
Ceph configuration:
[global]
fsid = *****48
mon_initial_members = ******52
mon_host = x.x.x.235
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
public_network = x.x.x.0/24
max_open_files = 131072
osd_pool_default_size = 3
osd_pool_default_min_size = 2
osd_pool_default_pg_num = 128
osd_pool_default_pgp_num = 128
rgw_gc_max_objs = 512
rgw_gc_obj_min_wait = 1200
rgw_gc_processor_max_time = 600
rgw_gc_processor_period = 600
rgw_override_bucket_index_max_shards = 8
osd_pool_erasure_code_stripe_width = 16384
mon_osd_min_down_reporters = 2
mon_osd_min_down_reports = 4

[osd]
osd_journal_size = 10240

[client.radosgw.gw1]
host = ******55
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw_print_continue = false
rgw_thread_pool_size = 300
rgw_frontends = "civetweb port=80"
rgw_dns_name = "ceph-rgw-1"
debug_rgw = 0/5
rgw_enable_ops_log = true
rgw_enable_usage_log = true

[client.radosgw.gw2]
host = ******56
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw_print_continue = false
rgw_thread_pool_size = 300
rgw_frontends = "civetweb port=80"
rgw_dns_name = "ceph-rgw-2"
debug_rgw = 0/5
rgw_enable_ops_log = true
rgw_enable_usage_log = true
Thanks for looking,
Evgeny.
History
#1 Updated by Evgeny Gridasov over 7 years ago
Apologies for the formatting: EC pool is 5,2 in production case and 2,1 in testing case.
#2 Updated by Evgeny Gridasov over 7 years ago
I am now able to reproduce it on a brand-new installation of Ceph v0.94.7, EC 5,2 and exactly the same settings as production (except for a reduced journal size). I have 4 VMs which roughly replicate our production environment with 90 OSDs:
VM1 = RGW
VM2 = MON + 30x OSD
VM3 = MON + 30x OSD
VM4 = MON + 30x OSD
I had to adjust the CRUSH map to make it look like I have 9 physical hosts (i.e. changed from 30 OSDs per host to 10 OSDs per host and created additional fake hosts); this was necessary to get erasure coding 5,2 working. I tried both the isa EC library (which is in prod) and jerasure, but it makes no difference. The behaviour is exactly as described above: the file seems to upload OK, but I cannot download it.
#3 Updated by Evgeny Gridasov over 7 years ago
I have found out what is causing the behaviour described above:
osd_pool_erasure_code_stripe_width = 16384
When I set it to the default of 4096, I am no longer able to reproduce the issue.
Cluster size does not seem to matter. I've deployed a smaller 18-OSD cluster with a modified CRUSH map and am still able to reproduce the issue with EC 5,2 and the EC stripe width set to 16384, and unable to reproduce it with the EC stripe width set to 4096.
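For anyone hitting the same symptoms, the finding above suggests a simple workaround: remove the non-default override from ceph.conf (or set it back to the default) before creating the EC data pool. Note this is a sketch of the workaround, not a fix; as far as I can tell the stripe width is applied when the pool is created, so changing it afterwards does not repair objects that are already broken.

```ini
# ceph.conf: keep the default EC stripe width
# (either delete the override entirely or set it back to 4096)
[global]
osd_pool_erasure_code_stripe_width = 4096
```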
I am not really sure if this bug is RGW related anymore.
#4 Updated by Orit Wasserman over 7 years ago
Can you read the objects using the Swift API?
#5 Updated by Orit Wasserman over 7 years ago
I think it is related to http://tracker.ceph.com/issues/15886
hammer fix (will be in 0.94.8): https://github.com/ceph/ceph/pull/9240
#6 Updated by Evgeny Gridasov over 7 years ago
Orit, no I can't. I get the same error with the Swift API.
#7 Updated by Orit Wasserman over 7 years ago
- Status changed from New to Duplicate
duplicate of #15745
#8 Updated by Vikhyat Umrao over 7 years ago
Fixed in 0.94.8.
$ git tag --contains 4eea92b4a50e06d850c39009d122f8bbbd48c857
v0.94.8