Bug #17040

Files incorrectly uploaded with Swift API

Added by Evgeny Gridasov over 7 years ago. Updated over 7 years ago.

Status:
Duplicate
Priority:
High
Assignee:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello,

I am trying to upload a file of size 55042267 bytes to an S3 bucket backed by an EC pool using the Swift API. I have approximately 5 TB used and around 250 TB available; ceph/rgw version = 0.94.7. This particular file size is not the issue; I've seen it happen to other files of similar size (30-500 MB).

The file upload goes through without any issues, but I cannot download the file I just uploaded.
I am getting the following error message in the logs:

ERROR: got unexpected error when trying to read object: -2

I've done a bit of digging and it turns out that the last stripe of the file is larger than the maximum stripe size of rgw. How can that possibly happen?
It is also a bit suspicious that the head size is slightly smaller than 512 kB (I have seen it at exactly 512 kB on a different installation). As this is a production installation, what I've done so far is write a script to recover the damaged files. The script downloads the metadata, parses it, downloads the last stripe, splits it into two parts, [stripe_max_size] and [remainder], and re-uploads them using the rados CLI tool. The file downloads after that and the md5 sums match.
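For reference, a minimal sketch of that recovery approach, assuming the naming and sizes shown for the 456 example further down (pool, marker, prefix, stripe index and stripe_max_size are taken from that stat output; this illustrates the procedure, it is not the actual script):

#!/bin/bash
# Sketch only: repair an object whose last stripe exceeds stripe_max_size.
POOL=.rgw.buckets
MARKER=default.474762.1
PREFIX=.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_
N=13                # index of the oversized stripe
MAX=4194304         # stripe_max_size from the manifest
OBJ="${MARKER}__shadow_${PREFIX}${N}"

# download the oversized stripe
rados -p "$POOL" get "$OBJ" stripe.bin

# split it into [stripe_max_size] and [remainder]
head -c "$MAX" stripe.bin > stripe.head
tail -c +"$((MAX + 1))" stripe.bin > stripe.tail

# re-upload: rewrite stripe N at exactly MAX bytes, remainder becomes stripe N+1
rados -p "$POOL" put "$OBJ" stripe.head
rados -p "$POOL" put "${MARKER}__shadow_${PREFIX}$((N + 1))" stripe.tail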

I tried to set up an extra rgw and downgraded it to 0.94.5, but it did not help. I also tried to stand up another Ceph 0.94.7 cluster in a separate environment with an EC pool or a replicated pool for .rgw.buckets, but could not reproduce the issue there.

Interestingly, it does not happen for all files:

$ ls -l 456 457
-rw-r--r-- 1 eugene eugene 55042267 Aug 10 16:00 456
-rw-r--r-- 1 eugene eugene 55054267 Aug 11 17:16 457

$ swift -A http://ceph-rgw-1/auth -U evgeny:swift -K *** upload evgeny-test 456
456
$ swift -A http://ceph-rgw-1/auth -U evgeny:swift -K *** upload evgeny-test 457
457

$ s3cmd get s3://evgeny-test/456
s3://evgeny-test/456 -> ./456  [1 of 1]
  4702208 of 55042267     8% in    0s    18.46 MB/s  # GETS STUCK FOREVER and I have to CTRL+C

$ s3cmd get s3://evgeny-test/457
s3://evgeny-test/457 -> ./457  [1 of 1]
 55054267 of 55054267   100% in    2s    21.87 MB/s  done  # ALL GOOD

radosgw-admin object stat output for the failed object:

{
    "name": "456",
    "size": 55042267,
    "policy": {
         ... irrelevant ...
    },
    "etag": "ec3a295eeae0197f43a46885cd3fdcf4\u0000",
    "tag": "default.547935.122010\u0000",
    "manifest": {
        "objs": [],
        "obj_size": 55042267,
        "explicit_objs": "false",
        "head_obj": {
            "bucket": {
                "name": "evgeny-test",
                "pool": ".rgw.buckets",
                "data_extra_pool": ".rgw.buckets.extra",
                "index_pool": ".rgw.buckets.index",
                "marker": "default.474762.1",
                "bucket_id": "default.474762.1" 
            },
            "key": "",
            "ns": "",
            "object": "456",
            "instance": "" 
        },
        "head_size": 510880,
        "max_head_size": 510880,
        "prefix": ".3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_",
        "tail_bucket": {
            "name": "evgeny-test",
            "pool": ".rgw.buckets",
            "data_extra_pool": ".rgw.buckets.extra",
            "index_pool": ".rgw.buckets.index",
            "marker": "default.474762.1",
            "bucket_id": "default.474762.1" 
        },
        "rules": [
            {
                "key": 0,
                "val": {
                    "start_part_num": 0,
                    "start_ofs": 510880,
                    "part_size": 0,
                    "stripe_max_size": 4194304,
                    "override_prefix": "" 
                }
            }
        ]
    },
    "attrs": {
        "user.rgw.x-amz-meta-mtime": "1470808854.547306\u0000" 
    }
}

Rados CLI tool stats (look at stripe #13):

# head stat
$ rados -p .rgw.buckets stat default.474762.1_456
.rgw.buckets/default.474762.1_456 mtime 2016-08-16 18:20:11.000000, size 510880

# stripes stat
$ i=1; while rados -p .rgw.buckets stat default.474762.1__shadow_.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_$i; do i=$[i+1]; done;
.rgw.buckets/default.474762.1__shadow_.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_1 mtime 2016-08-16 18:20:11.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_2 mtime 2016-08-16 18:20:11.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_3 mtime 2016-08-16 18:20:11.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_4 mtime 2016-08-16 18:20:11.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_5 mtime 2016-08-16 18:20:11.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_6 mtime 2016-08-16 18:20:11.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_7 mtime 2016-08-16 18:20:11.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_8 mtime 2016-08-16 18:20:11.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_9 mtime 2016-08-16 18:20:11.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_10 mtime 2016-08-16 18:20:11.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_11 mtime 2016-08-16 18:20:11.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_12 mtime 2016-08-16 18:20:11.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_13 mtime 2016-08-16 18:20:11.000000, size 4199739
 error stat-ing .rgw.buckets/default.474762.1__shadow_.3fPWQ7jFRNpK8do-EvT8L4ZgDTkdIrx_14: (2) No such file or directory
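As a sanity check against the manifest above (obj_size 55042267, head_size 510880, stripe_max_size 4194304), the tail data left after 12 full stripes should have been written as a full 13th stripe plus a small 14th one; instead stripe 13 has absorbed the extra 5435 bytes:

$ echo $(( 55042267 - 510880 - 12*4194304 ))   # tail data left after 12 full stripes
4199739
$ echo $(( 4199739 - 4194304 ))                # overflow beyond stripe_max_size
5435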

Just out of interest, the 457 file that works fine has the following stats:

{
  ... skipped irrelevant data...

    "etag": "b70585007ca1a95ddf89fe95550ccd85\u0000",
    "tag": "default.547935.122023\u0000",
    "manifest": {
        "objs": [],
        "obj_size": 55054267,
        "explicit_objs": "false",
        "head_obj": {
            "bucket": {
                "name": "evgeny-test",
                "pool": ".rgw.buckets",
                "data_extra_pool": ".rgw.buckets.extra",
                "index_pool": ".rgw.buckets.index",
                "marker": "default.474762.1",
                "bucket_id": "default.474762.1" 
            },
            "key": "",
            "ns": "",
            "object": "457",
            "instance": "" 
        },
        "head_size": 510880,
        "max_head_size": 510880,
        "prefix": ".ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_",
        "tail_bucket": {
            "name": "evgeny-test",
            "pool": ".rgw.buckets",
            "data_extra_pool": ".rgw.buckets.extra",
            "index_pool": ".rgw.buckets.index",
            "marker": "default.474762.1",
            "bucket_id": "default.474762.1" 
        },
        "rules": [
            {
                "key": 0,
                "val": {
                    "start_part_num": 0,
                    "start_ofs": 510880,
                    "part_size": 0,
                    "stripe_max_size": 4194304,
                    "override_prefix": "" 
                }
            }
        ]
    },
    "attrs": {
        "user.rgw.x-amz-meta-mtime": "1470899785.795187\u0000" 
    }
}
# head
$ rados -p .rgw.buckets stat default.474762.1_457
.rgw.buckets/default.474762.1_457 mtime 2016-08-16 18:20:36.000000, size 510880
# all good here.
$ i=1; while rados -p .rgw.buckets stat default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_$i; do i=$[i+1]; done;
.rgw.buckets/default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_1 mtime 2016-08-16 18:20:36.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_2 mtime 2016-08-16 18:20:36.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_3 mtime 2016-08-16 18:20:36.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_4 mtime 2016-08-16 18:20:36.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_5 mtime 2016-08-16 18:20:36.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_6 mtime 2016-08-16 18:20:36.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_7 mtime 2016-08-16 18:20:36.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_8 mtime 2016-08-16 18:20:36.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_9 mtime 2016-08-16 18:20:36.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_10 mtime 2016-08-16 18:20:36.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_11 mtime 2016-08-16 18:20:36.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_12 mtime 2016-08-16 18:20:36.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_13 mtime 2016-08-16 18:20:36.000000, size 4194304
.rgw.buckets/default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_14 mtime 2016-08-16 18:20:36.000000, size 17435
 error stat-ing .rgw.buckets/default.474762.1__shadow_.ahiUY0BCABF0MB0vjcFbIi4YbZlSbqJ_15: (2) No such file or directory

Ceph configuration:

[global]
fsid = *****48
mon_initial_members = ******52
mon_host = x.x.x.235
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
public_network = x.x.x.0/24
max_open_files = 131072
osd_pool_default_size = 3
osd_pool_default_min_size = 2
osd_pool_default_pg_num = 128
osd_pool_default_pgp_num = 128
rgw_gc_max_objs = 512
rgw_gc_obj_min_wait = 1200
rgw_gc_processor_max_time = 600
rgw_gc_processor_period = 600
rgw_override_bucket_index_max_shards = 8
osd_pool_erasure_code_stripe_width = 16384
mon_osd_min_down_reporters = 2
mon_osd_min_down_reports = 4

[osd]
osd_journal_size = 10240

[client.radosgw.gw1]
host = ******55
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw_print_continue = false
rgw_thread_pool_size = 300
rgw_frontends = "civetweb port=80" 
rgw_dns_name = "ceph-rgw-1" 
debug_rgw = 0/5
rgw_enable_ops_log = true
rgw_enable_usage_log = true

[client.radosgw.gw2]
host = ******56
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw_print_continue = false
rgw_thread_pool_size = 300
rgw_frontends = "civetweb port=80" 
rgw_dns_name = "ceph-rgw-2" 
debug_rgw = 0/5
rgw_enable_ops_log = true
rgw_enable_usage_log = true

Thanks for looking,
Evgeny.

History

#1 Updated by Evgeny Gridasov over 7 years ago

Apologies for the formatting: EC pool is 5,2 in production case and 2,1 in testing case.

#2 Updated by Evgeny Gridasov over 7 years ago

I am now able to reproduce it on a brand new installation of Ceph v0.94.7 with EC 5,2 and exactly the same settings as prod (except for a reduced journal size). I have 4 VMs which roughly replicate our production environment with 90 OSDs:

VM1 = RGW
VM2 = MON + 30x OSD
VM3 = MON + 30x OSD
VM4 = MON + 30x OSD

I had to adjust the crush map to make it look like I have 9 physical hosts (i.e. changed from 30 OSDs per host to 10 OSDs per host and created additional fake hosts); I had to do that to get erasure coding 5,2 working. I tried both the isa EC library (which is in prod) and jerasure, but it made no difference. The behaviour is exactly as described above: the file seems to upload fine, but I cannot download it.
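For completeness, a rough sketch of how the EC 5,2 data pool for the repro might be created (the profile name and pg counts here are placeholders, and the crush map edits mentioned above are not shown):

# hypothetical EC 5,2 profile and data pool for the test cluster
ceph osd erasure-code-profile set ec52 k=5 m=2 plugin=jerasure ruleset-failure-domain=host
ceph osd pool create .rgw.buckets 128 128 erasure ec52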

#3 Updated by Evgeny Gridasov over 7 years ago

I have found out what is causing the behaviour described above:

osd_pool_erasure_code_stripe_width = 16384

When I set it to the default of 4096, I am no longer able to reproduce the issue.
Cluster size does not seem to matter: I've deployed a smaller 18-OSD cluster with a modified crush map and am still able to reproduce it with EC 5,2 coding and the EC stripe width set to 16384, but not with the stripe width set to 4096.

I am not really sure this bug is RGW-related anymore.
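If it helps to compare the two setups: the stripe width is baked into an EC pool when the pool is created, so something like the following should show which value a given .rgw.buckets pool ended up with (a sketch; the exact output format varies by version):

# stripe_width is a per-pool property recorded in the osdmap at creation time
ceph osd dump | grep "'.rgw.buckets'"
# and the configured default on a running OSD
ceph daemon osd.0 config show | grep erasure_code_stripe_width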

#4 Updated by Orit Wasserman over 7 years ago

Can you read the objects using the Swift API?

#5 Updated by Orit Wasserman over 7 years ago

I think it is related to http://tracker.ceph.com/issues/15886
The hammer fix (will be in 0.94.8): https://github.com/ceph/ceph/pull/9240

#6 Updated by Evgeny Gridasov over 7 years ago

Orit, no, I can't. I get the same error with the Swift API.

#7 Updated by Orit Wasserman over 7 years ago

  • Status changed from New to Duplicate

duplicate of #15745

#8 Updated by Vikhyat Umrao over 7 years ago

Fixed in 0.94.8.

$ git tag --contains 4eea92b4a50e06d850c39009d122f8bbbd48c857
v0.94.8
