Bug #21772

closed

multisite: multipart uploads fail to sync

Added by Casey Bodley over 6 years ago. Updated over 6 years ago.

Status: Resolved
Priority: High
Assignee:
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Reported on ceph-users. I added a test case to test_multi.py, and it reproduces the issue.


Files

multipart.bilog (7.73 KB): output of bilog list on source zone. Casey Bodley, 10/12/2017 01:58 PM

Related issues 3 (1 open, 2 closed)

Related to rgw - Bug #21800: multisite: avoid writing multipart parts to the bucket index log (Pending Backport, Jane Zhu)
Related to rgw - Bug #21591: RGW multisite does not sync all objects (Resolved, Yehuda Sadeh, 09/28/2017)
Copied to rgw - Backport #21816: luminous: multisite: multipart uploads fail to sync (Resolved)

Actions #1

Updated by Casey Bodley over 6 years ago

I've attached the output from `radosgw-admin bilog list` on the source bucket, after a 4-part upload to an object named MULTIPART.

Notable in the output are the entries from the multipart complete operation. Both entries have the same op_tag, but the first is a pending write to 'MULTIPART' and the second is a completed del on the last part object.

    {
        "op_id": "00000000011.11.1",
        "op_tag": "f23e6bbc-1ae8-4e7f-8a6f-5b79071c74c4.4109.422",
        "op": "write",
        "object": "MULTIPART",
        "instance": "",
        "state": "pending",
        "index_ver": 11,
        "timestamp": "0.000000",
        "ver": {
            "pool": -1,
            "epoch": 0
        },
        "bilog_flags": 0,
        "versioned": false,
        "owner": "",
        "owner_display_name": "",
        "zones_trace": [
            "f23e6bbc-1ae8-4e7f-8a6f-5b79071c74c4" 
        ]
    },
    {
        "op_id": "00000000012.12.1",
        "op_tag": "f23e6bbc-1ae8-4e7f-8a6f-5b79071c74c4.4109.422",
        "op": "del",
        "object": "_multipart_MULTIPART.2~9IIANYVJ4zyGiaT9YSl3x3ttxbcmKba.4",
        "instance": "",
        "state": "complete",
        "index_ver": 12,
        "timestamp": "2017-10-12 13:49:59.266862311Z",
        "ver": {
            "pool": 7,
            "epoch": 2
        },
        "bilog_flags": 0,
        "versioned": false,
        "owner": "",
        "owner_display_name": "",
        "zones_trace": []
    },

We don't attempt to sync MULTIPART, because we never see an entry with state=complete.
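
For anyone checking their own bilog output for the same symptom, here is a minimal Python sketch that flags objects whose entries never reach state=complete. It assumes the `radosgw-admin bilog list` output has already been parsed into a list of dicts with the `object` and `state` fields shown above. The function name `objects_missing_complete` and the truncated sample data are made up for illustration. This is not how rgw sync itself decides what to fetch.

```python
import json
from collections import defaultdict


def objects_missing_complete(entries):
    """Return the names of objects that have bilog entries but none with state 'complete'."""
    states = defaultdict(set)
    for entry in entries:
        states[entry["object"]].add(entry["state"])
    return sorted(name for name, seen in states.items() if "complete" not in seen)


# Trimmed-down version of the two entries quoted above (most fields omitted).
sample = json.loads("""
[
  {"op_id": "00000000011.11.1", "op": "write",
   "object": "MULTIPART", "state": "pending"},
  {"op_id": "00000000012.12.1", "op": "del",
   "object": "_multipart_MULTIPART.2~9IIANYVJ4zyGiaT9YSl3x3ttxbcmKba.4",
   "state": "complete"}
]
""")

print(objects_missing_complete(sample))
```

With the two entries above, only MULTIPART is reported, which matches the object that fails to sync.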

Actions #3

Updated by Casey Bodley over 6 years ago

This rgw_bucket_complete_op() cls call for the multipart complete also includes the 4 multipart parts in remove_objs, so they can be removed from the index at the same time.

We also call log_index_operation() to add bilog entries for those removes - but we don't increment the header.ver for each, which means that each of those operations writes to the same omap key, overwriting the previous entries.

Some osd log snippets to illustrate:

rgw_bucket_complete_op(): request: op=0 name=MULTIPART instance= ver=7:4 tag=e6094249-4ff0-4912-9ea3-08d0f1b200e3.4109.38
log_index_operation name=MULTIPART key=<80>0_00000000012.12.1 op=0 state=1 tag=e6094249-4ff0-4912-9ea3-08d0f1b200e3.4109.38
rgw_bucket_complete_op(): remove_objs.size()=4

rgw_bucket_complete_op(): removing entries, read_index_entry name=_multipart_MULTIPART.2~vtcUTJNUDdfY4O6pLxOUTXjJjoNlI_Y.1 instance=
rgw_bucket_complete_op(): entry.name=_multipart_MULTIPART.2~vtcUTJNUDdfY4O6pLxOUTXjJjoNlI_Y.1 entry.instance= entry.meta.category=1
log_index_operation name=_multipart_MULTIPART.2~vtcUTJNUDdfY4O6pLxOUTXjJjoNlI_Y.1 key=<80>0_00000000012.12.1 op=1 state=1 tag=e6094249-4ff0-4912-9ea3-08d0f1b200e3.4109.38

rgw_bucket_complete_op(): removing entries, read_index_entry name=_multipart_MULTIPART.2~vtcUTJNUDdfY4O6pLxOUTXjJjoNlI_Y.2 instance=
rgw_bucket_complete_op(): entry.name=_multipart_MULTIPART.2~vtcUTJNUDdfY4O6pLxOUTXjJjoNlI_Y.2 entry.instance= entry.meta.category=1
log_index_operation name=_multipart_MULTIPART.2~vtcUTJNUDdfY4O6pLxOUTXjJjoNlI_Y.2 key=<80>0_00000000012.12.1 op=1 state=1 tag=e6094249-4ff0-4912-9ea3-08d0f1b200e3.4109.38

rgw_bucket_complete_op(): removing entries, read_index_entry name=_multipart_MULTIPART.2~vtcUTJNUDdfY4O6pLxOUTXjJjoNlI_Y.3 instance=
rgw_bucket_complete_op(): entry.name=_multipart_MULTIPART.2~vtcUTJNUDdfY4O6pLxOUTXjJjoNlI_Y.3 entry.instance= entry.meta.category=1
log_index_operation name=_multipart_MULTIPART.2~vtcUTJNUDdfY4O6pLxOUTXjJjoNlI_Y.3 key=<80>0_00000000012.12.1 op=1 state=1 tag=e6094249-4ff0-4912-9ea3-08d0f1b200e3.4109.38

rgw_bucket_complete_op(): removing entries, read_index_entry name=_multipart_MULTIPART.2~vtcUTJNUDdfY4O6pLxOUTXjJjoNlI_Y.4 instance=
rgw_bucket_complete_op(): entry.name=_multipart_MULTIPART.2~vtcUTJNUDdfY4O6pLxOUTXjJjoNlI_Y.4 entry.instance= entry.meta.category=1
log_index_operation name=_multipart_MULTIPART.2~vtcUTJNUDdfY4O6pLxOUTXjJjoNlI_Y.4 key=<80>0_00000000012.12.1 op=1 state=1 tag=e6094249-4ff0-4912-9ea3-08d0f1b200e3.4109.38
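
The overwrite in the log snippets above can be modeled with a plain dict standing in for the omap. This is a rough Python sketch, not the cls_rgw code: `bilog_key()` and `complete_op()` are hypothetical helpers, the key format is simplified to depend only on the index version, and the part names are shortened. The `bump_per_entry=True` variant only shows what distinct keys would look like. It is not meant to describe the fix in the pull request.

```python
def bilog_key(index_ver):
    # Hypothetical simplification: the key is derived only from the index version.
    return f"0_{index_ver:011d}"


def complete_op(omap, header_ver, name, remove_objs, bump_per_entry):
    """Log one 'write' entry for name and one 'del' entry per removed part."""
    header_ver += 1
    omap[bilog_key(header_ver)] = {"op": "write", "object": name, "state": "complete"}
    for part in remove_objs:
        if bump_per_entry:
            header_ver += 1  # each entry gets its own key
        # Without the bump, this write lands on the same key as the previous entry.
        omap[bilog_key(header_ver)] = {"op": "del", "object": part, "state": "complete"}
    return header_ver


parts = [f"_multipart_MULTIPART.{n}" for n in range(1, 5)]

same_key = {}
complete_op(same_key, 11, "MULTIPART", parts, bump_per_entry=False)

distinct_keys = {}
complete_op(distinct_keys, 11, "MULTIPART", parts, bump_per_entry=True)

print(len(same_key), same_key[bilog_key(12)]["object"])
print(len(distinct_keys))
```

In the first case a single entry survives, the del for part 4, which lines up with the bilog output where the completed write to MULTIPART is missing. In the second case all five entries are kept.
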
Actions #4

Updated by Casey Bodley over 6 years ago

  • Status changed from 12 to Fix Under Review

Updated https://github.com/ceph/ceph/pull/18271 with a proposed fix.

Though we may want to go a step further, and avoid writing these multipart part entries to the bilog in the first place.

Actions #5

Updated by Casey Bodley over 6 years ago

  • Status changed from Fix Under Review to 7
Actions #6

Updated by Casey Bodley over 6 years ago

  • Related to Bug #21800: multisite: avoid writing multipart parts to the bucket index log added
Actions #7

Updated by Yuri Weinstein over 6 years ago

Casey Bodley wrote:

Updated https://github.com/ceph/ceph/pull/18271 with a proposed fix.

merged

Actions #8

Updated by Casey Bodley over 6 years ago

  • Status changed from 7 to Pending Backport
Actions #9

Updated by Anonymous over 6 years ago

  • Copied to Backport #21816: luminous: multisite: multipart uploads fail to sync added
Actions #10

Updated by Casey Bodley over 6 years ago

  • Related to Bug #21591: RGW multisite does not sync all objects added
Actions #11

Updated by Nathan Cutler over 6 years ago

  • Status changed from Pending Backport to Resolved