Bug #20896

open

export_diff relies on clone_overlap, which is lost when cache tier is enabled

Added by Xuehan Xu over 6 years ago. Updated over 6 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Recently, we found that, under some circumstances, the "HEAD" object's clone_overlap in the cache tier can lose the modification ranges of some ops if the most recent clone object is not in the cache tier.

Say there is an object A in the cache tier whose most recent clone object is not in the cache tier. After doing some writes to A, "ceph-objectstore-tool dump" shows that the "HEAD" object's clone_overlap did not record those writes' modification ranges.

We think the reason is that, in the ReplicatedPG::make_writeable method, the decision to record the modification range is guarded by the "is_present_clone" condition. According to the comment "we need to check whether the most recent clone exists, if it's been evicted, it's not included in the stats", I think this condition should only determine whether to call "add_interval_usage", not whether to record the modification range.

Is this right?
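
For reference, the paragraph above refers to roughly the following structure in ReplicatedPG::make_writeable. This is a paraphrased sketch, not verbatim Ceph source; the placeholder name newest_clone_oid and the exact statement ordering are illustrative assumptions:

    // Paraphrased sketch (not verbatim source): both the space-usage
    // accounting and the clone_overlap update sit inside the
    // is_present_clone() check, so when the newest clone has been evicted
    // from the cache tier the write's range is never subtracted from the
    // HEAD object's clone_overlap.
    if (ctx->new_snapset.clones.size() > 0) {
      if (is_present_clone(newest_clone_oid)) {   // newest clone still in this tier?
        interval_set<uint64_t> &newest_overlap =
            ctx->new_snapset.clone_overlap.rbegin()->second;
        ctx->modified_ranges.intersection_of(newest_overlap);
        add_interval_usage(ctx->modified_ranges, ctx->delta_stats);  // cache stats
        newest_overlap.subtract(ctx->modified_ranges);               // overlap tracking
      }
    }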

Actions #1

Updated by Xuehan Xu over 6 years ago

I submitted a PR for this: https://github.com/ceph/ceph/pull/16790

Actions #2

Updated by Xuehan Xu over 6 years ago

The reason we are submitting the PR is that, when we run export-diff against an rbd image in a pool that has a cache tier, if an object of that rbd image is in the cache tier without its most recent clone object, "export-diff" returns a diff that is missing some writes, which leads to an incorrect backup. How should we fix this problem? I'm sorry I failed to describe the problem comprehensively before.

Actions #3

Updated by Greg Farnum over 6 years ago

  • Project changed from Ceph to rbd
  • Subject changed from clone_overlap loss when cache tier is enabled to export_diff relies on clone_overlap, which is lost when cache tier is enabled


Actions #4

Updated by Greg Farnum over 6 years ago

from irc:

<joshd>:

I'd suggest making rbd diff conservative when it's used with cache pools (if necessary, reporting every existing object as fully modified) so it doesn't produce incorrect results
...
I'm not sure if that pr is sufficient
it wouldn't affect existing incorrect clone_overlaps at least

<gregsfortytwo1>

I think we could also probably wire up the rbd tool to issue cache tier flushes and read off the base tier instead

Actions #5

Updated by Xuehan Xu over 6 years ago

Hi, Greg :-)

I finally got what you mean in https://github.com/ceph/ceph/pull/16790.

I agree with you that "clone overlap is supposed to be tracking which data is the same on disk".

My thought is that "ObjectContext::new_snapset.clones" is already an indicator of whether there are clone objects on disk. So, in the cache-tier scenario, even though a clone oid may not correspond to a "present clone" in the cache tier, as long as "ObjectContext::new_snapset.clones" is not empty there must be such a clone object in the base tier. And as long as "ObjectContext::new_snapset.clones" has a strict one-to-one correspondence with "ObjectContext::new_snapset.clone_overlap", passing the condition check "if (ctx->new_snapset.clones.size() > 0)" is enough to conclude that the clone object exists.

So, if I'm right, passing the condition check "if (ctx->new_snapset.clones.size() > 0)" is already enough for us to do "newest_overlap.subtract(ctx->modified_ranges)"; it doesn't also have to pass "is_present_clone".

Am I right about this? Or am I missing something?

Please help us, thank you:-)

Actions #6

Updated by Xuehan Xu over 6 years ago

I mean that I think it's the "is_present_clone" condition check that prevents the clone overlap from recording the modified ranges of client writes when the target "HEAD" object exists without its most recent clone object. If I'm right, simply moving the clone overlap modification out of the "is_present_clone" condition block is enough to solve this case, as in the PR https://github.com/ceph/ceph/pull/16790, and the fix shouldn't cause other problems.

In our tests this fix solved the problem, but we can't yet confirm that it won't introduce new problems.

So if anyone sees this and knows the answer, please help us. Thank you :-)
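
In terms of the sketch shown under the description above, the change proposed in https://github.com/ceph/ceph/pull/16790 amounts to something like the following. Again a paraphrase rather than the literal diff, with the same illustrative placeholder names: the stats accounting stays behind "is_present_clone", while the overlap subtraction only requires that the snapset records a clone at all.

    // Paraphrased sketch of the proposed restructuring (not the literal PR diff).
    if (ctx->new_snapset.clones.size() > 0) {
      interval_set<uint64_t> &newest_overlap =
          ctx->new_snapset.clone_overlap.rbegin()->second;
      ctx->modified_ranges.intersection_of(newest_overlap);
      if (is_present_clone(newest_clone_oid)) {
        // only the cache stats depend on the clone being present in this tier
        add_interval_usage(ctx->modified_ranges, ctx->delta_stats);
      }
      // the clone exists somewhere (in the base tier if not here), so the
      // overlap must still reflect this write
      newest_overlap.subtract(ctx->modified_ranges);
    }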

Actions #7

Updated by Xuehan Xu over 6 years ago

I did another test: I did some writes to the object "rbd_data.1ebc6238e1f29.0000000000000000" to promote its "HEAD" object to the cache tier, after which I specifically wrote 4 bytes of random data at offset 0x40. Then I used "ceph-objectstore-tool" to dump its "HEAD" version in the base tier; the result is as follows (before promoting it to the cache tier there were three snaps, the latest of which is 26):

{
    "id": {
        "oid": "rbd_data.1ebc6238e1f29.0000000000000000",
        "key": "",
        "snapid": -2,
        "hash": 1655893237,
        "max": 0,
        "pool": 3,
        "namespace": "",
        "max": 0
    },
    "info": {
        "oid": {
            "oid": "rbd_data.1ebc6238e1f29.0000000000000000",
            "key": "",
            "snapid": -2,
            "hash": 1655893237,
            "max": 0,
            "pool": 3,
            "namespace": ""
        },
        "version": "4219'16423",
        "prior_version": "3978'16310",
        "last_reqid": "osd.70.4213:2359",
        "user_version": 17205,
        "size": 4194304,
        "mtime": "2017-08-03 22:07:34.656122",
        "local_mtime": "2017-08-05 23:02:33.628734",
        "lost": 0,
        "flags": 52,
        "snaps": [],
        "truncate_seq": 0,
        "truncate_size": 0,
        "data_digest": 2822203961,
        "omap_digest": 4294967295,
        "watchers": {}
    },
    "stat": {
        "size": 4194304,
        "blksize": 4096,
        "blocks": 8200,
        "nlink": 1
    },
    "SnapSet": {
        "snap_context": {
            "seq": 26,
            "snaps": [
                26,
                25,
                16
            ]
        },
        "head_exists": 1,
        "clones": [
            {
                "snap": 16,
                "size": 4194304,
                "overlap": "[4~4194300]"
            },
            {
                "snap": 25,
                "size": 4194304,
                "overlap": "[]"
            },
            {
                "snap": 26,
                "size": 4194304,
                "overlap": "[]"
            }
        ]
    }
}

As we can see, its clone_overlap for snap 26 is empty, which means the writes' modified range is recorded in neither the cache tier nor the base tier.

I think we really should move the clone overlap modification out of the if block guarded by the "is_present_clone" condition check. For now, I can't see any other way to fix this problem.

Am I right about this?
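
For comparison, here is a small standalone illustration (not Ceph code) of what the snap 26 overlap would be expected to contain if just the 4-byte write at offset 0x40 had been subtracted from a full-object overlap, the way the snap 16 entry "[4~4194300]" reflects a 4-byte write at offset 0; the earlier writes used to promote the object would shrink it further:

    // Standalone illustration: subtract a 4-byte write at offset 0x40 from a
    // full 4 MiB overlap and print the remaining extents in "offset~length"
    // form, mirroring the dump above. Assumes the write falls inside a single
    // extent, which is enough for this example.
    #include <cstdint>
    #include <iostream>
    #include <map>

    int main() {
      std::map<uint64_t, uint64_t> overlap{{0, 4194304}};  // offset -> length
      const uint64_t write_off = 0x40, write_len = 4;      // the test write

      std::map<uint64_t, uint64_t> result;
      for (auto [off, len] : overlap) {
        uint64_t end = off + len;
        if (write_off > off)
          result[off] = write_off - off;                   // extent before the write
        if (write_off + write_len < end)
          result[write_off + write_len] = end - (write_off + write_len);  // after
      }
      for (auto [off, len] : result)
        std::cout << off << "~" << len << "\n";            // prints 0~64 and 68~4194236
    }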

Actions #9

Updated by Jason Dillaman over 6 years ago

  • Project changed from rbd to RADOS

Moving this back to RADOS -- changing librbd to force a full object diff if an object exists in the cache tier seems pretty racy.

Actions #10

Updated by Xuehan Xu over 6 years ago

Hi, everyone.

I've found that the reason clone overlap modifications have to pass the "is_present_clone" condition check is to keep the cache stats right, as demonstrated by http://tracker.ceph.com/issues/7964.

So I think that if I add a similar clone overlap, say cache_clone_overlap, that tracks disk space usage while the object is in the cache, leave the original clone overlap tracking all modifications so that export-diff is right, and make the "copy_get" operation also copy the object's extended attributes ("object_info" and "snapset"), the problem should be solved. Does this sound reasonable to you?

Thank you:-)
