Feature #3763

closed

krbd: handle flattening of mapped image

Added by Alex Elder over 11 years ago. Updated almost 11 years ago.

Status: Resolved
Priority: Normal
% Done: 100%

Description

An rbd client receives notice if the snapshot context for
a mapped rbd image has changed. It is possible for the
snapshot that is currently mapped to disappear from the
snapshot context. This should trigger tearing down of
various data structures built up to represent the image
and its parent images.
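To make the check in the description concrete, here is a minimal C sketch (not the krbd code) of deciding whether a mapped snapshot is still present after a snapshot context update. The `snap_context` struct and `mapped_snap_still_exists()` are hypothetical simplifications. The real header carries more state, and the check only matters when a snapshot, rather than the image head, is mapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified snapshot context: the snapshot ids that
 * currently exist for an image. */
struct snap_context {
	uint64_t seq;          /* most recent snapshot id */
	size_t num_snaps;
	const uint64_t *snaps; /* ids of existing snapshots */
};

/* Returns true if the mapped snapshot is still in the (new) snapshot
 * context. A false result is the case where the structures built up
 * for the mapping and its parents would need to be torn down. */
static bool mapped_snap_still_exists(const struct snap_context *snapc,
				     uint64_t mapped_snap_id)
{
	assert(snapc != NULL);
	for (size_t i = 0; i < snapc->num_snaps; i++)
		if (snapc->snaps[i] == mapped_snap_id)
			return true;
	return false;
}
```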

It's hard to know how intrusive this will be until more
of the layering support is complete, but I don't expect
it will be terribly difficult.


Subtasks: 1 (0 open, 1 closed)

Subtask #5028: rbd: treat clones with zero parent overlap as non-layered (Resolved, Alex Elder, 05/11/2013)


Related issues: 1 (0 open, 1 closed)

Related to rbd - Bug #5040: krbd: record that an parent info refresh has failed (Resolved, Alex Elder, 05/13/2013)

#1

Updated by Ian Colle almost 11 years ago

  • Target version set to v0.63
#2

Updated by Ian Colle almost 11 years ago

  • Story points set to 5.00
#3

Updated by Alex Elder almost 11 years ago

I'm trying to decide what to do with this issue.

In my mind, it has always had something to do
with dealing with a flatten occurring on a
mapped image.

A snapshot parent should never normally disappear,
because to be the parent of a clone it has to be
"protected" to prevent that from happening.

On the other hand, if a clone image gets flattened
then it will no longer have a parent, and in that
respect, the snapshot will have disappeared (or
at least will no longer be relevant for the image).

I need to find out what exactly happens when a
clone image is flattened in order to know how
to proceed with this.

#4

Updated by Alex Elder almost 11 years ago

  • Status changed from New to In Progress

I thought I'd already marked it as such, but I've been working on this
for a few days. At this point I have some functioning code that
re-submits original requests when a parent image is discovered to
have disappeared. However, after some testing I realized I need
to do some work to avoid killing off some data structures while
requests are in flight. My plan is to add a special reference
count for the parent structure, embedded in the child structure,
and the release function will free the parent structure and set
the child's parent pointer(s) to null.
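The reference-counting plan above can be sketched in userspace C as follows. This is an illustration under assumptions, not the posted patches: `struct image`, `image_parent_get()`, `image_parent_put()` and `image_unparent()` are hypothetical names, C11 atomics stand in for kernel primitives, and tearing down the parent is reduced to a single `free()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified image structure. */
struct image {
	struct image *parent;  /* NULL once the parent has been released */
	atomic_int parent_ref; /* refs on ->parent, embedded in the child */
};

/* Take a parent reference for a request. Returns 1 on success, or 0
 * (taking nothing) if the count has already dropped to zero. */
static int image_parent_get(struct image *child)
{
	int count = atomic_load(&child->parent_ref);

	/* Only increment a non-zero count; on failure the CAS reloads it. */
	while (count > 0) {
		if (atomic_compare_exchange_weak(&child->parent_ref,
						 &count, count + 1))
			return 1;
	}
	return 0;
}

/* Release function: free the parent and clear the child's pointer. */
static void image_unparent(struct image *child)
{
	free(child->parent);
	child->parent = NULL;
}

/* Drop a parent reference; the last one releases the parent. */
static void image_parent_put(struct image *child)
{
	int prev = atomic_fetch_sub(&child->parent_ref, 1);

	assert(prev > 0);
	if (prev == 1)
		image_unparent(child);
}
```

In this sketch the child starts with one reference that stands for the mapping itself. When a flatten is detected that reference is dropped, requests still in flight keep the parent alive until their own `image_parent_put()`, and the last one frees it and clears the pointer.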

#5

Updated by Alex Elder almost 11 years ago

  • Subject changed from "krbd: handle disappearance of mapped layered snapshot" to "krbd: handle flattening of mapped image"

This work is mostly done, but I need to put it through some
more thorough tests before I post it for review. If
all goes well I'll do that today. I updated the subject to
make it clearer what I'm doing.

#6

Updated by Alex Elder almost 11 years ago

I have evidence that handling a flatten of an image
works correctly when a parent read is underway, as
well as when a read of the full parent object for
copyup is underway.

This morning I finally caught it in the middle of
a stat existence check for a layered write and it
hit a problem. I quickly found a bug and am now
trying it again.

I am still seeing the UML crash after about 500
iterations, and this time it isn't related to
XFS. The crash is quite mysterious, never
leaving a stack trace that offers any clue about
what went wrong. I will try to catch it with
the debugger attached to see if that helps.

#7

Updated by Alex Elder almost 11 years ago

I found that in the existence check callback (for a
write), if the image had been flattened, I was
resubmitting a request, but it was the wrong one.
Since fixing that, I have a UML run that has iterated over
900 times. Memory corruption due to submitting the wrong
request could explain the mysterious crash.
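A simplified sketch of the bug just described, assuming a STAT request carries a pointer back to the write it was issued for. All types and functions here (`struct request`, `submit()`, `stat_callback()`) are hypothetical, and `submit()` only records its argument so the behavior can be checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified request types. */
enum req_kind { REQ_WRITE, REQ_STAT };

struct request {
	enum req_kind kind;
	struct request *orig; /* for a STAT: the write it was issued for */
	bool layered;         /* write still needs parent (copyup) handling */
};

/* Hypothetical stand-in for sending a request to the OSDs; it just
 * records the request that would have been sent. */
static struct request *last_submitted;

static void submit(struct request *req)
{
	last_submitted = req;
}

/* Completion callback for the STAT existence check of a layered write.
 * If the image was flattened meanwhile, the request to re-submit is
 * the ORIGINAL write (now as a plain write), not the STAT request. */
static void stat_callback(struct request *stat_req, bool image_flattened)
{
	struct request *write_req;

	assert(stat_req->kind == REQ_STAT);
	write_req = stat_req->orig;
	if (image_flattened) {
		write_req->layered = false; /* no parent, so no copyup */
		submit(write_req);          /* not submit(stat_req) */
		return;
	}
	/* ...otherwise continue normal existence-check handling... */
}
```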

I also think I may know the cause of the other crash I
was looking at--I was dropping a reference to the parent
unconditionally. In many cases that was fine, but in
the particular case of a request in flight at the time
an image gets flattened, it is not.
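One plausible shape for that fix, assuming the problem was a parent reference being dropped twice for a request that was in flight during a flatten: record in the request whether it owns a reference, and drop it only if so. The names are hypothetical, and a plain `int` stands in for the reference count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified: the child image's parent reference count. */
struct image {
	int parent_ref;
};

struct img_request {
	struct image *img;
	bool holds_parent_ref; /* true only while this request owns a ref */
};

/* Take a parent reference only if the image still has a parent. */
static void img_request_get_parent(struct img_request *req)
{
	req->holds_parent_ref = req->img->parent_ref > 0;
	if (req->holds_parent_ref)
		req->img->parent_ref++;
}

/* Drop this request's parent reference if, and only if, it holds one.
 * Clearing the flag makes a second call harmless, e.g. once when the
 * request is re-submitted after a flatten and again when it is freed. */
static void img_request_put_parent(struct img_request *req)
{
	if (!req->holds_parent_ref)
		return;
	req->holds_parent_ref = false;
	assert(req->img->parent_ref > 0);
	req->img->parent_ref--;
}
```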

Hopefully that's "the last bug."

#8

Updated by Alex Elder almost 11 years ago

I've got over 2800 iterations on UML and over 5300 iterations
on "normal" Linux running flattens while writing 16 concurrent
4K blocks to different objects in an rbd image. I've also run
4 consecutive xfstests runs successfully, and four sets of
file system tests (including kernel_untar, ffsb and fsstress).
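For anyone reproducing the write test above, a small sketch of choosing offsets so that 16 concurrent 4K writes each land in a different object. It assumes the default rbd object size of 4 MiB (order 22) and places each write at the start of an object; the constants and helper names are illustrative, not taken from the actual test.

```c
#include <assert.h>
#include <stdint.h>

#define OBJ_ORDER   22u                        /* assumed: 4 MiB objects */
#define OBJ_SIZE    (UINT64_C(1) << OBJ_ORDER)
#define BLOCK_SIZE  4096u                      /* 4K writes */
#define NUM_WRITERS 16u

/* A 4K write starting at an object boundary stays inside that object. */
_Static_assert(BLOCK_SIZE <= OBJ_SIZE, "block must fit in one object");

/* Image offset for writer i: the start of object i, so each writer
 * touches a different object. */
static uint64_t writer_offset(unsigned int i)
{
	assert(i < NUM_WRITERS);
	return (uint64_t)i * OBJ_SIZE;
}

/* Index of the rbd data object that an image offset falls in. */
static uint64_t object_number(uint64_t offset)
{
	return offset >> OBJ_ORDER;
}
```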

I unfortunately have still not caught a STAT existence check
for write getting restarted due to a flatten. But at this
point I'm satisfied the code is OK.

I'm about to post the 13 patches I've got for review.

#9

Updated by Alex Elder almost 11 years ago

  • Status changed from In Progress to Fix Under Review

The following patches have been posted for review. They
are available in the "review/wip-flatten" branch of the
ceph-client git repository.

This series of patches prepares some parent request
code to make it easier to handle a mapped clone
image getting flattened.

-Alex

[PATCH 1/5] rbd: get parent info on refresh
[PATCH 2/5] rbd: don't release write request until necessary
[PATCH 3/5] rbd: define rbd_dev_unparent()
[PATCH 4/5] rbd: define parent image request routines
[PATCH 5/5] rbd: reference count parent requests

#10

Updated by Alex Elder almost 11 years ago

The following patches have been posted for review. They
are available in the "review/wip-flatten" branch of the
ceph-client git repository.

This series detects when a mapped clone image gets flattened,
and if any requests were in flight when that occurs, causes
them to get resubmitted.

-Alex

[PATCH 1/4] rbd: detect when clone image is flattened
[PATCH 2/4] rbd: re-submit read request for flattened clone
[PATCH 3/4] rbd: re-submit write request for flattened clone
[PATCH 4/4] rbd: re-submit flattened write request (part 2)

#11

Updated by Sage Weil almost 11 years ago

  • Status changed from Fix Under Review to Resolved