Feature #91: mds: up:shadow mode - CephFS - Ceph

Feature #91

closed

mds: up:shadow mode

Added by Sage Weil almost 14 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Assignee: Greg Farnum
% Done: 50%

Description

replay client while in standby, so we can take over immediately on failure.

Updated by Sage Weil over 13 years ago

Assignee set to Greg Farnum
Priority changed from Low to Normal
Target version set to v0.24

Update the Journaler interface to allow the MDS to 'tail' the journal: periodically check whether it has been extended and read events as they are written. (Someday we can use the new watch/notify to make this efficient!)
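
A minimal sketch of the tailing idea, assuming a toy in-memory journal rather than the real Journaler. The names `ToyJournal`, `JournalTailer`, and `poll` are hypothetical and exist only for illustration; only `read_pos` and the notion of a write position come from this ticket's discussion.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyJournal:
    """In-memory stand-in for the journal (hypothetical, not Ceph's Journaler)."""
    events: List[str] = field(default_factory=list)

    @property
    def write_pos(self) -> int:
        # Position just past the last event written by the active MDS.
        return len(self.events)

    def append(self, event: str) -> None:
        self.events.append(event)


@dataclass
class JournalTailer:
    """Follows a journal by re-checking write_pos and reading what is new."""
    journal: ToyJournal
    read_pos: int = 0

    def poll(self) -> List[str]:
        # Re-probe the end of the journal and return events added since last poll.
        end = self.journal.write_pos
        new_events = self.journal.events[self.read_pos:end]
        self.read_pos = end
        return new_events


journal = ToyJournal()
tailer = JournalTailer(journal)

journal.append("mkdir /a")
journal.append("create /a/f")
first = tailer.poll()    # picks up both events
second = tailer.poll()   # nothing new since the last poll

journal.append("unlink /a/f")
third = tailer.poll()    # picks up only the newest event
```

In this toy model the standby would call `poll()` on a timer; a notification mechanism such as watch/notify would replace the timer, not the read logic.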

The shadow MDS will also need to 'trim' the expired part of the journal by periodically checking the journaler header (and expire_pos) and trimming old LogSegments and associated metadata out of its cache. This will probably be tricky, but whoever does it will hopefully come out with a thorough grasp of how the replay works and can write some of it down!
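
A rough sketch of the trimming step, assuming each cached segment records the journal byte range it covers and the metadata it touched. `Segment`, `ShadowCache`, and the inode names are hypothetical; the real LogSegment and cache structures are more involved. A segment is dropped only once it lies entirely before `expire_pos`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Segment:
    """Toy stand-in for a LogSegment: a journal byte range plus touched inodes."""
    start: int
    end: int
    inodes: List[str]


@dataclass
class ShadowCache:
    """Hypothetical cache kept by the shadow MDS, keyed by segment start offset."""
    segments: Dict[int, Segment] = field(default_factory=dict)

    def add(self, seg: Segment) -> None:
        self.segments[seg.start] = seg

    def trim(self, expire_pos: int) -> List[int]:
        # Drop every segment that ends at or before expire_pos.
        expired = sorted(
            start for start, seg in self.segments.items() if seg.end <= expire_pos
        )
        for start in expired:
            del self.segments[start]
        return expired

    def cached_inodes(self) -> Set[str]:
        # Metadata stays cached while any surviving segment still references it.
        return {ino for seg in self.segments.values() for ino in seg.inodes}


cache = ShadowCache()
cache.add(Segment(0, 100, ["a"]))
cache.add(Segment(100, 200, ["a", "b"]))
cache.add(Segment(200, 300, ["c"]))

# expire_pos as read from the journal header on a periodic check (value assumed).
trimmed = cache.trim(200)
remaining = cache.cached_inodes()
trimmed_again = cache.trim(250)  # segment 200-300 is only partly expired
```

Deriving the cached metadata from the surviving segments is one simple way to avoid evicting something a newer segment still needs; it says nothing about interactions with other MDSes.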

Updated by Sage Weil over 13 years ago

Estimated time set to 16:00 h
Source set to 5

Updated by Greg Farnum over 13 years ago

Status changed from New to In Progress

I've been getting some proper time in on this on and off over the last few days. Pushed the Journaler changes to the branch standby_replay. Will be starting on making the MDS tail and update next!

Updated by Greg Farnum over 13 years ago

% Done changed from 0 to 20

Updated the Journaler to make the new interface options asynchronous.
Presently working on how to disambiguate between a one-shot and a continuous replay (probably a new state) on the MDS and Monitor. Next, implement basic continuous replay without worrying much about evicting stuff from the cache. Finally, figure out how to effectively evict stuff from the cache without breaking interaction with the other MDSes.
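
One way to picture the one-shot versus continuous distinction is as a small state machine. This is only a sketch under assumptions: the state names and the `promoted` flag below are hypothetical and are not taken from the MDS or Monitor code.

```python
from enum import Enum, auto


class State(Enum):
    REPLAY = auto()          # one-shot: replay to the journal end, then move on
    STANDBY_REPLAY = auto()  # continuous: keep following the journal
    TAKEOVER = auto()        # hypothetical name for "replay done, start taking over"


def on_journal_end(state: State, promoted: bool) -> State:
    """Decide the next state when replay reaches the current end of the journal."""
    if state is State.REPLAY:
        # A one-shot replay is finished once it hits the end.
        return State.TAKEOVER
    if state is State.STANDBY_REPLAY:
        # A promoted standby does one final pass; otherwise it keeps tailing.
        return State.REPLAY if promoted else State.STANDBY_REPLAY
    return state


after_one_shot = on_journal_end(State.REPLAY, promoted=False)
still_tailing = on_journal_end(State.STANDBY_REPLAY, promoted=False)
after_promotion = on_journal_end(State.STANDBY_REPLAY, promoted=True)
```

Keeping the continuous mode as its own state means the same end-of-journal event can be handled differently without extra flags on the replay code path.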

Updated by Sage Weil over 13 years ago

Target version changed from v0.24 to v0.25

Updated by Greg Farnum over 13 years ago

% Done changed from 20 to 50

I have yet to implement trimming, but the basic restarting-replay bits are now in place along with hooks to make it start. Testing is revealing a fair number of issues with the Journaler and MDLog, though -- they don't much like repeating this process!

Updated by Greg Farnum over 13 years ago

Okay, this seems to be working now. Had to adjust how the Journaler treated read_pos and to fix a few of my new re-read functions as they weren't setting all variables properly, and now it loops happily.
Now I'm implementing the state change OUT of standby_replay, so these machines can take over. (Won't take long.)
Trimming will be the last thing to do, but it's starting to look simpler.

Updated by Greg Farnum over 13 years ago

Status changed from In Progress to Resolved

Well, this seems to be working as best I can tell.

There are some odd issues with virtual memory usage growing by leaps and bounds, but heap analyzer tools (tcmalloc heapdump, massif) indicate that it's not actually using that much memory, so... fragmentation?
Yehuda and Sage can't come up with anything so we decided to table it unless we hear about real problems. Merged into unstable!

Updated by John Spray over 7 years ago

Project changed from Ceph to CephFS
Category deleted (1)
Target version deleted (~~v0.25~~)

Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.
