Subtask #2621

closed

Feature #2611: mon: Single-Paxos

mon: Single-Paxos: synchronize the MonitorDBStore of oblivious monitor

Added by Joao Eduardo Luis almost 12 years ago. Updated about 9 years ago.

Status: Resolved
Priority: Normal
Assignee: Joao Eduardo Luis
Category: Monitor
Target version:
% Done: 100%

Description

Objective: synchronize monitor stores over the network whenever a given monitor mon.X falls too far behind.

Solution: Replicate the store of a given monitor mon.Y onto mon.X, up to the state mon.Y’s store was in at the moment the synchronization began. This should allow mon.X’s Paxos to recover newer versions by itself.
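As a rough illustration of what "too far behind" could mean here: Paxos can only catch a monitor up if the peer still holds every version after the lagging monitor's last committed one. The sketch below assumes each monitor exposes a first and a last committed Paxos version; the function and parameter names are hypothetical and are not taken from the Ceph code.

```python
def needs_store_sync(my_last_committed: int, peer_first_committed: int) -> bool:
    """Return True when Paxos recovery alone cannot close the gap.

    Assumption (not from the issue): the peer can share every version from
    peer_first_committed onwards. If the next version we need
    (my_last_committed + 1) has already been trimmed on the peer, replaying
    Paxos versions is not enough and a full store sync is required.
    """
    return my_last_committed + 1 < peer_first_committed
```

Under this assumption, a monitor at version 49 talking to a peer whose earliest version is 50 can still recover through Paxos, while a monitor at version 40 cannot.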

Approach Overview:

  1. mon.X starts fresh
  2. mon.X probes mon.Y and verifies that it has fallen behind to a point where no recovery is feasible
  3. mon.X requests to synchronize with mon.Y
  4. mon.X clears all its keys (if any) to avoid ending up in a potentially inconsistent state, and sets a special key ‘monitor_synchronizing’ to true; this ensures that, if the monitor fails or dies during the synchronization process, it becomes aware of that fact once it restarts
  5. mon.Y requests the Leader not to trim its versions
  6. the leader acknowledges and informs mon.Y of the earliest paxos version available
  7. mon.Y must have the same earliest version as the one reported by the leader (otherwise, assert); mon.Y will have to renew this temporary trimming-disabled request with the leader periodically, because the leader resumes trimming once a certain amount of time has passed without a renewal (a failsafe in case mon.Y fails)
  8. mon.Y snapshots its store, creating a read-only version that can be safely iterated over, while allowing Paxos to progress (i.e., applying newer proposals that may add or remove keys on the store)
  9. mon.Y sets a synchronization expiration timer, to be reset whenever a synchronization-related message is received from mon.X; timer expiration means mon.X may have died and the synchronization state should be discarded
  10. mon.Y starts iterating over its snapshot, creating transactions representing ‘put’ operations that shall recreate the store state, and encoding these transactions into bufferlists that are safe to share across the network
  11. mon.Y sends messages to mon.X with the encoded transactions; each further message is sent only after mon.X has acked the reception of the previous one
  12. On the last message containing encoded transactions, mon.Y will flag it as such, considering the synchronization finished once mon.X acks its reception.
  13. mon.X acks the last message, and is now considered in a consistent state up until Paxos version 10
  14. mon.X removes its special ‘monitor_synchronizing’ key
  15. mon.Y will set a timer to re-enable trimming after, say, 30 seconds have passed. This should give mon.X’s Paxos enough time to start its recovery process without losing versions between finishing the synchronization and starting to obtain the remaining state from mon.Y
  16. mon.X will now probe mon.Y, in order to obtain the remaining Paxos versions that might have been applied in the meantime
  17. Once mon.Y shares all of its remaining state with mon.X, and mon.X applies it, the synchronization process may be considered finished
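The transfer part of the approach above (steps 4, 8 and 10 to 14) can be sketched as follows. This is a minimal in-memory model, not the monitor implementation: plain dicts stand in for the MonitorDBStore, a sorted copy stands in for the store snapshot, and there is no network, timer, or trimming logic. The class and method names (`Provider`, `Requester`, `next_chunk`, and so on) are hypothetical; only the ‘monitor_synchronizing’ key comes from the description.

```python
SYNC_FLAG = "monitor_synchronizing"  # special key named in step 4


class Provider:
    """Plays mon.Y: serves a frozen copy of its store in chunks."""

    def __init__(self, store, chunk_size=2):
        self.store = store
        self.chunk_size = chunk_size
        self.pending = []  # chunks not yet handed out

    def start_sync(self):
        # Step 8: copy the store so later writes do not affect the transfer.
        snapshot = sorted(self.store.items())
        # Step 10: pack 'put' operations into fixed-size chunks.
        self.pending = [
            snapshot[i:i + self.chunk_size]
            for i in range(0, len(snapshot), self.chunk_size)
        ] or [[]]  # an empty store still produces one (empty) last chunk

    def next_chunk(self):
        # Steps 11-12: hand out one chunk and flag whether it is the last.
        puts = self.pending.pop(0)
        return puts, not self.pending


class Requester:
    """Plays mon.X: wipes its store, applies chunks, acks each one."""

    def __init__(self, store):
        self.store = store

    def begin(self):
        # Step 4: drop stale keys and mark the store as mid-synchronization.
        self.store.clear()
        self.store[SYNC_FLAG] = True

    def apply(self, puts, is_last):
        for key, value in puts:
            self.store[key] = value
        if is_last:
            # Steps 13-14: the store is consistent, so remove the marker.
            del self.store[SYNC_FLAG]
        return True  # the ack


def synchronize(provider, requester):
    """Drive the exchange; the next chunk is sent only after an ack."""
    requester.begin()
    provider.start_sync()
    while True:
        puts, is_last = provider.next_chunk()
        if not requester.apply(puts, is_last):
            raise RuntimeError("chunk was not acknowledged")
        if is_last:
            return
```

In this model, a crash between `begin()` and the last `apply()` leaves the marker key in the requester's store, which is how a restarted monitor could tell that its store is incomplete. Keys written to the provider's store after `start_sync()` are not transferred; per steps 16 and 17 they would be recovered afterwards through Paxos.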

Work Plan:

  1. Make the LevelDBStore's iterator concurrency-safe by using snapshots (task #2756)
  2. Make the MonitorDBStore use the LevelDBStore's safe iterator to pack chunks of the key space into transactions (task #2757)
  3. Extend the KeyValueDBMemory in-memory mock-up with an Iterator responsible for iterating over all the store keys, instead of only iterating over a given prefix (task #2758)
  4. Create a new test using the KeyValueDBMemory mock-up in order to test concurrent modification and iteration over a MonitorDBStore (task #)
  5. Create a verification utility, to compare MonitorDBStores, and check if two stores differ and, if so, which keys/values differ (task #)
  6. Make the trimming go through Paxos in all the services (task #2737)
  7. Test the trimming by executing a longer run with several map updates, taking the monitors offline and comparing their stores (task #)
  8. Test the store sync approach in offline mode (task #)
  9. Implement the Synchronization message passing (task #2745)
  10. Debug the message passing to make sure events are happening as they should (task #2746)
  11. Make the MonitorDBStore replay each operation on its LevelDBStore on a KeyValueDBStore in-memory mock (task #)
  12. Implement store synchronization into the monitors (task #2739)
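The verification utility from item 5 of the work plan could look roughly like the sketch below. It assumes both stores have already been loaded into plain dicts mapping keys to values; how the keys are read out of a MonitorDBStore is outside its scope, and the function names are hypothetical.

```python
def diff_stores(a, b):
    """Report how two key/value stores differ.

    Returns the keys present only in `a`, the keys present only in `b`,
    and the keys present in both whose values differ.
    """
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "value_differs": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }


def stores_match(a, b):
    """True when both stores hold exactly the same keys and values."""
    return not any(diff_stores(a, b).values())
```

A check like this could be run after taking the monitors offline (item 7) to confirm that the stores are identical, or to list exactly which keys diverged.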

Subtasks 13 (0 open, 13 closed)

Subtask #2736: mon: Single-Paxos: Sync: Implement message passing | Resolved | Joao Eduardo Luis | 07/06/2012
Subtask #2744: mon: Single-Paxos: Sync: Create new Message type | Resolved | Joao Eduardo Luis | 07/06/2012
Subtask #2745: mon: Single-Paxos: Sync: Add new message support to the Monitor class | Closed | Joao Eduardo Luis | 07/06/2012
Subtask #2746: mon: Single-Paxos: Sync: Test message passing | Rejected | Joao Eduardo Luis | 07/06/2012
Subtask #3069: mon: Single-Paxos: messaging: log MMonSync messages for offline matching | Rejected | Joao Eduardo Luis | 09/01/2012
Subtask #2737: mon: Single-Paxos: Sync: Force trimming to be proposed through Paxos | Resolved | Joao Eduardo Luis | 07/06/2012
Subtask #2738: mon: Single-Paxos: Sync: Add snapshot support to the monitor store | Rejected | Joao Eduardo Luis | 07/06/2012
Subtask #2739: mon: Single-Paxos: Sync: Synchronize the store of a drifted monitor | Resolved | Joao Eduardo Luis | 07/09/2012 | 07/09/2012
Subtask #2757: mon: Single-Paxos: Sync: pack chunks of the MonitorDBStore into transactions | Resolved | Joao Eduardo Luis | 07/09/2012 | 07/09/2012
Subtask #2756: mon: Single-Paxos: LevelDBStore: Make iterator thread-safe | Resolved | Joao Eduardo Luis | 07/09/2012 | 07/09/2012
Subtask #2758: mon: Single-Paxos: Sync: Extend the in-memory mock-up of KeyValueDB to support the safe iterator | Resolved | Joao Eduardo Luis | 07/09/2012 | 07/09/2012
Subtask #2805: mon: Single-Paxos: Sync: Create a test unit to verify the correctness of the whole-space and snapshot iterators | Resolved | Joao Eduardo Luis | 07/20/2012
Subtask #2741: mon: Single-Paxos: Sync: Assess requirements for QA tests | Resolved | Joao Eduardo Luis | 07/06/2012

Actions #1

Updated by Joao Eduardo Luis almost 12 years ago

  • Status changed from New to In Progress
Actions #2

Updated by Joao Eduardo Luis almost 12 years ago

  • Description updated (diff)
  • Estimated time set to 0:00 h
Actions #3

Updated by Joao Eduardo Luis almost 12 years ago

  • Description updated (diff)
  • Estimated time set to 0:00 h
Actions #4

Updated by Joao Eduardo Luis almost 12 years ago

  • Description updated (diff)
Actions #5

Updated by Joao Eduardo Luis almost 12 years ago

  • Description updated (diff)
Actions #6

Updated by Joao Eduardo Luis over 11 years ago

  • Description updated (diff)
Actions #7

Updated by Joao Eduardo Luis almost 11 years ago

  • Status changed from In Progress to Resolved