Subtask #2745

closed

Feature #2611: mon: Single-Paxos

Subtask #2621: mon: Single-Paxos: synchronize the MonitorDBStore of oblivious monitor

Subtask #2736: mon: Single-Paxos: Sync: Implement message passing

mon: Single-Paxos: Sync: Add new message support to the Monitor class

Added by Joao Eduardo Luis almost 12 years ago. Updated about 11 years ago.

Status: Closed
Priority: Normal
Category: Monitor
% Done: 100%

Description

There are three different "roles" in a monitor cluster with regard to synchronization:

  • Leader - responsible for disabling Paxos trimming while at least one sync is ongoing; also responsible for denying any new sync request, once the Paxos state grows past some predetermined threshold, until that state has been trimmed.
  • Sync Requester - The monitor that needs to be synchronized; it contacts the Leader to request the green light to proceed with the sync, and then obtains a consistent, up-to-date store state from a quorum member.
  • Sync Provider - A monitor belonging to the quorum, possibly the Leader itself, against which the Sync Requester synchronizes.
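
The Leader's two responsibilities described above can be illustrated with a minimal sketch. This is not the Monitor class: `LeaderSyncGate`, `max_untrimmed` and the use of plain strings to identify monitors are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical stand-in for the Leader's bookkeeping; not the Monitor class.
class LeaderSyncGate {
public:
  explicit LeaderSyncGate(std::size_t max_untrimmed)
    : max_untrimmed_(max_untrimmed) {}

  // A requester asks for the green light. Returns false if the request is
  // denied because the untrimmed Paxos state is over the threshold.
  bool request_sync(const std::string &requester,
                    std::size_t untrimmed_versions) {
    if (untrimmed_versions > max_untrimmed_)
      return false;
    ongoing_.insert(requester);
    return true;
  }

  // The requester finished (or aborted) its sync.
  void finish_sync(const std::string &requester) {
    ongoing_.erase(requester);
  }

  // Trimming is allowed only while no sync is in progress.
  bool trim_allowed() const { return ongoing_.empty(); }

private:
  std::size_t max_untrimmed_;       // assumed threshold, in Paxos versions
  std::set<std::string> ongoing_;   // monitors currently synchronizing
};
```

In this sketch a denied requester is simply expected to ask again later, after the ongoing syncs have finished and the state has been trimmed.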

Synchronization Implementation

Role-independent

  set<string> get_sync_targets_names();
  void handle_sync(MMonSync *m);
  void handle_sync_abort(MMonSync *m);
  void reset_sync();

Leader-specific

  Mutex trim_lock;
  map<entity_inst_t, Context*> trim_timeouts;
  Context *trim_enable_timer;

  struct C_TrimTimeout;
  struct C_TrimEnable;

  void sync_send_heartbeat(entity_inst_t &other, bool reply);
  void handle_sync_start(MMonSync *m);
  void handle_sync_heartbeat(MMonSync *m);
  void handle_sync_finish(MMonSync *m);
  void sync_finish(entity_inst_t &entity, bool abort);
  void sync_finish_abort(entity_inst_t &entity);

Sync Provider-specific

  struct SyncEntity;
  SyncEntity get_sync_entity(entity_inst_t &entity, Monitor *mon);

  struct C_SyncTimeout;

  map<entity_inst_t, SyncEntity> sync_entities;

  void sync_provider_cleanup(entity_inst_t &entity);
  void handle_sync_start_chunks(MMonSync *m);
  void handle_sync_heartbeat_reply(MMonSync *m);
  void handle_sync_chunk_reply(MMonSync *m);
  void sync_send_chunks(SyncEntity sync, pair<string,string> &last_key);
  void sync_timeout(entity_inst_t &entity);
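
The `last_key` parameter of `sync_send_chunks` suggests that the provider walks the store in key order and resumes after the last key it sent. A minimal sketch of that idea follows, assuming an in-memory `std::map` in place of the MonitorDBStore; `Store`, `Chunk`, `next_chunk` and `max_entries` are hypothetical names, not part of the ticket.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Key = std::pair<std::string, std::string>;  // (prefix, key)
using Store = std::map<Key, std::string>;         // hypothetical store

struct Chunk {
  std::vector<std::pair<Key, std::string>> entries;
  bool last = false;  // true when nothing remains after this chunk
};

// Return up to max_entries entries that sort strictly after last_key,
// and advance last_key to the final entry returned.
Chunk next_chunk(const Store &store, Key &last_key, std::size_t max_entries) {
  Chunk chunk;
  auto it = store.upper_bound(last_key);
  while (it != store.end() && chunk.entries.size() < max_entries) {
    chunk.entries.emplace_back(it->first, it->second);
    last_key = it->first;
    ++it;
  }
  chunk.last = (it == store.end());
  return chunk;
}
```

Starting with a default-constructed `Key` (two empty strings) begins the walk at the first entry, on the assumption that no real entry uses an empty prefix and an empty key.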

Sync Requester-specific

  struct C_SyncStartTimeout;
  struct C_SyncStartRetry;
  struct C_HeartbeatTimeout;
  struct C_SyncFinishReplyTimeout;

  SyncEntity sync_leader;
  SyncEntity sync_provider;

  void sync_requester_cleanup();
  void sync_requester_abort();
  void sync_start(entity_inst_t &entity);
  void handle_sync_start_reply(MMonSync *m);
  void handle_sync_chunk(MMonSync *m);
  void handle_sync_finish_reply(MMonSync *m);
  void sync_stop();
  void sync_abort();
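
Read together, the requester-side handlers hint at a simple progression: start, wait for the Leader's reply, receive chunks, wait for the finish reply. The sketch below models that as a state machine. The state names and the `RequesterSketch` type are assumptions, and a denied start is treated as an abort here, although the `C_SyncStartRetry` name suggests the real code may retry instead.

```cpp
#include <cassert>

// Hypothetical states; the ticket does not name them.
enum class ReqState {
  Idle, AwaitStartReply, ReceivingChunks, AwaitFinishReply, Done, Aborted
};

struct RequesterSketch {
  ReqState state = ReqState::Idle;

  // Ask the Leader for the green light.
  void sync_start() {
    if (state == ReqState::Idle) state = ReqState::AwaitStartReply;
  }
  // The Leader answered; a denial is modeled as an abort in this sketch.
  void on_start_reply(bool granted) {
    if (state != ReqState::AwaitStartReply) return;
    state = granted ? ReqState::ReceivingChunks : ReqState::Aborted;
  }
  // A chunk arrived from the provider; the last one ends the transfer.
  void on_chunk(bool last) {
    if (state == ReqState::ReceivingChunks && last)
      state = ReqState::AwaitFinishReply;
  }
  // The Leader acknowledged that the sync is finished.
  void on_finish_reply() {
    if (state == ReqState::AwaitFinishReply) state = ReqState::Done;
  }
  // Any failure before completion aborts the sync.
  void abort() {
    if (state != ReqState::Done) state = ReqState::Aborted;
  }
};
```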

Actions #1

Updated by Joao Eduardo Luis over 11 years ago

  • Description updated (diff)
  • Status changed from In Progress to 7
  • % Done changed from 40 to 90

Currently, most timeout callbacks simply assert. This has allowed us to debug some unforeseen situations that would otherwise have led to recovery, and that we probably would never have noticed (until something got really screwed up).
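
The trade-off can be sketched as two variants of a timeout callback: one that fails loudly and one that recovers quietly. `Callback`, `SyncState`, `StrictTimeout` and `RecoveringTimeout` are hypothetical names for illustration only, not the callbacks listed in the description.

```cpp
#include <cassert>

// Minimal stand-in for a completion callback; hypothetical.
struct Callback {
  virtual ~Callback() {}
  virtual void finish(int r) = 0;
};

struct SyncState {
  bool aborted = false;
};

// Debugging variant: a timeout is treated as a bug and stops the process.
struct StrictTimeout : Callback {
  void finish(int) override { assert(false && "sync timed out"); }
};

// Recovering variant: a timeout quietly aborts the sync, which can hide
// the unexpected situation that caused it.
struct RecoveringTimeout : Callback {
  explicit RecoveringTimeout(SyncState &s) : state(s) {}
  void finish(int) override { state.aborted = true; }
  SyncState &state;
};
```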

Actions #2

Updated by Joao Eduardo Luis over 11 years ago

  • Status changed from 7 to Closed
  • % Done changed from 90 to 100