Fix #3188

open

osd: close read hole

Added by Sage Weil over 11 years ago. Updated over 4 years ago.

Status: New
Priority: Normal
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source:
Tags: CY2012
Backport:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

a client with an old map may continue to read from a now-marked-down osd that also has an old map.

solution probably goes something like this:

  • if the primary does not hear from the replicas (via the heartbeats) in a heartbeat_grace period, it will stop servicing reads.
  • any new primary already contacts 'up' osds, but ignores down osds. modify this behavior to also probe 'down' osds to make sure they are really down. if the probe confirms that, go active immediately. if not, wait until the heartbeat_grace period has expired to be sure the old primary is no longer servicing reads.
  • as a refinement of the above, we go active immediately and only delay writes until the timer expires; reads are of course safe as no data has changed.
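
The last two bullets can be sketched roughly as follows. This is only an illustration, not Ceph code: `plan_activation`, `ActivationPlan`, and the `probe_confirms_down` callback are hypothetical names, times are plain seconds on a single clock, and the grace timer is assumed to start at `now`.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class ActivationPlan:
    reads_allowed_at: float   # time from which the new primary may serve reads
    writes_allowed_at: float  # time from which the new primary may accept writes


def plan_activation(
    now: float,
    down_osds: Iterable[int],
    probe_confirms_down: Callable[[int], bool],
    heartbeat_grace: float,
) -> ActivationPlan:
    """Decide when a new primary may start serving I/O.

    probe_confirms_down(osd) is a hypothetical callback that returns True only
    when the probe positively established that the osd is no longer running.
    """
    if all(probe_confirms_down(osd) for osd in down_osds):
        # Every 'down' osd is really down: nobody can still be serving reads.
        return ActivationPlan(reads_allowed_at=now, writes_allowed_at=now)
    # Otherwise an old primary might still serve reads until its grace period
    # runs out. Reads here are safe (no data has changed yet); writes wait.
    return ActivationPlan(reads_allowed_at=now,
                          writes_allowed_at=now + heartbeat_grace)
```

With `heartbeat_grace = 20` and one unconfirmed osd, reads start at once and writes are held for 20 seconds.
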
Actions #1

Updated by Sage Weil over 11 years ago

  • Priority changed from Urgent to High
Actions #2

Updated by Ian Colle over 11 years ago

  • Tracker changed from Bug to Feature
  • Priority changed from High to Normal
Actions #3

Updated by Sage Weil about 11 years ago

  • Tracker changed from Feature to Fix
Actions #4

Updated by Ian Colle almost 11 years ago

  • Story points set to 13.00
Actions #5

Updated by Ian Colle almost 11 years ago

  • Target version set to v0.65
Actions #6

Updated by Sage Weil almost 11 years ago

  • Status changed from New to 12

pushed wip-osd-readhole with some old incomplete work on this. here's a brain dump of where my thinking is/was on this.

the basic idea is that the primary will stop servicing reads if it hasn't gotten a ping ack from its replicas in heartbeat_interval seconds. if the pg mapping changes but no osds go down, this is not really a problem; the peering messages that get exchanged ensure the peer has the latest map. but in the case that an osd goes down, we want to know that the peer osd saw that map, or failing that, the best upper bound on when it could have gotten its last replica ack.

1. block reads after heartbeat_interval seconds without a replica ack.

2. after peering, wait heartbeat_interval seconds after our upper bound on when the last interval ended. initially assume this is now. that is technically sufficient to close the hole, but will introduce long delays each time a new peering interval starts.
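
a minimal sketch of the two rules above, for illustration only. `ReadGate` and `earliest_safe_activation` are made-up names, not anything in the osd code. the sketch assumes that every replica must have acked recently (the text doesn't say whether one ack is enough) and that the gate starts out as if all replicas had just acked.

```python
class ReadGate:
    """Rule 1: the primary blocks reads once replica acks go stale."""

    def __init__(self, replicas, heartbeat_interval, now):
        self.heartbeat_interval = heartbeat_interval
        # Assumption: treat gate creation as the time of the last ack.
        self.last_ack = {r: now for r in replicas}

    def note_ack(self, replica, when):
        # Record a ping ack; never move a timestamp backwards.
        self.last_ack[replica] = max(self.last_ack[replica], when)

    def can_serve_reads(self, now):
        if not self.last_ack:
            return True  # no replicas, nothing to wait for
        oldest = min(self.last_ack.values())
        return now - oldest < self.heartbeat_interval


def earliest_safe_activation(last_interval_end_bound, heartbeat_interval):
    """Rule 2: wait heartbeat_interval past the upper bound on when the
    previous interval ended. Passing 'now' as the bound is the conservative
    default described above."""
    return last_interval_end_bound + heartbeat_interval
```

the second function shows where the long delay comes from: with the default bound of "now", every new peering interval waits a full heartbeat_interval.
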

down osds are the culprit. There are 3 cases to consider:

1. normal osd failure -- the last ack sent to the failed osd by old replicas needs to be communicated to the new primary.

- keep track of when last hb was acked for all hb peers
- share that with the primary during peering.
- make primary use that information to try to build a tighter upper bound on the last ack the old primary could have received.

2. osd marks itself down -- new primaries need to know that it knew it was going down.

- mark this in the osdmap somehow?

3. 'ceph osd down NNN' or wrongly marked down.

- after we are (wrongly) marked down, keep answering pings on the old hb interface for heartbeat_interval seconds.
- in hb acks, share our min(osdmap epoch) across pgs
- make replicas share old primary min_pg_epoch_consumed value with new primaries?

unfortunately lots of different threads here, all aiming to build a better upper bound on when the down osd must have stopped processing reads.
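
for case 1 (normal osd failure), the bound could be computed along these lines. this is a sketch under assumptions, not the design: function names and the report format are hypothetical, clock skew between osds is ignored, and it falls back to "now" whenever an old replica has not reported.

```python
def last_ack_upper_bound(now, old_replicas, last_ack_sent):
    """Upper bound on when the failed primary could have received its last ack.

    last_ack_sent maps replica id -> time that replica last acked a ping from
    the failed primary (hypothetical data shared during peering).
    """
    replicas = list(old_replicas)
    if not replicas or any(r not in last_ack_sent for r in replicas):
        # Missing information: assume an ack could have arrived just now.
        return now
    # The old primary cannot have received an ack later than the latest one
    # any replica sent; never return a bound in the future.
    return min(now, max(last_ack_sent[r] for r in replicas))


def writes_allowed_at(now, old_replicas, last_ack_sent, heartbeat_interval):
    # The old primary stops reads heartbeat_interval after its last ack, so
    # writes are safe once that much time has passed since the bound.
    bound = last_ack_upper_bound(now, old_replicas, last_ack_sent)
    return bound + heartbeat_interval
```

if both replicas last acked at 90 and 93 and the interval is 6, writes are safe from 99 instead of now + 6; a result at or before now means no extra wait.
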

Actions #7

Updated by Samuel Just almost 11 years ago

  • Status changed from 12 to In Progress
  • Assignee set to Samuel Just
Actions #8

Updated by Samuel Just almost 11 years ago

  • Target version deleted (v0.65)
Actions #9

Updated by Ian Colle almost 11 years ago

  • Target version set to v0.65
Actions #10

Updated by Samuel Just almost 11 years ago

  • Target version deleted (v0.65)
Actions #11

Updated by Sage Weil over 10 years ago

  • Status changed from In Progress to 12
Actions #12

Updated by Samuel Just over 7 years ago

  • Assignee deleted (Samuel Just)
Actions #13

Updated by Patrick Donnelly over 4 years ago

  • Status changed from 12 to New