Fix #6059

osd: block reads while repgather is writing across replicas

Added by Sage Weil over 6 years ago. Updated over 6 years ago.

Status: Resolved
Priority: High
Assignee:
Category: OSD
Target version:
% Done: 0%
Source: Development
Tags:
Backport:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

Currently we use the ondisk_write/read locks to provide mutual exclusion over the local filestore, which avoids reading data that is being modified. We also need to block reads while we are waiting for replicas to commit. Otherwise, object state can appear to go back in time, as in this sequence:

- start a write locally, and on remotes
- finish local write
- another op reads (new) locally written value
- all osds crash
- replica osds restart, but old primary does not
- they recover
- client reads prior object value

We do not see this in our testing because, on failure or peering, our clients resubmit their requests.
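The sequence above can be traced with a small toy model. This is only an illustrative sketch: `ToyOSD` and `run_scenario` are hypothetical names, not Ceph code, and the model assumes a single object, one in-flight write, and replicas that never commit before the crash.

```python
# Toy model of the "back in time" read; all names are hypothetical, not Ceph APIs.

class ToyOSD:
    def __init__(self, name, value):
        self.name = name
        self.committed = value  # durable object state on this OSD
        self.up = True

    def commit(self, value):
        self.committed = value


def run_scenario(block_reads_until_replicated):
    """Return the values a client observes, in order (None = read held back)."""
    primary = ToyOSD("osd.0", "v1")
    replicas = [ToyOSD("osd.1", "v1"), ToyOSD("osd.2", "v1")]
    observed = []

    # Start the write everywhere; only the local (primary) write finishes.
    primary.commit("v2")
    replicas_committed = False

    # Another op reads on the primary.
    if block_reads_until_replicated and not replicas_committed:
        observed.append(None)  # read is parked, nothing is returned yet
    else:
        observed.append(primary.committed)

    # All OSDs crash; the replicas restart, the old primary does not.
    primary.up = False

    # The replicas recover among themselves; none of them committed "v2".
    new_primary = replicas[0]

    # The client reads again.
    observed.append(new_primary.committed)
    return observed


if __name__ == "__main__":
    print(run_scenario(block_reads_until_replicated=False))
    print(run_scenario(block_reads_until_replicated=True))
```

Without blocking, the client sees the new value and then the old one. With reads held until the replicas commit, the client never observes the value that was later lost.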

History

#1 Updated by Ian Colle over 6 years ago

  • Story points set to 8.00

#2 Updated by Samuel Just over 6 years ago

Note: just extending the obc->write_lock() region doesn't really work, since it can cause the op_tp to be blocked, preventing the op_wq threads from processing the OSDSubOpReply message.
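One way to picture an alternative to holding a lock in a worker thread is to park the read and replay it when the last replica acknowledges. The sketch below is an assumption-laden illustration, not the change that resolved this issue: `ToyObjectContext`, `start_write`, `on_replica_ack`, and `demo` are hypothetical names, and the model handles only one in-flight write per object.

```python
from collections import deque


class ToyObjectContext:
    """Hypothetical per-object state; not the Ceph ObjectContext."""

    def __init__(self, value):
        self.value = value
        self.pending_acks = 0         # replica commits still outstanding
        self.waiting_reads = deque()  # read callbacks parked behind the write

    def start_write(self, value, num_replicas):
        # The local write is applied immediately; replicas ack later.
        self.value = value
        self.pending_acks = num_replicas

    def read(self, on_result):
        # Never block the calling thread: either answer now or park the read.
        if self.pending_acks > 0:
            self.waiting_reads.append(on_result)
            return False
        on_result(self.value)
        return True

    def on_replica_ack(self):
        # Called when a replica's reply is processed; this path stays free
        # to run because no worker is stuck waiting inside read().
        self.pending_acks -= 1
        if self.pending_acks == 0:
            while self.waiting_reads:
                self.waiting_reads.popleft()(self.value)


def demo():
    results = []
    obc = ToyObjectContext("v1")
    obc.start_write("v2", num_replicas=2)
    served_now = obc.read(results.append)  # parked, not served
    after_first_ack = None
    obc.on_replica_ack()
    after_first_ack = list(results)        # still empty
    obc.on_replica_ack()                   # last ack replays the read
    return served_now, after_first_ack, results
```

In this model the read is answered only after both acknowledgements arrive, and the thread that delivered the read is free to process replies in the meantime.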

#3 Updated by Ian Colle over 6 years ago

  • Assignee set to David Zafman

#4 Updated by Sage Weil over 6 years ago

  • Target version changed from v0.70 to v0.71

#5 Updated by Samuel Just over 6 years ago

  • Assignee changed from David Zafman to Samuel Just

#6 Updated by Ian Colle over 6 years ago

  • Status changed from New to In Progress

#7 Updated by Ian Colle over 6 years ago

  • Status changed from In Progress to Fix Under Review

#8 Updated by Sage Weil over 6 years ago

  • Status changed from Fix Under Review to Resolved
