Bug #19868
osd: fix osd balance reads might not get expected length of file content from replica osd
Status: Closed
Description
Steps to reproduce:
1. Write a file of length len_1.
2. Read this file from a replica osd (using the CEPH_OSD_FLAG_BALANCE_READS flag).
3. Append more content to the file, so its length becomes len_2 (len_2 > len_1).
4. Read the file again from the same replica osd used in step 2.
The issue appears at step 4: the content read back from the replica osd is incorrect. Only the first len_1 bytes are correct; the remaining (len_2 - len_1) bytes are not.
The root cause is that after a read is served directly by the replica osd, the replica keeps the object context in its cache. When the file is later updated (e.g., its size increases), the primary osd updates its own object context accordingly, such as the size info in write_update_size_and_usage(), so subsequent reads on the primary return the correct length. However, osd repop only queues filestore transactions on the replica and does not update the replica's cached object context. If the next read hits the same replica again, it finds the object context already in cache, uses its size info (now smaller than the real size) as the op's extent length, and passes this smaller length to filestore to read the file content.
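The mechanism can be illustrated with a toy Python model. This is a minimal sketch, not Ceph code: ObjectContext, ReplicaOSD, apply_repop, get_obc and balanced_read are hypothetical names, filestore is reduced to an in-memory dict, and the stale read is modeled as a read truncated to the cached size rather than whatever incorrect bytes a real replica returns.

```python
from dataclasses import dataclass, field

# Toy model only: these classes and methods are hypothetical, not Ceph APIs.


@dataclass
class ObjectContext:
    size: int  # cached object size, later used as the read extent length


@dataclass
class ReplicaOSD:
    store: dict = field(default_factory=dict)      # stand-in for filestore: oid -> bytes
    obc_cache: dict = field(default_factory=dict)  # oid -> cached ObjectContext

    def apply_repop(self, oid: str, data: bytes) -> None:
        # A repop appends the new bytes to the stored object but, as in the
        # reported bug, leaves any cached ObjectContext untouched.
        self.store[oid] = self.store.get(oid, b"") + data

    def get_obc(self, oid: str) -> ObjectContext:
        # Reuse the cached context if there is one; otherwise build it from
        # the current stored size.
        if oid not in self.obc_cache:
            self.obc_cache[oid] = ObjectContext(size=len(self.store[oid]))
        return self.obc_cache[oid]

    def balanced_read(self, oid: str) -> bytes:
        # The extent length comes from the cached size, which may be stale.
        length = self.get_obc(oid).size
        return self.store[oid][:length]


replica = ReplicaOSD()
replica.apply_repop("obj", b"hello")   # step 1: write len_1 = 5 bytes
print(replica.balanced_read("obj"))    # step 2: caches size 5, prints b'hello'
replica.apply_repop("obj", b" world")  # step 3: append, len_2 = 11 bytes
print(replica.balanced_read("obj"))    # step 4: stale size 5, prints b'hello'
```

The primary is left out of the model on purpose: its object context is updated on write, so only the replica path shows the stale size.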
Updated by Zhi Zhang almost 7 years ago
A better fix might be to let osd repop update the object context accordingly, but it is difficult to pick out the relevant ops from a batch of transactions and derive the real file size from them, and it does not seem worth the effort. The alternative here is, when handling an osd repop, to check whether the object context is in cache and remove it if it is, effectively invalidating the cached object context. The next read then finds no cached context and recreates it.
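The proposed invalidation can be illustrated with a toy Python model. This is a minimal sketch under stated assumptions, not the actual patch: ObjectContext, ReplicaOSD, apply_repop, get_obc and balanced_read are hypothetical names, and filestore is reduced to an in-memory dict. The key line is the cache eviction when a repop is handled.

```python
from dataclasses import dataclass, field

# Toy model only: these classes and methods are hypothetical, not Ceph APIs.


@dataclass
class ObjectContext:
    size: int  # cached object size, later used as the read extent length


@dataclass
class ReplicaOSD:
    store: dict = field(default_factory=dict)      # stand-in for filestore: oid -> bytes
    obc_cache: dict = field(default_factory=dict)  # oid -> cached ObjectContext

    def apply_repop(self, oid: str, data: bytes) -> None:
        # Append the new bytes to the stored object.
        self.store[oid] = self.store.get(oid, b"") + data
        # Invalidate any cached object context for this object instead of
        # trying to derive the new size from the queued transactions.
        self.obc_cache.pop(oid, None)

    def get_obc(self, oid: str) -> ObjectContext:
        # On a cache miss, rebuild the context from the current stored size.
        if oid not in self.obc_cache:
            self.obc_cache[oid] = ObjectContext(size=len(self.store[oid]))
        return self.obc_cache[oid]

    def balanced_read(self, oid: str) -> bytes:
        length = self.get_obc(oid).size
        return self.store[oid][:length]


replica = ReplicaOSD()
replica.apply_repop("obj", b"hello")   # step 1: write len_1 = 5 bytes
print(replica.balanced_read("obj"))    # step 2: caches size 5, prints b'hello'
replica.apply_repop("obj", b" world")  # step 3: append evicts the cached context
print(replica.balanced_read("obj"))    # step 4: rebuilt size 11, prints b'hello world'
```

Evicting the entry costs one object-context rebuild on the next replica read, but avoids interpreting transactions to work out the new size. The model also ignores objects with writes still in flight on the replica; tracking such unstable objects is the subject of Feature #10866.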
Updated by Kefu Chai almost 7 years ago
- Status changed from New to Fix Under Review
- Assignee set to Zhi Zhang
Updated by Greg Farnum almost 7 years ago
- Is duplicate of Feature #10866: replicas need to track unstable objects to properly support replica reads added
Updated by Greg Farnum almost 7 years ago
- Status changed from Fix Under Review to Duplicate