Bug #13164
closed
librbd: reads larger than cache size hang
Added by Josh Durgin over 8 years ago.
Updated over 8 years ago.
Description
This can be triggered by using an image with order 26 (so objects are 64MB, larger than the default cache size), and trying to 'rbd cp' the image, since 'rbd cp' will copy a whole object at a time.
One solution would be restricting the maximum length of a single rados i/o that librbd sends, so we don't overfill the cache for large requests.
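The request-capping approach suggested above could be sketched as follows. This is an illustrative model only, not librbd's actual API: the helper name split_read and the parameter max_io_len are hypothetical, standing in for whatever limit librbd would enforce on a single rados i/o.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch: split one large read into pieces no longer than
// max_io_len, so that no single request can overfill the cache.
static std::vector<std::pair<uint64_t, uint64_t>>
split_read(uint64_t off, uint64_t len, uint64_t max_io_len) {
  std::vector<std::pair<uint64_t, uint64_t>> pieces;  // (offset, length)
  while (len > 0) {
    uint64_t n = std::min(len, max_io_len);
    pieces.emplace_back(off, n);
    off += n;
    len -= n;
  }
  return pieces;
}
```

For example, a 64MB read capped at 32MB per i/o would be issued as two back-to-back 32MB reads, each of which fits in the default cache.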
Is this bug complex? I'd like to fix it; the process will deepen my understanding of Ceph.
"ObjectCacher::_readx" has logic to ensure that it won't flood the cache with concurrent reads, but it doesn't handle the case where a single read is larger than the cache itself. The simple fix is to make the "cache is full due to concurrent reads" check trigger only when other reads (besides the current read request) are in progress:
- if (!waitfor_read.empty() || rx_bytes > max_size) {
+ if (!waitfor_read.empty() || (stat_rx > 0 && rx_bytes > max_size)) {
Since a read request >= the cache size will effectively thrash the cache, it probably only hurts more having the cache on in cases like this.
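The effect of the one-line change above can be modeled in isolation. This is a simplified sketch, not the real ObjectCacher code: must_wait and its parameters are hypothetical stand-ins for waitfor_read, stat_rx, rx_bytes, and max_size.

```cpp
#include <cstdint>

// Simplified model of the admission check in ObjectCacher::_readx.
// other_waiters models !waitfor_read.empty(); rx_in_progress models
// stat_rx > 0 (bytes being received by OTHER in-flight reads).
// With the fix, a lone read larger than the cache is allowed to
// proceed instead of waiting forever on itself.
static bool must_wait(bool other_waiters, uint64_t rx_in_progress,
                      uint64_t rx_bytes, uint64_t max_size) {
  return other_waiters || (rx_in_progress > 0 && rx_bytes > max_size);
}
```

Before the fix, a single 64MB read against a 32MB cache made rx_bytes exceed max_size with no other reads pending, so the request blocked on a condition that could never clear; the extra stat_rx guard breaks that self-deadlock.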
Also need to fix the 'TestLibRBD.LargeCacheRead' test case since it failed to detect this issue.
- Status changed from New to Fix Under Review
- Status changed from Fix Under Review to Pending Backport
- Backport set to hammer
- Regression changed from No to Yes
- Backport changed from hammer to hammer, firefly
- Status changed from Pending Backport to Resolved