Bug #13164

librbd: reads larger than cache size hang

Added by Josh Durgin over 8 years ago. Updated about 8 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport: hammer, firefly
Regression: Yes
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This can be triggered by creating an image with order 26 (so objects are 64MB, larger than the default cache size) and running 'rbd cp' on it, since 'rbd cp' copies a whole object at a time.

One solution would be restricting the maximum length of a single rados i/o that librbd sends, so we don't overfill the cache for large requests.


Related issues

Related to rbd - Bug #13124: librbd order limits inconsistent with rbd cli Resolved 09/16/2015
Copied to rbd - Backport #13387: librbd: reads larger than cache size hang Resolved
Copied to rbd - Backport #13388: librbd: reads larger than cache size hang Rejected

Associated revisions

Revision 9c8200bb (diff)
Added by lu shi over 8 years ago

librbd:reads larger than cache size hang.

Fixes:#13164

Signed-off-by: Lu Shi <>

Revision 3f33ce61 (diff)
Added by lu shi over 8 years ago

librbd:reads larger than cache size hang.

Fixes:#13164

Signed-off-by: Lu Shi <>
(cherry picked from commit 9c8200bb5d1ac9359803a182df03298b565b8479)

History

#1 Updated by lu shi over 8 years ago

Is this bug complex? I want to fix this bug; the process will deepen my understanding of Ceph.

#2 Updated by Jason Dillaman over 8 years ago

"ObjectCacher::_readx" has logic to ensure that it won't flood the cache with concurrent reads, but it doesn't handle the case where a single read is larger than the entire cache. The simple fix is to ensure that the "cache is full due to concurrent reads" check only triggers if other reads (besides the current read request) are in progress:

-       if (!waitfor_read.empty() || rx_bytes > max_size) {
+       if (!waitfor_read.empty() || (stat_rx > 0 && rx_bytes > max_size)) {

Since a read request at least as large as the cache will effectively thrash it, having the cache enabled probably only hurts performance in cases like this.

The 'TestLibRBD.LargeCacheRead' test case also needs to be fixed, since it failed to detect this issue.

#4 Updated by Jason Dillaman over 8 years ago

  • Status changed from New to Fix Under Review

#5 Updated by Josh Durgin over 8 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to hammer

#6 Updated by Josh Durgin over 8 years ago

  • Regression changed from No to Yes

#7 Updated by Josh Durgin over 8 years ago

  • Backport changed from hammer to hammer, firefly

#8 Updated by Loïc Dachary about 8 years ago

  • Status changed from Pending Backport to Resolved