Bug #9285

osd: promoted object can get evicted before promotion completes

Added by Sage Weil over 9 years ago. Updated about 9 years ago.

Status:
Resolved
Priority:
High
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%

Source:
Community (dev)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Date: Fri, 29 Aug 2014 06:43:22 +0000
From: "Wang, Zhiqiang" <zhiqiang.wang@intel.com>
To: "'ceph-devel@vger.kernel.org'" <ceph-devel@vger.kernel.org>
Subject: Cache tiering slow request issue: currently waiting for rw locks
Parts/Attachments:
   1 Shown   ~35 lines  Text
   2          44 KB     Application, "slow_request.log" 
----------------------------------------

Hi all,

I ran into this slow request issue some time ago. The problem is like this:
when running with cache tiering, there are 'slow request' warning messages in the
log file like the ones below.

2014-08-29 10:18:24.669763 7f9b20f1b700  0 log [WRN] : 1 slow requests, 1
included below; oldest blocked for > 30.996595 secs
2014-08-29 10:18:24.669768 7f9b20f1b700  0 log [WRN] : slow request 30.996595
seconds old, received at 2014-08-29 10:17:53.673142:
osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read
3440640~4096] 45.cf45084b ack+read e26168) v4 currently waiting for rw locks

Recently I made some changes to the log, captured this problem, and finally
figured out its root cause. You can check the attachment for the logs.

Here is the root cause:
There is a cache miss when doing a read. During promotion, after copying the data
from the base tier OSD, the cache tier primary OSD replicates the data to the other
cache tier OSDs. Sometimes this takes quite a long time. During this period, the
promoted object may be evicted because the cache tier is full. When the primary
OSD finally gets the replication response and restarts the original read request,
it doesn't find the object in the cache tier and does the promotion again. This
can loop several times, and we'll see the 'slow request' warnings in the logs.
Theoretically, this could loop forever, and the request from the client would
never finish.

There is a simple fix for this:
Add a field to the object state indicating the status of the promotion. It is
set to true after the data has been copied from the base tier and before the
replication, and reset to false after the replication completes and the original
client request starts to execute. Eviction is not allowed while this field is true.

What do you think?
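
To make the proposed fix concrete, here is a minimal, self-contained C++ sketch of
the idea. All names here (CacheTier, ObjectState, start_promotion, and so on) are
hypothetical illustrations, not the actual Ceph data structures or code paths:

#include <iostream>
#include <map>
#include <string>

// Hypothetical, simplified stand-in for a cache-tier object's state.
struct ObjectState {
  bool promoting = false;  // set after the copy from the base tier, cleared after replication
  std::string data;
};

class CacheTier {
  std::map<std::string, ObjectState> objects_;
public:
  // Step 1 of a promotion: the data has been copied up from the base tier;
  // mark the promotion as in progress before replicating to the other OSDs.
  void start_promotion(const std::string& oid, std::string data) {
    ObjectState& obj = objects_[oid];
    obj.data = std::move(data);
    obj.promoting = true;
  }

  // Step 2: replication to the other cache-tier OSDs completed and the
  // original client request can now run, so the flag is cleared.
  void finish_promotion(const std::string& oid) {
    auto it = objects_.find(oid);
    if (it != objects_.end())
      it->second.promoting = false;
  }

  // The eviction agent must skip objects whose promotion is still in flight;
  // otherwise the restarted request misses again and re-promotes, potentially forever.
  bool try_evict(const std::string& oid) {
    auto it = objects_.find(oid);
    if (it == objects_.end())
      return false;
    if (it->second.promoting) {
      std::cout << oid << ": promotion in progress, eviction refused\n";
      return false;
    }
    objects_.erase(it);
    std::cout << oid << ": evicted\n";
    return true;
  }
};

int main() {
  CacheTier tier;
  tier.start_promotion("rb.0.17f56.6b8b4567.000000000935", "object bytes");
  tier.try_evict("rb.0.17f56.6b8b4567.000000000935");    // refused: still promoting
  tier.finish_promotion("rb.0.17f56.6b8b4567.000000000935");
  tier.try_evict("rb.0.17f56.6b8b4567.000000000935");    // now succeeds
  return 0;
}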

Associated revisions

Revision fa45ed81 (diff)
Added by Sage Weil over 9 years ago

osd/ReplicatedPG: do not evict blocked objects

If the object is blocked, that means we are either still promoting it, or
we are in the process of doing a blocking flush. In both cases, do not
evict it just yet. For the promotion case in particular this can cause a
very long op delay because we'll have to restart the promotion, and this
can potentially loop indefinitely.

For the flush case, flushes are generally only blocking when the user
explicitly requests it. In that case, we are not particularly concerned
that we may delay the object's eventual eviction, as the user is probably
controlling that anyway (and even if they aren't, the agent will still
try this object again later).

Fixes: #9285
Reported-by: Wang, Zhiqiang <>
Signed-off-by: Sage Weil <>
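
As an illustration of the check this commit describes (not the actual ReplicatedPG
code), a minimal stand-alone C++ sketch; ObjContext and agent_maybe_evict are
hypothetical names standing in for the tiering agent's per-object state and its
eviction pass:

#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-object context the tiering agent inspects.
struct ObjContext {
  std::string oid;
  bool blocked;  // true while a promotion or a blocking flush is in flight
};

// Sketch of the agent's eviction pass: blocked objects are skipped and left
// for a later pass instead of being evicted mid-promotion or mid-flush.
void agent_maybe_evict(std::vector<ObjContext>& cache) {
  for (auto it = cache.begin(); it != cache.end(); ) {
    if (it->blocked) {
      std::cout << it->oid << ": blocked (promoting or flushing), skipping\n";
      ++it;
      continue;
    }
    std::cout << it->oid << ": evicting\n";
    it = cache.erase(it);
  }
}

int main() {
  std::vector<ObjContext> cache = {
    {"obj.a", true},   // promotion still replicating: must not be evicted yet
    {"obj.b", false},  // idle object: safe to evict
  };
  agent_maybe_evict(cache);
  return 0;
}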

History

#1 Updated by Sage Weil over 9 years ago

  • Status changed from New to Fix Under Review

#2 Updated by Zhiqiang Wang over 9 years ago

Tested Sage's pull request; checking whether the object is blocked or not doesn't work. Actually, this check is already done in the function agent_work.

I tried to make a fix that adds a field/flag to the object context. This is not a good idea for the following reasons:
1) If this field/flag is made persistent, we need to persist it when resetting/clearing it. This is not good for read requests.
2) If this field/flag is not made persistent, it is lost when the object context is removed from the 'object_contexts' cache, and the object is then removed by a later eviction. The same issue still exists.

So I came up with a fix that adds a set to the class ReplicatedPG to hold all the promoting objects. This fix is at https://github.com/ceph/ceph/pull/2374. It is tested and works well. Please review and comment, thanks.
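
As a rough, stand-alone C++ illustration of the approach in that pull request (the
names PG, objects_promoting_, and the hooks below are hypothetical, and the real
patch differs in detail): tracking in-flight promotions in a PG-level set keeps the
information alive even if the object context is dropped from the context cache:

#include <iostream>
#include <set>
#include <string>

// Hypothetical sketch: track in-flight promotions in a PG-level set rather
// than in the (evictable, possibly non-persistent) object context.
class PG {
  std::set<std::string> objects_promoting_;  // oids whose promotion is in flight
public:
  void on_promote_start(const std::string& oid)  { objects_promoting_.insert(oid); }
  void on_promote_finish(const std::string& oid) { objects_promoting_.erase(oid); }

  // The eviction agent consults the set; membership survives even if the
  // object context is dropped from the context cache, which was the problem
  // with a non-persistent per-context flag.
  bool agent_may_evict(const std::string& oid) const {
    return objects_promoting_.count(oid) == 0;
  }
};

int main() {
  PG pg;
  pg.on_promote_start("obj.a");
  std::cout << std::boolalpha
            << "may evict obj.a: " << pg.agent_may_evict("obj.a") << "\n"   // false
            << "may evict obj.b: " << pg.agent_may_evict("obj.b") << "\n";  // true
  pg.on_promote_finish("obj.a");
  std::cout << "may evict obj.a: " << pg.agent_may_evict("obj.a") << "\n";  // true
  return 0;
}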

#3 Updated by Samuel Just over 9 years ago

This appears to be independent of promotion. Is the problem not simply that we don't guarantee that the object will not be evicted before serving the request which caused the promotion?

#4 Updated by Sage Weil over 9 years ago

  • Status changed from Fix Under Review to 4

#5 Updated by Sage Weil over 9 years ago

  • Status changed from 4 to Fix Under Review

#6 Updated by Samuel Just over 9 years ago

  • Status changed from Fix Under Review to 12

I left a comment on a simpler approach.

#7 Updated by Samuel Just over 9 years ago

  • Priority changed from Urgent to High

#8 Updated by Laurent GUERBY over 9 years ago

We probably have this issue on our ceph cluster (0.80.7 on commodity PC hardware + 10G ethernet), and it is blocking us from using erasure coding and SSD caching, so it would be nice to have a fix :).

2014-11-17 10:43:07.413310 mon.0 [INF] pgmap v5070792: 548 pgs: 548 active+clean; 2848 GB data, 7371 GB used, 18214 GB / 26955 GB avail; 95663 B/s rd, 42932 kB/s wr, 1279 op/s
2014-11-17 10:42:58.756375 osd.0 [WRN] 17 slow requests, 5 included below; oldest blocked for > 33.780159 secs
2014-11-17 10:42:58.756486 osd.0 [WRN] slow request 33.779614 seconds old, received at 2014-11-17 10:42:24.976051: osd_op(client.4425786.0:18905913 rbd_data.43b5aa2ae8944a.0000000000000764 [set-alloc-hint object_size 4194304 write_size 4194304,write 3952640~4096] 58.dc438225 ack+ondisk+write e15637) v4 currently no flag points reached
2014-11-17 10:42:58.756542 osd.0 [WRN] slow request 33.779579 seconds old, received at 2014-11-17 10:42:24.976086: osd_op(client.4425786.0:18905914 rbd_data.43b5aa2ae8944a.0000000000000764 [set-alloc-hint object_size 4194304 write_size 4194304,write 3964928~4096] 58.dc438225 ack+ondisk+write e15637) v4 currently no flag points reached
2014-11-17 10:42:58.756641 osd.0 [WRN] slow request 33.779445 seconds old, received at 2014-11-17 10:42:24.976220: osd_op(client.4425786.0:18905915 rbd_data.43b5aa2ae8944a.0000000000000764 [set-alloc-hint object_size 4194304 write_size 4194304,write 3973120~4096] 58.dc438225 ack+ondisk+write e15637) v4 currently no flag points reached
2014-11-17 10:42:58.756647 osd.0 [WRN] slow request 33.779410 seconds old, received at 2014-11-17 10:42:24.976255: osd_op(client.4425786.0:18905916 rbd_data.43b5aa2ae8944a.0000000000000764 [set-alloc-hint object_size 4194304 write_size 4194304,write 3985408~4096] 58.dc438225 ack+ondisk+write e15637) v4 currently no flag points reached
2014-11-17 10:42:58.756715 osd.0 [WRN] slow request 33.779378 seconds old, received at 2014-11-17 10:42:24.976287: osd_op(client.4425786.0:18905917 rbd_data.43b5aa2ae8944a.0000000000000764 [set-alloc-hint object_size 4194304 write_size 4194304,write 4001792~4096] 58.dc438225 ack+ondisk+write e15637) v4 currently no flag points reached
2014-11-17 10:43:01.951190 osd.0 [WRN] 6 slow requests, 1 included below; oldest blocked for > 36.974865 secs
2014-11-17 10:43:01.951468 osd.0 [WRN] slow request 30.382577 seconds old, received at 2014-11-17 10:42:31.568339: osd_op(client.4425786.0:18906066 rbd_data.43b5aa2ae8944a.0000000000000700 [set-alloc-hint object_size 4194304 write_size 4194304,write 2265088~4096] 58.5f8f578b ack+ondisk+write e15637) v4 currently no flag points reached
2014-11-17 10:43:08.431318 mon.0 [INF] pgmap v5070793: 548 pgs: 548 active+clean; 2848 GB data, 7371 GB used, 18214 GB / 26955 GB avail; 407 kB/s rd, 24839 kB/s wr, 802 op/s

#9 Updated by Laurent GUERBY over 9 years ago

Hi,

Any news on this "high" priority bug?

Sincerely,

Laurent

#10 Updated by Sage Weil over 9 years ago

Laurent: I'm skeptical this is the issue you are seeing. Can you capture an osd log (debug osd = 20, debug ms = 1) on an osd that throws a slow request warning like that?

#11 Updated by Laurent GUERBY over 9 years ago

We updated our Ceph version to Giant as described in #10399, and we no longer have slow requests when cache tiering is enabled (but we have #10399 to keep us busy :)).

#12 Updated by Samuel Just about 9 years ago

  • Status changed from 12 to Resolved
