Bug #55515

open

[pwl] "rbd persistent-cache flush" is very slow

Added by Ilya Dryomov almost 2 years ago. Updated almost 2 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This report is mostly about rwl mode and cache files larger than 5G, but I have seen extremely slow flushes with ssd mode and default-sized cache files as well:

- enable the cache, open the image
- perform a bunch of writes with fio
- kill fio
- run "rbd persistent-cache flush"

Since fio wouldn't insert any flushes, it should be possible to destage all data from the cache file in any order, as fast as possible. Instead, the observation is that the flushing process is heavily throttled and takes significantly longer than it would have taken to write the same amount of data directly to the OSDs without any caching involved.

This comes up when gracefully releasing exclusive lock as well, not only with kill + "rbd persistent-cache flush".
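
For reference, a hedged sketch of the reproduction steps above (the config option values, pool and image names are illustrative, not taken from the original report):

# illustrative sketch of the reproduction steps; cache mode, path, size and
# pool/image names are examples only
ceph config set client rbd_plugins pwl_cache
ceph config set client rbd_persistent_cache_mode ssd            # or "rwl" with a DAX-mounted pmem path
ceph config set client rbd_persistent_cache_path /mnt/pwl-cache
ceph config set client rbd_persistent_cache_size 10G

fio --name=test-1 --ioengine=rbd --pool=rbd --rbdname=image --rw=write --bs=32k --iodepth=32 --runtime=480 --time_based &
FIO_PID=$!
sleep 120
kill $FIO_PID                                                    # kill fio mid-run, leaving dirty data in the cache

rbd persistent-cache flush rbd/image                             # this is the step that is unexpectedly slow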

#1

Updated by CONGMIN YIN almost 2 years ago

Can you share your fio configuration? If the QD is high, it is normal for writing back to the OSDs from the cache to take longer than writing directly to the OSDs. I tried to reproduce the issue.
The last sentence refers to "gracefully releasing exclusive lock". How is this done?

#2

Updated by Ilya Dryomov almost 2 years ago

CONGMIN YIN wrote:

Can you share your fio configuration?

Preethi, please share your fio command.

If the QD is high, it is normal for writing back to the OSDs from the cache to take longer than writing directly to the OSDs. I tried to reproduce the issue.

I'm not sure I agree. Even if the queue depth was high, if the workload didn't employ flushes (meaning that there are no user-initiated flushes recorded in the cache), destaging data from the cache file should be comparable to directly writing to the OSDs -- perhaps slightly slower. I don't see an inherent reason for it to be many times slower.

The last sentence refers to "gracefully releasing exclusive lock". How is this done?

It happens when some other process requests the exclusive lock. When the process that owns the lock notices the request, it releases the lock so that the other process can grab it, flushing and removing the cache file as part of the release sequence.
You can observe it by launching two I/O workloads (e.g. two different fio jobs) against the same image.
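
For example (a sketch; pool/image names and job parameters are illustrative), two concurrent fio writers against the same image will make the exclusive lock ping-pong between the two clients, and each release flushes and removes the releasing client's cache file:

# two writers against the same image; every lock release triggers a full
# flush and removal of the lock holder's cache file
fio --name=job-a --ioengine=rbd --pool=rbd --rbdname=image --rw=write --bs=32k --iodepth=32 --runtime=300 --time_based &
fio --name=job-b --ioengine=rbd --pool=rbd --rbdname=image --rw=write --bs=32k --iodepth=32 --runtime=300 --time_based &
wait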

#3

Updated by Preethi Nataraj almost 2 years ago

Ilya Dryomov wrote:

Preethi, please share your fio command.

@congmin, yes, the QD is high. Please find the command used below:
fio --name=test-1 --ioengine=rbd --pool=pmem --rbdname=image --numjobs=4 --rw=write --bs=32k --iodepth=32 --fsync=32 --runtime=480 --time_based --group_reporting --ramp_time=120

#4

Updated by Ilya Dryomov almost 2 years ago

Preethi Nataraj wrote:

@congmin, yes, the QD is high. Please find the command used below:
fio --name=test-1 --ioengine=rbd --pool=pmem --rbdname=image --numjobs=4 --rw=write --bs=32k --iodepth=32 --fsync=32 --runtime=480 --time_based --group_reporting --ramp_time=120

Ah, it looks like in Preethi's case user-initiated flushes took place (--fsync=32), so the slowness might be expected there.

But I definitely saw cache flushing taking too long on workloads with no flushes, particularly in ssd mode.

#5

Updated by jianpeng ma almost 2 years ago

If there is no fsync, pwl will insert internal syncs. And I didn't understand "slowness might be expected there".
Also, I notice fio was run with numjobs=4. We have discussed this before: for fio+librbd, numjobs causes exclusive-lock contention, so the speed will be slower.
So I think we should remove this parameter.
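
For instance, a variant of Preethi's command with both of those knobs dropped (purely illustrative) would exercise the no-user-flush, single-client path discussed above:

# same workload, but without user-initiated flushes (--fsync) and without
# the exclusive-lock contention introduced by numjobs > 1
fio --name=test-1 --ioengine=rbd --pool=pmem --rbdname=image --numjobs=1 --rw=write --bs=32k --iodepth=32 --runtime=480 --time_based --group_reporting --ramp_time=120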

#6

Updated by CONGMIN YIN almost 2 years ago

Ilya Dryomov wrote:

But I definitely saw cache flushing taking too long on workloads with no flushes, particularly in ssd mode.

Whether or not the user initiates flush requests, PWL initiates a flush after every certain number of requests and inserts a sync_point to ensure partial ordering.

When the queue depth is high and the cache is in a steady state (full), writing directly to the OSDs is faster than writing to the cache and then writing back to the OSDs, especially when the performance of the Ceph cluster is high. In clusters with poor performance, the crossover io_depth may be around 32; in clusters with good performance, it may be only about 16.

When cluster performance is good, the client is close to its upper-limit performance even at low queue depth. At that point, increasing the depth has little effect on the performance of writing to the OSDs; this has been verified in previous tests. Meanwhile, the cache has additional overhead in the steady (full) state: data is written to the cache, then read back out and written back to the OSDs.
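
One way to see where that crossover sits (a sketch; the per-image override via "rbd config image set" and the names below are assumptions, not part of this report):

# run the same write workload with the cache disabled and enabled at a
# couple of queue depths and compare the steady-state bandwidth
for depth in 16 32; do
    rbd config image set pmem/image rbd_persistent_cache_mode disabled
    fio --name=direct-qd$depth --ioengine=rbd --pool=pmem --rbdname=image --rw=write --bs=32k --iodepth=$depth --runtime=120 --time_based
    rbd config image set pmem/image rbd_persistent_cache_mode ssd
    fio --name=cached-qd$depth --ioengine=rbd --pool=pmem --rbdname=image --rw=write --bs=32k --iodepth=$depth --runtime=120 --time_based
done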
