Bug #55515
open
[pwl] "rbd persistent-cache flush" is very slow
Added by Ilya Dryomov about 2 years ago.
Updated about 2 years ago.
Description
This report is mostly about rwl mode and cache files larger than 5G, but I have seen extremely slow flushes with ssd mode and default-sized cache files as well:
- enable the cache, open the image
- perform a bunch of writes with fio
- kill fio
- run "rbd persistent-cache flush"
Since fio wouldn't insert any flushes, it should be possible to destage all data from the cache file in any order, as fast as possible. Instead, the observation is that the flushing process is heavily throttled and takes significantly longer than it would have taken to write the same amount of data directly to the OSDs without any caching involved.
This comes up when gracefully releasing exclusive lock as well, not only with kill + "rbd persistent-cache flush".
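The steps above can be sketched as follows. This assumes a pool named pmem and an image named image (taken from the fio command later in this thread); the cache path and size are placeholders:

```shell
# ceph.conf client section -- enable the persistent write-back cache
# (rwl mode needs a DAX-mounted PMEM path; ssd mode works on a regular SSD):
# [client]
#     rbd_plugins = pwl_cache
#     rbd_persistent_cache_mode = ssd        # or "rwl"
#     rbd_persistent_cache_path = /mnt/pwl   # placeholder path
#     rbd_persistent_cache_size = 10G        # >5G to match the report

# The image must have the exclusive-lock feature enabled (on by default).
rbd create pmem/image --size 100G

# Perform a bunch of writes, then kill fio mid-run.
fio --name=test-1 --ioengine=rbd --pool=pmem --rbdname=image \
    --rw=write --bs=32k --iodepth=32 --runtime=480 --time_based &
sleep 60 && kill %1

# Destage everything left in the cache file -- this is the slow step.
rbd persistent-cache flush pmem/image
```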
Can you share your fio configuration? If the QD is high, it is normal for writing back from the cache to the OSDs to take longer than writing directly to the OSDs. I tried to reproduce the issue.
The last sentence refers to "gracefully releasing exclusive lock". How is this done?
CONGMIN YIN wrote:
Can you share your fio configuration?
Preethi, please share your fio command.
If the QD is high, it is normal for writing back from the cache to the OSDs to take longer than writing directly to the OSDs. I tried to reproduce the issue.
I'm not sure I agree. Even if the queue depth was high, if the workload didn't employ flushes (meaning that there are no user-initiated flushes recorded in the cache), destaging data from the cache file should be comparable to directly writing to the OSDs -- perhaps slightly slower. I don't see an inherent reason for it to be many times slower.
The last sentence refers to "gracefully releasing exclusive lock". How is this done?
It happens when some other process requests the exclusive lock. When the process that owns the lock notices the request, it releases the lock so that the other process can grab it, flushing and removing the cache file as part of the release sequence.
You can observe it by launching two I/O workloads (e.g. two different fio jobs) against the same image.
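One way to trigger this release path, sketched under the same pool/image assumptions (names are placeholders, not from an actual test):

```shell
# Two fio processes against the same image: the second one requests the
# exclusive lock, which forces the first to flush and remove its cache
# file as part of the release sequence.
fio --name=owner --ioengine=rbd --pool=pmem --rbdname=image \
    --rw=write --bs=32k --iodepth=32 --runtime=300 --time_based &
sleep 30
fio --name=contender --ioengine=rbd --pool=pmem --rbdname=image \
    --rw=write --bs=32k --iodepth=32 --runtime=300 --time_based
```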
Ilya Dryomov wrote:
Preethi, please share your fio command.
@congmin, yes, QD is high. Please find the command used below:
fio --name=test-1 --ioengine=rbd --pool=pmem --rbdname=image --numjobs=4 --rw=write --bs=32k --iodepth=32 --fsync=32 --runtime=480 --time_based --group_reporting --ramp_time=120
Preethi Nataraj wrote:
@congmin, yes, QD is high. Please find the command used below:
fio --name=test-1 --ioengine=rbd --pool=pmem --rbdname=image --numjobs=4 --rw=write --bs=32k --iodepth=32 --fsync=32 --runtime=480 --time_based --group_reporting --ramp_time=120
Ah, it looks like in Preethi's case user-initiated flushes took place (--fsync=32), so the slowness might be expected there.
But I definitely saw cache flushing taking too long on workloads with no flushes, particularly in ssd mode.
If there is no fsync, pwl will insert internal sync points itself. And I didn't understand "slowness might be expected there."
Also, I notice the fio command uses numjobs=4. We have discussed this before: with fio + librbd, multiple jobs race for the exclusive lock, so the speed will be slower.
So I think we should remove this parameter.
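For reference, Preethi's command with numjobs dropped (avoiding the exclusive-lock race) and --fsync dropped (so no user-initiated flushes are recorded in the cache) might look like:

```shell
# Single-job, no-fsync variant of the earlier fio invocation.
fio --name=test-1 --ioengine=rbd --pool=pmem --rbdname=image \
    --rw=write --bs=32k --iodepth=32 \
    --runtime=480 --time_based --group_reporting --ramp_time=120
```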
Ilya Dryomov wrote:
But I definitely saw cache flushing taking too long on workloads with no flushes, particularly in ssd mode.
Whether or not the user initiates flush requests, PWL will initiate a flush after a certain number of requests and insert a sync_point to ensure partial ordering.
When the queue depth is high and the cache is in a stable (full) state, writing directly to the OSDs is faster than writing to the cache and then writing back to the OSDs, especially when the performance of the Ceph cluster is high. On a cluster with poor performance, this io_depth threshold may be around 32; on a cluster with good performance, it may be only 16.
When cluster performance is good, the client gets close to its upper performance limit even at low queue depth, so increasing the depth has little effect on the performance of writing to the OSDs. This was verified in previous tests. The cache then has additional overhead in the stable (full) state: data is written to the cache, read back out, and written back to the OSDs.
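A rough way to put numbers on this comparison, assuming the same pool/image names as above and a fixed amount of data (the 5G size is an arbitrary choice for illustration):

```shell
# Baseline: time a direct write to the OSDs with the pwl cache disabled.
time fio --name=direct --ioengine=rbd --pool=pmem --rbdname=image \
    --rw=write --bs=32k --iodepth=32 --size=5G

# With the cache enabled: write the same amount, then time the destage.
fio --name=cached --ioengine=rbd --pool=pmem --rbdname=image \
    --rw=write --bs=32k --iodepth=32 --size=5G
time rbd persistent-cache flush pmem/image
```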