Bug #52258


[pwl] The write back time of cache is too long

Added by CONGMIN YIN over 2 years ago. Updated over 2 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The write back time of the cache is too long, and so is the shutdown time. For example, after a fio test finishes, it does not return promptly: the image cannot be closed because the cache is still being written back for a long time.


Related issues 1 (0 open, 1 closed)

Related to rbd - Bug #51418: [pwl] segment fault on syncpoint stack (Resolved) - Hualong Feng

Actions #1

Updated by CONGMIN YIN over 2 years ago

This is because the flush request is not executed in the correct position.

Actions #2

Updated by Deepika Upadhyay over 2 years ago

  • Related to Bug #51418: [pwl] segment fault on syncpoint stack added
Actions #3

Updated by CONGMIN YIN over 2 years ago

  • Status changed from New to Closed

If the bandwidth of writing to the cache is larger than the bandwidth of writing back to the cluster (the general case):

cache not full: fill_in_cache_time = cache_cap / (bw_write_to_cache - bw_write_back_to_cluster)

cache full, stable stage: bw_write_to_cache will drop to bw_write_back_to_cluster

write end: write back time = cache_cap / bw_write_back_to_cluster

The time spent writing back to the cluster is the time that was saved earlier by high-speed writes while the cache still had free space. For now, this is inevitable and logical. Writing through the cache is still faster than writing directly to the cluster when the thread count or queue depth is low.
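
As a rough illustration of the two formulas above, here is a minimal sketch (not Ceph code; the capacity and bandwidth values are made-up examples):

#include <cstdio>

int main() {
  // Hypothetical example values, not measurements.
  double cache_cap = 1024.0;           // usable dirty capacity, MB
  double bw_write_to_cache = 500.0;    // MB/s the client pushes into the cache
  double bw_write_back = 25.0;         // MB/s the cache flushes to the cluster

  // While the cache has free space, dirty data accumulates at the difference of the two rates.
  double fill_in_cache_time = cache_cap / (bw_write_to_cache - bw_write_back);

  // After the workload stops, the remaining dirty data drains at the write-back rate.
  double write_back_time = cache_cap / bw_write_back;

  std::printf("fill the cache in %.1f s, drain it in %.1f s\n",
              fill_in_cache_time, write_back_time);
  return 0;
}

The drain time at the end depends only on the dirty capacity and the write-back bandwidth, no matter how fast the cache absorbed the writes.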

Actions #4

Updated by Ilya Dryomov over 2 years ago

  • Status changed from Closed to New

write end: write back time = cache_cap / bw_write_back_to_cluster

Is that always the case? I saw "write back 1G from cache" take a lot longer than "write 1G directly to the cluster" on multiple occasions. The workload that filled the cache didn't send any flushes so in theory the cache should have written that 1G out as fast as possible (i.e. there weren't any ordering constraints imposed by the workload there).

Actions #5

Updated by CONGMIN YIN over 2 years ago

Is that always the case? I saw "write back 1G from cache" take a lot longer than "write 1G directly to the cluster" on multiple occasions. The workload that filled the cache didn't send any flushes so in theory the cache should have written that 1G out as fast as possible (i.e. there weren't any ordering constraints imposed by the workload there).

Are you talking about the performance in SSD cache mode? I haven't tested the performance of SSD cache yet. I'm talking about the performance of RWL by default. In addition, RWL can also be tested on SSD without pmem, but the performance will be worse.

In persist-on-flush mode, write-back needs to wait for the sync point to be written down to the cache. Therefore, it may wait for up to 256 entries to be written to the cache first. However, this does not affect the final write-back time, only the performance of the transition from filling the cache to the stable stage.

// An entry can only be written back once it has completed and either carries
// its own sequence number or its sync point entry has also completed.
bool GenericWriteLogEntry::can_writeback() const {
  return (this->completed &&
          (ram_entry.sequenced ||
           (sync_point_entry &&
            sync_point_entry->completed)));
}

In rwl mode, assuming cache = 1GB, cap = 800MB, writeback_BW = 25MB/s, the write-back time is expected to be 800MB / 25MB/s = 32s. But sometimes this time is longer than the theory predicts. I don't fully understand the wake-up mechanism of PWL; we can look for the reasons in a subsequent optimization stage.

Actions #6

Updated by Ilya Dryomov over 2 years ago

Yes, most of my tests were performed on the ssd mode. But if you are saying that the rwl mode is also sometimes slow

But sometimes, this time is not as expected in theory, it may be longer. I don't fully understand the wake-up mechanism of PWL. We can find the reasons in the subsequent optimization stage.

why close this ticket?

Actions #7

Updated by Ilya Dryomov over 2 years ago

Linking https://github.com/ceph/ceph/pull/42775 that was supposed to address this for posterity.

Actions #8

Updated by CONGMIN YIN over 2 years ago

Linking https://github.com/ceph/ceph/pull/42775 that was supposed to address this for posterity.

In RWL mode, I thought there was no problem with issuing the flush request in advance; it only advances the flush request by the IO depth at most, and after testing no problem was found. Because RWL rarely hits this situation, I was thinking that when the block size is >= 1MB, the write-back depth is only 1, which is a possible reason why I have encountered long write-back times. I haven't tested this in detail. When I encounter a long write-back time, I should first measure the write-back performance in the stable state: sometimes the cluster performance may be poor, and then the write-back performance to the cluster will also be poor.
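
A back-of-the-envelope model of that depth-1 effect (illustrative only; the 40 ms per-request latency is a made-up assumption, not a measurement):

#include <cstdio>

int main() {
  // With a write-back depth of 1, only one request is in flight at a time,
  // so the effective write-back bandwidth is bounded by block_size / latency.
  double block_size_mb = 1.0;  // 1MB write-back requests, as described above
  double latency_s = 0.04;     // hypothetical 40 ms per-request round trip
  double depth = 1.0;          // write-back depth
  double effective_bw = depth * block_size_mb / latency_s;  // MB/s
  std::printf("effective write-back bandwidth ~ %.0f MB/s\n", effective_bw);
  return 0;
}

With a deeper write-back queue the same per-request latency would be overlapped across in-flight requests, which is why a depth of 1 can make the drain much slower than the cluster's raw bandwidth.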

In SSD mode, it always waits for 32 entries before writing to the cache. Similarly, it generally waits for 128 dirty entries before writing back. This may be the reason why it didn't write back as soon as possible. If the write-back time at the end is always long in ssd mode, exceeding cap / write_back_BW in the stable state, that is a problem that needs to be solved. We can keep this tracker open.
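
A minimal sketch of that kind of batching condition (purely illustrative, not the actual Ceph logic; the 32/128 thresholds are the ones quoted above and the helper names are hypothetical):

#include <cstddef>
#include <cstdio>

// Illustrative thresholds taken from the description above.
constexpr std::size_t APPEND_BATCH_ENTRIES = 32;     // entries gathered before appending to the cache
constexpr std::size_t WRITEBACK_BATCH_ENTRIES = 128; // dirty entries gathered before writing back

// Hypothetical helpers, not the real Ceph control flow.
bool should_append_to_cache(std::size_t pending_entries) {
  return pending_entries >= APPEND_BATCH_ENTRIES;
}

bool should_write_back(std::size_t dirty_entries, bool flush_requested) {
  // Under this model, a sub-threshold remainder of dirty entries waits
  // unless a flush (or shutdown) forces it out.
  return flush_requested || dirty_entries >= WRITEBACK_BATCH_ENTRIES;
}

int main() {
  // Example: 50 dirty entries left at the end of a run, no flush issued.
  std::printf("append batch ready: %s, write back now: %s\n",
              should_append_to_cache(50) ? "yes" : "no",
              should_write_back(50, false) ? "yes" : "no");
  return 0;
}

If nothing forces the sub-threshold remainder out promptly, the tail of the drain looks stalled, which would match the long write-back tail described above.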

Actions #9

Updated by Deepika Upadhyay over 2 years ago

  • Status changed from New to Closed

Author comments:
Tracker: https://tracker.ceph.com/issues/52258 This is not a bug; write back should take some time: write back time = cache_cap / bw_write_back_to_cluster

The cause of this problem in the past may have been a deadlock during shutdown, which made it look as if the write back was very slow and could not complete. The deadlock was fixed by https://github.com/ceph/ceph/pull/42950

If bw_write_back_to_cluster is low, write back may take a long time. From current observations on master, the write-back time is reasonable.

Closing unless observed otherwise.
