Fix #45140


osd/tiering: flush cache pool may lead to slow write requests

Added by Arvin Liang about 4 years ago. Updated about 4 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
Tiering
Target version:
-
% Done:

0%

Source:
Development
Tags:
Backport:
Reviewed:
Affected Versions:
ceph-qa-suite:
rados
Component(RADOS):
OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In OSD tiering, when flushing objects from the cache pool to the base pool, two problems can lead to slow requests:
1. Before flushing, the tiering agent calls list_objects first, but each list_objects call currently ends up flushing only one object to the backend. Because the list_objects operation takes the PG lock, this can lead to slow requests.
2. The OSD uses a map to track the flush priority of PGs pending in the cache tier, and PGs with the same priority are handled one by one. This leads to imbalanced disk utilization across the base-pool OSDs.
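A minimal Python sketch of the scheduling behaviour described in point 2. The map name and structure here are hypothetical, not Ceph's actual data structures; it only illustrates why same-priority PGs are drained strictly in sequence, concentrating flush I/O on the base-pool OSDs backing one PG at a time:

```python
def drain(priority_map):
    """Model of a per-priority PG map: return PGs in the order a
    (hypothetical) agent would flush them -- highest priority first,
    and PGs sharing one priority strictly one after another."""
    order = []
    for prio in sorted(priority_map, reverse=True):
        for pg in priority_map[prio]:  # same priority: sequential, no interleaving
            order.append(pg)
    return order

# Three PGs queued at the same priority: their flushes are serialized,
# so the base-pool OSDs backing "1.a" absorb all the write load before
# "1.b" or "1.c" receive anything -- the utilization imbalance seen below.
print(drain({10: ["1.a", "1.b", "1.c"]}))  # ['1.a', '1.b', '1.c']
```

Interleaving PGs across the same priority level (round-robin instead of one by one) would spread the flush writes over more base-pool OSDs at once.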

How to trigger this problem:
1. Use SSDs for the cache tier and SATA disks for the base tier.
2. Write a large amount of data to the cache tier, exceeding cache_target_dirty_high_ratio. You can also lower that ratio after writing the data, to trigger the issue faster.
3. Use 'ceph -s' to observe slow requests, then gcore the slow OSD; you will find list_objects holding the PG lock for a long time.
4. Run 'iostat -x 2' against the base-pool disks; only a few OSDs show very high disk utilization while the others remain low.
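The steps above could look roughly like this on the command line. The pool names "cache" and "base" and the ratio value are hypothetical; this assumes an existing writeback cache tier already configured over the base pool:

```shell
# Lower the dirty high-water mark so the agent starts high-speed
# flushing sooner (hypothetical value; default is 0.6):
ceph osd pool set cache cache_target_dirty_high_ratio 0.2

# Generate sustained write load through the tiered pool:
rados -p base bench 300 write --no-cleanup

# Watch for slow requests while the agent flushes:
ceph -s

# On the base-tier hosts, compare per-disk utilization;
# only a few disks should show high %util at any one time:
iostat -x 2
```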

Actions #1

Updated by Arvin Liang about 4 years ago

Pull request ID: 34623
