Bug #8641

closed

Cache tiering agent cannot flush or evict objects during the benchmark

Added by Sherry Shahbazi almost 10 years ago. Updated over 9 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I set target_max_objects to 1000, but the cache tier does not evict objects while the workload is being created; eviction does not even start after the benchmark has finished. Eviction only starts when I run "ceph osd pool set hot-storage target_max_objects 1000" again, either during the test (which causes an OSD to go down, so I have to restart it) or after the benchmark.
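
For reference, a minimal way to check what the cluster actually recorded (the pool name "hot-storage" is taken from the command above; these are standard ceph/rados CLI calls):

# Confirm the limit is stored on the pool
ceph osd pool get hot-storage target_max_objects
# Count the objects currently sitting in the cache pool
rados -p hot-storage ls | wc -l
# Per-pool object counts and usage
ceph df detail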


Files

CRUSHmap (2.8 KB) - CRUSH map - Sherry Shahbazi, 06/23/2014 03:24 PM
Actions #1

Updated by Samuel Just almost 10 years ago

  • Status changed from New to Need More Info

How did you initially set target_max_objects? Please provide more detail.

Actions #2

Updated by Sherry Shahbazi almost 10 years ago

Samuel Just wrote:

How did you initially set target_max_objects? Please provide more detail.

The attachment is my CRUSH map. "ceph osd lspools" shows:
0 metadata, 1 tier1-cache, 2 tier1
Then I set the ruleset for each pool as follows:
ceph osd pool set metadata crush_ruleset 0
ceph osd pool set tier1-cache crush_ruleset 1
ceph osd pool set tier1 crush_ruleset 2
After that, I did the following steps to add tiering:
1) ceph osd tier add tier1 tier1-cache
2) ceph osd tier cache-mode tier1-cache writeback
3) ceph osd tier set-overlay tier1 tier1-cache
4) ceph osd pool set tier1-cache target_max_objects 1000
Then I set up CephFS using the kernel client:
ceph mds newfs 0 2 --yes-i-really-mean-it
sudo mkdir /mnt/oruafs
sudo mount -t ceph ceph-mon1:6789,ceph-mon2:6789,ceph-mon3:6789:/ /mnt/oruafs -o name=admin
sudo mkdir /mnt/oruafs/tier1
cephfs /mnt/oruafs/tier1 set_layout -p 2
sudo mount -t ceph ceph-mon1:6789,ceph-mon2:6789,ceph-mon3:6789:/tier1 /mnt/oruafs/tier1 -o name=admin
Once I started the test, I noticed that all the objects stayed in tier1-cache and were not evicted. tier1-cache should evict objects once it reaches 1000 objects!
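
(As an aside, a sketch of the other agent-related pool settings that influence when flushing and eviction kick in on firefly-era releases; the values below are only illustrative, not the ones used in this test:)

# Begin flushing dirty objects when they make up 40% of the target
ceph osd pool set tier1-cache cache_target_dirty_ratio 0.4
# Start evicting clean objects when the cache reaches 80% of the target
ceph osd pool set tier1-cache cache_target_full_ratio 0.8
# A byte-based target can be set alongside the object-based one
ceph osd pool set tier1-cache target_max_bytes 1000000000
# Hit sets let the agent judge how recently objects were accessed
ceph osd pool set tier1-cache hit_set_type bloom
ceph osd pool set tier1-cache hit_set_count 1
ceph osd pool set tier1-cache hit_set_period 3600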

Actions #3

Updated by David Zafman almost 10 years ago

When I was experimenting with tiering during development, I ran into this issue when the value of target_max_objects was smaller than the total number of PGs in the pool. The algorithm won't work in that case. So, say you have 1024 PGs in the pool: I wouldn't set target_max_objects much lower than 10 times that, i.e. 10240. With target_max_objects of 10240, each PG attempts to hold 10 objects. That gives you reasonable granularity, in 10% increments, in terms of osd_agent_min_evict_effort.

In a production environment this shouldn't be a problem since the maximum objects wouldn't be set that low.
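
To make the arithmetic concrete, a small sketch (the 1024-PG/10240-object figures come from the example above; the 128-PG pool and target_max_objects=1000 are the values reported in this ticket):

# Per-PG object target implied by target_max_objects
pg_num=1024
target_max_objects=10240
echo $(( target_max_objects / pg_num ))   # 10 objects per PG -> ~10% granularity per object
# With the 128-PG cache pool and target_max_objects=1000 from this report:
echo $(( 1000 / 128 ))                    # ~7 objects per PG, below the rule of thumb
echo $(( 128 * 10 ))                      # 1280 -> a safer lower bound for the target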

Actions #4

Updated by Sherry Shahbazi almost 10 years ago

I have only 128 PGs in the tier1-cache pool. Based on what you are saying, setting target_max_objects to 10 times 128 = 1280 should work. I should add that I first set target_max_objects to 100,000 and then 10,000, and neither worked. I also set the other cache tiering parameters beforehand, like cache_target_full_ratio to 0.8. But eventually my OSDs in the cache tier pool became full and went down, as the tiering agent could not flush objects.

Actions #5

Updated by Sherry Shahbazi almost 10 years ago

David Zafman wrote:

When I was experimenting with tiering during development, I ran into this issue when the value of target_max_objects was smaller than the total number of PGs in the pool. The algorithm won't work in that case. So, say you have 1024 PGs in the pool: I wouldn't set target_max_objects much lower than 10 times that, i.e. 10240. With target_max_objects of 10240, each PG attempts to hold 10 objects. That gives you reasonable granularity, in 10% increments, in terms of osd_agent_min_evict_effort.

In a production environment this shouldn't be a problem since the maximum objects wouldn't be set that low.

I have only 128 PGs in the tier1-cache pool. Based on what you are saying, setting target_max_objects to 10 times 128 = 1280 should work. I should add that I first set target_max_objects to 100,000 and then 10,000, and neither worked. I also set the other cache tiering parameters beforehand, like cache_target_full_ratio to 0.8. But eventually my OSDs in the cache tier pool became full and went down, as the tiering agent could not flush objects.

Actions #6

Updated by Samuel Just almost 10 years ago

I think you need add-cache rather than set-overlay.
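
For reference, a sketch of that form as it appears in the firefly-era CLI (the pool names reuse the ones from this ticket, and the byte size is just a placeholder, not a recommendation):

# Creates the tier relationship and sizes the cache in one step
ceph osd tier add-cache tier1 tier1-cache 10000000000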

Actions #7

Updated by Samuel Just almost 10 years ago

Where in the docs did you see that bit?

Actions #8

Updated by Sherry Shahbazi almost 10 years ago

Samuel Just wrote:

I think you need add-cache rather than set-overlay.

Based on the following link, I need to use set-overlay when the cache-mode is writeback.
http://ceph.com/docs/master/rados/operations/cache-tiering/#creating-a-cache-tier

Actions #9

Updated by Sherry Shahbazi almost 10 years ago

Samuel Just wrote:

Where in the docs did you see that bit?

I also followed what Greg told me in his reply to my email related to CephFS:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg10764.html

But I think when the objects come from CephFS, RADOS is not able to handle that.

Actions #10

Updated by Samuel Just almost 10 years ago

What kernel version are you using?

Actions #11

Updated by Sherry Shahbazi almost 10 years ago

Samuel Just wrote:

What kernel version are you using?

It's 3.14, as Yan Zheng suggested, since I couldn't mount CephFS with kernel version 3.12.

Actions #12

Updated by Sage Weil almost 10 years ago

  • Priority changed from Urgent to High
Actions #13

Updated by Szymon Zacher over 9 years ago

In my opinion the problem also affects cache_min_evict_age, cache_min_flush_age, and others. It's impossible to force the Ceph cache to flush or evict objects regularly.

ceph version 0.80.4 (7c241cfaa6c8c068bc9da8578ca00b9f4fc7567f)
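
For reference, these ages are set per cache pool like the other tunables (the pool name and values below are only illustrative):

# Require objects to be at least 10 minutes old before they are flushed
ceph osd pool set tier1-cache cache_min_flush_age 600
# Require objects to be at least 30 minutes old before they are evicted
ceph osd pool set tier1-cache cache_min_evict_age 1800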

Actions #14

Updated by Sage Weil over 9 years ago

  • Status changed from Need More Info to Can't reproduce