Bug #8641

Cache tiering agent cannot flush or evict objects during the benchmark

Added by Sherry Shahbazi over 9 years ago. Updated over 9 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I set target_max_objects to 1000, but the cache tier does not evict objects while the workload is being created, and it does not start evicting after the benchmark finishes either. Eviction only starts when I execute "ceph osd pool set hot-storage target_max_objects 1000" again, either during the test (which causes an OSD to go down, so I have to restart it) or after the benchmark.
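
For reference, a rough sketch of how I check whether the agent is doing anything (hot-storage is the cache pool named above; the watch interval is only an example):

# count the objects currently sitting in the cache pool; this should fall toward target_max_objects
rados -p hot-storage ls | wc -l
# read back the threshold to confirm it is actually set on the pool
ceph osd pool get hot-storage target_max_objects
# watch per-pool object counts over time
watch -n 10 ceph df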

CRUSHmap - CRUSH map (2.8 KB) Sherry Shahbazi, 06/23/2014 03:24 PM

History

#1 Updated by Samuel Just over 9 years ago

  • Status changed from New to Need More Info

How did you initially set target_max_objects? Please provide more detail.

#2 Updated by Sherry Shahbazi over 9 years ago

Samuel Just wrote:

How did you initially set target_max_objects? Please provide more detail.

The attachment is my CRUSH map. ceph osd lspools shows:
0 metadata, 1 tier1-cache, 2 tier1
Then I set the ruleset for each pool as follows:
ceph osd pool set metadata crush_ruleset 0
ceph osd pool set tier1-cache crush_ruleset 1
ceph osd pool set tier1 crush_ruleset 2
After that, I did the following steps to add tiering:
1) ceph osd tier add tier1 tier1-cache
2) ceph osd tier cache-mode tier1-cache writeback
3) ceph osd tier set-overlay tier1 tier1-cache
4) ceph osd pool set tier1-cache target_max_objects 1000
Then I set up CephFS with the kernel client:
ceph mds newfs 0 2 --yes-i-really-mean-it
sudo mkdir /mnt/oruafs
sudo mount -t ceph ceph-mon1:6789,ceph-mon2:6789,ceph-mon3:6789:/ /mnt/oruafs -o name=admin
sudo mkdir /mnt/oruafs/tier1
cephfs /mnt/oruafs/tier1 set_layout -p 2
sudo mount -t ceph ceph-mon1:6789,ceph-mon2:6789,ceph-mon3:6789:/tier1 /mnt/oruafs/tier1 -o name=admin
Once I started the test, I noticed that all the objects stayed in tier1-cache and were not evicted. tier1-cache should evict objects once it reaches 1000 objects!
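
For completeness, a quick sketch of how the tiering configuration can be double-checked after these steps (pool names as above; the exact fields printed vary by version):

# show tiers, overlay, cache_mode and target_max_objects for both pools
ceph osd dump | grep -E 'tier1|tier1-cache'
# read the eviction threshold back directly
ceph osd pool get tier1-cache target_max_objects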

#3 Updated by David Zafman over 9 years ago

When I was experimenting with tiering during development, I ran into this issue when the value of target_max_objects was smaller than the total number of PGs in the pool; the algorithm won't work in that case. So, say you have 1024 PGs in the pool: I wouldn't set target_max_objects much lower than 10 times that, i.e. 10240. With target_max_objects of 10240, each PG attempts to hold 10 objects, which gives you a reasonable granularity of 10% increments in terms of osd_agent_min_evict_effort.

In a production environment this shouldn't be a problem since the maximum objects wouldn't be set that low.
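
To illustrate the arithmetic (a sketch only; 1024 and 10240 are the example numbers above):

# objects each PG aims to hold = target_max_objects / pg_num
pg_num=1024
target_max_objects=10240
echo $(( target_max_objects / pg_num ))   # -> 10 objects per PG
# evicting a single object then moves a PG by 1/10 of its target, i.e. in 10% steps,
# which matches the osd_agent_min_evict_effort granularity mentioned above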

#4 Updated by Sherry Shahbazi over 9 years ago

I have only 128 PGs in that tier1-cache pool. Based on what you are saying, setting target_max_objects to 10 times 128 = 1280 should work. I should add that I first set target_max_objects to 100,000 and then to 10,000, and neither worked. I also set the other cache-tiering parameters earlier, e.g. cache_target_full_ratio to 0.8. But eventually my OSDs in the cache-tier pool become full and go down because the tiering agent cannot flush objects.

#5 Updated by Sherry Shahbazi over 9 years ago

David Zafman wrote:

When I was experimenting with tiering during development, I ran into this issue when the value of target_max_objects was smaller than the total number of PGs in the pool; the algorithm won't work in that case. So, say you have 1024 PGs in the pool: I wouldn't set target_max_objects much lower than 10 times that, i.e. 10240. With target_max_objects of 10240, each PG attempts to hold 10 objects, which gives you a reasonable granularity of 10% increments in terms of osd_agent_min_evict_effort.

In a production environment this shouldn't be a problem since the maximum objects wouldn't be set that low.

I have only 128 PGs in that tier1-cache pool. Based on what you are saying, setting target_max_objects to 10 times 128 = 1280 should work. I should add that I first set target_max_objects to 100,000 and then to 10,000, and neither worked. I also set the other cache-tiering parameters earlier, e.g. cache_target_full_ratio to 0.8. But eventually my OSDs in the cache-tier pool become full and go down because the tiering agent cannot flush objects.
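
For reference, a sketch of what the suggested adjustment would look like on my pools (1280 follows from 10 x 128 above; 0.8 is the ratio I mentioned):

# confirm the PG count of the cache pool
ceph osd pool get tier1-cache pg_num
# apply a threshold of roughly 10 objects per PG
ceph osd pool set tier1-cache target_max_objects 1280
# the flush/evict pressure setting I had already applied
ceph osd pool set tier1-cache cache_target_full_ratio 0.8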

#6 Updated by Samuel Just over 9 years ago

I think you need add-cache rather than set-overlay.
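
Roughly, the difference between the two variants (a sketch; the size argument to add-cache is in bytes and the value below is only an example):

# what was run here: add the tier, set the mode, then point clients at it
ceph osd tier add tier1 tier1-cache
ceph osd tier cache-mode tier1-cache writeback
ceph osd tier set-overlay tier1 tier1-cache
# alternative mentioned here: add-cache adds the tier and applies cache defaults in one step
ceph osd tier add-cache tier1 tier1-cache 10737418240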

#7 Updated by Samuel Just over 9 years ago

Where in the docs did you see that bit?

#8 Updated by Sherry Shahbazi over 9 years ago

Samuel Just wrote:

I think you need add-cache rather than set-overlay.

Based on the following link, I need to use set-overlay when the cache mode is writeback:
http://ceph.com/docs/master/rados/operations/cache-tiering/#creating-a-cache-tier

#9 Updated by Sherry Shahbazi over 9 years ago

Samuel Just wrote:

Where in the docs did you see that bit?

I also followed what Greg told me in his reply to my email related to CephFS:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg10764.html

But I think that when the objects come from CephFS, RADOS is not able to handle them properly.

#10 Updated by Samuel Just over 9 years ago

What kernel version are you using?

#11 Updated by Sherry Shahbazi over 9 years ago

Samuel Just wrote:

What kernel version are you using?

It's 3.14, as Yan Zheng suggested, since I couldn't mount CephFS with kernel version 3.12.

#12 Updated by Sage Weil over 9 years ago

  • Priority changed from Urgent to High

#13 Updated by Szymon Zacher over 9 years ago

In my opinion the problem also affects cache_min_evict_age, cache_min_flush_age, and other settings: it's impossible to force the Ceph cache to flush or evict objects on a regular schedule (example settings below).

ceph version 0.80.4 (7c241cfaa6c8c068bc9da8578ca00b9f4fc7567f)
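
For example, the kind of settings that do not seem to take effect (a sketch; the pool name and values are only examples):

# do not flush an object until it has been in the cache for at least 600 seconds
ceph osd pool set tier1-cache cache_min_flush_age 600
# do not evict an object until it is at least 1800 seconds old
ceph osd pool set tier1-cache cache_min_evict_age 1800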

#14 Updated by Sage Weil over 9 years ago

  • Status changed from Need More Info to Can't reproduce
