Bug #8641
Cache tiering agent cannot flush or evict objects during the benchmark
Description
I set target_max_objects to 1000, but the agent does not evict objects while the workload is being created, and it does not start even after the benchmark has finished. Eviction starts only when I run "ceph osd pool set hot-storage target_max_objects 1000" again, either during the test (which causes an OSD to go down, so I have to restart it) or after the benchmark.
History
#1 Updated by Samuel Just over 9 years ago
- Status changed from New to Need More Info
How did you initially set target_max_objects? Please provide more detail.
#2 Updated by Sherry Shahbazi over 9 years ago
- File CRUSHmap added
Samuel Just wrote:
How did you initially set target_max_objects, please provide more detail.
The attachment is my CRUSH map. ceph osd lspools shows:
0 metadata, 1 tier1-cache, 2 tier1
Then I set the ruleset for each pool as follows:
ceph osd pool set metadata crush_ruleset 0
ceph osd pool set tier1-cache crush_ruleset 1
ceph osd pool set tier1 crush_ruleset 2
After that, I ran the following steps to add tiering:
1) ceph osd tier add tier1 tier1-cache
2) ceph osd tier cache-mode tier1-cache writeback
3) ceph osd tier set-overlay tier1 tier1-cache
4) ceph osd pool set tier1-cache target_max_objects 1000
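For reference, the four steps above can be collected into a small script. This is only a sketch: tier_setup_cmds is a hypothetical helper, not a Ceph command, and it prints the commands instead of running them, so they can be reviewed before anything on the cluster changes. The pool names and the value 1000 are the ones from this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): print the tiering commands from the
# steps above for a given base pool, cache pool, and object target.
tier_setup_cmds() {
    base=$1; cache=$2; max_objects=$3
    echo "ceph osd tier add $base $cache"
    echo "ceph osd tier cache-mode $cache writeback"
    echo "ceph osd tier set-overlay $base $cache"
    echo "ceph osd pool set $cache target_max_objects $max_objects"
}

# Review the output first; pipe it to sh only on a cluster you intend to change.
tier_setup_cmds tier1 tier1-cache 1000
```

Keeping the sequence in one place makes it easier to see the order in which the overlay and the object target were applied.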
Then I set up CephFS with the kernel client:
ceph mds newfs 0 2 --yes-i-really-mean-it
sudo mkdir /mnt/oruafs
sudo mount -t ceph ceph-mon1:6789,ceph-mon2:6789,ceph-mon3:6789:/ /mnt/oruafs -o name=admin
sudo mkdir /mnt/oruafs/tier1
cephfs /mnt/oruafs/tier1 set_layout -p 2
sudo mount -t ceph ceph-mon1:6789,ceph-mon2:6789,ceph-mon3:6789:/tier1 /mnt/oruafs/tier1 -o name=admin
Once I started the test, I noticed that all the objects stay in tier1-cache and are never evicted. tier1-cache should evict objects once it reaches 1000 objects!
#3 Updated by David Zafman over 9 years ago
When I was experimenting with tiering during development, I ran into this issue when the value of target_max_objects is smaller than the total PGs in the pool. The algorithm won't work in that case. So say you have 1024 PGs in the pool, I wouldn't set target_max_objects much lower than 10 times that or 10240, for example. For target_max_objects of 10240 each PG attempts to have 10 objects in it. This gives you a reasonable granularity in 10% increments in terms of osd_agent_min_evict_effort.
In a production environment this shouldn't be a problem since the maximum objects wouldn't be set that low.
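The arithmetic in this comment can be sketched as follows. It assumes the per-PG target is simply target_max_objects divided by the PG count with integer division, which is how the comment describes it; per_pg_target is a hypothetical helper, not the agent's actual code.

```shell
#!/bin/sh
# Hypothetical helper: integer objects-per-PG target implied by the comment
# above (an assumption, not the agent's real implementation).
per_pg_target() {
    # $1 = target_max_objects, $2 = number of PGs in the cache pool
    echo $(( $1 / $2 ))
}

per_pg_target 10240 1024   # example from the comment: prints 10
per_pg_target 1000 128     # values from this report: prints 7
per_pg_target 1000 1024    # target smaller than the PG count: prints 0
```

A per-PG target of 0 illustrates why a target_max_objects below the PG count leaves the algorithm with nothing to work with.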
#4 Updated by Sherry Shahbazi over 9 years ago
I have only 128 PGs in that tier1-cache pool. Based on what you are saying, setting target_max_objects to 10 times 128 = 1280 should work. I should add that I first set target_max_objects to 100,000, then 10,000, and neither worked. I had also set other cache tiering parameters, such as cache_target_full_ratio to 0.8, beforehand. Eventually the OSDs in the cache tier pool filled up and went down because the cache tiering agent could not flush objects.
#5 Updated by Sherry Shahbazi over 9 years ago
David Zafman wrote:
When I was experimenting with tiering during development, I ran into this issue when the value of target_max_objects is smaller than the total PGs in the pool. The algorithm won't work in that case. So say you have 1024 PGs in the pool, I wouldn't set target_max_objects much lower than 10 times that or 10240, for example. For target_max_objects of 10240 each PG attempts to have 10 objects in it. This gives you a reasonable granularity in 10% increments in terms of osd_agent_min_evict_effort.
In a production environment this shouldn't be a problem since the maximum objects wouldn't be set that low.
I have only 128 PGs in the tier1-cache pool. Based on what you are saying, setting target_max_objects to 10 times 128 = 1280 should work. I should add that I first set target_max_objects to 100,000, then 10,000, and neither worked. I had also set other cache tiering parameters, such as cache_target_full_ratio to 0.8, beforehand. Eventually the OSDs in the cache tier pool filled up and went down because the cache tiering agent could not flush objects.
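As a rough check of the numbers in this comment, the sketch below computes the object count at which eviction would be expected. It assumes cache_target_full_ratio is applied relative to target_max_objects, which is how the cache tiering documentation describes it; evict_threshold is a hypothetical helper, and the ratio is passed as a percentage to stay within shell integer arithmetic.

```shell
#!/bin/sh
# Hypothetical helper: object count at which eviction would be expected,
# assuming the full ratio is applied to target_max_objects.
evict_threshold() {
    # $1 = target_max_objects, $2 = cache_target_full_ratio as a percentage
    echo $(( $1 * $2 / 100 ))
}

evict_threshold 1280 80      # prints 1024
evict_threshold 10000 80     # prints 8000
evict_threshold 100000 80    # prints 80000
```

Under that assumption, all three settings give thresholds well above the PG count of 128, so the small-target explanation alone would not account for the behaviour reported here.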
#6 Updated by Samuel Just over 9 years ago
I think you need add-cache rather than set-overlay.
#7 Updated by Samuel Just over 9 years ago
Where in the docs did you see that bit?
#8 Updated by Sherry Shahbazi over 9 years ago
Samuel Just wrote:
I think you need add-cache rather than set-overlay.
Based on the following link, I need to set-overlay when the cache-mode is writeback.
http://ceph.com/docs/master/rados/operations/cache-tiering/#creating-a-cache-tier
#9 Updated by Sherry Shahbazi over 9 years ago
Samuel Just wrote:
Where in the docs did you see that bit?
I also followed what Greg told me in his reply to my email related to CephFS:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg10764.html
But I think when the objects come from CephFS, RADOS is not able to handle that.
#10 Updated by Samuel Just over 9 years ago
What kernel version are you using?
#11 Updated by Sherry Shahbazi over 9 years ago
Samuel Just wrote:
What kernel version are you using?
It's 3.14 as Yan Zheng suggested since I couldn't mount CephFS with kernel version 3.12.
#12 Updated by Sage Weil over 9 years ago
- Priority changed from Urgent to High
#13 Updated by Szymon Zacher over 9 years ago
In my opinion, the problem also affects cache_min_evict_age, cache_min_flush_age, and other settings. It is impossible to force the Ceph cache to flush or evict objects regularly.
ceph version 0.80.4 (7c241cfaa6c8c068bc9da8578ca00b9f4fc7567f)
#14 Updated by Sage Weil over 9 years ago
- Status changed from Need More Info to Can't reproduce