Bug #8641

closed

Cache tiering agent cannot flush or evict objects during the benchmark

Added by Sherry Shahbazi almost 10 years ago. Updated over 9 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I set target_max_objects to 1000, but the cache tier does not evict objects while the workload is being created; eviction does not even start after the benchmark has finished. Eviction only starts when I run "ceph osd pool set hot-storage target_max_objects 1000" again, either during the test (which causes an OSD to go down, so I have to restart it) or after the benchmark.
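
For reference, a minimal way to check what the cluster actually recorded (the pool name "hot-storage" is taken from the command above; these are standard ceph/rados CLI calls):

# Confirm the limit is stored on the pool
ceph osd pool get hot-storage target_max_objects
# Count the objects currently sitting in the cache pool
rados -p hot-storage ls | wc -l
# Per-pool object counts and usage
ceph df detail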


Files

CRUSHmap (2.8 KB) - CRUSH map - Sherry Shahbazi, 06/23/2014 03:24 PM
Actions #1

Updated by Samuel Just almost 10 years ago

  • Status changed from New to Need More Info

How did you initially set target_max_objects? Please provide more detail.

Actions #2

Updated by Sherry Shahbazi almost 10 years ago

Samuel Just wrote:

How did you initially set target_max_objects? Please provide more detail.

The attachment is my CRUSH map. "ceph osd lspools" shows:
0 metadata, 1 tier1-cache, 2 tier1
Then I set the ruleset for each pool as follows:
ceph osd pool set metadata crush_ruleset 0
ceph osd pool set tier1-cache crush_ruleset 1
ceph osd pool set tier1 crush_ruleset 2
After that, I did the following steps to add tiering:
1) ceph osd tier add tier1 tier1-cache
2) ceph osd tier cache-mode tier1-cache writeback
3) ceph osd tier set-overlay tier1 tier1-cache
4) ceph osd pool set tier1-cache target_max_objects 1000
Then I set up CephFS using the kernel client:
ceph mds newfs 0 2 --yes-i-really-mean-it
sudo mkdir /mnt/oruafs
sudo mount -t ceph ceph-mon1:6789,ceph-mon2:6789,ceph-mon3:6789:/ /mnt/oruafs -o name=admin
sudo mkdir /mnt/oruafs/tier1
cephfs /mnt/oruafs/tier1 set_layout -p 2
sudo mount -t ceph ceph-mon1:6789,ceph-mon2:6789,ceph-mon3:6789:/tier1 /mnt/oruafs/tier1 -o name=admin
Once I started the test, I noticed that all the objects stayed in tier1-cache and were not evicted. tier1-cache should evict objects once it reaches 1000 objects!
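
(As an aside, a sketch of the other agent-related pool settings that influence when flushing and eviction kick in on firefly-era releases; the values below are only illustrative, not the ones used in this test:)

# Begin flushing dirty objects when they make up 40% of the target
ceph osd pool set tier1-cache cache_target_dirty_ratio 0.4
# Start evicting clean objects when the cache reaches 80% of the target
ceph osd pool set tier1-cache cache_target_full_ratio 0.8
# A byte-based target can be set alongside the object-based one
ceph osd pool set tier1-cache target_max_bytes 1000000000
# Hit sets let the agent judge how recently objects were accessed
ceph osd pool set tier1-cache hit_set_type bloom
ceph osd pool set tier1-cache hit_set_count 1
ceph osd pool set tier1-cache hit_set_period 3600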

Actions #3

Updated by David Zafman almost 10 years ago

When I was experimenting with tiering during development, I ran into this issue when the value of target_max_objects was smaller than the total number of PGs in the pool. The algorithm won't work in that case. So, say you have 1024 PGs in the pool: I wouldn't set target_max_objects much lower than 10 times that, i.e. 10240. With target_max_objects of 10240, each PG attempts to hold 10 objects. That gives you reasonable granularity, in 10% increments, in terms of osd_agent_min_evict_effort.

In a production environment this shouldn't be a problem since the maximum objects wouldn't be set that low.
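
To make the arithmetic concrete, a small sketch (the 1024-PG/10240-object figures come from the example above; the 128-PG pool and target_max_objects=1000 are the values reported in this ticket):

# Per-PG object target implied by target_max_objects
pg_num=1024
target_max_objects=10240
echo $(( target_max_objects / pg_num ))   # 10 objects per PG -> ~10% granularity per object
# With the 128-PG cache pool and target_max_objects=1000 from this report:
echo $(( 1000 / 128 ))                    # ~7 objects per PG, below the rule of thumb
echo $(( 128 * 10 ))                      # 1280 -> a safer lower bound for the target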

Actions #4

Updated by Sherry Shahbazi almost 10 years ago

I have only 128 PGs in the tier1-cache pool. Based on what you are saying, setting target_max_objects to 10 times 128 = 1280 should work. I should add that I first set target_max_objects to 100,000 and then 10,000, and neither worked. I also set the other cache tiering parameters beforehand, like cache_target_full_ratio to 0.8. But eventually my OSDs in the cache tier pool became full and went down, as the tiering agent could not flush objects.

Actions #5

Updated by Sherry Shahbazi almost 10 years ago

David Zafman wrote:

When I was experimenting with tiering during development, I ran into this issue when the value of target_max_objects was smaller than the total number of PGs in the pool. The algorithm won't work in that case. So, say you have 1024 PGs in the pool: I wouldn't set target_max_objects much lower than 10 times that, i.e. 10240. With target_max_objects of 10240, each PG attempts to hold 10 objects. That gives you reasonable granularity, in 10% increments, in terms of osd_agent_min_evict_effort.

In a production environment this shouldn't be a problem since the maximum objects wouldn't be set that low.

I have only 128 PGs in the tier1-cache pool. Based on what you are saying, setting target_max_objects to 10 times 128 = 1280 should work. I should add that I first set target_max_objects to 100,000 and then 10,000, and neither worked. I also set the other cache tiering parameters beforehand, like cache_target_full_ratio to 0.8. But eventually my OSDs in the cache tier pool became full and went down, as the tiering agent could not flush objects.

Actions #6

Updated by Samuel Just almost 10 years ago

I think you need add-cache rather than set-overlay.
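
For reference, a sketch of that form as it appears in the firefly-era CLI (the pool names reuse the ones from this ticket, and the byte size is just a placeholder, not a recommendation):

# Creates the tier relationship and sizes the cache in one step
ceph osd tier add-cache tier1 tier1-cache 10000000000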

Actions #7

Updated by Samuel Just almost 10 years ago

Where in the docs did you see that bit?

Actions #8

Updated by Sherry Shahbazi almost 10 years ago

Samuel Just wrote:

I think you need add-cache rather than set-overlay.

Based on the following link, I need to use set-overlay when the cache-mode is writeback.
http://ceph.com/docs/master/rados/operations/cache-tiering/#creating-a-cache-tier

Actions #9

Updated by Sherry Shahbazi almost 10 years ago

Samuel Just wrote:

Where in the docs did you see that bit?

I also followed what Greg told me in his reply to my email related to CephFS:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg10764.html

But I think when the objects come from CephFS, RADOS is not able to handle that.

Actions #10

Updated by Samuel Just almost 10 years ago

What kernel version are you using?

Actions #11

Updated by Sherry Shahbazi almost 10 years ago

Samuel Just wrote:

What kernel version are you using?

It's 3.14, as Yan Zheng suggested, since I couldn't mount CephFS with kernel version 3.12.

Actions #12

Updated by Sage Weil almost 10 years ago

  • Priority changed from Urgent to High
Actions #13

Updated by Szymon Zacher over 9 years ago

In my opinion the problem also affects cache_min_evict_age, cache_min_flush_age, and others. It's impossible to force the Ceph cache to flush or evict objects regularly.

ceph version 0.80.4 (7c241cfaa6c8c068bc9da8578ca00b9f4fc7567f)
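
For reference, these ages are set per cache pool like the other tunables (the pool name and values below are only illustrative):

# Require objects to be at least 10 minutes old before they are flushed
ceph osd pool set tier1-cache cache_min_flush_age 600
# Require objects to be at least 30 minutes old before they are evicted
ceph osd pool set tier1-cache cache_min_evict_age 1800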

Actions #14

Updated by Sage Weil over 9 years ago

  • Status changed from Need More Info to Can't reproduce