Feature #42286
Introduction of tier local mode
Description
Introduction
Based on kiizawa's patch (#18211), we implemented a new cache tier mode: local mode. In this mode, an OSD is configured to manage two data devices, one fast and one slow. Hot objects are promoted from the slow device to the fast device, and demoted back to the slow device when they become cold.
This work is based on Ceph v12.2.5. I'm glad to port it to the master branch if needed.
https://github.com/yanghonggang/ceph/commits/wip-tier-new
Advantages of tier local mode
The local mode tier has the following advantages:
- Object migration can be accomplished inside the OSD, without network traffic overhead.
- There is no need to create an extra cache pool as we do with the pool tier.
- There is only one copy of each object, whether it sits on the fast device or on the slow device, so the total capacity of an OSD is the sum of the fast device's size and the slow device's size.
- The fast device can be used to accelerate all pools built upon the fast + slow devices.
- The user/caller can use a hint request to indicate where an object should be placed.
Introduction to related modules
A. Object access statistics
We can reuse the existing HitSet mechanism.
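A minimal sketch, assuming the promotion decision simply reuses HitSet recency the way the pool tier does: HitSet::contains() is the real interface, while the surrounding function, its parameters, and the exact recency semantics are simplified illustrations rather than the patch's code.

// Simplified sketch: count how many of the most recent HitSets contain the
// object and compare against the pool's min_read_recency_for_promote value.
#include <cstdint>
#include <vector>
#include "common/hobject.h"
#include "osd/HitSet.h"

bool is_promotion_candidate(const hobject_t& oid,
                            const std::vector<const HitSet*>& recent_hit_sets, // newest first
                            uint32_t min_read_recency_for_promote)
{
  if (min_read_recency_for_promote == 0)
    return true;                       // recency 0: promote on any access
  uint32_t hits = 0;
  for (const HitSet* hs : recent_hit_sets) {
    if (!hs->contains(oid))
      break;                           // only consecutive recent hits count
    if (++hits >= min_read_recency_for_promote)
      return true;                     // hot enough to move to the fast device
  }
  return false;
}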
B. Demote agent
We can modify the existing pool tier's demote agent to fit our purpose.
C. Migration
The PrimaryLogPG layer triggers a demotion by issuing a set_alloc_hint request to the ObjectStore layer.
do_op can trigger a promotion by issuing a set_alloc_hint request to the ObjectStore layer.
promote: set_alloc_hint(..., fast_flag)
demote: set_alloc_hint(..., flags_with_fast_flag_cleared)
The migration itself is carried out by the BlueStore/FileStore layer. For now, only BlueStore migration is supported. This part is based on kiizawa's patch (#18211), and I fixed some serious problems in it.
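A minimal sketch of how such a hint can be handed down to the ObjectStore layer; ObjectStore::Transaction::set_alloc_hint() is the existing interface, while ALLOC_HINT_FLAG_FAST and the helper function below are placeholders for whatever flag and call sites the patch actually uses.

// Sketch only: express a promotion/demotion as an allocation hint carried in
// an ObjectStore transaction.
#include "os/ObjectStore.h"

static constexpr uint32_t ALLOC_HINT_FLAG_FAST = 1u << 16;  // hypothetical flag bit

void queue_tier_migration(ObjectStore::Transaction& t,
                          const coll_t& cid, const ghobject_t& oid,
                          uint64_t expected_size, uint32_t current_flags,
                          bool promote)
{
  uint32_t flags = promote
      ? (current_flags | ALLOC_HINT_FLAG_FAST)    // promote: request the fast device
      : (current_flags & ~ALLOC_HINT_FLAG_FAST);  // demote: clear the fast flag
  // BlueStore interprets the changed hint and relocates the object's data;
  // FileStore support is not implemented yet.
  t.set_alloc_hint(cid, oid, expected_size, expected_size, flags);
}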
D. Fast device usage statistics
int64_t num_bytes_fast;    // bytes of objects on the fast tier
int64_t num_objects_fast;  // number of objects on the fast tier
Apart from the work above, we also need:
E. rados tool support
- rados put: add a --fast parameter to place the object on the fast device (a client-side sketch follows this list)
- rados ls: add a --more parameter to also list each object's position (fast or slow)
- cache-demote-all: demote all objects to the slow device
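As referenced in the list above, a rough client-side sketch of what rados put --fast could do: write the object, then attach an allocation hint asking for the fast device. IoCtx::write_full() and IoCtx::set_alloc_hint2() are existing librados calls; LIBRADOS_ALLOC_HINT_FLAG_FAST is a hypothetical stand-in for the flag the patch exposes.

// Client-side sketch only (not the actual rados tool change).
#include <rados/librados.hpp>
#include <string>

#define LIBRADOS_ALLOC_HINT_FLAG_FAST (1u << 16)  // placeholder for the patch's flag

int put_on_fast_device(librados::IoCtx& io, const std::string& oid,
                       librados::bufferlist& data)
{
  int r = io.write_full(oid, data);               // normal object write
  if (r < 0)
    return r;
  // Hint that the object should live on the fast device (flag name assumed).
  return io.set_alloc_hint2(oid, data.length(), data.length(),
                            LIBRADOS_ALLOC_HINT_FLAG_FAST);
}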
F. deploy tool support
I added a --block.fast option to ceph-disk to specify the fast device.
# ceph-disk prepare --osd-id 1 --block.db /dev/nvme0n1 --block.wal /dev/nvme0n1 --block.fast /dev/nvme0n1 /dev/sdi
sdi 8:128 0 558.4G 0 disk
|-sdi1 8:129 0 100M 0 part /var/lib/ceph/osd/ceph-1
`-sdi2 8:130 0 558.3G 0 part
sdj 8:144 0 558.4G 0 disk
|-sdj1 8:145 0 100M 0 part /var/lib/ceph/osd/ceph-0
`-sdj2 8:146 0 558.3G 0 part
sdk 8:160 0 558.4G 0 disk
|-sdk1 8:161 0 100M 0 part
`-sdk2 8:162 0 558.3G 0 part
nvme0n1 259:0 0 349.3G 0 disk
|-nvme0n1p1 259:1 0 8G 0 part <------<<< db
|-nvme0n1p2 259:2 0 576M 0 part <------<<< wal
`-nvme0n1p3 259:3 0 1G 0 part <-------<<< fast
How to use the local mode tier
Set up a Ceph cluster with vstart.sh:
$ CEPH_NUM_MON=1 CEPH_NUM_OSD=1 CEPH_NUM_MDS=0 CEPH_NUM_MGR=1 CEPH_NUM_RGW=0 ../src/vstart.sh -X -l -b -n --create_fast_dev
$ ls dev/osd0/ -l
total 360
-rw-r--r-- 1 ubuntu ubuntu 10737418240 Sep 15 21:47 block
lrwxrwxrwx 1 ubuntu ubuntu 54 Sep 15 21:47 block.db -> /home/ubuntu/work/my-tier/build/dev/osd0/block.db.file
-rw-r--r-- 1 ubuntu ubuntu 67108864 Sep 15 21:47 block.db.file
-rw-r--r-- 1 ubuntu ubuntu 1073741824 Sep 15 21:47 block.fast
lrwxrwxrwx 1 ubuntu ubuntu 55 Sep 15 21:47 block.wal -> /home/ubuntu/work/my-tier/build/dev/osd0/block.wal.file
-rw-r--r-- 1 ubuntu ubuntu 1048576000 Sep 15 21:47 block.wal.file
-rw-r--r-- 1 ubuntu ubuntu 2 Sep 15 21:47 bluefs
Create a pool:
$ ceph osd pool create testpool 8 8
Enable tier local mode:
$ ceph osd tier cache-mode testpool local --yes-i-really-mean-it
set cache-mode for pool 'testpool' to local
Hitset settings:
$ ceph osd pool set testpool hit_set_type bloom
$ ceph osd pool set testpool hit_set_count 4
$ ceph osd pool set testpool hit_set_period 10
$ ceph osd pool set testpool min_read_recency_for_promote 3
Put an object and trigger a promotion:
$ rados -p testpool put myobj Makefile
$ rados -p testpool ls --more
myobj slow
$ for i in {0..2}; do rados -p testpool stat myobj; rados -p testpool ls --more; sleep 8; done 2>/dev/null
testpool/myobj mtime 2019-09-15 22:23:05.000000, size 251749 on_fast 0
myobj slow
testpool/myobj mtime 2019-09-15 22:23:05.000000, size 251749 on_fast 0
myobj slow
testpool/myobj mtime 2019-09-15 22:23:49.000000, size 251749 on_fast 1
myobj fast
Check pool usage info (note the new TUSED/TOBJECTS columns):
$ rados df 2>/dev/null
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR TUSED TOBJECTS
testpool 247k 17 0 17 0 0 0 25 0 17 1722k 245k 1
total_objects 17
total_used 1027M
total_avail 9212M
total_space 10240M
Performance evaluation
To evaluate the performance of the tier local mode, I set up a MySQL database on top of an RBD volume and used sysbench to test its performance.
local mode tier:
- block: 560G hdd
- db: 20G ssd
- fast: 30G ssd
- cache_target_dirty_ratio 0.7
default:
- block: 560G hdd
- db: 50G ssd
bcache (writeback mode):
- writeback_percent 40 (I wanted to set it to 70, but its maximum allowed value is 40 :( )
- block: bcache0 (560G hdd + 30G ssd)
- db: 20G
Benchmark script:
# cat rw-bench.sh
sysbench /usr/share/sysbench/oltp_read_write.lua \
--threads=20 \
--mysql_storage_engine=innodb \
--mysql_host=localhost \
--mysql_db=test \
--mysql_user=root \
--mysql_password= --db_driver=mysql \
--tables=200 \
--table_size=1000000 \
--time=7200 \
$1
History
#1 Updated by Honggang Yang over 4 years ago
kiizawa's patch (https://github.com/ceph/ceph/pull/18211)
#2 Updated by Honggang Yang over 4 years ago
After the sysbench prepare operation completed, about 48883 MB of database data had been generated.
So during the sysbench run stage, eviction was taking place.
#3 Updated by Honggang Yang over 4 years ago
- File cas-vs-local.jpeg added
I also compared the local mode tier with Intel CAS:
- default: no tier
- tiering: local tier mode
- CAS: Intel CAS
#4 Updated by Honggang Yang over 4 years ago
Honggang Yang wrote:
> I also compared the local mode tier with Intel CAS:
> - default: no tier
> - tiering: local tier mode
> - CAS: Intel CAS
random_distribution=zipf:1.1