Bug #21757

Snapshotted RBD objects can't be automatically evicted from a cache tier when the cluster is in a near-full state

Added by Xiaojun Liao over 6 years ago. Updated over 6 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
Tiering
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

[environment]
1, ceph version: Jewel 10.2.6 or Firefly 0.80.7
2, kernel: 3.10.0-229.14.1.el7.x86_64

[procedure to reproduce the problem]
1, create a base pool and overlay it with a cache pool in write-back mode;
$ ceph osd pool create basepool 128
$ ceph osd pool create cachepool 128
$ ceph osd pool set cachepool size 1
$ ceph osd pool set cachepool crush_ruleset 1
$ ceph osd tier add basepool cachepool
$ ceph osd tier cache-mode cachepool writeback
$ ceph osd tier set-overlay basepool cachepool
$ ceph osd pool set cachepool hit_set_type bloom
$ ceph osd pool set cachepool hit_set_count 1
$ ceph osd pool set cachepool hit_set_period 1800
$ ceph osd pool set cachepool target_max_bytes 6388535296
$ ceph osd pool set cachepool target_max_objects 6388535296
$ ceph osd pool set cachepool cache_target_dirty_ratio 0.4
$ ceph osd pool set cachepool cache_target_full_ratio 0.8
2, create a 10 GB RBD in basepool;
$ rbd -p basepool create -s 10240 test_10g --image-format 2
3, fully write the whole RBD first;
4, create five disk partitions on the RBD;
5, create a snapshot "snap1" of the RBD;
6, write heavily to the five partitions with 4K random writes, using a script like this:

fio -filename=/dev/vdb1 -direct=1  -iodepth 1 -thread -rw=randwrite -ioengine=libaio -bs=4k -numjobs=10 -runtime=1000 -group_reporting -name=mytest1 &
fio -filename=/dev/vdb2 -direct=1  -iodepth 1 -thread -rw=randwrite -ioengine=libaio -bs=4k -numjobs=10 -runtime=1000 -group_reporting -name=mytest2 &
fio -filename=/dev/vdb3 -direct=1  -iodepth 1 -thread -rw=randwrite -ioengine=libaio -bs=4k -numjobs=10 -runtime=1000 -group_reporting -name=mytest3 &
fio -filename=/dev/vdb4 -direct=1  -iodepth 1 -thread -rw=randwrite -ioengine=libaio -bs=4k -numjobs=10 -runtime=1000 -group_reporting -name=mytest4 &
fio -filename=/dev/vdb5 -direct=1  -iodepth 1 -thread -rw=randwrite -ioengine=libaio -bs=4k -numjobs=10 -runtime=1000 -group_reporting -name=mytest 

7, the cachepool may become full because of the large number of RBD object clones; in this test the cachepool OSD filesystem usage reached 96% (if not, make a second snapshot and keep writing);
8, put the cluster back into a near-full (rather than full) state by raising the full ratios:
$ ceph pg set_full_ratio 0.98
$ ceph daemon osd.X config set osd_failsafe_full_ratio 0.98

then the cachepool only evicts a little, and the cachepool OSD filesystem usage only comes down to 95% (a verification sketch follows below).
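
To compare what the cluster reports with what the OSD filesystem actually holds after step 8, something like the following can be run (a rough sketch; the pool name is from the procedure above, and the OSD id/path is an assumption based on the PG dump further below):

$ ceph df detail                                        # per-pool usage as reported by the cluster
$ ceph osd pool get cachepool cache_target_full_ratio
$ ceph osd pool get cachepool target_max_bytes
$ ceph health detail                                    # full / nearfull flags
$ df -h /var/lib/ceph/osd/ceph-1                        # actual usage on the size-1 cachepool OSD (assumed osd.1)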

Because of the large number of object clones created when heavily writing a snapshotted RBD, the cluster may reach a full state. After raising full_ratio, the cachepool should evict objects automatically and its usage should drop back to around the cache_target_full_ratio level, but it only evicts a little.
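
For comparison, a manual flush/evict pass can be triggered on the cache pool with the rados tool (a sketch, using the cachepool created above):

$ rados -p cachepool cache-try-flush-evict-all      # skip objects that are still in use
$ rados -p cachepool cache-flush-evict-all          # flush dirty objects and evict clean ones

This is only a manual workaround check; the problem reported here is that the automatic tier agent does not evict on its own.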

I found that the cachepool PG byte statistics do not include the size of the object clones; they only equal the sum of the head objects, so the tier agent does not make an eviction effort. In the PG below, for example, the stats report 17 objects but only 33599579 bytes, roughly the 8 x 4 MiB head objects, while the PG directory actually holds 16 x 4 MiB data files (about 64 MiB, matching the "total 65576" in the listing).

[root@obs1 ceph-1]# ceph osd map cachepool rbd_data.30ee74b0dc51.000000000000027e
osdmap e1249 pool 'cachepool' (39) object 'rbd_data.30ee74b0dc51.000000000000027e' -> pg 39.7cd073ff (39.7f) -> up ([1], p1) acting ([1], p1)
[root@obs1 ceph-1]# ceph pg dump |grep -e 39.7f
dumped all in format plain
39.7f    17    0    0    0    33599579    38    38    active+clean    2017-10-10 17:27:55.056515    1249'38    1249:83    [1]    1    [1]    1    0'0    2017-10-10 16:08:07.786197    0'0    2017-10-10 16:08:07.786197
[root@obs1 ceph-1]# cd current/39.7f_head/
[root@obs1 39.7f_head]# ls -l
total 65576
-rw-r--r-- 1 root root      91 10月 11 11:12 hit\uset\u39.7f\uarchive\u2017-10-10 17:30:55.579224\u2017-10-11 11:12:45.547154__head_0000007F_.ceph-internal_27
-rw-r--r-- 1 root root 4194304 10月 10 17:36 rbd\udata.30ee74b0dc51.00000000000000af__12_189FAAFF__27
-rw-r--r-- 1 root root 4194304 10月 10 17:36 rbd\udata.30ee74b0dc51.00000000000000af__head_189FAAFF__27
-rw-r--r-- 1 root root 4194304 10月 10 17:32 rbd\udata.30ee74b0dc51.000000000000024b__12_C36BB5FF__27
-rw-r--r-- 1 root root 4194304 10月 10 17:32 rbd\udata.30ee74b0dc51.000000000000024b__head_C36BB5FF__27
-rw-r--r-- 1 root root 4194304 10月 10 17:32 rbd\udata.30ee74b0dc51.000000000000027e__12_7CD073FF__27
-rw-r--r-- 1 root root 4194304 10月 10 17:32 rbd\udata.30ee74b0dc51.000000000000027e__head_7CD073FF__27
-rw-r--r-- 1 root root 4194304 10月 10 17:31 rbd\udata.30ee74b0dc51.0000000000000310__12_44278CFF__27
-rw-r--r-- 1 root root 4194304 10月 10 17:31 rbd\udata.30ee74b0dc51.0000000000000310__head_44278CFF__27
-rw-r--r-- 1 root root 4194304 10月 10 17:31 rbd\udata.30ee74b0dc51.0000000000000380__12_AE7AC2FF__27
-rw-r--r-- 1 root root 4194304 10月 10 17:31 rbd\udata.30ee74b0dc51.0000000000000380__head_AE7AC2FF__27
-rw-r--r-- 1 root root 4194304 10月 10 17:31 rbd\udata.30ee74b0dc51.00000000000003bd__12_95C723FF__27
-rw-r--r-- 1 root root 4194304 10月 10 17:31 rbd\udata.30ee74b0dc51.00000000000003bd__head_95C723FF__27
-rw-r--r-- 1 root root 4194304 10月 10 17:31 rbd\udata.30ee74b0dc51.000000000000055d__12_3157327F__27
-rw-r--r-- 1 root root 4194304 10月 10 17:31 rbd\udata.30ee74b0dc51.000000000000055d__head_3157327F__27
-rw-r--r-- 1 root root 4194304 10月 10 17:36 rbd\udata.30ee74b0dc51.00000000000005f2__12_C3B0DCFF__27
-rw-r--r-- 1 root root 4194304 10月 10 17:36 rbd\udata.30ee74b0dc51.00000000000005f2__head_C3B0DCFF__27
[root@obs1 39.7f_head]# ls -l |grep head
-rw-r--r-- 1 root root      91 10月 11 11:12 hit\uset\u39.7f\uarchive\u2017-10-10 17:30:55.579224\u2017-10-11 11:12:45.547154__head_0000007F_.ceph-internal_27
-rw-r--r-- 1 root root 4194304 10月 10 17:36 rbd\udata.30ee74b0dc51.00000000000000af__head_189FAAFF__27
-rw-r--r-- 1 root root 4194304 10月 10 17:32 rbd\udata.30ee74b0dc51.000000000000024b__head_C36BB5FF__27
-rw-r--r-- 1 root root 4194304 10月 10 17:32 rbd\udata.30ee74b0dc51.000000000000027e__head_7CD073FF__27
-rw-r--r-- 1 root root 4194304 10月 10 17:31 rbd\udata.30ee74b0dc51.0000000000000310__head_44278CFF__27
-rw-r--r-- 1 root root 4194304 10月 10 17:31 rbd\udata.30ee74b0dc51.0000000000000380__head_AE7AC2FF__27
-rw-r--r-- 1 root root 4194304 10月 10 17:31 rbd\udata.30ee74b0dc51.00000000000003bd__head_95C723FF__27
-rw-r--r-- 1 root root 4194304 10月 10 17:31 rbd\udata.30ee74b0dc51.000000000000055d__head_3157327F__27
-rw-r--r-- 1 root root 4194304 10月 10 17:36 rbd\udata.30ee74b0dc51.00000000000005f2__head_C3B0DCFF__27
[root@obs1 39.7f_head]# ls -l |grep -v head
total 65576
-rw-r--r-- 1 root root 4194304 10月 10 17:36 rbd\udata.30ee74b0dc51.00000000000000af__12_189FAAFF__27
-rw-r--r-- 1 root root 4194304 10月 10 17:32 rbd\udata.30ee74b0dc51.000000000000024b__12_C36BB5FF__27
-rw-r--r-- 1 root root 4194304 10月 10 17:32 rbd\udata.30ee74b0dc51.000000000000027e__12_7CD073FF__27
-rw-r--r-- 1 root root 4194304 10月 10 17:31 rbd\udata.30ee74b0dc51.0000000000000310__12_44278CFF__27
-rw-r--r-- 1 root root 4194304 10月 10 17:31 rbd\udata.30ee74b0dc51.0000000000000380__12_AE7AC2FF__27
-rw-r--r-- 1 root root 4194304 10月 10 17:31 rbd\udata.30ee74b0dc51.00000000000003bd__12_95C723FF__27
-rw-r--r-- 1 root root 4194304 10月 10 17:31 rbd\udata.30ee74b0dc51.000000000000055d__12_3157327F__27
-rw-r--r-- 1 root root 4194304 10月 10 17:36 rbd\udata.30ee74b0dc51.00000000000005f2__12_C3B0DCFF__27
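
To quantify the discrepancy shown above, the PG's reported byte count can be compared against the on-disk size of its directory (a sketch; the PG id is the one from this example and the OSD data directory is assumed to be /var/lib/ceph/osd/ceph-1 based on the prompt above):

$ ceph pg 39.7f query | grep num_bytes               # bytes as counted by the PG statistics
$ du -sb /var/lib/ceph/osd/ceph-1/current/39.7f_head # bytes actually stored in the PG directory

Here the reported num_bytes is 33599579, while the directory holds roughly twice that, because the "_12_" clone files are not reflected in the PG statistics.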

But when I created a 4 MB RBD in basepool, wrote it fully, made a snapshot, and then wrote it again, I found that the PG bytes were the sum of the head object and the clone.
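
A sketch of that control test, assuming the same pools as above; the image name is a placeholder, and rbd bench-write is used here instead of writing from inside a VM as in the original test:

$ rbd -p basepool create -s 4 test_4m --image-format 2
$ rbd bench-write basepool/test_4m --io-size 4096 --io-total 4194304   # fill the 4 MB image
$ rbd snap create basepool/test_4m@snap1
$ rbd bench-write basepool/test_4m --io-size 4096 --io-total 4194304   # rewrite after the snapshot to force a clone
$ ceph df detail                                                       # cachepool usage now counts head + clone
$ ceph pg dump | grep <pg-of-the-data-object>                          # PG bytes equal the 4 MiB head plus the 4 MiB clone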

#1

Updated by Jason Dillaman over 6 years ago

  • Project changed from Ceph to RADOS
  • Category changed from OSD to Tiering