Bug #21757
Snapshotted RBD objects can't be automatically evicted from a cache tier when the cluster is in a near-full state
Description
[environment]
1. Ceph version: Jewel 10.2.6 or Firefly 0.80.7
2. Kernel: 3.10.0-229.14.1.el7.x86_64
[procedure to reproduce the problem]
1. Create a base pool and overlay a cache pool in write-back mode:
$ ceph osd pool create basepool 128
$ ceph osd pool create cachepool 128
$ ceph osd pool set cachepool size 1
$ ceph osd pool set cachepool crush_ruleset 1
$ ceph osd tier add basepool cachepool
$ ceph osd tier cache-mode cachepool writeback
$ ceph osd tier set-overlay basepool cachepool
$ ceph osd pool set cachepool hit_set_type bloom
$ ceph osd pool set cachepool hit_set_count 1
$ ceph osd pool set cachepool hit_set_period 1800
$ ceph osd pool set cachepool target_max_bytes 6388535296
$ ceph osd pool set cachepool target_max_objects 6388535296
$ ceph osd pool set cachepool cache_target_dirty_ratio 0.4
$ ceph osd pool set cachepool cache_target_full_ratio 0.8
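For reference, the ratio settings above translate into byte thresholds against target_max_bytes. The sketch below is a simplified illustration of that arithmetic, not the real tier-agent code (which also weighs object counts, hit sets, and slop); the constants are copied from the commands above:

```python
# Simplified sketch of the byte thresholds the cache-tier settings imply.
# NOT the actual ceph-osd agent logic; illustration only.

TARGET_MAX_BYTES = 6388535296    # target_max_bytes from the setup above
DIRTY_RATIO = 0.4                # cache_target_dirty_ratio
FULL_RATIO = 0.8                 # cache_target_full_ratio

flush_threshold = int(TARGET_MAX_BYTES * DIRTY_RATIO)   # start flushing dirty objects
evict_threshold = int(TARGET_MAX_BYTES * FULL_RATIO)    # start evicting clean objects

def agent_should_evict(pool_user_bytes: int) -> bool:
    """Evict only when the *accounted* user bytes exceed the full threshold.

    This is why the accounting bug matters: if clone bytes are missing from
    pool_user_bytes, this check stays False even when the disk is nearly full.
    """
    return pool_user_bytes > evict_threshold

print(f"flush above ~{flush_threshold} bytes, evict above ~{evict_threshold} bytes")
```

With roughly 6 GB of real data on disk, eviction should trigger well before the OSD filesystem reaches 96%; it doesn't, because the clone bytes never enter the accounted total.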
2. Create a 10 GB RBD image in basepool:
$ rbd -p basepool create -s 10240 test_10g --image-format 2
3. Fully write the RBD once;
4. Create five disk partitions on the RBD;
5. Create a snapshot "snap1" of the RBD;
6. Write heavily to the five partitions with 4 KB random writes, using a script like this:
fio -filename=/dev/vdb1 -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=libaio -bs=4k -numjobs=10 -runtime=1000 -group_reporting -name=mytest1 &
fio -filename=/dev/vdb2 -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=libaio -bs=4k -numjobs=10 -runtime=1000 -group_reporting -name=mytest2 &
fio -filename=/dev/vdb3 -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=libaio -bs=4k -numjobs=10 -runtime=1000 -group_reporting -name=mytest3 &
fio -filename=/dev/vdb4 -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=libaio -bs=4k -numjobs=10 -runtime=1000 -group_reporting -name=mytest4 &
fio -filename=/dev/vdb5 -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=libaio -bs=4k -numjobs=10 -runtime=1000 -group_reporting -name=mytest
7. The cachepool may fill up because of the many large RBD object clones; in our run the cachepool OSD filesystem usage reached 96%. If it does not, create a second snapshot and keep writing;
8. Move the cluster from full to a near-full state by raising the ratios:
$ ceph pg set_full_ratio 0.98
$ ceph daemon osd.X config set osd_failsafe_full_ratio 0.98
After that, the cachepool performs only a little eviction; cachepool OSD filesystem usage only drops to 95%.
Because of the many object clones produced by heavy writes to a snapshotted RBD, the cluster may reach a full state. After the full ratios are raised, the cachepool should evict objects automatically and usage should drop back to around the cache_target_full_ratio level, but it only evicts a little. I found that the cachepool OSD PG byte counts do not include the size of object clones; they equal only the sum of the head objects, so the tier agent makes no real evict effort.
[root@obs1 ceph-1]# ceph osd map cachepool rbd_data.30ee74b0dc51.000000000000027e
osdmap e1249 pool 'cachepool' (39) object 'rbd_data.30ee74b0dc51.000000000000027e' -> pg 39.7cd073ff (39.7f) -> up ([1], p1) acting ([1], p1)
[root@obs1 ceph-1]# ceph pg dump | grep -e 39.7f
dumped all in format plain
39.7f 17 0 0 0 33599579 38 38 active+clean 2017-10-10 17:27:55.056515 1249'38 1249:83 [1] 1 [1] 1 0'0 2017-10-10 16:08:07.786197 0'0 2017-10-10 16:08:07.786197
[root@obs1 ceph-1]# cd current/39.7f_head/
[root@obs1 39.7f_head]# ls -l
total 65576
-rw-r--r-- 1 root root      91 Oct 11 11:12 hit\uset\u39.7f\uarchive\u2017-10-10 17:30:55.579224\u2017-10-11 11:12:45.547154__head_0000007F_.ceph-internal_27
-rw-r--r-- 1 root root 4194304 Oct 10 17:36 rbd\udata.30ee74b0dc51.00000000000000af__12_189FAAFF__27
-rw-r--r-- 1 root root 4194304 Oct 10 17:36 rbd\udata.30ee74b0dc51.00000000000000af__head_189FAAFF__27
-rw-r--r-- 1 root root 4194304 Oct 10 17:32 rbd\udata.30ee74b0dc51.000000000000024b__12_C36BB5FF__27
-rw-r--r-- 1 root root 4194304 Oct 10 17:32 rbd\udata.30ee74b0dc51.000000000000024b__head_C36BB5FF__27
-rw-r--r-- 1 root root 4194304 Oct 10 17:32 rbd\udata.30ee74b0dc51.000000000000027e__12_7CD073FF__27
-rw-r--r-- 1 root root 4194304 Oct 10 17:32 rbd\udata.30ee74b0dc51.000000000000027e__head_7CD073FF__27
-rw-r--r-- 1 root root 4194304 Oct 10 17:31 rbd\udata.30ee74b0dc51.0000000000000310__12_44278CFF__27
-rw-r--r-- 1 root root 4194304 Oct 10 17:31 rbd\udata.30ee74b0dc51.0000000000000310__head_44278CFF__27
-rw-r--r-- 1 root root 4194304 Oct 10 17:31 rbd\udata.30ee74b0dc51.0000000000000380__12_AE7AC2FF__27
-rw-r--r-- 1 root root 4194304 Oct 10 17:31 rbd\udata.30ee74b0dc51.0000000000000380__head_AE7AC2FF__27
-rw-r--r-- 1 root root 4194304 Oct 10 17:31 rbd\udata.30ee74b0dc51.00000000000003bd__12_95C723FF__27
-rw-r--r-- 1 root root 4194304 Oct 10 17:31 rbd\udata.30ee74b0dc51.00000000000003bd__head_95C723FF__27
-rw-r--r-- 1 root root 4194304 Oct 10 17:31 rbd\udata.30ee74b0dc51.000000000000055d__12_3157327F__27
-rw-r--r-- 1 root root 4194304 Oct 10 17:31 rbd\udata.30ee74b0dc51.000000000000055d__head_3157327F__27
-rw-r--r-- 1 root root 4194304 Oct 10 17:36 rbd\udata.30ee74b0dc51.00000000000005f2__12_C3B0DCFF__27
-rw-r--r-- 1 root root 4194304 Oct 10 17:36 rbd\udata.30ee74b0dc51.00000000000005f2__head_C3B0DCFF__27
[root@obs1 39.7f_head]# ls -l | grep head
-rw-r--r-- 1 root root      91 Oct 11 11:12 hit\uset\u39.7f\uarchive\u2017-10-10 17:30:55.579224\u2017-10-11 11:12:45.547154__head_0000007F_.ceph-internal_27
-rw-r--r-- 1 root root 4194304 Oct 10 17:36 rbd\udata.30ee74b0dc51.00000000000000af__head_189FAAFF__27
-rw-r--r-- 1 root root 4194304 Oct 10 17:32 rbd\udata.30ee74b0dc51.000000000000024b__head_C36BB5FF__27
-rw-r--r-- 1 root root 4194304 Oct 10 17:32 rbd\udata.30ee74b0dc51.000000000000027e__head_7CD073FF__27
-rw-r--r-- 1 root root 4194304 Oct 10 17:31 rbd\udata.30ee74b0dc51.0000000000000310__head_44278CFF__27
-rw-r--r-- 1 root root 4194304 Oct 10 17:31 rbd\udata.30ee74b0dc51.0000000000000380__head_AE7AC2FF__27
-rw-r--r-- 1 root root 4194304 Oct 10 17:31 rbd\udata.30ee74b0dc51.00000000000003bd__head_95C723FF__27
-rw-r--r-- 1 root root 4194304 Oct 10 17:31 rbd\udata.30ee74b0dc51.000000000000055d__head_3157327F__27
-rw-r--r-- 1 root root 4194304 Oct 10 17:36 rbd\udata.30ee74b0dc51.00000000000005f2__head_C3B0DCFF__27
[root@obs1 39.7f_head]# ls -l | grep -v head
total 65576
-rw-r--r-- 1 root root 4194304 Oct 10 17:36 rbd\udata.30ee74b0dc51.00000000000000af__12_189FAAFF__27
-rw-r--r-- 1 root root 4194304 Oct 10 17:32 rbd\udata.30ee74b0dc51.000000000000024b__12_C36BB5FF__27
-rw-r--r-- 1 root root 4194304 Oct 10 17:32 rbd\udata.30ee74b0dc51.000000000000027e__12_7CD073FF__27
-rw-r--r-- 1 root root 4194304 Oct 10 17:31 rbd\udata.30ee74b0dc51.0000000000000310__12_44278CFF__27
-rw-r--r-- 1 root root 4194304 Oct 10 17:31 rbd\udata.30ee74b0dc51.0000000000000380__12_AE7AC2FF__27
-rw-r--r-- 1 root root 4194304 Oct 10 17:31 rbd\udata.30ee74b0dc51.00000000000003bd__12_95C723FF__27
-rw-r--r-- 1 root root 4194304 Oct 10 17:31 rbd\udata.30ee74b0dc51.000000000000055d__12_3157327F__27
-rw-r--r-- 1 root root 4194304 Oct 10 17:36 rbd\udata.30ee74b0dc51.00000000000005f2__12_C3B0DCFF__27
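The undercount in the listing above can be replayed with a few lines of Python. This is a toy model, not Ceph code: the object stems and 4 MiB size are taken from the `ls -l` output, the filenames are abbreviated, and the point is only that classifying by the `__head_` marker and summing head objects alone reports half of what is actually on disk:

```python
# Toy replay of the 39.7f_head listing above: 8 RBD objects, each present
# twice (a __head version plus one snapshot clone "__12"), 4 MiB apiece.
# Filenames are abbreviated/illustrative; sizes match the ls -l output.

OBJ_SIZE = 4194304  # 4 MiB default RBD object size
stems = ["00af", "024b", "027e", "0310", "0380", "03bd", "055d", "05f2"]

files = []
for s in stems:
    files.append((f"rbd_data.30ee74b0dc51.000000000000{s}__head", OBJ_SIZE))
    files.append((f"rbd_data.30ee74b0dc51.000000000000{s}__12", OBJ_SIZE))

# What the PG stats effectively count (head objects only) vs. reality:
head_bytes = sum(size for name, size in files if "__head" in name)
total_bytes = sum(size for _, size in files)

print(head_bytes, total_bytes)  # → 33554432 67108864
```

The head-only sum (~32 MiB) is in the same ballpark as the 33599579 bytes reported by `ceph pg dump` for pg 39.7f, while the real on-disk usage is roughly double; scaled across the whole cachepool, that gap is why the tier agent sees no reason to evict.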
But when I create a 4 MB RBD in basepool, write it fully, take a snapshot, and then write it again, I found the PG bytes are the sum of head and clone.
Updated by Jason Dillaman over 6 years ago
- Project changed from Ceph to RADOS
- Category changed from OSD to Tiering