Bug #13982

tiering: target_max_bytes won't work as expected once you use 'rbd rm'

Added by Mehdi Abaakouk over 8 years ago. Updated about 8 years ago.

Status:
Won't Fix
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

After 'rbd rm' has been used, "rados df" reports the same number of objects as before the 'rm', while the used size decreases because the cache pool has already deleted the data on disk.

The object count doesn't decrease because the cache tier still tracks objects that must be deleted on the "data pool" (whiteouts).

But this has a really bad effect on 'target_max_bytes' handling in the tier cache agent's automatic flushing/evicting.
The evict code computes an erroneous "full_micro" [1], so the tier agent no longer respects target_max_bytes.
The cache pool can now easily exceed target_max_bytes, and if no other evict rules are set the cache pool can never evict objects (and so never deletes the data on the data pool), as noted here:

http://tracker.ceph.com/issues/13894#note-15

[1] https://github.com/ceph/ceph/blob/b7eb16786156a37f185315234bea4e51379d0343/src/osd/ReplicatedPG.cc#L11903
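
To make the effect concrete, the ratio computation referenced at [1] is roughly of the following shape. This is a minimal sketch for illustration only: the names are mine, and the real code also scales by a per-PG divisor and excludes hit-set archive objects. The point is that whiteouts are counted in the object statistics while contributing (almost) no bytes:

// Simplified sketch of the cache-tier agent's dirty/full ratios
// (illustrative names, not the actual Ceph identifiers; the per-PG
// divisor and hit-set adjustments of the real code are omitted).
#include <algorithm>
#include <cstdint>
#include <utility>

struct CacheTierStats {
  uint64_t num_objects;  // includes whiteouts left behind by deletes
  uint64_t num_bytes;    // whiteouts contribute (almost) no bytes
  uint64_t num_dirty;    // unflushed objects, whiteouts included
};

// Returns {dirty_micro, full_micro} as parts-per-million of the targets;
// the agent starts flushing/evicting when these cross the
// cache_target_dirty_ratio / cache_target_full_ratio thresholds.
std::pair<uint64_t, uint64_t>
agent_ratios(const CacheTierStats &s,
             uint64_t target_max_bytes,
             uint64_t target_max_objects)
{
  uint64_t dirty_micro = 0, full_micro = 0;

  if (target_max_bytes && s.num_objects > 0) {
    // avg_size is derived from *all* tracked objects, so a pool that is
    // mostly zero-byte whiteouts drives it (and both ratios) toward zero.
    uint64_t avg_size = s.num_bytes / s.num_objects;
    dirty_micro = s.num_dirty * avg_size * 1000000 /
                  std::max<uint64_t>(target_max_bytes, 1);
    full_micro = s.num_objects * avg_size * 1000000 /
                 std::max<uint64_t>(target_max_bytes, 1);
  }
  if (target_max_objects > 0) {
    dirty_micro = std::max(dirty_micro,
        s.num_dirty * 1000000 / std::max<uint64_t>(target_max_objects, 1));
    full_micro = std::max(full_micro,
        s.num_objects * 1000000 / std::max<uint64_t>(target_max_objects, 1));
  }
  return {dirty_micro, full_micro};
}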

Steps to reproduce:

$ sudo ceph osd crush rule dump
[ { "rule_id": 0,
"rule_name": "replicated_ruleset",
"ruleset": 0,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [ { "op": "take",
"item": -1,
"item_name": "default"}, { "op": "choose_firstn",
"num": 0,
"type": "osd"}, { "op": "emit"}]}]

$ sudo ceph osd pool create data 8 8 replicated replicated_ruleset
$ sudo ceph osd pool create cache 8 8 replicated replicated_ruleset
$ sudo ceph osd tier add data cache
$ sudo ceph osd tier cache-mode cache writeback
$ sudo ceph osd tier set-overlay data cache
$ sudo ceph osd pool set data size 1
$ sudo ceph osd pool set data min_size 1
$ sudo ceph osd pool set cache size 1
$ sudo ceph osd pool set cache min_size 1
$ sudo ceph osd pool set cache cache_target_dirty_ratio 0.4
$ sudo ceph osd pool set cache cache_target_full_ratio 0.8
$ sudo ceph osd pool set cache target_max_bytes 100000000000
$ sudo ceph osd pool set cache target_max_objects 100000000000
$ sudo ceph osd pool set cache hit_set_type bloom
$ sudo ceph osd pool set cache hit_set_count 1
$ sudo ceph osd pool set cache hit_set_period 3600

$ sudo ceph osd dump | grep ^pool
pool 7 'data' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 30 lfor 28 flags hashpspool tiers 8 read_tier 8 write_tier 8 stripe_width 0
pool 8 'cache' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 39 flags hashpspool,incomplete_clones tier_of 7 cache_mode writeback target_bytes 100000000000 target_objects 100000000000 hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 3600s x1 stripe_width 0

$ dd if=/dev/urandom of=file bs=1M count=1000

$ find /var/lib/ceph | grep rbd.*data -c
0

Create an rbd image, then force a flush and a repopulation of the cache:

$ sudo rbd -p data import --image-format 2 file file
$ sudo rados -p cache cache-flush-evict-all
$ sudo rbd -p data export file - > /dev/null

$ find /var/lib/ceph | grep rbd.*data -c
500

$ df -h /var/lib/ceph
Filesystem Size Used Avail Use% Mounted on
/dev/loop0 320G 2.4G 318G 1% /var/lib/ceph

$ sudo rados df
pool name    KB         objects   clones  degraded  unfound  rd   rd KB  wr    wr KB
cache        1024001    256       0       0         0        169  3706   4066  4096004
data         1024001    253       0       0         0        261  4      761   2048002
total used   1696000    509
total avail  333684480
total space  335380480

Here everything is correct: the space usage is 2.4G, half for the cache and half for the data.

$ sudo rbd -p data rm file
Removing image: 100% complete...done.
$ find /var/lib/ceph | grep rbd.*data -c
250

$ df -h /var/lib/ceph
Filesystem Size Used Avail Use% Mounted on
/dev/loop0 320G 1.2G 319G 1% /var/lib/ceph

$ sudo rados df
pool name    KB         objects   clones  degraded  unfound  rd   rd KB  wr    wr KB
cache        1          260       0       0         0        188  3720   4320  4096004
data         1024001    253       0       0         0        261  4      761   2048002
total used   1174816    513
total avail  334205664
total space  335380480

After the 'rbd rm', the data uses 1.2G on disk because the cache has not yet been flushed.
But the 'rados df' report looks weird: the cache pool still has 260 objects but uses only 1 KB.
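
Plugging these numbers into the simplified ratio sketch from the description shows what the agent sees at this point (back-of-the-envelope arithmetic, not output of the real OSD code): with ~260 objects but only ~1 KB of data, both the bytes-based and the objects-based ratios come out as zero, so the agent treats the cache as empty and has no reason to flush the pending deletes to the data pool.

// Quick arithmetic check with the 'rados df' values above and the
// targets from the reproduction steps (simplified: no per-PG divisor,
// no hit-set adjustment; mirrors the sketch in the description).
#include <cstdint>
#include <cstdio>

int main() {
  uint64_t num_objects = 260;                   // mostly whiteouts after 'rbd rm'
  uint64_t num_bytes   = 1024;                  // ~1 KB reported by 'rados df'
  uint64_t target_max_bytes   = 100000000000ULL;
  uint64_t target_max_objects = 100000000000ULL;

  uint64_t avg_size = num_bytes / num_objects;  // 3 bytes
  uint64_t full_bytes_micro   = num_objects * avg_size * 1000000 / target_max_bytes;
  uint64_t full_objects_micro = num_objects * 1000000 / target_max_objects;

  // Both print as 0 (out of 1000000): the cache looks "empty" to the agent.
  printf("avg_size=%llu full_bytes_micro=%llu full_objects_micro=%llu\n",
         (unsigned long long)avg_size,
         (unsigned long long)full_bytes_micro,
         (unsigned long long)full_objects_micro);
  return 0;
}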

History

#1 Updated by Samuel Just over 8 years ago

  • Priority changed from Normal to High

#2 Updated by Sage Weil about 8 years ago

  • Status changed from New to Won't Fix

This is as expected. Once the cache flushes out the whiteouts, the space will be reclaimed.

We could make rbd rm do a hint of WONTNEED or NOCACHE so that it writes these through the cache, perhaps...
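
For illustration, at the librados level such a hinted delete could look roughly like the sketch below. This is hypothetical: it only shows the existing FADVISE op flags being attached to a remove; whether the OSD would act on them for deletes in the way suggested above is an open question, and it is not what rbd does today.

// Hypothetical sketch: issue a delete with a cache-related fadvise hint
// via librados (not the actual rbd removal code; 'oid' is a placeholder).
#include <rados/librados.hpp>
#include <string>

int remove_with_hint(librados::IoCtx &ioctx, const std::string &oid)
{
  librados::ObjectWriteOperation op;
  op.remove();
  // Attach a hint to the remove just queued; LIBRADOS_OP_FLAG_FADVISE_DONTNEED
  // would be the other existing hint to try.
  op.set_op_flags2(LIBRADOS_OP_FLAG_FADVISE_NOCACHE);
  return ioctx.operate(oid, &op);
}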

#3 Updated by Mehdi Abaakouk about 8 years ago

I understand that this is the expected behavior, but this bug report is about fixing that formula [1] so that it computes a working dirty_micro and full_micro when this behavior occurs; currently it produces wrong numbers.

[1] https://github.com/ceph/ceph/blob/b7eb16786156a37f185315234bea4e51379d0343/src/osd/ReplicatedPG.cc#L11903
