Bug #16890
rbd diff outputs nothing when the image is layered and with a writeback cache tier
0%
Description
I found this problem with v0.80.7, but it still exists in the master branch.
Here are the steps to reproduce:
- environment
# ./ceph --version
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)

# ./ceph osd tree
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
# id    weight  type name               up/down reweight
-3      3       root cache
-4      3               host host-cache
3       1                       osd.3   up      1
4       1                       osd.4   up      1
5       1                       osd.5   up      1
-1      3       root default
-5      3               host ceph-test
0       1                       osd.0   up      1
1       1                       osd.1   up      1
2       1                       osd.2   up      1

# ./ceph osd crush rule dump
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
[
    { "rule_id": 0,
      "rule_name": "replicated_ruleset",
      "ruleset": 0,
      "type": 1,
      "min_size": 1,
      "max_size": 10,
      "steps": [
            { "op": "take",
              "item": -1,
              "item_name": "default"},
            { "op": "choose_firstn",
              "num": 0,
              "type": "osd"},
            { "op": "emit"}]},
    { "rule_id": 1,
      "rule_name": "cache_ruleset",
      "ruleset": 1,
      "type": 1,
      "min_size": 1,
      "max_size": 10,
      "steps": [
            { "op": "take",
              "item": -3,
              "item_name": "cache"},
            { "op": "choose_firstn",
              "num": 0,
              "type": "osd"},
            { "op": "emit"}]}]

# ./ceph osd pool create base 8
# ./ceph osd pool create cache 8
# ./ceph osd pool set cache crush_ruleset 1
- with cache tier
# echo 'abc' > test_rbd
# ./rbd -p base import --image-format 2 test_rbd test_rbd
# ./rbd snap create base/test_rbd@snap
# ./rbd snap protect base/test_rbd@snap
# ./rbd clone base/test_rbd@snap rbd/test_rbd_clone
# ./rbd snap create rbd/test_rbd_clone@snap
# ./rbd diff rbd/test_rbd_clone@snap
Offset Length Type
0      4      data
- with writeback cache tier
# ./ceph osd tier add rbd cache
# ./ceph osd tier cache-mode cache writeback
# ./ceph osd pool set cache hit_set_type bloom
# ./ceph osd pool set cache hit_set_count 1
# ./ceph osd pool set cache hit_set_period 3600
# ./ceph osd pool set cache target_max_bytes 10737418240
# ./ceph osd tier set-overlay rbd cache
# ./rbd diff rbd/test_rbd_clone@snap
HERE IS NOTHING.
- the rbd object is not promoted immediately, so the first time rbd diff is run, the result is correct
# ./rbd diff rbd/test_rbd_clone@snap
Offset Length Type
0      4      data
- after that, rbd object is promoted, rbd diff outputs nothing
# ./rados -p cache ls
rbd_object_map.5e3f2ae8944a.0000000000000004
rbd_data.5e3f2ae8944a.0000000000000000
rbd_id.test_rbd_clone
rbd_header.5e3f2ae8944a
# ./rbd diff rbd/test_rbd_clone@snap
- if I call cache-flush-evict-all, rbd diff works correctly again:
# ./rados -p cache cache-flush-evict-all
# ./rbd diff rbd/test_rbd_clone@snap
Offset Length Type
0      4      data
I looked into the source code: rbd diff looks at the parent diff only if calling list-snaps on the rbd object returns ENOENT.
With a writeback cache tier, doing list-snaps on a base pool object that does not exist creates a whiteout object in the cache pool.
Calling list-snaps on this whiteout object then returns normally instead of ENOENT, so rbd diff does not work correctly.
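To make the failure mode above concrete, here is a minimal toy model in Python. It is not Ceph code: `ToyPool`, `diff_extents`, and the object name `obj` are hypothetical names invented for illustration, and the only behaviour modelled is the one described in this report (a missing object answers list-snaps with ENOENT, a whiteout answers with success).

```python
import errno


class ToyPool:
    """Hypothetical stand-in for a pool: object name -> 'data' or 'whiteout'."""

    def __init__(self):
        self.objects = {}

    def list_snaps(self, name):
        # A missing object reports ENOENT, as the base pool does.
        if name not in self.objects:
            return -errno.ENOENT
        # Behaviour described in the report: a whiteout still answers
        # with success, so the caller cannot tell it apart from real data.
        return 0


def diff_extents(pool, name, parent_extents):
    """Extents a diff would report for one object of a cloned image."""
    r = pool.list_snaps(name)
    if r == -errno.ENOENT:
        # The clone has no object of its own: fall back to the parent diff.
        return list(parent_extents)
    # The object "exists" but holds no data of its own in this model.
    return []


parent = [(0, 4)]                      # offset 0, length 4, as in the report

base = ToyPool()                       # no cache tier: object is missing
cache = ToyPool()
cache.objects["obj"] = "whiteout"      # cache tier after promotion

print(diff_extents(base, "obj", parent))
print(diff_extents(cache, "obj", parent))
```

In this sketch the first call falls back to the parent extents, while the second returns an empty list, which mirrors the empty `rbd diff` output once the whiteout exists.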
Updated by Chao Zhao almost 8 years ago
sorry, typo in the steps to reproduce:
2. with cache tier
-->
2. without cache tier
Updated by Samuel Just almost 8 years ago
This looks like a bug with the object we choose to promote. The snap list operation is logically on SNAPDIR or something -- I bet we don't handle that case right.
Updated by Chao Zhao almost 8 years ago
Yes, the snap list operation is on SNAPDIR, but in ReplicatedPG::do_op, find_object_context finds that neither the head nor the snapdir exists in the cache tier, so it decides to promote the head. Then, in promote_object, start_copy mirrors the snapset, which in turn makes another list_snaps call (this time ignoring the cache tier), but the return value of that call is ignored.
I'm trying to fix this problem, but haven't found a clear and easy solution so far. There are two options:
- failing maybe_handle_cache is logically straightforward, but we would have to do the extra operations (on the object in the base pool) that are usually done by promote_object;
- failing promote_object: when doing list_snaps in promote_object (_copy_some), if list_snaps returns -ENOENT, complete the copy operation. But it seems some more work would have to be done in finish_promote or finish_ctx.
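The second option can be illustrated with another toy model in Python. This is only a sketch of the idea, not the Ceph implementation: `promote`, `cached_list_snaps`, and the `honor_enoent` flag are hypothetical, the pools are plain dictionaries, and the model promotes on first access, whereas the report notes that real promotion is not immediate.

```python
import errno

ENOENT = -errno.ENOENT


def base_list_snaps(base, name):
    """list_snaps against the base pool, bypassing the cache tier."""
    return 0 if name in base else ENOENT


def promote(base, cache, name, honor_enoent):
    """Copy an object into the cache; optionally fail if the base lacks it."""
    r = base_list_snaps(base, name)
    if r == ENOENT and honor_enoent:
        # Hypothetical fix: propagate the error and create no whiteout.
        return ENOENT
    # Behaviour described in the thread: the return value is ignored,
    # so a missing base object ends up as a whiteout in the cache.
    cache[name] = base.get(name, "whiteout")
    return 0


def cached_list_snaps(base, cache, name, honor_enoent):
    """list_snaps as a client sees it through the cache tier."""
    if name not in cache:
        r = promote(base, cache, name, honor_enoent)
        if r < 0:
            return r
    return 0


base = {}                                   # object does not exist in base

buggy_cache = {}
buggy = cached_list_snaps(base, buggy_cache, "obj", honor_enoent=False)

fixed_cache = {}
fixed = cached_list_snaps(base, fixed_cache, "obj", honor_enoent=True)

print(buggy, buggy_cache)
print(fixed == ENOENT, fixed_cache)
```

With the return value honored, the client sees ENOENT and no whiteout is left behind. The real change would still need the cleanup in finish_promote or finish_ctx mentioned above, which this sketch does not model.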
Updated by Josh Durgin almost 7 years ago
- Status changed from New to Fix Under Review
Updated by Greg Farnum almost 7 years ago
- Project changed from Ceph to rbd
- Assignee changed from Kefu Chai to Jason Dillaman
Jason, can you make sure you expect this to work from an RBD perspective and throw it into the RADOS project if so? :)
Updated by Jason Dillaman almost 7 years ago
- Project changed from rbd to RADOS
- Assignee deleted (Jason Dillaman)
RBD isn't doing anything special with regard to cache tiering. It sounds like the whiteout in the cache tier is not returning -ENOENT when the list-snaps op is invoked.
Updated by Kefu Chai about 5 years ago
rebased PR posted at https://github.com/ceph/ceph/pull/26542