Bug #38184
osd: recovery does not preserve copy-on-write allocations between object clones after 'rbd revert'
Description
Hi. I've already reported this in issue 36614, but here is a more concrete case.
- Start with a bluestore Ceph cluster
- Create an RBD image
- Fill it with data
- Remember disk space used by the image as X
- Create a snapshot of it
- Immediately revert to it (rbd snap revert)
- After the revert finishes, you'll see that X space is still used, but the object count in the cluster has doubled
- Trigger a massive rebalance in the cluster
- After the rebalance finishes, you'll see that the image's objects residing in moved PGs now use 2*X disk space. This is because virtual clones stop being virtual after their data is moved.
- Now run rbd snap revert again
- You'll see the space usage drop. This is because "virtual clones" become "virtual" again.
I think this is a bug and should be fixed. It once led to a bad situation in our cluster, described in issue 36614.
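To make the space accounting in the steps above concrete, here is a toy Python model. It is only a sketch and not Ceph code: `ToyStore`, `clone`, `recover_into` and `revert` are hypothetical names, extents are just counted rather than sized, and the way `revert` chooses the clone direction is a simplification of the behaviour described in the steps, not taken from the OSD source.

```python
from itertools import count


class ToyStore:
    """Toy model of one OSD's object store with reference-counted extents."""

    def __init__(self):
        self._ids = count()
        self.objects = {}  # object name -> list of extent ids
        self.refs = {}     # extent id -> number of objects referencing it

    def remove(self, name):
        """Drop an object; free each extent once nothing references it."""
        for e in self.objects.pop(name, []):
            self.refs[e] -= 1
            if self.refs[e] == 0:
                del self.refs[e]

    def write_full(self, name, n_extents):
        """Write an object into freshly allocated, unshared extents."""
        self.remove(name)
        extents = [next(self._ids) for _ in range(n_extents)]
        for e in extents:
            self.refs[e] = 1
        self.objects[name] = extents

    def clone(self, src, dst):
        """Copy-on-write clone: dst ends up sharing src's extents."""
        if src == dst:
            return
        extents = list(self.objects[src])
        self.remove(dst)
        for e in extents:
            self.refs[e] += 1
        self.objects[dst] = extents

    def used(self):
        """Number of allocated extents (shared extents count once)."""
        return len(self.refs)

    def recover_into(self, target):
        """Model recovery: every object is pushed on its own as a full
        write, so sharing between objects is lost on the target."""
        for name, extents in self.objects.items():
            target.write_full(name, len(extents))


def revert(store):
    """Assumed simplification of 'rbd snap revert' for a single object:
    afterwards 'head' and 'clone' share the same extents."""
    if "clone" not in store.objects:
        store.clone("head", "clone")
    else:
        store.clone("clone", "head")


X = 4
osd_a = ToyStore()
osd_a.write_full("head", X)
revert(osd_a)
print(len(osd_a.objects), osd_a.used())  # 2 4  (objects doubled, still X)

osd_b = ToyStore()
osd_a.recover_into(osd_b)                # the PG moves to another OSD
print(osd_b.used())                      # 8    (2*X after the move)

revert(osd_b)
print(osd_b.used())                      # 4    (back to X)
```

In this model the only thing that breaks sharing is `recover_into`, which treats each object as an independent full write. That matches what I observe after a rebalance, and it is why a second revert brings the usage back down.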
History
#1 Updated by Vitaliy Filippov about 5 years ago
Anyone?
#2 Updated by Sage Weil about 5 years ago
- Project changed from bluestore to RADOS
- Subject changed from "Virtual clones break and begin to eat space after rebalancing" to "osd: recovery does not preserve copy-on-write allocations between object clones after 'rbd revert'"
- Status changed from New to 12
This is indeed the current behavior. The OSD isn't clever enough to preserve the shared allocations across recovery. It is a large effort to change this.
#3 Updated by Patrick Donnelly over 4 years ago
- Status changed from 12 to New