Bug #44731
Space leak in Bluestore (Closed)
Description
Hi.
I'm experiencing some kind of space leak in Bluestore. I use EC, compression, and snapshots. At first I thought the leak was caused by "virtual clones" (issue #38184), but then I got rid of most of the snapshots and continued to experience the problem.
I first suspected something when I added a new disk to the cluster and the cluster's free space didn't increase (!).
So to track down the issue I moved one PG (34.1a) using upmaps from osd11,6,0 to osd6,0,7 and then back to osd11,6,0.
It ate +59 GB after the first move and +51 GB after the second. As I understand it, this proves that it's not #38184: devirtualization of virtual clones couldn't eat additional space after the SECOND rebalance of the same PG.
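For reproducibility, here is a minimal sketch of how such a move can be scripted. The pg-upmap-items commands are the standard interface (Luminous and later); the pgid and OSD ids are taken from the report above, and the single 11 -> 7 pair is my reading of the acting-set change (osd11,6,0 differs from osd6,0,7 only in osd.11 being replaced by osd.7).

#!/usr/bin/env python3
# Sketch: move PG 34.1a off osd.11 via an explicit upmap, then move it back.
import subprocess

PGID = "34.1a"

def pg_upmap_items(pgid, pairs):
    # ceph osd pg-upmap-items <pgid> <from> <to> [<from> <to> ...]
    cmd = ["ceph", "osd", "pg-upmap-items", pgid]
    for frm, to in pairs:
        cmd += [str(frm), str(to)]
    subprocess.check_call(cmd)

def rm_upmap_items(pgid):
    # dropping the upmap entry lets normal CRUSH placement apply again,
    # which moves the PG back to its original OSDs
    subprocess.check_call(["ceph", "osd", "rm-pg-upmap-items", pgid])

pg_upmap_items(PGID, [(11, 7)])   # osd11,6,0 -> osd6,0,7
# ... wait for backfill to finish, then:
rm_upmap_items(PGID)              # back to osd11,6,0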
The PG has ~39000 objects, it is EC 2+1, and compression is enabled. The compression ratio is about 2.7 in my setup, so the PG should use ~90 GB of raw space.
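A quick back-of-the-envelope check of that ~90 GB figure. The object count, EC profile, and compression ratio are from the report; the ~4 MiB average object size is my assumption (the typical RBD/CephFS default), not something stated above.

#!/usr/bin/env python3
# Rough estimate of expected raw usage for PG 34.1a.
objects     = 39_000
obj_size    = 4 * 2**20   # ASSUMED average logical object size (4 MiB)
ec_overhead = 3 / 2       # EC 2+1: 3 shards stored per 2 shards of data
compression = 2.7         # observed compression ratio

logical = objects * obj_size
raw     = logical * ec_overhead / compression
print(f"logical ~{logical / 2**30:.0f} GiB, expected raw ~{raw / 2**30:.0f} GiB")
# -> logical ~152 GiB, expected raw ~85 GiB, i.e. on the order of 90 GB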
Before and after moving the PG I stopped osd0, mounted it with ceph-objectstore-tool with debug bluestore = 20/20, and opened the 34.1a***/all directory. It seems to dump all object extents into the log in that case, so now I have two logs with all allocated extents for osd0 (I hope all extents are there). I parsed both logs and added all compressed blob sizes together ("get_ref Blob ... 0x20000 -> 0x... compressed"). They add up to ~39 GB before the first rebalance (34.1as2), ~22 GB after it (34.1as1), and ~41 GB again after the second move (34.1as2), which doesn't indicate a leak.
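A rough equivalent of that log parsing, for anyone who wants to repeat it. The regex is an assumption based on the quoted "get_ref Blob ... 0x20000 -> 0x... compressed" snippet and may need adjusting for other Ceph versions; it sums the second (compressed) hex size on each matching line.

#!/usr/bin/env python3
# Sum compressed blob sizes from a debug_bluestore=20 extent dump.
import re
import sys

PAT = re.compile(r"get_ref Blob.*?0x[0-9a-f]+ -> 0x([0-9a-f]+).*?compressed")

total = 0
with open(sys.argv[1]) as log:
    for line in log:
        m = PAT.search(line)
        if m:
            total += int(m.group(1), 16)

print("compressed blob bytes: ~%.1f GiB" % (total / 2**30))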
But the raw space usage still exceeds the initial usage by a lot, so it's clear that there's a leak somewhere.
What additional details can I provide for you to identify the bug?
Updated by Vitaliy Filippov over 3 years ago
An update here: this was caused by broken compression in the Ubuntu builds around 14.2.6-7-8 or so. Old data was compressed, but it was being rewritten uncompressed when the buggy OSDs rebalanced it, which resulted in an apparent "leak". Now the issue is gone.
Updated by Neha Ojha over 3 years ago
- Status changed from New to Closed
Doesn't look like a Ceph issue.