Bug #44731

Space leak in Bluestore

Added by Vitaliy Filippov 6 months ago. Updated about 1 month ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

Hi.

I'm experiencing some kind of space leak in Bluestore. I use EC, compression and snapshots. At first I thought the leak was caused by "virtual clones" (issue #38184). However, I then got rid of most of the snapshots and the problem persisted.

I first suspected something when I added a new disk to the cluster and the cluster's free space didn't increase (!).

So to track down the issue I moved one PG (34.1a) using upmaps from osd11,6,0 to osd6,0,7 and then back to osd11,6,0.

It ate +59 GB after the first move and +51 GB after the second. As I understand it, this proves that it's not #38184: devirtualization of virtual clones couldn't eat additional space after the SECOND rebalance of the same PG.

The PG has ~39000 objects, uses EC 2+1, and has compression enabled. The compression ratio is about 2.7 in my setup, so the PG should use ~90 GB of raw space.

Before and after moving the PG I stopped osd0, mounted it with ceph-objectstore-tool with debug bluestore = 20/20 and opened the 34.1a***/all directory. In that case it seems to dump all object extents into the log, so I now have two logs with all allocated extents for osd0 (I hope all extents are there). I parsed both logs and added up all compressed blob sizes ("get_ref Blob ... 0x20000 -> 0x... compressed"). They add up to ~39 GB before the first rebalance (34.1as2), ~22 GB after it (34.1as1) and ~41 GB again after the second move (34.1as2), which doesn't indicate a leak.
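The log summation described above could look roughly like the following sketch. The regular expression is built only from the fragment quoted in the description ("get_ref Blob ... 0x20000 -> 0x... compressed"); the exact BlueStore log line layout is not shown in the report, so the pattern, the function name `sum_compressed_bytes` and the log file name in the usage comment are all assumptions.

```python
import re

# ASSUMPTION: a matching line contains "get_ref", then "Blob", then
# "0x<logical length> -> 0x<compressed length>", then "compressed".
# Adjust the pattern to the real log format before relying on the totals.
BLOB_RE = re.compile(
    r"get_ref.*Blob.*0x([0-9a-fA-F]+) -> 0x([0-9a-fA-F]+).*compressed"
)


def sum_compressed_bytes(lines):
    """Add up the compressed length (second hex value) of each matching line."""
    total = 0
    for line in lines:
        match = BLOB_RE.search(line)
        if match:
            total += int(match.group(2), 16)
    return total


# Usage (hypothetical file name):
# with open("osd0-before.log") as log:
#     print(sum_compressed_bytes(log) / 1e9, "GB")
```

Note that this counts every matching line. If the same blob is logged more than once, it would be counted repeatedly, which the sketch does not check for.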

But raw space usage still exceeds the initial value by a lot, so it's clear that there's a leak somewhere.

What additional details can I provide for you to identify the bug?

History

#1 Updated by Greg Farnum 6 months ago

  • Project changed from Ceph to RADOS

#2 Updated by Neha Ojha 6 months ago

  • Project changed from RADOS to bluestore

#3 Updated by Vitaliy Filippov about 1 month ago

An update here: this was caused by broken compression in Ubuntu builds around 14.2.6-7-8 or so. Old data was compressed, but it became uncompressed when buggy OSDs rebalanced it. This resulted in an apparent "leak". Now the issue is gone.

#4 Updated by Neha Ojha about 1 month ago

  • Status changed from New to Closed

Doesn't look like a Ceph issue.
