Bug #18331

RGW leaking data

Added by Matas Tvarijonas 3 months ago. Updated 29 days ago.

Status: Pending Backport
Priority: Urgent
Assignee: -
Target version: -
Start date: 12/22/2016
Due date:
% Done: 0%
Source: Community (user)
Tags: RGW leak data
Backport: jewel, kraken
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Release: jewel
Needs Doc: No

Description

Hello, our Ceph cluster (10.2.2) is leaking data at about 50 GB per hour.

We use this command to calculate the real usage:
  # radosgw-admin bucket stats | grep '"size_kb":' | awk '{print $2}' | sed 's/.$//' | paste -sd+ | bc

24912659802

This is about 28 TB of used storage, and with our replication factor of 3 that should come to roughly 90 TB of raw storage.
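The shell pipeline above can also be written as a small script that parses the JSON instead of grepping it. This is only a sketch: it assumes `radosgw-admin bucket stats` prints a JSON list of buckets, each with a `usage` map whose categories (for example `rgw.main`) carry a `size_kb` field. The sample data below is made up for illustration.

```python
import json

def total_size_kb(bucket_stats):
    """Sum size_kb over every usage category of every bucket.

    Like the grep pipeline, this counts only "size_kb" and ignores
    "size_kb_actual".
    """
    total = 0
    for bucket in bucket_stats:
        for category in bucket.get("usage", {}).values():
            total += category.get("size_kb", 0)
    return total

# Hypothetical stand-in for the output of `radosgw-admin bucket stats`.
sample = json.loads("""
[
  {"bucket": "a",
   "usage": {"rgw.main": {"size_kb": 1000, "size_kb_actual": 1004, "num_objects": 3},
             "rgw.multimeta": {"size_kb": 0, "size_kb_actual": 0, "num_objects": 1}}},
  {"bucket": "b",
   "usage": {"rgw.main": {"size_kb": 2500, "size_kb_actual": 2512, "num_objects": 7}}}
]
""")

print(total_size_kb(sample))  # 3500
```

On a real cluster the command output would be loaded in place of `sample`, for example by piping it to the script and calling `json.load(sys.stdin)`.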

And we are checking pool size via:

  # ceph df detail
    GLOBAL:
        SIZE  AVAIL  RAW USED  %RAW USED  OBJECTS
        507T  194T   312T      61.63      41756k
    POOLS:
        NAME                        ID  CATEGORY  QUOTA OBJECTS  QUOTA BYTES  USED    %USED  MAX AVAIL  OBJECTS   DIRTY   READ    WRITE   RAW USED
        data                        0   -         N/A            N/A          240G    0.14   15492G     61613     61613   411k    194k    722G
        .rgw.root                   13  -         N/A            N/A          3397    0      15492G     16        15      26007k  928     10191
        .rgw.control                14  -         N/A            N/A          0       0      15492G     8         3       0       0       0
        .rgw                        15  -         N/A            N/A          297k    0      15492G     1273      1273    106M    21640k  891k
        .rgw.gc                     16  -         N/A            N/A          0       0      15492G     512       492     374M    450M    0
        .users.uid                  17  -         N/A            N/A          31695   0      15492G     145       145     72228k  22828k  95085
        .users                      18  -         N/A            N/A          2425    0      15492G     80        79      81628   224     7275
        .users.email                19  -         N/A            N/A          1861    0      15492G     49        49      21      88      5583
        .rgw.buckets.index          20  -         N/A            N/A          0       0      15492G     979       979     1595M   744M    0
        .log                        21  -         N/A            N/A          2423    0      15492G     321       320     39877k  103M    7269
        .rgw.buckets                22  -         N/A            N/A          74569G  43.09  15492G     34667278  33854k  4485M   1336M   218T
        cinder_backups              25  -         N/A            N/A          0       0      15492G     0         0       0       0       0
        cinder_volumes              26  -         N/A            N/A          18664G  10.78  15492G     4720798   4610k   15087M  335G    55992G
        nova_root                   30  -         N/A            N/A          9349G   5.40   15492G     1259549   1230k   27544M  161G    28047G
        glance_images               33  -         N/A            N/A          3102G   1.79   15492G     397580    388k    58006k  947k    9306G
        migration                   35  -         N/A            N/A          136     0      15492G     2         2       5       2       408
        .users.swift                37  -         N/A            N/A          107     0      15492G     8         8       6       14      321
        default.rgw.meta            38  -         N/A            N/A          475M    0      15492G     1648259   1609k   0       5494k   1425M
        default.rgw.buckets.non-ec  39  -         N/A            N/A          0       0      15492G     15        15      18064   315k    0
        .rgw.root.161220            40  -         N/A            N/A          7315    0      15492G     21        21      0       21      21945

So:

.rgw.buckets 22 - N/A N/A 74569G 43.09 15492G 34667278 33854k 4485M 1336M 218T

That is 218 TB of raw space used, so about 130 TB of raw space is unaccounted for.

Currently we leak 50 GB per hour.
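The arithmetic behind that estimate can be spelled out as a short sketch. It uses the rounded figures quoted in this report and treats them all as plain TB, which is a simplification; the function name is made up for illustration.

```python
def unaccounted_raw(pool_raw_used, logical_used, replication_factor):
    """Raw space the pool consumes beyond what the bucket data should need."""
    expected_raw = logical_used * replication_factor
    return pool_raw_used - expected_raw

# Rounded figures from the report, all in TB.
gap_tb = unaccounted_raw(pool_raw_used=218, logical_used=28, replication_factor=3)
print(gap_tb)  # 134; rounding the expected raw usage up to 90 TB gives the ~130 TB above

# Growth of the leak at the reported rate.
LEAK_GB_PER_HOUR = 50
print(LEAK_GB_PER_HOUR * 24)  # 1200 GB per day
```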

A very similar case is discussed here: http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2014-November/045037.html

Please look at this case.


Related issues

Duplicated by Bug #18258: rgw: radosgw-admin orphan find goes into infinite loop Duplicate 12/15/2016
Copied to Backport #18827: jewel: RGW leaking data Resolved
Copied to Backport #19047: kraken: RGW leaking data New

History

#1 Updated by Loic Dachary 3 months ago

  • Target version deleted (v10.2.6)

#2 Updated by Marius Vaitiekunas 3 months ago

Some more details about the issue.

All our leaking buckets have one thing in common: the Hadoop S3A client (https://wiki.apache.org/hadoop/AmazonS3) is used. Some of the objects have long names with many underscores, for example:
dt=20160814-060014-911/_temporary/0/_temporary/attempt_201608140600_0001_m_000003_339/part-00003.gz
dt=20160814-083014-948/_temporary/0/_temporary/attempt_201608140830_0001_m_000006_294/part-00006.gz
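To check how many keys in a bucket listing follow this pattern, something like the sketch below could be used. The regular expression is inferred from the two example names only and is an assumption, not a definition of how Hadoop names its temporary files.

```python
import re

# Pattern inferred from the two example keys above (an assumption).
TEMP_ATTEMPT = re.compile(
    r"/_temporary/\d+/_temporary/attempt_\d+_\d+_[mr]_\d+_\d+/"
)

def hadoop_temp_keys(keys):
    """Return the keys that look like Hadoop task-attempt temporary files."""
    return [key for key in keys if TEMP_ATTEMPT.search(key)]

keys = [
    "dt=20160814-060014-911/_temporary/0/_temporary/attempt_201608140600_0001_m_000003_339/part-00003.gz",
    "dt=20160814-083014-948/_temporary/0/_temporary/attempt_201608140830_0001_m_000006_294/part-00006.gz",
    "dt=20160814-060014-911/part-00003.gz",  # hypothetical final, committed name
]

print(len(hadoop_temp_keys(keys)))  # 2
```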

#3 Updated by Samuel Just 2 months ago

  • Project changed from Ceph to rgw
  • Category deleted (radosgw)

#4 Updated by Wido den Hollander about 2 months ago

This issue is still active and happening on Jewel clusters.

The problem is that the orphan scan tool hangs in a loop on some systems, which makes it very difficult to debug this.

Setting rados debug to 20 doesn't yield anything additional; it just keeps scanning the same RADOS objects over and over.

See #18258

Any hints to investigate this further? In the long run this becomes a problem for people, since you can't keep adding hardware.

#5 Updated by Yehuda Sadeh about 2 months ago

If there's a scenario that reproduces the data leak, could you provide a log with 'debug rgw = 20' and 'debug ms = 1', and point at the leaking rados objects? We are also working on fixing the orphan tool.

#6 Updated by Yehuda Sadeh about 2 months ago

@mvaitiekunas @wido can you try running

 $ rados -p <pool> ls

and see if it finishes? Also, is the infinite loop happening at the stage where it says 'storing X entries'?

#7 Updated by Yehuda Sadeh about 2 months ago

@mvaitiekunas @wido actually, never mind. I was able to reproduce the issue.

#8 Updated by Yehuda Sadeh about 2 months ago

  • Status changed from New to Need Review

#9 Updated by Wido den Hollander about 2 months ago

Yehuda Sadeh wrote:

https://github.com/ceph/ceph/pull/13147

Great! We will get testing.

Btw, this command went just fine:

rados -p <pool> ls

All PGs are active+clean. The logs just showed the tool kept going over and over repeating the same steps.

#10 Updated by Matas Tvarijonas about 2 months ago

Yehuda Sadeh wrote:

@mvaitiekunas @wido actually, never mind. I was able to reproduce the issue.

Hi @Yehuda, were you able to reproduce the data leak, or the orphan find loop issue? As we understand it, you fixed the orphan find loop issue. What about the data leak? Do you need logs with 'debug rgw = 20' and 'debug ms = 1'?

#11 Updated by Yehuda Sadeh about 2 months ago

Was able to reproduce the orphans find loop issue. With regard to the leak, it could be a known issue related to multiple uploads of the same parts in multipart upload. If you could provide logs it'd be great. Thanks.

#12 Updated by Yehuda Sadeh about 2 months ago

@wido @mvaitiekunas I found an issue with the fix (breaks listing of multipart uploads). I'll update when it's cleared.

#13 Updated by Nathan Cutler about 2 months ago

  • Backport set to jewel

#14 Updated by Yehuda Sadeh about 2 months ago

@wido @mvaitiekunas current code cleared teuthology.

#15 Updated by Alexey Sheplyakov about 2 months ago

#16 Updated by Nathan Cutler about 2 months ago

  • Status changed from Need Review to Pending Backport

#17 Updated by Nathan Cutler about 1 month ago

  • Status changed from Pending Backport to Resolved

#18 Updated by Nathan Cutler about 1 month ago

  • Backport changed from jewel to jewel, kraken

#19 Updated by Nathan Cutler about 1 month ago

  • Duplicated by Bug #18258: rgw: radosgw-admin orphan find goes into infinite loop added

#20 Updated by Nathan Cutler about 1 month ago

  • Status changed from Resolved to Pending Backport

#21 Updated by Loic Dachary about 1 month ago

#22 Updated by George Mihaiescu 29 days ago

Hi,

I have the same problem and I think the leaked objects come from some failed or interrupted multipart uploads that happened a long time ago.

Our cluster has almost 38 TB of leaked data and I would like to recover the space:

root@controller1:~# radosgw-admin bucket stats | grep '"size_kb":' | awk '{print $2}' | sed 's/.$//' | paste -sd+ | bc
568528022778

root@controller1:~# ceph df detail | egrep -v "index|extra"| grep .rgw.buckets
.rgw.buckets 25 - N/A N/A 530T 47.71 581T 9309201 9091k 415M 28277k 1590T

I ran "radosgw-admin orphans find --pool=.rgw.buckets --job-id=orphans" (which took around 12 hours), and now there are 376 objects in the ".log" pool:

root@controller1:~# rados -p .log ls | head
orphan.scan.bck1.rados.10
obj_delete_at_hint.0000000078
orphan.scan.bck1.rados.1
orphan.scan.orphans.rados.17
orphan.scan.orphans.linked.41
obj_delete_at_hint.0000000068
obj_delete_at_hint.0000000085
orphan.scan.orphans.buckets.0
obj_delete_at_hint.0000000094
orphan.scan.orphans.buckets.58
root@controller1:~# rados -p .log ls | wc -l
376
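To get an overview of what the scan left behind, the listing can be grouped by job and by kind. This is a rough sketch: the naming pattern `orphan.scan.<job-id>.<kind>.<shard>` is only inferred from the names shown above, and it would not handle a job id that itself contains a dot.

```python
from collections import Counter

def summarize_log_objects(names):
    """Count .log pool objects per (job id, kind); anything else is 'other'."""
    counts = Counter()
    for name in names:
        parts = name.split(".")
        # Assumed pattern: orphan.scan.<job-id>.<kind>.<shard>
        if len(parts) == 5 and parts[:2] == ["orphan", "scan"]:
            counts[(parts[2], parts[3])] += 1
        else:
            counts[("other", "other")] += 1
    return counts

# The ten names from the `rados -p .log ls | head` output above.
listing = [
    "orphan.scan.bck1.rados.10",
    "obj_delete_at_hint.0000000078",
    "orphan.scan.bck1.rados.1",
    "orphan.scan.orphans.rados.17",
    "orphan.scan.orphans.linked.41",
    "obj_delete_at_hint.0000000068",
    "obj_delete_at_hint.0000000085",
    "orphan.scan.orphans.buckets.0",
    "obj_delete_at_hint.0000000094",
    "orphan.scan.orphans.buckets.58",
]

for (job, kind), count in sorted(summarize_log_objects(listing).items()):
    print(job, kind, count)
```

For the ten names above this reports two job ids, "bck1" and "orphans", next to four objects that do not belong to any scan.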

I have also listed all the rados objects in the ".rgw.buckets" pool, and their count matches the number reported by "ceph df" (9309201 objects):

root@controller1:~# ceph df | egrep -v "index|extra" | grep .rgw.buckets
.rgw.buckets 25 530T 47.71 581T 9309201

root@controller1:~# wc -l objects_in_all_buckets
9309201 objects_in_all_buckets

The question I have now is what to do next.

How should I use the data generated by the "radosgw-admin orphans find" command to clean up these files, and how can I make sure I don't delete good data as well?

Thank you for your help.

#23 Updated by George Mihaiescu 29 days ago

I forgot to mention that this is a cluster that was first deployed as Giant, then upgraded to Hammer and finally to Jewel.

root@controller1:~# radosgw-admin --version
ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
