Bug #18331 (closed): RGW leaking data

Added by Matas Tvarijonas over 7 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Target version: -
% Done: 0%
Source: Community (user)
Tags: RGW leak data
Backport: jewel, kraken
Regression: No
Severity: 2 - major

Description

Hello, we are leaking data at about 50 GB/hour on our Ceph 10.2.2 cluster.

We use this command to calculate the real usage:
  1. radosgw-admin bucket stats | grep '"size_kb":' | awk '{print $2}' | sed 's/.$//' | paste -sd+ | bc

24912659802

This is about 28 TB of used storage, and with a replication factor of 3 that is close to 90 TB of raw storage.
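
For reference, the same pipeline can be extended to print the total in TiB and the expected raw footprint at replication 3 (a minimal sketch; "size_kb" is assumed to be KiB):

  total_kb=$(radosgw-admin bucket stats | grep '"size_kb":' | awk '{print $2}' | sed 's/.$//' | paste -sd+ | bc)
  echo "scale=2; $total_kb / 1024^3" | bc        # logical TiB across all buckets
  echo "scale=2; $total_kb * 3 / 1024^3" | bc    # expected raw TiB at replication 3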

And we check the pool usage via:

  1. ceph df detail
    GLOBAL:
    SIZE AVAIL RAW USED %RAW USED OBJECTS
    507T 194T 312T 61.63 41756k
    POOLS:
    NAME ID CATEGORY QUOTA OBJECTS QUOTA BYTES USED %USED MAX AVAIL OBJECTS DIRTY READ WRITE RAW USED
    data 0 - N/A N/A 240G 0.14 15492G 61613 61613 411k 194k 722G
    .rgw.root 13 - N/A N/A 3397 0 15492G 16 15 26007k 928 10191
    .rgw.control 14 - N/A N/A 0 0 15492G 8 3 0 0 0
    .rgw 15 - N/A N/A 297k 0 15492G 1273 1273 106M 21640k 891k
    .rgw.gc 16 - N/A N/A 0 0 15492G 512 492 374M 450M 0
    .users.uid 17 - N/A N/A 31695 0 15492G 145 145 72228k 22828k 95085
    .users 18 - N/A N/A 2425 0 15492G 80 79 81628 224 7275
    .users.email 19 - N/A N/A 1861 0 15492G 49 49 21 88 5583
    .rgw.buckets.index 20 - N/A N/A 0 0 15492G 979 979 1595M 744M 0
    .log 21 - N/A N/A 2423 0 15492G 321 320 39877k 103M 7269
    .rgw.buckets 22 - N/A N/A 74569G 43.09 15492G 34667278 33854k 4485M 1336M 218T
    cinder_backups 25 - N/A N/A 0 0 15492G 0 0 0 0 0
    cinder_volumes 26 - N/A N/A 18664G 10.78 15492G 4720798 4610k 15087M 335G 55992G
    nova_root 30 - N/A N/A 9349G 5.40 15492G 1259549 1230k 27544M 161G 28047G
    glance_images 33 - N/A N/A 3102G 1.79 15492G 397580 388k 58006k 947k 9306G
    migration 35 - N/A N/A 136 0 15492G 2 2 5 2 408
    .users.swift 37 - N/A N/A 107 0 15492G 8 8 6 14 321
    default.rgw.meta 38 - N/A N/A 475M 0 15492G 1648259 1609k 0 5494k 1425M
    default.rgw.buckets.non-ec 39 - N/A N/A 0 0 15492G 15 15 18064 315k 0
    .rgw.root.161220 40 - N/A N/A 7315 0 15492G 21 21 0 21 21945

So:

.rgw.buckets 22 - N/A N/A 74569G 43.09 15492G 34667278 33854k 4485M 1336M 218T

This is 218 TB of used raw space, so we are missing roughly 130 TB of raw space.

Currently we are leaking about 50 GB/hour.
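
The rough arithmetic behind those figures, as a sketch using the values quoted above:

  echo "scale=1; 74569 * 3 / 1024" | bc    # USED 74569G x 3 replicas ~ 218 TiB raw
  echo "218 - 90" | bc                     # ~128 TB raw unaccounted for vs. the expected ~90 TB
  echo "scale=1; 50 * 24 / 1024" | bc      # 50 GB/hour ~ 1.2 TB/day of growth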

A very similar case is discussed here: http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2014-November/045037.html

Please look at this case.


Related issues: 3 (0 open, 3 closed)

Has duplicate: rgw - Bug #18258: rgw: radosgw-admin orphan find goes into infinite loop (Duplicate, Matt Benjamin, 12/15/2016)

Copied to: rgw - Backport #18827: jewel: RGW leaking data (Resolved, Matt Benjamin)
Copied to: rgw - Backport #19047: kraken: RGW leaking data (Resolved, Nathan Cutler)
#1

Updated by Loïc Dachary over 7 years ago

  • Target version deleted (v10.2.6)
#2

Updated by Marius Vaitiekunas over 7 years ago

Some more details about the issue.

All our leaking buckets have one thing in common: the Hadoop S3A client (https://wiki.apache.org/hadoop/AmazonS3) is used. Some of the objects have long names with many underscores, for example:
dt=20160814-060014-911/_temporary/0/_temporary/attempt_201608140600_0001_m_000003_339/part-00003.gz
dt=20160814-083014-948/_temporary/0/_temporary/attempt_201608140830_0001_m_000006_294/part-00006.gz
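
A quick way to spot such leftover S3A temporary objects in a suspect bucket (a sketch; the bucket name is a placeholder):

  # lists the bucket's objects and filters Hadoop/S3A temporary paths
  radosgw-admin bucket list --bucket=my-hadoop-bucket | grep '_temporary'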

#3

Updated by Samuel Just over 7 years ago

  • Project changed from Ceph to rgw
  • Category deleted (22)
#4

Updated by Wido den Hollander about 7 years ago

This issue is still active and happening on Jewel clusters.

The problem is that the orphan scan tool hangs in a loop on some systems, which makes it very difficult to debug.

Setting rados debug to 20 doesn't yield anything additional; the tool just keeps scanning the same RADOS objects over and over.

See #18258.

Any hints on how to investigate this further? In the long run this becomes a problem, since you can't keep adding hardware.
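
For reference, a minimal sketch of re-running the scan with client-side debug options captured to a file (the job id and log path are placeholders):

  radosgw-admin orphans find --pool=.rgw.buckets --job-id=orphans-debug \
    --debug-rados=20 --debug-ms=1 --log-file=/var/log/ceph/orphans-debug.log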

#5

Updated by Yehuda Sadeh about 7 years ago

If there's a scenario that reproduces the data leak, could you provide a log with 'debug rgw = 20' and 'debug ms = 1' and point at the leaking rados objects? We are also working on fixing the orphan tool.
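
A minimal sketch of turning those options on for a running gateway via the admin socket (the socket name varies per deployment; check /var/run/ceph/ for the actual client.rgw.* name):

  # the daemon name below is an assumption; adjust to the admin socket present on the host
  ceph daemon client.rgw.$(hostname -s) config set debug_rgw 20/20
  ceph daemon client.rgw.$(hostname -s) config set debug_ms 1/1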

#6

Updated by Yehuda Sadeh about 7 years ago

@Marius Vaitiekunas @Wido den Hollander can you try running

 $ rados -p <pool> ls

and see if it finishes? Also, is the infinite loop happening at the stage where it says 'storing X entries'?

#7

Updated by Yehuda Sadeh about 7 years ago

@Marius Vaitiekunas @Wido den Hollander actually, never mind. I was able to reproduce the issue.

#8

Updated by Yehuda Sadeh about 7 years ago

  • Status changed from New to Fix Under Review
#9

Updated by Wido den Hollander about 7 years ago

Yehuda Sadeh wrote:

https://github.com/ceph/ceph/pull/13147

Great! We will start testing.

Btw, this command went just fine:

rados -p <pool> ls

All PGs are active+clean. The logs just showed that the tool kept going over and over, repeating the same steps.

#10

Updated by Matas Tvarijonas about 7 years ago

Yehuda Sadeh wrote:

@Marius Vaitiekunas @Wido den Hollander actually, never mind. I was able to reproduce the issue.

Hi @Yehuda Sadeh, were you able to reproduce the data leak, or the orphan find loop issue? As we understand it, you fixed the orphan find loop issue. What about the data leak? Do you need logs with 'debug rgw = 20' and 'debug ms = 1'?

#11

Updated by Yehuda Sadeh about 7 years ago

I was able to reproduce the orphans find loop issue. As for the leak, it could be a known issue related to uploading the same parts multiple times in a multipart upload. If you could provide logs, that would be great. Thanks.
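
For anyone trying to reproduce that scenario, a hedged sketch using the AWS CLI against RGW: the same part number is uploaded twice, as a retrying client (e.g. Hadoop S3A) would do, before the upload is completed. The endpoint, bucket, and key are placeholders.

  dd if=/dev/zero of=part.bin bs=1M count=8            # dummy 8 MiB part
  EP=http://rgw.example.com:7480                       # placeholder RGW endpoint
  UP=$(aws --endpoint-url "$EP" s3api create-multipart-upload \
         --bucket test-bucket --key leak-test --query UploadId --output text)
  # first copy of part 1
  aws --endpoint-url "$EP" s3api upload-part --bucket test-bucket --key leak-test \
      --part-number 1 --upload-id "$UP" --body part.bin
  # second copy of the same part number; only this ETag is referenced on completion
  ETAG=$(aws --endpoint-url "$EP" s3api upload-part --bucket test-bucket --key leak-test \
         --part-number 1 --upload-id "$UP" --body part.bin --query ETag --output text | tr -d '"')
  aws --endpoint-url "$EP" s3api complete-multipart-upload --bucket test-bucket \
      --key leak-test --upload-id "$UP" \
      --multipart-upload "{\"Parts\":[{\"PartNumber\":1,\"ETag\":\"$ETAG\"}]}"

If the theory holds, the rados objects backing the first copy of part 1 would be left behind in .rgw.buckets without being referenced by the completed object.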

#12

Updated by Yehuda Sadeh about 7 years ago

@Wido den Hollander @Marius Vaitiekunas I found an issue with the fix (it breaks listing of multipart uploads). I'll update when it's resolved.

#13

Updated by Nathan Cutler about 7 years ago

  • Backport set to jewel
#14

Updated by Yehuda Sadeh about 7 years ago

@Wido den Hollander @Marius Vaitiekunas the current code has cleared teuthology.

#15

Updated by Alexey Sheplyakov about 7 years ago

#16

Updated by Nathan Cutler about 7 years ago

  • Status changed from Fix Under Review to Pending Backport
#17

Updated by Nathan Cutler about 7 years ago

  • Status changed from Pending Backport to Resolved
#18

Updated by Nathan Cutler about 7 years ago

  • Backport changed from jewel to jewel, kraken
#19

Updated by Nathan Cutler about 7 years ago

  • Has duplicate Bug #18258: rgw: radosgw-admin orphan find goes into infinite loop added
#20

Updated by Nathan Cutler about 7 years ago

  • Status changed from Resolved to Pending Backport
#21

Updated by Loïc Dachary about 7 years ago

#22

Updated by George Mihaiescu about 7 years ago

Hi,

I have the same problem and I think the leaked objects come from some failed or interrupted multipart uploads that happened a long time ago.

Our cluster has almost 38 TB of leaked data and I would like to recover the space:

root@controller1:~# radosgw-admin bucket stats | grep '"size_kb":' | awk '{print $2}' | sed 's/.$//' | paste -sd+ | bc
568528022778

root@controller1:~# ceph df detail | egrep -v "index|extra"| grep .rgw.buckets
.rgw.buckets 25 - N/A N/A 530T 47.71 581T 9309201 9091k 415M 28277k 1590T

I've run the "radosgw-admin orphans find --pool=.rgw.buckets --job-id=orphans" command (which took around 12 hours), and now there are 376 objects in the ".log" pool:

root@controller1:~# rados -p .log ls | head
orphan.scan.bck1.rados.10
obj_delete_at_hint.0000000078
orphan.scan.bck1.rados.1
orphan.scan.orphans.rados.17
orphan.scan.orphans.linked.41
obj_delete_at_hint.0000000068
obj_delete_at_hint.0000000085
orphan.scan.orphans.buckets.0
obj_delete_at_hint.0000000094
orphan.scan.orphans.buckets.58
root@controller1:~# rados -p .log ls | wc -l
376

I have also listed all the rados objects in the ".rgw.buckets" pool and they match the number reported by "ceph df" (9309201 objects):

root@controller1:~# ceph df | egrep -v "index|extra" | grep .rgw.buckets
.rgw.buckets 25 530T 47.71 581T 9309201

root@controller1:~# wc -l objects_in_all_buckets
9309201 objects_in_all_buckets

The question I have now is what to do next.

How should I use the data generated by the "radosgw-admin orphans find" command to clean up these files, and how can I make sure I don't delete good data as well?

Thank you for your help.
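
A hedged sketch of the usual follow-up on jewel-era tools, assuming the scan prints candidate objects as "leaked: <oid>" lines; review the list carefully (and spot-check entries against bucket listings) before removing anything:

  # if the original stdout was not saved, re-run the scan with the same job id and capture it
  radosgw-admin orphans find --pool=.rgw.buckets --job-id=orphans \
      | sed -n 's/^leaked: //p' > leaked-objects.txt
  wc -l leaked-objects.txt
  # after verifying the list, remove the orphaned rados objects one by one
  while IFS= read -r obj; do rados -p .rgw.buckets rm "$obj"; done < leaked-objects.txt
  # finally, remove the scan's bookkeeping objects from the .log pool
  radosgw-admin orphans finish --job-id=orphans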

#23

Updated by George Mihaiescu about 7 years ago

I forgot to mention that this is a cluster that was first deployed as Giant, then upgraded to Hammer and finally to Jewel.

root@controller1:~# radosgw-admin --version
ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)

#24

Updated by Nathan Cutler almost 7 years ago

  • Status changed from Pending Backport to Resolved