Bug #18331: RGW leaking data - rgw - Ceph

Bug #18331

closed

RGW leaking data

Added by Matas Tvarijonas over 7 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Target version:
% Done:
Source: Community (user)
Tags: RGW leak data
Backport: jewel, kraken
Regression:
Severity: 2 - major
Reviewed:
Affected Versions: Ceph - v10.2.2
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello, we are leaking data at about 50 GB per hour on our Ceph cluster (version 10.2.2).

We use this command to calculate the real usage:

radosgw-admin bucket stats | grep '"size_kb":' | awk '{print $2}' | sed 's/.$//' | paste -sd+ | bc

24912659802

This is about 28 TB of used storage, and our replication factor is 3, so it should occupy close to 90 TB of raw storage.

We check the pool size via:
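
The grep/awk/sed pipeline above depends on the exact text layout of the output. As a rough alternative, here is a minimal sketch that sums the same "size_kb" fields from parsed JSON. It assumes that `radosgw-admin bucket stats` prints a JSON array of buckets, each with a "usage" map whose categories (such as rgw.main) carry a "size_kb" field; the function name total_size_kb and the two-bucket sample are made up for illustration.

```python
import json


def total_size_kb(bucket_stats):
    """Sum the 'size_kb' field over every usage category of every bucket."""
    total = 0
    for bucket in bucket_stats:
        # A bucket with no objects may have no usage categories at all.
        for category in bucket.get("usage", {}).values():
            total += category.get("size_kb", 0)
    return total


# Hypothetical sample shaped like the assumed `radosgw-admin bucket stats` output.
sample = json.loads("""
[
  {"bucket": "a",
   "usage": {"rgw.main": {"size_kb": 1000, "size_kb_actual": 1024, "num_objects": 3}}},
  {"bucket": "b",
   "usage": {"rgw.main": {"size_kb": 500, "size_kb_actual": 512, "num_objects": 1},
             "rgw.multimeta": {"size_kb": 0, "size_kb_actual": 0, "num_objects": 2}}}
]
""")

print(total_size_kb(sample))  # 1500
```

Against a real cluster the input would come from something like `json.load(sys.stdin)` with the command output piped in. Like the original pipeline, this counts every usage category, not only rgw.main.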

ceph df detail
GLOBAL:
    SIZE  AVAIL  RAW USED  %RAW USED  OBJECTS
    507T  194T   312T      61.63      41756k
POOLS:
    NAME                        ID  CATEGORY  QUOTA OBJECTS  QUOTA BYTES  USED    %USED  MAX AVAIL  OBJECTS   DIRTY   READ    WRITE   RAW USED
    data                        0   -         N/A            N/A          240G    0.14   15492G     61613     61613   411k    194k    722G
    .rgw.root                   13  -         N/A            N/A          3397    0      15492G     16        15      26007k  928     10191
    .rgw.control                14  -         N/A            N/A          0       0      15492G     8         3       0       0       0
    .rgw                        15  -         N/A            N/A          297k    0      15492G     1273      1273    106M    21640k  891k
    .rgw.gc                     16  -         N/A            N/A          0       0      15492G     512       492     374M    450M    0
    .users.uid                  17  -         N/A            N/A          31695   0      15492G     145       145     72228k  22828k  95085
    .users                      18  -         N/A            N/A          2425    0      15492G     80        79      81628   224     7275
    .users.email                19  -         N/A            N/A          1861    0      15492G     49        49      21      88      5583
    .rgw.buckets.index          20  -         N/A            N/A          0       0      15492G     979       979     1595M   744M    0
    .log                        21  -         N/A            N/A          2423    0      15492G     321       320     39877k  103M    7269
    .rgw.buckets                22  -         N/A            N/A          74569G  43.09  15492G     34667278  33854k  4485M   1336M   218T
    cinder_backups              25  -         N/A            N/A          0       0      15492G     0         0       0       0       0
    cinder_volumes              26  -         N/A            N/A          18664G  10.78  15492G     4720798   4610k   15087M  335G    55992G
    nova_root                   30  -         N/A            N/A          9349G   5.40   15492G     1259549   1230k   27544M  161G    28047G
    glance_images               33  -         N/A            N/A          3102G   1.79   15492G     397580    388k    58006k  947k    9306G
    migration                   35  -         N/A            N/A          136     0      15492G     2         2       5       2       408
    .users.swift                37  -         N/A            N/A          107     0      15492G     8         8       6       14      321
    default.rgw.meta            38  -         N/A            N/A          475M    0      15492G     1648259   1609k   0       5494k   1425M
    default.rgw.buckets.non-ec  39  -         N/A            N/A          0       0      15492G     15        15      18064   315k    0
    .rgw.root.161220            40  -         N/A            N/A          7315    0      15492G     21        21      0       21      21945

So:

.rgw.buckets 22 - N/A N/A 74569G 43.09 15492G 34667278 33854k 4485M 1336M 218T

This is 218 TB of raw space used, against the roughly 90 TB we expect, so about 130 TB of raw space is unaccounted for.

We are currently leaking about 50 GB per hour.

A very similar case is discussed here: http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2014-November/045037.html
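
For anyone re-checking the arithmetic, here is a small sketch that recomputes the gap from the exact figures quoted above. It assumes that size_kb is in KiB and that `ceph df` reports binary units (G = GiB, T = TiB); the variable names are illustrative only.

```python
KIB_PER_TIB = 1024 ** 3  # KiB in one TiB (assumes binary units throughout)

bucket_stats_kb = 24912659802  # sum printed by the bucket stats pipeline
pool_used_gib = 74569          # USED column of .rgw.buckets in `ceph df detail`
replication = 3                # replication factor stated in the report

logical_tib = bucket_stats_kb / KIB_PER_TIB
expected_raw_tib = logical_tib * replication
actual_raw_tib = pool_used_gib * replication / 1024
gap_tib = actual_raw_tib - expected_raw_tib

print(f"logical usage per bucket stats: {logical_tib:.1f} TiB")
print(f"expected raw usage:             {expected_raw_tib:.1f} TiB")
print(f"actual raw usage of the pool:   {actual_raw_tib:.1f} TiB")
print(f"unaccounted raw space:          {gap_tib:.1f} TiB")
```

Under these unit assumptions the bucket stats sum converts to roughly 23 TiB rather than 28 TB, which would make the gap somewhat larger than the rounded estimate in the report. The conclusion is the same either way: the pool holds far more raw data than the bucket statistics account for.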

Please look at this case.

Related issues 3 (0 open — 3 closed)

Updated by Loïc Dachary over 7 years ago

Target version deleted (v10.2.6)

Updated by Marius Vaitiekunas over 7 years ago

Some more details about the issue.

All our leaking buckets have one thing in common: the Hadoop S3A client (https://wiki.apache.org/hadoop/AmazonS3) is used. Some of the objects also have long names with many underscores. For example:
dt=20160814-060014-911/_temporary/0/_temporary/attempt_201608140600_0001_m_000003_339/part-00003.gz
dt=20160814-083014-948/_temporary/0/_temporary/attempt_201608140830_0001_m_000006_294/part-00006.gz
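
To see how much of a bucket listing follows this pattern, a minimal sketch like the following could flag keys that sit under a "_temporary" path component, as both examples above do. The helper name is_hadoop_temporary and the third key are hypothetical; the check only looks at the key name and says nothing about whether the object actually leaked.

```python
def is_hadoop_temporary(key):
    """Return True if any '/'-separated component of the key is exactly '_temporary'."""
    return "_temporary" in key.split("/")


keys = [
    # The two keys quoted in the comment above.
    "dt=20160814-060014-911/_temporary/0/_temporary/attempt_201608140600_0001_m_000003_339/part-00003.gz",
    "dt=20160814-083014-948/_temporary/0/_temporary/attempt_201608140830_0001_m_000006_294/part-00006.gz",
    # Hypothetical committed output file, added only for contrast.
    "dt=20160814-060014-911/part-00003.gz",
]

temporary = [k for k in keys if is_hadoop_temporary(k)]
print(len(temporary), "of", len(keys), "keys are under _temporary")
```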

Updated by Samuel Just over 7 years ago

Project changed from Ceph to rgw
Category deleted (22)

Updated by Wido den Hollander about 7 years ago

This issue is still active and happening on Jewel clusters.

The problem is that the orphan scan tool hangs in a loop on some systems, which makes this very difficult to debug.

Setting rados debug to 20 doesn't yield anything additional; it just keeps scanning the same RADOS objects over and over.

See #18258

Any hints on how to investigate this further? In the long run this becomes a problem for people, since you can't keep adding hardware.

Updated by Yehuda Sadeh about 7 years ago

If there's a scenario that reproduces the data leak, could you provide a log with 'debug rgw = 20' and 'debug ms = 1', and point at the leaking rados objects? We are also working on fixing the orphan tool.

Updated by Yehuda Sadeh about 7 years ago

@Marius Vaitiekunas @Wido den Hollander can you try running

 $ rados -p <pool> ls

and see if it finishes? Also, is the infinite loop happening at the stage where it says 'storing X entries'?

Updated by Yehuda Sadeh about 7 years ago

@Marius Vaitiekunas @Wido den Hollander actually, never mind. I was able to reproduce the issue.

Updated by Yehuda Sadeh about 7 years ago

Status changed from New to Fix Under Review

https://github.com/ceph/ceph/pull/13147

Updated by Wido den Hollander about 7 years ago

Yehuda Sadeh wrote:

https://github.com/ceph/ceph/pull/13147

Great! We will get testing.

Btw, this command went just fine:

rados -p <pool> ls

All PGs are active+clean. The logs just showed the tool repeating the same steps over and over.

#10

Updated by Matas Tvarijonas about 7 years ago

Yehuda Sadeh wrote:

@Marius Vaitiekunas @Wido den Hollander actually, never mind. I was able to reproduce the issue.

Hi @Yehuda Sadeh, were you able to reproduce the data leak, or the orphans find loop issue? As we understand it, you fixed the orphans find loop issue. What about the data leak? Do you need logs with 'debug rgw = 20' and 'debug ms = 1'?

#11

Updated by Yehuda Sadeh about 7 years ago

I was able to reproduce the orphans find loop issue. As for the leak, it could be a known issue related to the same parts being uploaded multiple times within a multipart upload. If you could provide logs, that would be great. Thanks.

#12

Updated by Yehuda Sadeh about 7 years ago

@Wido den Hollander @Marius Vaitiekunas I found an issue with the fix (breaks listing of multipart uploads). I'll update when it's cleared.

#13

Updated by Nathan Cutler about 7 years ago

Backport set to jewel

#14

Updated by Yehuda Sadeh about 7 years ago

@Wido den Hollander @Marius Vaitiekunas current code cleared teuthology.

#15

Updated by Alexey Sheplyakov about 7 years ago

Copied to Backport #18827: jewel: RGW leaking data added

#16

Updated by Nathan Cutler about 7 years ago

Status changed from Fix Under Review to Pending Backport

#17

Updated by Nathan Cutler about 7 years ago

Status changed from Pending Backport to Resolved

#18

Updated by Nathan Cutler about 7 years ago

Backport changed from jewel to jewel, kraken

#19

Updated by Nathan Cutler about 7 years ago

Has duplicate Bug #18258: rgw: radosgw-admin orphan find goes into infinite loop added

#20

Updated by Nathan Cutler about 7 years ago

Status changed from Resolved to Pending Backport

#21

Updated by Loïc Dachary about 7 years ago

Copied to Backport #19047: kraken: RGW leaking data added

#22

Updated by George Mihaiescu about 7 years ago

Hi,

I have the same problem and I think the leaked objects come from some failed or interrupted multipart uploads that happened a long time ago.
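
One way to check the interrupted-upload theory from the S3 side is to list multipart uploads that were started but never completed or aborted. The sketch below assumes a boto3-style S3 client pointed at the gateway and uses its list_multipart_uploads paginator; the function name, the bucket name and the stand-in client classes are hypothetical, and the stand-ins exist only so the example runs without a gateway.

```python
def list_incomplete_uploads(s3_client, bucket):
    """Return (key, upload_id, initiated) for every in-progress multipart upload."""
    paginator = s3_client.get_paginator("list_multipart_uploads")
    uploads = []
    for page in paginator.paginate(Bucket=bucket):
        # Pages without pending uploads carry no "Uploads" key.
        for upload in page.get("Uploads", []):
            uploads.append((upload["Key"], upload["UploadId"], upload.get("Initiated")))
    return uploads


# Stand-ins for a real client, used only for a dry run of the function above.
class _FakePaginator:
    def paginate(self, **kwargs):
        yield {"Uploads": [{"Key": "part-00003.gz", "UploadId": "example-id", "Initiated": None}]}
        yield {}  # a page with no uploads


class _FakeClient:
    def get_paginator(self, operation_name):
        assert operation_name == "list_multipart_uploads"
        return _FakePaginator()


found = list_incomplete_uploads(_FakeClient(), "example-bucket")
print(found)
```

With a real gateway the client would be created along the lines of `boto3.client("s3", endpoint_url=...)` with the RGW endpoint and credentials. Uploads listed this way are only candidates for cleanup; whether their parts explain the missing space still has to be confirmed separately.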

Our cluster has almost 38 TB of leaked data and I would like to recover the space:

root@controller1:~# radosgw-admin bucket stats | grep '"size_kb":' | awk '{print $2}' | sed 's/.$//' | paste -sd+ | bc
568528022778

root@controller1:~# ceph df detail | egrep -v "index|extra"| grep .rgw.buckets
.rgw.buckets 25 - N/A N/A 530T 47.71 581T 9309201 9091k 415M 28277k 1590T

I ran "radosgw-admin orphans find --pool=.rgw.buckets --job-id=orphans" (which took around 12 hours), and now there are 376 objects in the ".log" pool:

root@controller1:~# rados -p .log ls | head
orphan.scan.bck1.rados.10
obj_delete_at_hint.0000000078
orphan.scan.bck1.rados.1
orphan.scan.orphans.rados.17
orphan.scan.orphans.linked.41
obj_delete_at_hint.0000000068
obj_delete_at_hint.0000000085
orphan.scan.orphans.buckets.0
obj_delete_at_hint.0000000094
orphan.scan.orphans.buckets.58
root@controller1:~# rados -p .log ls | wc -l
376
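
To get an overview of what those objects are, a short sketch like this groups the names by prefix. It uses the ten names from the listing above and treats a trailing all-digit suffix as a shard number, which is an assumption made for illustration.

```python
from collections import Counter


def family(name):
    """Drop a trailing all-digit '.N' suffix, e.g. 'a.b.10' -> 'a.b'; otherwise return the name unchanged."""
    head, sep, tail = name.rpartition(".")
    return head if sep and tail.isdigit() else name


# The ten names shown by `rados -p .log ls | head` above.
names = [
    "orphan.scan.bck1.rados.10",
    "obj_delete_at_hint.0000000078",
    "orphan.scan.bck1.rados.1",
    "orphan.scan.orphans.rados.17",
    "orphan.scan.orphans.linked.41",
    "obj_delete_at_hint.0000000068",
    "obj_delete_at_hint.0000000085",
    "orphan.scan.orphans.buckets.0",
    "obj_delete_at_hint.0000000094",
    "orphan.scan.orphans.buckets.58",
]

counts = Counter(family(n) for n in names)
for prefix, count in sorted(counts.items()):
    print(count, prefix)
```

Run over the full listing, this would show how many of the 376 objects belong to each orphan.scan.* group and how many are unrelated obj_delete_at_hint objects.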

I have also listed all the rados objects in the ".rgw.buckets" pool, and the count matches the number reported by "ceph df" (9309201 objects):

root@controller1:~# ceph df | egrep -v "index|extra" | grep .rgw.buckets
.rgw.buckets 25 530T 47.71 581T 9309201

root@controller1:~# wc -l objects_in_all_buckets
9309201 objects_in_all_buckets

The question I have now is what to do next.

How should I use the data generated by the "radosgw-admin orphans find" command to clean up these files, and how can I make sure I don't delete good data as well?

Thank you for your help.

#23

Updated by George Mihaiescu about 7 years ago

I forgot to mention that this is a cluster that was first deployed as Giant, then upgraded to Hammer and finally to Jewel.

root@controller1:~# radosgw-admin --version
ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)

#24

Updated by Nathan Cutler almost 7 years ago

Status changed from Pending Backport to Resolved
