Bug #17163

Rados object leak

Added by Praveen Kumar G T over 7 years ago. Updated over 7 years ago.

Status:
In Progress
Priority:
Normal
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Rados objects get leaked when an object with the same name is written from multiple sessions at the same time. The overwritten rados objects do not all end up in the GC list when we overwrite the same object from multiple clients simultaneously.

*Following are the steps to reproduce and the corresponding logs*
1) Take a snapshot of the "rados df" output. Make sure no other clients are writing to the cluster.
2) Make sure there are no items (expired or not yet expired) pending in the GC list.
3) Create a bucket.
4) From multiple processes, create an object with the same name at the same time [we used four client systems, each writing the same object name around 100 times; a reproduction sketch follows this list]. Since these are overwrites, each previous version of the object should be queued for garbage collection.
5) After the above completes, check that all the PUT requests have succeeded.
6) Remove the bucket.
7) We should now have 400 entries in the GC list, but we end up seeing far fewer, sometimes fewer than 390 (so effectively more than 10 objects have been leaked).
8) After the GC process completes, take another snapshot of the "rados df" output. Compared to step 1, the rados object count has clearly increased by the number of objects that were leaked.
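
A minimal sketch of the overwrite workload in step 4, assuming a 10 MB test file and s3cmd configured against the RGW endpoint; ./tenmb.bin is a placeholder, and the actual harness used for the logs below may differ. Run the loop on each of the four client systems at the same time:

# run simultaneously on each of the four client systems;
# every iteration overwrites the same key s3://repro/onefile
# (./tenmb.bin is a placeholder for a 10 MB test file)
for i in $(seq 1 100); do
    s3cmd put ./tenmb.bin s3://repro/onefile && echo "Finished putting the data ($i)"
done

The echoed line mirrors the "Finished putting the data" string that the step 4 & 5 logs below grep for.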

*Below are the logs*
*Step 1 [ Baseline "rados df" snapshot of rgw.buckets ]*
[dev--mon~/] sudo rados df | grep ".in-chennai-1.rgw.buckets "
.in-chennai-1.rgw.buckets 806822521 200387 0 0 0 12671 9685 1792357 806823209

*Step 2 [ No items in GC list ]*
[dev--mon~/] sudo radosgw-admin gc list --include-all
[]

*Step 3 [ Creating bucket ]*
[deb~/] s3cmd mb s3://repro
Bucket 's3://repro/' created

*Step 4 & 5 [ Writing data simultaneously from four systems; all 400 PUTs have passed ]*
[10.32.209.176~/stage-rw.ar.onename] grep -i "Finished putting the data" repro | wc -l
100
[10.32.233.205~/stage-rw.ar.onename] grep -i "Finished putting the data" repro | wc -l
100
[10.32.29.186~/stage-rw.ar.onename] grep -i "Finished putting the data" repro | wc -l
100
[10.32.97.159~/stage-rw.ar.onename] grep -i "Finished putting the data" repro | wc -l
100

[dev--mon~/] sudo rados df | grep ".in-chennai-1.rgw.buckets "
.in-chennai-1.rgw.buckets 810811377 201587 0 0 0 14231 10845 1803617 811006669

You can see we have 1200 more rados objects. Our file size is 10 MB, so each PUT results in 3 rados objects. We did 100 PUTs from each of the four systems, which explains the 1200 objects (3 * 400).
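
As a sanity check, the delta matches the two "rados df" snapshots above; the 3-objects-per-PUT figure additionally assumes the default 4 MB RGW stripe size, which is not confirmed anywhere in this report:

# object-count delta between the two "rados df" snapshots above
echo $(( 201587 - 200387 ))                                     # 1200
# expected rados objects per 10 MB PUT, assuming the default 4 MB stripe size
echo $(( (10*1024*1024 + 4*1024*1024 - 1) / (4*1024*1024) ))    # 3
# 400 PUTs of 3 rados objects each
echo $(( 400 * 3 ))                                             # 1200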

*Step 6 [ Removing the bucket ]*
[deb~/] s3cmd rb -r s3://repro
WARNING: Bucket is not empty. Removing all the objects from it first. This may take some time...
delete: 's3://repro/onefile'
Bucket 's3://repro/' removed

*Step 7 [ Only 380 entries in GC list ]*
[dev--mon~/] sudo radosgw-admin gc list --include-all | grep tag | wc -l
380

You can see only 380 objects ended up in the GC list; the other 20 are leaked.
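
For what it's worth, one way to pin down which objects leaked after GC drains (step 8, whose log is not included above) is to record the bucket's marker before deleting it and then list leftover rados objects carrying that marker as a prefix. This is an extra diagnostic step, not part of the original procedure, and <bucket-marker> below is a placeholder for the value reported by bucket stats:

# before step 6: note the bucket id/marker (extra diagnostic step)
sudo radosgw-admin bucket stats --bucket=repro | grep -E '"(id|marker)"'

# after the GC list has drained: any data objects still carrying that marker were leaked
sudo rados -p .in-chennai-1.rgw.buckets ls | grep "<bucket-marker>" | wc -l

# the pool-wide object count should also return to the step 1 value if nothing leaked
sudo rados df | grep ".in-chennai-1.rgw.buckets "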

#1

Updated by Josh Durgin over 7 years ago

  • Project changed from Ceph to rgw
  • Category deleted (22)
#2

Updated by Matt Benjamin over 7 years ago

  • Status changed from New to In Progress
  • Assignee set to Orit Wasserman

Folks, could you retest with (at earliest) the latest Hammer version (there have been multiple leak fixes)?
