Bug #15446
s3tests multipart tests leak objects: orphan tool claims still linked
Description
TL;DR: In some cases, objects uploaded via multipart are still present after bucket removal AND cleanups AND orphan removal. The orphan tool claims they are still linked (to a remaining bucket.instance in at least one case).
Reproduction:
0. Optionally apply the improved nuke patch to s3tests (https://github.com/ceph/s3-tests/pull/106)
1. Run cleanup-rados.py script to empty most RGW pools (keep the user metadata only)
2. Wait for 10 seconds & confirm pools empty
3. Run a single s3-test.
4. Wait 10 seconds for stat convergence
5. Run cleanup-rgw.sh script
5.1. Count how many orphan leaks were found or were claimed as still linked
6. Look for any remaining objects in pools
7. Reset for next test
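The reproduction loop above can be sketched as a single driver script. This is a dry-run sketch, not part of the report: the `run` wrapper, the script paths, and the nosetests invocation for s3-tests are assumptions.

```shell
#!/bin/sh
# Dry-run sketch of the per-test reproduction loop above.
# run() only echoes each command; drop the echo to execute on a live cluster.
run() { echo "+ $*"; }

reproduce() {
    test_name=$1
    run ./cleanup-rados.py                                 # step 1: empty RGW pools
    run sleep 10                                           # step 2: wait, confirm pools empty
    run nosetests s3tests.functional.test_s3:"$test_name"  # step 3: single s3-test
    run sleep 10                                           # step 4: stat convergence
    run ./cleanup-rgw.sh                                   # steps 5-5.1: trims, gc, orphans find
    run rados df                                           # step 6: any remaining objects?
}

reproduce test_multipart_upload
```

Run as-is it only prints the six commands; on a real cluster the leaked-object counts come from comparing the `rados df` output at step 6 against the empty-pool check at step 2.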
Test results, Bad:
test_multipart_upload_resend_part: 71 obj in data pool; 49 caught by orphan-find, 22 remain, claimed as linked
test_multipart_upload_size_too_small: 10 objs in data pool + 1 obj in non-ec, NONE caught by orphan-find;
test_multipart_upload_missing_part: 1 obj in data pool, 1 obj in extra pool; data pool claimed linked; extra pool caught by orphan-find
test_multipart_upload_incorrect_etag: 1 obj in data pool, 1 obj in extra pool; data pool claimed linked; extra pool caught by orphan-find
Test results, Good:
test_multipart_upload_empty: 1 obj in non-ec pool, ALL caught by orphan find on non-ec pool
test_multipart_upload_small: 1 obj in data pool, ALL caught by orphan find on data pool
test_multipart_upload: 12 objs in data pool, ALL caught by orphan find on data pool
test_multipart_upload_multiple_sizes: 22 objs in data pool, ALL caught by orphan find on data pool
test_multipart_upload_contents: 7 objs in data pool, ALL caught by orphan-find
test_multipart_upload_overwrite_existing_object: 6 in data pool; ALL caught by orphan-find
test_abort_multipart_upload: 4 objs in data pool, ALL caught by orphan-find
test_abort_multipart_upload_not_found: 0 objs in data pool
test_list_multipart_upload: 7 objs in data pool; ALL caught by orphan-find
test_atomic_multipart_upload_write: 0 objs
test_versioning_obj_create_overwrite_multipart: 21 objs in data pool; all claimed linked
Scripts, cleanup-rgw.sh:
#!/bin/sh
RGWADM="radosgw-admin 2>/dev/null"
DATES='--start-date=2016-01-01 --end-date=2016-12-31'
JOBID=$(date +%s)

docmd() {
    echo "=== Running $*"
    eval $RGWADM "$@"
}

docmd objects expire $DATES
docmd usage trim $DATES
docmd mdlog trim $DATES
docmd bilog trim $DATES
docmd datalog trim $DATES
docmd gc process --include-all
docmd orphans find --job-id ${JOBID}-1 --orphan-stale-secs=1 --num-shards=1 --pool default.rgw.buckets.data
docmd orphans find --job-id ${JOBID}-2 --orphan-stale-secs=1 --num-shards=1 --pool default.rgw.buckets.non-ec
Scripts, cleanup-rados.py:
#!/usr/bin/python
import rados, sys

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
cluster_stats = cluster.get_cluster_stats()
pools = cluster.list_pools()

SAFE_RM_POOLS = [
    'default.rgw.buckets.data',
    'default.rgw.buckets.index',
    'default.rgw.buckets.non-ec',
    'default.rgw.data.root',
    'default.rgw.log',
    'default.rgw.meta',
]

for pool in pools:
    ioctx = cluster.open_ioctx(pool)
    object_iterator = ioctx.list_objects()
    while True:
        try:
            rados_object = object_iterator.next()
            print "Pool="+pool+" Object "+str(rados_object)
            # keep user metadata; remove everything else in the safe pools
            if rados_object.key.startswith('.meta:user'):
                continue
            if pool in SAFE_RM_POOLS:
                rados_object.remove()
        except StopIteration:
            break
# vim:ts=4 sts=4 sw=4 et:
Updated by Orit Wasserman about 8 years ago
RGW uses garbage collection, which means that deletion is not done immediately but only after the GC runs.
Can you check whether the objects are still there after a while (a few hours)?
I don't think you need to use the orphan tool for the cleanup; it should happen automatically.
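One way to check this suggestion is to watch the GC queue before and after forcing a run. A dry-run sketch (the `run` wrapper is an illustration; `gc process --include-all` is the flag cleanup-rgw.sh already uses, and `gc list --include-all` lists entries that are not yet due):

```shell
#!/bin/sh
# Dry-run sketch: inspect the RGW GC queue around a forced run.
# run() only echoes the command; remove the echo on a real cluster.
run() { echo "+ $*"; }

gc_check() {
    run radosgw-admin gc list --include-all     # pending entries, incl. not-yet-due ones
    run radosgw-admin gc process --include-all  # force GC instead of waiting hours
    run radosgw-admin gc list --include-all     # queue should drain if deletes were deferred
    run rados -p default.rgw.buckets.data ls    # objects still listed here are real leaks
}

gc_check
```

If the objects survive a drained GC queue, deferred deletion is ruled out and the "still linked" claim from the orphan tool is the remaining lead.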