Bug #15446


s3tests multipart tests leak objects: orphan tool claims still linked

Added by Robin Johnson about 8 years ago. Updated 21 days ago.

Status:
New
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Community (dev)
Tags:
rgw, leak, multipart, orphan
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

TL;DR: In some cases, objects uploaded via multipart are still present after bucket removal AND cleanups AND orphan removal. The orphan tool claims they are still linked (to the remaining bucket.instance in at least one case).

Reproduction:
0. Optionally apply the improved nuke patch to s3tests (https://github.com/ceph/s3-tests/pull/106)
1. Run the cleanup-rados.py script to empty most RGW pools (keep only the user metadata)
2. Wait 10 seconds & confirm the pools are empty
3. Run a single s3-test
4. Wait 10 seconds for stat convergence
5. Run the cleanup-rgw.sh script
5.1. Count how many orphan leaks were found or claimed as still linked
6. Look for any remaining objects in the pools
7. Reset for the next test
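
The cycle above can be sketched as a small driver. This is only an illustration: `run-s3test` is a hypothetical wrapper around running one s3-test (the real invocation depends on your s3tests setup), and `runner`/`sleep` are injectable so the sequence can be dry-run without a cluster:

```python
import subprocess
import time

def run_cycle(test_name, runner=subprocess.check_call, sleep=time.sleep):
    """One leak-hunt cycle; runner/sleep are injectable so it can be dry-run."""
    runner(["./cleanup-rados.py"])        # 1. empty RGW pools (keep user metadata)
    sleep(10)                             # 2. wait, then confirm pools are empty
    runner(["run-s3test", test_name])     # 3. hypothetical wrapper: run one s3-test
    sleep(10)                             # 4. wait for stat convergence
    runner(["./cleanup-rgw.sh"])          # 5. trims, gc process, orphans find
    runner(["rados", "ls", "-p", "default.rgw.buckets.data"])  # 6. any leftovers?
```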

Test results, Bad:
test_multipart_upload_resend_part: 71 objs in data pool; 49 caught by orphan-find, 22 remain, claimed as linked
test_multipart_upload_size_too_small: 10 objs in data pool + 1 obj in non-ec pool; NONE caught by orphan-find
test_multipart_upload_missing_part: 1 obj in data pool, 1 obj in extra pool; data pool claimed linked; extra pool caught by orphan-find
test_multipart_upload_incorrect_etag: 1 obj in data pool, 1 obj in extra pool; data pool claimed linked; extra pool caught by orphan-find
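
The leak count in each bad case is just objects seen minus objects caught by orphan-find; a quick sanity check of the figures above (numbers copied from the results, the dict layout is mine):

```python
# (objects seen across pools, objects caught by orphan-find) per failing test,
# copied from the results above; the remainder is what the orphan tool
# claims is still linked.
bad = {
    "test_multipart_upload_resend_part":    (71, 49),
    "test_multipart_upload_size_too_small": (11, 0),   # 10 data + 1 non-ec
    "test_multipart_upload_missing_part":   (2, 1),    # extra-pool obj caught
    "test_multipart_upload_incorrect_etag": (2, 1),    # extra-pool obj caught
}

leaked = {name: total - caught for name, (total, caught) in bad.items()}
```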

Test results, Good:
test_multipart_upload_empty: 1 obj in non-ec pool, ALL caught by orphan find on non-ec pool
test_multipart_upload_small: 1 obj in data pool, ALL caught by orphan find on data pool
test_multipart_upload: 12 objs in data pool, ALL caught by orphan find on data pool
test_multipart_upload_multiple_sizes: 22 objs in data pool, ALL caught by orphan find on data pool
test_multipart_upload_contents: 7 objs in data pool, ALL caught by orphan-find
test_multipart_upload_overwrite_existing_object: 6 in data pool; ALL caught by orphan-find
test_abort_multipart_upload: 4 objs in data pool, ALL caught by orphan-find
test_abort_multipart_upload_not_found: 0 objs in data pool
test_list_multipart_upload: 7 objs in data pool; ALL caught by orphan-find
test_atomic_multipart_upload_write: 0 objs
test_versioning_obj_create_overwrite_multipart: 21 objs in data pool; all claimed linked

Scripts: cleanup-rgw.sh:

#!/bin/sh
# Suppress radosgw-admin's stderr noise; the redirection is applied via eval.
RGWADM="radosgw-admin 2>/dev/null"
DATES='--start-date=2016-01-01 --end-date=2016-12-31'
JOBID=$(date +%s)
docmd() {
  echo "=== Running $*"
  eval $RGWADM "$@"
}
docmd objects expire $DATES
docmd usage trim $DATES
docmd mdlog trim $DATES
docmd bilog trim $DATES
docmd datalog trim $DATES
docmd gc process --include-all
docmd orphans find --job-id ${JOBID}-1 --orphan-stale-secs=1 --num-shards=1 --pool default.rgw.buckets.data
docmd orphans find --job-id ${JOBID}-2 --orphan-stale-secs=1 --num-shards=1 --pool default.rgw.buckets.non-ec

Scripts: cleanup-rados.py:

#!/usr/bin/python
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

# Only pools listed here have objects removed; everything else is just printed.
SAFE_RM_POOLS = [
    'default.rgw.buckets.data',
    'default.rgw.buckets.index',
    'default.rgw.buckets.non-ec',
    'default.rgw.data.root',
    'default.rgw.log',
    'default.rgw.meta',
    ]

for pool in cluster.list_pools():
    ioctx = cluster.open_ioctx(pool)
    for rados_object in ioctx.list_objects():
        print("Pool=" + pool + " Object " + str(rados_object))
        # Keep user metadata so the test credentials survive the cleanup.
        if rados_object.key.startswith('.meta:user'):
            continue
        if pool in SAFE_RM_POOLS:
            rados_object.remove()
    ioctx.close()
cluster.shutdown()

# vim:ts=4 sts=4 sw=4 et:

Actions #1

Updated by Orit Wasserman about 8 years ago

RGW uses garbage collection, which means that deletion is not done immediately but only after the gc runs.
Can you check whether the objects are still there after a while (a few hours)?
I don't think you need to use the orphan tool for the cleanup; it should happen automatically.
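
One way to act on this suggestion without hand-checking is to poll the pool until it drains or a deadline passes. A generic sketch; `list_leftovers` stands in for whatever listing you use (`ioctx.list_objects()` or `rados ls`), and the timeout/interval values are illustrative:

```python
import time

def wait_until_empty(list_leftovers, timeout=4 * 3600, interval=300,
                     clock=time.time, sleep=time.sleep):
    """Poll until list_leftovers() returns nothing, or the timeout expires.

    Returns True if the pool drained (the "leak" was just pending gc),
    False if objects are still present at the deadline (a real leak).
    """
    deadline = clock() + timeout
    while clock() <= deadline:
        if not list_leftovers():
            return True
        sleep(interval)
    return False
```

The clock and sleep are injectable so the wait logic itself can be exercised without actually sleeping for hours.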

Actions #2

Updated by Konstantin Shalygin 21 days ago

  • Backport deleted (hammer)