Bug #20380

Stale bucket index entries are left over after object deletions

Added by Pavan Rallabhandi almost 7 years ago. Updated over 6 years ago.

Status: Resolved
Priority: High
Assignee: Matt Benjamin
Target version: -
% Done: 0%
Source: Community (dev)
Tags:
Backport: kraken, jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

On many of our clusters running Jewel (10.2.5+), I am running into a strange problem: stale bucket index entries are left over for some of the deleted objects.

The symptoms are that the actual delete operation on an object is reported as successful in the RGW logs, but a bucket list on the container still shows the deleted object. An attempt to download or stat the object appropriately results in a failure. No failures are seen in the OSDs hosting the bucket index object, and rebuilding the bucket index by running `radosgw-admin bucket check --fix` resolves the issue.

I could simulate the problem by instrumenting the code not to invoke `complete_del` on the bucket index op (https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.cc#L8793), but that call always seems to be made unless there is a cascading error from the actual delete operation of the object, which does not appear to be the case here.
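
For illustration, a minimal self-contained sketch of the two-phase bucket index update around a delete (simplified, hypothetical names; not the actual RGWRados/cls_rgw API), showing how skipping the completion step leaves a stale listing entry:

#include <iostream>
#include <set>
#include <string>

// Illustrative stand-ins only, not the actual RGWRados/cls_rgw types.
struct BucketIndex {
    std::set<std::string> entries;
    void prepare_del(const std::string&) { /* mark entry pending-delete */ }
    void complete_del(const std::string& o) { entries.erase(o); }
    void cancel(const std::string&) { /* clear the pending state */ }
};

// Rough shape of the delete path: prepare the index op, remove the RADOS
// object, then complete. If complete_del() is never reached, the bucket
// listing keeps the name even though the object itself is gone.
bool delete_object(BucketIndex& index, std::set<std::string>& rados,
                   const std::string& name, bool skip_complete) {
    index.prepare_del(name);
    if (rados.erase(name) == 0) {   // the actual object delete failed
        index.cancel(name);
        return false;
    }
    if (!skip_complete)
        index.complete_del(name);   // normally always called on success
    return true;
}

int main() {
    BucketIndex index;
    std::set<std::string> rados{"obj1"};
    index.entries.insert("obj1");

    delete_object(index, rados, "obj1", /*skip_complete=*/true);
    // The object is gone, but the listing still shows it: a stale index entry.
    std::cout << "rados objects: " << rados.size()
              << ", index entries: " << index.entries.size() << std::endl;
}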

A simple script like the one below easily reproduces the issue:


$# cat delete_create_object 

#!/bin/sh
i=1
objects=100
prefix=$1

while [ $i -lt $objects ]
do
    object=$prefix.`date +%Y-%m-%d:%H:%M:%S`
    touch "$object"
    swift -A http://localhost:8000/auth -U test:tester -K testing upload test "$object"
    swift -A http://localhost:8000/auth -U test:tester -K testing delete test "$object"
    swift -A http://localhost:8000/auth -U test:tester -K testing list test

    i=`expr $i + 1`
done

$# cat script 
./delete_create_object one&
./delete_create_object two&
./delete_create_object three&
./delete_create_object four&
./delete_create_object five&
./delete_create_object six&
./delete_create_object seven&
./delete_create_object eight&
./delete_create_object nine&
./delete_create_object ten&

In one such run of the above script, I ended up with the listing below on my single-node test (vstart) cluster running 1 RGW and 3 OSDs. A bucket check with `--fix` resolves the issue.


$# swift -A http://localhost:8000/auth -U test:tester -K testing list test
eight.2017-06-22:15:25:59
eight.2017-06-22:15:26:31
eight.2017-06-22:15:26:39
eight.2017-06-22:15:27:22
eight.2017-06-22:15:27:56
eight.2017-06-22:15:29:20
eight.2017-06-22:15:30:10
eight.2017-06-22:15:31:24
eight.2017-06-22:15:31:31
eight.2017-06-22:15:32:17
eight.2017-06-22:15:33:26
eight.2017-06-22:15:33:53
eight.2017-06-22:15:34:20
five.2017-06-22:15:24:41
five.2017-06-22:15:25:27
five.2017-06-22:15:25:53
five.2017-06-22:15:27:07
five.2017-06-22:15:28:14
five.2017-06-22:15:28:18
five.2017-06-22:15:29:20
five.2017-06-22:15:31:04
five.2017-06-22:15:31:18
five.2017-06-22:15:31:31
five.2017-06-22:15:32:17
five.2017-06-22:15:33:40
five.2017-06-22:15:34:27
five.2017-06-22:15:35:05
five.2017-06-22:15:35:21
four.2017-06-22:15:24:48
four.2017-06-22:15:25:33
four.2017-06-22:15:26:06
four.2017-06-22:15:26:18
four.2017-06-22:15:26:25
four.2017-06-22:15:27:43
four.2017-06-22:15:28:18
four.2017-06-22:15:28:35
four.2017-06-22:15:32:31
four.2017-06-22:15:33:40
four.2017-06-22:15:33:53
four.2017-06-22:15:34:26
four.2017-06-22:15:35:05
four.2017-06-22:15:35:40
nine.2017-06-22:15:30:50
nine.2017-06-22:15:31:37
nine.2017-06-22:15:31:51
nine.2017-06-22:15:31:57
nine.2017-06-22:15:32:17
nine.2017-06-22:15:33:26
nine.2017-06-22:15:34:27
nine.2017-06-22:15:35:21
nine.2017-06-22:15:35:40
one.2017-06-22:15:25:53
one.2017-06-22:15:26:25
one.2017-06-22:15:26:39
one.2017-06-22:15:27:00
one.2017-06-22:15:27:07
one.2017-06-22:15:31:40
one.2017-06-22:15:31:51
one.2017-06-22:15:31:57
one.2017-06-22:15:32:37
one.2017-06-22:15:32:45
one.2017-06-22:15:34:00
one.2017-06-22:15:34:52
one.2017-06-22:15:34:58
seven.2017-06-22:15:25:20
seven.2017-06-22:15:25:27
seven.2017-06-22:15:25:33
seven.2017-06-22:15:26:49
seven.2017-06-22:15:27:00
seven.2017-06-22:15:27:13
seven.2017-06-22:15:27:22
seven.2017-06-22:15:28:18
seven.2017-06-22:15:28:35
seven.2017-06-22:15:31:04
seven.2017-06-22:15:31:11
seven.2017-06-22:15:31:24
seven.2017-06-22:15:31:44
seven.2017-06-22:15:34:20
seven.2017-06-22:15:34:27
six.2017-06-22:15:25:20
six.2017-06-22:15:26:06
six.2017-06-22:15:26:12
six.2017-06-22:15:27:13
six.2017-06-22:15:28:35
six.2017-06-22:15:28:42
six.2017-06-22:15:28:59
six.2017-06-22:15:29:35
six.2017-06-22:15:30:30
ten.2017-06-22:15:26:06
ten.2017-06-22:15:27:56
ten.2017-06-22:15:28:27
ten.2017-06-22:15:28:42
ten.2017-06-22:15:29:57
ten.2017-06-22:15:30:17
ten.2017-06-22:15:31:18
ten.2017-06-22:15:31:24
ten.2017-06-22:15:31:44
ten.2017-06-22:15:32:41
ten.2017-06-22:15:33:26
ten.2017-06-22:15:34:32
three.2017-06-22:15:24:41
three.2017-06-22:15:25:01
three.2017-06-22:15:25:53
three.2017-06-22:15:26:25
three.2017-06-22:15:27:43
three.2017-06-22:15:28:13
three.2017-06-22:15:28:59
three.2017-06-22:15:29:27
three.2017-06-22:15:30:04
three.2017-06-22:15:30:44
three.2017-06-22:15:30:57
three.2017-06-22:15:31:18
three.2017-06-22:15:32:31
three.2017-06-22:15:32:37
three.2017-06-22:15:32:52
three.2017-06-22:15:33:06
two.2017-06-22:15:25:27
two.2017-06-22:15:26:12
two.2017-06-22:15:26:18
two.2017-06-22:15:28:21
two.2017-06-22:15:31:44
two.2017-06-22:15:34:27
two.2017-06-22:15:34:52
two.2017-06-22:15:34:58

$# ./bin/radosgw-admin object stat --object three.2017-06-22:15:30:57 --bucket test
ERROR: failed to stat object, returned error: (2) No such file or directory

$# ./bin/radosgw-admin object stat --object ten.2017-06-22:15:30:17 --bucket test
ERROR: failed to stat object, returned error: (2) No such file or directory

$# ./bin/radosgw-admin bucket check --bucket test --check-objects
[]

$# ./bin/radosgw-admin bucket check --bucket test --check-objects --fix
[]
{}
{
    "existing_header": {
        "usage": {
            "rgw.main": {
                "size": 0,
                "size_actual": 0,
                "size_utilized": 0,
                "size_kb": 0,
                "size_kb_actual": 0,
                "size_kb_utilized": 0,
                "num_objects": 0
            }
        }
    },
    "calculated_header": {
        "usage": {}
    }
}

$# swift  -A http://localhost:8000/auth -U test:tester -K testing list test 


Related issues (2 total: 0 open, 2 closed)

Copied to rgw - Backport #20513: jewel: Stale bucket index entries are left over after object deletions (Resolved, Casey Bodley)
Copied to rgw - Backport #20519: kraken: Stale bucket index entries are left over after object deletions (Rejected)
#1

Updated by Pavan Rallabhandi almost 7 years ago

  • Source set to Community (dev)
#2

Updated by Pavan Rallabhandi almost 7 years ago

  • Description updated (diff)
#3

Updated by Pavan Rallabhandi almost 7 years ago

  • Description updated (diff)
#4

Updated by Orit Wasserman almost 7 years ago

  • Priority changed from Normal to High
#5

Updated by Orit Wasserman almost 7 years ago

Pavan, are you using versioning?

#6

Updated by Pavan Rallabhandi almost 7 years ago

Orit Wasserman wrote:

Pavan, are you using versioning?

No Orit, we are not using object versioning. One interesting observation from the repro steps is that I am not able to reproduce the issue if the list operation is left out (of create, delete, list). Only with the exact steps in the script can I reproduce it 100% of the time.

#7

Updated by Matt Benjamin almost 7 years ago

  • Status changed from New to In Progress
#8

Updated by Matt Benjamin almost 7 years ago

  • Backport set to jewel kraken

Hi Pavan,

I haven't proved this resolves the issue, but I did prove (in a non-reproducer scenario) that the tag_timeout values being produced did not compare equal to ones scaled as std::chrono::seconds. Assuming that is the race window, I'm not sure how large it is; maybe milliseconds.

https://github.com/ceph/ceph/pull/16026
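
For reference, a minimal standalone illustration (not the RGW code itself, just the general class of mismatch): a raw count interpreted at the wrong resolution never compares equal to a value scaled as std::chrono::seconds, while an explicit conversion through std::chrono does:

#include <cassert>
#include <chrono>

int main() {
    using namespace std::chrono;
    using timespan = nanoseconds;           // ceph::timespan-like resolution

    // A raw count dropped into the finer-grained type: "120" intended as
    // seconds, but actually interpreted as 120 nanoseconds.
    timespan wrong(120);
    assert(wrong != seconds(120));          // never compares equal

    // Scaling explicitly through std::chrono keeps the units consistent.
    timespan right = duration_cast<timespan>(seconds(120));
    assert(right == seconds(120));          // compares equal as expected
    return 0;
}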

#9

Updated by Pavan Rallabhandi almost 7 years ago

  • Assignee changed from Pavan Rallabhandi to Matt Benjamin

Yes Matt, the fix seems to take effect. Thanks for the quick turnaround.

I was going through the commit where the timespan resolution had been changed, https://github.com/ceph/ceph/commit/416234b38be77ebb2decabc91a74055387a4b10a. Would this also mean we need to look at other places where `timespan` has been introduced and could possibly lead to other races?

#10

Updated by Matt Benjamin almost 7 years ago

Yes, I think it follows that there could be other ceph::timespan problems. I have so far grepped through the Ceph source and looked at all the instances.

From what I can tell, all the interesting instances use the make_timespan(const double d) inline function in common/ceph_time.h. Naively, it looks like this inline might induce the same problem, but for some reason, it appears as if that's not the case. I've proved that on my x86_64 system running F23, the following code does not assert:

auto t1 = make_timespan(20);
auto t2 = ceph::timespan(std::chrono::seconds(20));
assert(t1 == t2);
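
A plausible explanation, assuming make_timespan() converts through a seconds-ratio std::chrono::duration<double> before casting to the timespan type (an assumption, not verified against common/ceph_time.h): std::chrono then scales the value correctly, so the comparison holds:

#include <cassert>
#include <chrono>

int main() {
    using namespace std::chrono;
    using timespan = nanoseconds;                  // ceph::timespan-like

    // Assumed shape of make_timespan(): a seconds-ratio duration<double>
    // converted with duration_cast, which scales the value correctly.
    auto d = duration<double>(20.0);               // 20 seconds, double rep
    timespan t1 = duration_cast<timespan>(d);      // 20'000'000'000 ns
    timespan t2 = timespan(seconds(20));           // also 20'000'000'000 ns
    assert(t1 == t2);                              // so the assert holds
    return 0;
}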

I intend to run this by Adam Emerson to see exactly how the correct unit is derived.

Matt

#11

Updated by Matt Benjamin almost 7 years ago

  • Status changed from In Progress to Pending Backport
#12

Updated by Matt Benjamin almost 7 years ago

  • Backport changed from jewel kraken to jewel luminous kraken
#13

Updated by Nathan Cutler almost 7 years ago

  • Copied to Backport #20513: jewel: Stale bucket index entries are left over after object deletions added
#14

Updated by Nathan Cutler almost 7 years ago

  • Backport changed from jewel luminous kraken to kraken, jewel
#15

Updated by Nathan Cutler almost 7 years ago

  • Copied to Backport #20519: kraken: Stale bucket index entries are left over after object deletions added
#16

Updated by Nathan Cutler over 6 years ago

  • Status changed from Pending Backport to Resolved