Bug #20380

Stale bucket index entries are left over after object deletions

Added by Pavan Rallabhandi 4 months ago. Updated about 2 months ago.

Status:
Resolved
Priority:
High
Assignee:
Target version:
-
Start date:
06/22/2017
Due date:
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
kraken, jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Needs Doc:
No

Description

On many of our clusters running Jewel (10.2.5+), I am running into a strange problem: stale bucket index entries are left over for some of the deleted objects.

The symptoms are that the actual delete operation of an object is reported successful in the RGW logs, but a bucket list on the container still shows the deleted object. An attempt to download or stat the object appropriately fails. No failures are seen in the OSDs hosting the bucket index object, and rebuilding the bucket index by running `radosgw-admin bucket check --fix` fixes the issue.

I could simulate the problem by instrumenting the code to skip the `complete_del` call on the bucket index op (https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.cc#L8793), but that call always seems to be made unless there is a cascading error from the actual delete operation of the object, which doesn't seem to be the case here.

A simple script like the one below easily reproduces the issue:


$# cat delete_create_object

#!/bin/sh
# Repeatedly create, upload, delete, and list an object via Swift.
i=1
objects=100
prefix=$1

while [ $i -lt $objects ]
do
    object=$prefix.`date +%Y-%m-%d:%H:%M:%S`
    touch $object
    swift -A http://localhost:8000/auth -U test:tester -K testing upload test $object
    swift -A http://localhost:8000/auth -U test:tester -K testing delete test $object
    swift -A http://localhost:8000/auth -U test:tester -K testing list test

    i=`expr $i + 1`
done

$# cat script 
./delete_create_object one&
./delete_create_object two&
./delete_create_object three&
./delete_create_object four&
./delete_create_object five&
./delete_create_object six&
./delete_create_object seven&
./delete_create_object eight&
./delete_create_object nine&
./delete_create_object ten&

In one such run of the above script, I ended up with the listing below on my single-node test (vstart) cluster running 1 RGW and 3 OSDs. A bucket check with --fix resolves the issue.


$# swift -A http://localhost:8000/auth -U test:tester -K testing list test
eight.2017-06-22:15:25:59
eight.2017-06-22:15:26:31
eight.2017-06-22:15:26:39
eight.2017-06-22:15:27:22
eight.2017-06-22:15:27:56
eight.2017-06-22:15:29:20
eight.2017-06-22:15:30:10
eight.2017-06-22:15:31:24
eight.2017-06-22:15:31:31
eight.2017-06-22:15:32:17
eight.2017-06-22:15:33:26
eight.2017-06-22:15:33:53
eight.2017-06-22:15:34:20
five.2017-06-22:15:24:41
five.2017-06-22:15:25:27
five.2017-06-22:15:25:53
five.2017-06-22:15:27:07
five.2017-06-22:15:28:14
five.2017-06-22:15:28:18
five.2017-06-22:15:29:20
five.2017-06-22:15:31:04
five.2017-06-22:15:31:18
five.2017-06-22:15:31:31
five.2017-06-22:15:32:17
five.2017-06-22:15:33:40
five.2017-06-22:15:34:27
five.2017-06-22:15:35:05
five.2017-06-22:15:35:21
four.2017-06-22:15:24:48
four.2017-06-22:15:25:33
four.2017-06-22:15:26:06
four.2017-06-22:15:26:18
four.2017-06-22:15:26:25
four.2017-06-22:15:27:43
four.2017-06-22:15:28:18
four.2017-06-22:15:28:35
four.2017-06-22:15:32:31
four.2017-06-22:15:33:40
four.2017-06-22:15:33:53
four.2017-06-22:15:34:26
four.2017-06-22:15:35:05
four.2017-06-22:15:35:40
nine.2017-06-22:15:30:50
nine.2017-06-22:15:31:37
nine.2017-06-22:15:31:51
nine.2017-06-22:15:31:57
nine.2017-06-22:15:32:17
nine.2017-06-22:15:33:26
nine.2017-06-22:15:34:27
nine.2017-06-22:15:35:21
nine.2017-06-22:15:35:40
one.2017-06-22:15:25:53
one.2017-06-22:15:26:25
one.2017-06-22:15:26:39
one.2017-06-22:15:27:00
one.2017-06-22:15:27:07
one.2017-06-22:15:31:40
one.2017-06-22:15:31:51
one.2017-06-22:15:31:57
one.2017-06-22:15:32:37
one.2017-06-22:15:32:45
one.2017-06-22:15:34:00
one.2017-06-22:15:34:52
one.2017-06-22:15:34:58
seven.2017-06-22:15:25:20
seven.2017-06-22:15:25:27
seven.2017-06-22:15:25:33
seven.2017-06-22:15:26:49
seven.2017-06-22:15:27:00
seven.2017-06-22:15:27:13
seven.2017-06-22:15:27:22
seven.2017-06-22:15:28:18
seven.2017-06-22:15:28:35
seven.2017-06-22:15:31:04
seven.2017-06-22:15:31:11
seven.2017-06-22:15:31:24
seven.2017-06-22:15:31:44
seven.2017-06-22:15:34:20
seven.2017-06-22:15:34:27
six.2017-06-22:15:25:20
six.2017-06-22:15:26:06
six.2017-06-22:15:26:12
six.2017-06-22:15:27:13
six.2017-06-22:15:28:35
six.2017-06-22:15:28:42
six.2017-06-22:15:28:59
six.2017-06-22:15:29:35
six.2017-06-22:15:30:30
ten.2017-06-22:15:26:06
ten.2017-06-22:15:27:56
ten.2017-06-22:15:28:27
ten.2017-06-22:15:28:42
ten.2017-06-22:15:29:57
ten.2017-06-22:15:30:17
ten.2017-06-22:15:31:18
ten.2017-06-22:15:31:24
ten.2017-06-22:15:31:44
ten.2017-06-22:15:32:41
ten.2017-06-22:15:33:26
ten.2017-06-22:15:34:32
three.2017-06-22:15:24:41
three.2017-06-22:15:25:01
three.2017-06-22:15:25:53
three.2017-06-22:15:26:25
three.2017-06-22:15:27:43
three.2017-06-22:15:28:13
three.2017-06-22:15:28:59
three.2017-06-22:15:29:27
three.2017-06-22:15:30:04
three.2017-06-22:15:30:44
three.2017-06-22:15:30:57
three.2017-06-22:15:31:18
three.2017-06-22:15:32:31
three.2017-06-22:15:32:37
three.2017-06-22:15:32:52
three.2017-06-22:15:33:06
two.2017-06-22:15:25:27
two.2017-06-22:15:26:12
two.2017-06-22:15:26:18
two.2017-06-22:15:28:21
two.2017-06-22:15:31:44
two.2017-06-22:15:34:27
two.2017-06-22:15:34:52
two.2017-06-22:15:34:58

$# ./bin/radosgw-admin object stat --object three.2017-06-22:15:30:57 --bucket test
ERROR: failed to stat object, returned error: (2) No such file or directory

$# ./bin/radosgw-admin object stat --object ten.2017-06-22:15:30:17 --bucket test
ERROR: failed to stat object, returned error: (2) No such file or directory

$# ./bin/radosgw-admin bucket check --bucket test --check-objects
[]

$# ./bin/radosgw-admin bucket check --bucket test --check-objects --fix
[]
{}
{
    "existing_header": {
        "usage": {
            "rgw.main": {
                "size": 0,
                "size_actual": 0,
                "size_utilized": 0,
                "size_kb": 0,
                "size_kb_actual": 0,
                "size_kb_utilized": 0,
                "num_objects": 0
            }
        }
    },
    "calculated_header": {
        "usage": {}
    }
}

$# swift  -A http://localhost:8000/auth -U test:tester -K testing list test 


Related issues

Copied to rgw - Backport #20513: jewel: Stale bucket index entries are left over after object deletions Resolved
Copied to rgw - Backport #20519: kraken: Stale bucket index entries are left over after object deletions Rejected

History

#1 Updated by Pavan Rallabhandi 4 months ago

  • Source set to Community (dev)

#2 Updated by Pavan Rallabhandi 4 months ago

  • Description updated (diff)

#3 Updated by Pavan Rallabhandi 4 months ago

  • Description updated (diff)

#4 Updated by Orit Wasserman 4 months ago

  • Priority changed from Normal to High

#5 Updated by Orit Wasserman 4 months ago

Pavan, are you using versioning?

#6 Updated by Pavan Rallabhandi 4 months ago

Orit Wasserman wrote:

Pavan, are you using versioning?

No Orit, we are not using object versioning. One interesting observation from the repro steps: I am not able to reproduce the issue if the list operation is left out (of create, delete, list). Only with all the steps in the script am I able to reproduce it 100% of the time.

#7 Updated by Matt Benjamin 4 months ago

  • Status changed from New to In Progress

#8 Updated by Matt Benjamin 4 months ago

  • Backport set to jewel kraken

Hi Pavan,

I haven't proved this resolves the issue, but I did prove (in a non-reproducer scenario) that the tag_timeout values being produced did not compare equal to ones scaled as std::chrono::seconds. Assuming that's the race window, I'm not sure how large it is; maybe milliseconds.

https://github.com/ceph/ceph/pull/16026

#9 Updated by Pavan Rallabhandi 4 months ago

  • Assignee changed from Pavan Rallabhandi to Matt Benjamin

Yes Matt, the fix seems to take effect. Thanks for the quick turnaround.

I was going through the commit where the resolution time had been changed, https://github.com/ceph/ceph/commit/416234b38be77ebb2decabc91a74055387a4b10a. Would this also mean we need to look at other places where `timespan` has been introduced and may possibly lead to other races?

#10 Updated by Matt Benjamin 4 months ago

Yes, I think it follows that there could be other ceph::timespan problems. I have so far grepped through the Ceph source and looked at all the instances.

From what I can tell, all the interesting instances use the make_timespan(const double d) inline function in common/ceph_time.h. Naively, it looks like this inline might induce the same problem, but for some reason it appears that's not the case. I've verified that on my x86_64 system running F23, the following code does not assert:

#include <cassert>
#include "common/ceph_time.h"  // make_timespan, ceph::timespan

auto t1 = make_timespan(20);
auto t2 = ceph::timespan(std::chrono::seconds(20));
assert(t1 == t2);

I intend to run this by Adam Emerson to see exactly how the correct unit is derived.

Matt

#11 Updated by Matt Benjamin 4 months ago

  • Status changed from In Progress to Pending Backport

#12 Updated by Matt Benjamin 4 months ago

  • Backport changed from jewel kraken to jewel luminous kraken

#13 Updated by Nathan Cutler 4 months ago

  • Copied to Backport #20513: jewel: Stale bucket index entries are left over after object deletions added

#14 Updated by Nathan Cutler 4 months ago

  • Backport changed from jewel luminous kraken to kraken, jewel

#15 Updated by Nathan Cutler 4 months ago

  • Copied to Backport #20519: kraken: Stale bucket index entries are left over after object deletions added

#16 Updated by Nathan Cutler about 2 months ago

  • Status changed from Pending Backport to Resolved
