Bug #20380

Updated by Pavan Rallabhandi almost 7 years ago

On many of our clusters running Jewel (10.2.5+), I am running into a strange problem: stale bucket index entries are left behind for some deleted objects. 

 The symptoms are that the delete operation on an object is reported as successful in the RGW logs, but a bucket listing on the container still shows the deleted object. An attempt to download or stat the object fails, as expected. No failures are seen on the OSDs where the bucket index object is located. Rebuilding the bucket index by running 'radosgw-admin bucket check --fix' fixes the issue. 
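 The symptom described above (an object that is still listed but can no longer be stat'ed) can be checked mechanically. The following is a minimal sketch, not part of the original report: it reads object names from stdin and prints those for which the stat fails. It assumes that 'radosgw-admin object stat' exits nonzero on failure; `STAT_CMD` is a hypothetical override hook added only so the logic can be dry-run without a cluster.

```shell
#!/bin/sh
# Sketch: report bucket index entries whose object can no longer be stat'ed.

stat_object() {
    # $1 = bucket, $2 = object name
    if [ -n "${STAT_CMD:-}" ]; then
        # Hypothetical hook: substitute a stand-in for the real stat call.
        "$STAT_CMD" "$1" "$2"
    else
        # Assumption: a failed stat makes radosgw-admin exit nonzero.
        radosgw-admin object stat --bucket "$1" --object "$2" >/dev/null 2>&1
    fi
}

find_stale_entries() {
    # $1 = bucket; object names are read from stdin, one per line.
    bucket=$1
    while IFS= read -r object; do
        [ -n "$object" ] || continue
        # Keep the stat command from consuming the listing on stdin.
        if ! stat_object "$bucket" "$object" </dev/null; then
            printf '%s\n' "$object"
        fi
    done
}
```

 With the setup from this report, it could be fed from the listing, for example: swift -A http://localhost:8000/auth -U test:tester -K testing list test | find_stale_entries test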

 I could simulate the problem by instrumenting the code so that `complete_del` is not invoked on the bucket index op (https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.cc#L8793). However, that call always seems to be made unless there is a cascading error from the actual delete operation of the object, which doesn't seem to be the case here. 

 A simple script like the one below easily reproduces the issue: 

 <pre> <snip> 

 $# # cat delete_create_object  

 #!/bin/sh 
 # Repeatedly upload, delete, and list objects in the "test" container. 
 # $1 is a name prefix that identifies this worker's objects. 
 i=1 
 objects=100 
 container=$1 

 while [ "$i" -lt "$objects" ] 
 do 
     # Object name: <prefix>.<timestamp> 
     object=$container.$(date +%Y-%m-%d:%H:%M:%S) 
     touch "$object" 
     swift -A http://localhost:8000/auth -U test:tester -K testing upload test "$object" 
     swift -A http://localhost:8000/auth -U test:tester -K testing delete test "$object" 
     swift -A http://localhost:8000/auth -U test:tester -K testing list test 

     i=$((i + 1)) 
 done 

 $# # cat script  
 ./delete_create_object one& 
 ./delete_create_object two& 
 ./delete_create_object three& 
 ./delete_create_object four& 
 ./delete_create_object five& 
 ./delete_create_object six& 
 ./delete_create_object seven& 
 ./delete_create_object eight& 
 ./delete_create_object nine& 
 ./delete_create_object ten& 


 </pre> <\snip> 
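 Because every uploaded object is deleted again, the container should be empty once all ten workers have finished. A possible driver that waits for the background workers before checking is sketched here. The function names `run_workers` and `count_leftovers` are hypothetical and not part of the original script.

```shell
#!/bin/sh
# Sketch: start one background worker per prefix, then wait for all of them.

run_workers() {
    # $1 = worker command, remaining arguments = object name prefixes
    worker=$1
    shift
    for prefix in "$@"; do
        "$worker" "$prefix" &
    done
    wait   # block until every background worker has exited
}

count_leftovers() {
    # Count non-empty lines of a container listing read from stdin.
    awk 'NF { n++ } END { print n + 0 }'
}
```

 For example, run_workers ./delete_create_object one two three four five six seven eight nine ten, followed by piping the final 'swift ... list test' output into count_leftovers. Any count other than 0 indicates leftover index entries.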

 In one such run of the above script, I ended up with the stale listing below on my single-node test (vstart) cluster running 1 RGW and 3 OSDs. Running a bucket check with --fix resolves the issue. 

 <pre> <snip> 

 $# # swift -A http://localhost:8000/auth -U test:tester -K testing list test 
 eight.2017-06-22:15:25:59 
 eight.2017-06-22:15:26:31 
 eight.2017-06-22:15:26:39 
 eight.2017-06-22:15:27:22 
 eight.2017-06-22:15:27:56 
 eight.2017-06-22:15:29:20 
 eight.2017-06-22:15:30:10 
 eight.2017-06-22:15:31:24 
 eight.2017-06-22:15:31:31 
 eight.2017-06-22:15:32:17 
 eight.2017-06-22:15:33:26 
 eight.2017-06-22:15:33:53 
 eight.2017-06-22:15:34:20 
 five.2017-06-22:15:24:41 
 five.2017-06-22:15:25:27 
 five.2017-06-22:15:25:53 
 five.2017-06-22:15:27:07 
 five.2017-06-22:15:28:14 
 five.2017-06-22:15:28:18 
 five.2017-06-22:15:29:20 
 five.2017-06-22:15:31:04 
 five.2017-06-22:15:31:18 
 five.2017-06-22:15:31:31 
 five.2017-06-22:15:32:17 
 five.2017-06-22:15:33:40 
 five.2017-06-22:15:34:27 
 five.2017-06-22:15:35:05 
 five.2017-06-22:15:35:21 
 four.2017-06-22:15:24:48 
 four.2017-06-22:15:25:33 
 four.2017-06-22:15:26:06 
 four.2017-06-22:15:26:18 
 four.2017-06-22:15:26:25 
 four.2017-06-22:15:27:43 
 four.2017-06-22:15:28:18 
 four.2017-06-22:15:28:35 
 four.2017-06-22:15:32:31 
 four.2017-06-22:15:33:40 
 four.2017-06-22:15:33:53 
 four.2017-06-22:15:34:26 
 four.2017-06-22:15:35:05 
 four.2017-06-22:15:35:40 
 nine.2017-06-22:15:30:50 
 nine.2017-06-22:15:31:37 
 nine.2017-06-22:15:31:51 
 nine.2017-06-22:15:31:57 
 nine.2017-06-22:15:32:17 
 nine.2017-06-22:15:33:26 
 nine.2017-06-22:15:34:27 
 nine.2017-06-22:15:35:21 
 nine.2017-06-22:15:35:40 
 one.2017-06-22:15:25:53 
 one.2017-06-22:15:26:25 
 one.2017-06-22:15:26:39 
 one.2017-06-22:15:27:00 
 one.2017-06-22:15:27:07 
 one.2017-06-22:15:31:40 
 one.2017-06-22:15:31:51 
 one.2017-06-22:15:31:57 
 one.2017-06-22:15:32:37 
 one.2017-06-22:15:32:45 
 one.2017-06-22:15:34:00 
 one.2017-06-22:15:34:52 
 one.2017-06-22:15:34:58 
 seven.2017-06-22:15:25:20 
 seven.2017-06-22:15:25:27 
 seven.2017-06-22:15:25:33 
 seven.2017-06-22:15:26:49 
 seven.2017-06-22:15:27:00 
 seven.2017-06-22:15:27:13 
 seven.2017-06-22:15:27:22 
 seven.2017-06-22:15:28:18 
 seven.2017-06-22:15:28:35 
 seven.2017-06-22:15:31:04 
 seven.2017-06-22:15:31:11 
 seven.2017-06-22:15:31:24 
 seven.2017-06-22:15:31:44 
 seven.2017-06-22:15:34:20 
 seven.2017-06-22:15:34:27 
 six.2017-06-22:15:25:20 
 six.2017-06-22:15:26:06 
 six.2017-06-22:15:26:12 
 six.2017-06-22:15:27:13 
 six.2017-06-22:15:28:35 
 six.2017-06-22:15:28:42 
 six.2017-06-22:15:28:59 
 six.2017-06-22:15:29:35 
 six.2017-06-22:15:30:30 
 ten.2017-06-22:15:26:06 
 ten.2017-06-22:15:27:56 
 ten.2017-06-22:15:28:27 
 ten.2017-06-22:15:28:42 
 ten.2017-06-22:15:29:57 
 ten.2017-06-22:15:30:17 
 ten.2017-06-22:15:31:18 
 ten.2017-06-22:15:31:24 
 ten.2017-06-22:15:31:44 
 ten.2017-06-22:15:32:41 
 ten.2017-06-22:15:33:26 
 ten.2017-06-22:15:34:32 
 three.2017-06-22:15:24:41 
 three.2017-06-22:15:25:01 
 three.2017-06-22:15:25:53 
 three.2017-06-22:15:26:25 
 three.2017-06-22:15:27:43 
 three.2017-06-22:15:28:13 
 three.2017-06-22:15:28:59 
 three.2017-06-22:15:29:27 
 three.2017-06-22:15:30:04 
 three.2017-06-22:15:30:44 
 three.2017-06-22:15:30:57 
 three.2017-06-22:15:31:18 
 three.2017-06-22:15:32:31 
 three.2017-06-22:15:32:37 
 three.2017-06-22:15:32:52 
 three.2017-06-22:15:33:06 
 two.2017-06-22:15:25:27 
 two.2017-06-22:15:26:12 
 two.2017-06-22:15:26:18 
 two.2017-06-22:15:28:21 
 two.2017-06-22:15:31:44 
 two.2017-06-22:15:34:27 
 two.2017-06-22:15:34:52 
 two.2017-06-22:15:34:58 

 $# # ./bin/radosgw-admin object stat --object three.2017-06-22:15:30:57 --bucket test 
 ERROR: failed to stat object, returned error: (2) No such file or directory 

 $# # ./bin/radosgw-admin object stat --object ten.2017-06-22:15:30:17 --bucket test 
 ERROR: failed to stat object, returned error: (2) No such file or directory 

 $# # ./bin/radosgw-admin bucket check --bucket test --check-objects 
 [] 

 $# # ./bin/radosgw-admin bucket check --bucket test --check-objects --fix 
 [] 
 {} 
 { 
     "existing_header": { 
         "usage": { 
             "rgw.main": { 
                 "size": 0, 
                 "size_actual": 0, 
                 "size_utilized": 0, 
                 "size_kb": 0, 
                 "size_kb_actual": 0, 
                 "size_kb_utilized": 0, 
                 "num_objects": 0 
             } 
         } 
     }, 
     "calculated_header": { 
         "usage": {} 
     } 
 } 

 $# # swift -A http://localhost:8000/auth -U test:tester -K testing list test 

 </pre> 
 <\snip> 
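 To see how the leftover entries are spread across the parallel workers, a listing like the one above can be grouped by the prefix before the first dot. This is a small sketch under that naming assumption; `summarize_by_prefix` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count listed objects per worker prefix (the text before the first dot).

summarize_by_prefix() {
    # Reads object names from stdin and prints "<prefix> <count>" lines, sorted.
    awk -F. 'NF { count[$1]++ } END { for (p in count) print p, count[p] }' | sort
}
```

 It could be used as: swift -A http://localhost:8000/auth -U test:tester -K testing list test | summarize_by_prefix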
