Bug #6899


rgw: delete bucket on secondary zone returns success, but bucket is not deleted

Added by Josh Durgin over 10 years ago. Updated over 10 years ago.

Status: Resolved
Priority: Urgent
Assignee: Yehuda Sadeh
Target version: -
% Done: 0%
Source: Development
Severity: 3 - minor

Description

This occurred during a multi-zone sync test on dumpling. The sync agent sent a delete of the 'mybar' bucket to the secondary region and received a 200 response. However, the bucket still exists, and the rgw log shows an ECANCELED error (presumably from checking the object version).
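For reference, the agent's request can be replayed by hand. The sketch below is illustrative only, assuming the Python requests library; the endpoint, access key, and secret key are taken from the log and the job yaml in this ticket, and the key query parameter is deliberately left out of the v2 string-to-sign, matching the auth_hdr shown in the log below:

    # Minimal sketch: replay the metadata delete the sync agent issues.
    # Assumes the 'requests' library; endpoint/keys come from this ticket.
    import base64
    import hashlib
    import hmac
    from email.utils import formatdate

    import requests

    endpoint = "http://mira021.front.sepia.ceph.com:7281"
    resource = "/admin/metadata/bucket"
    access_key = "1te6NH5mcdcq0Tc5i8i3"   # client.1 system user (see job yaml)
    secret_key = "Py4IOauQoL18Gp2zM7lC1vLmoawgqcYPbYGcWfXw"

    date = formatdate(usegmt=True)
    # AWS v2 string-to-sign: METHOD, Content-MD5, Content-Type, Date, resource.
    # The 'key' query param is not a signed subresource, so it is omitted,
    # matching the auth_hdr in the log below.
    string_to_sign = "DELETE\n\n\n{}\n{}".format(date, resource)
    signature = base64.b64encode(
        hmac.new(secret_key.encode(), string_to_sign.encode(), hashlib.sha1).digest()
    ).decode()

    resp = requests.delete(
        endpoint + resource,
        params={"key": "mybar"},
        headers={"Date": date,
                 "Authorization": "AWS {}:{}".format(access_key, signature)},
    )
    print(resp.status_code)  # observed: 200, though the bucket still exists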

Here's the log of the delete request:

2013-11-25 18:42:09.092063 7f9597fb7700  1 ====== starting new request req=0x137df10 =====
2013-11-25 18:42:09.092139 7f9597fb7700  2 req 962:0.000075::DELETE /admin/metadata/bucket::initializing
2013-11-25 18:42:09.092186 7f9597fb7700 20 FCGI_ROLE=RESPONDER
2013-11-25 18:42:09.092191 7f9597fb7700 20 SCRIPT_URL=/admin/metadata/bucket
2013-11-25 18:42:09.092192 7f9597fb7700 20 SCRIPT_URI=http://mira021.front.sepia.ceph.com:7281/admin/metadata/bucket
2013-11-25 18:42:09.092194 7f9597fb7700 20 HTTP_AUTHORIZATION=AWS 1te6NH5mcdcq0Tc5i8i3:cNfK0sK3R6iu0em7Jg/GK6pEBGU=
2013-11-25 18:42:09.092196 7f9597fb7700 20 RGW_LOG_LEVEL=20
2013-11-25 18:42:09.092197 7f9597fb7700 20 RGW_SHOULD_LOG=yes
2013-11-25 18:42:09.092199 7f9597fb7700 20 RGW_PRINT_CONTINUE=on
2013-11-25 18:42:09.092200 7f9597fb7700 20 HTTP_HOST=mira021.front.sepia.ceph.com:7281
2013-11-25 18:42:09.092201 7f9597fb7700 20 CONTENT_LENGTH=0
2013-11-25 18:42:09.092202 7f9597fb7700 20 HTTP_ACCEPT_ENCODING=gzip, deflate, compress
2013-11-25 18:42:09.092203 7f9597fb7700 20 HTTP_ACCEPT=*/*
2013-11-25 18:42:09.092204 7f9597fb7700 20 HTTP_USER_AGENT=Boto/2.18.0 Python/2.7.3 Linux/3.2.0-55-generic
2013-11-25 18:42:09.092205 7f9597fb7700 20 HTTP_DATE=Tue, 26 Nov 2013 02:42:09 GMT
2013-11-25 18:42:09.092206 7f9597fb7700 20 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
2013-11-25 18:42:09.092208 7f9597fb7700 20 SERVER_SIGNATURE=
2013-11-25 18:42:09.092209 7f9597fb7700 20 SERVER_SOFTWARE=Apache/2.2.22 (Ubuntu) mod_fastcgi/mod_fastcgi-SNAP-0910052141
2013-11-25 18:42:09.092210 7f9597fb7700 20 SERVER_NAME=mira021.front.sepia.ceph.com
2013-11-25 18:42:09.092211 7f9597fb7700 20 SERVER_ADDR=10.214.132.136
2013-11-25 18:42:09.092212 7f9597fb7700 20 SERVER_PORT=7281
2013-11-25 18:42:09.092213 7f9597fb7700 20 REMOTE_ADDR=10.214.132.136
2013-11-25 18:42:09.092214 7f9597fb7700 20 DOCUMENT_ROOT=/home/ubuntu/cephtest/apache/htdocs.client.1
2013-11-25 18:42:09.092216 7f9597fb7700 20 SERVER_ADMIN=[no address given]
2013-11-25 18:42:09.092218 7f9597fb7700 20 SCRIPT_FILENAME=/home/ubuntu/cephtest/apache/htdocs.client.1/rgw.fcgi
2013-11-25 18:42:09.092220 7f9597fb7700 20 REMOTE_PORT=57487
2013-11-25 18:42:09.092222 7f9597fb7700 20 GATEWAY_INTERFACE=CGI/1.1
2013-11-25 18:42:09.092224 7f9597fb7700 20 SERVER_PROTOCOL=HTTP/1.1
2013-11-25 18:42:09.092226 7f9597fb7700 20 REQUEST_METHOD=DELETE
2013-11-25 18:42:09.092228 7f9597fb7700 20 QUERY_STRING=page=admin&params=/metadata/bucket&key=mybar
2013-11-25 18:42:09.092229 7f9597fb7700 20 REQUEST_URI=/admin/metadata/bucket?key=mybar
2013-11-25 18:42:09.092231 7f9597fb7700 20 SCRIPT_NAME=/admin/metadata/bucket
2013-11-25 18:42:09.092234 7f9597fb7700  2 req 962:0.000171::DELETE /admin/metadata/bucket::getting op
2013-11-25 18:42:09.092241 7f9597fb7700  2 req 962:0.000178::DELETE /admin/metadata/bucket:remove_metadata:authorizing
2013-11-25 18:42:09.092257 7f9597fb7700 20 get_obj_state: rctx=0x7f9538003590 obj=.users.1:1te6NH5mcdcq0Tc5i8i3 state=0x7f95380030a8 s->prefetch_data=0
2013-11-25 18:42:09.092266 7f9597fb7700 10 moving .users.1+1te6NH5mcdcq0Tc5i8i3 to cache LRU end
2013-11-25 18:42:09.092269 7f9597fb7700 10 cache get: name=.users.1+1te6NH5mcdcq0Tc5i8i3 : hit
2013-11-25 18:42:09.092274 7f9597fb7700 20 get_obj_state: s->obj_tag was set empty
2013-11-25 18:42:09.092281 7f9597fb7700 10 moving .users.1+1te6NH5mcdcq0Tc5i8i3 to cache LRU end
2013-11-25 18:42:09.092285 7f9597fb7700 10 cache get: name=.users.1+1te6NH5mcdcq0Tc5i8i3 : hit
2013-11-25 18:42:09.092305 7f9597fb7700 20 get_obj_state: rctx=0x7f9538003590 obj=.users.uid.1:client1-system-user state=0x7f9538003dc8 s->prefetch_data=0
2013-11-25 18:42:09.092316 7f9597fb7700 10 moving .users.uid.1+client1-system-user to cache LRU end
2013-11-25 18:42:09.092318 7f9597fb7700 10 cache get: name=.users.uid.1+client1-system-user : hit
2013-11-25 18:42:09.092323 7f9597fb7700 20 get_obj_state: s->obj_tag was set empty
2013-11-25 18:42:09.092328 7f9597fb7700 10 moving .users.uid.1+client1-system-user to cache LRU end
2013-11-25 18:42:09.092330 7f9597fb7700 10 cache get: name=.users.uid.1+client1-system-user : hit
2013-11-25 18:42:09.092380 7f9597fb7700 10 get_canon_resource(): dest=
2013-11-25 18:42:09.092384 7f9597fb7700 10 auth_hdr:
DELETE

Tue, 26 Nov 2013 02:42:09 GMT
/admin/metadata/bucket
2013-11-25 18:42:09.092404 7f9597fb7700 15 calculated digest=cNfK0sK3R6iu0em7Jg/GK6pEBGU=
2013-11-25 18:42:09.092406 7f9597fb7700 15 auth_sign=cNfK0sK3R6iu0em7Jg/GK6pEBGU=
2013-11-25 18:42:09.092408 7f9597fb7700 15 compare=0
2013-11-25 18:42:09.092411 7f9597fb7700 20 system request
2013-11-25 18:42:09.092415 7f9597fb7700  2 req 962:0.000352::DELETE /admin/metadata/bucket:remove_metadata:reading permissions
2013-11-25 18:42:09.092423 7f9597fb7700  2 req 962:0.000359::DELETE /admin/metadata/bucket:remove_metadata:verifying op mask
2013-11-25 18:42:09.092428 7f9597fb7700 20 required_mask= 0 user.op_mask=7
2013-11-25 18:42:09.092431 7f9597fb7700  2 req 962:0.000368::DELETE /admin/metadata/bucket:remove_metadata:verifying op permissions
2013-11-25 18:42:09.092438 7f9597fb7700  2 overriding permissions due to system operation
2013-11-25 18:42:09.092440 7f9597fb7700  2 req 962:0.000377::DELETE /admin/metadata/bucket:remove_metadata:verifying op params
2013-11-25 18:42:09.092443 7f9597fb7700  2 req 962:0.000380::DELETE /admin/metadata/bucket:remove_metadata:executing
2013-11-25 18:42:09.092460 7f9597fb7700 20 get_obj_state: rctx=0x7f9538006080 obj=.region0.r0z1.domain_root:mybar state=0x7f95380071d8 s->prefetch_data=0
2013-11-25 18:42:09.092468 7f9597fb7700 10 moving .region0.r0z1.domain_root+mybar to cache LRU end
2013-11-25 18:42:09.092471 7f9597fb7700 10 cache get: name=.region0.r0z1.domain_root+mybar : type miss (requested=22, cached=19)
2013-11-25 18:42:09.092512 7f9597fb7700  1 -- 10.214.132.136:0/1012510 --> 10.214.132.136:6804/11369 -- osd_op(client.4140.0:1168 mybar [call version.read,getxattrs,stat] 11.e29e365e e38) v4 -- ?+0 0x7f9538009010 con 0x136cc10
2013-11-25 18:42:09.093431 7f95f3fff700  1 -- 10.214.132.136:0/1012510 <== osd.1 10.214.132.136:6804/11369 982 ==== osd_op_reply(1168 mybar [call,getxattrs,stat] ondisk = 0) v4 ==== 188+0+139 (3004828890 0 4096515565) 0x7f95c8001ae0 con 0x136cc10
2013-11-25 18:42:09.093508 7f9597fb7700 10 cache put: name=.region0.r0z1.domain_root+mybar
2013-11-25 18:42:09.093514 7f9597fb7700 10 moving .region0.r0z1.domain_root+mybar to cache LRU end
2013-11-25 18:42:09.093521 7f9597fb7700 20 get_obj_state: s->obj_tag was set empty
2013-11-25 18:42:09.093524 7f9597fb7700 20 Read xattr: user.rgw.idtag
2013-11-25 18:42:09.093527 7f9597fb7700 20 Read xattr: user.rgw.manifest
2013-11-25 18:42:09.093538 7f9597fb7700 10 moving .region0.r0z1.domain_root+mybar to cache LRU end
2013-11-25 18:42:09.093541 7f9597fb7700 10 cache get: name=.region0.r0z1.domain_root+mybar : hit
2013-11-25 18:42:09.093571 7f9597fb7700 20 get_obj_state: rctx=0x7f9538006080 obj=.region0.r0z1.domain_root:mybar state=0x7f95380071d8 s->prefetch_data=0
2013-11-25 18:42:09.093580 7f9597fb7700 10 moving .region0.r0z1.domain_root+mybar to cache LRU end
2013-11-25 18:42:09.093582 7f9597fb7700 10 cache get: name=.region0.r0z1.domain_root+mybar : hit
2013-11-25 18:42:09.093587 7f9597fb7700 20 get_obj_state: s->obj_tag was set empty
2013-11-25 18:42:09.093593 7f9597fb7700 10 moving .region0.r0z1.domain_root+mybar to cache LRU end
2013-11-25 18:42:09.093597 7f9597fb7700 10 cache get: name=.region0.r0z1.domain_root+mybar : hit
2013-11-25 18:42:09.093649 7f9597fb7700  1 -- 10.214.132.136:0/1012510 --> 10.214.132.136:6804/11369 -- osd_op(client.4140.0:1169 foo.buckets [omap-rm-keys 0~13] 13.9e48174c e38) v4 -- ?+0 0x7f95380099d0 con 0x136cc10
2013-11-25 18:42:09.096223 7f95f3fff700  1 -- 10.214.132.136:0/1012510 <== osd.1 10.214.132.136:6804/11369 983 ==== osd_op_reply(1169 foo.buckets [omap-rm-keys 0~13] ack = 0) v4 ==== 110+0+0 (2293323495 0 0) 0x7f95c8001310 con 0x136cc10
2013-11-25 18:42:09.096286 7f9597fb7700 20 get_obj_state: rctx=0x7f95380038b0 obj=.region0.r0z1.domain_root:mybar state=0x7f95380099c8 s->prefetch_data=0
2013-11-25 18:42:09.096301 7f9597fb7700 10 moving .region0.r0z1.domain_root+mybar to cache LRU end
2013-11-25 18:42:09.096306 7f9597fb7700 10 cache get: name=.region0.r0z1.domain_root+mybar : hit
2013-11-25 18:42:09.096315 7f9597fb7700 20 get_obj_state: s->obj_tag was set empty
2013-11-25 18:42:09.096320 7f9597fb7700 20 Read xattr: user.rgw.idtag
2013-11-25 18:42:09.096323 7f9597fb7700 20 Read xattr: user.rgw.manifest
2013-11-25 18:42:09.096333 7f9597fb7700 10 moving .region0.r0z1.domain_root+mybar to cache LRU end
2013-11-25 18:42:09.096337 7f9597fb7700 10 cache get: name=.region0.r0z1.domain_root+mybar : hit
2013-11-25 18:42:09.096469 7f9597fb7700  1 -- 10.214.132.136:0/1012510 --> 10.214.132.136:6804/11369 -- osd_op(client.4140.0:1170 mybar [call version.check_conds,call version.set,writefull 0~122] 11.e29e365e e38) v4 -- ?+0 0x7f953800d7f0 con 0x136cc10
2013-11-25 18:42:09.099146 7f95f3fff700  1 -- 10.214.132.136:0/1012510 <== osd.1 10.214.132.136:6804/11369 984 ==== osd_op_reply(1170 mybar [call,call,writefull 0~122] ack = 0) v4 ==== 188+0+0 (2052653044 0 0) 0x7f95c8001310 con 0x136cc10
2013-11-25 18:42:09.099231 7f9597fb7700 10 cache put: name=.region0.r0z1.domain_root+mybar
2013-11-25 18:42:09.099244 7f9597fb7700 10 moving .region0.r0z1.domain_root+mybar to cache LRU end
2013-11-25 18:42:09.099250 7f9597fb7700 10 updating xattr: name=user.rgw.idtag bl.length()=0
2013-11-25 18:42:09.099254 7f9597fb7700 10 updating xattr: name=user.rgw.manifest bl.length()=0
2013-11-25 18:42:09.099281 7f9597fb7700 10 distributing notification oid=notify.5 bl.length()=447
2013-11-25 18:42:09.099321 7f9597fb7700  1 -- 10.214.132.136:0/1012510 --> 10.214.132.136:6800/11357 -- osd_op(client.4140.0:1171 notify.5 [notify 27~0] 10.31099063 e38) v4 -- ?+0 0x7f953800f010 con 0x1366d50
2013-11-25 18:42:09.099940 7f95f3fff700  1 -- 10.214.132.136:0/1012510 <== osd.0 10.214.132.136:6800/11357 222 ==== watch-notify(c=6 v=1 i=163208757260 opcode=1) v1 ==== 477+0+0 (83342436 0 0) 0x7f95d0001830 con 0x1366d50
2013-11-25 18:42:09.099979 7f95f17fa700 10 RGWWatcher::notify() opcode=1 ver=1 bl.length()=447
2013-11-25 18:42:09.100005 7f95f17fa700 10 cache put: name=.region0.r0z1.domain_root+mybar
2013-11-25 18:42:09.100011 7f95f17fa700 10 moving .region0.r0z1.domain_root+mybar to cache LRU end
2013-11-25 18:42:09.100016 7f95f17fa700 10 updating xattr: name=user.rgw.idtag bl.length()=0
2013-11-25 18:42:09.100019 7f95f17fa700 10 updating xattr: name=user.rgw.manifest bl.length()=0
2013-11-25 18:42:09.100026 7f95f3fff700  1 -- 10.214.132.136:0/1012510 <== osd.0 10.214.132.136:6800/11357 223 ==== osd_op_reply(1171 notify.5 [notify 27~0] ondisk = 0) v4 ==== 107+0+0 (2410619921 0 0) 0x7f95d0001830 con 0x1366d50
2013-11-25 18:42:09.100059 7f95f17fa700  1 -- 10.214.132.136:0/1012510 --> 10.214.132.136:6800/11357 -- osd_op(client.4140.0:1172 notify.5 [notify-ack 163208757260~1] 10.31099063 e38) v4 -- ?+0 0x7f9588005cf0 con 0x1366d50
2013-11-25 18:42:09.100688 7f95f3fff700  1 -- 10.214.132.136:0/1012510 <== osd.0 10.214.132.136:6800/11357 224 ==== watch-notify(c=27 v=1 i=163208757260 opcode=1) v1 ==== 477+0+0 (3658847352 0 0) 0x7f95d0001530 con 0x1366d50
2013-11-25 18:42:09.100761 7f95f17fa700  1 -- 10.214.132.136:0/1012510 --> 10.214.132.136:6800/11357 -- osd_op(client.4140.0:1173 notify.5 [notify-ack 163208757260~1] 10.31099063 e38) v4 -- ?+0 0x7f9588005860 con 0x1366d50
2013-11-25 18:42:09.100777 7f95f3fff700  1 -- 10.214.132.136:0/1012510 <== osd.0 10.214.132.136:6800/11357 225 ==== osd_op_reply(1172 notify.5 [notify-ack 163208757260~1] ondisk = 0) v4 ==== 107+0+0 (2124693930 0 0) 0x7f95d0000cd0 con 0x1366d50
2013-11-25 18:42:09.100860 7f9597fb7700 10 removing .region0.r0z1.domain_root+mybar from cache
2013-11-25 18:42:09.100880 7f9597fb7700 10 distributing notification oid=notify.5 bl.length()=254
2013-11-25 18:42:09.100911 7f9597fb7700  1 -- 10.214.132.136:0/1012510 --> 10.214.132.136:6800/11357 -- osd_op(client.4140.0:1174 notify.5 [notify 28~0] 10.31099063 e38) v4 -- ?+0 0x7f953800b010 con 0x1366d50
2013-11-25 18:42:09.101420 7f95f3fff700  1 -- 10.214.132.136:0/1012510 <== osd.0 10.214.132.136:6800/11357 226 ==== osd_op_reply(1173 notify.5 [notify-ack 163208757260~1] ondisk = 0) v4 ==== 107+0+0 (2124693930 0 0) 0x7f95d0000a30 con 0x1366d50
2013-11-25 18:42:09.101604 7f95f3fff700  1 -- 10.214.132.136:0/1012510 <== osd.0 10.214.132.136:6800/11357 227 ==== watch-notify(c=6 v=1 i=163208757261 opcode=1) v1 ==== 284+0+0 (4181902742 0 0) 0x7f95d0000a30 con 0x1366d50
2013-11-25 18:42:09.101646 7f95f17fa700 10 RGWWatcher::notify() opcode=1 ver=1 bl.length()=254
2013-11-25 18:42:09.101680 7f95f3fff700  1 -- 10.214.132.136:0/1012510 <== osd.0 10.214.132.136:6800/11357 228 ==== osd_op_reply(1174 notify.5 [notify 28~0] ondisk = 0) v4 ==== 107+0+0 (2503451170 0 0) 0x7f95d00014c0 con 0x1366d50
2013-11-25 18:42:09.101702 7f95f17fa700  1 -- 10.214.132.136:0/1012510 --> 10.214.132.136:6800/11357 -- osd_op(client.4140.0:1175 notify.5 [notify-ack 163208757261~1] 10.31099063 e38) v4 -- ?+0 0x7f9588005860 con 0x1366d50
2013-11-25 18:42:09.102431 7f95f3fff700  1 -- 10.214.132.136:0/1012510 <== osd.0 10.214.132.136:6800/11357 229 ==== watch-notify(c=28 v=1 i=163208757261 opcode=1) v1 ==== 284+0+0 (3377686476 0 0) 0x7f95d0000a30 con 0x1366d50
2013-11-25 18:42:09.102498 7f95f17fa700  1 -- 10.214.132.136:0/1012510 --> 10.214.132.136:6800/11357 -- osd_op(client.4140.0:1176 notify.5 [notify-ack 163208757261~1] 10.31099063 e38) v4 -- ?+0 0x7f9588005c70 con 0x1366d50
2013-11-25 18:42:09.102528 7f95f3fff700  1 -- 10.214.132.136:0/1012510 <== osd.0 10.214.132.136:6800/11357 230 ==== osd_op_reply(1175 notify.5 [notify-ack 163208757261~1] ondisk = 0) v4 ==== 107+0+0 (2741261441 0 0) 0x7f95d00016e0 con 0x1366d50
2013-11-25 18:42:09.102655 7f9597fb7700  1 -- 10.214.132.136:0/1012510 --> 10.214.132.136:6804/11369 -- osd_op(client.4140.0:1177 mybar [call version.check_conds,call version.set,call refcount.put] 11.e29e365e e38) v4 -- ?+0 0x7f953800f5f0 con 0x136cc10
2013-11-25 18:42:09.103144 7f95f3fff700  1 -- 10.214.132.136:0/1012510 <== osd.0 10.214.132.136:6800/11357 231 ==== osd_op_reply(1176 notify.5 [notify-ack 163208757261~1] ondisk = 0) v4 ==== 107+0+0 (2741261441 0 0) 0x7f95d0000ab0 con 0x1366d50
2013-11-25 18:42:09.103543 7f95f3fff700  1 -- 10.214.132.136:0/1012510 <== osd.1 10.214.132.136:6804/11369 985 ==== osd_op_reply(1177 mybar [call,call,call] ondisk = -125 (Operation canceled)) v4 ==== 188+0+0 (110618145 0 0) 0x7f95c8001310 con 0x136cc10
2013-11-25 18:42:09.103647 7f9597fb7700  2 req 962:0.011583::DELETE /admin/metadata/bucket:remove_metadata:http status=200
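Note the end of the sequence: op 1170 (version.check_conds, version.set, writefull — the unlink's rewrite of the bucket entrypoint) succeeds, while op 1177 (version.check_conds, version.set, refcount.put — the guarded delete) comes back ondisk = -125 (Operation canceled), yet the request still finishes with http status=200.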

This is reproducible with teuthology's wip-rgw-sync branch and the following job:

interactive-on-error: true
roles:
- [mon.a, osd.0, osd.1, client.0, client.1]
tasks:
- chef:
- install:
    branch: dumpling
- ceph:
    conf:
      client:
        debug ms: 1
        debug rgw: 30
        rgw gc obj min wait: 15
        rgw data log window: 1
      osd:
        debug ms: 1
        debug objclass: 20
      client.0:
        rgw region: region0
        rgw zone: r0z0
        rgw region root pool: .rgw.region.0
        rgw zone root pool: .rgw.zone.0
        rgw gc pool: .rgw.gc.0
        rgw user uid pool: .users.uid.0
        rgw user keys pool: .users.0
        rgw log data: True
        rgw log meta: True
      client.1:
        rgw region: region0
        rgw zone: r0z1
        rgw region root pool: .rgw.region.0
        rgw zone root pool: .rgw.zone.1
        rgw gc pool: .rgw.gc.1
        rgw user uid pool: .users.uid.1
        rgw user keys pool: .users.1
        rgw log data: False
        rgw log meta: False
- rgw:
    regions:
      region0:
        api name: api1
        is master: True
        master zone: r0z0
        zones: [r0z0, r0z1]
    client.0:
      system user:
        name: client0-system-user
        access key: 0te6NH5mcdcq0Tc5i8i2
        secret key: Oy4IOauQoL18Gp2zM7lC1vLmoawgqcYPbYGcWfXv
    client.1:
      system user:
        name: client1-system-user
        access key: 1te6NH5mcdcq0Tc5i8i3
        secret key: Py4IOauQoL18Gp2zM7lC1vLmoawgqcYPbYGcWfXw
- radosgw-agent:
    client.0:
      src: client.0
      dest: client.1
- radosgw-admin:

The full logs can be found on mira021.

#1

Updated by Josh Durgin over 10 years ago

It looks like this is happening because the admin metadata API reads the version of the bucket entrypoint object, then calls rgw_unlink_bucket(), which rewrites the bucket object and increments its version, and finally attempts the delete while still checking for the originally read version, so the guarded delete fails with ECANCELED.
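In other words, the flow looks roughly like this (a minimal sketch with hypothetical names, not the actual rgw code; only rgw_unlink_bucket() is the real function named above):

    # Illustrative sketch of the ordering bug described above.
    def remove_bucket_metadata(store, bucket):
        entry = store.read_bucket_entrypoint(bucket)    # reads version tag, call it v1
        rgw_unlink_bucket(store, entry.owner, bucket)   # rewrites the entrypoint -> tag becomes v2
        # The delete is still conditioned on the stale tag read above, so the
        # OSD-side version.check_conds fails with ECANCELED (-125)...
        store.delete_bucket_entrypoint(bucket, if_version=entry.version)
        # ...but the error is not propagated, and the client sees HTTP 200.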

#2

Updated by Ian Colle over 10 years ago

  • Assignee set to Yehuda Sadeh
#3

Updated by Yehuda Sadeh over 10 years ago

This is issue #6056, which was fixed post-dumpling. commit:0373d749cea7d9b532069ba8ebca2f005b2c9f59 fixes it; we'll need to cherry-pick it into dumpling.

#4

Updated by Josh Durgin over 10 years ago

  • Status changed from New to Resolved

Cherry-picked into dumpling in commit:8cd33e3a8ebf7c2aa796ec9f92d6b554c39ff705.
