Bug #17698

multisite: ECANCELED & 500 error on bucket delete

Added by Abhishek Lekshmanan 5 months ago. Updated 4 months ago.

Status: Pending Backport
Priority: High
Assignee: -
Target version: -
Start date: 10/25/2016
Due date:
% Done: 0%
Source:
Tags:
Backport: jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Needs Doc: No

Description

Sometimes, when a bucket is deleted on the secondary, the secondary returns HTTP 500 (caused by re-raising ECANCELED from the OSDs). However, the bucket in question is actually deleted, and subsequent client requests fail with a 404.

2016-10-25 13:59:21.999214 7f7a48417700  1 -- 127.0.0.1:0/1778501402 <== osd.0 127.0.0.1:6812/27412 2962 ==== osd_op_reply(6705 bucket26 [call,call,delete] v0'0 uv0 ondisk = -125 ((125) Operation canceled)) v7 ==== 212+0+0 (1575458175 0 0) 0x7f7a20013ba0 con 0x564acad41190
2016-10-25 13:59:21.999244 7f7920ff1700  2 req 1192:0.312583:s3:DELETE /bucket26/:delete_bucket:completing
2016-10-25 13:59:21.999247 7f7920ff1700  0 WARNING: set_req_state_err err_no=125 resorting to 500
2016-10-25 13:59:21.999286 7f7920ff1700  2 req 1192:0.312625:s3:DELETE /bucket26/:delete_bucket:op status=-125
2016-10-25 13:59:21.999289 7f7920ff1700  2 req 1192:0.312628:s3:DELETE /bucket26/:delete_bucket:http status=500
2016-10-25 13:59:21.999291 7f7920ff1700  1 ====== req done req=0x7f7920fee7e0 op status=-125 http_status=500 ======
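The log shows the OSD returning -125 (ECANCELED) and RGW falling back to HTTP 500 because it has no specific mapping for that errno. A minimal Python sketch of that fallback, assuming a small hypothetical errno-to-HTTP table (the names ERRNO_TO_HTTP and http_status_for_error are illustrative, not RGW's):

```python
import errno

# Hypothetical subset of an errno -> HTTP status table, for illustration
# only; it is not RGW's actual table.
ERRNO_TO_HTTP = {
    errno.ENOENT: 404,
    errno.EACCES: 403,
    errno.EPERM: 403,
    errno.EEXIST: 409,
}


def http_status_for_error(op_ret: int) -> int:
    """Map a negative op status (e.g. -125) to an HTTP status code."""
    err_no = -op_ret
    status = ERRNO_TO_HTTP.get(err_no)
    if status is None:
        # No entry for this errno: fall back to a generic server error,
        # which is the "resorting to 500" path seen in the log.
        return 500
    return status


# ECANCELED (125 on Linux) has no entry, so it surfaces as a 500.
print(http_status_for_error(-errno.ECANCELED))  # 500
print(http_status_for_error(-errno.ENOENT))     # 404
```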

rgw-bucket26.log (108 KB), added by Abhishek Lekshmanan, 10/25/2016 04:35 PM


Related issues

Copied to Backport #17886: jewel: multisite: ECANCELED & 500 error on bucket delete Resolved

History

#1 Updated by Abhishek Lekshmanan 5 months ago

Adding the rgw log for the relevant bucket. It seems we're issuing a delete on the bucket object twice: once at
2016-10-25 13:59:21.931712 7f7a01ffb700 1 -- 127.0.0.1:0/1778501402 --> 127.0.0.1:6812/27412 -- osd_op(client.4111.0:6691 3.c8ef106b bucket26 [call version.check_conds,call version.set,delete] snapc 0=[] ondisk+write+known_if_redirected e29) v7 -- ?+0 0x7f79e0018850 con 0x564acad41190

(which is probably the mdlog/admin log?), and a second time when the secondary itself processes the delete.
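The op above bundles a version check with the delete ([call version.check_conds, call version.set, delete]), so when two paths delete the same bucket object, the one that arrives second can fail its condition. A toy Python model of that race, assuming a hypothetical in-memory store (ToyObjectStore and its methods are illustrative, not librados); treating an already-missing object as a failed condition is a simplification of this model:

```python
import errno
import threading


class ToyObjectStore:
    """Hypothetical in-memory stand-in for a RADOS pool (not librados).

    Maps object name -> version number.
    """

    def __init__(self, objects=None):
        self.objects = dict(objects or {})
        self._lock = threading.Lock()

    def read_version(self, name):
        with self._lock:
            return self.objects.get(name)

    def conditional_delete(self, name, expected_version):
        """Check the version and delete as one atomic step, loosely like
        the compound op [version.check_conds, version.set, delete].

        Returns 0 on success or -ECANCELED if the condition fails.
        """
        with self._lock:
            if name not in self.objects or self.objects[name] != expected_version:
                # Condition failed (simplified model): reject the whole op.
                return -errno.ECANCELED
            del self.objects[name]
            return 0


store = ToyObjectStore({"bucket26": 7})
# Both delete paths read the same version before either one writes.
v_sync = store.read_version("bucket26")    # e.g. the mdlog/admin-log path
v_local = store.read_version("bucket26")   # the secondary's own delete
first = store.conditional_delete("bucket26", v_sync)
second = store.conditional_delete("bucket26", v_local)
print(first, second == -errno.ECANCELED)  # 0 True
```

The first delete wins and removes the object; the second reports ECANCELED even though the end state (bucket gone) is exactly what its caller wanted.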

#2 Updated by Yehuda Sadeh 5 months ago

Sounds like the secondary is racing with the request forwarded from the master (which the secondary itself initiated). We should handle that ECANCELED gracefully.
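One way "handle that ECANCELED gracefully" could look, sketched in Python under stated assumptions: finish_delete_bucket and the bucket_still_exists callback are hypothetical, and this is not necessarily how the fix was implemented in RGW. If the delete lost the race and the bucket is confirmed gone, the request is reported as successful instead of surfacing a 500:

```python
import errno
from typing import Callable


def finish_delete_bucket(op_ret: int, bucket_still_exists: Callable[[], bool]) -> int:
    """Hypothetical post-processing of the bucket-object delete result.

    bucket_still_exists is an assumed callback that re-reads the bucket
    (e.g. its bucket info) and reports whether it is still present.
    """
    if op_ret == -errno.ECANCELED and not bucket_still_exists():
        # Lost the race to another delete of the same bucket object:
        # the bucket is gone, so report success instead of a 500.
        return 0
    # Any other result, including ECANCELED while the bucket still
    # exists, is passed through unchanged.
    return op_ret


print(finish_delete_bucket(-errno.ECANCELED, lambda: False))  # 0
print(finish_delete_bucket(-errno.ECANCELED, lambda: True) == -errno.ECANCELED)  # True
```

Confirming that the bucket is really gone, rather than mapping every ECANCELED to success, avoids hiding a genuine version conflict in which the bucket object was modified but not removed.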

#4 Updated by Yehuda Sadeh 5 months ago

  • Priority changed from Normal to High

#5 Updated by Casey Bodley 5 months ago

  • Status changed from New to Pending Backport

#6 Updated by Nathan Cutler 4 months ago

  • Source deleted (other)
  • Backport set to jewel

#7 Updated by Nathan Cutler 4 months ago

  • Copied to Backport #17886: jewel: multisite: ECANCELED & 500 error on bucket delete added
