Bug #10102

closed

sync agent: does not handle gracefully transient errors

Added by Yehuda Sadeh over 9 years ago. Updated about 9 years ago.

Status: Resolved
Priority: High
Assignee:
Target version: -
% Done: 100%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

On a copy operation, rgw sent back a 500 and the sync agent got stuck in the following loop:

2014-11-12T18:12:46.315 18110:DEBUG:urllib3.connectionpool:"PUT /Backups/%3CKey%3A%20Backups%2CCBB_DC1/some_path/some_file%3E?rgwx-op-id=LA3CGW01-22373A13A37606&rgwx-source-zone=la-primary&rgwx-client-id=radosgw-agent HTTP/1.1" 500 527
2014-11-12T18:12:46.317 18110:DEBUG:radosgw_agent.worker:exception during sync: Http error code 500 content <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or
misconfiguration and was unable to complete
your request.</p>
<p>Please contact the server administrator at 
 root@localhost to inform them of the time this error occurred,
 and the actions you performed just before this error.</p>
<p>More information about this error may be available
in the server error log.</p>
</body></html>
2014-11-12T18:12:48.616 18110:DEBUG:radosgw_agent.worker:op state is []
2014-11-12T18:12:48.617 18110:DEBUG:radosgw_agent.worker:error geting op state: list index out of range
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/radosgw_agent/worker.py", line 220, in wait_for_object
    state = state[0]['state']
IndexError: list index out of range

The sync agent needs to back off and retry in such cases.
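
A minimal sketch of the requested behaviour, not the actual radosgw-agent fix. The only detail taken from the traceback above is that the op state arrives as a list that can be empty; `extract_state`, `wait_for_op_state`, `get_op_state`, and the retry parameters are hypothetical names chosen for illustration.

```python
import time


class OpStateUnavailable(Exception):
    """Raised when no op state could be read after all attempts."""


def extract_state(op_states):
    """Return the 'state' of the first entry, or None if the list is empty.

    This avoids the IndexError raised by indexing op_states[0] directly.
    """
    if not op_states:
        return None
    return op_states[0].get('state')


def wait_for_op_state(get_op_state, retries=5, base_delay=1.0, sleep=time.sleep):
    """Poll get_op_state() until it yields a state, backing off between tries.

    get_op_state is a hypothetical callable returning the op state list.
    The delay doubles after each empty result (1s, 2s, 4s, ...).
    """
    for attempt in range(retries):
        state = extract_state(get_op_state())
        if state is not None:
            return state
        if attempt < retries - 1:
            sleep(base_delay * (2 ** attempt))
    raise OpStateUnavailable('no op state after %d attempts' % retries)
```

With this shape, an empty op state list is treated as "not ready yet" rather than as an unexpected exception, and the caller gets one clear error once the attempts are exhausted.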


Subtasks 1 (0 open, 1 closed)

Bug #10099: radosgw-agent - error geting op state: list index out of range (Duplicate, 11/13/2014)

Actions #1

Updated by Alfredo Deza over 9 years ago

  • Description updated (diff)
  • Assignee set to Alfredo Deza
Actions #2

Updated by Alfredo Deza over 9 years ago

  • Status changed from New to In Progress
Actions #3

Updated by Alfredo Deza over 9 years ago

  • Description updated (diff)
Actions #4

Updated by Alfredo Deza over 9 years ago

Updated the description: the RGW is not returning a 400 but a 500. The agent should probably be updated to understand
which status codes warrant retrying the operation and which should make it give up with an error.
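
One way to express that split, as a rough sketch only. The set of retryable codes below is an assumed policy, not what the agent actually shipped, and `sync_with_retry`, `do_request`, and `SyncFailed` are hypothetical names.

```python
import time

# Assumption: server-side failures are treated as transient and retried;
# everything else (for example 4xx client errors) is treated as fatal.
RETRYABLE_STATUS = frozenset([500, 502, 503, 504])


class SyncFailed(Exception):
    """Raised when a sync request fails and should not be retried further."""

    def __init__(self, status, body):
        super().__init__('sync failed with HTTP %d' % status)
        self.status = status
        self.body = body


def should_retry(status):
    """Return True if the HTTP status is in the assumed transient set."""
    return status in RETRYABLE_STATUS


def sync_with_retry(do_request, max_attempts=3, delay=2.0, sleep=time.sleep):
    """Call do_request() until it succeeds or the error is not retryable.

    do_request is a hypothetical callable returning (status, body).
    """
    for attempt in range(1, max_attempts + 1):
        status, body = do_request()
        if 200 <= status < 300:
            return body
        if not should_retry(status) or attempt == max_attempts:
            raise SyncFailed(status, body)
        sleep(delay)
```

Under this policy the 500 from the log would be retried up to `max_attempts` times, while a 400 would fail immediately.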

Actions #5

Updated by Alfredo Deza over 9 years ago

  • Status changed from In Progress to Fix Under Review
Actions #6

Updated by Josh Durgin over 9 years ago

  • Status changed from Fix Under Review to Resolved
Actions #7

Updated by Ken Dreyer about 9 years ago

Just a note that this fix shipped in the 1.2 release. https://github.com/ceph/radosgw-agent/blob/master/CHANGELOG.rst#12
