Bug #21256

closed

multisite sync hang when network is bad

Added by rui xie over 6 years ago. Updated over 2 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor

Description

Sync does not complete for a very long time.

Stack:
#5 0x00007fb85c348589 in RGWRemoteMetaLog::run_sync (this=0x7fb84b852838) at rgw/rgw_sync.cc:1992
1992 rgw/rgw_sync.cc: No such file or directory.
(gdb) p *sync_env->http_manager
$1 = {cct = 0x7fb84b840140, completion_mgr = 0x7fb84b82d400, multi_handle = 0x7fb84b9a3780, is_threaded = true, going_down = {lock = {lock = 1}, val = 0}, is_stopped = {lock = {lock = 1},
val = 0}, reqs_lock = {L = {__data = {__lock = 0, __nr_readers = 0, __readers_wakeup = 1, __writer_wakeup = 12, __nr_readers_queued = 0, __nr_writers_queued = 0, __writer = 0,
__shared = 0, __pad1 = 0, __pad2 = 0, __flags = 0}, __size = "\000\000\000\000\000\000\000\000\001\000\000\000\f", '\000' <repeats 42 times>, __align = 0},
name = "RGWHTTPManager::reqs_lock", id = -1, nrlock = {lock = {lock = 1}, val = 0}, nwlock = {lock = {lock = 1}, val = 0}, track = true, lockdep = true}, reqs = std::map with 8 elements = {*[35074] = 0x7fb6f1285a00, [35327] = 0x7fb6f1284200*, [37058] = 0x7fb6f1199400, [37557] = 0x7fb6f119a200, [39006] = 0x7fb6f1199600, [39435] = 0x7fb6eff38e00, [41628] = 0x7fb6f1198c00,
[42993] = 0x7fb6f119bc00}, unregistered_reqs = empty std::list, complete_reqs = std::map with 0 elements, num_reqs = 43040, max_threaded_req = 43040, thread_pipe = {12, 13},
request_start = {tv = {tv_sec = 1504660306, tv_nsec = 780243685}}, reqs_thread = 0x7fb84b9afe20}
(gdb) p *(rgw_http_req_data *)0x7fb6f1285a00
$2 = {<RefCountedObject> = {_vptr.RefCountedObject = 0x7fb85cbddc70 <vtable for rgw_http_req_data+16>, nref = {lock = {lock = 1}, val = 2}, cct = 0x0}, easy_handle = 0x7fb82d7ac000,
h = 0x7fb6f0c138e0, id = 35074, url = "http://172.18.0.100:8080/admin/metadata/bucket/xx_292?key=xx_292&rgwx-zonegroup=66a561a6-9db2-4560-abdb-79c37e90bfc0", ret = 0, done = {lock = {
lock = 1}, val = 0}, client = 0x7fb6f0ce52e0, user_info = 0x7fb6f0c92fa0, registered = true, mgr = 0x7fb84b852920, error_buf = '\000' <repeats 255 times>, lock = {
name = "rgw_http_req_data::lock", id = -1, recursive = false, lockdep = true, backtrace = false, _m = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 2,
__spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 16 times>, "\002", '\000' <repeats 22 times>, __align = 0}, nlock = 0, locked_by = 0, cct = 0x0,
logger = 0x0}, cond = {_vptr.Cond = 0x7fb86607f190 <vtable for Cond+16>, _c = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0,
__nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}, waiter_mutex = 0x0}}
$2 = {<RefCountedObject> = {_vptr.RefCountedObject = 0x7fb85cbddc70 <vtable for rgw_http_req_data+16>, nref = {lock = {lock = 1}, val = 2}, cct = 0x0}, easy_handle = 0x7fb6f0c4f000,
h = 0x7fb6f1620350, id = 35327,
url = "http://172.18.0.100:8080/admin/log?type=metadata&id=1&period=cfaf60e3-77e4-4918-a354-b4781249404b&max-entries=100&marker=1_1504609047.480986_161400.1&rgwx-zonegroup=66a561a6-9db2-4560-abdb-79c37e90bfc"..., ret = 0, done = {lock = {lock = 1}, val = 0}, client = 0x7fb6f0ce48e0, user_info = 0x7fb82d4513a0, registered = true, mgr = 0x7fb84b852920,
error_buf = '\000' <repeats 255 times>, lock = {name = "rgw_http_req_data::lock", id = -1, recursive = false, lockdep = true, backtrace = false, _m = {
__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 2, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 16 times>, "\002", '\000' <repeats 22 times>, __align = 0},
nlock = 0, locked_by = 0, cct = 0x0, logger = 0x0}, cond = {_vptr.Cond = 0x7fb86607f190 <vtable for Cond+16>, _c = {
__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}, waiter_mutex = 0x0}}
(gdb)

Coroutines (CRs) waiting on requests like these block forever unless radosgw is restarted.
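In the dump, the two oldest registered requests (ids 35074 and 35327) still have done = 0 and ret = 0, so anything waiting on their completion never wakes up. The following is a minimal C++ sketch, not rgw code, of a completion wait bounded by a deadline instead of an unbounded wait. The class RequestCompletion and its methods are hypothetical; only the member names done, ret, lock and cond are borrowed from the dump.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for a request's completion state (not rgw code).
class RequestCompletion {
 public:
  // Called by the HTTP worker thread when the transfer finishes.
  void finish(int ret) {
    {
      std::lock_guard<std::mutex> l(lock_);
      assert(!done_ && "finish() called twice");
      ret_ = ret;
      done_ = true;
    }
    cond_.notify_all();
  }

  // Waits at most `timeout` for completion. Returns the request's result,
  // or -ETIMEDOUT if the request never completed, instead of blocking forever.
  int wait_for(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> l(lock_);
    if (!cond_.wait_for(l, timeout, [this] { return done_; })) {
      return -ETIMEDOUT;
    }
    return ret_;
  }

 private:
  std::mutex lock_;
  std::condition_variable cond_;
  bool done_ = false;
  int ret_ = 0;
};
```

A bounded wait only unblocks the waiter; the stalled transfer itself still has to be cancelled on the HTTP client side.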

This can be reproduced by degrading the network with tc (Linux traffic control).


Related issues: 1 (0 open, 1 closed)

Is duplicate of rgw - Bug #25019: multisite: curl client does not time out on sync requests (Resolved, 07/20/2018)
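Bug #25019 attributes the hang to the curl client never timing out on sync requests. One common way to bound a transfer that stalls on a bad link is a low-speed abort, as offered by libcurl's CURLOPT_LOW_SPEED_LIMIT and CURLOPT_LOW_SPEED_TIME options. The sketch below is a hypothetical, self-contained model of that rule (the struct LowSpeedWatchdog and its members are invented for illustration, and it is not claimed to be the fix applied for #25019): a transfer is aborted once its rate stays below a limit for a given number of consecutive seconds.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stall detector modeled on libcurl's low-speed semantics:
// abort when throughput stays below limit_bytes_per_sec for time_sec
// consecutive one-second intervals.
struct LowSpeedWatchdog {
  uint64_t limit_bytes_per_sec;  // minimum acceptable throughput
  uint64_t time_sec;             // how long throughput may stay below the limit
  uint64_t slow_seconds = 0;     // consecutive slow intervals seen so far

  // Feed the number of bytes received during the last one-second interval.
  // Returns true once the transfer should be aborted.
  bool on_interval(uint64_t bytes_this_second) {
    assert(time_sec > 0 && "time_sec must be positive");
    if (bytes_this_second >= limit_bytes_per_sec) {
      slow_seconds = 0;  // a healthy interval resets the slow timer
      return false;
    }
    ++slow_seconds;
    return slow_seconds >= time_sec;
  }
};
```

In libcurl itself the two options are set per easy handle with curl_easy_setopt, and a transfer aborted this way fails with CURLE_OPERATION_TIMEDOUT rather than hanging.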

#1

Updated by rui xie over 6 years ago

radosgw runs behind nginx with FastCGI.

#2

Updated by John Spray over 6 years ago

  • Project changed from Ceph to rgw
  • Category deleted (22)
#3

Updated by Casey Bodley over 2 years ago

  • Is duplicate of Bug #25019: multisite: curl client does not time out on sync requests added
#4

Updated by Casey Bodley over 2 years ago

  • Status changed from New to Duplicate