Bug #19446

closed

rgw: heavy memory leak when multisite sync fail on 10.2.6

Added by Haroboro Ha about 7 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: High
Assignee:
Target version: -
% Done: 0%
Source:
Tags:
Backport: jewel, kraken
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

A heavy memory leak was found when site sync failed in an active-active multisite cluster (2 sites). Within about 2 hours, it consumes about 60 GB of memory.

Each time the following log messages occurred, the radosgw RSS increased further.

2017-04-02 22:56:10.621369 7fa580fe1700 0 ERROR: a sync operation returned error
2017-04-02 22:56:14.772360 7fa580fe1700 0 ERROR: failure in sync, backing out (sync_status=-5)
2017-04-02 22:56:14.782136 7fa580fe1700 0 ERROR: failed to log sync failure in error repo: retcode=0
2017-04-02 22:56:14.782207 7fa580fe1700 0 rgw meta sync: ERROR: RGWBackoffControlCR called coroutine returned 5
2017-04-02 22:57:53.071968 7fa580fe1700 0 ERROR: failed to read remote data log info: ret=-5
2017-04-02 22:57:53.165198 7fa580fe1700 0 rgw meta sync: ERROR: RGWBackoffControlCR called coroutine returned -5
2017-04-02 22:58:44.120891 7fa580fe1700 0 ERROR: failed to read remote data log info: ret=-5
2017-04-02 22:58:44.124339 7fa580fe1700 0 rgw meta sync: ERROR: RGWBackoffControlCR called coroutine returned -5
2017-04-02 22:59:25.327494 7fa580fe1700 0 ERROR: failed to read remote data log info: ret=-5
2017-04-02 22:59:25.406766 7fa580fe1700 0 rgw meta sync: ERROR: RGWBackoffControlCR called coroutine returned -5
2017-04-02 23:00:35.433212 7fa580fe1700 0 ERROR: failed to fetch remote data log info: ret=-5
2017-04-02 23:00:35.442030 7fa580fe1700 0 rgw meta sync: ERROR: RGWBackoffControlCR called coroutine returned -5
2017-04-02 23:00:55.583137 7fa580fe1700 0 ERROR: failed to read remote data log info: ret=-5
2017-04-02 23:00:55.587294 7fa580fe1700 0 rgw meta sync: ERROR: RGWBackoffControlCR called coroutine returned -5
2017-04-02 23:02:12.050433 7fa58bff7700 0 store->fetch_remote_obj() returned r=-2
2017-04-02 23:02:12.050548 7fa580fe1700 0 ERROR: a sync operation returned error
2017-04-02 23:02:12.052318 7fa584fe9700 0 store->fetch_remote_obj() returned r=-2
2017-04-02 23:02:12.052447 7fa580fe1700 0 ERROR: a sync operation returned error
2017-04-02 23:02:12.054184 7fa5977fe700 0 store->fetch_remote_obj() returned r=-2
2017-04-02 23:02:12.054292 7fa580fe1700 0 ERROR: a sync operation returned error
2017-04-02 23:02:12.056197 7fa58cff9700 0 store->fetch_remote_obj() returned r=-2
2017-04-02 23:02:12.056307 7fa580fe1700 0 ERROR: a sync operation returned error
2017-04-02 23:02:12.064099 7fa5897f2700 0 store->fetch_remote_obj() returned r=-2
2017-04-02 23:02:12.064239 7fa580fe1700 0 ERROR: a sync operation returned error
2017-04-02 23:02:13.126223 7fa580fe1700 0 ERROR: failure in sync, backing out (sync_status=-2)
2017-04-02 23:02:13.233178 7fa580fe1700 0 WARNING: skipping data log entry for missing bucket aspen:1d0e03f4-f7fc-4ee6-a956-b66483526e3d.4741.4
2017-04-02 23:02:27.769720 7fa5887f0700 0 store->fetch_remote_obj() returned r=-2
2017-04-02 23:02:27.772560 7fa58ffff700 0 store->fetch_remote_obj() returned r=-2
2017-04-02 23:02:27.788009 7fa580fe1700 0 ERROR: a sync operation returned error
2017-04-02 23:02:27.788023 7fa580fe1700 0 ERROR: a sync operation returned error
2017-04-02 23:02:27.963422 7fa580fe1700 0 ERROR: failure in sync, backing out (sync_status=-2)
2017-04-02 23:02:28.093612 7fa580fe1700 0 WARNING: skipping data log entry for missing bucket aspen:1d0e03f4-f7fc-4ee6-a956-b66483526e3d.4741.4
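
To line these messages up against RSS growth, it may help to tally them per minute. The following is only a minimal sketch, assuming each log line has the shape shown above (timestamp, thread id, level, message); the function name `count_sync_errors` and the filter on `ERROR`, `WARNING` and `returned r=-` are illustrative choices, not part of radosgw or any Ceph tool.

```python
import re
from collections import Counter

# Timestamp (kept to minute precision), thread id, debug level, message text.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\.\d+ ([0-9a-f]+) +(-?\d+) (.*)$"
)

def count_sync_errors(lines):
    """Return a Counter keyed by (minute, message) for error-like log lines."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip wrapped or unrelated lines
        minute, _thread, _level, msg = m.groups()
        # Hypothetical filter: keep only the message kinds seen in this report.
        if "ERROR" in msg or "WARNING" in msg or "returned r=-" in msg:
            counts[(minute, msg)] += 1
    return counts
```

Feeding it the lines of a log file (for example `count_sync_errors(open(path))`, with `path` pointing at the radosgw log) gives per-minute counts that can be compared with RSS samples taken over the same period.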


Files

rgw-error-warn.log (842 KB) - ERROR|WARN in ceph-client.rgw.ceph-1.log - Gabriel Wicke, 04/15/2017 05:41 PM
debug_rgw_deletes.log.xz (39.3 KB) - radosgw log in swift-bench delete phase - Gabriel Wicke, 04/17/2017 09:12 PM
debug_rgw_post_bucket_drop.log.xz (88.8 KB) - radosgw log after deletion of previous bucket, with swift-bench pushing new puts for new bucket - Gabriel Wicke, 04/17/2017 09:13 PM
debug_rgw_puts.log.xz (55.6 KB) - radosgw log in swift-bench put phase - Gabriel Wicke, 04/17/2017 09:13 PM
debug_20_rgw_deletes.log.xz (333 KB) - 10 second snapshot of debug 20 log during deletions - Gabriel Wicke, 04/17/2017 09:50 PM
debug_20_rgw_post_bucket_delete_puts.log.xz (475 KB) - 10 second snapshot of debug 20 log during puts, after previous bucket was deleted - Gabriel Wicke, 04/17/2017 09:50 PM
massif-3.txt (461 KB) - Gabriel Wicke, 04/20/2017 03:07 AM

Related issues 1 (0 open, 1 closed)

Related to rgw - Bug #19861: multisite: memory leak on failed lease in RGWDataSyncShardCR (Resolved, 05/04/2017)
