Bug #18300
leak from RGWMetaSyncShardCR::incremental_sync
% Done:
0%
Source:
Tags:
Backport:
jewel, kraken
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
<kind>Leak_DefinitelyLost</kind>
<xwhat>
  <text>448 bytes in 2 blocks are definitely lost in loss record 72 of 80</text>
  <leakedbytes>448</leakedbytes>
  <leakedblocks>2</leakedblocks>
</xwhat>
<stack>
  <frame>
    <ip>0x98AD105</ip>
    <obj>/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so</obj>
    <fn>operator new(unsigned long)</fn>
  </frame>
  <frame>
    <ip>0x3D2DD2</ip>
    <obj>/usr/bin/radosgw</obj>
    <fn>RGWCoroutinesManager::allocate_stack()</fn>
    <dir>/usr/src/debug/ceph-11.1.0-6040-g478551e/src/rgw</dir>
    <file>rgw_coroutine.cc</file>
    <line>664</line>
  </frame>
  <frame>
    <ip>0x3D5C3F</ip>
    <obj>/usr/bin/radosgw</obj>
    <fn>RGWCoroutinesStack::spawn(RGWCoroutine*, RGWCoroutine*, bool)</fn>
    <dir>/usr/src/debug/ceph-11.1.0-6040-g478551e/src/rgw</dir>
    <file>rgw_coroutine.cc</file>
    <line>249</line>
  </frame>
  <frame>
    <ip>0x5810FB</ip>
    <obj>/usr/bin/radosgw</obj>
    <fn>RGWMetaSyncShardCR::incremental_sync()</fn>
    <dir>/usr/src/debug/ceph-11.1.0-6040-g478551e/src/rgw</dir>
    <file>rgw_sync.cc</file>
    <line>1593</line>
  </frame>
  <frame>
    <ip>0x5818B3</ip>
    <obj>/usr/bin/radosgw</obj>
    <fn>RGWMetaSyncShardCR::operate()</fn>
    <dir>/usr/src/debug/ceph-11.1.0-6040-g478551e/src/rgw</dir>
    <file>rgw_sync.cc</file>
    <line>1317</line>
  </frame>
  <frame>
    <ip>0x3D202D</ip>
    <obj>/usr/bin/radosgw</obj>
    <fn>RGWCoroutinesStack::operate(RGWCoroutinesEnv*)</fn>
    <dir>/usr/src/debug/ceph-11.1.0-6040-g478551e/src/rgw</dir>
    <file>rgw_coroutine.cc</file>
    <line>197</line>
  </frame>
  <frame>
    <ip>0x3D4AFA</ip>
    <obj>/usr/bin/radosgw</obj>
    <fn>RGWCoroutinesManager::run(std::list<RGWCoroutinesStack*, std::allocator<RGWCoroutinesStack*> >&)</fn>
    <dir>/usr/src/debug/ceph-11.1.0-6040-g478551e/src/rgw</dir>
    <file>rgw_coroutine.cc</file>
    <line>489</line>
  </frame>
  <frame>
    <ip>0x3D58AF</ip>
    <obj>/usr/bin/radosgw</obj>
    <fn>RGWCoroutinesManager::run(RGWCoroutine*)</fn>
    <dir>/usr/src/debug/ceph-11.1.0-6040-g478551e/src/rgw</dir>
    <file>rgw_coroutine.cc</file>
    <line>628</line>
  </frame>
  <frame>
    <ip>0x571CEA</ip>
    <obj>/usr/bin/radosgw</obj>
    <fn>RGWRemoteMetaLog::run_sync()</fn>
    <dir>/usr/src/debug/ceph-11.1.0-6040-g478551e/src/rgw</dir>
    <file>rgw_sync.cc</file>
    <line>1997</line>
  </frame>
</stack>
/a/sage-2016-12-16_16:52:37-rgw-master---basic-smithi/639993
Related issues
History
#1 Updated by Casey Bodley over 7 years ago
- Assignee set to Casey Bodley
#2 Updated by Casey Bodley over 7 years ago
Looking at http://qa-proxy.ceph.com/teuthology/sage-2016-12-16_16:52:37-rgw-master---basic-smithi/639993/remote/smithi007/log/rgw.client.1.log.gz, I see that incremental sync was still running when radosgw was shut down:
2016-12-16 20:13:58.946062 34a4b700 20 meta sync: incremental_sync:1605: shard_id=59 mdlog_marker=1_1481918963.277466_1349.1 max_marker=1_1481918963.277466_1349.1 sync_marker.marker=1_1481918963.277466_1349.1 period_marker=
2016-12-16 20:13:58.946127 34a4b700 20 run: stack=0x7555d160 is io blocked
2016-12-16 20:13:59.320313 3c844700 -1 received signal: Terminated from PID: 7322 task name: /usr/bin/python /bin/daemon-helper term valgrind --trace-children=no --child-silent-after-fork=yes --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/client.1.log --time-stamp=yes --tool=memcheck radosgw --rgw-frontends civetweb port=7281 --rgw-zone r1z1 -n client.1 -k /etc/ceph/ceph.client.1.keyring --log-file /var/log/ceph/rgw.client.1.log --rgw_ops_log_socket_path /home/ubuntu/cephtest/rgw.opslog.client.1.sock --foreground UID: 0
2016-12-16 20:13:59.322571 3c844700 1 handle_sigterm
2016-12-16 20:13:59.323874 8cc4980 -1 shutting down
2016-12-16 20:13:59.328940 3c844700 1 handle_sigterm set alarm for 120
2016-12-16 20:13:59.544463 34a4b700 0 ERROR: failed to clone shard, completion_mgr.get_next() returned ret=-125
2016-12-16 20:13:59.545388 34a4b700 5 run(): was stopped, exiting
This cancels the RGWMetaSyncShardCR before it can call collect_children() to drop its references to the stacks held in its stack_to_pos map. RGWMetaSyncShardCR's destructor will need to make sure all of those references are dropped.
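The leak pattern and the proposed fix can be sketched with a minimal model. Note this is illustrative only: `Stack`, `ShardCR`, `spawn()`, and the refcounting here are simplified stand-ins for RGWCoroutinesStack/RGWMetaSyncShardCR and are not the actual Ceph interfaces.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy refcounted stand-in for RGWCoroutinesStack (illustrative only).
struct Stack {
  static int live;          // instrumentation: count of live stacks
  Stack() { ++live; }
  ~Stack() { --live; }
  int nref = 1;
  void get() { ++nref; }
  bool put() { return --nref == 0; }  // caller deletes when true
};
int Stack::live = 0;

// Sketch of the fix: the shard coroutine keeps a ref on each spawned
// stack in a marker map (like stack_to_pos). On the normal path,
// collect_children() drops those refs as children complete. If the
// coroutine is cancelled first (e.g. on shutdown), the destructor must
// drop any refs still held, or the stacks leak.
struct ShardCR {
  std::map<Stack*, std::string> stack_to_pos;

  void spawn(Stack* s, const std::string& pos) {
    s->get();                        // hold a ref until the child completes
    stack_to_pos[s] = pos;
  }

  void collect_children() {          // normal path: release completed children
    for (auto& p : stack_to_pos)
      if (p.first->put())
        delete p.first;
    stack_to_pos.clear();
  }

  ~ShardCR() {                       // shutdown path: drop remaining refs
    for (auto& p : stack_to_pos)
      if (p.first->put())
        delete p.first;
  }
};
```

With the destructor cleanup in place, destroying the coroutine mid-sync (before collect_children() ever runs) no longer strands the spawned stacks, which is exactly the shutdown ordering seen in the log above.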
#3 Updated by Casey Bodley over 7 years ago
- Status changed from New to Fix Under Review
#4 Updated by Casey Bodley over 7 years ago
- Backport set to jewel kraken
#5 Updated by Yehuda Sadeh about 7 years ago
- Status changed from Fix Under Review to Pending Backport
#6 Updated by Casey Bodley about 7 years ago
backported to kraken in https://github.com/ceph/ceph/pull/12949
#7 Updated by Nathan Cutler about 7 years ago
- Copied to Backport #18563: jewel: leak from RGWMetaSyncShardCR::incremental_sync added
#8 Updated by Nathan Cutler about 7 years ago
- Copied to Backport #18564: kraken: leak from RGWMetaSyncShardCR::incremental_sync added
#9 Updated by Nathan Cutler about 7 years ago
- Backport changed from jewel kraken to jewel, kraken
#10 Updated by Sage Weil about 7 years ago
- Priority changed from Immediate to Urgent
#11 Updated by Casey Bodley about 7 years ago
Both backports are resolved; can we close this one?
#12 Updated by Nathan Cutler about 7 years ago
- Status changed from Pending Backport to Resolved