Bug #18300

leak from RGWMetaSyncShardCR::incremental_sync

Added by Sage Weil 9 months ago. Updated 6 months ago.

Status: Resolved
Priority: Urgent
Assignee:
Target version: -
Start date: 12/16/2016
Due date:
% Done: 0%
Source:
Tags:
Backport: jewel, kraken
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Needs Doc: No

Description

  <kind>Leak_DefinitelyLost</kind>
  <xwhat>
    <text>448 bytes in 2 blocks are definitely lost in loss record 72 of 80</text>
    <leakedbytes>448</leakedbytes>
    <leakedblocks>2</leakedblocks>
  </xwhat>
  <stack>
    <frame>
      <ip>0x98AD105</ip>
      <obj>/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so</obj>
      <fn>operator new(unsigned long)</fn>
    </frame>
    <frame>
      <ip>0x3D2DD2</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>RGWCoroutinesManager::allocate_stack()</fn>
      <dir>/usr/src/debug/ceph-11.1.0-6040-g478551e/src/rgw</dir>
      <file>rgw_coroutine.cc</file>
      <line>664</line>
    </frame>
    <frame>
      <ip>0x3D5C3F</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>RGWCoroutinesStack::spawn(RGWCoroutine*, RGWCoroutine*, bool)</fn>
      <dir>/usr/src/debug/ceph-11.1.0-6040-g478551e/src/rgw</dir>
      <file>rgw_coroutine.cc</file>
      <line>249</line>
    </frame>
    <frame>
      <ip>0x5810FB</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>RGWMetaSyncShardCR::incremental_sync()</fn>
      <dir>/usr/src/debug/ceph-11.1.0-6040-g478551e/src/rgw</dir>
      <file>rgw_sync.cc</file>
      <line>1593</line>
    </frame>
    <frame>
      <ip>0x5818B3</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>RGWMetaSyncShardCR::operate()</fn>
      <dir>/usr/src/debug/ceph-11.1.0-6040-g478551e/src/rgw</dir>
      <file>rgw_sync.cc</file>
      <line>1317</line>
    </frame>
    <frame>
      <ip>0x3D202D</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>RGWCoroutinesStack::operate(RGWCoroutinesEnv*)</fn>
      <dir>/usr/src/debug/ceph-11.1.0-6040-g478551e/src/rgw</dir>
      <file>rgw_coroutine.cc</file>
      <line>197</line>
    </frame>
    <frame>
      <ip>0x3D4AFA</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>RGWCoroutinesManager::run(std::list&lt;RGWCoroutinesStack*, std::allocator&lt;RGWCoroutinesStack*&gt; &gt;&amp;)</fn>
      <dir>/usr/src/debug/ceph-11.1.0-6040-g478551e/src/rgw</dir>
      <file>rgw_coroutine.cc</file>
      <line>489</line>
    </frame>
    <frame>
      <ip>0x3D58AF</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>RGWCoroutinesManager::run(RGWCoroutine*)</fn>
      <dir>/usr/src/debug/ceph-11.1.0-6040-g478551e/src/rgw</dir>
      <file>rgw_coroutine.cc</file>
      <line>628</line>
    </frame>
    <frame>
      <ip>0x571CEA</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>RGWRemoteMetaLog::run_sync()</fn>
      <dir>/usr/src/debug/ceph-11.1.0-6040-g478551e/src/rgw</dir>
      <file>rgw_sync.cc</file>
      <line>1997</line>
    </frame>

/a/sage-2016-12-16_16:52:37-rgw-master---basic-smithi/639993

Related issues

Copied to rgw - Backport #18563: jewel: leak from RGWMetaSyncShardCR::incremental_sync Resolved
Copied to rgw - Backport #18564: kraken: leak from RGWMetaSyncShardCR::incremental_sync Resolved

History

#1 Updated by Casey Bodley 9 months ago

  • Assignee set to Casey Bodley

#2 Updated by Casey Bodley 9 months ago

Looking at http://qa-proxy.ceph.com/teuthology/sage-2016-12-16_16:52:37-rgw-master---basic-smithi/639993/remote/smithi007/log/rgw.client.1.log.gz, I see that incremental sync was running when radosgw was shut down:

2016-12-16 20:13:58.946062 34a4b700 20 meta sync: incremental_sync:1605: shard_id=59 mdlog_marker=1_1481918963.277466_1349.1 max_marker=1_1481918963.277466_1349.1 sync_marker.marker=1_1481918963.277466_1349.1 period_marker=
2016-12-16 20:13:58.946127 34a4b700 20 run: stack=0x7555d160 is io blocked
2016-12-16 20:13:59.320313 3c844700 -1 received  signal: Terminated from  PID: 7322 task name: /usr/bin/python /bin/daemon-helper term valgrind --trace-children=no --child-silent-after-fork=yes --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/client.1.log --time-stamp=yes --tool=memcheck radosgw --rgw-frontends civetweb port=7281 --rgw-zone r1z1 -n client.1 -k /etc/ceph/ceph.client.1.keyring --log-file /var/log/ceph/rgw.client.1.log --rgw_ops_log_socket_path /home/ubuntu/cephtest/rgw.opslog.client.1.sock --foreground  UID: 0
2016-12-16 20:13:59.322571 3c844700  1 handle_sigterm
2016-12-16 20:13:59.323874 8cc4980 -1 shutting down
2016-12-16 20:13:59.328940 3c844700  1 handle_sigterm set alarm for 120
2016-12-16 20:13:59.544463 34a4b700  0 ERROR: failed to clone shard, completion_mgr.get_next() returned ret=-125
2016-12-16 20:13:59.545388 34a4b700  5 run(): was stopped, exiting

The shutdown cancels the RGWMetaSyncShardCR before it can call collect_children() to drop its references to the stacks in the stack_to_pos map, so those stacks are never freed. RGWMetaSyncShardCR's destructor will need to make sure all references are dropped.
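
To illustrate the idea, here is a minimal standalone sketch of the pattern, not the actual Ceph code. `Stack`, `ShardSync`, `drop_all()`, `pending()` and the two helper functions are hypothetical stand-ins; only the names `stack_to_pos` and `collect_children()` are borrowed from the comment above. It assumes the coroutine holds one reference per spawned stack and shows a destructor that releases whatever is still held if the coroutine is destroyed early:

```cpp
#include <cstddef>
#include <map>

// Hypothetical stand-in for a reference-counted coroutine stack.
class Stack {
  int nref = 1;                // the creator holds the first reference
  ~Stack() { --live(); }       // private: destroyed only through put()
public:
  Stack() { ++live(); }
  // Number of Stack objects currently allocated.
  static int& live() { static int n = 0; return n; }
  void put() { if (--nref == 0) delete this; }
};

// Hypothetical stand-in for the shard sync coroutine.
class ShardSync {
  std::map<Stack*, int> stack_to_pos;  // one held reference per entry
public:
  // Assumed shape of the fix: release anything still held on destruction.
  ~ShardSync() { drop_all(); }

  // Spawn a child stack and remember it together with its position.
  void spawn(int pos) { stack_to_pos[new Stack] = pos; }

  // Normal path: children are collected and their references dropped.
  void collect_children() { drop_all(); }

  // Number of references the coroutine is still holding.
  std::size_t pending() const { return stack_to_pos.size(); }

private:
  void drop_all() {
    for (auto& entry : stack_to_pos) entry.first->put();
    stack_to_pos.clear();
  }
};

// Simulates shutdown: the coroutine is destroyed before collect_children().
int live_after_cancel() {
  {
    ShardSync cr;
    cr.spawn(0);
    cr.spawn(1);
  }  // destructor runs here with two references still pending
  return Stack::live();
}

// Simulates the normal path, where collect_children() runs first.
int live_after_normal_run() {
  {
    ShardSync cr;
    cr.spawn(0);
    cr.collect_children();
  }  // map is already empty, so the destructor has nothing to release
  return Stack::live();
}
```

In this sketch, removing the `drop_all()` call from the destructor would leave the two stacks from `live_after_cancel()` allocated, which is the kind of leak the valgrind report describes.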

#3 Updated by Casey Bodley 9 months ago

  • Status changed from New to Need Review

#4 Updated by Casey Bodley 9 months ago

  • Backport set to jewel kraken

#5 Updated by Yehuda Sadeh 8 months ago

  • Status changed from Need Review to Pending Backport

#6 Updated by Casey Bodley 8 months ago

#7 Updated by Nathan Cutler 8 months ago

  • Copied to Backport #18563: jewel: leak from RGWMetaSyncShardCR::incremental_sync added

#8 Updated by Nathan Cutler 8 months ago

  • Copied to Backport #18564: kraken: leak from RGWMetaSyncShardCR::incremental_sync added

#9 Updated by Nathan Cutler 8 months ago

  • Backport changed from jewel kraken to jewel, kraken

#10 Updated by Sage Weil 7 months ago

  • Priority changed from Immediate to Urgent

#11 Updated by Casey Bodley 7 months ago

Both backports are resolved. Can we close this one?

#12 Updated by Nathan Cutler 6 months ago

  • Status changed from Pending Backport to Resolved
