Bug #63501: ceph::common::leak_some_memory() got interpreted as an actual leak - RADOS - Ceph

open

Added by Radoslaw Zarzynski 6 months ago. Updated 4 months ago.

Status: New
Priority: Normal
Assignee: Nitzan Mordechai
Severity: 3 - minor

Description

1. We run osd.0 under valgrind in exit-on-first-error mode,
2. then we provoke a leak with ceph tell osd.0 leak_some_memory, and finally
3. valgrind raises an error on this particular leak.

It looks like a test issue!

rzarzynski@teuthology:/a/yuriw-2023-11-05_15:32:58-rados-reef-release-distro-default-smithi/7448483$ less teuthology.log
...
2023-11-05T23:47:17.425 DEBUG:teuthology.orchestra.run.smithi061:> sudo TESTDIR=/home/ubuntu/cephtest bash -c 'ceph tell osd.0 leak_some_memory'
...
2023-11-05T23:48:06.449 INFO:tasks.ceph.osd.0.smithi061.stderr:2023-11-05T23:48:06.438+0000 ac7d640 -1 received  signal: Terminated from /usr/bin/python3 /bin/daemon-helper term env OPENSSL_ia32cap=~0x1000000000000000 valgrind --trace-children=no --child-silent-after-fork=yes --soname-synonyms=somalloc=*tcmalloc* --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/osd.0.log --time-stamp=yes --vgdb=yes --exit-on-first-error=yes --error-exitcode=42 --tool=memcheck ceph-osd -f --cluster ceph -i 0  (PID: 84417) UID: 0
2023-11-05T23:48:06.449 INFO:tasks.ceph.osd.0.smithi061.stderr:2023-11-05T23:48:06.439+0000 ac7d640 -1 osd.0 14 *** Got signal Terminated ***
2023-11-05T23:48:08.849 INFO:tasks.ceph.osd.0.smithi061.stderr:==00:00:01:46.767 84427==
2023-11-05T23:48:08.849 INFO:tasks.ceph.osd.0.smithi061.stderr:==00:00:01:46.767 84427== Exit program on first error (--exit-on-first-error=yes)
2023-11-05T23:48:08.862 INFO:tasks.ceph.osd.0.smithi061.stderr:daemon-helper: command failed with exit status 42
2023-11-05T23:48:12.538 DEBUG:teuthology.orchestra.run:got remote process result: 42
2023-11-05T23:48:12.539 ERROR:teuthology.orchestra.daemon.state:Error while waiting for process to exit
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_6899cd26fceddb2fec83dc1a1349394b28c8998e/teuthology/orchestra/daemon/state.py", line 139, in stop
    run.wait([self.proc], timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_teuthology_6899cd26fceddb2fec83dc1a1349394b28c8998e/teuthology/orchestra/run.py", line 479, in wait
    proc.wait()
  File "/home/teuthworker/src/git.ceph.com_teuthology_6899cd26fceddb2fec83dc1a1349394b28c8998e/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_teuthology_6899cd26fceddb2fec83dc1a1349394b28c8998e/teuthology/orchestra/run.py", line 181, in _raise_for_status
    raise CommandFailedError(
teuthology.exceptions.CommandFailedError: Command failed on smithi061 with status 42: "cd /home/ubuntu/cephtest && sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper term env 'OPENSSL_ia32cap=~0x1000000000000000' valgrind --trace-children=no --child-silent-after-fork=yes '--soname-synonyms=somalloc=*tcmalloc*' --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/osd.0.log --time-stamp=yes --vgdb=yes --exit-on-first-error=yes --error-exitcode=42 --tool=memcheck ceph-osd -f --cluster ceph -i 0" 
2023-11-05T23:48:12.539 INFO:tasks.ceph.osd.0:Stopped
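
The exit status 42 in the log above matches the --error-exitcode=42 flag on the valgrind command line. A minimal sketch of how a test harness might tell that status apart from other failures follows. The helper name describe_exit is hypothetical and is not part of teuthology. The sketch also assumes the daemon itself never exits with 42 for its own reasons.

```python
# Value taken from --error-exitcode=42 on the valgrind command line in the log.
VALGRIND_ERROR_EXITCODE = 42


def describe_exit(status):
    """Classify a daemon exit status (hypothetical helper, not teuthology code)."""
    if status == 0:
        return "clean exit"
    if status == VALGRIND_ERROR_EXITCODE:
        # With --exit-on-first-error=yes, valgrind stops at the first error
        # it reports and exits with the configured error exit code.
        return "valgrind reported an error"
    return "other failure (status %d)" % status


print(describe_exit(42))
```

A harness that expects the leak could treat this status as a signal to inspect the valgrind XML log rather than as an unconditional failure.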

rzarzynski@teuthology:/a/yuriw-2023-11-05_15:32:58-rados-reef-release-distro-default-smithi/7448483$ less ./remote/smithi061/log/valgrind/osd.0.log.gz
...
<error>
  <unique>0x1dd25</unique>
  <tid>1</tid>
  <kind>Leak_DefinitelyLost</kind>
  <xwhat>
    <text>1,234 bytes in 1 blocks are definitely lost in loss record 198 of 201</text>
    <leakedbytes>1234</leakedbytes>
    <leakedblocks>1</leakedblocks>
  </xwhat>
  <stack>
    <frame>
      <ip>0x48462F3</ip>
      <obj>/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so</obj>
      <fn>operator new[](unsigned long)</fn>
      <dir>/builddir/build/BUILD/valgrind-3.21.0/coregrind/m_replacemalloc</dir>
      <file>vg_replace_malloc.c</file>
      <line>714</line>
    </frame>
    <frame>
      <ip>0xBB6605</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ceph::common::leak_some_memory()</fn>
      <dir>/usr/src/debug/ceph-18.2.0-1181.gf7e9f9af.el9.x86_64/src/common</dir>
      <file>ceph_context.cc</file>
      <line>510</line>
    </frame>
    <frame>
      <ip>0xBBF48C</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ceph::common::CephContext::_do_command(std::basic_string_view&lt;char, std::char_traits&lt;char&gt; &gt;, std::map&lt;std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt;, boost::variant&lt;std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt;, bool, long, double, std::vector&lt;std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt;, std::allocator&lt;std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; &gt; &gt;, std::vector&lt;long, std::allocator&lt;long&gt; &gt;, std::vector&lt;double, std::allocator&lt;double&gt; &gt; &gt;, std::less&lt;void&gt;, std::allocator&lt;std::pair&lt;std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const, boost::variant&lt;std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt;, bool, long, double, std::vector&lt;std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt;, std::allocator&lt;std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; &gt; &gt;, std::vector&lt;long, std::allocator&lt;long&gt; &gt;, std::vector&lt;double, std::allocator&lt;double&gt; &gt; &gt; &gt; &gt; &gt; const&amp;, ceph::Formatter*, std::ostream&amp;, ceph::buffer::v15_2_0::list*)</fn>
      <dir>/usr/src/debug/ceph-18.2.0-1181.gf7e9f9af.el9.x86_64/src/common</dir>
      <file>ceph_context.cc</file>
      <line>533</line>
    </frame>
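
If this is a test issue, one possible approach is for the test to separate the deliberately provoked leak from any other valgrind error. The sketch below classifies error records from a valgrind XML log by checking whether the stack contains the leak_some_memory frame shown above. It is an illustration only, not the teuthology implementation. The function name split_errors and the sample XML are hypothetical, and the sketch assumes the log is complete, well-formed XML, which may not hold when valgrind stops on the first error.

```python
import xml.etree.ElementTree as ET

# Frame name as it appears in the <fn> element of the stack above.
EXPECTED_LEAK_FN = "ceph::common::leak_some_memory()"


def split_errors(xml_text):
    """Return (expected, unexpected) lists of <error> elements.

    An error counts as expected only if it is a definite leak whose stack
    contains the deliberate leak function.
    """
    root = ET.fromstring(xml_text)
    expected, unexpected = [], []
    for err in root.iter("error"):
        kind = err.findtext("kind")
        fns = [fn.text for fn in err.iter("fn")]
        if kind == "Leak_DefinitelyLost" and EXPECTED_LEAK_FN in fns:
            expected.append(err)
        else:
            unexpected.append(err)
    return expected, unexpected


# Hypothetical, heavily trimmed sample; real records carry many more fields.
SAMPLE = """<valgrindoutput>
<error>
  <kind>Leak_DefinitelyLost</kind>
  <stack>
    <frame><fn>operator new[](unsigned long)</fn></frame>
    <frame><fn>ceph::common::leak_some_memory()</fn></frame>
  </stack>
</error>
<error>
  <kind>InvalidRead</kind>
  <stack><frame><fn>some_other_function()</fn></frame></stack>
</error>
</valgrindoutput>"""

expected, unexpected = split_errors(SAMPLE)
print(len(expected), len(unexpected))
```

Matching on both the error kind and the frame name keeps an unrelated error that happens to pass through the same function from being silently accepted.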

Related issues 1 (0 open — 1 closed)

Updated by Radoslaw Zarzynski 6 months ago

Related to Bug #57165: expected valgrind issues and found none added

Updated by Radoslaw Zarzynski 6 months ago

Assignee set to Nitzan Mordechai

Updated by Nitzan Mordechai 6 months ago

This looks like the correct response: valgrind caught the leak, but exit-on-first-error caused the mons and osds to be killed. As a result, the restart tried to start them again and failed.

Updated by Nitzan Mordechai 6 months ago

Looks like it is related to the bug in https://github.com/ceph/ceph/pull/52639.
It's not a problem with leak_some_memory.

Updated by Aishwarya Mathuria 4 months ago

/a/yuriw-2024-01-03_16:19:00-rados-wip-yuri6-testing-2024-01-02-0832-distro-default-smithi/7505678/

Updated by Nitzan Mordechai 4 months ago

Aishwarya Mathuria wrote:

/a/yuriw-2024-01-03_16:19:00-rados-wip-yuri6-testing-2024-01-02-0832-distro-default-smithi/7505678/

The monitor got valgrind errors (PR https://github.com/ceph/ceph/pull/52639 will fix them), and that stopped the osds before they had a chance to leak some memory.
