Bug #63501
ceph::common::leak_some_memory() got interpreted as an actual leak
Description
1. We run osd.0 under valgrind in exit-on-first-error mode,
2. then we provoke a leak with ceph tell osd.0 leak_some_memory,
3. and finally valgrind raises an error on this particular leak.
It looks like a test issue!
rzarzynski@teuthology:/a/yuriw-2023-11-05_15:32:58-rados-reef-release-distro-default-smithi/7448483$ less teuthology.log
...
2023-11-05T23:47:17.425 DEBUG:teuthology.orchestra.run.smithi061:> sudo TESTDIR=/home/ubuntu/cephtest bash -c 'ceph tell osd.0 leak_some_memory'
...
2023-11-05T23:48:06.449 INFO:tasks.ceph.osd.0.smithi061.stderr:2023-11-05T23:48:06.438+0000 ac7d640 -1 received signal: Terminated from /usr/bin/python3 /bin/daemon-helper term env OPENSSL_ia32cap=~0x1000000000000000 valgrind --trace-children=no --child-silent-after-fork=yes --soname-synonyms=somalloc=*tcmalloc* --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/osd.0.log --time-stamp=yes --vgdb=yes --exit-on-first-error=yes --error-exitcode=42 --tool=memcheck ceph-osd -f --cluster ceph -i 0 (PID: 84417) UID: 0
2023-11-05T23:48:06.449 INFO:tasks.ceph.osd.0.smithi061.stderr:2023-11-05T23:48:06.439+0000 ac7d640 -1 osd.0 14 *** Got signal Terminated ***
2023-11-05T23:48:08.849 INFO:tasks.ceph.osd.0.smithi061.stderr:==00:00:01:46.767 84427==
2023-11-05T23:48:08.849 INFO:tasks.ceph.osd.0.smithi061.stderr:==00:00:01:46.767 84427== Exit program on first error (--exit-on-first-error=yes)
2023-11-05T23:48:08.862 INFO:tasks.ceph.osd.0.smithi061.stderr:daemon-helper: command failed with exit status 42
2023-11-05T23:48:12.538 DEBUG:teuthology.orchestra.run:got remote process result: 42
2023-11-05T23:48:12.539 ERROR:teuthology.orchestra.daemon.state:Error while waiting for process to exit
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_6899cd26fceddb2fec83dc1a1349394b28c8998e/teuthology/orchestra/daemon/state.py", line 139, in stop
    run.wait([self.proc], timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_teuthology_6899cd26fceddb2fec83dc1a1349394b28c8998e/teuthology/orchestra/run.py", line 479, in wait
    proc.wait()
  File "/home/teuthworker/src/git.ceph.com_teuthology_6899cd26fceddb2fec83dc1a1349394b28c8998e/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_teuthology_6899cd26fceddb2fec83dc1a1349394b28c8998e/teuthology/orchestra/run.py", line 181, in _raise_for_status
    raise CommandFailedError(
teuthology.exceptions.CommandFailedError: Command failed on smithi061 with status 42: "cd /home/ubuntu/cephtest && sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper term env 'OPENSSL_ia32cap=~0x1000000000000000' valgrind --trace-children=no --child-silent-after-fork=yes '--soname-synonyms=somalloc=*tcmalloc*' --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/osd.0.log --time-stamp=yes --vgdb=yes --exit-on-first-error=yes --error-exitcode=42 --tool=memcheck ceph-osd -f --cluster ceph -i 0"
2023-11-05T23:48:12.539 INFO:tasks.ceph.osd.0:Stopped
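To make the "exit status 42" in the log above easier to read, here is a minimal Python sketch (not teuthology's code) that pulls valgrind's options out of the command line shown in the log and attributes a daemon's exit status to valgrind. The helper names valgrind_options and explain_exit are hypothetical, and the sketch assumes ceph-osd itself never exits with the same value as --error-exitcode.

```python
import shlex

# Valgrind invocation copied from the teuthology log above (prefix trimmed).
VALGRIND_CMD = (
    "valgrind --trace-children=no --child-silent-after-fork=yes "
    "'--soname-synonyms=somalloc=*tcmalloc*' --num-callers=50 "
    "--suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes "
    "--xml-file=/var/log/ceph/valgrind/osd.0.log --time-stamp=yes --vgdb=yes "
    "--exit-on-first-error=yes --error-exitcode=42 --tool=memcheck "
    "ceph-osd -f --cluster ceph -i 0"
)


def valgrind_options(cmd: str) -> dict[str, str]:
    """Collect valgrind's --key=value options, stopping at the program under test."""
    opts: dict[str, str] = {}
    for tok in shlex.split(cmd)[1:]:  # skip the "valgrind" token itself
        if not tok.startswith("--"):
            break  # first non-option token (ceph-osd) starts the client command
        key, _, value = tok[2:].partition("=")
        opts[key] = value
    return opts


def explain_exit(returncode: int, cmd: str) -> str:
    """Hypothetical helper: attribute a daemon's exit status to valgrind or not.

    Assumes the daemon itself never exits with the --error-exitcode value.
    """
    opts = valgrind_options(cmd)
    if opts.get("error-exitcode") != str(returncode):
        return f"exit status {returncode} is not valgrind's --error-exitcode"
    if opts.get("exit-on-first-error") == "yes":
        return "valgrind stopped the daemon at its first reported error"
    return "valgrind reported at least one error during the run"


print(explain_exit(42, VALGRIND_CMD))
# valgrind stopped the daemon at its first reported error
```

Valgrind requires a nonzero --error-exitcode when --exit-on-first-error=yes is set, which is why the two flags appear together here and why the run surfaces as status 42 rather than the daemon's own exit code.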
rzarzynski@teuthology:/a/yuriw-2023-11-05_15:32:58-rados-reef-release-distro-default-smithi/7448483$ less ./remote/smithi061/log/valgrind/osd.0.log.gz
...
<error>
  <unique>0x1dd25</unique>
  <tid>1</tid>
  <kind>Leak_DefinitelyLost</kind>
  <xwhat>
    <text>1,234 bytes in 1 blocks are definitely lost in loss record 198 of 201</text>
    <leakedbytes>1234</leakedbytes>
    <leakedblocks>1</leakedblocks>
  </xwhat>
  <stack>
    <frame>
      <ip>0x48462F3</ip>
      <obj>/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so</obj>
      <fn>operator new[](unsigned long)</fn>
      <dir>/builddir/build/BUILD/valgrind-3.21.0/coregrind/m_replacemalloc</dir>
      <file>vg_replace_malloc.c</file>
      <line>714</line>
    </frame>
    <frame>
      <ip>0xBB6605</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ceph::common::leak_some_memory()</fn>
      <dir>/usr/src/debug/ceph-18.2.0-1181.gf7e9f9af.el9.x86_64/src/common</dir>
      <file>ceph_context.cc</file>
      <line>510</line>
    </frame>
    <frame>
      <ip>0xBBF48C</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ceph::common::CephContext::_do_command(std::basic_string_view<char, std::char_traits<char> >, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > >, std::less<void>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > > > > > const&, ceph::Formatter*, std::ostream&, ceph::buffer::v15_2_0::list*)</fn>
      <dir>/usr/src/debug/ceph-18.2.0-1181.gf7e9f9af.el9.x86_64/src/common</dir>
      <file>ceph_context.cc</file>
      <line>533</line>
    </frame>
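The valgrind record above shows the leak is exactly the provoked one (1234 bytes from operator new[] inside ceph::common::leak_some_memory()). A minimal Python sketch of how a harness could tell that expected leak apart from other memcheck errors by parsing the XML file follows. The names classify_errors and EXPECTED_LEAK_FN, and the policy of matching on the function name in the stack, are illustrative assumptions, not how teuthology actually handles expected valgrind errors; the sample is a trimmed, well-formed excerpt of the record above.

```python
import xml.etree.ElementTree as ET

# Illustrative policy (an assumption, not teuthology's logic): a definitely-lost
# leak whose stack passes through this function is the one provoked on purpose.
EXPECTED_LEAK_FN = "ceph::common::leak_some_memory()"

# Trimmed, well-formed excerpt of the osd.0 record above (most frame details
# dropped; a real valgrind XML file has more surrounding elements).
SAMPLE = """<valgrindoutput>
  <error>
    <unique>0x1dd25</unique>
    <tid>1</tid>
    <kind>Leak_DefinitelyLost</kind>
    <xwhat>
      <text>1,234 bytes in 1 blocks are definitely lost in loss record 198 of 201</text>
      <leakedbytes>1234</leakedbytes>
      <leakedblocks>1</leakedblocks>
    </xwhat>
    <stack>
      <frame><fn>operator new[](unsigned long)</fn></frame>
      <frame><fn>ceph::common::leak_some_memory()</fn></frame>
    </stack>
  </error>
</valgrindoutput>"""


def classify_errors(xml_text: str) -> tuple[list[str], list[str]]:
    """Split memcheck errors into (expected, unexpected) lists of <unique> ids."""
    root = ET.fromstring(xml_text)
    expected: list[str] = []
    unexpected: list[str] = []
    for err in root.iter("error"):
        uid = err.findtext("unique", default="?")
        kind = err.findtext("kind", default="")
        # Function names of every frame in this error's stack trace.
        fns = [frame.findtext("fn", default="") for frame in err.iter("frame")]
        if kind == "Leak_DefinitelyLost" and EXPECTED_LEAK_FN in fns:
            expected.append(uid)
        else:
            unexpected.append(uid)
    return expected, unexpected


print(classify_errors(SAMPLE))  # (['0x1dd25'], [])
```

Such a filter only inspects the XML after the run; it does not change how valgrind itself reacts to the error while the daemon is running.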
Updated by Radoslaw Zarzynski 6 months ago
- Related to Bug #57165: expected valgrind issues and found none added
Updated by Nitzan Mordechai 6 months ago
Looks like this is the correct response: valgrind caught the leak, but exit-on-first-error caused the mons and OSDs to be killed, which made the restart retry starting them, and that failed.
Updated by Nitzan Mordechai 6 months ago
Looks like it is related to the bug addressed by https://github.com/ceph/ceph/pull/52639;
it's not something to do with leak_some_memory.
Updated by Aishwarya Mathuria 4 months ago
/a/yuriw-2024-01-03_16:19:00-rados-wip-yuri6-testing-2024-01-02-0832-distro-default-smithi/7505678/
Updated by Nitzan Mordechai 4 months ago
Aishwarya Mathuria wrote:
/a/yuriw-2024-01-03_16:19:00-rados-wip-yuri6-testing-2024-01-02-0832-distro-default-smithi/7505678/
The monitor got valgrind errors (PR https://github.com/ceph/ceph/pull/52639 will fix them), and that stopped the OSDs, which hadn't had a chance to leak any memory yet.