Bug #62777
rados/valgrind-leaks: expected valgrind issues and found none
Description
rados/valgrind-leaks/{1-start 2-inject-leak/mon centos_latest}
/a/yuriw-2023-08-11_02:49:40-rados-wip-yuri4-testing-2023-08-10-1739-distro-default-smithi/7366916
2023-08-11T09:05:29.545 ERROR:teuthology.run_tasks:Manager failed: ceph
Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_ceph_ceph-c_2d91d6813480a3969a4f052fc486a43386694206/qa/tasks/ceph.py", line 328, in valgrind_post
    yield
  File "/home/teuthworker/src/git.ceph.com_teuthology_7fda95956ac10132c9b74016ba832db907df09fa/teuthology/contextutil.py", line 46, in nested
    if exit(*exc):
  File "/usr/lib/python3.8/contextlib.py", line 120, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_2d91d6813480a3969a4f052fc486a43386694206/qa/tasks/ceph.py", line 1471, in run_daemon
    teuthology.stop_daemons_of_type(ctx, type_, cluster_name)
  File "/home/teuthworker/src/git.ceph.com_teuthology_7fda95956ac10132c9b74016ba832db907df09fa/teuthology/misc.py", line 1171, in stop_daemons_of_type
    daemon.stop()
  File "/home/teuthworker/src/git.ceph.com_teuthology_7fda95956ac10132c9b74016ba832db907df09fa/teuthology/orchestra/daemon/state.py", line 139, in stop
    run.wait([self.proc], timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_teuthology_7fda95956ac10132c9b74016ba832db907df09fa/teuthology/orchestra/run.py", line 473, in wait
    check_time()
  File "/home/teuthworker/src/git.ceph.com_teuthology_7fda95956ac10132c9b74016ba832db907df09fa/teuthology/contextutil.py", line 134, in __call__
    raise MaxWhileTries(error_msg)
teuthology.exceptions.MaxWhileTries: reached maximum tries (51) after waiting for 300 seconds

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_7fda95956ac10132c9b74016ba832db907df09fa/teuthology/run_tasks.py", line 154, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python3.8/contextlib.py", line 120, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_2d91d6813480a3969a4f052fc486a43386694206/qa/tasks/ceph.py", line 1957, in task
    mon0_remote.run(
  File "/usr/lib/python3.8/contextlib.py", line 120, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/git.ceph.com_teuthology_7fda95956ac10132c9b74016ba832db907df09fa/teuthology/contextutil.py", line 54, in nested
    raise exc[1]
  File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_2d91d6813480a3969a4f052fc486a43386694206/qa/tasks/ceph.py", line 251, in ceph_log
    yield
  File "/home/teuthworker/src/git.ceph.com_teuthology_7fda95956ac10132c9b74016ba832db907df09fa/teuthology/contextutil.py", line 46, in nested
    if exit(*exc):
  File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_2d91d6813480a3969a4f052fc486a43386694206/qa/tasks/ceph.py", line 365, in valgrind_post
    raise Exception('expected valgrind issues and found none')
Exception: expected valgrind issues and found none
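The failing check is the valgrind_post step in qa/tasks/ceph.py (line 365 in this build). A rough, hypothetical sketch of that logic, with helper and config names made up for illustration rather than taken from teuthology, looks like this:

# Hypothetical sketch of the valgrind_post check; names such as
# collect_valgrind_errors and expect_valgrind_errors are illustrative only.
import xml.etree.ElementTree as ET
from pathlib import Path


def collect_valgrind_errors(valgrind_dir):
    """Yield (log file, error kind) for every <error> record in the XML logs."""
    for xml_log in Path(valgrind_dir).glob('*.log'):
        try:
            root = ET.parse(xml_log).getroot()
        except ET.ParseError:
            # A daemon that never exited leaves a truncated, unparsable XML file.
            continue
        for err in root.iter('error'):
            yield xml_log.name, err.findtext('kind')


def valgrind_post(valgrind_dir, expect_valgrind_errors):
    errors = list(collect_valgrind_errors(valgrind_dir))
    if not errors and expect_valgrind_errors:
        # This is the branch that fires in this failure.
        raise Exception('expected valgrind issues and found none')
    return errors

In this run the first exception shows the mon daemon never stopped within the 300-second wait, so by the time the post-check ran there was nothing to find.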
History
#1 Updated by Radoslaw Zarzynski 3 months ago
Yeah, we have a test intentionally causing a leak just to ensure valgrind truly works.
I wonder what might happen if this test fails before the place where the leak is made (due to e.g. network issues).
In the snippet:
teuthology.exceptions.MaxWhileTries: reached maximum tries (51) after waiting for 300 seconds
Let's keep an eye on it, but if the hypothesis is correct, these errors will be very, very infrequent.
#2 Updated by Nitzan Mordechai 3 months ago
Also, the monitors didn't stop; we are checking the valgrind logs of a still-running process (the memory leak error will only show up after the process is done and the leak has been found).
2023-08-11T09:00:28.441 INFO:teuthology.misc:Shutting down mon daemons...
2023-08-11T09:00:28.442 DEBUG:tasks.ceph.mon.a:waiting for process to exit
2023-08-11T09:00:28.442 INFO:teuthology.orchestra.run:waiting for 300
2023-08-11T09:00:28.506 INFO:tasks.ceph.mon.a.smithi130.stderr:2023-08-11T09:00:28.492+0000 9f3c640 -1 received signal: Terminated from /usr/bin/python3 /bin/daemon-helper term env OPENSSL_ia32cap=~0x1000000000000000 valgrind --trace-children=no --child-silent-after-fork=yes --soname-synonyms=somalloc=*tcmalloc* --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/mon.a.log --time-stamp=yes --vgdb=yes --exit-on-first-error=yes --error-exitcode=42 --tool=memcheck --leak-check=full --show-reachable=yes ceph-mon -f --cluster ceph -i a (PID: 84154) UID: 0
2023-08-11T09:00:28.507 INFO:tasks.ceph.mon.a.smithi130.stderr:2023-08-11T09:00:28.494+0000 9f3c640 -1 mon.a@0(leader) e1 *** Got Signal Terminated ***
2023-08-11T09:00:31.745 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.osd.2 has been restored
2023-08-11T09:05:27.511 INFO:tasks.ceph:Checking cluster log for badness...
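For illustration, a minimal sketch of the "wait for the daemon to exit, then look for leak records" behaviour described above; the helper names are hypothetical and not the actual teuthology API:

# Hypothetical sketch: valgrind writes its Leak_* records only when the
# wrapped process exits, so the XML must be checked after exit, not while
# the monitor is still running. Names here are illustrative only.
import time
import xml.etree.ElementTree as ET


def wait_for_exit(proc, timeout=300, interval=5):
    """Poll a subprocess.Popen-like object until it exits or the timeout passes."""
    deadline = time.monotonic() + timeout
    while proc.poll() is None:
        if time.monotonic() > deadline:
            # Roughly what MaxWhileTries reports after 300 seconds above.
            raise TimeoutError('daemon did not exit within %d seconds' % timeout)
        time.sleep(interval)


def has_leak_record(xml_log_path):
    """True if the valgrind XML log contains any Leak_* error record."""
    root = ET.parse(xml_log_path).getroot()
    return any((err.findtext('kind') or '').startswith('Leak_')
               for err in root.iter('error'))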
#3 Updated by Aishwarya Mathuria 2 months ago
/a/yuriw-2023-10-05_21:43:37-rados-wip-yuri6-testing-2023-10-04-0901-distro-default-smithi/7412032
#4 Updated by Radoslaw Zarzynski about 2 months ago
Hi Nitzan!
IIUC the test doesn't properly wait for exit of the process. Am I correct?
(this sounds like a nasty test issue).
#5 Updated by Nitzan Mordechai 4 days ago
Radoslaw Zarzynski wrote:
Hi Nitzan!
IIUC the test doesn't properly wait for exit of the process. Am I correct?
(this sounds like a nasty test issue).
Now that I check it again, we actually exit on the first error, which is different from the leak we expected.
We already have a fix that hasn't been merged yet: https://tracker.ceph.com/issues/61774
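The mon.a command line in comment #2 runs valgrind with --exit-on-first-error=yes --error-exitcode=42, so an unrelated error makes valgrind terminate before the end-of-run leak check, and no Leak_* record is ever written. A hypothetical sketch of how a checker could tell those outcomes apart (only the exit code 42 comes from the log above; the rest is illustrative):

# Hypothetical classification of a valgrind-wrapped daemon's outcome.
# VALGRIND_ERROR_EXITCODE mirrors --error-exitcode=42 from the log in
# comment #2; the function and its return values are illustrative only.
VALGRIND_ERROR_EXITCODE = 42


def classify_valgrind_outcome(returncode, leak_record_found):
    if leak_record_found:
        # The intentional leak was injected and valgrind reported it at exit.
        return 'expected leak detected'
    if returncode == VALGRIND_ERROR_EXITCODE:
        # --exit-on-first-error aborted the run on some other error before
        # the end-of-run leak check, so no Leak_* record exists.
        return 'exited on an unrelated error before the leak check'
    return 'no valgrind issues found'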