Bug #57165
closed
expected valgrind issues and found none
Description
2022-08-16T16:25:31.998 INFO:tasks.ceph:Checking for errors in any valgrind logs...
2022-08-16T16:25:31.999 DEBUG:teuthology.orchestra.run.smithi202:> sudo zgrep '<kind>' /var/log/ceph/valgrind/* /dev/null | sort | uniq
2022-08-16T16:25:32.087 INFO:tasks.ceph:Archiving crash dumps...
2022-08-16T16:25:32.089 DEBUG:teuthology.misc:Transferring archived files from smithi202:/var/lib/ceph/crash to /home/teuthworker/archive/yuriw-2022-08-16_15:48:32-rados-wip-yuri4-testing-2022-08-15-0951-distro-default-smithi/6975390/remote/smithi202/crash
2022-08-16T16:25:32.091 DEBUG:teuthology.orchestra.run.smithi202:> sudo tar cz -f - -C /var/lib/ceph/crash -- .
2022-08-16T16:25:32.121 INFO:tasks.ceph:Compressing logs...
2022-08-16T16:25:32.122 DEBUG:teuthology.orchestra.run.smithi202:> sudo find /var/log/ceph -name '*.log' -print0 | sudo xargs -0 --no-run-if-empty -- gzip --
2022-08-16T16:25:32.483 INFO:tasks.ceph:Archiving logs...
2022-08-16T16:25:32.484 DEBUG:teuthology.misc:Transferring archived files from smithi202:/var/log/ceph to /home/teuthworker/archive/yuriw-2022-08-16_15:48:32-rados-wip-yuri4-testing-2022-08-15-0951-distro-default-smithi/6975390/remote/smithi202/log
2022-08-16T16:25:32.487 DEBUG:teuthology.orchestra.run.smithi202:> sudo tar cz -f - -C /var/log/ceph -- .
2022-08-16T16:25:32.639 ERROR:teuthology.run_tasks:Manager failed: ceph
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_9e7483cc68a9eb6b54dacbb0bec3bf23a5d32425/teuthology/run_tasks.py", line 188, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_53c6490a385b0352f40139eaaa2c2f607e42701d/qa/tasks/ceph.py", line 1922, in task
    check_status=False,
  File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_9e7483cc68a9eb6b54dacbb0bec3bf23a5d32425/teuthology/contextutil.py", line 55, in nested
    raise exc[1]
  File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_53c6490a385b0352f40139eaaa2c2f607e42701d/qa/tasks/ceph.py", line 251, in ceph_log
    yield
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_9e7483cc68a9eb6b54dacbb0bec3bf23a5d32425/teuthology/contextutil.py", line 47, in nested
    if exit(*exc):
  File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_53c6490a385b0352f40139eaaa2c2f607e42701d/qa/tasks/ceph.py", line 359, in valgrind_post
    raise Exception('expected valgrind issues and found none')
/a/yuriw-2022-08-16_15:48:32-rados-wip-yuri4-testing-2022-08-15-0951-distro-default-smithi/6975390
Updated by Matan Breizman over 1 year ago
/a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973889
/a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973682
Updated by Laura Flores over 1 year ago
/a/yuriw-2022-08-22_20:21:58-rados-wip-yuri11-testing-2022-08-22-1005-distro-default-smithi/6986197
Updated by Laura Flores over 1 year ago
To me, this seems like a Teuthology failure. Perhaps Zack Cerza can rule this theory in/out.
In any case, it looks like the mgr is failing due to:
/a/yuriw-2022-08-22_20:21:58-rados-wip-yuri11-testing-2022-08-22-1005-distro-default-smithi/6986197/remote/smithi125/ceph-mgr.x.log.gz
2022-08-22T21:31:15.511+0000 7f1a27f90700 -1 received signal: Terminated from /usr/bin/python3 /bin/daemon-helper term ceph-mgr -f --cluster ceph -i x (PID: 41266) UID: 0
2022-08-22T21:31:15.511+0000 7f1a27f90700 -1 mgr handle_mgr_signal *** Got signal Terminated ***
Updated by Zack Cerza over 1 year ago
What I'm seeing is that the jobs in question were told to expect valgrind errors via the expect_valgrind_errors: true item in their job configs. They didn't find any, so they failed the job. Here's me doing a simpler version of what the ceph task does:
$ d=/a/yuriw-2022-08-16_15:48:32-rados-wip-yuri4-testing-2022-08-15-0951-distro-default-smithi/6975390
$ zgrep -l 'kind' $d/remote/*/log/valgrind/*
$ d=/a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973889
$ zgrep -l 'kind' $d/remote/*/log/valgrind/*
$ d=/a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973682
$ zgrep -l 'kind' $d/remote/*/log/valgrind/*
$ d=/a/yuriw-2022-08-22_20:21:58-rados-wip-yuri11-testing-2022-08-22-1005-distro-default-smithi/6986197
$ zgrep -l 'kind' $d/remote/*/log/valgrind/*
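For anyone who wants to reproduce the check outside of teuthology, here is a rough Python sketch of the same idea. It is not the actual qa/tasks/ceph.py code: the function names find_valgrind_kinds and check_valgrind are made up for illustration, and the only behaviour taken from the logs above is "scan the valgrind logs for <kind> entries, and fail with 'expected valgrind issues and found none' when errors were expected but nothing turned up".

```python
import gzip
import re
from pathlib import Path

# Matches the <kind>...</kind> element that the job greps for in valgrind logs.
KIND_RE = re.compile(r"<kind>(.*?)</kind>")


def find_valgrind_kinds(log_dir):
    """Return the set of <kind> values found in the files under log_dir.

    Hypothetical helper: handles both plain and gzip-compressed logs,
    roughly like `zgrep '<kind>' log_dir/* | sort | uniq`.
    """
    kinds = set()
    for path in sorted(Path(log_dir).glob("*")):
        if not path.is_file():
            continue
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt", errors="replace") as fh:
            for line in fh:
                kinds.update(KIND_RE.findall(line))
    return kinds


def check_valgrind(log_dir, expect_valgrind_errors=False):
    """Fail if valgrind errors were expected but none were logged.

    Hypothetical helper: only the "expected but found none" branch is
    modelled here, since that is the failure seen in this bug.
    """
    kinds = find_valgrind_kinds(log_dir)
    if expect_valgrind_errors and not kinds:
        raise Exception("expected valgrind issues and found none")
    return kinds
```

Pointing check_valgrind(..., expect_valgrind_errors=True) at one of the archived remote/*/log/valgrind directories from the jobs above should raise in the same way, since the zgrep runs show no matches there.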
Updated by Ernesto Puerta over 1 year ago
- Tags set to test-failure
Updated by Radoslaw Zarzynski over 1 year ago
- Assignee set to Nitzan Mordechai
- Priority changed from Normal to High
Bumped the priority up, as I'm afraid that the longer we wait to ensure valgrind is fully operational, the greater the risk that we'll be bombed with leak reports once it is restored.
Updated by Nitzan Mordechai over 1 year ago
We are leaking memory with "ceph tell mon.a leak_some_memory", but for some reason we are not seeing any memory leak in the valgrind logs.
I checked with and without tcmalloc; neither shows any memory leak.
I also removed the valgrind.supp file and still couldn't spot the memory leak. Something is causing that leak to disappear.
Updated by Nitzan Mordechai over 1 year ago
- Status changed from New to In Progress
Updated by Nitzan Mordechai over 1 year ago
This is a memory optimization "fault": with the new gcc, the memory that we are trying to leak is no longer leaked.
Updated by Nitzan Mordechai over 1 year ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 47802
Updated by Laura Flores over 1 year ago
/a/yuriw-2022-08-17_19:34:54-rados-wip-yuri7-testing-2022-08-17-0943-quincy-distro-default-smithi/6977767
Updated by Kefu Chai over 1 year ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot over 1 year ago
- Copied to Backport #57346: quincy: expected valgrind issues and found none added
Updated by Yaarit Hatuka over 1 year ago
Updated by Nitzan Mordechai 10 months ago
- Status changed from Pending Backport to Resolved
Updated by Radoslaw Zarzynski 5 months ago
- Related to Bug #63501: ceph::common::leak_some_memory() got interpreted as an actual leak added