Bug #57165

expected valgrind issues and found none

Added by Nitzan Mordechai about 1 year ago. Updated 4 months ago.

Status:
Resolved
Priority:
High
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
backport_processed
Backport:
quincy
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2022-08-16T16:25:31.998 INFO:tasks.ceph:Checking for errors in any valgrind logs...
2022-08-16T16:25:31.999 DEBUG:teuthology.orchestra.run.smithi202:> sudo zgrep '<kind>' /var/log/ceph/valgrind/* /dev/null | sort | uniq
2022-08-16T16:25:32.087 INFO:tasks.ceph:Archiving crash dumps...
2022-08-16T16:25:32.089 DEBUG:teuthology.misc:Transferring archived files from smithi202:/var/lib/ceph/crash to /home/teuthworker/archive/yuriw-2022-08-16_15:48:32-rados-wip-yuri4-testing-2022-08-15-0951-distro-default-smithi/6975390/remote/smithi202/crash
2022-08-16T16:25:32.091 DEBUG:teuthology.orchestra.run.smithi202:> sudo tar cz -f - -C /var/lib/ceph/crash -- .
2022-08-16T16:25:32.121 INFO:tasks.ceph:Compressing logs...
2022-08-16T16:25:32.122 DEBUG:teuthology.orchestra.run.smithi202:> sudo find /var/log/ceph -name '*.log' -print0 | sudo xargs -0 --no-run-if-empty -- gzip --
2022-08-16T16:25:32.483 INFO:tasks.ceph:Archiving logs...
2022-08-16T16:25:32.484 DEBUG:teuthology.misc:Transferring archived files from smithi202:/var/log/ceph to /home/teuthworker/archive/yuriw-2022-08-16_15:48:32-rados-wip-yuri4-testing-2022-08-15-0951-distro-default-smithi/6975390/remote/smithi202/log
2022-08-16T16:25:32.487 DEBUG:teuthology.orchestra.run.smithi202:> sudo tar cz -f - -C /var/log/ceph -- .
2022-08-16T16:25:32.639 ERROR:teuthology.run_tasks:Manager failed: ceph
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_9e7483cc68a9eb6b54dacbb0bec3bf23a5d32425/teuthology/run_tasks.py", line 188, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_53c6490a385b0352f40139eaaa2c2f607e42701d/qa/tasks/ceph.py", line 1922, in task
    check_status=False,
  File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_9e7483cc68a9eb6b54dacbb0bec3bf23a5d32425/teuthology/contextutil.py", line 55, in nested
    raise exc[1]
  File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_53c6490a385b0352f40139eaaa2c2f607e42701d/qa/tasks/ceph.py", line 251, in ceph_log
    yield
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_9e7483cc68a9eb6b54dacbb0bec3bf23a5d32425/teuthology/contextutil.py", line 47, in nested
    if exit(*exc):
  File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_53c6490a385b0352f40139eaaa2c2f607e42701d/qa/tasks/ceph.py", line 359, in valgrind_post
    raise Exception('expected valgrind issues and found none')

/a/yuriw-2022-08-16_15:48:32-rados-wip-yuri4-testing-2022-08-15-0951-distro-default-smithi/6975390


Related issues

Copied to RADOS - Backport #57346: quincy: expected valgrind issues and found none Resolved

History

#1 Updated by Matan Breizman about 1 year ago

/a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973889
/a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973682

#2 Updated by Laura Flores about 1 year ago

/a/yuriw-2022-08-22_20:21:58-rados-wip-yuri11-testing-2022-08-22-1005-distro-default-smithi/6986197

#3 Updated by Laura Flores about 1 year ago

To me, this seems like a Teuthology failure. Perhaps Zack Cerza can rule this theory in/out.

In any case, it looks like the mgr is failing due to:

/a/yuriw-2022-08-22_20:21:58-rados-wip-yuri11-testing-2022-08-22-1005-distro-default-smithi/6986197/remote/smithi125/ceph-mgr.x.log.gz

2022-08-22T21:31:15.511+0000 7f1a27f90700 -1 received  signal: Terminated from /usr/bin/python3 /bin/daemon-helper term ceph-mgr -f --cluster ceph -i x  (PID: 41266) UID: 0
2022-08-22T21:31:15.511+0000 7f1a27f90700 -1 mgr handle_mgr_signal  *** Got signal Terminated ***

#4 Updated by Zack Cerza about 1 year ago

What I'm seeing is that the jobs in question were told to expect valgrind errors via the expect_valgrind_errors: true item in their job configs. They didn't find any, so they failed the job. Here's me doing a simpler version of what the ceph task does:

$ d=/a/yuriw-2022-08-16_15:48:32-rados-wip-yuri4-testing-2022-08-15-0951-distro-default-smithi/6975390
$ zgrep -l 'kind' $d/remote/*/log/valgrind/*
$ d=/a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973889
$ zgrep -l 'kind' $d/remote/*/log/valgrind/*
$ d=/a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973682
$ zgrep -l 'kind' $d/remote/*/log/valgrind/*
$ d=/a/yuriw-2022-08-22_20:21:58-rados-wip-yuri11-testing-2022-08-22-1005-distro-default-smithi/6986197
$ zgrep -l 'kind' $d/remote/*/log/valgrind/*
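
For readers who want to reproduce this check outside teuthology, here is a minimal Python sketch of the logic described above. It is not the actual code from qa/tasks/ceph.py: the function names, the flat log directory layout, and the "saw valgrind issues" message are assumptions made for illustration. Only the `<kind>` search pattern, the expect_valgrind_errors setting, and the "expected valgrind issues and found none" message come from the logs in this ticket.

```python
import gzip
import re
from pathlib import Path

# Valgrind's XML output tags each reported error with <kind>...</kind>.
KIND_RE = re.compile(r"<kind>(.*?)</kind>")


def find_valgrind_kinds(log_dir):
    """Return sorted, de-duplicated (file name, kind) pairs from valgrind logs.

    Hypothetical helper: assumes every regular file directly under log_dir
    is a valgrind log, either plain text or gzip-compressed.
    """
    found = set()
    for path in sorted(Path(log_dir).glob("*")):
        if not path.is_file():
            continue
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt", errors="replace") as fh:
            for line in fh:
                match = KIND_RE.search(line)
                if match:
                    found.add((path.name, match.group(1)))
    return sorted(found)


def check_valgrind(log_dir, expect_errors):
    """Fail when the findings do not match the job's expectation.

    expect_errors stands in for the job's expect_valgrind_errors setting.
    """
    kinds = find_valgrind_kinds(log_dir)
    if expect_errors and not kinds:
        # This is the failure reported in this ticket.
        raise Exception("expected valgrind issues and found none")
    if kinds and not expect_errors:
        # Assumed message for the opposite case.
        raise Exception("saw valgrind issues")
    return kinds
```

Calling check_valgrind(log_dir, expect_errors=True) on a directory whose logs contain no `<kind>` entries raises the same exception seen in the traceback, which matches the empty zgrep output above.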

#5 Updated by Ernesto Puerta about 1 year ago

  • Tags set to test-failure

#6 Updated by Radoslaw Zarzynski about 1 year ago

  • Assignee set to Nitzan Mordechai
  • Priority changed from Normal to High

Bumped the priority up because I'm afraid that the longer we wait to make sure valgrind is fully operational, the greater the risk that we'll be bombarded with leak reports once it is restored.

#7 Updated by Nitzan Mordechai about 1 year ago

We are leaking memory with "ceph tell mon.a leak_some_memory", but for some reason no memory leak shows up in the valgrind logs.
I checked with and without tcmalloc; neither run shows a memory leak.
I also removed the valgrind.supp file and still couldn't spot the memory leak. Something is causing that leak to disappear.

#8 Updated by Nitzan Mordechai about 1 year ago

  • Status changed from New to In Progress

#9 Updated by Nitzan Mordechai about 1 year ago

This is a memory optimization "fault": the new gcc prevents the memory we are deliberately trying to leak from actually leaking.

#10 Updated by Nitzan Mordechai about 1 year ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 47802

#11 Updated by Laura Flores about 1 year ago

  • Backport set to quincy

#12 Updated by Laura Flores about 1 year ago

/a/yuriw-2022-08-17_19:34:54-rados-wip-yuri7-testing-2022-08-17-0943-quincy-distro-default-smithi/6977767

#13 Updated by Kefu Chai about 1 year ago

  • Status changed from Fix Under Review to Pending Backport

#14 Updated by Backport Bot about 1 year ago

  • Copied to Backport #57346: quincy: expected valgrind issues and found none added

#15 Updated by Backport Bot about 1 year ago

  • Tags set to backport_processed

#17 Updated by Nitzan Mordechai 4 months ago

  • Status changed from Pending Backport to Resolved
