Bug #57165

closed

expected valgrind issues and found none

Added by Nitzan Mordechai over 1 year ago. Updated 10 months ago.

Status:
Resolved
Priority:
High
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
backport_processed
Backport:
quincy
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2022-08-16T16:25:31.998 INFO:tasks.ceph:Checking for errors in any valgrind logs...
2022-08-16T16:25:31.999 DEBUG:teuthology.orchestra.run.smithi202:> sudo zgrep '<kind>' /var/log/ceph/valgrind/* /dev/null | sort | uniq
2022-08-16T16:25:32.087 INFO:tasks.ceph:Archiving crash dumps...
2022-08-16T16:25:32.089 DEBUG:teuthology.misc:Transferring archived files from smithi202:/var/lib/ceph/crash to /home/teuthworker/archive/yuriw-2022-08-16_15:48:32-rados-wip-yuri4-testing-2022-08-15-0951-distro-default-smithi/6975390/remote/smithi202/crash
2022-08-16T16:25:32.091 DEBUG:teuthology.orchestra.run.smithi202:> sudo tar cz -f - -C /var/lib/ceph/crash -- .
2022-08-16T16:25:32.121 INFO:tasks.ceph:Compressing logs...
2022-08-16T16:25:32.122 DEBUG:teuthology.orchestra.run.smithi202:> sudo find /var/log/ceph -name '*.log' -print0 | sudo xargs -0 --no-run-if-empty -- gzip --
2022-08-16T16:25:32.483 INFO:tasks.ceph:Archiving logs...
2022-08-16T16:25:32.484 DEBUG:teuthology.misc:Transferring archived files from smithi202:/var/log/ceph to /home/teuthworker/archive/yuriw-2022-08-16_15:48:32-rados-wip-yuri4-testing-2022-08-15-0951-distro-default-smithi/6975390/remote/smithi202/log
2022-08-16T16:25:32.487 DEBUG:teuthology.orchestra.run.smithi202:> sudo tar cz -f - -C /var/log/ceph -- .
2022-08-16T16:25:32.639 ERROR:teuthology.run_tasks:Manager failed: ceph
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_9e7483cc68a9eb6b54dacbb0bec3bf23a5d32425/teuthology/run_tasks.py", line 188, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_53c6490a385b0352f40139eaaa2c2f607e42701d/qa/tasks/ceph.py", line 1922, in task
    check_status=False,
  File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_9e7483cc68a9eb6b54dacbb0bec3bf23a5d32425/teuthology/contextutil.py", line 55, in nested
    raise exc[1]
  File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_53c6490a385b0352f40139eaaa2c2f607e42701d/qa/tasks/ceph.py", line 251, in ceph_log
    yield
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_9e7483cc68a9eb6b54dacbb0bec3bf23a5d32425/teuthology/contextutil.py", line 47, in nested
    if exit(*exc):
  File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_53c6490a385b0352f40139eaaa2c2f607e42701d/qa/tasks/ceph.py", line 359, in valgrind_post
    raise Exception('expected valgrind issues and found none')

/a/yuriw-2022-08-16_15:48:32-rados-wip-yuri4-testing-2022-08-15-0951-distro-default-smithi/6975390


Related issues 2 (1 open, 1 closed)

Related to RADOS - Bug #63501: ceph::common::leak_some_memory() got interpreted as an actual leak (New, Nitzan Mordechai)
Copied to RADOS - Backport #57346: quincy: expected valgrind issues and found none (Resolved, Radoslaw Zarzynski)
Actions #1

Updated by Matan Breizman over 1 year ago

/a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973889
/a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973682

Actions #2

Updated by Laura Flores over 1 year ago

/a/yuriw-2022-08-22_20:21:58-rados-wip-yuri11-testing-2022-08-22-1005-distro-default-smithi/6986197

Actions #3

Updated by Laura Flores over 1 year ago

To me, this seems like a Teuthology failure. Perhaps Zack Cerza can rule this theory in/out.

In any case, it looks like the mgr is failing due to:

/a/yuriw-2022-08-22_20:21:58-rados-wip-yuri11-testing-2022-08-22-1005-distro-default-smithi/6986197/remote/smithi125/ceph-mgr.x.log.gz

2022-08-22T21:31:15.511+0000 7f1a27f90700 -1 received  signal: Terminated from /usr/bin/python3 /bin/daemon-helper term ceph-mgr -f --cluster ceph -i x  (PID: 41266) UID: 0
2022-08-22T21:31:15.511+0000 7f1a27f90700 -1 mgr handle_mgr_signal  *** Got signal Terminated ***

Actions #4

Updated by Zack Cerza over 1 year ago

What I'm seeing is that the jobs in question were told to expect valgrind errors via the expect_valgrind_errors: true item in their job configs. They didn't find any, so they failed the job. Here's me doing a simpler version of what the ceph task does:

$ d=/a/yuriw-2022-08-16_15:48:32-rados-wip-yuri4-testing-2022-08-15-0951-distro-default-smithi/6975390
$ zgrep -l 'kind' $d/remote/*/log/valgrind/*
$ d=/a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973889
$ zgrep -l 'kind' $d/remote/*/log/valgrind/*
$ d=/a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973682
$ zgrep -l 'kind' $d/remote/*/log/valgrind/*
$ d=/a/yuriw-2022-08-22_20:21:58-rados-wip-yuri11-testing-2022-08-22-1005-distro-default-smithi/6986197
$ zgrep -l 'kind' $d/remote/*/log/valgrind/*
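
The check described above can be sketched in Python. This is a simplified illustration, not the actual code in qa/tasks/ceph.py: the function names find_valgrind_kinds and check_valgrind are hypothetical, and only the "expected valgrind issues and found none" message is taken from the traceback in the description; the other message is an assumption.

```python
import gzip
import re
from pathlib import Path

# Valgrind XML output records each finding as <kind>...</kind>.
KIND_RE = re.compile(r"<kind>(.*?)</kind>")


def find_valgrind_kinds(log_dir):
    """Return sorted, unique (file name, kind) pairs found in log_dir.

    Roughly what `zgrep '<kind>' ... | sort | uniq` yields; hypothetical helper.
    """
    found = set()
    for path in Path(log_dir).glob("*"):
        if not path.is_file():
            continue
        # Read gzip-compressed and plain logs alike, as zgrep does.
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt", errors="replace") as handle:
            for line in handle:
                for kind in KIND_RE.findall(line):
                    found.add((path.name, kind))
    return sorted(found)


def check_valgrind(kinds, expect_valgrind_errors):
    """Raise when the findings contradict what the job config expects."""
    if expect_valgrind_errors and not kinds:
        # Message as it appears in the traceback in the description.
        raise Exception("expected valgrind issues and found none")
    if kinds and not expect_valgrind_errors:
        # Illustrative message; the exact wording is an assumption.
        raise Exception("saw valgrind issues")
```

With an empty valgrind log directory and expect_valgrind_errors set to true, check_valgrind(find_valgrind_kinds(log_dir), True) raises the same exception these jobs hit, which matches the empty zgrep output above.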
Actions #5

Updated by Ernesto Puerta over 1 year ago

  • Tags set to test-failure
Actions #6

Updated by Radoslaw Zarzynski over 1 year ago

  • Assignee set to Nitzan Mordechai
  • Priority changed from Normal to High

Bumped the priority up: I'm afraid that the longer we wait to make sure valgrind is fully operational, the greater the risk that we'll be bombarded with leak reports once it is restored.

Actions #7

Updated by Nitzan Mordechai over 1 year ago

We are leaking memory with "ceph tell mon.a leak_some_memory", but for some reason no memory leak shows up in the valgrind logs.
I checked with and without tcmalloc; neither run shows any memory leak.
I also removed the valgrind.supp file and still couldn't spot the memory leak. Something is causing that leak to disappear.

Actions #8

Updated by Nitzan Mordechai over 1 year ago

  • Status changed from New to In Progress
Actions #9

Updated by Nitzan Mordechai over 1 year ago

This is a memory optimization "fault": the new gcc optimizes the code so that the memory we are trying to leak is not actually leaked.

Actions #10

Updated by Nitzan Mordechai over 1 year ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 47802
Actions #11

Updated by Laura Flores over 1 year ago

  • Backport set to quincy
Actions #12

Updated by Laura Flores over 1 year ago

/a/yuriw-2022-08-17_19:34:54-rados-wip-yuri7-testing-2022-08-17-0943-quincy-distro-default-smithi/6977767

Actions #13

Updated by Kefu Chai over 1 year ago

  • Status changed from Fix Under Review to Pending Backport
Actions #14

Updated by Backport Bot over 1 year ago

  • Copied to Backport #57346: quincy: expected valgrind issues and found none added
Actions #15

Updated by Backport Bot over 1 year ago

  • Tags set to backport_processed
Actions #17

Updated by Nitzan Mordechai 10 months ago

  • Status changed from Pending Backport to Resolved
Actions #18

Updated by Radoslaw Zarzynski 5 months ago

  • Related to Bug #63501: ceph::common::leak_some_memory() got interpreted as an actual leak added