Bug #57618

closed

rados/test.sh hang and pkilled (LibRadosWatchNotifyEC.WatchNotify)

Added by Nitzan Mordechai over 1 year ago. Updated 3 months ago.

Status: Resolved
Priority: Normal
Category: -
Target version: -
% Done: 100%
Source: Community (dev)
Tags: backport_processed
Backport: pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The job stopped with:

2022-09-15T12:49:22.055 INFO:tasks.workunit.client.0.smithi150.stdout:              api_tier_pp: [==========] 77 tests from 4 test suites ran. (1973701 ms total)
2022-09-15T12:49:22.056 INFO:tasks.workunit.client.0.smithi150.stdout:              api_tier_pp: [  PASSED  ] 77 tests.
2022-09-15T12:49:22.056 INFO:tasks.workunit.client.0.smithi150.stderr:+ exit 1
2022-09-15T12:49:22.057 INFO:tasks.workunit.client.0.smithi150.stderr:+ cleanup
2022-09-15T12:49:22.057 INFO:tasks.workunit.client.0.smithi150.stderr:+ pkill -P 120635
2022-09-15T12:49:22.058 DEBUG:teuthology.orchestra.run:got remote process result: 1

because of an earlier error in:

2022-09-15T12:37:48.615 DEBUG:teuthology.orchestra.run.smithi150:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 30 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok dump_ops_in_flight
2022-09-15T12:37:48.653 INFO:tasks.workunit.client.0.smithi150.stdout:                  api_watch_notify_pp: Running main() from gmock_main.cc
2022-09-15T12:37:48.654 INFO:tasks.workunit.client.0.smithi150.stdout:      api_watch_notify_pp: [==========] Running 16 tests from 2 test suites.
2022-09-15T12:37:48.654 INFO:tasks.workunit.client.0.smithi150.stdout:      api_watch_notify_pp: [----------] Global test environment set-up.
2022-09-15T12:37:48.654 INFO:tasks.workunit.client.0.smithi150.stdout:      api_watch_notify_pp: [----------] 2 tests from LibRadosWatchNotifyECPP
2022-09-15T12:37:48.655 INFO:tasks.workunit.client.0.smithi150.stdout:      api_watch_notify_pp: [ RUN      ] LibRadosWatchNotifyECPP.WatchNotify
2022-09-15T12:37:48.656 INFO:tasks.workunit.client.0.smithi150.stderr:bash: line 1: 120883 Alarm clock             ceph_test_rados_api_watch_notify_pp 2>&1
2022-09-15T12:37:48.656 INFO:tasks.workunit.client.0.smithi150.stderr:     120884 Done                    | tee ceph_test_rados_api_watch_notify_pp.log
2022-09-15T12:37:48.656 INFO:tasks.workunit.client.0.smithi150.stderr:     120885 Done                    | sed "s/^/      api_watch_notify_pp: /" 
2022-09-15T12:37:48.656 INFO:tasks.workunit.client.0.smithi150.stderr:+ echo 'error in api_watch_notify_pp (120879)'
2022-09-15T12:37:48.657 INFO:tasks.workunit.client.0.smithi150.stdout:error in api_watch_notify_pp (120879)

The alarm clock fired after 1200 seconds; the log output was not printed until the process was killed.

/a/nmordech-2022-09-15_08:35:17-rados:verify-wip-nm-51282-distro-default-smithi/7033827
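
The "Alarm clock" text in the log is how bash reports a job that was terminated by SIGALRM, whose default action is to end the process. The sketch below is only an illustration and is not the Ceph test harness: it assumes a stand-in `sleep 60` for the hung test binary and an external watchdog that sends SIGALRM after 2 seconds (in place of the 1200-second timer described above). It shows that a process killed this way exits with status 128 + 14 = 142.

```shell
#!/usr/bin/env bash
# Illustration only: a hypothetical stand-in for a hung test,
# terminated by SIGALRM after a short deadline.

deadline=2   # assumption: 2 s stands in for the 1200 s timeout

# Stand-in for the hung test binary: it just sleeps.
sleep 60 &
pid=$!

# Watchdog: after the deadline, deliver SIGALRM. Its default action
# terminates the target process.
( sleep "$deadline"; kill -ALRM "$pid" ) &

# Wait for the "test" and capture how it ended.
wait "$pid"
status=$?

# A process killed by signal N reports status 128+N; SIGALRM is 14 on Linux.
echo "exit status: $status"          # 142
echo "signal: $(kill -l "$status")"  # ALRM
```

Running it prints an exit status of 142 and the signal name ALRM, the same signal bash reported as "Alarm clock" for ceph_test_rados_api_watch_notify_pp.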


Related issues: 5 (0 open, 5 closed)

Related to RADOS - Bug #52136: Valgrind reports memory "Leak_DefinitelyLost" errors. (Resolved, Nitzan Mordechai)
Related to RADOS - Bug #57751: LibRadosAio.SimpleWritePP hang and pkill (Resolved, Nitzan Mordechai)
Related to RADOS - Bug #53575: Valgrind reports memory "Leak_PossiblyLost" errors concerning lib64 (Resolved, Nitzan Mordechai)
Copied to RADOS - Backport #59627: quincy: rados/test.sh hang and pkilled (LibRadosWatchNotifyEC.WatchNotify) (Resolved, Nitzan Mordechai)
Copied to RADOS - Backport #59628: pacific: rados/test.sh hang and pkilled (LibRadosWatchNotifyEC.WatchNotify) (Resolved, Nitzan Mordechai)

#1

Updated by Nitzan Mordechai over 1 year ago

This only happens with EC pools: the hang occurs when not all OSDs are up. Still, I'm not sure whether we are supposed to wait.

#2

Updated by Radoslaw Zarzynski over 1 year ago

Note from a scrub: might be worth talking about.

#3

Updated by Nitzan Mordechai over 1 year ago

Some of the OSDs stopped due to Valgrind errors. This is a duplicate of another bug.

#4

Updated by Nitzan Mordechai over 1 year ago

  • Pull request ID set to 48641

#5

Updated by Nitzan Mordechai over 1 year ago

  • Status changed from New to Fix Under Review

#6

Updated by Nitzan Mordechai over 1 year ago

  • Related to Bug #52136: Valgrind reports memory "Leak_DefinitelyLost" errors. added

#7

Updated by Nitzan Mordechai over 1 year ago

  • Related to Bug #57751: LibRadosAio.SimpleWritePP hang and pkill added

#8

Updated by Nitzan Mordechai over 1 year ago

  • Related to Bug #53575: Valgrind reports memory "Leak_PossiblyLost" errors concerning lib64 added

#9

Updated by Laura Flores over 1 year ago

/a/yuriw-2022-11-29_22:29:58-rados-wip-yuri10-testing-2022-11-29-1005-pacific-distro-default-smithi/7097464/

#10

Updated by Laura Flores 12 months ago

  • Status changed from Fix Under Review to Pending Backport

#11

Updated by Laura Flores 12 months ago

  • Backport set to pacific,quincy

#12

Updated by Backport Bot 12 months ago

  • Copied to Backport #59627: quincy: rados/test.sh hang and pkilled (LibRadosWatchNotifyEC.WatchNotify) added

#13

Updated by Backport Bot 12 months ago

  • Copied to Backport #59628: pacific: rados/test.sh hang and pkilled (LibRadosWatchNotifyEC.WatchNotify) added

#14

Updated by Backport Bot 12 months ago

  • Tags set to backport_processed

#15

Updated by Nitzan Mordechai 12 months ago

  • Backport changed from pacific,quincy to pacific

#16

Updated by Konstantin Shalygin 3 months ago

  • Status changed from Pending Backport to Resolved
  • % Done changed from 0 to 100
  • Source set to Community (dev)