Bug #57618

rados/test.sh hang and pkilled (LibRadosWatchNotifyEC.WatchNotify)

Added by Nitzan Mordechai 4 months ago. Updated about 2 months ago.

Status: Fix Under Review
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The job stopped with:

2022-09-15T12:49:22.055 INFO:tasks.workunit.client.0.smithi150.stdout:              api_tier_pp: [==========] 77 tests from 4 test suites ran. (1973701 ms total)
2022-09-15T12:49:22.056 INFO:tasks.workunit.client.0.smithi150.stdout:              api_tier_pp: [  PASSED  ] 77 tests.
2022-09-15T12:49:22.056 INFO:tasks.workunit.client.0.smithi150.stderr:+ exit 1
2022-09-15T12:49:22.057 INFO:tasks.workunit.client.0.smithi150.stderr:+ cleanup
2022-09-15T12:49:22.057 INFO:tasks.workunit.client.0.smithi150.stderr:+ pkill -P 120635
2022-09-15T12:49:22.058 DEBUG:teuthology.orchestra.run:got remote process result: 1

because of an earlier error in api_watch_notify_pp:

2022-09-15T12:37:48.615 DEBUG:teuthology.orchestra.run.smithi150:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 30 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok dump_ops_in_flight
2022-09-15T12:37:48.653 INFO:tasks.workunit.client.0.smithi150.stdout:                  api_watch_notify_pp: Running main() from gmock_main.cc
2022-09-15T12:37:48.654 INFO:tasks.workunit.client.0.smithi150.stdout:      api_watch_notify_pp: [==========] Running 16 tests from 2 test suites.
2022-09-15T12:37:48.654 INFO:tasks.workunit.client.0.smithi150.stdout:      api_watch_notify_pp: [----------] Global test environment set-up.
2022-09-15T12:37:48.654 INFO:tasks.workunit.client.0.smithi150.stdout:      api_watch_notify_pp: [----------] 2 tests from LibRadosWatchNotifyECPP
2022-09-15T12:37:48.655 INFO:tasks.workunit.client.0.smithi150.stdout:      api_watch_notify_pp: [ RUN      ] LibRadosWatchNotifyECPP.WatchNotify
2022-09-15T12:37:48.656 INFO:tasks.workunit.client.0.smithi150.stderr:bash: line 1: 120883 Alarm clock             ceph_test_rados_api_watch_notify_pp 2>&1
2022-09-15T12:37:48.656 INFO:tasks.workunit.client.0.smithi150.stderr:     120884 Done                    | tee ceph_test_rados_api_watch_notify_pp.log
2022-09-15T12:37:48.656 INFO:tasks.workunit.client.0.smithi150.stderr:     120885 Done                    | sed "s/^/      api_watch_notify_pp: /" 
2022-09-15T12:37:48.656 INFO:tasks.workunit.client.0.smithi150.stderr:+ echo 'error in api_watch_notify_pp (120879)'
2022-09-15T12:37:48.657 INFO:tasks.workunit.client.0.smithi150.stdout:error in api_watch_notify_pp (120879)

The alarm clock signal (SIGALRM) was raised after 1200 seconds; the log output was delayed until the process was killed.
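"Alarm clock" is how bash reports a child that was terminated by SIGALRM. Below is a minimal sketch of that mechanism, assuming the test binary arms a timer with alarm() and installs no handler; the report does not show the test source. The 1-second timeout (standing in for the 1200 seconds above) and the child script are placeholders for illustration only.

```python
import signal
import subprocess
import sys

# Hypothetical child: arm a 1-second alarm (the report mentions 1200 seconds),
# then block the way a hung test would. No SIGALRM handler is installed, so
# the default action terminates the process. POSIX only.
CHILD = "import signal, time; signal.alarm(1); time.sleep(30)"


def run_hung_child():
    """Run the child and return its exit status as seen by the parent."""
    proc = subprocess.run([sys.executable, "-c", CHILD])
    return proc.returncode


if __name__ == "__main__":
    rc = run_hung_child()
    # subprocess reports death by signal as the negated signal number.
    print(rc == -signal.SIGALRM)
```

The parent sees a negative return code equal to the signal number, which is the same condition a shell reports as "Alarm clock".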

/a/nmordech-2022-09-15_08:35:17-rados:verify-wip-nm-51282-distro-default-smithi/7033827
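When scanning a long teuthology log for this failure, it can help to pull out the "Alarm clock" and "error in" lines automatically. The following is a rough sketch only: the regular expressions and the `summarize` helper are hypothetical and were written against the excerpts quoted above, so they may need adjusting for other log layouts.

```python
import re

# Patterns inferred from the log excerpts in this report (assumptions).
ALARM_RE = re.compile(r"line \d+: (?P<pid>\d+) Alarm clock\s+(?P<cmd>\S+)")
ERROR_RE = re.compile(r"stdout:error in (?P<unit>\S+) \((?P<pid>\d+)\)")

SAMPLE = [
    "2022-09-15T12:37:48.656 INFO:tasks.workunit.client.0.smithi150.stderr:"
    "bash: line 1: 120883 Alarm clock             "
    "ceph_test_rados_api_watch_notify_pp 2>&1",
    "2022-09-15T12:37:48.656 INFO:tasks.workunit.client.0.smithi150.stderr:"
    "+ echo 'error in api_watch_notify_pp (120879)'",
    "2022-09-15T12:37:48.657 INFO:tasks.workunit.client.0.smithi150.stdout:"
    "error in api_watch_notify_pp (120879)",
]


def summarize(lines):
    """Return (alarms, errors) found in the given log lines."""
    alarms, errors = [], []
    for line in lines:
        m = ALARM_RE.search(line)
        if m:
            alarms.append((int(m.group("pid")), m.group("cmd")))
        m = ERROR_RE.search(line)
        if m:
            errors.append((m.group("unit"), int(m.group("pid"))))
    return alarms, errors


if __name__ == "__main__":
    alarms, errors = summarize(SAMPLE)
    print("killed by SIGALRM:", alarms)
    print("failed workunits:", errors)
```

Matching only the stdout copy of the "error in" message avoids counting the shell trace line (`+ echo ...`) as a second failure.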


Related issues

Related to RADOS - Bug #52136: Valgrind reports memory "Leak_DefinitelyLost" errors. (Pending Backport)
Related to RADOS - Bug #57751: LibRadosAio.SimpleWritePP hang and pkill (Resolved)
Related to RADOS - Bug #53575: Valgrind reports memory "Leak_PossiblyLost" errors concerning lib64 (Resolved)

History

#1 Updated by Nitzan Mordechai 4 months ago

This only happens with EC pools. The hang occurs when not all OSDs are up, but I'm still not sure whether we are supposed to wait.

#2 Updated by Radoslaw Zarzynski 4 months ago

Note from a scrub: might be worth talking about.

#3 Updated by Nitzan Mordechai 3 months ago

Some of the OSDs stopped due to valgrind errors. This is a duplicate of another bug.

#4 Updated by Nitzan Mordechai 3 months ago

  • Pull request ID set to 48641

#5 Updated by Nitzan Mordechai 3 months ago

  • Status changed from New to Fix Under Review

#6 Updated by Nitzan Mordechai 3 months ago

  • Related to Bug #52136: Valgrind reports memory "Leak_DefinitelyLost" errors. added

#7 Updated by Nitzan Mordechai 3 months ago

  • Related to Bug #57751: LibRadosAio.SimpleWritePP hang and pkill added

#8 Updated by Nitzan Mordechai 3 months ago

  • Related to Bug #53575: Valgrind reports memory "Leak_PossiblyLost" errors concerning lib64 added

#9 Updated by Laura Flores about 2 months ago

/a/yuriw-2022-11-29_22:29:58-rados-wip-yuri10-testing-2022-11-29-1005-pacific-distro-default-smithi/7097464/
