Bug #57618
rados/test.sh hang and pkilled (LibRadosWatchNotifyEC.WatchNotify)
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Job stopped with
2022-09-15T12:49:22.055 INFO:tasks.workunit.client.0.smithi150.stdout: api_tier_pp: [==========] 77 tests from 4 test suites ran. (1973701 ms total)
2022-09-15T12:49:22.056 INFO:tasks.workunit.client.0.smithi150.stdout: api_tier_pp: [ PASSED ] 77 tests.
2022-09-15T12:49:22.056 INFO:tasks.workunit.client.0.smithi150.stderr:+ exit 1
2022-09-15T12:49:22.057 INFO:tasks.workunit.client.0.smithi150.stderr:+ cleanup
2022-09-15T12:49:22.057 INFO:tasks.workunit.client.0.smithi150.stderr:+ pkill -P 120635
2022-09-15T12:49:22.058 DEBUG:teuthology.orchestra.run:got remote process result: 1
because of this earlier error:
2022-09-15T12:37:48.615 DEBUG:teuthology.orchestra.run.smithi150:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 30 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok dump_ops_in_flight
2022-09-15T12:37:48.653 INFO:tasks.workunit.client.0.smithi150.stdout: api_watch_notify_pp: Running main() from gmock_main.cc
2022-09-15T12:37:48.654 INFO:tasks.workunit.client.0.smithi150.stdout: api_watch_notify_pp: [==========] Running 16 tests from 2 test suites.
2022-09-15T12:37:48.654 INFO:tasks.workunit.client.0.smithi150.stdout: api_watch_notify_pp: [----------] Global test environment set-up.
2022-09-15T12:37:48.654 INFO:tasks.workunit.client.0.smithi150.stdout: api_watch_notify_pp: [----------] 2 tests from LibRadosWatchNotifyECPP
2022-09-15T12:37:48.655 INFO:tasks.workunit.client.0.smithi150.stdout: api_watch_notify_pp: [ RUN ] LibRadosWatchNotifyECPP.WatchNotify
2022-09-15T12:37:48.656 INFO:tasks.workunit.client.0.smithi150.stderr:bash: line 1: 120883 Alarm clock ceph_test_rados_api_watch_notify_pp 2>&1
2022-09-15T12:37:48.656 INFO:tasks.workunit.client.0.smithi150.stderr: 120884 Done | tee ceph_test_rados_api_watch_notify_pp.log
2022-09-15T12:37:48.656 INFO:tasks.workunit.client.0.smithi150.stderr: 120885 Done | sed "s/^/ api_watch_notify_pp: /"
2022-09-15T12:37:48.656 INFO:tasks.workunit.client.0.smithi150.stderr:+ echo 'error in api_watch_notify_pp (120879)'
2022-09-15T12:37:48.657 INFO:tasks.workunit.client.0.smithi150.stdout:error in api_watch_notify_pp (120879)
The alarm clock fired after 1200 seconds; the test's log output was delayed until the process was killed.
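bash reports a job as "Alarm clock" when it is terminated by SIGALRM, whose default action kills the process. The following Python sketch, assuming a POSIX system, reproduces that outcome: it arms an alarm in the child before exec (a pending alarm is preserved across execve) and detects the termination from the negative return code. `run_with_alarm` is a hypothetical helper for illustration only; it is not how the Ceph test binaries arm their 1200-second timeout.

```python
import signal
import subprocess
from typing import List, Tuple


def run_with_alarm(argv: List[str], seconds: int) -> Tuple[int, bool]:
    """Run argv with an alarm armed; report whether SIGALRM killed it.

    Hypothetical helper for illustration; not part of the Ceph test harness.
    """
    # Arm the alarm in the child between fork and exec; the pending alarm
    # survives execve, so it fires in the exec'd program.
    proc = subprocess.Popen(argv, preexec_fn=lambda: signal.alarm(seconds))
    rc = proc.wait()
    # subprocess reports death-by-signal as a negative return code.
    return rc, rc == -signal.SIGALRM
```

For example, `run_with_alarm(["sleep", "5"], 1)` returns `(-14, True)` on Linux, where SIGALRM is signal 14, while a command that finishes before the alarm returns `(0, False)`.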
/a/nmordech-2022-09-15_08:35:17-rados:verify-wip-nm-51282-distro-default-smithi/7033827
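The two excerpts are about 11.5 minutes apart: api_watch_notify_pp failed at 12:37:48, yet the job only exited at 12:49:22, after api_tier_pp finished. That is consistent with a runner that starts all tests in the background, waits on each one, and exits 1 only at the end. A minimal Python sketch of that pattern follows, assuming that behavior; `run_all` is hypothetical and is not the actual rados/test.sh.

```python
import subprocess
from typing import Dict, List, Tuple


def run_all(tests: Dict[str, List[str]]) -> Tuple[int, List[str]]:
    """Start every test at once, wait for all of them, then report failures.

    Hypothetical sketch of a fan-out test runner; not the real rados/test.sh.
    """
    # Launch everything up front so the tests run concurrently.
    procs = {name: subprocess.Popen(argv) for name, argv in tests.items()}
    failed: List[str] = []
    # Wait on each test in turn; a failure is only recorded, so the runner
    # keeps waiting for the slower tests that are still in flight.
    for name, proc in procs.items():
        if proc.wait() != 0:
            print(f"error in {name} ({proc.pid})")
            failed.append(name)
    # The overall exit status is decided only after every test has finished.
    return (1 if failed else 0), failed
```

With this design an early failure cannot shorten the run, which matches the gap between the "error in api_watch_notify_pp" line and the final "exit 1".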
Related issues
History
#1 Updated by Nitzan Mordechai 4 months ago
This only happens with EC pools. The hang occurs when not all OSDs are up, but I'm still not sure whether we are supposed to wait.
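As a rough model of why missing OSDs can turn into a hang rather than an error: an erasure-coded PG needs at least `min_size` of its shards available (with `min_size` at least `k`) before it serves client I/O, and ops against an inactive PG wait instead of failing. The sketch below assumes those semantics and greatly simplifies real peering; `ec_pg_can_serve_io` and its parameters are hypothetical, with `None` standing in for a missing shard in the acting set.

```python
from typing import Optional, Sequence, Set


def ec_pg_can_serve_io(acting: Sequence[Optional[int]], up_osds: Set[int],
                       k: int, min_size: int) -> bool:
    """Return True if enough shards of an EC PG are available for client I/O.

    Hypothetical, simplified model; real Ceph peering is far more involved.
    """
    # An EC PG can never serve I/O with fewer than k shards, and it cannot
    # require more shards than the acting set holds.
    if not k <= min_size <= len(acting):
        raise ValueError("expected k <= min_size <= len(acting)")
    # Count shards whose OSD is both assigned (not None) and currently up.
    available = sum(1 for osd in acting if osd is not None and osd in up_osds)
    return available >= min_size
```

With k=2, m=2 and min_size=3, losing two of the four OSDs in the acting set leaves `available` at 2, so the PG cannot serve I/O and a client op against it would block.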
#2 Updated by Radoslaw Zarzynski 4 months ago
Note from a scrub: might be worth talking about.
#3 Updated by Nitzan Mordechai 3 months ago
Some of the OSDs stopped due to valgrind errors. This is a duplicate of another bug.
#4 Updated by Nitzan Mordechai 3 months ago
- Pull request ID set to 48641
#5 Updated by Nitzan Mordechai 3 months ago
- Status changed from New to Fix Under Review
#6 Updated by Nitzan Mordechai 3 months ago
- Related to Bug #52136: Valgrind reports memory "Leak_DefinitelyLost" errors. added
#7 Updated by Nitzan Mordechai 3 months ago
- Related to Bug #57751: LibRadosAio.SimpleWritePP hang and pkill added
#8 Updated by Nitzan Mordechai 3 months ago
- Related to Bug #53575: Valgrind reports memory "Leak_PossiblyLost" errors concerning lib64 added
#9 Updated by Laura Flores about 2 months ago
/a/yuriw-2022-11-29_22:29:58-rados-wip-yuri10-testing-2022-11-29-1005-pacific-distro-default-smithi/7097464/