Bug #55001
closedrados/test.sh: Early exit right after LibRados global tests complete
0%
Description
This failure was previously tracked in issue #50042, but it has come up enough that it warrants its own Tracker. See issue #50042 for earlier instances.
/a/yuriw-2022-03-19_14:37:23-rados-wip-yuri6-testing-2022-03-18-1104-distro-default-smithi/6747001
2022-03-19T19:56:53.724 INFO:tasks.workunit.client.0.smithi053.stdout: api_tier_pp: [----------] Global test environment tear-down
2022-03-19T19:56:53.725 INFO:tasks.workunit.client.0.smithi053.stdout: api_tier_pp: [==========] 77 tests from 4 test suites ran. (3861916 ms total)
2022-03-19T19:56:53.725 INFO:tasks.workunit.client.0.smithi053.stdout: api_tier_pp: [ PASSED ] 77 tests.
2022-03-19T19:56:53.726 INFO:tasks.workunit.client.0.smithi053.stderr:+ exit 1
2022-03-19T19:56:53.726 INFO:tasks.workunit.client.0.smithi053.stderr:+ cleanup
2022-03-19T19:56:53.726 INFO:tasks.workunit.client.0.smithi053.stderr:+ pkill -P 48582
2022-03-19T19:56:53.732 INFO:tasks.workunit.client.0.smithi053.stderr:+ true
2022-03-19T19:56:53.733 DEBUG:teuthology.orchestra.run:got remote process result: 1
2022-03-19T19:56:53.734 INFO:tasks.workunit:Stopping ['rados/test.sh'] on client.0...
2022-03-19T19:56:53.734 DEBUG:teuthology.orchestra.run.smithi053:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
2022-03-19T19:56:53.776 INFO:tasks.ceph.osd.1.smithi053.stderr:2022-03-19T19:56:53.773+0000 ee74700 -1 received signal: Hangup from /usr/bin/python3 /bin/daemon-helper term env OPENSSL_ia32cap=~0x1000000000000000 valgrind --trace-children=no --child-silent-after-fork=yes --soname-synonyms=somalloc=*tcmalloc* --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/osd.1.log --time-stamp=yes --vgdb=yes --exit-on-first-error=yes --error-exitcode=42 --tool=memcheck ceph-osd -f --cluster ceph -i 1 (PID: 268905) UID: 0
2022-03-19T19:56:53.778 INFO:teuthology.orchestra.run.smithi053.stdout:{
2022-03-19T19:56:53.778 INFO:teuthology.orchestra.run.smithi053.stdout: "success": ""
2022-03-19T19:56:53.779 INFO:teuthology.orchestra.run.smithi053.stdout:}
2022-03-19T19:56:53.843 INFO:tasks.ceph.osd.5.smithi125.stderr:2022-03-19T19:56:53.840+0000 ee74700 -1 received signal: Hangup from /usr/bin/python3 /bin/daemon-helper term env OPENSSL_ia32cap=~0x1000000000000000 valgrind --trace-children=no --child-silent-after-fork=yes --soname-synonyms=somalloc=*tcmalloc* --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/osd.5.log --time-stamp=yes --vgdb=yes --exit-on-first-error=yes --error-exitcode=42 --tool=memcheck ceph-osd -f --cluster ceph -i 5 (PID: 107842) UID: 0
2022-03-19T19:56:53.861 INFO:teuthology.orchestra.run.smithi053.stderr:2022-03-19T19:56:53.858+0000 7f4aeaffd700 1 -- 172.21.15.53:0/1210551706 <== mon.1 v2:172.21.15.125:3300/0 6 ==== mon_command_ack([{"prefix": "get_command_descriptions"}]=0 v0) v1 ==== 72+0+162503 (secure 0 0 0) 0x7f4aec04fc60 con 0x7f4b040ae240
2022-03-19T19:56:53.862 INFO:teuthology.orchestra.run.smithi053.stderr:2022-03-19T19:56:53.858+0000 7f4aeaffd700 10 monclient: handle_mon_command_ack 1 [{"prefix": "get_command_descriptions"}]
2022-03-19T19:56:53.862 INFO:teuthology.orchestra.run.smithi053.stderr:2022-03-19T19:56:53.858+0000 7f4aeaffd700 10 monclient: _finish_command 1 = system:0
2022-03-19T19:56:53.912 INFO:teuthology.orchestra.run.smithi053.stdout:ERROR: (22) Invalid argument
2022-03-19T19:56:53.912 INFO:teuthology.orchestra.run.smithi053.stdout:op_tracker tracking is not enabled now, so no ops are tracked currently, even those get stuck. Please enable "osd_enable_op_tracker", and the tracker will start to track new ops received afterwards.
2022-03-19T19:56:53.925 DEBUG:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 30 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-osd.6.asok dump_ops_in_flight
2022-03-19T19:56:53.944 INFO:tasks.ceph.osd.0.smithi053.stderr:2022-03-19T19:56:53.940+0000 ee74700 -1 received signal: Hangup from /usr/bin/python3 /bin/daemon-helper term env OPENSSL_ia32cap=~0x1000000000000000 valgrind --trace-children=no --child-silent-after-fork=yes --soname-synonyms=somalloc=*tcmalloc* --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/osd.0.log --time-stamp=yes --vgdb=yes --exit-on-first-error=yes --error-exitcode=42 --tool=memcheck ceph-osd -f --cluster ceph -i 0 (PID: 373503) UID: 0
2022-03-19T19:56:54.039 ERROR:teuthology.run_tasks:Saw exception from tasks.
Updated by Laura Flores about 2 years ago
- Related to Bug #50042: rados/test.sh: api_watch_notify failures added
Updated by Radoslaw Zarzynski about 2 years ago
I think we can't be sure the timeout is because of api_tier_pp
. To be conclusive we need to check who had the PID 48582
.
Updated by Laura Flores almost 2 years ago
/a/yuriw-2022-06-10_03:10:47-rados-wip-yuri4-testing-2022-06-09-1510-quincy-distro-default-smithi/6871955
Coredump on smithi060, but no evidence of any kind of assertion/segfault failures in the teuthology log.
Updated by Kamoltat (Junior) Sirivadhna almost 2 years ago
/a/yuriw-2022-06-29_13:30:16-rados-wip-yuri3-testing-2022-06-28-1737-distro-default-smithi/6905499
Updated by Matan Breizman almost 2 years ago
/a/yuriw-2022-07-15_19:06:53-rados-wip-yuri-testing-2022-07-15-0950-octopus-distro-default-smithi/6932690
Updated by Nitzan Mordechai almost 2 years ago
We have coredump and the console_log showing:
smithi042.log:[ 852.382596] ceph_test_rados110223: segfault at 0 ip 00007f26e7643e8e sp 00007ffeef9a9460 error 6 in librados.so.2.0.0[7f26e75f7000+15a000]
checking that
Updated by Nitzan Mordechai almost 2 years ago
the core dump showing:
#0 librados::IoCtxImpl::put (this=0x0) at /usr/include/c++/8/bits/atomic_base.h:303
#1 _rados_ioctx_destroy (io=0x0) at /usr/src/debug/ceph-17.2.0-434.g718f6c9e.el8.x86_64/src/librados/librados_c.cc:1177
#2 0x000055f5ae16b101 in RadosTestEC::TearDown (this=0x7fcba8075c40) at /usr/src/debug/ceph-17.2.0-434.g718f6c9e.el8.x86_64/src/test/librados/TestCase.cc:188
There are 2 issues:
1. the Tests doesn't check that ioctx was initialized
2. the C api doesn't check that ioctx initilized before calling the ctx->put()
extern "C" void LIBRADOS_C_API_DEFAULT_F(rados_ioctx_destroy)(rados_ioctx_t io)
{
tracepoint(librados, rados_ioctx_destroy_enter, io);
librados::IoCtxImpl *ctx = (librados::IoCtxImpl *)io;
ctx->put();
tracepoint(librados, rados_ioctx_destroy_exit);
}
unlike the c++ api:
void librados::IoCtx::close()
{
if (io_ctx_impl)
io_ctx_impl->put();
io_ctx_impl = 0;
}
Updated by Nitzan Mordechai almost 2 years ago
- Status changed from New to Fix Under Review
- Pull request ID set to 47165
Updated by Kamoltat (Junior) Sirivadhna almost 2 years ago
/a/yuriw-2022-07-22_03:30:40-rados-wip-yuri3-testing-2022-07-21-1604-distro-default-smithi/6943763/
Updated by Yuri Weinstein over 1 year ago
Updated by Laura Flores over 1 year ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to quincy,pacific
Updated by Backport Bot over 1 year ago
- Copied to Backport #57029: pacific: rados/test.sh: Early exit right after LibRados global tests complete added
Updated by Backport Bot over 1 year ago
- Copied to Backport #57030: quincy: rados/test.sh: Early exit right after LibRados global tests complete added
Updated by Konstantin Shalygin over 1 year ago
- Status changed from Pending Backport to Resolved
- Tags deleted (
backport_processed)