Bug #17743
closed
ceph_test_objectstore & test_objectstore_memstore.sh crashes in qa run (kraken)
Added by Kefu Chai over 7 years ago.
Updated almost 7 years ago.
Description
2016-10-29T17:49:39.342 INFO:teuthology.orchestra.run.mira038.stderr: ceph version v11.0.2-791-g5354e7c (5354e7c26f1898c748240dae4c4fc63f6b3155d1)
2016-10-29T17:49:39.342 INFO:teuthology.orchestra.run.mira038.stderr: 1: (()+0x6928ca) [0x557ab8b688ca]
2016-10-29T17:49:39.342 INFO:teuthology.orchestra.run.mira038.stderr: 2: (()+0xf100) [0x7fdadec03100]
2016-10-29T17:49:39.342 INFO:teuthology.orchestra.run.mira038.stderr: 3: (gsignal()+0x37) [0x7fdadab645f7]
2016-10-29T17:49:39.342 INFO:teuthology.orchestra.run.mira038.stderr: 4: (abort()+0x148) [0x7fdadab65ce8]
2016-10-29T17:49:39.342 INFO:teuthology.orchestra.run.mira038.stderr: 5: (()+0x2e566) [0x7fdadab5d566]
2016-10-29T17:49:39.342 INFO:teuthology.orchestra.run.mira038.stderr: 6: (()+0x2e612) [0x7fdadab5d612]
2016-10-29T17:49:39.342 INFO:teuthology.orchestra.run.mira038.stderr: 7: (SyntheticWorkloadState::C_SyntheticOnReadable::finish(int)+0x351) [0x557ab87ee151]
2016-10-29T17:49:39.342 INFO:teuthology.orchestra.run.mira038.stderr: 8: (Context::complete(int)+0x9) [0x557ab87c5b39]
2016-10-29T17:49:39.342 INFO:teuthology.orchestra.run.mira038.stderr: 9: (Finisher::finisher_thread_entry()+0x216) [0x557ab89f8f76]
2016-10-29T17:49:39.343 INFO:teuthology.orchestra.run.mira038.stderr: 10: (()+0x7dc5) [0x7fdadebfbdc5]
2016-10-29T17:49:39.343 INFO:teuthology.orchestra.run.mira038.stderr: 11: (clone()+0x6d) [0x7fdadac25ced]
2016-10-29T17:49:39.343 INFO:teuthology.orchestra.run.mira038.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
http://pulpito.ceph.com/kchai-2016-10-29_17:34:40-rados-master---basic-mira/502147/
--gtest_filter=ObjectStore/StoreTest.Synthetic/1
- Description updated (diff)
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
#1 0x00007ffff383640a in __GI_abort () at abort.c:89
#2 0x00007ffff382de47 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x55555675da78 "bl_eq(state->contents[hoid].data, r2)",
file=file@entry=0x55555675c490 "/var/ceph/ceph/src/test/objectstore/store_test.cc", line=line@entry=3330,
function=function@entry=0x55555675f5e0 <SyntheticWorkloadState::C_SyntheticOnReadable::finish(int)::__PRETTY_FUNCTION__> "virtual void SyntheticWorkloadState::C_SyntheticOnReadable::finish(int)") at assert.c:92
#3 0x00007ffff382def2 in __GI___assert_fail (assertion=0x55555675da78 "bl_eq(state->contents[hoid].data, r2)",
file=0x55555675c490 "/var/ceph/ceph/src/test/objectstore/store_test.cc", line=3330,
function=0x55555675f5e0 <SyntheticWorkloadState::C_SyntheticOnReadable::finish(int)::__PRETTY_FUNCTION__> "virtual void SyntheticWorkloadState::C_SyntheticOnReadable::finish(int)") at assert.c:101
#4 0x0000555556036e19 in SyntheticWorkloadState::C_SyntheticOnReadable::finish (this=0x55555fce9170, r=290422) at /var/ceph/ceph/src/test/objectstore/store_test.cc:3330
#5 0x000055555602c64b in Context::complete (this=0x55555fce9170, r=0) at /var/ceph/ceph/src/include/Context.h:64
#6 0x000055555630e146 in Finisher::finisher_thread_entry (this=0x55555f023de0) at /var/ceph/ceph/src/common/Finisher.cc:68
#7 0x00005555560718d0 in Finisher::FinisherThread::entry (this=0x55555f023f38) at /var/ceph/ceph/src/common/Finisher.h:66
#8 0x00005555563d2dbb in Thread::entry_wrapper (this=0x55555f023f38) at /var/ceph/ceph/src/common/Thread.cc:89
#9 0x00005555563d2cf0 in Thread::_entry_func (arg=0x55555f023f38) at /var/ceph/ceph/src/common/Thread.cc:69
#10 0x00007ffff7bc4464 in start_thread (arg=0x7fffd3fff700) at pthread_create.c:333
#11 0x00007ffff38ea9df in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
git bisect shows that 933a1da6d7517b8215c0cc720e47374adedf381e is the offending commit.
Are you able to reproduce this locally?
Yes, consistently. Reverting 933a1da6d7517b8215c0cc720e47374adedf381e helps.
ceph_test_objectstore --gtest_filter=ObjectStore/StoreTest.Synthetic/1
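The bisect mentioned above can be automated around this reproducer with `git bisect run`, which treats exit status 0 as good, 125 as "skip this commit", and 1 to 127 (other than 125) as bad. A test that dies on an assertion usually exits with a status above 127, which stops the bisect instead of marking the commit bad, so it helps to collapse failures to 1. The following is only a sketch: `bisect_step` is a hypothetical helper, and the build command in the usage comment is an assumption, not something taken from this report.

```shell
#!/bin/sh
# Hypothetical helper for use from a script passed to `git bisect run`.
# It maps build and test results onto the exit codes git bisect expects:
#   0   -> commit is good
#   1   -> commit is bad
#   125 -> commit cannot be tested, skip it
bisect_step() {
    build_cmd=$1
    test_cmd=$2
    # A commit that does not build says nothing about the bug: skip it.
    sh -c "$build_cmd" || return 125
    # Collapse every test failure (including an abort) to a plain "bad".
    sh -c "$test_cmd" || return 1
    return 0
}

# Assumed usage (the build command is a placeholder for the local build step):
#   bisect_step "make ceph_test_objectstore" \
#       "ceph_test_objectstore --gtest_filter=ObjectStore/StoreTest.Synthetic/1"
```

For a failure that only shows up intermittently, the test command would need to be repeated several times per commit before a commit is declared good.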
I've tried a few different machines now but I can't reproduce this.
Can you generate a filestore = 20 log for me?
Sage, I will try to fix this if you don't have enough bandwidth today.
I just tested on ext4 and the problem disappears. It seems to be reproducible on btrfs.
- Assignee set to Kefu Chai
"ceph_test_objectstore --gtest_filter=\*/0" also crashes, see
https://jenkins.ceph.com/job/ceph-pull-requests/14959/consoleFull#-824916114d63714d2-c8d8-41fc-a9d4-8dee30be4c32
0> 2016-11-26 20:48:19.042533 7fd54a19a6c0 -1 *** Caught signal (Aborted) **
in thread 7fd54a19a6c0 thread_name:ceph_test_objec
ceph version 11.0.2-1993-g9cb008d (9cb008dedce2094f54bfeffb698f3092d5af0233)
1: (()+0x682fa2) [0x7fd549caffa2]
2: (()+0x10330) [0x7fd5491fa330]
3: (gsignal()+0x37) [0x7fd5471bdc37]
4: (abort()+0x148) [0x7fd5471c1028]
5: (()+0x2fbf6) [0x7fd5471b6bf6]
6: (()+0x2fca2) [0x7fd5471b6ca2]
7: (()+0x32dd08) [0x7fd54995ad08]
8: (doSyntheticTest(boost::scoped_ptr<ObjectStore>&, int, unsigned long, unsigned long, unsigned long)+0x55e) [0x7fd54990c25e]
9: (void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x33) [0x7fd549ca81f3]
10: (testing::Test::Run()+0xb7) [0x7fd549c9bc67]
11: (testing::TestInfo::Run()+0x9e) [0x7fd549c9bd0e]
12: (testing::TestCase::Run()+0xa5) [0x7fd549c9be15]
13: (testing::internal::UnitTestImpl::RunAllTests()+0x248) [0x7fd549c9c0c8]
14: (testing::UnitTest::Run()+0x54) [0x7fd549c9c384]
15: (main()+0x35f) [0x7fd549864b0f]
16: (__libc_start_main()+0xf5) [0x7fd5471a8f45]
17: (()+0x2a97b6) [0x7fd5498d67b6]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
- Subject changed from ceph_test_objectstore --gtest_filter=-*/2'" crashes in qa run to ceph_test_objectstore" crashes in qa run
I am able to reproduce the above failure using "ctest -R test_objectstore_memstore.sh -V --repeat-until-fail 400".
Total Test time (real) = 10373.15 sec
The following tests FAILED:
10 - test_objectstore_memstore.sh (Failed)
Errors while running CTest
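When running the test binary directly rather than through ctest, the same repeat-until-failure approach can be sketched in plain shell. `repeat_until_fail` below is a hypothetical helper, not part of Ceph or ctest. Whether a core file is actually written after `ulimit -c unlimited` also depends on the host's core dump settings.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to N times and stop at the first failure,
# returning that failure's exit status.
repeat_until_fail() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        "$@" || {
            rc=$?
            echo "failed on iteration $i with exit status $rc" >&2
            return "$rc"
        }
        i=$((i + 1))
    done
    echo "passed $max iterations" >&2
    return 0
}

# Assumed usage: lift the core size limit, then loop on the memstore variant.
#   ulimit -c unlimited
#   repeat_until_fail 400 ceph_test_objectstore \
#       --gtest_filter=ObjectStore/StoreTest.Synthetic/0
```

The helper returns the exit status of the failing run, so a wrapper script can tell an assertion abort from an ordinary gtest failure.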
- Has duplicate Bug #18027: test_objectstore_memstore.sh transient failure added
- Subject changed from ceph_test_objectstore" crashes in qa run to ceph_test_objectstore & test_objectstore_memstore.sh crashes in qa run
Updating the title so that it matches when looking for test_objectstore_memstore.sh failures
Tried to repeat the failure as instructed by Kefu and to get a core, but could not (3a9bcaa4aa6042587886c0eaae0ce4eeeb8f8164)
$ ctest -R test_objectstore_memstore.sh --repeat-until-fail 4000
Test project /slow/loic/ceph-ubuntu-14.04-loic/build
Start 10: test_objectstore_memstore.sh
1/1 Test #10: test_objectstore_memstore.sh ..... Passed 90.86 sec
100% tests passed, 0 tests failed out of 1
Total Test time (real) = 90.92 sec
https://jenkins.ceph.com/job/ceph-pull-requests/15167/console
7/154 Test #10: test_objectstore_memstore.sh ............***Failed 43.45 sec
...
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/objectstore/store_test.cc:3750: Failure
Expected: r
Which is: 28697
To be equal to: 0
...
[ FAILED ] ObjectStore/StoreTest.Synthetic/0, where GetParam() = "memstore" (34805 ms)
https://jenkins.ceph.com/job/ceph-pull-requests/15166/console
3/154 Test #10: test_objectstore_memstore.sh ............***Failed 19.38 sec
...
0> 2016-12-01 13:37:34.675656 7ff2a4f18700 -1 *** Caught signal (Aborted) **
in thread 7ff2a4f18700 thread_name:fn_anonymous
ceph version 11.0.2-2148-g99305be (99305be55d0e60e5f6d06aebe4f6a8c1f5832e60)
1: (()+0x682922) [0x7ff2aa45d922]
2: (()+0x10330) [0x7ff2a99a8330]
3: (gsignal()+0x37) [0x7ff2a796bc37]
4: (abort()+0x148) [0x7ff2a796f028]
5: (()+0x2fbf6) [0x7ff2a7964bf6]
6: (()+0x2fca2) [0x7ff2a7964ca2]
7: (SyntheticWorkloadState::C_SyntheticOnReadable::finish(int)+0x367) [0x7ff2aa100607]
8: (Context::complete(int)+0x9) [0x7ff2aa0ddcc9]
9: (Finisher::finisher_thread_entry()+0x1e6) [0x7ff2aa2fd456]
10: (()+0x8184) [0x7ff2a99a0184]
11: (clone()+0x6d) [0x7ff2a7a2f37d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
- Status changed from New to Fix Under Review
- Assignee changed from Kefu Chai to Loïc Dachary
- Status changed from Fix Under Review to Need More Info
This failure is plaguing kraken backports - see e.g.
- Project changed from Ceph to RADOS
- Subject changed from ceph_test_objectstore & test_objectstore_memstore.sh crashes in qa run to ceph_test_objectstore & test_objectstore_memstore.sh crashes in qa run (kraken)
- Status changed from Need More Info to Won't Fix