Bug #17743
Closed
ceph_test_objectstore & test_objectstore_memstore.sh crashes in qa run (kraken)
Description
2016-10-29T17:49:39.342 INFO:teuthology.orchestra.run.mira038.stderr: ceph version v11.0.2-791-g5354e7c (5354e7c26f1898c748240dae4c4fc63f6b3155d1)
2016-10-29T17:49:39.342 INFO:teuthology.orchestra.run.mira038.stderr: 1: (()+0x6928ca) [0x557ab8b688ca]
2016-10-29T17:49:39.342 INFO:teuthology.orchestra.run.mira038.stderr: 2: (()+0xf100) [0x7fdadec03100]
2016-10-29T17:49:39.342 INFO:teuthology.orchestra.run.mira038.stderr: 3: (gsignal()+0x37) [0x7fdadab645f7]
2016-10-29T17:49:39.342 INFO:teuthology.orchestra.run.mira038.stderr: 4: (abort()+0x148) [0x7fdadab65ce8]
2016-10-29T17:49:39.342 INFO:teuthology.orchestra.run.mira038.stderr: 5: (()+0x2e566) [0x7fdadab5d566]
2016-10-29T17:49:39.342 INFO:teuthology.orchestra.run.mira038.stderr: 6: (()+0x2e612) [0x7fdadab5d612]
2016-10-29T17:49:39.342 INFO:teuthology.orchestra.run.mira038.stderr: 7: (SyntheticWorkloadState::C_SyntheticOnReadable::finish(int)+0x351) [0x557ab87ee151]
2016-10-29T17:49:39.342 INFO:teuthology.orchestra.run.mira038.stderr: 8: (Context::complete(int)+0x9) [0x557ab87c5b39]
2016-10-29T17:49:39.342 INFO:teuthology.orchestra.run.mira038.stderr: 9: (Finisher::finisher_thread_entry()+0x216) [0x557ab89f8f76]
2016-10-29T17:49:39.343 INFO:teuthology.orchestra.run.mira038.stderr: 10: (()+0x7dc5) [0x7fdadebfbdc5]
2016-10-29T17:49:39.343 INFO:teuthology.orchestra.run.mira038.stderr: 11: (clone()+0x6d) [0x7fdadac25ced]
2016-10-29T17:49:39.343 INFO:teuthology.orchestra.run.mira038.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
http://pulpito.ceph.com/kchai-2016-10-29_17:34:40-rados-master---basic-mira/502147/
--gtest_filter=ObjectStore/StoreTest.Synthetic/1
Updated by Kefu Chai over 7 years ago
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
#1  0x00007ffff383640a in __GI_abort () at abort.c:89
#2  0x00007ffff382de47 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x55555675da78 "bl_eq(state->contents[hoid].data, r2)", file=file@entry=0x55555675c490 "/var/ceph/ceph/src/test/objectstore/store_test.cc", line=line@entry=3330, function=function@entry=0x55555675f5e0 <SyntheticWorkloadState::C_SyntheticOnReadable::finish(int)::__PRETTY_FUNCTION__> "virtual void SyntheticWorkloadState::C_SyntheticOnReadable::finish(int)") at assert.c:92
#3  0x00007ffff382def2 in __GI___assert_fail (assertion=0x55555675da78 "bl_eq(state->contents[hoid].data, r2)", file=0x55555675c490 "/var/ceph/ceph/src/test/objectstore/store_test.cc", line=3330, function=0x55555675f5e0 <SyntheticWorkloadState::C_SyntheticOnReadable::finish(int)::__PRETTY_FUNCTION__> "virtual void SyntheticWorkloadState::C_SyntheticOnReadable::finish(int)") at assert.c:101
#4  0x0000555556036e19 in SyntheticWorkloadState::C_SyntheticOnReadable::finish (this=0x55555fce9170, r=290422) at /var/ceph/ceph/src/test/objectstore/store_test.cc:3330
#5  0x000055555602c64b in Context::complete (this=0x55555fce9170, r=0) at /var/ceph/ceph/src/include/Context.h:64
#6  0x000055555630e146 in Finisher::finisher_thread_entry (this=0x55555f023de0) at /var/ceph/ceph/src/common/Finisher.cc:68
#7  0x00005555560718d0 in Finisher::FinisherThread::entry (this=0x55555f023f38) at /var/ceph/ceph/src/common/Finisher.h:66
#8  0x00005555563d2dbb in Thread::entry_wrapper (this=0x55555f023f38) at /var/ceph/ceph/src/common/Thread.cc:89
#9  0x00005555563d2cf0 in Thread::_entry_func (arg=0x55555f023f38) at /var/ceph/ceph/src/common/Thread.cc:69
#10 0x00007ffff7bc4464 in start_thread (arg=0x7fffd3fff700) at pthread_create.c:333
#11 0x00007ffff38ea9df in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
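The failing assertion at store_test.cc:3330, bl_eq(state->contents[hoid].data, r2), appears to compare the synthetic workload's in-memory model of an object with the data read back from the store, so an abort only tells you that the two differ. A sketch like the following could make the abort more informative by logging where the buffers diverge. It uses std::string as a stand-in for Ceph's bufferlist, and first_mismatch and check_read are hypothetical names, not Ceph functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iostream>
#include <string>

// Hypothetical helper, not part of Ceph: returns the offset of the first byte
// where `expected` and `actual` differ, or -1 if they are identical.
// If one buffer is a prefix of the other, the mismatch is reported at the end
// of the shorter one.
long first_mismatch(const std::string& expected, const std::string& actual) {
  const std::size_t n = std::min(expected.size(), actual.size());
  for (std::size_t i = 0; i < n; ++i) {
    if (expected[i] != actual[i]) {
      return static_cast<long>(i);
    }
  }
  return expected.size() == actual.size() ? -1 : static_cast<long>(n);
}

// Hypothetical stand-in for assert(bl_eq(expected, read)): logs where the
// read data diverges from the expected contents, then asserts as before.
void check_read(const std::string& expected, const std::string& actual) {
  const long off = first_mismatch(expected, actual);
  if (off >= 0) {
    std::cerr << "read mismatch at offset " << off << ": expected "
              << expected.size() << " bytes, got " << actual.size() << "\n";
  }
  assert(off < 0);
}
```

Knowing the offset and both lengths can help distinguish a short read from data that differs in the middle of an extent.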
Updated by Kefu Chai over 7 years ago
git bisect shows that 933a1da6d7517b8215c0cc720e47374adedf381e is the offending commit.
Updated by Kefu Chai over 7 years ago
Yes, it reproduces consistently, and reverting 933a1da6d7517b8215c0cc720e47374adedf381e helps.
ceph_test_objectstore --gtest_filter=ObjectStore/StoreTest.Synthetic/1
Updated by Sage Weil over 7 years ago
I've tried a few different machines now, but I can't reproduce this.
Can you generate a log with debug filestore = 20 for me?
Updated by Kefu Chai over 7 years ago
Updated by Kefu Chai over 7 years ago
Sage, I will try to fix this if you don't have enough bandwidth today.
Updated by Kefu Chai over 7 years ago
I just tested on ext4 and the problem disappears. It seems to be reproducible on btrfs.
Updated by Kefu Chai over 7 years ago
http://pulpito.ceph.com/kchai-2016-11-13_07:03:13-rados-wip-kefu-testing---basic-smithi/544085/
http://pulpito.ceph.com/samuelj-2016-11-16_21:33:31-rados-wip-sam-working---basic-smithi/554921/
Both of them (smithi072, smithi037) are using ext4. Weird...
Updated by Kefu Chai over 7 years ago
- File consoleText.txt.gz added
"ceph_test_objectstore --gtest_filter=*/0" also crashes; see:
    0> 2016-11-26 20:48:19.042533 7fd54a19a6c0 -1 *** Caught signal (Aborted) **
 in thread 7fd54a19a6c0 thread_name:ceph_test_objec

 ceph version 11.0.2-1993-g9cb008d (9cb008dedce2094f54bfeffb698f3092d5af0233)
 1: (()+0x682fa2) [0x7fd549caffa2]
 2: (()+0x10330) [0x7fd5491fa330]
 3: (gsignal()+0x37) [0x7fd5471bdc37]
 4: (abort()+0x148) [0x7fd5471c1028]
 5: (()+0x2fbf6) [0x7fd5471b6bf6]
 6: (()+0x2fca2) [0x7fd5471b6ca2]
 7: (()+0x32dd08) [0x7fd54995ad08]
 8: (doSyntheticTest(boost::scoped_ptr<ObjectStore>&, int, unsigned long, unsigned long, unsigned long)+0x55e) [0x7fd54990c25e]
 9: (void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x33) [0x7fd549ca81f3]
 10: (testing::Test::Run()+0xb7) [0x7fd549c9bc67]
 11: (testing::TestInfo::Run()+0x9e) [0x7fd549c9bd0e]
 12: (testing::TestCase::Run()+0xa5) [0x7fd549c9be15]
 13: (testing::internal::UnitTestImpl::RunAllTests()+0x248) [0x7fd549c9c0c8]
 14: (testing::UnitTest::Run()+0x54) [0x7fd549c9c384]
 15: (main()+0x35f) [0x7fd549864b0f]
 16: (__libc_start_main()+0xf5) [0x7fd5471a8f45]
 17: (()+0x2a97b6) [0x7fd5498d67b6]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by Kefu Chai over 7 years ago
- Subject changed from ceph_test_objectstore --gtest_filter=-*/2'" crashes in qa run to ceph_test_objectstore" crashes in qa run
Updated by Kefu Chai over 7 years ago
I am able to reproduce the above failure using "ctest -R test_objectstore_memstore.sh -V --repeat-until-fail 400".
Total Test time (real) = 10373.15 sec

The following tests FAILED:
	 10 - test_objectstore_memstore.sh (Failed)
Errors while running CTest
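Because the failure is transient, the number of repetitions needed to hit it is a matter of probability. Under the simplifying assumption (not measured anywhere in this report) that runs are independent and each fails with a fixed probability p, the chance of at least one failure in n runs is 1 - (1 - p)^n. The hypothetical helper runs_needed below turns that into a repeat count for ctest's --repeat-until-fail.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: smallest n such that a failure occurring independently
// with probability p on each run is seen at least once in n runs with
// probability >= confidence. Assumes independent runs and a constant p,
// which is an idealization for a race-dependent test.
// Derivation: P(no failure in n runs) = (1 - p)^n <= 1 - confidence
//   =>  n >= log(1 - confidence) / log(1 - p)   (both logs are negative).
long runs_needed(double p, double confidence) {
  assert(p > 0.0 && p < 1.0);
  assert(confidence > 0.0 && confidence < 1.0);
  return static_cast<long>(
      std::ceil(std::log(1.0 - confidence) / std::log(1.0 - p)));
}
```

For example, runs_needed(0.01, 0.95) is 299: a failure that hits 1% of runs shows up within 299 runs with at least 95% probability, so a clean pass over far fewer runs says little about whether a failure that rare is gone.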
Updated by Samuel Just over 7 years ago
- Has duplicate Bug #18027: test_objectstore_memstore.sh transient failure added
Updated by Loïc Dachary over 7 years ago
- Subject changed from ceph_test_objectstore" crashes in qa run to ceph_test_objectstore & test_objectstore_memstore.sh crashes in qa run
Updating the title so that it matches searches for test_objectstore_memstore.sh failures.
Updated by Loïc Dachary over 7 years ago
Repeating the test as instructed by Kefu and trying to get a core dump, but without success so far (3a9bcaa4aa6042587886c0eaae0ce4eeeb8f8164):
$ ctest -R test_objectstore_memstore.sh --repeat-until-fail 4000
Test project /slow/loic/ceph-ubuntu-14.04-loic/build
    Start 10: test_objectstore_memstore.sh
1/1 Test #10: test_objectstore_memstore.sh .....   Passed   90.86 sec

100% tests passed, 0 tests failed out of 1

Total Test time (real) =  90.92 sec
Updated by Loïc Dachary over 7 years ago
- File consoleText.gz added
https://jenkins.ceph.com/job/ceph-pull-requests/15167/console
7/154 Test #10: test_objectstore_memstore.sh ............***Failed   43.45 sec
...
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/objectstore/store_test.cc:3750: Failure
      Expected: r
      Which is: 28697
To be equal to: 0
...
[  FAILED  ] ObjectStore/StoreTest.Synthetic/0, where GetParam() = "memstore" (34805 ms)
Updated by Loïc Dachary over 7 years ago
https://jenkins.ceph.com/job/ceph-pull-requests/15166/console
3/154 Test #10: test_objectstore_memstore.sh ............***Failed   19.38 sec
...
    0> 2016-12-01 13:37:34.675656 7ff2a4f18700 -1 *** Caught signal (Aborted) **
 in thread 7ff2a4f18700 thread_name:fn_anonymous

 ceph version 11.0.2-2148-g99305be (99305be55d0e60e5f6d06aebe4f6a8c1f5832e60)
 1: (()+0x682922) [0x7ff2aa45d922]
 2: (()+0x10330) [0x7ff2a99a8330]
 3: (gsignal()+0x37) [0x7ff2a796bc37]
 4: (abort()+0x148) [0x7ff2a796f028]
 5: (()+0x2fbf6) [0x7ff2a7964bf6]
 6: (()+0x2fca2) [0x7ff2a7964ca2]
 7: (SyntheticWorkloadState::C_SyntheticOnReadable::finish(int)+0x367) [0x7ff2aa100607]
 8: (Context::complete(int)+0x9) [0x7ff2aa0ddcc9]
 9: (Finisher::finisher_thread_entry()+0x1e6) [0x7ff2aa2fd456]
 10: (()+0x8184) [0x7ff2a99a0184]
 11: (clone()+0x6d) [0x7ff2a7a2f37d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by Loïc Dachary over 7 years ago
- Status changed from New to Fix Under Review
- Assignee changed from Kefu Chai to Loïc Dachary
Updated by Loïc Dachary over 7 years ago
Updated by Loïc Dachary over 7 years ago
- File consoleText.3.gz added
Updated by Loïc Dachary over 7 years ago
- File consoleText.4.gz added
Updated by Loïc Dachary over 7 years ago
- Status changed from Fix Under Review to Need More Info
Updated by Nathan Cutler about 7 years ago
This failure is plaguing kraken backports - see e.g.
Updated by Nathan Cutler almost 7 years ago
Happened on another kraken backport: https://github.com/ceph/ceph/pull/16108
Updated by Sage Weil almost 7 years ago
Updated by Sage Weil almost 7 years ago
- Subject changed from ceph_test_objectstore & test_objectstore_memstore.sh crashes in qa run to ceph_test_objectstore & test_objectstore_memstore.sh crashes in qa run (kraken)
- Status changed from Need More Info to Won't Fix
See https://github.com/ceph/ceph/pull/16215 (which disabled the memstore tests on kraken).