Bug #17743

ceph_test_objectstore & test_objectstore_memstore.sh crashes in qa run (kraken)

Added by Kefu Chai 10 months ago. Updated about 1 month ago.

Status: Won't Fix
Priority: Urgent
Assignee:
Category: -
Target version: -
Start date: 10/30/2016
Due date:
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Needs Doc: No
Component(RADOS):

Description

2016-10-29T17:49:39.342 INFO:teuthology.orchestra.run.mira038.stderr: ceph version v11.0.2-791-g5354e7c (5354e7c26f1898c748240dae4c4fc63f6b3155d1)
2016-10-29T17:49:39.342 INFO:teuthology.orchestra.run.mira038.stderr: 1: (()+0x6928ca) [0x557ab8b688ca]
2016-10-29T17:49:39.342 INFO:teuthology.orchestra.run.mira038.stderr: 2: (()+0xf100) [0x7fdadec03100]
2016-10-29T17:49:39.342 INFO:teuthology.orchestra.run.mira038.stderr: 3: (gsignal()+0x37) [0x7fdadab645f7]
2016-10-29T17:49:39.342 INFO:teuthology.orchestra.run.mira038.stderr: 4: (abort()+0x148) [0x7fdadab65ce8]
2016-10-29T17:49:39.342 INFO:teuthology.orchestra.run.mira038.stderr: 5: (()+0x2e566) [0x7fdadab5d566]
2016-10-29T17:49:39.342 INFO:teuthology.orchestra.run.mira038.stderr: 6: (()+0x2e612) [0x7fdadab5d612]
2016-10-29T17:49:39.342 INFO:teuthology.orchestra.run.mira038.stderr: 7: (SyntheticWorkloadState::C_SyntheticOnReadable::finish(int)+0x351) [0x557ab87ee151]
2016-10-29T17:49:39.342 INFO:teuthology.orchestra.run.mira038.stderr: 8: (Context::complete(int)+0x9) [0x557ab87c5b39]
2016-10-29T17:49:39.342 INFO:teuthology.orchestra.run.mira038.stderr: 9: (Finisher::finisher_thread_entry()+0x216) [0x557ab89f8f76]
2016-10-29T17:49:39.343 INFO:teuthology.orchestra.run.mira038.stderr: 10: (()+0x7dc5) [0x7fdadebfbdc5]
2016-10-29T17:49:39.343 INFO:teuthology.orchestra.run.mira038.stderr: 11: (clone()+0x6d) [0x7fdadac25ced]
2016-10-29T17:49:39.343 INFO:teuthology.orchestra.run.mira038.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

http://pulpito.ceph.com/kchai-2016-10-29_17:34:40-rados-master---basic-mira/502147/

--gtest_filter=ObjectStore/StoreTest.Synthetic/1

ceph_test_objectstore.log.xz (746 KB) Kefu Chai, 11/15/2016 09:04 AM

consoleText.txt.gz (529 KB) Kefu Chai, 11/27/2016 04:19 PM

consoleText.gz (599 KB) Loic Dachary, 12/02/2016 02:03 PM

consoleText.3.gz - ceph-pull-requests/15371/console (147 KB) Loic Dachary, 12/05/2016 09:40 PM

consoleText.4.gz (142 KB) Loic Dachary, 12/07/2016 09:26 AM


Related issues

Duplicated by Ceph - Bug #18027: test_objectstore_memstore.sh transient failure Duplicate 11/24/2016

History

#1 Updated by Kefu Chai 10 months ago

  • Description updated (diff)

#2 Updated by Kefu Chai 10 months ago

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
#1  0x00007ffff383640a in __GI_abort () at abort.c:89
#2  0x00007ffff382de47 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x55555675da78 "bl_eq(state->contents[hoid].data, r2)",
    file=file@entry=0x55555675c490 "/var/ceph/ceph/src/test/objectstore/store_test.cc", line=line@entry=3330,
    function=function@entry=0x55555675f5e0 <SyntheticWorkloadState::C_SyntheticOnReadable::finish(int)::__PRETTY_FUNCTION__> "virtual void SyntheticWorkloadState::C_SyntheticOnReadable::finish(int)") at assert.c:92
#3  0x00007ffff382def2 in __GI___assert_fail (assertion=0x55555675da78 "bl_eq(state->contents[hoid].data, r2)",
    file=0x55555675c490 "/var/ceph/ceph/src/test/objectstore/store_test.cc", line=3330,
    function=0x55555675f5e0 <SyntheticWorkloadState::C_SyntheticOnReadable::finish(int)::__PRETTY_FUNCTION__> "virtual void SyntheticWorkloadState::C_SyntheticOnReadable::finish(int)") at assert.c:101
#4  0x0000555556036e19 in SyntheticWorkloadState::C_SyntheticOnReadable::finish (this=0x55555fce9170, r=290422) at /var/ceph/ceph/src/test/objectstore/store_test.cc:3330
#5  0x000055555602c64b in Context::complete (this=0x55555fce9170, r=0) at /var/ceph/ceph/src/include/Context.h:64
#6  0x000055555630e146 in Finisher::finisher_thread_entry (this=0x55555f023de0) at /var/ceph/ceph/src/common/Finisher.cc:68
#7  0x00005555560718d0 in Finisher::FinisherThread::entry (this=0x55555f023f38) at /var/ceph/ceph/src/common/Finisher.h:66
#8  0x00005555563d2dbb in Thread::entry_wrapper (this=0x55555f023f38) at /var/ceph/ceph/src/common/Thread.cc:89
#9  0x00005555563d2cf0 in Thread::_entry_func (arg=0x55555f023f38) at /var/ceph/ceph/src/common/Thread.cc:69
#10 0x00007ffff7bc4464 in start_thread (arg=0x7fffd3fff700) at pthread_create.c:333
#11 0x00007ffff38ea9df in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105

#3 Updated by Kefu Chai 9 months ago

git bisect shows that 933a1da6d7517b8215c0cc720e47374adedf381e is the offending commit.
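
For reference, a bisect like this can be automated with "git bisect run". The sketch below is an assumption about how such a run could be scripted, not the procedure actually used here; the helper name bisect_with_test and its arguments are hypothetical. It maps any crash (exit status of 128 or above, e.g. the SIGABRT from a failed assert) to 1, because "git bisect run" aborts on statuses above 127.

```shell
# Hypothetical helper: bisect_with_test <good-rev> <bad-rev> <command> [args...]
# Must be run from the top level of the working tree.
bisect_with_test() {
    good=$1; bad=$2; shift 2
    git bisect start "$bad" "$good" || return 1
    # "git bisect run" treats 0 as good, 125 as "skip this revision",
    # 1-127 (except 125) as bad, and anything else as a fatal error.
    git bisect run sh -c '
        "$@"
        case $? in
            0)   exit 0 ;;    # good
            125) exit 125 ;;  # untestable, skip
            *)   exit 1 ;;    # bad, including crashes (status >= 128)
        esac' sh "$@"
    status=$?
    git bisect reset
    return "$status"
}
```

For this bug the command would be a small script that rebuilds ceph_test_objectstore (exiting with 125 if the build fails) and then runs it with --gtest_filter=ObjectStore/StoreTest.Synthetic/1.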

#4 Updated by Sage Weil 9 months ago

Are you able to reproduce this locally?

#5 Updated by Kefu Chai 9 months ago

Yes, consistently, and reverting 933a1da6d7517b8215c0cc720e47374adedf381e helps.

ceph_test_objectstore --gtest_filter=ObjectStore/StoreTest.Synthetic/1

#6 Updated by Sage Weil 9 months ago

I've tried a few different machines now but I can't reproduce this.

Can you generate a log with debug filestore = 20 for me?

#7 Updated by Kefu Chai 9 months ago

  • File 17743.tgz added

#8 Updated by Kefu Chai 9 months ago

  • File deleted (17743.tgz)

#10 Updated by Kefu Chai 9 months ago

Sage, I will try to fix this if you don't have enough bandwidth today.

#11 Updated by Kefu Chai 9 months ago

I just tested on ext4 and the problem disappears there. It seems to be reproducible on btrfs.

#13 Updated by Samuel Just 9 months ago

  • Assignee set to Kefu Chai

#14 Updated by Kefu Chai 9 months ago

"ceph_test_objectstore --gtest_filter=*/0" also fails, see

https://jenkins.ceph.com/job/ceph-pull-requests/14959/consoleFull#-824916114d63714d2-c8d8-41fc-a9d4-8dee30be4c32

     0> 2016-11-26 20:48:19.042533 7fd54a19a6c0 -1 *** Caught signal (Aborted) **
 in thread 7fd54a19a6c0 thread_name:ceph_test_objec

 ceph version 11.0.2-1993-g9cb008d (9cb008dedce2094f54bfeffb698f3092d5af0233)
 1: (()+0x682fa2) [0x7fd549caffa2]
 2: (()+0x10330) [0x7fd5491fa330]
 3: (gsignal()+0x37) [0x7fd5471bdc37]
 4: (abort()+0x148) [0x7fd5471c1028]
 5: (()+0x2fbf6) [0x7fd5471b6bf6]
 6: (()+0x2fca2) [0x7fd5471b6ca2]
 7: (()+0x32dd08) [0x7fd54995ad08]
 8: (doSyntheticTest(boost::scoped_ptr<ObjectStore>&, int, unsigned long, unsigned long, unsigned long)+0x55e) [0x7fd54990c25e]
 9: (void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x33) [0x7fd549ca81f3]
 10: (testing::Test::Run()+0xb7) [0x7fd549c9bc67]
 11: (testing::TestInfo::Run()+0x9e) [0x7fd549c9bd0e]
 12: (testing::TestCase::Run()+0xa5) [0x7fd549c9be15]
 13: (testing::internal::UnitTestImpl::RunAllTests()+0x248) [0x7fd549c9c0c8]
 14: (testing::UnitTest::Run()+0x54) [0x7fd549c9c384]
 15: (main()+0x35f) [0x7fd549864b0f]
 16: (__libc_start_main()+0xf5) [0x7fd5471a8f45]
 17: (()+0x2a97b6) [0x7fd5498d67b6]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

#15 Updated by Kefu Chai 9 months ago

  • Subject changed from ceph_test_objectstore --gtest_filter=-*/2'" crashes in qa run to ceph_test_objectstore" crashes in qa run

#16 Updated by Kefu Chai 9 months ago

I am able to reproduce the above failure using "ctest -R test_objectstore_memstore.sh -V --repeat-until-fail 400".

Total Test time (real) = 10373.15 sec
The following tests FAILED:
         10 - test_objectstore_memstore.sh (Failed)
Errors while running CTest
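
When a failure is this rare, it helps to keep the output of exactly the run that failed. The following is a minimal sketch of a generic repeat loop for that purpose; it is not part of ctest, and the function name repeat_until_fail and its arguments are hypothetical.

```shell
# Hypothetical helper: repeat_until_fail <max-runs> <logfile> <command> [args...]
# Runs the command up to <max-runs> times, overwriting <logfile> on each run,
# and stops at the first failure so that <logfile> holds the failing run.
repeat_until_fail() {
    max=$1; log=$2; shift 2
    i=1
    while [ "$i" -le "$max" ]; do
        if ! "$@" >"$log" 2>&1; then
            echo "run $i failed, output kept in $log"
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $max runs passed"
    return 0
}
```

A call such as "repeat_until_fail 400 memstore.log ctest -R test_objectstore_memstore.sh -V" would then leave the log of the failing iteration in memstore.log (the log file name is only an example).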

#17 Updated by Samuel Just 9 months ago

  • Duplicated by Bug #18027: test_objectstore_memstore.sh transient failure added

#18 Updated by Loic Dachary 9 months ago

  • Subject changed from ceph_test_objectstore" crashes in qa run to ceph_test_objectstore & test_objectstore_memstore.sh crashes in qa run

Updating the title so that it shows up when searching for test_objectstore_memstore.sh failures.

#19 Updated by Loic Dachary 9 months ago

Tried repeating the failure as instructed by Kefu to get a core dump, but could not (3a9bcaa4aa6042587886c0eaae0ce4eeeb8f8164).

$ ctest -R test_objectstore_memstore.sh --repeat-until-fail 4000
Test project /slow/loic/ceph-ubuntu-14.04-loic/build
    Start 10: test_objectstore_memstore.sh
1/1 Test #10: test_objectstore_memstore.sh .....   Passed   90.86 sec

100% tests passed, 0 tests failed out of 1

Total Test time (real) =  90.92 sec
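
If a run does crash but leaves no core file, one common cause is a core size limit of zero in the shell that started the test. The sketch below is only an illustration under that assumption (it is unknown whether that applied here); the wrapper name run_with_core is hypothetical, and reading /proc/sys/kernel/core_pattern assumes Linux.

```shell
# Hypothetical helper: run_with_core <command> [args...]
# Raises the core size limit for this one command and reports where the
# kernel is configured to write core files if the command dies on a signal.
run_with_core() {
    (
        ulimit -c unlimited 2>/dev/null ||
            echo "warning: could not raise the core size limit" >&2
        "$@"
    )
    status=$?
    if [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
        echo "core pattern: $(cat /proc/sys/kernel/core_pattern 2>/dev/null)"
    fi
    return "$status"
}
```

It would be wrapped around the test command, for example "run_with_core ctest -R test_objectstore_memstore.sh --repeat-until-fail 4000"; an abort shows up as signal 6.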

#20 Updated by Loic Dachary 9 months ago

https://jenkins.ceph.com/job/ceph-pull-requests/15167/console

  7/154 Test  #10: test_objectstore_memstore.sh ............***Failed   43.45 sec
...
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/objectstore/store_test.cc:3750: Failure
      Expected: r
      Which is: 28697
To be equal to: 0
...
[  FAILED  ] ObjectStore/StoreTest.Synthetic/0, where GetParam() = "memstore" (34805 ms)

#21 Updated by Loic Dachary 9 months ago

https://jenkins.ceph.com/job/ceph-pull-requests/15166/console

  3/154 Test  #10: test_objectstore_memstore.sh ............***Failed   19.38 sec

...
     0> 2016-12-01 13:37:34.675656 7ff2a4f18700 -1 *** Caught signal (Aborted) **
 in thread 7ff2a4f18700 thread_name:fn_anonymous

 ceph version 11.0.2-2148-g99305be (99305be55d0e60e5f6d06aebe4f6a8c1f5832e60)
 1: (()+0x682922) [0x7ff2aa45d922]
 2: (()+0x10330) [0x7ff2a99a8330]
 3: (gsignal()+0x37) [0x7ff2a796bc37]
 4: (abort()+0x148) [0x7ff2a796f028]
 5: (()+0x2fbf6) [0x7ff2a7964bf6]
 6: (()+0x2fca2) [0x7ff2a7964ca2]
 7: (SyntheticWorkloadState::C_SyntheticOnReadable::finish(int)+0x367) [0x7ff2aa100607]
 8: (Context::complete(int)+0x9) [0x7ff2aa0ddcc9]
 9: (Finisher::finisher_thread_entry()+0x1e6) [0x7ff2aa2fd456]
 10: (()+0x8184) [0x7ff2a99a0184]
 11: (clone()+0x6d) [0x7ff2a7a2f37d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

#22 Updated by Loic Dachary 9 months ago

  • Status changed from New to Need Review
  • Assignee changed from Kefu Chai to Loic Dachary

#24 Updated by Loic Dachary 8 months ago

#26 Updated by Loic Dachary 8 months ago

  • Status changed from Need Review to Need More Info

#28 Updated by Nathan Cutler 4 months ago

This failure is plaguing kraken backports - see e.g.

#29 Updated by Greg Farnum 2 months ago

  • Project changed from Ceph to RADOS

#30 Updated by Nathan Cutler about 1 month ago

Happened on another kraken backport: https://github.com/ceph/ceph/pull/16108

#32 Updated by Sage Weil about 1 month ago

  • Subject changed from ceph_test_objectstore & test_objectstore_memstore.sh crashes in qa run to ceph_test_objectstore & test_objectstore_memstore.sh crashes in qa run (kraken)
  • Status changed from Need More Info to Won't Fix

see https://github.com/ceph/ceph/pull/16215 (disabled the memstore tests on kraken)
