Bug #9241
closedosdc/Objecter.cc: 1320: FAILED assert(s->get_nref() == 1)
0%
Description
2014-08-26T03:00:57.472 INFO:tasks.radosbench.radosbench.0.burnupi24.stderr: 1> 2014-08-26 03:00:57.380019 7fae604f4840 1 - 10.214.134.4:0/1029853 mark_down 0x7fae400008c0 -- 0x7fae4000a1a0
2014-08-26T03:00:57.472 INFO:tasks.radosbench.radosbench.0.burnupi24.stderr: 0> 2014-08-26 03:00:57.381185 7fae604f4840 1 osdc/Objecter.cc: In function 'void Objecter::close_session(Objecter::OSDSession*)' thread 7fae604f4840 time 2014-08-26 03:00:57.380050>get_nref() == 1)
2014-08-26T03:00:57.472 INFO:tasks.radosbench.radosbench.0.burnupi24.stderr:osdc/Objecter.cc: 1320: FAILED assert(s
2014-08-26T03:00:57.472 INFO:tasks.radosbench.radosbench.0.burnupi24.stderr:
2014-08-26T03:00:57.472 INFO:tasks.radosbench.radosbench.0.burnupi24.stderr: ceph version 0.84-703-g4ffd3f1 (4ffd3f17f8e9b8b389548bf8c581a109ed2cbb5d)
2014-08-26T03:00:57.472 INFO:tasks.radosbench.radosbench.0.burnupi24.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7fae5f56f3cb]
2014-08-26T03:00:57.472 INFO:tasks.radosbench.radosbench.0.burnupi24.stderr: 2: (Objecter::close_session(Objecter::OSDSession*)+0x784) [0x7fae5f4ee794]
2014-08-26T03:00:57.473 INFO:tasks.radosbench.radosbench.0.burnupi24.stderr: 3: (Objecter::shutdown()+0x73) [0x7fae5f4ee903]
2014-08-26T03:00:57.473 INFO:tasks.radosbench.radosbench.0.burnupi24.stderr: 4: (librados::RadosClient::shutdown()+0x1c2) [0x7fae5f4bb662]
2014-08-26T03:00:57.473 INFO:tasks.radosbench.radosbench.0.burnupi24.stderr: 5: (librados::Rados::shutdown()+0x28) [0x7fae5f495148]
2014-08-26T03:00:57.473 INFO:tasks.radosbench.radosbench.0.burnupi24.stderr: 6: (main()+0x13b8) [0x411db8]
2014-08-26T03:00:57.473 INFO:tasks.radosbench.radosbench.0.burnupi24.stderr: 7: (__libc_start_main()+0xf5) [0x7fae5e2deec5]
2014-08-26T03:00:57.473 INFO:tasks.radosbench.radosbench.0.burnupi24.stderr: 8: rados() [0x41cbc7]
2014-08-26T03:00:57.473 INFO:tasks.radosbench.radosbench.0.burnupi24.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
4ffd3f17f8e9b8b389548bf8c581a109ed2cbb5d is wip-sam-testing with no client changes based on 4dbc4c412b5b86b1694b47fff02661e332ef4b3c with next merged in.
Updated by Samuel Just over 9 years ago
/a/samuelj-2014-08-25_22:38:54-rados-wip-sam-testing-testing-basic-multi/450704
Updated by Sage Weil over 9 years ago
ubuntu@teuthology:/a/sage-2014-08-28_16:08:59-rados-master-testing-basic-multi/458975
2014-08-29T02:57:49.885 INFO:tasks.workunit.client.0.plana79.stdout:process_10_[31061]: starting. 2014-08-29T02:57:50.165 INFO:teuthology.orchestra.run.plana79.stderr:dumped all in format json 2014-08-29T02:57:50.217 INFO:teuthology.orchestra.run.plana79:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool set rbd pgp_num 304' 2014-08-29T02:57:50.652 INFO:teuthology.orchestra.run.plana79.stderr:set pool 0 pgp_num to 304 2014-08-29T02:57:52.800 INFO:tasks.workunit.client.0.plana79.stdout:process_9_[31060]: creating pool foo.30288 2014-08-29T02:57:52.800 INFO:tasks.workunit.client.0.plana79.stdout:process_9_[31060]: created object 0... 2014-08-29T02:57:52.800 INFO:tasks.workunit.client.0.plana79.stdout:process_9_[31060]: created object 25... 2014-08-29T02:57:52.800 INFO:tasks.workunit.client.0.plana79.stdout:process_9_[31060]: created object 49... 2014-08-29T02:57:52.800 INFO:tasks.workunit.client.0.plana79.stdout:process_9_[31060]: finishing. 2014-08-29T02:57:52.801 INFO:tasks.workunit.client.0.plana79.stderr:osdc/Objecter.cc: In function 'void Objecter::close_session(Objecter::OSDSession*)' thread 7fd31f7db780 time 2014-08-29 02:57:52.796839 2014-08-29T02:57:52.801 INFO:tasks.workunit.client.0.plana79.stderr:osdc/Objecter.cc: 1320: FAILED assert(s->get_nref() == 1) 2014-08-29T02:57:52.801 INFO:tasks.workunit.client.0.plana79.stderr: ceph version 0.84-823-gf807a7e (f807a7ee2cab3d3eb0877e4846580fdbf3a6b3d0) 2014-08-29T02:57:52.802 INFO:tasks.workunit.client.0.plana79.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0x7fd31e8dfe4f] 2014-08-29T02:57:52.802 INFO:tasks.workunit.client.0.plana79.stderr: 2: (Objecter::close_session(Objecter::OSDSession*)+0x7a2) [0x7fd31e85d962] 2014-08-29T02:57:52.802 INFO:tasks.workunit.client.0.plana79.stderr: 3: (Objecter::shutdown()+0x6b) [0x7fd31e85da4b] 2014-08-29T02:57:52.802 INFO:tasks.workunit.client.0.plana79.stderr: 4: (librados::RadosClient::shutdown()+0x24a) [0x7fd31e82823a] 2014-08-29T02:57:52.802 INFO:tasks.workunit.client.0.plana79.stderr: 5: (rados_shutdown()+0x9) [0x7fd31e811b89] 2014-08-29T02:57:52.803 INFO:tasks.workunit.client.0.plana79.stderr: 6: (StRadosCreatePool::run()+0x3ba) [0x40740a] 2014-08-29T02:57:52.803 INFO:tasks.workunit.client.0.plana79.stderr: 7: (systest_runnable_pthread_helper(void*)+0x3a) [0x407d7a] 2014-08-29T02:57:52.803 INFO:tasks.workunit.client.0.plana79.stderr: 8: (SysTestRunnable::start()+0x8b) [0x407e3b] 2014-08-29T02:57:52.803 INFO:tasks.workunit.client.0.plana79.stderr: 9: (SysTestRunnable::run_until_finished(std::vector<SysTestRunnable*, std::allocator<SysTestRunnable*> >&)+0x4b) [0x409a7b] 2014-08-29T02:57:52.803 INFO:tasks.workunit.client.0.plana79.stderr: 10: (main()+0xa30) [0x4049e0] 2014-08-29T02:57:52.803 INFO:tasks.workunit.client.0.plana79.stderr: 11: (__libc_start_main()+0xed) [0x7fd31dbc576d] 2014-08-29T02:57:52.803 INFO:tasks.workunit.client.0.plana79.stderr: 12: ceph_test_rados_list_parallel() [0x405079] 2014-08-29T02:57:52.804 INFO:tasks.workunit.client.0.plana79.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. 2014-08-29T02:57:52.804 INFO:tasks.workunit.client.0.plana79.stderr:terminate called after throwing an instance of 'ceph::FailedAssertion' 2014-08-29T02:57:52.866 INFO:tasks.workunit.client.0.plana79.stdout:waitpid(30289)
(ceph_test_rados_list_parallel)
Updated by Sage Weil over 9 years ago
ubuntu@teuthology:/a/teuthology-2014-09-08_02:32:01-rados-master-testing-basic-multi/472170
2014-09-08T05:51:39.854 INFO:tasks.workunit.client.0.plana30.stdout:waitpid(3699) 2014-09-08T05:51:39.855 INFO:tasks.workunit.client.0.plana30.stdout:waitpid(3700) 2014-09-08T05:51:39.855 INFO:tasks.workunit.client.0.plana30.stdout:******************************* 2014-09-08T05:51:39.855 INFO:tasks.workunit.client.0.plana30.stdout:waitpid(3804) 2014-09-08T05:51:39.855 INFO:tasks.workunit.client.0.plana30.stdout:waitpid(3805) 2014-09-08T05:51:39.855 INFO:tasks.workunit.client.0.plana30.stdout:test2: got error: run_until_finished: runnable process_5: got error: process_5 exited with a signal 2014-09-08T05:51:39.855 INFO:tasks.workunit.client.0.plana30.stderr:osdc/Objecter.cc: In function 'void Objecter::close_session(Objecter::OSDSession*)' thread 7f2ff9754780 time 2014-09-08 05:51:39.562352 2014-09-08T05:51:39.855 INFO:tasks.workunit.client.0.plana30.stderr:osdc/Objecter.cc: 1320: FAILED assert(s->get_nref() == 1) 2014-09-08T05:51:39.856 INFO:tasks.workunit.client.0.plana30.stderr: ceph version 0.84-1033-g1250c31 (1250c31e49885b0c43a6ea9a9437efa899a04b41) 2014-09-08T05:51:39.856 INFO:tasks.workunit.client.0.plana30.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0x7f2ff7173a2f] 2014-09-08T05:51:39.856 INFO:tasks.workunit.client.0.plana30.stderr: 2: (Objecter::close_session(Objecter::OSDSession*)+0x7a2) [0x7f2ff70f15e2] 2014-09-08T05:51:39.856 INFO:tasks.workunit.client.0.plana30.stderr: 3: (Objecter::shutdown()+0x6b) [0x7f2ff70f16cb] 2014-09-08T05:51:39.856 INFO:tasks.workunit.client.0.plana30.stderr: 4: (librados::RadosClient::shutdown()+0x24a) [0x7f2ff70bc00a] 2014-09-08T05:51:39.856 INFO:tasks.workunit.client.0.plana30.stderr: 5: (rados_shutdown()+0x9) [0x7f2ff70a5959] 2014-09-08T05:51:39.856 INFO:tasks.workunit.client.0.plana30.stderr: 6: (StRadosWatch::run()+0x16b) [0x40630b] 2014-09-08T05:51:39.856 INFO:tasks.workunit.client.0.plana30.stderr: 7: (systest_runnable_pthread_helper(void*)+0x3a) [0x406c6a] 2014-09-08T05:51:39.857 INFO:tasks.workunit.client.0.plana30.stderr: 8: (SysTestRunnable::start()+0x8b) [0x406d2b] 2014-09-08T05:51:39.857 INFO:tasks.workunit.client.0.plana30.stderr: 9: (SysTestRunnable::run_until_finished(std::vector<SysTestRunnable*, std::allocator<SysTestRunnable*> >&)+0x4b) [0x40896b] 2014-09-08T05:51:39.857 INFO:tasks.workunit.client.0.plana30.stderr: 10: (main()+0x611) [0x404221] 2014-09-08T05:51:39.857 INFO:tasks.workunit.client.0.plana30.stderr: 11: (__libc_start_main()+0xed) [0x7f2ff645876d] 2014-09-08T05:51:39.857 INFO:tasks.workunit.client.0.plana30.stderr: 12: ceph_test_rados_watch_notify() [0x404be9] 2014-09-08T05:51:39.857 INFO:tasks.workunit.client.0.plana30.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by Sage Weil over 9 years ago
- Status changed from New to In Progress
- Assignee set to Sage Weil
- Source changed from other to Q/A
sage-bug-9241-a
Updated by Sage Weil over 9 years ago
- Status changed from In Progress to Fix Under Review
Updated by Sage Weil over 9 years ago
this is just a bad assert. one thread hits the assert:
(gdb) bt #0 0x00007f2ff646d425 in __bsd_signal (sig=3805, handler=0xedd) at ../sysdeps/posix/signal.c:38 #1 0x00000000023fa790 in ?? () #2 0x00007f2ff73abf7e in ?? () from /usr/lib/librados.so.2 #3 0x00007f2ff73ac9d3 in ?? () from /usr/lib/librados.so.2 #4 0x00007f2ff6ac2846 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #5 0x00007f2ff6ac2873 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #6 0x00007f2ff6ac296e in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #7 0x00007f2ff7173c09 in ceph::__ceph_assert_fail (assertion=<optimized out>, file=<optimized out>, line=1320, func=0x7f2ff73adee0 "void Objecter::close_session(Objecter::OSDSession*)") at common/assert.cc:77 #8 0x00007f2ff70f15e2 in Objecter::close_session (this=0x23fbb70, s=0x2409010) at osdc/Objecter.cc:1320 #9 0x00007f2ff70f16cb in Objecter::shutdown (this=0x23fbb70) at osdc/Objecter.cc:300 #10 0x00007f2ff70bc00a in librados::RadosClient::shutdown (this=0x23dffd0) at librados/RadosClient.cc:302 #11 0x00007f2ff70a5959 in rados_shutdown (cluster=<optimized out>) at librados/librados.cc:1907 #12 0x000000000040630b in StRadosWatch::run (this=0x7fffc9b0c120) at test/system/st_rados_watch.cc:93 #13 0x0000000000406c6a in systest_runnable_pthread_helper (arg=<optimized out>) at test/system/systest_runnable.cc:203 #14 0x0000000000406d2b in SysTestRunnable::start (this=0x7fffc9b0c120) at test/system/systest_runnable.cc:102 #15 0x000000000040896b in SysTestRunnable::run_until_finished (runnables=...) at test/system/systest_runnable.cc:174 #16 0x0000000000404221 in main (argc=1, argv=0x7fffc9b0c7a8) at test/system/rados_watch_notify.cc:104
because of this thread holding the session ref during the completion for an old request:
(gdb) bt #0 __lll_unlock_wake () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:368 #1 0x00007f2ff6d18104 in _L_unlock_644 () from /lib/x86_64-linux-gnu/libpthread.so.0 #2 0x00007f2ff6d18063 in __pthread_mutex_unlock_usercnt (mutex=<optimized out>, decr=<optimized out>) at pthread_mutex_unlock.c:52 #3 __pthread_mutex_unlock (mutex=0x7fffc9b0b940) at pthread_mutex_unlock.c:290 #4 0x00007f2ff714de5a in Mutex::Unlock (this=0x7fffc9b0b930) at common/Mutex.cc:106 #5 0x00007f2ff70b2409 in Context::complete (this=0x2401bb0, r=<optimized out>) at ./include/Context.h:64 #6 0x00007f2ff70f6001 in Objecter::handle_osd_op_reply (this=0x23fbb70, m=0x7f2fd8000d70) at osdc/Objecter.cc:2637 #7 0x00007f2ff710040b in Objecter::ms_dispatch (this=0x23fbb70, m=0x7f2fd8000d70) at osdc/Objecter.cc:574 #8 0x00007f2ff7281259 in ms_fast_dispatch (m=0x7f2fd8000d70, this=0x23fafe0) at msg/Messenger.h:503 #9 DispatchQueue::fast_dispatch (this=0x23fb198, m=0x7f2fd8000d70) at msg/DispatchQueue.cc:71 #10 0x00007f2ff72d1fdc in Pipe::DelayedDelivery::entry (this=0x7f2fd40022f0) at msg/Pipe.cc:224 #11 0x00007f2ff6d14e9a in start_thread (arg=0x7f2fea7fc700) at pthread_create.c:308 #12 0x00007f2ff652b3fd in __qecvt_r (value=0, ndigit=-911165120, decpt=0x0, sign=0x7fffc9b0b940, buf=0xffffffffffffffff <Address 0xffffffffffffffff out of bounds>, len=140736577190208) at efgcvt_r.c:182 #13 0x0000000000000000 in ?? ()
Updated by Sage Weil over 9 years ago
- Status changed from Fix Under Review to Resolved