Project

General

Profile

Actions

Bug #9241

closed

osdc/Objecter.cc: 1320: FAILED assert(s->get_nref() == 1)

Added by Samuel Just over 9 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2014-08-26T03:00:57.472 INFO:tasks.radosbench.radosbench.0.burnupi24.stderr: 1> 2014-08-26 03:00:57.380019 7fae604f4840 1 - 10.214.134.4:0/1029853 mark_down 0x7fae400008c0 -- 0x7fae4000a1a0
2014-08-26T03:00:57.472 INFO:tasks.radosbench.radosbench.0.burnupi24.stderr: 0> 2014-08-26 03:00:57.381185 7fae604f4840 1 osdc/Objecter.cc: In function 'void Objecter::close_session(Objecter::OSDSession*)' thread 7fae604f4840 time 2014-08-26 03:00:57.380050
2014-08-26T03:00:57.472 INFO:tasks.radosbench.radosbench.0.burnupi24.stderr:osdc/Objecter.cc: 1320: FAILED assert(s
>get_nref() == 1)
2014-08-26T03:00:57.472 INFO:tasks.radosbench.radosbench.0.burnupi24.stderr:
2014-08-26T03:00:57.472 INFO:tasks.radosbench.radosbench.0.burnupi24.stderr: ceph version 0.84-703-g4ffd3f1 (4ffd3f17f8e9b8b389548bf8c581a109ed2cbb5d)
2014-08-26T03:00:57.472 INFO:tasks.radosbench.radosbench.0.burnupi24.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7fae5f56f3cb]
2014-08-26T03:00:57.472 INFO:tasks.radosbench.radosbench.0.burnupi24.stderr: 2: (Objecter::close_session(Objecter::OSDSession*)+0x784) [0x7fae5f4ee794]
2014-08-26T03:00:57.473 INFO:tasks.radosbench.radosbench.0.burnupi24.stderr: 3: (Objecter::shutdown()+0x73) [0x7fae5f4ee903]
2014-08-26T03:00:57.473 INFO:tasks.radosbench.radosbench.0.burnupi24.stderr: 4: (librados::RadosClient::shutdown()+0x1c2) [0x7fae5f4bb662]
2014-08-26T03:00:57.473 INFO:tasks.radosbench.radosbench.0.burnupi24.stderr: 5: (librados::Rados::shutdown()+0x28) [0x7fae5f495148]
2014-08-26T03:00:57.473 INFO:tasks.radosbench.radosbench.0.burnupi24.stderr: 6: (main()+0x13b8) [0x411db8]
2014-08-26T03:00:57.473 INFO:tasks.radosbench.radosbench.0.burnupi24.stderr: 7: (__libc_start_main()+0xf5) [0x7fae5e2deec5]
2014-08-26T03:00:57.473 INFO:tasks.radosbench.radosbench.0.burnupi24.stderr: 8: rados() [0x41cbc7]
2014-08-26T03:00:57.473 INFO:tasks.radosbench.radosbench.0.burnupi24.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

4ffd3f17f8e9b8b389548bf8c581a109ed2cbb5d is wip-sam-testing with no client changes based on 4dbc4c412b5b86b1694b47fff02661e332ef4b3c with next merged in.

Actions #1

Updated by Samuel Just over 9 years ago

/a/samuelj-2014-08-25_22:38:54-rados-wip-sam-testing-testing-basic-multi/450704

Actions #2

Updated by Sage Weil over 9 years ago

ubuntu@teuthology:/a/sage-2014-08-28_16:08:59-rados-master-testing-basic-multi/458975

2014-08-29T02:57:49.885 INFO:tasks.workunit.client.0.plana79.stdout:process_10_[31061]: starting.
2014-08-29T02:57:50.165 INFO:teuthology.orchestra.run.plana79.stderr:dumped all in format json
2014-08-29T02:57:50.217 INFO:teuthology.orchestra.run.plana79:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool set rbd pgp_num 304'
2014-08-29T02:57:50.652 INFO:teuthology.orchestra.run.plana79.stderr:set pool 0 pgp_num to 304
2014-08-29T02:57:52.800 INFO:tasks.workunit.client.0.plana79.stdout:process_9_[31060]: creating pool foo.30288
2014-08-29T02:57:52.800 INFO:tasks.workunit.client.0.plana79.stdout:process_9_[31060]: created object 0...
2014-08-29T02:57:52.800 INFO:tasks.workunit.client.0.plana79.stdout:process_9_[31060]: created object 25...
2014-08-29T02:57:52.800 INFO:tasks.workunit.client.0.plana79.stdout:process_9_[31060]: created object 49...
2014-08-29T02:57:52.800 INFO:tasks.workunit.client.0.plana79.stdout:process_9_[31060]: finishing.
2014-08-29T02:57:52.801 INFO:tasks.workunit.client.0.plana79.stderr:osdc/Objecter.cc: In function 'void Objecter::close_session(Objecter::OSDSession*)' thread 7fd31f7db780 time 2014-08-29 02:57:52.796839
2014-08-29T02:57:52.801 INFO:tasks.workunit.client.0.plana79.stderr:osdc/Objecter.cc: 1320: FAILED assert(s->get_nref() == 1)
2014-08-29T02:57:52.801 INFO:tasks.workunit.client.0.plana79.stderr: ceph version 0.84-823-gf807a7e (f807a7ee2cab3d3eb0877e4846580fdbf3a6b3d0)
2014-08-29T02:57:52.802 INFO:tasks.workunit.client.0.plana79.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0x7fd31e8dfe4f]
2014-08-29T02:57:52.802 INFO:tasks.workunit.client.0.plana79.stderr: 2: (Objecter::close_session(Objecter::OSDSession*)+0x7a2) [0x7fd31e85d962]
2014-08-29T02:57:52.802 INFO:tasks.workunit.client.0.plana79.stderr: 3: (Objecter::shutdown()+0x6b) [0x7fd31e85da4b]
2014-08-29T02:57:52.802 INFO:tasks.workunit.client.0.plana79.stderr: 4: (librados::RadosClient::shutdown()+0x24a) [0x7fd31e82823a]
2014-08-29T02:57:52.802 INFO:tasks.workunit.client.0.plana79.stderr: 5: (rados_shutdown()+0x9) [0x7fd31e811b89]
2014-08-29T02:57:52.803 INFO:tasks.workunit.client.0.plana79.stderr: 6: (StRadosCreatePool::run()+0x3ba) [0x40740a]
2014-08-29T02:57:52.803 INFO:tasks.workunit.client.0.plana79.stderr: 7: (systest_runnable_pthread_helper(void*)+0x3a) [0x407d7a]
2014-08-29T02:57:52.803 INFO:tasks.workunit.client.0.plana79.stderr: 8: (SysTestRunnable::start()+0x8b) [0x407e3b]
2014-08-29T02:57:52.803 INFO:tasks.workunit.client.0.plana79.stderr: 9: (SysTestRunnable::run_until_finished(std::vector<SysTestRunnable*, std::allocator<SysTestRunnable*> >&)+0x4b) [0x409a7b]
2014-08-29T02:57:52.803 INFO:tasks.workunit.client.0.plana79.stderr: 10: (main()+0xa30) [0x4049e0]
2014-08-29T02:57:52.803 INFO:tasks.workunit.client.0.plana79.stderr: 11: (__libc_start_main()+0xed) [0x7fd31dbc576d]
2014-08-29T02:57:52.803 INFO:tasks.workunit.client.0.plana79.stderr: 12: ceph_test_rados_list_parallel() [0x405079]
2014-08-29T02:57:52.804 INFO:tasks.workunit.client.0.plana79.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2014-08-29T02:57:52.804 INFO:tasks.workunit.client.0.plana79.stderr:terminate called after throwing an instance of 'ceph::FailedAssertion'
2014-08-29T02:57:52.866 INFO:tasks.workunit.client.0.plana79.stdout:waitpid(30289)


(ceph_test_rados_list_parallel)
Actions #3

Updated by Sage Weil over 9 years ago

ubuntu@teuthology:/a/teuthology-2014-09-08_02:32:01-rados-master-testing-basic-multi/472170

2014-09-08T05:51:39.854 INFO:tasks.workunit.client.0.plana30.stdout:waitpid(3699)
2014-09-08T05:51:39.855 INFO:tasks.workunit.client.0.plana30.stdout:waitpid(3700)
2014-09-08T05:51:39.855 INFO:tasks.workunit.client.0.plana30.stdout:*******************************
2014-09-08T05:51:39.855 INFO:tasks.workunit.client.0.plana30.stdout:waitpid(3804)
2014-09-08T05:51:39.855 INFO:tasks.workunit.client.0.plana30.stdout:waitpid(3805)
2014-09-08T05:51:39.855 INFO:tasks.workunit.client.0.plana30.stdout:test2: got error: run_until_finished: runnable process_5: got error: process_5 exited with a signal
2014-09-08T05:51:39.855 INFO:tasks.workunit.client.0.plana30.stderr:osdc/Objecter.cc: In function 'void Objecter::close_session(Objecter::OSDSession*)' thread 7f2ff9754780 time 2014-09-08 05:51:39.562352
2014-09-08T05:51:39.855 INFO:tasks.workunit.client.0.plana30.stderr:osdc/Objecter.cc: 1320: FAILED assert(s->get_nref() == 1)
2014-09-08T05:51:39.856 INFO:tasks.workunit.client.0.plana30.stderr: ceph version 0.84-1033-g1250c31 (1250c31e49885b0c43a6ea9a9437efa899a04b41)
2014-09-08T05:51:39.856 INFO:tasks.workunit.client.0.plana30.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0x7f2ff7173a2f]
2014-09-08T05:51:39.856 INFO:tasks.workunit.client.0.plana30.stderr: 2: (Objecter::close_session(Objecter::OSDSession*)+0x7a2) [0x7f2ff70f15e2]
2014-09-08T05:51:39.856 INFO:tasks.workunit.client.0.plana30.stderr: 3: (Objecter::shutdown()+0x6b) [0x7f2ff70f16cb]
2014-09-08T05:51:39.856 INFO:tasks.workunit.client.0.plana30.stderr: 4: (librados::RadosClient::shutdown()+0x24a) [0x7f2ff70bc00a]
2014-09-08T05:51:39.856 INFO:tasks.workunit.client.0.plana30.stderr: 5: (rados_shutdown()+0x9) [0x7f2ff70a5959]
2014-09-08T05:51:39.856 INFO:tasks.workunit.client.0.plana30.stderr: 6: (StRadosWatch::run()+0x16b) [0x40630b]
2014-09-08T05:51:39.856 INFO:tasks.workunit.client.0.plana30.stderr: 7: (systest_runnable_pthread_helper(void*)+0x3a) [0x406c6a]
2014-09-08T05:51:39.857 INFO:tasks.workunit.client.0.plana30.stderr: 8: (SysTestRunnable::start()+0x8b) [0x406d2b]
2014-09-08T05:51:39.857 INFO:tasks.workunit.client.0.plana30.stderr: 9: (SysTestRunnable::run_until_finished(std::vector<SysTestRunnable*, std::allocator<SysTestRunnable*> >&)+0x4b) [0x40896b]
2014-09-08T05:51:39.857 INFO:tasks.workunit.client.0.plana30.stderr: 10: (main()+0x611) [0x404221]
2014-09-08T05:51:39.857 INFO:tasks.workunit.client.0.plana30.stderr: 11: (__libc_start_main()+0xed) [0x7f2ff645876d]
2014-09-08T05:51:39.857 INFO:tasks.workunit.client.0.plana30.stderr: 12: ceph_test_rados_watch_notify() [0x404be9]
2014-09-08T05:51:39.857 INFO:tasks.workunit.client.0.plana30.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Actions #4

Updated by Sage Weil over 9 years ago

  • Status changed from New to In Progress
  • Assignee set to Sage Weil
  • Source changed from other to Q/A

sage-bug-9241-a

Actions #5

Updated by Sage Weil over 9 years ago

  • Status changed from In Progress to Fix Under Review
Actions #6

Updated by Sage Weil over 9 years ago

this is just a bad assert. one thread hits the assert:

(gdb) bt
#0  0x00007f2ff646d425 in __bsd_signal (sig=3805, handler=0xedd) at ../sysdeps/posix/signal.c:38
#1  0x00000000023fa790 in ?? ()
#2  0x00007f2ff73abf7e in ?? () from /usr/lib/librados.so.2
#3  0x00007f2ff73ac9d3 in ?? () from /usr/lib/librados.so.2
#4  0x00007f2ff6ac2846 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007f2ff6ac2873 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007f2ff6ac296e in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007f2ff7173c09 in ceph::__ceph_assert_fail (assertion=<optimized out>, file=<optimized out>, line=1320, func=0x7f2ff73adee0 "void Objecter::close_session(Objecter::OSDSession*)") at common/assert.cc:77
#8  0x00007f2ff70f15e2 in Objecter::close_session (this=0x23fbb70, s=0x2409010) at osdc/Objecter.cc:1320
#9  0x00007f2ff70f16cb in Objecter::shutdown (this=0x23fbb70) at osdc/Objecter.cc:300
#10 0x00007f2ff70bc00a in librados::RadosClient::shutdown (this=0x23dffd0) at librados/RadosClient.cc:302
#11 0x00007f2ff70a5959 in rados_shutdown (cluster=<optimized out>) at librados/librados.cc:1907
#12 0x000000000040630b in StRadosWatch::run (this=0x7fffc9b0c120) at test/system/st_rados_watch.cc:93
#13 0x0000000000406c6a in systest_runnable_pthread_helper (arg=<optimized out>) at test/system/systest_runnable.cc:203
#14 0x0000000000406d2b in SysTestRunnable::start (this=0x7fffc9b0c120) at test/system/systest_runnable.cc:102
#15 0x000000000040896b in SysTestRunnable::run_until_finished (runnables=...) at test/system/systest_runnable.cc:174
#16 0x0000000000404221 in main (argc=1, argv=0x7fffc9b0c7a8) at test/system/rados_watch_notify.cc:104

because of this thread holding the session ref during the completion for an old request:
(gdb) bt
#0  __lll_unlock_wake () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:368
#1  0x00007f2ff6d18104 in _L_unlock_644 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f2ff6d18063 in __pthread_mutex_unlock_usercnt (mutex=<optimized out>, decr=<optimized out>) at pthread_mutex_unlock.c:52
#3  __pthread_mutex_unlock (mutex=0x7fffc9b0b940) at pthread_mutex_unlock.c:290
#4  0x00007f2ff714de5a in Mutex::Unlock (this=0x7fffc9b0b930) at common/Mutex.cc:106
#5  0x00007f2ff70b2409 in Context::complete (this=0x2401bb0, r=<optimized out>) at ./include/Context.h:64
#6  0x00007f2ff70f6001 in Objecter::handle_osd_op_reply (this=0x23fbb70, m=0x7f2fd8000d70) at osdc/Objecter.cc:2637
#7  0x00007f2ff710040b in Objecter::ms_dispatch (this=0x23fbb70, m=0x7f2fd8000d70) at osdc/Objecter.cc:574
#8  0x00007f2ff7281259 in ms_fast_dispatch (m=0x7f2fd8000d70, this=0x23fafe0) at msg/Messenger.h:503
#9  DispatchQueue::fast_dispatch (this=0x23fb198, m=0x7f2fd8000d70) at msg/DispatchQueue.cc:71
#10 0x00007f2ff72d1fdc in Pipe::DelayedDelivery::entry (this=0x7f2fd40022f0) at msg/Pipe.cc:224
#11 0x00007f2ff6d14e9a in start_thread (arg=0x7f2fea7fc700) at pthread_create.c:308
#12 0x00007f2ff652b3fd in __qecvt_r (value=0, ndigit=-911165120, decpt=0x0, sign=0x7fffc9b0b940, buf=0xffffffffffffffff <Address 0xffffffffffffffff out of bounds>, len=140736577190208) at efgcvt_r.c:182
#13 0x0000000000000000 in ?? ()

Actions #7

Updated by Sage Weil over 9 years ago

  • Status changed from Fix Under Review to Resolved
Actions

Also available in: Atom PDF