Bug #5426
librbd: mutex assert in perfcounters::tinc in librbd::AioCompletion::complete()
% Done: 100%
Description
2013-06-22T12:24:53.336 INFO:teuthology.orchestra.run.err:common/Mutex.cc: In function 'void Mutex::Lock(bool)' thread 7f32aaffd700 time 2013-06-22 12:24:55.286391
2013-06-22T12:24:53.336 INFO:teuthology.orchestra.run.err:common/Mutex.cc: 93: FAILED assert(r == 0)
2013-06-22T12:24:53.337 INFO:teuthology.orchestra.run.err: ceph version 0.61.4-5-gd572cf6 (d572cf6f77418f217a5a8e37f1124dc566e24d0b)
2013-06-22T12:24:53.337 INFO:teuthology.orchestra.run.err: 1: (Mutex::Lock(bool)+0x1c3) [0x7f32b5362593]
2013-06-22T12:24:53.337 INFO:teuthology.orchestra.run.err: 2: (PerfCounters::tinc(int, utime_t)+0x5e) [0x7f32b535fdbe]
2013-06-22T12:24:53.337 INFO:teuthology.orchestra.run.err: 3: (librbd::AioCompletion::complete()+0x210) [0x7f32b5f72ac0]
2013-06-22T12:24:53.337 INFO:teuthology.orchestra.run.err: 4: (librbd::AioCompletion::complete_request(CephContext*, long)+0x1c7) [0x7f32b5f71ed7]
2013-06-22T12:24:53.337 INFO:teuthology.orchestra.run.err: 5: (librbd::C_AioRead::finish(int)+0x1d6) [0x7f32b5f72336]
2013-06-22T12:24:53.337 INFO:teuthology.orchestra.run.err: 6: (Context::complete(int)+0xa) [0x7f32b5f7281a]
2013-06-22T12:24:53.337 INFO:teuthology.orchestra.run.err: 7: (librbd::rados_req_cb(void*, void*)+0x47) [0x7f32b5f86517]
2013-06-22T12:24:53.337 INFO:teuthology.orchestra.run.err: 8: (librados::C_AioComplete::finish(int)+0x1d) [0x7f32b530b12d]
2013-06-22T12:24:53.337 INFO:teuthology.orchestra.run.err: 9: (Finisher::finisher_thread_entry()+0x1c0) [0x7f32b5378bb0]
2013-06-22T12:24:53.338 INFO:teuthology.orchestra.run.err: 10: (()+0x7e9a) [0x7f32b4902e9a]
2013-06-22T12:24:53.338 INFO:teuthology.orchestra.run.err: 11: (clone()+0x6d) [0x7f32b4c0bccd]
2013-06-22T12:24:53.338 INFO:teuthology.orchestra.run.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
run was
ubuntu@teuthology:/a/teuthology-2013-06-21_20:00:30-rbd-cuttlefish-testing-basic/42512$ cat orig.config.yaml
kernel:
  kdb: true
  sha1: 2dd322b42d608a37f3e5beed57a8fbc673da6e32
machine_type: plana
nuke-on-error: true
overrides:
  admin_socket:
    branch: cuttlefish
  ceph:
    conf:
      client:
        rbd cache: false
      global:
        ms inject socket failures: 5000
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: d572cf6f77418f217a5a8e37f1124dc566e24d0b
  install:
    ceph:
      sha1: d572cf6f77418f217a5a8e37f1124dc566e24d0b
  s3tests:
    branch: cuttlefish
  workunit:
    sha1: d572cf6f77418f217a5a8e37f1124dc566e24d0b
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    timeout: 1200
- rbd_fsx:
    clients:
    - client.0
    ops: 2000
Associated revisions
librbd: call user completion after incrementing perfcounters
The perfcounters (and the ictx) are only valid while the image is
still open. If the librbd user gets the callback for its last I/O,
then closes the image, the ictx and its perfcounters will be
invalid. If the AioCompletion object has not run the rest of its
complete() method yet, it will access these now-invalid addresses,
possibly leading to a crash.
The AioCompletion object is independent of the ictx and does not
access it again after incrementing perfcounters, so avoid this race by
calling the user's callback after this step. The AioCompletion object
will be cleaned up by the rest of complete_request(), independent of
the ImageCtx.
Fixes: #5426
Backport: dumpling, emperor
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Merge pull request #1000 from ceph/wip-rbd-tinc-5426
fix #5426 race in librbd
Reviewed-by: Sage Weil <sage@inktank.com>
librbd: call user completion after incrementing perfcounters
The perfcounters (and the ictx) are only valid while the image is
still open. If the librbd user gets the callback for its last I/O,
then closes the image, the ictx and its perfcounters will be
invalid. If the AioCompletion object has not run the rest of its
complete() method yet, it will access these now-invalid addresses,
possibly leading to a crash.
The AioCompletion object is independent of the ictx and does not
access it again after incrementing perfcounters, so avoid this race by
calling the user's callback after this step. The AioCompletion object
will be cleaned up by the rest of complete_request(), independent of
the ImageCtx.
Fixes: #5426
Backport: dumpling, emperor
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 4cea7895da7331b84d8c6079851fdc0ff2f4afb1)
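The reordering described in the commits above can be illustrated with a small self-contained sketch. The types and member names below (ImageCtx, PerfCounters, AioCompletion, tinc, user_cb) are simplified stand-ins chosen for illustration; they are not the actual librbd source.

// Sketch of the ordering change: update the perfcounters (which live in the
// ImageCtx) before handing control back to the user, because the user
// callback may close the image and free the ImageCtx.
#include <functional>
#include <iostream>
#include <memory>

struct PerfCounters {              // stand-in for the real PerfCounters
  long reads = 0;
  void tinc() { ++reads; }         // stand-in for PerfCounters::tinc()
};

struct ImageCtx {                  // stand-in for librbd::ImageCtx
  PerfCounters perf;
};

struct AioCompletion {             // stand-in for librbd::AioCompletion
  ImageCtx *ictx = nullptr;
  std::function<void()> user_cb;   // user callback; may close the image

  void complete() {
    // Fixed ordering: touch ictx while it is still guaranteed valid...
    ictx->perf.tinc();
    // ...then run the user callback. With the pre-fix ordering (callback
    // first, tinc() second), a callback that closes the image would leave
    // tinc() dereferencing freed memory.
    if (user_cb)
      user_cb();
    // From here on the completion must not dereference ictx again.
  }
};

int main() {
  std::unique_ptr<ImageCtx> ictx(new ImageCtx);
  AioCompletion comp;
  comp.ictx = ictx.get();
  comp.user_cb = [&ictx] { ictx.reset(); };  // "close the image" on last I/O
  comp.complete();
  std::cout << "completed without touching freed memory" << std::endl;
}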
History
#1 Updated by Ian Colle over 10 years ago
- Assignee set to Josh Durgin
#2 Updated by Sage Weil about 10 years ago
- Status changed from New to 12
2013-07-09T20:46:21.561 INFO:teuthology.orchestra.run.err:common/Mutex.cc: In function 'void Mutex::Lock(bool)' thread 7fb3b1ffb700 time 2013-07-09 20:46:41.372806
2013-07-09T20:46:21.561 INFO:teuthology.orchestra.run.err:common/Mutex.cc: 93: FAILED assert(r == 0)
2013-07-09T20:46:21.567 INFO:teuthology.orchestra.run.err: ceph version 0.61.4-48-g24f90b8 (24f90b832c695ef13021db66a178c18369ac356d)
2013-07-09T20:46:21.567 INFO:teuthology.orchestra.run.err: 1: (Mutex::Lock(bool)+0x1c3) [0x7fb3bc4410d3]
2013-07-09T20:46:21.567 INFO:teuthology.orchestra.run.err: 2: (PerfCounters::tinc(int, utime_t)+0x5e) [0x7fb3bc43e8fe]
2013-07-09T20:46:21.567 INFO:teuthology.orchestra.run.err: 3: (librbd::AioCompletion::complete()+0x210) [0x7fb3bd053ac0]
2013-07-09T20:46:21.567 INFO:teuthology.orchestra.run.err: 4: (librbd::AioCompletion::complete_request(CephContext*, long)+0x1c7) [0x7fb3bd052ed7]
2013-07-09T20:46:21.567 INFO:teuthology.orchestra.run.err: 5: (librbd::C_AioRead::finish(int)+0x1d6) [0x7fb3bd053336]
2013-07-09T20:46:21.568 INFO:teuthology.orchestra.run.err: 6: (Context::complete(int)+0xa) [0x7fb3bd05381a]
2013-07-09T20:46:21.568 INFO:teuthology.orchestra.run.err: 7: (librbd::rados_req_cb(void*, void*)+0x47) [0x7fb3bd067517]
2013-07-09T20:46:21.568 INFO:teuthology.orchestra.run.err: 8: (librados::C_AioComplete::finish(int)+0x1d) [0x7fb3bc3e9c6d]
2013-07-09T20:46:21.568 INFO:teuthology.orchestra.run.err: 9: (Finisher::finisher_thread_entry()+0x1c0) [0x7fb3bc4576f0]
2013-07-09T20:46:21.568 INFO:teuthology.orchestra.run.err: 10: (()+0x7e9a) [0x7fb3bb9e0e9a]
2013-07-09T20:46:21.568 INFO:teuthology.orchestra.run.err: 11: (clone()+0x6d) [0x7fb3bbce9ccd]
2013-07-09T20:46:21.568 INFO:teuthology.orchestra.run.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2013-07-09T20:46:21.569 INFO:teuthology.orchestra.run.err:terminate called after throwing an instance of 'ceph::FailedAssertion'
job was
ubuntu@teuthology:/a/teuthology-2013-07-09_20:00:36-rbd-cuttlefish-testing-basic/61034$ cat orig.config.yaml
kernel:
  kdb: true
  sha1: 365b57b1317524bb0cdd15859a224ba1ab58d1d7
machine_type: plana
nuke-on-error: true
overrides:
  admin_socket:
    branch: cuttlefish
  ceph:
    conf:
      client:
        rbd cache: false
      global:
        ms inject socket failures: 5000
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
    fs: xfs
    log-whitelist:
    - slow request
    sha1: 24f90b832c695ef13021db66a178c18369ac356d
  install:
    ceph:
      sha1: 24f90b832c695ef13021db66a178c18369ac356d
  s3tests:
    branch: cuttlefish
  workunit:
    sha1: 24f90b832c695ef13021db66a178c18369ac356d
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    timeout: 1200
- rbd_fsx:
    clients:
    - client.0
    ops: 2000
#3 Updated by Sage Weil about 10 years ago
Both of these are cuttlefish... maybe something that hasn't been backported?
#4 Updated by Sage Weil about 10 years ago
ubuntu@teuthology:/a/teuthology-2013-07-13_20:00:45-rbd-cuttlefish-testing-basic/66483$ cat orig.config.yaml
kernel:
  kdb: true
  sha1: 365b57b1317524bb0cdd15859a224ba1ab58d1d7
machine_type: plana
nuke-on-error: true
overrides:
  admin_socket:
    branch: cuttlefish
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 658240710baaf9c661b8fbf856322907a0d394ee
  install:
    ceph:
      sha1: 658240710baaf9c661b8fbf856322907a0d394ee
  s3tests:
    branch: cuttlefish
  workunit:
    sha1: 658240710baaf9c661b8fbf856322907a0d394ee
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    conf:
      client:
        rbd cache: false
- rbd_fsx:
    clients:
    - client.0
    ops: 5000
cuttlefish again!
#5 Updated by Sage Weil about 10 years ago
okay, not cuttlefish.
ubuntu@teuthology:/a/teuthology-2013-07-17_01:00:47-rbd-next-testing-basic/70375
2013-07-17T10:14:13.999 INFO:teuthology.task.workunit.client.0.err: 0> 2013-07-17 10:14:41.818619 7f00ae7fc700 -1 *** Caught signal (Segmentation fault) **
2013-07-17T10:14:13.999 INFO:teuthology.task.workunit.client.0.err: in thread 7f00ae7fc700
2013-07-17T10:14:13.999 INFO:teuthology.task.workunit.client.0.err:
2013-07-17T10:14:13.999 INFO:teuthology.task.workunit.client.0.err: ceph version 0.66-669-g884fa2f (884fa2fcb6d707b23317bab1da909586ddc27608)
2013-07-17T10:14:13.999 INFO:teuthology.task.workunit.client.0.err: 1: rbd() [0x41f2aa]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 2: (()+0xfcb0) [0x7f00b8569cb0]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 3: (PerfCounters::tinc(int, utime_t)+0x28) [0x7f00b895f6b8]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 4: (librbd::AioCompletion::complete()+0x210) [0x7f00b95e95d0]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 5: (librbd::AioCompletion::complete_request(CephContext*, long)+0x1c7) [0x7f00b95e8a37]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 6: (librbd::C_AioRead::finish(int)+0x15d) [0x7f00b95e8e1d]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 7: (Context::complete(int)+0xa) [0x7f00b95e932a]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 8: (librbd::rados_req_cb(void*, void*)+0x47) [0x7f00b95fd007]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 9: (librados::C_AioComplete::finish(int)+0x1d) [0x7f00b88fef7d]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 10: (Finisher::finisher_thread_entry()+0x1c0) [0x7f00b89826b0]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 11: (()+0x7e9a) [0x7f00b8561e9a]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 12: (clone()+0x6d) [0x7f00b7b74ccd]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
#6 Updated by Josh Durgin about 10 years ago
- Status changed from 12 to In Progress
#7 Updated by Sage Weil about 10 years ago
- Priority changed from Urgent to High
#8 Updated by Josh Durgin about 10 years ago
Reproduced after 22 tries with no debugging on. Couldn't reproduce with debugging enabled, but this is a start.
The PerfCounters object hasn't been destroyed yet, but the _conf and other fields that its m_cct references are invalid, so m_cct has probably already been destroyed at this point.
p *(this->m_cct)
$8 = {nref = {val = 23004272}, _conf = 0x671, _log = 0x7facf203ebf8, _module_type = 4060343288, _service_thread = 0x15e8960, _log_obs = 0x15e8960,
  _admin_socket = 0x30, _service_thread_lock = 769, _perf_counters_collection = 0x15b7570, _perf_counters_conf_obs = 0x16af240,
  _admin_hook = 0x10000fffffffe, _heartbeat_map = 0x722032393935363a, _crypto_none = 0x2e617461645f6462, _crypto_aes = 0x3166636463303031}
#9 Updated by Josh Durgin about 10 years ago
Never mind; the PerfCounters object is invalid too, with m_lower_bound = 22777856 and m_upper_bound = 0.
#10 Updated by Sage Weil about 10 years ago
ubuntu@teuthology:/a/teuthology-2013-08-22_01:01:01-rbd-next-testing-basic-plana/888
#11 Updated by Sage Weil about 10 years ago
ubuntu@teuthology:/a/teuthology-2013-08-23_00:30:06-ceph-deploy-master---basic-saya/1087
slightly different, though:
...
Thread 10 (Thread 0x7f9d1c19e700 (LWP 4064)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f9d1df100ab in std::vector<ceph::log::Subsystem, std::allocator<ceph::log::Subsystem> >::_M_fill_insert(__gnu_cxx::__normal_iterator<ceph::log::Subsystem*, std::vector<ceph::log::Subsystem, std::allocator<ceph::log::Subsystem> > >, unsigned long, ceph::log::Subsystem const&) () from /usr/lib/librados.so.2
#2  0x0000000000000000 in ?? ()

Thread 9 (Thread 0x7f9d1b19c700 (LWP 4066)):
#0  0x00007f9d1d6cb313 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007f9d1de9561b in std::pair<std::string const, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> >::~pair() () from /usr/lib/librados.so.2
#2  0x0000000000000000 in ?? ()

Thread 8 (Thread 0x7f9d19999700 (LWP 4069)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215
#1  0x00007f9d1dea9ebb in Finisher::wait_for_empty() () from /usr/lib/librados.so.2
#2  0x0000000000000000 in ?? ()

Thread 7 (Thread 0x7f9d0acf3700 (LWP 8025)):
#0  0x00007f9d1d6cb313 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007f9d1df8ed72 in Pipe::read_message(Message**) () from /usr/lib/librados.so.2
#2  0x0000000000000000 in ?? ()

Thread 6 (Thread 0x7f9d0adf4700 (LWP 8024)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f9d1df9b3db in Pipe::writer() () from /usr/lib/librados.so.2
#2  0x00007f9d1dfa66bd in ?? () from /usr/lib/librados.so.2
#3  0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7f9d0abf2700 (LWP 8041)):
#0  0x00007f9d1d6cb313 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007f9d1df8ed72 in Pipe::read_message(Message**) () from /usr/lib/librados.so.2
#2  0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7f9d1ef9a780 (LWP 4063)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f9d1de173b3 in std::_Rb_tree<unsigned long, std::pair<unsigned long const, Objecter::PoolOp*>, std::_Select1st<std::pair<unsigned long const, Objecter::PoolOp*> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, Objecter::PoolOp*> > >::_M_erase(std::_Rb_tree_node<std::pair<unsigned long const, Objecter::PoolOp*> >*) () from /usr/lib/librados.so.2
#2  0x00007f9d1de2285b in librados::IoCtxImpl::aio_sparse_read(object_t, librados::AioCompletionImpl*, std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > >*, ceph::buffer::list*, unsigned long, unsigned long, unsigned long) () from /usr/lib/librados.so.2
#3  0x00007fff1216c820 in ?? ()
#4  0x00007fff1216c820 in ?? ()
#5  0x0000000100000000 in ?? ()
#6  0x00007fff1216c820 in ?? ()
#7  0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7f9d0b1f8700 (LWP 8080)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f9d1df9b3db in Pipe::writer() () from /usr/lib/librados.so.2
#2  0x00007f9d1dfa66bd in ?? () from /usr/lib/librados.so.2
#3  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f9d0aef5700 (LWP 8039)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f9d1df9b3db in Pipe::writer() () from /usr/lib/librados.so.2
#2  0x00007f9d1dfa66bd in ?? () from /usr/lib/librados.so.2
#3  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f9d0bfff700 (LWP 4074)):
#0  0x00007f9d1de87188 in PerfCountersCollection::remove(PerfCounters*) () from /usr/lib/librados.so.2
#1  0x0000003000000003 in ?? ()
#2  0x0000000000020000 in ?? ()
#3  0x00007f9d0bffe778 in ?? ()
#4  0x00007f9d0bffe778 in ?? ()
#5  0x00007f9c00000000 in ?? ()
#6  0x0000000000000000 in ?? ()
(gdb) q
#12 Updated by Sage Weil about 10 years ago
ubuntu@teuthology:/a/teuthology-2013-09-05_01:00:58-rbd-next-testing-basic-plana/22153
#13 Updated by Tamilarasi muthamizhan about 10 years ago
ubuntu@teuthology:/a/teuthology-2013-09-09_01:36:24-upgrade-small-next-testing-basic-vps/27433
#15 Updated by Sage Weil almost 10 years ago
- Priority changed from High to Normal
#16 Updated by Tamilarasi muthamizhan almost 10 years ago
latest logs: ubuntu@teuthology:/a/teuthology-2013-11-08_05:35:02-upgrade-small-next-testing-basic-vps/90358
#17 Updated by Ian Colle almost 10 years ago
- Priority changed from Normal to Urgent
Upgraded to Urgent since causing failures in nightlies
#18 Updated by Tamilarasi muthamizhan almost 10 years ago
recent logs : ubuntu@teuthology:/a/teuthology-2013-11-21_23:01:11-rbd-next-testing-basic-plana/113257
2013-11-22T03:56:35.652 INFO:teuthology.orchestra.run.out:[10.214.132.22]: 3523 flatten
2013-11-22T03:56:38.000 INFO:teuthology.orchestra.run.out:[10.214.132.22]: 3525 trunc from 0x62e39fe to 0x1b9c7ec
2013-11-22T03:56:38.135 INFO:teuthology.orchestra.run.out:[10.214.132.22]: 3529 punch from 0x1b7c8d9 to 0x1b8ae76, (0xe59d bytes)
2013-11-22T03:56:38.178 INFO:teuthology.orchestra.run.out:[10.214.132.22]: 3530 trunc from 0x1b9c7ec to 0x74f792a
2013-11-22T03:56:38.210 INFO:teuthology.orchestra.run.out:[10.214.132.22]: 3531 clone 20 order 18 su 131072 sc 12
2013-11-22T03:56:39.281 INFO:teuthology.orchestra.run.out:[10.214.132.22]: checking clone #18, image image_client.0-clone18 against file /home/ubuntu/cephtest/archive/fsx-image_client.0-parent19
2013-11-22T03:56:41.435 INFO:teuthology.orchestra.run.err:[10.214.132.22]: common/Mutex.cc: In function 'void Mutex::Lock(bool)' thread 7f291d4f1700 time 2013-11-22 03:56:41.432336
2013-11-22T03:56:41.435 INFO:teuthology.orchestra.run.err:[10.214.132.22]: common/Mutex.cc: 93: FAILED assert(r == 0)
2013-11-22T03:56:41.435 INFO:teuthology.orchestra.run.err:[10.214.132.22]: ceph version 0.72-214-g203065c (203065c076daf9aaa9ade3d7cdd1b10e6700bfe6)
2013-11-22T03:56:41.436 INFO:teuthology.orchestra.run.err:[10.214.132.22]: 1: (Mutex::Lock(bool)+0x1c3) [0x7f29232fc0e3]
2013-11-22T03:56:41.436 INFO:teuthology.orchestra.run.err:[10.214.132.22]: 2: (PerfCounters::tinc(int, utime_t)+0x5e) [0x7f29232f983e]
2013-11-22T03:56:41.436 INFO:teuthology.orchestra.run.err:[10.214.132.22]: 3: (librbd::AioCompletion::complete()+0x210) [0x7f2923fecb10]
2013-11-22T03:56:41.436 INFO:teuthology.orchestra.run.err:[10.214.132.22]: 4: (librbd::AioCompletion::complete_request(CephContext*, long)+0x177) [0x7f2923febfd7]
2013-11-22T03:56:41.437 INFO:teuthology.orchestra.run.err:[10.214.132.22]: 5: (librbd::C_AioRead::finish(int)+0x104) [0x7f2923fec2d4]
2013-11-22T03:56:41.437 INFO:teuthology.orchestra.run.err:[10.214.132.22]: 6: (Context::complete(int)+0x9) [0x7f2923fec769]
2013-11-22T03:56:41.437 INFO:teuthology.orchestra.run.err:[10.214.132.22]: 7: (librbd::rados_req_cb(void*, void*)+0x47) [0x7f2923ff6f57]
2013-11-22T03:56:41.437 INFO:teuthology.orchestra.run.err:[10.214.132.22]: 8: (librados::C_AioComplete::finish(int)+0x1d) [0x7f292329024d]
2013-11-22T03:56:41.438 INFO:teuthology.orchestra.run.err:[10.214.132.22]: 9: (Context::complete(int)+0x9) [0x7f2923270e69]
2013-11-22T03:56:41.438 INFO:teuthology.orchestra.run.err:[10.214.132.22]: 10: (Finisher::finisher_thread_entry()+0x1c0) [0x7f292331d7d0]
2013-11-22T03:56:41.438 INFO:teuthology.orchestra.run.err:[10.214.132.22]: 11: (()+0x7e9a) [0x7f2922825e9a]
2013-11-22T03:56:41.439 INFO:teuthology.orchestra.run.err:[10.214.132.22]: 12: (clone()+0x6d) [0x7f2922b2eccd]
2013-11-22T03:56:41.439 INFO:teuthology.orchestra.run.err:[10.214.132.22]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2013-11-22T03:56:41.439 INFO:teuthology.orchestra.run.err:[10.214.132.22]: terminate called after throwing an instance of 'ceph::FailedAssertion'
2013-11-22T03:56:52.443 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-next/teuthology/run_tasks.py", line 33, in run_tasks
    manager.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/teuthworker/teuthology-next/teuthology/task/rbd_fsx.py", line 40, in task
    p.spawn(_run_one_client, ctx, config, role)
  File "/home/teuthworker/teuthology-next/teuthology/parallel.py", line 83, in __exit__
    for result in self:
  File "/home/teuthworker/teuthology-next/teuthology/parallel.py", line 100, in next
    resurrect_traceback(result)
  File "/home/teuthworker/teuthology-next/teuthology/parallel.py", line 19, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/teuthology-next/teuthology/task/rbd_fsx.py", line 62, in _run_one_client
    'image_{image}'.format(image=role),
  File "/home/teuthworker/teuthology-next/teuthology/orchestra/remote.py", line 47, in run
    r = self._runner(client=self.ssh, **kwargs)
  File "/home/teuthworker/teuthology-next/teuthology/orchestra/run.py", line 271, in run
    r.exitstatus = _check_status(r.exitstatus)
  File "/home/teuthworker/teuthology-next/teuthology/orchestra/run.py", line 264, in _check_status
    raise CommandCrashedError(command=r.command)
CommandCrashedError: Command crashed: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph_test_librbd_fsx -d -W -R -p 100 -P /home/ubuntu/cephtest/archive -t 1 -l 250000000 -S 0 -N 5000 pool_client.0 image_client.0'
2013-11-22T03:56:52.650 ERROR:teuthology.run_tasks: Sentry event: http://sentry.ceph.com/inktank/teuthology/search?q=c31f734e23384b0396a3d735e27ad1f7
CommandCrashedError: Command crashed: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph_test_librbd_fsx -d -W -R -p 100 -P /home/ubuntu/cephtest/archive -t 1 -l 250000000 -S 0 -N 5000 pool_client.0 image_client.0'
#19 Updated by Dmitry Borodaenko almost 10 years ago
This problem is easily reproducible with Mirantis OpenStack 4.0:
https://bugs.launchpad.net/fuel/+bug/1260911
After converting and uploading a RAW CirrOS image, all you have to do is launch a VM from that image, then run rbd -p compute export <instance-id>_disk - > /dev/null.
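That reproduction boils down to the application closing the image as soon as its last AIO callback returns, while librbd's finisher thread may still be inside AioCompletion::complete(). A rough sketch of that pattern against the librbd C API is below; the pool and image names, the busy-wait, and the overall setup are placeholders for illustration, not taken from this report.

/* sketch.cc: issue one AIO read and close the image right after the user
 * callback fires. Before the fix, the finisher thread could still be
 * updating the perfcounters at that point, producing the crash above. */
#include <rados/librados.h>
#include <rbd/librbd.h>

static void read_done(rbd_completion_t c, void *arg)
{
    (void)c;
    *static_cast<volatile int *>(arg) = 1;   /* signal "last I/O finished" */
}

int main()
{
    rados_t cluster;
    rados_ioctx_t io;
    rbd_image_t image;
    rbd_completion_t comp;
    char buf[4096];
    volatile int done = 0;

    if (rados_create(&cluster, NULL) < 0 || rados_conf_read_file(cluster, NULL) < 0 ||
        rados_connect(cluster) < 0)
        return 1;
    if (rados_ioctx_create(cluster, "rbd", &io) < 0)    /* placeholder pool */
        return 1;
    if (rbd_open(io, "testimg", &image, NULL) < 0)      /* placeholder image */
        return 1;

    rbd_aio_create_completion((void *)&done, read_done, &comp);
    rbd_aio_read(image, 0, sizeof(buf), buf, comp);

    while (!done)
        ;               /* wait only for the user callback, not for complete() to finish */

    rbd_aio_release(comp);
    rbd_close(image);   /* before the fix this could free the perfcounters under complete() */

    rados_ioctx_destroy(io);
    rados_shutdown(cluster);
    return 0;
}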
#20 Updated by Josh Durgin almost 10 years ago
- Status changed from In Progress to Fix Under Review
- Backport set to dumpling, emperor
#21 Updated by Josh Durgin almost 10 years ago
- Status changed from Fix Under Review to Pending Backport
I cherry-picked the fix to the dumpling-5426 and emperor-5426 branches, based on dumpling and emperor respectively. The fix will go on the actual dumpling and emperor branches once it's passed nightly tests for a while.
#22 Updated by Sage Weil over 9 years ago
- Status changed from Pending Backport to Resolved
#23 Updated by Dmitry Borodaenko over 9 years ago
Fix confirmed with the packages from http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/dumpling-5426/ on Mirantis OpenStack 4.0; the SEGV no longer occurs.