Bug #5426

librbd: mutex assert in perfcounters::tinc in librbd::AioCompletion::complete()

Added by Sage Weil over 10 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Josh Durgin
Target version:
-
% Done:

100%

Source:
Q/A
Tags:
Backport:
dumpling, emperor
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2013-06-22T12:24:53.336 INFO:teuthology.orchestra.run.err:common/Mutex.cc: In function 'void Mutex::Lock(bool)' thread 7f32aaffd700 time 2013-06-22 12:24:55.286391
2013-06-22T12:24:53.336 INFO:teuthology.orchestra.run.err:common/Mutex.cc: 93: FAILED assert(r == 0)
2013-06-22T12:24:53.337 INFO:teuthology.orchestra.run.err: ceph version 0.61.4-5-gd572cf6 (d572cf6f77418f217a5a8e37f1124dc566e24d0b)
2013-06-22T12:24:53.337 INFO:teuthology.orchestra.run.err: 1: (Mutex::Lock(bool)+0x1c3) [0x7f32b5362593]
2013-06-22T12:24:53.337 INFO:teuthology.orchestra.run.err: 2: (PerfCounters::tinc(int, utime_t)+0x5e) [0x7f32b535fdbe]
2013-06-22T12:24:53.337 INFO:teuthology.orchestra.run.err: 3: (librbd::AioCompletion::complete()+0x210) [0x7f32b5f72ac0]
2013-06-22T12:24:53.337 INFO:teuthology.orchestra.run.err: 4: (librbd::AioCompletion::complete_request(CephContext*, long)+0x1c7) [0x7f32b5f71ed7]
2013-06-22T12:24:53.337 INFO:teuthology.orchestra.run.err: 5: (librbd::C_AioRead::finish(int)+0x1d6) [0x7f32b5f72336]
2013-06-22T12:24:53.337 INFO:teuthology.orchestra.run.err: 6: (Context::complete(int)+0xa) [0x7f32b5f7281a]
2013-06-22T12:24:53.337 INFO:teuthology.orchestra.run.err: 7: (librbd::rados_req_cb(void*, void*)+0x47) [0x7f32b5f86517]
2013-06-22T12:24:53.337 INFO:teuthology.orchestra.run.err: 8: (librados::C_AioComplete::finish(int)+0x1d) [0x7f32b530b12d]
2013-06-22T12:24:53.337 INFO:teuthology.orchestra.run.err: 9: (Finisher::finisher_thread_entry()+0x1c0) [0x7f32b5378bb0]
2013-06-22T12:24:53.338 INFO:teuthology.orchestra.run.err: 10: (()+0x7e9a) [0x7f32b4902e9a]
2013-06-22T12:24:53.338 INFO:teuthology.orchestra.run.err: 11: (clone()+0x6d) [0x7f32b4c0bccd]
2013-06-22T12:24:53.338 INFO:teuthology.orchestra.run.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

run was
ubuntu@teuthology:/a/teuthology-2013-06-21_20:00:30-rbd-cuttlefish-testing-basic/42512$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 2dd322b42d608a37f3e5beed57a8fbc673da6e32
machine_type: plana
nuke-on-error: true
overrides:
  admin_socket:
    branch: cuttlefish
  ceph:
    conf:
      client:
        rbd cache: false
      global:
        ms inject socket failures: 5000
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: d572cf6f77418f217a5a8e37f1124dc566e24d0b
  install:
    ceph:
      sha1: d572cf6f77418f217a5a8e37f1124dc566e24d0b
  s3tests:
    branch: cuttlefish
  workunit:
    sha1: d572cf6f77418f217a5a8e37f1124dc566e24d0b
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    timeout: 1200
- rbd_fsx:
    clients:
    - client.0
    ops: 2000

Subtasks

Bug #8409: "Segmentation fault" in rbd in upgrade:dumpling-dumpling-testing-basic-vps (Duplicate)

Associated revisions

Revision 4cea7895 (diff)
Added by Josh Durgin almost 10 years ago

librbd: call user completion after incrementing perfcounters

The perfcounters (and the ictx) are only valid while the image is
still open. If the librbd user gets the callback for its last I/O,
then closes the image, the ictx and its perfcounters will be
invalid. If the AioCompletion object has not run the rest of its
complete() method yet, it will access these now-invalid addresses,
possibly leading to a crash.

The AioCompletion object is independent of the ictx and does not
access it again after incrementing perfcounters, so avoid this race by
calling the user's callback after this step. The AioCompletion object
will be cleaned up by the rest of complete_request(), independent of
the ImageCtx.

Fixes: #5426
Backport: dumpling, emperor
Signed-off-by: Josh Durgin <>
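
For illustration, here is a minimal sketch of the reordering this commit describes, simplified to the AIO read case. It is not the verbatim librbd source; the member names (ictx, perfcounter, complete_cb, rbd_comp) and the counter id are assumptions based on the commit message and the backtrace above.

// Hedged sketch of the fixed ordering, not the actual librbd code.
void AioCompletion::complete() {
  assert(lock.is_locked());
  utime_t elapsed = ceph_clock_now(ictx->cct) - start_time;

  // 1. Update the ImageCtx-owned perfcounters first, while the image is
  //    guaranteed to still be open: the user callback has not fired yet,
  //    so the user cannot have closed the image.
  ictx->perfcounter->tinc(l_librbd_aio_rd_latency, elapsed);

  // 2. Only then hand control back to the user. The callback may close
  //    the image, but complete() never dereferences ictx again; the
  //    AioCompletion itself is cleaned up later by complete_request(),
  //    independent of the ImageCtx.
  if (complete_cb)
    complete_cb(rbd_comp, complete_arg);

  done = true;
  cond.Signal();
}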

Revision 8f3ad4e3
Added by Josh Durgin almost 10 years ago

Merge pull request #1000 from ceph/wip-rbd-tinc-5426

fix #5426 race in librbd

Reviewed-by: Sage Weil <>

Revision 1abb1698 (diff)
Added by Josh Durgin over 9 years ago

librbd: call user completion after incrementing perfcounters

The perfcounters (and the ictx) are only valid while the image is
still open. If the librbd user gets the callback for its last I/O,
then closes the image, the ictx and its perfcounters will be
invalid. If the AioCompletion object has not run the rest of its
complete() method yet, it will access these now-invalid addresses,
possibly leading to a crash.

The AioCompletion object is independent of the ictx and does not
access it again after incrementing perfcounters, so avoid this race by
calling the user's callback after this step. The AioCompletion object
will be cleaned up by the rest of complete_request(), independent of
the ImageCtx.

Fixes: #5426
Backport: dumpling, emperor
Signed-off-by: Josh Durgin <>
(cherry picked from commit 4cea7895da7331b84d8c6079851fdc0ff2f4afb1)

Revision 2f25bfea (diff)
Added by Josh Durgin over 9 years ago

librbd: call user completion after incrementing perfcounters

The perfcounters (and the ictx) are only valid while the image is
still open. If the librbd user gets the callback for its last I/O,
then closes the image, the ictx and its perfcounters will be
invalid. If the AioCompletion object is has not run the rest of its
complete() method yet, it will access these now-invalid addresses,
possibly leading to a crash.

The AioCompletion object is independent of the ictx and does not
access it again after incrementing perfcounters, so avoid this race by
calling the user's callback after this step. The AioCompletion object
will be cleaned up by the rest of complete_request(), independent of
the ImageCtx.

Fixes: #5426
Backport: dumpling, emperor
Signed-off-by: Josh Durgin <>
(cherry picked from commit 4cea7895da7331b84d8c6079851fdc0ff2f4afb1)

History

#1 Updated by Ian Colle over 10 years ago

  • Assignee set to Josh Durgin

#2 Updated by Sage Weil about 10 years ago

  • Status changed from New to 12

2013-07-09T20:46:21.561 INFO:teuthology.orchestra.run.err:common/Mutex.cc: In function 'void Mutex::Lock(bool)' thread 7fb3b1ffb700 time 2013-07-09 20:46:41.372806
2013-07-09T20:46:21.561 INFO:teuthology.orchestra.run.err:common/Mutex.cc: 93: FAILED assert(r == 0)
2013-07-09T20:46:21.567 INFO:teuthology.orchestra.run.err: ceph version 0.61.4-48-g24f90b8 (24f90b832c695ef13021db66a178c18369ac356d)
2013-07-09T20:46:21.567 INFO:teuthology.orchestra.run.err: 1: (Mutex::Lock(bool)+0x1c3) [0x7fb3bc4410d3]
2013-07-09T20:46:21.567 INFO:teuthology.orchestra.run.err: 2: (PerfCounters::tinc(int, utime_t)+0x5e) [0x7fb3bc43e8fe]
2013-07-09T20:46:21.567 INFO:teuthology.orchestra.run.err: 3: (librbd::AioCompletion::complete()+0x210) [0x7fb3bd053ac0]
2013-07-09T20:46:21.567 INFO:teuthology.orchestra.run.err: 4: (librbd::AioCompletion::complete_request(CephContext*, long)+0x1c7) [0x7fb3bd052ed7]
2013-07-09T20:46:21.567 INFO:teuthology.orchestra.run.err: 5: (librbd::C_AioRead::finish(int)+0x1d6) [0x7fb3bd053336]
2013-07-09T20:46:21.568 INFO:teuthology.orchestra.run.err: 6: (Context::complete(int)+0xa) [0x7fb3bd05381a]
2013-07-09T20:46:21.568 INFO:teuthology.orchestra.run.err: 7: (librbd::rados_req_cb(void*, void*)+0x47) [0x7fb3bd067517]
2013-07-09T20:46:21.568 INFO:teuthology.orchestra.run.err: 8: (librados::C_AioComplete::finish(int)+0x1d) [0x7fb3bc3e9c6d]
2013-07-09T20:46:21.568 INFO:teuthology.orchestra.run.err: 9: (Finisher::finisher_thread_entry()+0x1c0) [0x7fb3bc4576f0]
2013-07-09T20:46:21.568 INFO:teuthology.orchestra.run.err: 10: (()+0x7e9a) [0x7fb3bb9e0e9a]
2013-07-09T20:46:21.568 INFO:teuthology.orchestra.run.err: 11: (clone()+0x6d) [0x7fb3bbce9ccd]
2013-07-09T20:46:21.568 INFO:teuthology.orchestra.run.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2013-07-09T20:46:21.569 INFO:teuthology.orchestra.run.err:terminate called after throwing an instance of 'ceph::FailedAssertion'

job was
ubuntu@teuthology:/a/teuthology-2013-07-09_20:00:36-rbd-cuttlefish-testing-basic/61034$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 365b57b1317524bb0cdd15859a224ba1ab58d1d7
machine_type: plana
nuke-on-error: true
overrides:
  admin_socket:
    branch: cuttlefish
  ceph:
    conf:
      client:
        rbd cache: false
      global:
        ms inject socket failures: 5000
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
    fs: xfs
    log-whitelist:
    - slow request
    sha1: 24f90b832c695ef13021db66a178c18369ac356d
  install:
    ceph:
      sha1: 24f90b832c695ef13021db66a178c18369ac356d
  s3tests:
    branch: cuttlefish
  workunit:
    sha1: 24f90b832c695ef13021db66a178c18369ac356d
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    timeout: 1200
- rbd_fsx:
    clients:
    - client.0
    ops: 2000

#3 Updated by Sage Weil about 10 years ago

Both of these are cuttlefish... maybe something that hasn't been backported?

#4 Updated by Sage Weil about 10 years ago

ubuntu@teuthology:/a/teuthology-2013-07-13_20:00:45-rbd-cuttlefish-testing-basic/66483$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 365b57b1317524bb0cdd15859a224ba1ab58d1d7
machine_type: plana
nuke-on-error: true
overrides:
  admin_socket:
    branch: cuttlefish
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 658240710baaf9c661b8fbf856322907a0d394ee
  install:
    ceph:
      sha1: 658240710baaf9c661b8fbf856322907a0d394ee
  s3tests:
    branch: cuttlefish
  workunit:
    sha1: 658240710baaf9c661b8fbf856322907a0d394ee
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    conf:
      client:
        rbd cache: false
- rbd_fsx:
    clients:
    - client.0
    ops: 5000

cuttlefish again!

#5 Updated by Sage Weil about 10 years ago

okay, not cuttlefish.

ubuntu@teuthology:/a/teuthology-2013-07-17_01:00:47-rbd-next-testing-basic/70375

2013-07-17T10:14:13.999 INFO:teuthology.task.workunit.client.0.err:     0> 2013-07-17 10:14:41.818619 7f00ae7fc700 -1 *** Caught signal (Segmentation fault) **
2013-07-17T10:14:13.999 INFO:teuthology.task.workunit.client.0.err: in thread 7f00ae7fc700
2013-07-17T10:14:13.999 INFO:teuthology.task.workunit.client.0.err:
2013-07-17T10:14:13.999 INFO:teuthology.task.workunit.client.0.err: ceph version 0.66-669-g884fa2f (884fa2fcb6d707b23317bab1da909586ddc27608)
2013-07-17T10:14:13.999 INFO:teuthology.task.workunit.client.0.err: 1: rbd() [0x41f2aa]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 2: (()+0xfcb0) [0x7f00b8569cb0]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 3: (PerfCounters::tinc(int, utime_t)+0x28) [0x7f00b895f6b8]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 4: (librbd::AioCompletion::complete()+0x210) [0x7f00b95e95d0]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 5: (librbd::AioCompletion::complete_request(CephContext*, long)+0x1c7) [0x7f00b95e8a37]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 6: (librbd::C_AioRead::finish(int)+0x15d) [0x7f00b95e8e1d]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 7: (Context::complete(int)+0xa) [0x7f00b95e932a]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 8: (librbd::rados_req_cb(void*, void*)+0x47) [0x7f00b95fd007]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 9: (librados::C_AioComplete::finish(int)+0x1d) [0x7f00b88fef7d]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 10: (Finisher::finisher_thread_entry()+0x1c0) [0x7f00b89826b0]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 11: (()+0x7e9a) [0x7f00b8561e9a]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 12: (clone()+0x6d) [0x7f00b7b74ccd]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

#6 Updated by Josh Durgin about 10 years ago

  • Status changed from 12 to In Progress

#7 Updated by Sage Weil about 10 years ago

  • Priority changed from Urgent to High

#8 Updated by Josh Durgin about 10 years ago

Reproduced after 22 tries with no debugging on. Couldn't reproduce with debugging enabled, but this is a start.

The PerfCounters object hasn't been destroyed yet, but the _conf and other fields that its m_cct references are invalid, so m_cct is probably already destroyed at this point.

p *(this->m_cct)
$8 = {nref = {val = 23004272}, _conf = 0x671, _log = 0x7facf203ebf8, _module_type = 4060343288, _service_thread = 0x15e8960, _log_obs = 0x15e8960, _admin_socket = 0x30, 
  _service_thread_lock = 769, _perf_counters_collection = 0x15b7570, _perf_counters_conf_obs = 0x16af240, _admin_hook = 0x10000fffffffe, 
  _heartbeat_map = 0x722032393935363a, _crypto_none = 0x2e617461645f6462, _crypto_aes = 0x3166636463303031}

#9 Updated by Josh Durgin about 10 years ago

Never mind, the PerfCounters object is invalid too, with m_lower_bound = 22777856, m_upper_bound = 0

#10 Updated by Sage Weil about 10 years ago

ubuntu@teuthology:/a/teuthology-2013-08-22_01:01:01-rbd-next-testing-basic-plana/888

#11 Updated by Sage Weil about 10 years ago

ubuntu@teuthology:/a/teuthology-2013-08-23_00:30:06-ceph-deploy-master---basic-saya/1087

slightly different, though:

...

Thread 10 (Thread 0x7f9d1c19e700 (LWP 4064)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f9d1df100ab in std::vector<ceph::log::Subsystem, std::allocator<ceph::log::Subsystem> >::_M_fill_insert(__gnu_cxx::__normal_iterator<ceph::log::Subsystem*, std::vector<ceph::log::Subsystem, std::allocator<ceph::log::Subsystem> > >, unsigned long, ceph::log::Subsystem const&) ()
   from /usr/lib/librados.so.2
#2  0x0000000000000000 in ?? ()

Thread 9 (Thread 0x7f9d1b19c700 (LWP 4066)):
#0  0x00007f9d1d6cb313 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007f9d1de9561b in std::pair<std::string const, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> >::~pair() () from /usr/lib/librados.so.2
#2  0x0000000000000000 in ?? ()

Thread 8 (Thread 0x7f9d19999700 (LWP 4069)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215
#1  0x00007f9d1dea9ebb in Finisher::wait_for_empty() () from /usr/lib/librados.so.2
#2  0x0000000000000000 in ?? ()

Thread 7 (Thread 0x7f9d0acf3700 (LWP 8025)):
#0  0x00007f9d1d6cb313 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007f9d1df8ed72 in Pipe::read_message(Message**) () from /usr/lib/librados.so.2
#2  0x0000000000000000 in ?? ()

Thread 6 (Thread 0x7f9d0adf4700 (LWP 8024)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f9d1df9b3db in Pipe::writer() () from /usr/lib/librados.so.2
#2  0x00007f9d1dfa66bd in ?? () from /usr/lib/librados.so.2
#3  0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7f9d0abf2700 (LWP 8041)):
#0  0x00007f9d1d6cb313 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007f9d1df8ed72 in Pipe::read_message(Message**) () from /usr/lib/librados.so.2
#2  0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7f9d1ef9a780 (LWP 4063)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f9d1de173b3 in std::_Rb_tree<unsigned long, std::pair<unsigned long const, Objecter::PoolOp*>, std::_Select1st<std::pair<unsigned long const, Objecter::PoolOp*> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, Objecter::PoolOp*> > >::_M_erase(std::_Rb_tree_node<std::pair<unsigned long const, Objecter::PoolOp*> >*) () from /usr/lib/librados.so.2
#2  0x00007f9d1de2285b in librados::IoCtxImpl::aio_sparse_read(object_t, librados::AioCompletionImpl*, std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > >*, ceph::buffer::list*, unsigned long, unsigned long, unsigned long) () from /usr/lib/librados.so.2
#3  0x00007fff1216c820 in ?? ()
#4  0x00007fff1216c820 in ?? ()
#5  0x0000000100000000 in ?? ()
#6  0x00007fff1216c820 in ?? ()
#7  0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7f9d0b1f8700 (LWP 8080)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f9d1df9b3db in Pipe::writer() () from /usr/lib/librados.so.2
#2  0x00007f9d1dfa66bd in ?? () from /usr/lib/librados.so.2
#3  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f9d0aef5700 (LWP 8039)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f9d1df9b3db in Pipe::writer() () from /usr/lib/librados.so.2
#2  0x00007f9d1dfa66bd in ?? () from /usr/lib/librados.so.2
#3  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f9d0bfff700 (LWP 4074)):
#0  0x00007f9d1de87188 in PerfCountersCollection::remove(PerfCounters*) () from /usr/lib/librados.so.2
#1  0x0000003000000003 in ?? ()
#2  0x0000000000020000 in ?? ()
#3  0x00007f9d0bffe778 in ?? ()
#4  0x00007f9d0bffe778 in ?? ()
#5  0x00007f9c00000000 in ?? ()
#6  0x0000000000000000 in ?? ()
(gdb) q

#12 Updated by Sage Weil about 10 years ago

ubuntu@teuthology:/a/teuthology-2013-09-05_01:00:58-rbd-next-testing-basic-plana/22153

#13 Updated by Tamilarasi muthamizhan about 10 years ago

ubuntu@teuthology:/a/teuthology-2013-09-09_01:36:24-upgrade-small-next-testing-basic-vps/27433

#15 Updated by Sage Weil almost 10 years ago

  • Priority changed from High to Normal

#16 Updated by Tamilarasi muthamizhan almost 10 years ago

latest logs: ubuntu@teuthology:/a/teuthology-2013-11-08_05:35:02-upgrade-small-next-testing-basic-vps/90358

#17 Updated by Ian Colle almost 10 years ago

  • Priority changed from Normal to Urgent

Upgraded to Urgent since it is causing failures in the nightlies.

#18 Updated by Tamilarasi muthamizhan almost 10 years ago

recent logs : ubuntu@teuthology:/a/teuthology-2013-11-21_23:01:11-rbd-next-testing-basic-plana/113257

2013-11-22T03:56:35.652 INFO:teuthology.orchestra.run.out:[10.214.132.22]: 3523 flatten
2013-11-22T03:56:38.000 INFO:teuthology.orchestra.run.out:[10.214.132.22]: 3525 trunc   from 0x62e39fe to 0x1b9c7ec
2013-11-22T03:56:38.135 INFO:teuthology.orchestra.run.out:[10.214.132.22]: 3529 punch   from 0x1b7c8d9 to 0x1b8ae76, (0xe59d bytes)
2013-11-22T03:56:38.178 INFO:teuthology.orchestra.run.out:[10.214.132.22]: 3530 trunc   from 0x1b9c7ec to 0x74f792a
2013-11-22T03:56:38.210 INFO:teuthology.orchestra.run.out:[10.214.132.22]: 3531 clone   20 order 18 su 131072 sc 12
2013-11-22T03:56:39.281 INFO:teuthology.orchestra.run.out:[10.214.132.22]: checking clone #18, image image_client.0-clone18 against file /home/ubuntu/cephtest/archive/fsx-image_client.0-parent19
2013-11-22T03:56:41.435 INFO:teuthology.orchestra.run.err:[10.214.132.22]: common/Mutex.cc: In function 'void Mutex::Lock(bool)' thread 7f291d4f1700 time 2013-11-22 03:56:41.432336
2013-11-22T03:56:41.435 INFO:teuthology.orchestra.run.err:[10.214.132.22]: common/Mutex.cc: 93: FAILED assert(r == 0)
2013-11-22T03:56:41.435 INFO:teuthology.orchestra.run.err:[10.214.132.22]:  ceph version 0.72-214-g203065c (203065c076daf9aaa9ade3d7cdd1b10e6700bfe6)
2013-11-22T03:56:41.436 INFO:teuthology.orchestra.run.err:[10.214.132.22]:  1: (Mutex::Lock(bool)+0x1c3) [0x7f29232fc0e3]
2013-11-22T03:56:41.436 INFO:teuthology.orchestra.run.err:[10.214.132.22]:  2: (PerfCounters::tinc(int, utime_t)+0x5e) [0x7f29232f983e]
2013-11-22T03:56:41.436 INFO:teuthology.orchestra.run.err:[10.214.132.22]:  3: (librbd::AioCompletion::complete()+0x210) [0x7f2923fecb10]
2013-11-22T03:56:41.436 INFO:teuthology.orchestra.run.err:[10.214.132.22]:  4: (librbd::AioCompletion::complete_request(CephContext*, long)+0x177) [0x7f2923febfd7]
2013-11-22T03:56:41.437 INFO:teuthology.orchestra.run.err:[10.214.132.22]:  5: (librbd::C_AioRead::finish(int)+0x104) [0x7f2923fec2d4]
2013-11-22T03:56:41.437 INFO:teuthology.orchestra.run.err:[10.214.132.22]:  6: (Context::complete(int)+0x9) [0x7f2923fec769]
2013-11-22T03:56:41.437 INFO:teuthology.orchestra.run.err:[10.214.132.22]:  7: (librbd::rados_req_cb(void*, void*)+0x47) [0x7f2923ff6f57]
2013-11-22T03:56:41.437 INFO:teuthology.orchestra.run.err:[10.214.132.22]:  8: (librados::C_AioComplete::finish(int)+0x1d) [0x7f292329024d]
2013-11-22T03:56:41.438 INFO:teuthology.orchestra.run.err:[10.214.132.22]:  9: (Context::complete(int)+0x9) [0x7f2923270e69]
2013-11-22T03:56:41.438 INFO:teuthology.orchestra.run.err:[10.214.132.22]:  10: (Finisher::finisher_thread_entry()+0x1c0) [0x7f292331d7d0]
2013-11-22T03:56:41.438 INFO:teuthology.orchestra.run.err:[10.214.132.22]:  11: (()+0x7e9a) [0x7f2922825e9a]
2013-11-22T03:56:41.439 INFO:teuthology.orchestra.run.err:[10.214.132.22]:  12: (clone()+0x6d) [0x7f2922b2eccd]
2013-11-22T03:56:41.439 INFO:teuthology.orchestra.run.err:[10.214.132.22]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2013-11-22T03:56:41.439 INFO:teuthology.orchestra.run.err:[10.214.132.22]: terminate called after throwing an instance of 'ceph::FailedAssertion'
2013-11-22T03:56:52.443 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-next/teuthology/run_tasks.py", line 33, in run_tasks
    manager.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/teuthworker/teuthology-next/teuthology/task/rbd_fsx.py", line 40, in task
    p.spawn(_run_one_client, ctx, config, role)
  File "/home/teuthworker/teuthology-next/teuthology/parallel.py", line 83, in __exit__
    for result in self:
  File "/home/teuthworker/teuthology-next/teuthology/parallel.py", line 100, in next
    resurrect_traceback(result)
  File "/home/teuthworker/teuthology-next/teuthology/parallel.py", line 19, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/teuthology-next/teuthology/task/rbd_fsx.py", line 62, in _run_one_client
    'image_{image}'.format(image=role),
  File "/home/teuthworker/teuthology-next/teuthology/orchestra/remote.py", line 47, in run
    r = self._runner(client=self.ssh, **kwargs)
  File "/home/teuthworker/teuthology-next/teuthology/orchestra/run.py", line 271, in run
    r.exitstatus = _check_status(r.exitstatus)
  File "/home/teuthworker/teuthology-next/teuthology/orchestra/run.py", line 264, in _check_status
    raise CommandCrashedError(command=r.command)
CommandCrashedError: Command crashed: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph_test_librbd_fsx -d -W -R -p 100 -P /home/ubuntu/cephtest/archive -t 1 -l 250000000 -S 0 -N 5000 pool_client.0 image_client.0'
2013-11-22T03:56:52.650 ERROR:teuthology.run_tasks: Sentry event: http://sentry.ceph.com/inktank/teuthology/search?q=c31f734e23384b0396a3d735e27ad1f7
CommandCrashedError: Command crashed: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph_test_librbd_fsx -d -W -R -p 100 -P /home/ubuntu/cephtest/archive -t 1 -l 250000000 -S 0 -N 5000 pool_client.0 image_client.0'

#19 Updated by Dmitry Borodaenko almost 10 years ago

This problem is easily reproducible with Mirantis OpenStack 4.0:
https://bugs.launchpad.net/fuel/+bug/1260911

After converting and uploading a RAW CirrOS image, all you have to do is launch a VM from that image, then run `rbd -p compute export <instance-id>_disk - > /dev/null`.
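
For context, the librbd-level pattern behind reproducers like this is an application that closes the image as soon as its last AIO callback has fired, while librbd's finisher thread may still be inside complete(). The following is a hedged sketch against the public librbd C++ API; the pool and image names and the read size are made up, actually hitting the race depends on timing, and it only affects versions without the fix above.

#include <semaphore.h>
#include <rados/librados.hpp>
#include <rbd/librbd.hpp>

static sem_t io_done;

// Runs on librbd's finisher thread; on affected versions librbd still
// touches the image's perfcounters after this callback returns.
static void read_cb(librbd::completion_t, void *)
{
  sem_post(&io_done);
}

int main()
{
  sem_init(&io_done, 0, 0);

  librados::Rados rados;
  rados.init("admin");
  rados.conf_read_file(NULL);
  rados.connect();

  librados::IoCtx io_ctx;
  rados.ioctx_create("rbd", io_ctx);

  {
    librbd::RBD rbd;
    librbd::Image image;               // closed when it goes out of scope
    rbd.open(io_ctx, image, "testimg");

    ceph::bufferlist bl;
    librbd::RBD::AioCompletion *c =
        new librbd::RBD::AioCompletion(NULL, read_cb);
    image.aio_read(0, 4096, bl, c);

    // Wait only for the user callback, then immediately let the image
    // close. If the finisher thread has not yet reached the perfcounter
    // update inside complete(), it dereferences a freed ImageCtx and
    // hits the assert/segfault shown in this ticket.
    sem_wait(&io_done);
    c->release();
  }

  rados.shutdown();
  return 0;
}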

#20 Updated by Josh Durgin almost 10 years ago

  • Status changed from In Progress to Fix Under Review
  • Backport set to dumpling, emperor

#21 Updated by Josh Durgin almost 10 years ago

  • Status changed from Fix Under Review to Pending Backport

I cherry-picked the fix to the dumpling-5426 and emperor-5426 branches, based on dumpling and emperor, respectively. The fix will go into the actual dumpling and emperor branches once it has passed nightly tests for a while.

#22 Updated by Sage Weil over 9 years ago

  • Status changed from Pending Backport to Resolved

#23 Updated by Dmitry Borodaenko over 9 years ago

Fix confirmed with the packages from http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/dumpling-5426/ on Mirantis OpenStack 4.0; the SEGV no longer occurs.
