Bug #5426

librbd: mutex assert in perfcounters::tinc in librbd::AioCompletion::complete()

Added by Sage Weil almost 11 years ago. Updated almost 10 years ago.

Status: Resolved
Priority: Normal
Assignee: Josh Durgin
Target version: -
% Done: 100%
Source: Q/A
Backport: dumpling, emperor
Severity: 3 - minor

Description

2013-06-22T12:24:53.336 INFO:teuthology.orchestra.run.err:common/Mutex.cc: In function 'void Mutex::Lock(bool)' thread 7f32aaffd700 time 2013-06-22 12:24:55.286391
2013-06-22T12:24:53.336 INFO:teuthology.orchestra.run.err:common/Mutex.cc: 93: FAILED assert(r == 0)
2013-06-22T12:24:53.337 INFO:teuthology.orchestra.run.err: ceph version 0.61.4-5-gd572cf6 (d572cf6f77418f217a5a8e37f1124dc566e24d0b)
2013-06-22T12:24:53.337 INFO:teuthology.orchestra.run.err: 1: (Mutex::Lock(bool)+0x1c3) [0x7f32b5362593]
2013-06-22T12:24:53.337 INFO:teuthology.orchestra.run.err: 2: (PerfCounters::tinc(int, utime_t)+0x5e) [0x7f32b535fdbe]
2013-06-22T12:24:53.337 INFO:teuthology.orchestra.run.err: 3: (librbd::AioCompletion::complete()+0x210) [0x7f32b5f72ac0]
2013-06-22T12:24:53.337 INFO:teuthology.orchestra.run.err: 4: (librbd::AioCompletion::complete_request(CephContext*, long)+0x1c7) [0x7f32b5f71ed7]
2013-06-22T12:24:53.337 INFO:teuthology.orchestra.run.err: 5: (librbd::C_AioRead::finish(int)+0x1d6) [0x7f32b5f72336]
2013-06-22T12:24:53.337 INFO:teuthology.orchestra.run.err: 6: (Context::complete(int)+0xa) [0x7f32b5f7281a]
2013-06-22T12:24:53.337 INFO:teuthology.orchestra.run.err: 7: (librbd::rados_req_cb(void*, void*)+0x47) [0x7f32b5f86517]
2013-06-22T12:24:53.337 INFO:teuthology.orchestra.run.err: 8: (librados::C_AioComplete::finish(int)+0x1d) [0x7f32b530b12d]
2013-06-22T12:24:53.337 INFO:teuthology.orchestra.run.err: 9: (Finisher::finisher_thread_entry()+0x1c0) [0x7f32b5378bb0]
2013-06-22T12:24:53.338 INFO:teuthology.orchestra.run.err: 10: (()+0x7e9a) [0x7f32b4902e9a]
2013-06-22T12:24:53.338 INFO:teuthology.orchestra.run.err: 11: (clone()+0x6d) [0x7f32b4c0bccd]
2013-06-22T12:24:53.338 INFO:teuthology.orchestra.run.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

run was
ubuntu@teuthology:/a/teuthology-2013-06-21_20:00:30-rbd-cuttlefish-testing-basic/42512$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 2dd322b42d608a37f3e5beed57a8fbc673da6e32
machine_type: plana
nuke-on-error: true
overrides:
  admin_socket:
    branch: cuttlefish
  ceph:
    conf:
      client:
        rbd cache: false
      global:
        ms inject socket failures: 5000
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: d572cf6f77418f217a5a8e37f1124dc566e24d0b
  install:
    ceph:
      sha1: d572cf6f77418f217a5a8e37f1124dc566e24d0b
  s3tests:
    branch: cuttlefish
  workunit:
    sha1: d572cf6f77418f217a5a8e37f1124dc566e24d0b
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    timeout: 1200
- rbd_fsx:
    clients:
    - client.0
    ops: 2000

Subtasks (0 open, 1 closed)

Bug #8409: "Segmentation fault" in rbd in upgrade:dumpling-dumpling-testing-basic-vps (Duplicate, 05/20/2014)

Actions #1

Updated by Ian Colle almost 11 years ago

  • Assignee set to Josh Durgin
Actions #2

Updated by Sage Weil almost 11 years ago

  • Status changed from New to 12
2013-07-09T20:46:21.561 INFO:teuthology.orchestra.run.err:common/Mutex.cc: In function 'void Mutex::Lock(bool)' thread 7fb3b1ffb700 time 2013-07-09 20:46:41.372806
2013-07-09T20:46:21.561 INFO:teuthology.orchestra.run.err:common/Mutex.cc: 93: FAILED assert(r == 0)
2013-07-09T20:46:21.567 INFO:teuthology.orchestra.run.err: ceph version 0.61.4-48-g24f90b8 (24f90b832c695ef13021db66a178c18369ac356d)
2013-07-09T20:46:21.567 INFO:teuthology.orchestra.run.err: 1: (Mutex::Lock(bool)+0x1c3) [0x7fb3bc4410d3]
2013-07-09T20:46:21.567 INFO:teuthology.orchestra.run.err: 2: (PerfCounters::tinc(int, utime_t)+0x5e) [0x7fb3bc43e8fe]
2013-07-09T20:46:21.567 INFO:teuthology.orchestra.run.err: 3: (librbd::AioCompletion::complete()+0x210) [0x7fb3bd053ac0]
2013-07-09T20:46:21.567 INFO:teuthology.orchestra.run.err: 4: (librbd::AioCompletion::complete_request(CephContext*, long)+0x1c7) [0x7fb3bd052ed7]
2013-07-09T20:46:21.567 INFO:teuthology.orchestra.run.err: 5: (librbd::C_AioRead::finish(int)+0x1d6) [0x7fb3bd053336]
2013-07-09T20:46:21.568 INFO:teuthology.orchestra.run.err: 6: (Context::complete(int)+0xa) [0x7fb3bd05381a]
2013-07-09T20:46:21.568 INFO:teuthology.orchestra.run.err: 7: (librbd::rados_req_cb(void*, void*)+0x47) [0x7fb3bd067517]
2013-07-09T20:46:21.568 INFO:teuthology.orchestra.run.err: 8: (librados::C_AioComplete::finish(int)+0x1d) [0x7fb3bc3e9c6d]
2013-07-09T20:46:21.568 INFO:teuthology.orchestra.run.err: 9: (Finisher::finisher_thread_entry()+0x1c0) [0x7fb3bc4576f0]
2013-07-09T20:46:21.568 INFO:teuthology.orchestra.run.err: 10: (()+0x7e9a) [0x7fb3bb9e0e9a]
2013-07-09T20:46:21.568 INFO:teuthology.orchestra.run.err: 11: (clone()+0x6d) [0x7fb3bbce9ccd]
2013-07-09T20:46:21.568 INFO:teuthology.orchestra.run.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2013-07-09T20:46:21.569 INFO:teuthology.orchestra.run.err:terminate called after throwing an instance of 'ceph::FailedAssertion'

job was
ubuntu@teuthology:/a/teuthology-2013-07-09_20:00:36-rbd-cuttlefish-testing-basic/61034$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 365b57b1317524bb0cdd15859a224ba1ab58d1d7
machine_type: plana
nuke-on-error: true
overrides:
  admin_socket:
    branch: cuttlefish
  ceph:
    conf:
      client:
        rbd cache: false
      global:
        ms inject socket failures: 5000
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
    fs: xfs
    log-whitelist:
    - slow request
    sha1: 24f90b832c695ef13021db66a178c18369ac356d
  install:
    ceph:
      sha1: 24f90b832c695ef13021db66a178c18369ac356d
  s3tests:
    branch: cuttlefish
  workunit:
    sha1: 24f90b832c695ef13021db66a178c18369ac356d
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    timeout: 1200
- rbd_fsx:
    clients:
    - client.0
    ops: 2000
Actions #3

Updated by Sage Weil almost 11 years ago

both of these are cuttlefish.. maybe something that hasn't been backported?

Actions #4

Updated by Sage Weil almost 11 years ago

ubuntu@teuthology:/a/teuthology-2013-07-13_20:00:45-rbd-cuttlefish-testing-basic/66483$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 365b57b1317524bb0cdd15859a224ba1ab58d1d7
machine_type: plana
nuke-on-error: true
overrides:
  admin_socket:
    branch: cuttlefish
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 658240710baaf9c661b8fbf856322907a0d394ee
  install:
    ceph:
      sha1: 658240710baaf9c661b8fbf856322907a0d394ee
  s3tests:
    branch: cuttlefish
  workunit:
    sha1: 658240710baaf9c661b8fbf856322907a0d394ee
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    conf:
      client:
        rbd cache: false
- rbd_fsx:
    clients:
    - client.0
    ops: 5000

cuttlefish again!

Actions #5

Updated by Sage Weil over 10 years ago

okay, not cuttlefish.

ubuntu@teuthology:/a/teuthology-2013-07-17_01:00:47-rbd-next-testing-basic/70375

2013-07-17T10:14:13.999 INFO:teuthology.task.workunit.client.0.err:     0> 2013-07-17 10:14:41.818619 7f00ae7fc700 -1 *** Caught signal (Segmentation fault) **
2013-07-17T10:14:13.999 INFO:teuthology.task.workunit.client.0.err: in thread 7f00ae7fc700
2013-07-17T10:14:13.999 INFO:teuthology.task.workunit.client.0.err:
2013-07-17T10:14:13.999 INFO:teuthology.task.workunit.client.0.err: ceph version 0.66-669-g884fa2f (884fa2fcb6d707b23317bab1da909586ddc27608)
2013-07-17T10:14:13.999 INFO:teuthology.task.workunit.client.0.err: 1: rbd() [0x41f2aa]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 2: (()+0xfcb0) [0x7f00b8569cb0]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 3: (PerfCounters::tinc(int, utime_t)+0x28) [0x7f00b895f6b8]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 4: (librbd::AioCompletion::complete()+0x210) [0x7f00b95e95d0]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 5: (librbd::AioCompletion::complete_request(CephContext*, long)+0x1c7) [0x7f00b95e8a37]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 6: (librbd::C_AioRead::finish(int)+0x15d) [0x7f00b95e8e1d]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 7: (Context::complete(int)+0xa) [0x7f00b95e932a]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 8: (librbd::rados_req_cb(void*, void*)+0x47) [0x7f00b95fd007]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 9: (librados::C_AioComplete::finish(int)+0x1d) [0x7f00b88fef7d]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 10: (Finisher::finisher_thread_entry()+0x1c0) [0x7f00b89826b0]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 11: (()+0x7e9a) [0x7f00b8561e9a]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: 12: (clone()+0x6d) [0x7f00b7b74ccd]
2013-07-17T10:14:14.000 INFO:teuthology.task.workunit.client.0.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Actions #6

Updated by Josh Durgin over 10 years ago

  • Status changed from 12 to In Progress
Actions #7

Updated by Sage Weil over 10 years ago

  • Priority changed from Urgent to High
Actions #8

Updated by Josh Durgin over 10 years ago

Reproduced after 22 tries with no debugging on. Couldn't reproduce it with debug logging enabled, but this is a start.

The PerfCounters object hasn't been destroyed yet, but the _conf and other fields that its m_cct references are invalid, so m_cct has probably already been destroyed at this point.

p *(this->m_cct)
$8 = {nref = {val = 23004272}, _conf = 0x671, _log = 0x7facf203ebf8, _module_type = 4060343288, _service_thread = 0x15e8960, _log_obs = 0x15e8960, _admin_socket = 0x30, 
  _service_thread_lock = 769, _perf_counters_collection = 0x15b7570, _perf_counters_conf_obs = 0x16af240, _admin_hook = 0x10000fffffffe, 
  _heartbeat_map = 0x722032393935363a, _crypto_none = 0x2e617461645f6462, _crypto_aes = 0x3166636463303031}

Actions #9

Updated by Josh Durgin over 10 years ago

Never mind, the PerfCounters object is invalid too, with m_lower_bound = 22777856 and m_upper_bound = 0.
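
Taken together, the two notes above describe a lifetime problem: the completion path is still calling PerfCounters::tinc() on a PerfCounters object (and, through it, a CephContext) that has already been freed. Below is a minimal, self-contained C++ sketch of that class of use-after-free. It is not Ceph code: the Counters, ImageHandle, and Completion types are made up, and the assumption that the user callback is what tears the image down is only one plausible ordering, used here for illustration.

// Hypothetical sketch of the lifetime race described above, not Ceph code.
// A completion keeps a raw pointer to per-image counters; the user callback
// destroys the image (and its counters); the completion then updates the
// counters through the dangling pointer, which is where locking freed memory
// would fail, as in the Mutex::Lock assert in the backtraces.
#include <chrono>
#include <functional>
#include <memory>
#include <mutex>

struct Counters {                         // stand-in for PerfCounters
  std::mutex lock;                        // stand-in for its internal mutex
  long latency_samples = 0;
  void tinc(std::chrono::microseconds) {
    std::lock_guard<std::mutex> l(lock);  // locking freed memory here is undefined behavior
    ++latency_samples;
  }
};

struct ImageHandle {                      // stand-in for the image context
  std::unique_ptr<Counters> perf = std::make_unique<Counters>();
};

struct Completion {                       // stand-in for AioCompletion
  Counters *perf;                         // raw pointer into the ImageHandle
  std::function<void()> user_callback;

  void complete() {
    user_callback();                      // assumed here to close the image...
    perf->tinc(std::chrono::microseconds(42));  // ...then touch the freed counters
  }
};

int main() {
  auto image = std::make_unique<ImageHandle>();
  Completion c{image->perf.get(),
               [&image] { image.reset(); }};  // callback destroys the image
  c.complete();                           // use-after-free, typically a crash or assert
  return 0;
}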

Actions #10

Updated by Sage Weil over 10 years ago

ubuntu@teuthology:/a/teuthology-2013-08-22_01:01:01-rbd-next-testing-basic-plana/888

Actions #11

Updated by Sage Weil over 10 years ago

ubuntu@teuthology:/a/teuthology-2013-08-23_00:30:06-ceph-deploy-master---basic-saya/1087

slightly different, though:

...

Thread 10 (Thread 0x7f9d1c19e700 (LWP 4064)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
---Type <return> to continue, or q <return> to quit---
#1  0x00007f9d1df100ab in std::vector<ceph::log::Subsystem, std::allocator<ceph::log::Subsystem> >::_M_fill_insert(__gnu_cxx::__normal_iterator<ceph::log::Subsystem*, std::vector<ceph::log::Subsystem, std::allocator<ceph::log::Subsystem> > >, unsigned long, ceph::log::Subsystem const&) ()
   from /usr/lib/librados.so.2
#2  0x0000000000000000 in ?? ()

Thread 9 (Thread 0x7f9d1b19c700 (LWP 4066)):
#0  0x00007f9d1d6cb313 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007f9d1de9561b in std::pair<std::string const, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> >::~pair() () from /usr/lib/librados.so.2
#2  0x0000000000000000 in ?? ()

Thread 8 (Thread 0x7f9d19999700 (LWP 4069)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215
#1  0x00007f9d1dea9ebb in Finisher::wait_for_empty() () from /usr/lib/librados.so.2
#2  0x0000000000000000 in ?? ()

Thread 7 (Thread 0x7f9d0acf3700 (LWP 8025)):
#0  0x00007f9d1d6cb313 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007f9d1df8ed72 in Pipe::read_message(Message**) () from /usr/lib/librados.so.2
#2  0x0000000000000000 in ?? ()

Thread 6 (Thread 0x7f9d0adf4700 (LWP 8024)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f9d1df9b3db in Pipe::writer() () from /usr/lib/librados.so.2
#2  0x00007f9d1dfa66bd in ?? () from /usr/lib/librados.so.2
#3  0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7f9d0abf2700 (LWP 8041)):
#0  0x00007f9d1d6cb313 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007f9d1df8ed72 in Pipe::read_message(Message**) () from /usr/lib/librados.so.2
#2  0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7f9d1ef9a780 (LWP 4063)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f9d1de173b3 in std::_Rb_tree<unsigned long, std::pair<unsigned long const, Objecter::PoolOp*>, std::_Select1st<std::pair<unsigned long const, Objecter::PoolOp*> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, Objecter::PoolOp*> > >::_M_erase(std::_Rb_tree_node<std::pair<unsigned long const, Objecter::PoolOp*> >*) () from /usr/lib/librados.so.2
#2  0x00007f9d1de2285b in librados::IoCtxImpl::aio_sparse_read(object_t, librados::AioCompletionImpl*, std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > >*, ceph::buffer::list*, unsigned long, unsigned long, unsigned long) () from /usr/lib/librados.so.2
#3  0x00007fff1216c820 in ?? ()
#4  0x00007fff1216c820 in ?? ()
#5  0x0000000100000000 in ?? ()
#6  0x00007fff1216c820 in ?? ()
#7  0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7f9d0b1f8700 (LWP 8080)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f9d1df9b3db in Pipe::writer() () from /usr/lib/librados.so.2
#2  0x00007f9d1dfa66bd in ?? () from /usr/lib/librados.so.2
#3  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f9d0aef5700 (LWP 8039)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f9d1df9b3db in Pipe::writer() () from /usr/lib/librados.so.2
#2  0x00007f9d1dfa66bd in ?? () from /usr/lib/librados.so.2
#3  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f9d0bfff700 (LWP 4074)):
#0  0x00007f9d1de87188 in PerfCountersCollection::remove(PerfCounters*) () from /usr/lib/librados.so.2
#1  0x0000003000000003 in ?? ()
#2  0x0000000000020000 in ?? ()
#3  0x00007f9d0bffe778 in ?? ()
#4  0x00007f9d0bffe778 in ?? ()
#5  0x00007f9c00000000 in ?? ()
#6  0x0000000000000000 in ?? ()
(gdb) q

Actions #12

Updated by Sage Weil over 10 years ago

ubuntu@teuthology:/a/teuthology-2013-09-05_01:00:58-rbd-next-testing-basic-plana/22153

Actions #13

Updated by Tamilarasi muthamizhan over 10 years ago

ubuntu@teuthology:/a/teuthology-2013-09-09_01:36:24-upgrade-small-next-testing-basic-vps/27433

Actions #15

Updated by Sage Weil over 10 years ago

  • Priority changed from High to Normal
Actions #16

Updated by Tamilarasi muthamizhan over 10 years ago

latest logs: ubuntu@teuthology:/a/teuthology-2013-11-08_05:35:02-upgrade-small-next-testing-basic-vps/90358

Actions #17

Updated by Ian Colle over 10 years ago

  • Priority changed from Normal to Urgent

Upgraded to Urgent since it is causing failures in nightlies.

Actions #18

Updated by Tamilarasi muthamizhan over 10 years ago

recent logs : ubuntu@teuthology:/a/teuthology-2013-11-21_23:01:11-rbd-next-testing-basic-plana/113257

2013-11-22T03:56:35.652 INFO:teuthology.orchestra.run.out:[10.214.132.22]: 3523 flatten
2013-11-22T03:56:38.000 INFO:teuthology.orchestra.run.out:[10.214.132.22]: 3525 trunc   from 0x62e39fe to 0x1b9c7ec
2013-11-22T03:56:38.135 INFO:teuthology.orchestra.run.out:[10.214.132.22]: 3529 punch   from 0x1b7c8d9 to 0x1b8ae76, (0xe59d bytes)
2013-11-22T03:56:38.178 INFO:teuthology.orchestra.run.out:[10.214.132.22]: 3530 trunc   from 0x1b9c7ec to 0x74f792a
2013-11-22T03:56:38.210 INFO:teuthology.orchestra.run.out:[10.214.132.22]: 3531 clone   20 order 18 su 131072 sc 12
2013-11-22T03:56:39.281 INFO:teuthology.orchestra.run.out:[10.214.132.22]: checking clone #18, image image_client.0-clone18 against file /home/ubuntu/cephtest/archive/fsx-image_client.0-parent19
2013-11-22T03:56:41.435 INFO:teuthology.orchestra.run.err:[10.214.132.22]: common/Mutex.cc: In function 'void Mutex::Lock(bool)' thread 7f291d4f1700 time 2013-11-22 03:56:41.432336
2013-11-22T03:56:41.435 INFO:teuthology.orchestra.run.err:[10.214.132.22]: common/Mutex.cc: 93: FAILED assert(r == 0)
2013-11-22T03:56:41.435 INFO:teuthology.orchestra.run.err:[10.214.132.22]:  ceph version 0.72-214-g203065c (203065c076daf9aaa9ade3d7cdd1b10e6700bfe6)
2013-11-22T03:56:41.436 INFO:teuthology.orchestra.run.err:[10.214.132.22]:  1: (Mutex::Lock(bool)+0x1c3) [0x7f29232fc0e3]
2013-11-22T03:56:41.436 INFO:teuthology.orchestra.run.err:[10.214.132.22]:  2: (PerfCounters::tinc(int, utime_t)+0x5e) [0x7f29232f983e]
2013-11-22T03:56:41.436 INFO:teuthology.orchestra.run.err:[10.214.132.22]:  3: (librbd::AioCompletion::complete()+0x210) [0x7f2923fecb10]
2013-11-22T03:56:41.436 INFO:teuthology.orchestra.run.err:[10.214.132.22]:  4: (librbd::AioCompletion::complete_request(CephContext*, long)+0x177) [0x7f2923febfd7]
2013-11-22T03:56:41.437 INFO:teuthology.orchestra.run.err:[10.214.132.22]:  5: (librbd::C_AioRead::finish(int)+0x104) [0x7f2923fec2d4]
2013-11-22T03:56:41.437 INFO:teuthology.orchestra.run.err:[10.214.132.22]:  6: (Context::complete(int)+0x9) [0x7f2923fec769]
2013-11-22T03:56:41.437 INFO:teuthology.orchestra.run.err:[10.214.132.22]:  7: (librbd::rados_req_cb(void*, void*)+0x47) [0x7f2923ff6f57]
2013-11-22T03:56:41.437 INFO:teuthology.orchestra.run.err:[10.214.132.22]:  8: (librados::C_AioComplete::finish(int)+0x1d) [0x7f292329024d]
2013-11-22T03:56:41.438 INFO:teuthology.orchestra.run.err:[10.214.132.22]:  9: (Context::complete(int)+0x9) [0x7f2923270e69]
2013-11-22T03:56:41.438 INFO:teuthology.orchestra.run.err:[10.214.132.22]:  10: (Finisher::finisher_thread_entry()+0x1c0) [0x7f292331d7d0]
2013-11-22T03:56:41.438 INFO:teuthology.orchestra.run.err:[10.214.132.22]:  11: (()+0x7e9a) [0x7f2922825e9a]
2013-11-22T03:56:41.439 INFO:teuthology.orchestra.run.err:[10.214.132.22]:  12: (clone()+0x6d) [0x7f2922b2eccd]
2013-11-22T03:56:41.439 INFO:teuthology.orchestra.run.err:[10.214.132.22]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2013-11-22T03:56:41.439 INFO:teuthology.orchestra.run.err:[10.214.132.22]: terminate called after throwing an instance of 'ceph::FailedAssertion'
2013-11-22T03:56:52.443 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-next/teuthology/run_tasks.py", line 33, in run_tasks
    manager.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/teuthworker/teuthology-next/teuthology/task/rbd_fsx.py", line 40, in task
    p.spawn(_run_one_client, ctx, config, role)
  File "/home/teuthworker/teuthology-next/teuthology/parallel.py", line 83, in __exit__
    for result in self:
  File "/home/teuthworker/teuthology-next/teuthology/parallel.py", line 100, in next
    resurrect_traceback(result)
  File "/home/teuthworker/teuthology-next/teuthology/parallel.py", line 19, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/teuthology-next/teuthology/task/rbd_fsx.py", line 62, in _run_one_client
    'image_{image}'.format(image=role),
  File "/home/teuthworker/teuthology-next/teuthology/orchestra/remote.py", line 47, in run
    r = self._runner(client=self.ssh, **kwargs)
  File "/home/teuthworker/teuthology-next/teuthology/orchestra/run.py", line 271, in run
    r.exitstatus = _check_status(r.exitstatus)
  File "/home/teuthworker/teuthology-next/teuthology/orchestra/run.py", line 264, in _check_status
    raise CommandCrashedError(command=r.command)
CommandCrashedError: Command crashed: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph_test_librbd_fsx -d -W -R -p 100 -P /home/ubuntu/cephtest/archive -t 1 -l 250000000 -S 0 -N 5000 pool_client.0 image_client.0'
2013-11-22T03:56:52.650 ERROR:teuthology.run_tasks: Sentry event: http://sentry.ceph.com/inktank/teuthology/search?q=c31f734e23384b0396a3d735e27ad1f7
CommandCrashedError: Command crashed: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph_test_librbd_fsx -d -W -R -p 100 -P /home/ubuntu/cephtest/archive -t 1 -l 250000000 -S 0 -N 5000 pool_client.0 image_client.0'

Actions #19

Updated by Dmitry Borodaenko over 10 years ago

This problem is easily reproducible with Mirantis OpenStack 4.0:
https://bugs.launchpad.net/fuel/+bug/1260911

After converting and uploading a RAW CirrOS image, all you have to do is launch a VM from that image, then run `rbd -p compute export <instance-id>_disk - > /dev/null`.
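
For context, the rbd export command above drives librbd's asynchronous read path, the same AioCompletion::complete() path shown in the backtraces. A rough equivalent using the public librbd C API is sketched below; the pool name (compute) comes from the report, the image name is a placeholder, and error handling is omitted.

// Rough librbd C-API equivalent of `rbd -p compute export <image> -`:
// sequential aio reads over the whole image, discarding the data.
// Placeholder names, no error handling; each read completion runs through
// librbd's AioCompletion machinery on the library's finisher thread.
#include <rados/librados.h>
#include <rbd/librbd.h>
#include <algorithm>
#include <cstdint>
#include <vector>

int main() {
  rados_t cluster;
  rados_ioctx_t ioctx;
  rbd_image_t image;

  rados_create(&cluster, NULL);
  rados_conf_read_file(cluster, NULL);                // default ceph.conf
  rados_connect(cluster);
  rados_ioctx_create(cluster, "compute", &ioctx);     // pool name from the report
  rbd_open(ioctx, "INSTANCE_ID_disk", &image, NULL);  // image name placeholder

  rbd_image_info_t info;
  rbd_stat(image, &info, sizeof(info));

  const uint64_t chunk = 4 << 20;                     // 4 MiB per read
  std::vector<char> buf(chunk);
  for (uint64_t off = 0; off < info.size; off += chunk) {
    uint64_t len = std::min(chunk, info.size - off);
    rbd_completion_t comp;
    rbd_aio_create_completion(NULL, NULL, &comp);
    rbd_aio_read(image, off, (size_t)len, buf.data(), comp);
    rbd_aio_wait_for_complete(comp);                  // completion fires on the finisher thread
    rbd_aio_release(comp);
  }

  // Closing the image and shutting down while completions are still in
  // flight is the window the backtraces above point at.
  rbd_close(image);
  rados_ioctx_destroy(ioctx);
  rados_shutdown(cluster);
  return 0;
}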

Actions #20

Updated by Josh Durgin over 10 years ago

  • Status changed from In Progress to Fix Under Review
  • Backport set to dumpling, emperor
Actions #21

Updated by Josh Durgin over 10 years ago

  • Status changed from Fix Under Review to Pending Backport

I cherry-picked the fix to the dumpling-5426 and emperor-5426 branches, based on dumpling and emperor respectively. The fix will go onto the actual dumpling and emperor branches once it has passed nightly tests for a while.

Actions #22

Updated by Sage Weil over 10 years ago

  • Status changed from Pending Backport to Resolved
Actions #23

Updated by Dmitry Borodaenko over 10 years ago

Fix confirmed with the packages from http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/dumpling-5426/ on Mirantis OpenStack 4.0; the SEGV no longer occurs.
