Bug #521

closed

objecter: crash in osdmap assert

Added by Sage Weil over 13 years ago. Updated over 13 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: -
% Done: 0%

Description


When accessing multiple RBD volumes from one VM in parallel, we hit an
assertion failure:

./osd/OSDMap.h: In function 'entity_inst_t OSDMap::get_inst(int)':
./osd/OSDMap.h:460: FAILED assert(exists(osd) && is_up(osd))
ceph version 0.22.1 (commit:c6f403a6f441184956e00659ce713eaee7014279)
1: (Objecter::op_submit(Objecter::Op*)+0x6c2) [0x38658854c2]
2: /usr/lib64/librados.so.1() [0x3865855dc9]
3: (RadosClient::aio_write(RadosClient::PoolCtx&, object_t, long, ceph::buffer::list const&, unsigned long, RadosClient::AioCompletion*)+0x24b) [0x386585724b]
4: (rados_aio_write()+0x9a) [0x386585741a]
5: /usr/bin/qemu-kvm() [0x45a305]
6: /usr/bin/qemu-kvm() [0x45a430]
7: /usr/bin/qemu-kvm() [0x43bb73]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
*** Caught signal (ABRT) ***
ceph version 0.22.1 (commit:c6f403a6f441184956e00659ce713eaee7014279)
1: (sigabrt_handler(int)+0x91) [0x3865922b91]
2: /lib64/libc.so.6() [0x3c0c032a30]
3: (gsignal()+0x35) [0x3c0c0329b5]
4: (abort()+0x175) [0x3c0c034195]
5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x3c110beaad]

This is reproducible by doing the following inside a VM:

# mkfs.btrfs /dev/vdb /dev/vdc /dev/vdd /dev/vde
# mount /dev/vdb /mnt
# cd /mnt
# bonnie++ -u root -d /mnt -f

Any hints are welcome...

Thanks,

#1

Updated by Sage Weil over 13 years ago

  • Target version changed from v0.22.2 to v0.22.3

#2

Updated by Sage Weil over 13 years ago

  • Status changed from New to 4

#3

Updated by Sage Weil over 13 years ago

  • Source set to 0

Latest from the mailing list:

Subject:    Re: AW: ./osd/OSDMap.h:460: FAILED assert(exists(osd) && is_up(osd))
From:       Christian Brunner <chb () muc ! de>
Date:       2010-11-05 12:40:39
Message-ID: AANLkTi=2=oaobOtwacs8EZRGnBRzctShTFPs_2E-_gcq () mail ! gmail ! com

Hi Sage,

I'm sorry, I was busy with some other things, but I have now been able
to look at this:

I can now confirm that the problem is related to a missing OSD: I had
to stop one of the OSDs to reproduce it. When I split the assert into
its two conditions, the error occurs in:

./osd/OSDMap.h:460: FAILED assert(exists(osd))

and here is the gdb backtrace:

#0  0x0000003c0c0329b5 in raise () from /lib64/libc.so.6
#1  0x0000003c0c034195 in abort () from /lib64/libc.so.6
#2  0x0000003c110beaad in __gnu_cxx::__verbose_terminate_handler() ()
   from /usr/lib64/libstdc++.so.6
#3  0x0000003c110bcc36 in ?? () from /usr/lib64/libstdc++.so.6
#4  0x0000003c110bcc63 in std::terminate() () from /usr/lib64/libstdc++.so.6
#5  0x0000003c110bcd5e in __cxa_throw () from /usr/lib64/libstdc++.so.6
#6  0x00007ffc36df6136 in ceph::__ceph_assert_fail (
    assertion=0x7ffc36e22c59 "exists(osd)", file=<value optimized out>,
    line=460, func=<value optimized out>) at common/assert.cc:30
#7  0x00007ffc36d8014f in OSDMap::get_inst (this=<value optimized out>,
    osd=<value optimized out>) at osd/OSDMap.h:460
#8  0x00007ffc36d7b4c2 in Objecter::op_submit (this=0x159a4a0,
    op=0x7ffb29e264c0) at osdc/Objecter.cc:461
#9  0x00007ffc36d4bdc9 in Objecter::write (this=0x159a4a0, oid=..., ol=...,
    off=<value optimized out>, len=1638400, snapc=..., bl=..., mtime=...,
    onack=0x7ffc00811d20, oncommit=0x7ffc000461a0, flags=0)
    at osdc/Objecter.h:606
#10 0x00007ffc36d4d24b in RadosClient::aio_write (this=0x15916e0, pool=...,
    oid=..., off=917504, bl=..., len=1638400, c=0x7ffc03d4c010)
    at librados.cc:949
#11 0x00007ffc36d4d41a in rados_aio_write (pool=0x159a000,
    o=<value optimized out>, off=917504, buf=<value optimized out>,
    len=1638400, completion=0x7ffc03d4c010) at librados.cc:2119
#12 0x000000000045a305 in rbd_aio_rw_vector (bs=<value optimized out>,
    sector_num=<value optimized out>, qiov=<value optimized out>,
    nb_sectors=917504, cb=<value optimized out>, opaque=<value optimized out>,
    write=1) at block/rbd.c:769
#13 0x000000000045a430 in rbd_aio_writev (bs=<value optimized out>,
    sector_num=<value optimized out>, qiov=<value optimized out>,
    nb_sectors=<value optimized out>, cb=<value optimized out>,
    opaque=<value optimized out>) at block/rbd.c:802
#14 0x000000000043bb73 in bdrv_aio_writev (bs=0x159dd20, sector_num=2279168,
    qiov=0x7ffb29e26480, nb_sectors=3200, cb=<value optimized out>,
    opaque=<value optimized out>) at block.c:2019
#15 0x000000000043bb73 in bdrv_aio_writev (bs=0x159d3f0, sector_num=2279168,
    qiov=0x7ffb29e26480, nb_sectors=3200, cb=<value optimized out>,
    opaque=<value optimized out>) at block.c:2019
#16 0x000000000043ca2c in bdrv_aio_multiwrite (bs=0x159d3f0,
    reqs=0x7ffc0f5fd690, num_reqs=<value optimized out>) at block.c:2228
#17 0x000000000041cca5 in virtio_submit_multiwrite (bs=<value optimized out>,
    mrb=0x7ffc0f5fd690) at /usr/src/debug/qemu-kvm-0.13.0/hw/virtio-blk.c:241
#18 0x000000000041d30c in virtio_blk_handle_output (vdev=0x1e56c30,
    vq=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.13.0/hw/virtio-blk.c:359
#19 0x000000000042dc5d in kvm_handle_io (env=0x15c4b20)
    at /usr/src/debug/qemu-kvm-0.13.0/kvm-all.c:763
#20 kvm_run (env=0x15c4b20) at /usr/src/debug/qemu-kvm-0.13.0/qemu-kvm.c:645
#21 0x000000000042dd89 in kvm_cpu_exec (env=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.13.0/qemu-kvm.c:1238
#22 0x000000000042f181 in kvm_main_loop_cpu (_env=0x15c4b20)
    at /usr/src/debug/qemu-kvm-0.13.0/qemu-kvm.c:1495
#23 ap_main_loop (_env=0x15c4b20)
    at /usr/src/debug/qemu-kvm-0.13.0/qemu-kvm.c:1541
#24 0x0000003c0c4077e1 in start_thread () from /lib64/libpthread.so.0
#25 0x0000003c0c0e151d in clone () from /lib64/libc.so.6

I hope this helps.

Regards,
Christian

2010/11/4 Sage Weil <sage@newdream.net>:
> Hi Christian,
>
> On Tue, 26 Oct 2010, Christian Brunner wrote:
>> I can't promise this for tomorrow, but I think I can do this on Thursday.
>
> Have you had a chance to look into this one at all?
>
> Thanks-
> sage
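
Christian's split assert shows that it is `exists(osd)`, not `is_up(osd)`, that fails: the primary OSD id passed to `get_inst()` from `Objecter::op_submit()` is not present at all in the map, rather than merely being down. The following is a minimal C++ sketch of that diagnostic step, not Ceph's actual `OSDMap`; `ToyOSDMap`, its flag vectors, and `failing_check()` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for an OSD map (NOT Ceph's OSDMap):
// one "exists" flag, one "up" flag and one address per OSD id.
struct ToyOSDMap {
  std::vector<bool> exists_flags;  // exists_flags[i]: OSD i is in this map
  std::vector<bool> up_flags;      // up_flags[i]: OSD i is marked up
  std::vector<std::string> addrs;  // addrs[i]: address used to reach OSD i

  bool exists(int osd) const {
    return osd >= 0 && osd < static_cast<int>(exists_flags.size()) &&
           exists_flags[osd];
  }
  bool is_up(int osd) const { return exists(osd) && up_flags[osd]; }

  // Like get_inst() in the trace, a missing or down OSD is treated as a bug.
  // The compound assert is split in two, so the failure message names the
  // condition that was actually violated.
  const std::string& get_inst(int osd) const {
    assert(exists(osd));
    assert(is_up(osd));
    return addrs[osd];
  }
};

// Non-aborting variant of the same split check: returns the name of the first
// get_inst() precondition that fails, or "" if the lookup would succeed.
std::string failing_check(const ToyOSDMap& m, int osd) {
  if (!m.exists(osd)) return "exists(osd)";
  if (!m.is_up(osd)) return "is_up(osd)";
  return "";
}
```

With the assert split this way, a report of "FAILED assert(exists(osd))" already rules out the case where the OSD is in the map but marked down.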

#4

Updated by Sage Weil over 13 years ago

  • Target version changed from v0.22.3 to v0.23

#5

Updated by Sage Weil over 13 years ago

Can you try with something like this:


diff --git a/src/osdc/Objecter.cc b/src/osdc/Objecter.cc
index 0fe4d65..d2f4c54 100644
--- a/src/osdc/Objecter.cc
+++ b/src/osdc/Objecter.cc
@@ -494,6 +494,11 @@ tid_t Objecter::op_submit(Op *op)
     if (op->priority)
       m->set_priority(op->priority);

+    if (!osdmap->exists(pg.primary())) {
+      dout(0) << "pgid " << op->pgid << " acting " << pg.acting
+             << " primary dne, osdmap epoch " << osdmap->get_epoch() << dendl;
+    }
+
     messenger->send_message(m, osdmap->get_inst(pg.primary()));
   } else 
     maybe_request_map();

(That is against the latest 'rc' branch.)

Once you get the osdmap epoch, can you dump that map (ceph osd dump <epoch> -o -) and map the pg explicitly ('ceph pg map 1.23', or 'ceph osd getmap <epoch> -o /tmp/foo ; osdmaptool /tmp/foo --test-map-pg 1.23')?
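
The patch above is diagnostic only: it logs the pgid, acting set and osdmap epoch, but still calls `get_inst()` and would still hit the assert. This ticket does not record the eventual fix. As a hedged sketch of one defensive pattern (an assumption, not Ceph's code), the submit path can look up the primary in the current map, send only if that OSD exists and is up, and otherwise park the op and ask for a newer map, in the spirit of the existing `maybe_request_map()` branch. `ToyMap`, `ToyOp`, `ToyObjecter`, `handle_osd_map()` and the `pg_primary` lookup table are hypothetical simplifications; real placement goes through CRUSH rather than a table.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins; NOT Ceph's OSDMap/Objecter API.
struct ToyMap {
  unsigned epoch;
  std::vector<int> pg_primary;     // pg_primary[pg]: primary OSD, -1 if none
  std::vector<bool> exists_flags;  // exists_flags[osd]: OSD is in this map
  std::vector<bool> up_flags;      // up_flags[osd]: OSD is marked up

  bool exists(int osd) const {
    return osd >= 0 && osd < static_cast<int>(exists_flags.size()) &&
           exists_flags[osd];
  }
  bool is_up(int osd) const { return exists(osd) && up_flags[osd]; }
  int primary(std::size_t pg) const {
    return pg < pg_primary.size() ? pg_primary[pg] : -1;
  }
};

struct ToyOp {
  std::string oid;
  std::size_t pg;
};

class ToyObjecter {
 public:
  explicit ToyObjecter(ToyMap m) : map_(std::move(m)) {}

  // Returns true if the op was sent now, false if it was parked.
  bool op_submit(const ToyOp& op) {
    int osd = map_.primary(op.pg);  // primary from the *current* map
    if (map_.exists(osd) && map_.is_up(osd)) {
      sent_.push_back(op.oid + "->" + get_inst(osd));
      return true;
    }
    waiting_.push_back(op);  // retry when a newer map arrives
    need_map_ = true;        // stands in for maybe_request_map()
    return false;
  }

  // A newer map arrived: adopt it and resubmit every parked op against it.
  void handle_osd_map(ToyMap m) {
    if (m.epoch <= map_.epoch) return;  // ignore stale or duplicate maps
    map_ = std::move(m);
    need_map_ = false;
    std::vector<ToyOp> retry;
    retry.swap(waiting_);
    for (const ToyOp& op : retry) op_submit(op);
  }

  bool need_map() const { return need_map_; }
  std::size_t waiting() const { return waiting_.size(); }
  const std::vector<std::string>& sent() const { return sent_; }

 private:
  // Asserting lookup, modeled on get_inst(): only reached after the check.
  std::string get_inst(int osd) const {
    assert(map_.exists(osd) && map_.is_up(osd));
    return "osd" + std::to_string(osd);
  }

  ToyMap map_;
  std::vector<ToyOp> waiting_;
  std::vector<std::string> sent_;
  bool need_map_ = false;
};
```

The design choice here is to keep the assert in the lookup as a genuine invariant and move the "primary missing or down" case into the caller, where it is an expected, recoverable condition rather than a crash.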

#6

Updated by Sage Weil over 13 years ago

  • Status changed from 4 to Resolved
  • Source changed from 0 to 2