Bug #8694

closed

OSD crashed (assertion failure) at FileStore::_collection_move_rename

Added by Guang Yang almost 10 years ago. Updated almost 10 years ago.

Status: Duplicate
Priority: High
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Recently, while the cluster was backfilling/recovering, one OSD crashed with an assertion failure in FileStore::_collection_move_rename. The captured backtrace follows:

No symbol table info available.
#10 0x00000000009ed129 in ceph::__ceph_assert_fail (assertion=0x1d5c230 "\001", file=0xd23c5b0 "\320\300#\r", line=4454, 
    func=0xbc4280 "int FileStore::_collection_move_rename(coll_t, const ghobject_t&, coll_t, const ghobject_t&, const SequencerPosition&)") at common/assert.cc:77
        tss = <incomplete type>
        buf = "os/FileStore.cc: In function 'int FileStore::_collection_move_rename(coll_t, const ghobject_t&, coll_t, const ghobject_t&, const SequencerPosition&)' thread 7fb1ea55a700 time 2014-06-27 13:19:35.06167"...
        bt = 0x6174a80
        oss = <incomplete type>
#11 0x00000000008eec8f in FileStore::_collection_move_rename (this=0x1d78000, oldcid=..., oldoid=..., c=..., o=..., spos=...) at os/FileStore.cc:4454
        fd = std::tr1::shared_ptr (empty) 0x0
        __func__ = "_collection_move_rename" 
        srccmp = -2
        __PRETTY_FUNCTION__ = "int FileStore::_collection_move_rename(coll_t, const ghobject_t&, coll_t, const ghobject_t&, const SequencerPosition&)" 
        r = -2
        dstcmp = 1
#12 0x00000000008f3579 in FileStore::_do_transaction (this=0x1d78000, t=..., op_seq=<value optimized out>, trans_num=<value optimized out>, handle=0x7fb1ea559cb0) at os/FileStore.cc:2349
        oldcid = {static META_COLL = {static META_COLL = <same as static member of an already seen type>, str = "meta"}, str = "3.5bfs6_head"}
        oldoid = {hobj = {oid = {name = "default.5470.715__shadow_.KMVmfZ4wW3C8q_0UB_DIxAF-4HnzJ61_1"}, snap = {val = 18446744073709551614}, hash = 960337343, max = false, static POOL_IS_TEMP = -1, pool = 3, nspace = "", key = ""}, 
          generation = 18446744073709551615, shard_id = 6 '\006', static NO_SHARD = 255 '\377', static NO_GEN = 18446744073709551615}
        newcid = {static META_COLL = {static META_COLL = <same as static member of an already seen type>, str = "meta"}, str = "3.5bfs6_head"}
        newoid = {hobj = {oid = {name = "default.5470.715__shadow_.KMVmfZ4wW3C8q_0UB_DIxAF-4HnzJ61_1"}, snap = {val = 18446744073709551614}, hash = 960337343, max = false, static POOL_IS_TEMP = -1, pool = 3, nspace = "", key = ""}, 
          generation = 13947, shard_id = 6 '\006', static NO_SHARD = 255 '\377', static NO_GEN = 18446744073709551615}
        op = 38
        r = 0
        i = {p = {bl = 0x34f550a0, ls = 0x34f550a0, off = 2516, p = {_raw = , _off = 78260816, _len = 0}, p_off = 0}, sobject_encoding = false, pool_override = -1, use_pool_override = false, replica = false, 
          _tolerate_collection_add_enoent = false}
        spos = {seq = 5351696, trans = 0, op = 6}
        __PRETTY_FUNCTION__ = "unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*)" 
#13 0x00000000008fab34 in FileStore::_do_transactions (this=0x1d78000, tls=std::list = {...}, op_seq=5351696, handle=0x7fb1ea559cb0) at os/FileStore.cc:1868
        p = <value optimized out>
        r = <value optimized out>
        bytes = <value optimized out>
        ops = <value optimized out>
        trans_num = <value optimized out>
#14 0x00000000008fade1 in FileStore::_do_op (this=0x1d78000, osr=0x2f5b37a0, handle=...) at os/FileStore.cc:1698
        o = 0x32d65130
        r = <value optimized out>
#15 0x0000000000a8c301 in ThreadPool::worker (this=0x1d78d90, wt=0x1da6de0) at common/WorkQueue.cc:125
        tp_handle = {cct = 0x1d5c230, hb = 0x1db6630, grace = 60, suicide_grace = 180}
        item = 0x2f5b37a0
        wq = 0x1d78f18
        did = false
        ss = <incomplete type>
        hb = 0x1db6630
#16 0x0000000000a8f340 in ThreadPool::WorkThread::entry (this=<value optimized out>) at common/WorkQueue.h:317
No locals.

The log showed that the OSD tried to open a nonexistent file, which led to the crash we observed. No verbose logging was captured at the time.
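As a reading aid, the following Python sketch decodes two of the values printed in the backtrace. It assumes the usual convention of returning negative errno values, under which r = -2 in frame #11 corresponds to ENOENT and is consistent with the missing file. It also checks that the old object's generation equals the NO_GEN sentinel that gdb prints, while the new object's generation is 13947. The helper names are hypothetical and are not part of Ceph.

```python
import errno

# Values copied from the gdb locals in the backtrace above.
R_FRAME_11 = -2                        # `r` in FileStore::_collection_move_rename
OLD_GENERATION = 18446744073709551615  # oldoid.generation
NEW_GENERATION = 13947                 # newoid.generation
NO_GEN = 18446744073709551615          # `static NO_GEN` as printed by gdb


def describe_errno(r: int) -> str:
    """Map a negative errno-style return value to its symbolic name.

    Assumption: the return value follows the negative-errno convention.
    """
    return errno.errorcode.get(-r, "unknown")


def describe_generation(gen: int) -> str:
    """Report whether a generation value is the NO_GEN sentinel."""
    return "NO_GEN" if gen == NO_GEN else str(gen)


if __name__ == "__main__":
    print("r =", R_FRAME_11, "->", describe_errno(R_FRAME_11))
    print("old generation:", describe_generation(OLD_GENERATION))
    print("new generation:", describe_generation(NEW_GENERATION))
```

Read this way, the failing operation was renaming an object with no generation to generation 13947 within the same collection (3.5bfs6_head), and the source object could not be found.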

More information:
1. The pool uses erasure coding (EC).
2. Ceph version: ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
3. Restarting the OSD resolved the problem; it did not crash again.


Related issues 1 (0 open, 1 closed)

Is duplicate of Ceph - Bug #8733: OSD crashed at void ECBackend::handle_sub_read (Resolved, 07/02/2014)
