Bug #8694
OSD crashed (assertion failure) at FileStore::_collection_move_rename
Status: Duplicate
Priority: High
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Recently, while the cluster was backfilling/recovering, we captured one OSD crash at FileStore::_collection_move_rename. The full backtrace follows:
No symbol table info available.
#10 0x00000000009ed129 in ceph::__ceph_assert_fail (assertion=0x1d5c230 "\001", file=0xd23c5b0 "\320\300#\r", line=4454, func=0xbc4280 "int FileStore::_collection_move_rename(coll_t, const ghobject_t&, coll_t, const ghobject_t&, const SequencerPosition&)") at common/assert.cc:77
        tss = <incomplete type>
        buf = "os/FileStore.cc: In function 'int FileStore::_collection_move_rename(coll_t, const ghobject_t&, coll_t, const ghobject_t&, const SequencerPosition&)' thread 7fb1ea55a700 time 2014-06-27 13:19:35.06167"...
        bt = 0x6174a80
        oss = <incomplete type>
#11 0x00000000008eec8f in FileStore::_collection_move_rename (this=0x1d78000, oldcid=..., oldoid=..., c=..., o=..., spos=...) at os/FileStore.cc:4454
        fd = std::tr1::shared_ptr (empty) 0x0
        __func__ = "_collection_move_rename"
        srccmp = -2
        __PRETTY_FUNCTION__ = "int FileStore::_collection_move_rename(coll_t, const ghobject_t&, coll_t, const ghobject_t&, const SequencerPosition&)"
        r = -2
        dstcmp = 1
#12 0x00000000008f3579 in FileStore::_do_transaction (this=0x1d78000, t=..., op_seq=<value optimized out>, trans_num=<value optimized out>, handle=0x7fb1ea559cb0) at os/FileStore.cc:2349
        oldcid = {static META_COLL = {static META_COLL = <same as static member of an already seen type>, str = "meta"}, str = "3.5bfs6_head"}
        oldoid = {hobj = {oid = {name = "default.5470.715__shadow_.KMVmfZ4wW3C8q_0UB_DIxAF-4HnzJ61_1"}, snap = {val = 18446744073709551614}, hash = 960337343, max = false, static POOL_IS_TEMP = -1, pool = 3, nspace = "", key = ""}, generation = 18446744073709551615, shard_id = 6 '\006', static NO_SHARD = 255 '\377', static NO_GEN = 18446744073709551615}
        newcid = {static META_COLL = {static META_COLL = <same as static member of an already seen type>, str = "meta"}, str = "3.5bfs6_head"}
        newoid = {hobj = {oid = {name = "default.5470.715__shadow_.KMVmfZ4wW3C8q_0UB_DIxAF-4HnzJ61_1"}, snap = {val = 18446744073709551614}, hash = 960337343, max = false, static POOL_IS_TEMP = -1, pool = 3, nspace = "", key = ""}, generation = 13947, shard_id = 6 '\006', static NO_SHARD = 255 '\377', static NO_GEN = 18446744073709551615}
        op = 38
        r = 0
        i = {p = {bl = 0x34f550a0, ls = 0x34f550a0, off = 2516, p = {_raw = , _off = 78260816, _len = 0}, p_off = 0}, sobject_encoding = false, pool_override = -1, use_pool_override = false, replica = false, _tolerate_collection_add_enoent = false}
        spos = {seq = 5351696, trans = 0, op = 6}
        __PRETTY_FUNCTION__ = "unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*)"
#13 0x00000000008fab34 in FileStore::_do_transactions (this=0x1d78000, tls=std::list = {...}, op_seq=5351696, handle=0x7fb1ea559cb0) at os/FileStore.cc:1868
        p = <value optimized out>
        r = <value optimized out>
        bytes = <value optimized out>
        ops = <value optimized out>
        trans_num = <value optimized out>
#14 0x00000000008fade1 in FileStore::_do_op (this=0x1d78000, osr=0x2f5b37a0, handle=...) at os/FileStore.cc:1698
        o = 0x32d65130
        r = <value optimized out>
#15 0x0000000000a8c301 in ThreadPool::worker (this=0x1d78d90, wt=0x1da6de0) at common/WorkQueue.cc:125
        tp_handle = {cct = 0x1d5c230, hb = 0x1db6630, grace = 60, suicide_grace = 180}
        item = 0x2f5b37a0
        wq = 0x1d78f18
        did = false
        ss = <incomplete type>
        hb = 0x1db6630
#16 0x0000000000a8f340 in ThreadPool::WorkThread::entry (this=<value optimized out>) at common/WorkQueue.h:317
No locals.
The log showed that the OSD tried to open a non-existent file, which led to the crash we observed; not much verbose logging was captured at the time.
More information:
1. The pool is using erasure coding (EC).
2. Ceph version: 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
3. Restarting the OSD worked; it has not crashed again.