Bug #11527

closed

KV OSD stacktrace on disk failure

Added by Kenneth Waegeman almost 9 years ago. Updated almost 9 years ago.

Status:
Closed
Priority:
Low
Assignee:
-
Category:
OSD
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
Severity:
4 - irritation
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When an OSD failed, this was the stacktrace I got:

    -6> 2015-05-01 15:10:28.472385 7f02820f3700  1 -- 10.141.16.14:6846/1003323 <== osd.17 10.143.16.11:0/5157 110160 ==== osd_ping(ping e3296 stamp 2015-05-01 15:10:28.471824) v2 ==== 47+0+0 (3225483472 0 0) 0x1ed0c400 con 0xe8e61a0
    -5> 2015-05-01 15:10:28.472491 7f02820f3700  1 -- 10.141.16.14:6846/1003323 --> 10.143.16.11:0/5157 -- osd_ping(ping_reply e3296 stamp 2015-05-01 15:10:28.471824) v2 -- ?+0 0x1a48a200 con 0xe8e61a0
    -4> 2015-05-01 15:10:28.474514 7f02808f0700  1 -- 10.143.16.14:6849/1003323 <== osd.89 10.143.16.15:0/3407 110135 ==== osd_ping(ping e3296 stamp 2015-05-01 15:10:28.473849) v2 ==== 47+0+0 (605194218 0 0) 0x2d552c00 con 0xe8e4780
    -3> 2015-05-01 15:10:28.474548 7f02808f0700  1 -- 10.143.16.14:6849/1003323 --> 10.143.16.15:0/3407 -- osd_ping(ping_reply e3296 stamp 2015-05-01 15:10:28.473849) v2 -- ?+0 0x13daf200 con 0xe8e4780
    -2> 2015-05-01 15:10:28.474558 7f02820f3700  1 -- 10.141.16.14:6846/1003323 <== osd.89 10.143.16.15:0/3407 110135 ==== osd_ping(ping e3296 stamp 2015-05-01 15:10:28.473849) v2 ==== 47+0+0 (605194218 0 0) 0x23ab1200 con 0xe8e23c0
    -1> 2015-05-01 15:10:28.474590 7f02820f3700  1 -- 10.141.16.14:6846/1003323 --> 10.143.16.15:0/3407 -- osd_ping(ping_reply e3296 stamp 2015-05-01 15:10:28.473849) v2 -- ?+0 0x1ed0c400 con 0xe8e23c0
     0> 2015-05-01 15:10:28.475037 7f02768dc700 -1 *** Caught signal (Bus error) **
 in thread 7f02768dc700

 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
 1: /usr/bin/ceph-osd() [0xac51f2]
 2: (()+0xf130) [0x7f02938f6130]
 3: (leveldb::ReadBlock(leveldb::RandomAccessFile*, leveldb::ReadOptions const&, leveldb::BlockHandle const&, leveldb::BlockContents*)+0x233) [0x7f0294510733]
 4: (leveldb::Table::BlockReader(void*, leveldb::ReadOptions const&, leveldb::Slice const&)+0x276) [0x7f02945118a6]
 5: (()+0x3acd0) [0x7f0294513cd0]
 6: (()+0x3b071) [0x7f0294514071]
 7: (()+0x38028) [0x7f0294511028]
 8: (()+0x21a45) [0x7f02944faa45]
 9: (LevelDBStore::LevelDBWholeSpaceIteratorImpl::lower_bound(std::string const&, std::string const&)+0x49) [0x96a4d9]
 10: (GenericObjectMap::list_objects(coll_t const&, ghobject_t, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x907) [0xa8d777]
 11: (KeyValueStore::collection_list_partial(coll_t, ghobject_t, int, int, snapid_t, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x239) [0x930b69]
 12: (KeyValueStore::collection_list_range(coll_t, ghobject_t, ghobject_t, snapid_t, std::vector<ghobject_t, std::allocator<ghobject_t> >*)+0x164) [0x954e14]
 13: (PGBackend::objects_list_range(hobject_t const&, hobject_t const&, snapid_t, std::vector<hobject_t, std::allocator<hobject_t> >*, std::vector<ghobject_t, std::allocator<ghobject_t> >*)+0x106) [0x8cb496]
 14: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, unsigned int, ThreadPool::TPHandle&)+0x1df) [0x7dd0df]
 15: (PG::replica_scrub(MOSDRepScrub*, ThreadPool::TPHandle&)+0x4c2) [0x7dd8e2]
 16: (OSD::RepScrubWQ::_process(MOSDRepScrub*, ThreadPool::TPHandle&)+0xbe) [0x6da9ce]
 17: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa76) [0xbb5f16]
 18: (ThreadPool::WorkThread::entry()+0x10) [0xbb6fa0]
 19: (()+0x7df5) [0x7f02938eedf5]
 20: (clone()+0x6d) [0x7f02923d11ad]

This is not a big problem in itself, but would it be possible to report a message that the disk failed instead of this stack trace? At first glance it looked like a Ceph issue.

Actions #1

Updated by Haomai Wang almost 9 years ago

From the "Bus error" message, I'm inclined to think it's a hardware I/O error?

Actions #2

Updated by Kenneth Waegeman almost 9 years ago

Yes, I should have been clearer: it was indeed a disk hardware failure, and the disk needed replacement.

I was just wondering whether it would be possible to throw a different error, rather than a stack trace that makes it look like something is wrong with LevelDB :)

Actions #3

Updated by Haomai Wang almost 9 years ago

  • Status changed from New to Closed

Hmm, I think it would need more work from Ceph itself, but I still have no clear sense of how to approach this.
