Bug #11527
KV OSD stacktrace on disk failure
Status:
Closed
Priority:
Low
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%
Source:
Community (user)
Tags:
Backport:
Regression:
Severity:
4 - irritation
Reviewed:
Description
When an OSD failed, this was the stacktrace I got:
    -6> 2015-05-01 15:10:28.472385 7f02820f3700  1 -- 10.141.16.14:6846/1003323 <== osd.17 10.143.16.11:0/5157 110160 ==== osd_ping(ping e3296 stamp 2015-05-01 15:10:28.471824) v2 ==== 47+0+0 (3225483472 0 0) 0x1ed0c400 con 0xe8e61a0
    -5> 2015-05-01 15:10:28.472491 7f02820f3700  1 -- 10.141.16.14:6846/1003323 --> 10.143.16.11:0/5157 -- osd_ping(ping_reply e3296 stamp 2015-05-01 15:10:28.471824) v2 -- ?+0 0x1a48a200 con 0xe8e61a0
    -4> 2015-05-01 15:10:28.474514 7f02808f0700  1 -- 10.143.16.14:6849/1003323 <== osd.89 10.143.16.15:0/3407 110135 ==== osd_ping(ping e3296 stamp 2015-05-01 15:10:28.473849) v2 ==== 47+0+0 (605194218 0 0) 0x2d552c00 con 0xe8e4780
    -3> 2015-05-01 15:10:28.474548 7f02808f0700  1 -- 10.143.16.14:6849/1003323 --> 10.143.16.15:0/3407 -- osd_ping(ping_reply e3296 stamp 2015-05-01 15:10:28.473849) v2 -- ?+0 0x13daf200 con 0xe8e4780
    -2> 2015-05-01 15:10:28.474558 7f02820f3700  1 -- 10.141.16.14:6846/1003323 <== osd.89 10.143.16.15:0/3407 110135 ==== osd_ping(ping e3296 stamp 2015-05-01 15:10:28.473849) v2 ==== 47+0+0 (605194218 0 0) 0x23ab1200 con 0xe8e23c0
    -1> 2015-05-01 15:10:28.474590 7f02820f3700  1 -- 10.141.16.14:6846/1003323 --> 10.143.16.15:0/3407 -- osd_ping(ping_reply e3296 stamp 2015-05-01 15:10:28.473849) v2 -- ?+0 0x1ed0c400 con 0xe8e23c0
     0> 2015-05-01 15:10:28.475037 7f02768dc700 -1 *** Caught signal (Bus error) **
     in thread 7f02768dc700

     ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
     1: /usr/bin/ceph-osd() [0xac51f2]
     2: (()+0xf130) [0x7f02938f6130]
     3: (leveldb::ReadBlock(leveldb::RandomAccessFile*, leveldb::ReadOptions const&, leveldb::BlockHandle const&, leveldb::BlockContents*)+0x233) [0x7f0294510733]
     4: (leveldb::Table::BlockReader(void*, leveldb::ReadOptions const&, leveldb::Slice const&)+0x276) [0x7f02945118a6]
     5: (()+0x3acd0) [0x7f0294513cd0]
     6: (()+0x3b071) [0x7f0294514071]
     7: (()+0x38028) [0x7f0294511028]
     8: (()+0x21a45) [0x7f02944faa45]
     9: (LevelDBStore::LevelDBWholeSpaceIteratorImpl::lower_bound(std::string const&, std::string const&)+0x49) [0x96a4d9]
     10: (GenericObjectMap::list_objects(coll_t const&, ghobject_t, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x907) [0xa8d777]
     11: (KeyValueStore::collection_list_partial(coll_t, ghobject_t, int, int, snapid_t, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x239) [0x930b69]
     12: (KeyValueStore::collection_list_range(coll_t, ghobject_t, ghobject_t, snapid_t, std::vector<ghobject_t, std::allocator<ghobject_t> >*)+0x164) [0x954e14]
     13: (PGBackend::objects_list_range(hobject_t const&, hobject_t const&, snapid_t, std::vector<hobject_t, std::allocator<hobject_t> >*, std::vector<ghobject_t, std::allocator<ghobject_t> >*)+0x106) [0x8cb496]
     14: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, unsigned int, ThreadPool::TPHandle&)+0x1df) [0x7dd0df]
     15: (PG::replica_scrub(MOSDRepScrub*, ThreadPool::TPHandle&)+0x4c2) [0x7dd8e2]
     16: (OSD::RepScrubWQ::_process(MOSDRepScrub*, ThreadPool::TPHandle&)+0xbe) [0x6da9ce]
     17: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa76) [0xbb5f16]
     18: (ThreadPool::WorkThread::entry()+0x10) [0xbb6fa0]
     19: (()+0x7df5) [0x7f02938eedf5]
     20: (clone()+0x6d) [0x7f02923d11ad]
This is not a problem in itself, but would it be possible to report a clear disk-failure message instead of this stack trace? At first glance it looked like a Ceph issue.
Updated by Haomai Wang about 9 years ago
From the "Bus error" message, I'm inclined to think it's a hardware I/O error.
Updated by Kenneth Waegeman about 9 years ago
Yes indeed, I should have been clearer: it was a disk hardware failure, and the disk needed replacement.
I was just wondering whether it would be possible to report a different error, rather than a stack trace that suggests something is wrong with LevelDB :)
Updated by Haomai Wang about 9 years ago
- Status changed from New to Closed
Hmm, I think it would need more work from Ceph itself, but I don't yet have a concrete idea of how to handle this.