Ceph : Issues
https://tracker.ceph.com/
2018-01-15T07:21:01Z
Ceph
bluestore - Bug #22678 (Duplicate): block checksum mismatch from rocksdb
https://tracker.ceph.com/issues/22678
2018-01-15T07:21:01Z
Mike O'Connor
<p>Hi<br />There appears to be a bug in the Luminous (12.2.2) BlueStore OSD code that makes OSDs abort with a RocksDB "block checksum mismatch" error.</p>
<pre>
Jan 15 15:54:43 pve ceph-osd[29759]: 2018-01-15 15:54:43.557716 7f683157e700 -1 abort: Corruption: block checksum mismatch*** Caught signal (Aborted) **
Jan 15 15:54:43 pve ceph-osd[29759]: in thread 7f683157e700 thread_name:tp_osd_tp
Jan 15 15:54:43 pve ceph-osd[29759]: ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)
Jan 15 15:54:43 pve ceph-osd[29759]: 1: (()+0xa16664) [0x5626f6077664]
Jan 15 15:54:43 pve ceph-osd[29759]: 2: (()+0x110c0) [0x7f684996f0c0]
Jan 15 15:54:43 pve ceph-osd[29759]: 3: (gsignal()+0xcf) [0x7f6848936fcf]
Jan 15 15:54:43 pve ceph-osd[29759]: 4: (abort()+0x16a) [0x7f68489383fa]
Jan 15 15:54:43 pve ceph-osd[29759]: 5: (RocksDBStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const*, unsigned long, ceph::buffer::list*)+0x29f) [0x5626f5fb595f]
Jan 15 15:54:43 pve ceph-osd[29759]: 6: (BlueStore::Collection::get_onode(ghobject_t const&, bool)+0x5ae) [0x5626f5f392ae]
Jan 15 15:54:43 pve ceph-osd[29759]: 7: (BlueStore::read(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int)+0xfc) [0x5626f5f64a9c]
Jan 15 15:54:43 pve ceph-osd[29759]: 8: (ECBackend::handle_sub_read(pg_shard_t, ECSubRead const&, ECSubReadReply*, ZTracer::Trace const&)+0x239) [0x5626f5df1209]
Jan 15 15:54:43 pve ceph-osd[29759]: 9: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x50d) [0x5626f5df29cd]
Jan 15 15:54:43 pve ceph-osd[29759]: 10: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x5626f5cd1be0]
Jan 15 15:54:43 pve ceph-osd[29759]: 11: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x503) [0x5626f5c37a73]
Jan 15 15:54:43 pve ceph-osd[29759]: 12: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3ab) [0x5626f5ab59eb]
Jan 15 15:54:43 pve ceph-osd[29759]: 13: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x5a) [0x5626f5d53eba]
Jan 15 15:54:43 pve ceph-osd[29759]: 14: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x103d) [0x5626f5adcf4d]
Jan 15 15:54:43 pve ceph-osd[29759]: 15: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8ef) [0x5626f60c406f]
Jan 15 15:54:43 pve ceph-osd[29759]: 16: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5626f60c7370]
Jan 15 15:54:43 pve ceph-osd[29759]: 17: (()+0x7494) [0x7f6849965494]
Jan 15 15:54:43 pve ceph-osd[29759]: 18: (clone()+0x3f) [0x7f68489ecaff]
Jan 15 15:54:43 pve ceph-osd[29759]: 2018-01-15 15:54:43.562224 7f683157e700 -1 *** Caught signal (Aborted) **
Jan 15 15:54:43 pve ceph-osd[29759]: in thread 7f683157e700 thread_name:tp_osd_tp
Jan 15 15:54:43 pve ceph-osd[29759]: ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)
Jan 15 15:54:43 pve ceph-osd[29759]: 1: (()+0xa16664) [0x5626f6077664]
Jan 15 15:54:43 pve ceph-osd[29759]: 2: (()+0x110c0) [0x7f684996f0c0]
Jan 15 15:54:43 pve ceph-osd[29759]: 3: (gsignal()+0xcf) [0x7f6848936fcf]
Jan 15 15:54:43 pve ceph-osd[29759]: 4: (abort()+0x16a) [0x7f68489383fa]
Jan 15 15:54:43 pve ceph-osd[29759]: 5: (RocksDBStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const*, unsigned long, ceph::buffer::list*)+0x29f) [0x5626f5fb595f]
Jan 15 15:54:43 pve ceph-osd[29759]: 6: (BlueStore::Collection::get_onode(ghobject_t const&, bool)+0x5ae) [0x5626f5f392ae]
Jan 15 15:54:43 pve ceph-osd[29759]: 7: (BlueStore::read(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int)+0xfc) [0x5626f5f64a9c]
Jan 15 15:54:43 pve ceph-osd[29759]: 8: (ECBackend::handle_sub_read(pg_shard_t, ECSubRead const&, ECSubReadReply*, ZTracer::Trace const&)+0x239) [0x5626f5df1209]
Jan 15 15:54:43 pve ceph-osd[29759]: 9: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x50d) [0x5626f5df29cd]
Jan 15 15:54:43 pve ceph-osd[29759]: 10: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x5626f5cd1be0]
Jan 15 15:54:43 pve ceph-osd[29759]: 11: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x503) [0x5626f5c37a73]
Jan 15 15:54:43 pve ceph-osd[29759]: 12: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3ab) [0x5626f5ab59eb]
Jan 15 15:54:43 pve ceph-osd[29759]: 13: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x5a) [0x5626f5d53eba]
Jan 15 15:54:43 pve ceph-osd[29759]: 14: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x103d) [0x5626f5adcf4d]
Jan 15 15:54:43 pve ceph-osd[29759]: 15: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8ef) [0x5626f60c406f]
Jan 15 15:54:43 pve ceph-osd[29759]: 16: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5626f60c7370]
Jan 15 15:54:43 pve ceph-osd[29759]: 17: (()+0x7494) [0x7f6849965494]
Jan 15 15:54:43 pve ceph-osd[29759]: 18: (clone()+0x3f) [0x7f68489ecaff]
Jan 15 15:54:43 pve ceph-osd[29759]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Jan 15 15:54:43 pve ceph-osd[29759]: -1> 2018-01-15 15:54:43.557716 7f683157e700 -1 abort: Corruption: block checksum mismatch
Jan 15 15:54:43 pve ceph-osd[29759]: 0> 2018-01-15 15:54:43.562224 7f683157e700 -1 *** Caught signal (Aborted) **
Jan 15 15:54:43 pve ceph-osd[29759]: in thread 7f683157e700 thread_name:tp_osd_tp
Jan 15 15:54:43 pve ceph-osd[29759]: ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)
Jan 15 15:54:43 pve ceph-osd[29759]: 1: (()+0xa16664) [0x5626f6077664]
Jan 15 15:54:43 pve ceph-osd[29759]: 2: (()+0x110c0) [0x7f684996f0c0]
Jan 15 15:54:43 pve ceph-osd[29759]: 3: (gsignal()+0xcf) [0x7f6848936fcf]
Jan 15 15:54:43 pve ceph-osd[29759]: 4: (abort()+0x16a) [0x7f68489383fa]
Jan 15 15:54:43 pve ceph-osd[29759]: 5: (RocksDBStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const*, unsigned long, ceph::buffer::list*)+0x29f) [0x5626f5fb595f]
Jan 15 15:54:43 pve ceph-osd[29759]: 6: (BlueStore::Collection::get_onode(ghobject_t const&, bool)+0x5ae) [0x5626f5f392ae]
Jan 15 15:54:43 pve ceph-osd[29759]: 7: (BlueStore::read(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int)+0xfc) [0x5626f5f64a9c]
Jan 15 15:54:43 pve ceph-osd[29759]: 8: (ECBackend::handle_sub_read(pg_shard_t, ECSubRead const&, ECSubReadReply*, ZTracer::Trace const&)+0x239) [0x5626f5df1209]
Jan 15 15:54:43 pve ceph-osd[29759]: 9: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x50d) [0x5626f5df29cd]
Jan 15 15:54:43 pve ceph-osd[29759]: 10: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x5626f5cd1be0]
Jan 15 15:54:43 pve ceph-osd[29759]: 11: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x503) [0x5626f5c37a73]
Jan 15 15:54:43 pve ceph-osd[29759]: 12: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3ab) [0x5626f5ab59eb]
Jan 15 15:54:43 pve ceph-osd[29759]: 13: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x5a) [0x5626f5d53eba]
Jan 15 15:54:43 pve ceph-osd[29759]: 14: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x103d) [0x5626f5adcf4d]
Jan 15 15:54:43 pve ceph-osd[29759]: 15: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8ef) [0x5626f60c406f]
Jan 15 15:54:43 pve ceph-osd[29759]: 16: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5626f60c7370]
Jan 15 15:54:43 pve ceph-osd[29759]: 17: (()+0x7494) [0x7f6849965494]
Jan 15 15:54:43 pve ceph-osd[29759]: 18: (clone()+0x3f) [0x7f68489ecaff]
Jan 15 15:54:43 pve ceph-osd[29759]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
</pre>
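<p>The key line above is <code>abort: Corruption: block checksum mismatch</code>: RocksDB reported a corrupt block while BlueStore was loading an onode, and the OSD then aborted inside <code>RocksDBStore::get</code>. As a rough illustration of what that check means, here is a minimal Python sketch. It assumes the block-based table layout RocksDB inherited from LevelDB, where each block is followed by a 1-byte compression type and a 4-byte masked CRC32C (the default checksum type in RocksDB releases of that era). The function names and the <code>Corruption</code> class are hypothetical; this is not Ceph or RocksDB code.</p>

```python
# Hypothetical sketch of a RocksDB-style block checksum check.
# Assumption: masked CRC32C over (block contents + 1-byte compression type),
# as in the LevelDB/RocksDB block trailer. NOT Ceph or RocksDB source code.

CRC32C_POLY = 0x82F63B78  # Castagnoli polynomial, bit-reflected form
MASK_DELTA = 0xA282EAD8   # constant used by the LevelDB/RocksDB crc32c mask


class Corruption(Exception):
    """Hypothetical stand-in for RocksDB's Status::Corruption."""


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC32C; pass a previous result as `crc` to extend it."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC32C_POLY if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def mask(crc: int) -> int:
    """Rotate right by 15 bits, then add a constant (LevelDB-style mask)."""
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + MASK_DELTA) & 0xFFFFFFFF


def block_trailer_checksum(contents: bytes, compression_type: int) -> int:
    """Checksum stored after a block: masked CRC32C of contents + type byte."""
    return mask(crc32c(bytes([compression_type]), crc32c(contents)))


def verify_block(contents: bytes, compression_type: int, stored: int) -> None:
    """Recompute the checksum on read and compare it with the stored one."""
    if block_trailer_checksum(contents, compression_type) != stored:
        raise Corruption("block checksum mismatch")


if __name__ == "__main__":
    block = b"onode metadata for some object"
    stored = block_trailer_checksum(block, 0)  # written alongside the block
    verify_block(block, 0, stored)             # intact block: no exception

    damaged = bytearray(block)
    damaged[3] ^= 0x01                         # simulate one flipped bit on disk
    try:
        verify_block(bytes(damaged), 0, stored)
    except Corruption as err:
        print(f"{type(err).__name__}: {err}")
```

<p>A mismatch only shows that the bytes read back differ from the bytes that were checksummed when the block was written; on its own it does not say whether the cause is the disk, the controller, memory, or a software bug.</p>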
<p>I can provide a copy of the dump file if needed, but it is larger than 1000 KB, so it is too big to attach here.</p>
<p>I'm happy to provide any other details that would help.</p>
<p>Thanks<br />Mike</p>