Bug #22678
block checksum mismatch from rocksdb
Status:
Closed
% Done:
0%
Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Hi
There seems to be a bug in the Luminous OSD code (ceph 12.2.2) that makes OSDs abort with a RocksDB "Corruption: block checksum mismatch" error.
Jan 15 15:54:43 pve ceph-osd[29759]: 2018-01-15 15:54:43.557716 7f683157e700 -1 abort: Corruption: block checksum mismatch*** Caught signal (Aborted) **
Jan 15 15:54:43 pve ceph-osd[29759]: in thread 7f683157e700 thread_name:tp_osd_tp
Jan 15 15:54:43 pve ceph-osd[29759]: ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)
Jan 15 15:54:43 pve ceph-osd[29759]: 1: (()+0xa16664) [0x5626f6077664]
Jan 15 15:54:43 pve ceph-osd[29759]: 2: (()+0x110c0) [0x7f684996f0c0]
Jan 15 15:54:43 pve ceph-osd[29759]: 3: (gsignal()+0xcf) [0x7f6848936fcf]
Jan 15 15:54:43 pve ceph-osd[29759]: 4: (abort()+0x16a) [0x7f68489383fa]
Jan 15 15:54:43 pve ceph-osd[29759]: 5: (RocksDBStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const*, unsigned long, ceph::buffer::list*)+0x29f) [0x5626f5fb595f]
Jan 15 15:54:43 pve ceph-osd[29759]: 6: (BlueStore::Collection::get_onode(ghobject_t const&, bool)+0x5ae) [0x5626f5f392ae]
Jan 15 15:54:43 pve ceph-osd[29759]: 7: (BlueStore::read(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int)+0xfc) [0x5626f5f64a9c]
Jan 15 15:54:43 pve ceph-osd[29759]: 8: (ECBackend::handle_sub_read(pg_shard_t, ECSubRead const&, ECSubReadReply*, ZTracer::Trace const&)+0x239) [0x5626f5df1209]
Jan 15 15:54:43 pve ceph-osd[29759]: 9: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x50d) [0x5626f5df29cd]
Jan 15 15:54:43 pve ceph-osd[29759]: 10: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x5626f5cd1be0]
Jan 15 15:54:43 pve ceph-osd[29759]: 11: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x503) [0x5626f5c37a73]
Jan 15 15:54:43 pve ceph-osd[29759]: 12: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3ab) [0x5626f5ab59eb]
Jan 15 15:54:43 pve ceph-osd[29759]: 13: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x5a) [0x5626f5d53eba]
Jan 15 15:54:43 pve ceph-osd[29759]: 14: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x103d) [0x5626f5adcf4d]
Jan 15 15:54:43 pve ceph-osd[29759]: 15: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8ef) [0x5626f60c406f]
Jan 15 15:54:43 pve ceph-osd[29759]: 16: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5626f60c7370]
Jan 15 15:54:43 pve ceph-osd[29759]: 17: (()+0x7494) [0x7f6849965494]
Jan 15 15:54:43 pve ceph-osd[29759]: 18: (clone()+0x3f) [0x7f68489ecaff]
Jan 15 15:54:43 pve ceph-osd[29759]: 2018-01-15 15:54:43.562224 7f683157e700 -1 *** Caught signal (Aborted) **
Jan 15 15:54:43 pve ceph-osd[29759]: in thread 7f683157e700 thread_name:tp_osd_tp
Jan 15 15:54:43 pve ceph-osd[29759]: ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)
Jan 15 15:54:43 pve ceph-osd[29759]: 1: (()+0xa16664) [0x5626f6077664]
Jan 15 15:54:43 pve ceph-osd[29759]: 2: (()+0x110c0) [0x7f684996f0c0]
Jan 15 15:54:43 pve ceph-osd[29759]: 3: (gsignal()+0xcf) [0x7f6848936fcf]
Jan 15 15:54:43 pve ceph-osd[29759]: 4: (abort()+0x16a) [0x7f68489383fa]
Jan 15 15:54:43 pve ceph-osd[29759]: 5: (RocksDBStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const*, unsigned long, ceph::buffer::list*)+0x29f) [0x5626f5fb595f]
Jan 15 15:54:43 pve ceph-osd[29759]: 6: (BlueStore::Collection::get_onode(ghobject_t const&, bool)+0x5ae) [0x5626f5f392ae]
Jan 15 15:54:43 pve ceph-osd[29759]: 7: (BlueStore::read(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int)+0xfc) [0x5626f5f64a9c]
Jan 15 15:54:43 pve ceph-osd[29759]: 8: (ECBackend::handle_sub_read(pg_shard_t, ECSubRead const&, ECSubReadReply*, ZTracer::Trace const&)+0x239) [0x5626f5df1209]
Jan 15 15:54:43 pve ceph-osd[29759]: 9: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x50d) [0x5626f5df29cd]
Jan 15 15:54:43 pve ceph-osd[29759]: 10: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x5626f5cd1be0]
Jan 15 15:54:43 pve ceph-osd[29759]: 11: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x503) [0x5626f5c37a73]
Jan 15 15:54:43 pve ceph-osd[29759]: 12: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3ab) [0x5626f5ab59eb]
Jan 15 15:54:43 pve ceph-osd[29759]: 13: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x5a) [0x5626f5d53eba]
Jan 15 15:54:43 pve ceph-osd[29759]: 14: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x103d) [0x5626f5adcf4d]
Jan 15 15:54:43 pve ceph-osd[29759]: 15: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8ef) [0x5626f60c406f]
Jan 15 15:54:43 pve ceph-osd[29759]: 16: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5626f60c7370]
Jan 15 15:54:43 pve ceph-osd[29759]: 17: (()+0x7494) [0x7f6849965494]
Jan 15 15:54:43 pve ceph-osd[29759]: 18: (clone()+0x3f) [0x7f68489ecaff]
Jan 15 15:54:43 pve ceph-osd[29759]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Jan 15 15:54:43 pve ceph-osd[29759]: -1> 2018-01-15 15:54:43.557716 7f683157e700 -1 abort: Corruption: block checksum mismatch
Jan 15 15:54:43 pve ceph-osd[29759]: 0> 2018-01-15 15:54:43.562224 7f683157e700 -1 *** Caught signal (Aborted) **
Jan 15 15:54:43 pve ceph-osd[29759]: in thread 7f683157e700 thread_name:tp_osd_tp
Jan 15 15:54:43 pve ceph-osd[29759]: ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)
Jan 15 15:54:43 pve ceph-osd[29759]: 1: (()+0xa16664) [0x5626f6077664]
Jan 15 15:54:43 pve ceph-osd[29759]: 2: (()+0x110c0) [0x7f684996f0c0]
Jan 15 15:54:43 pve ceph-osd[29759]: 3: (gsignal()+0xcf) [0x7f6848936fcf]
Jan 15 15:54:43 pve ceph-osd[29759]: 4: (abort()+0x16a) [0x7f68489383fa]
Jan 15 15:54:43 pve ceph-osd[29759]: 5: (RocksDBStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const*, unsigned long, ceph::buffer::list*)+0x29f) [0x5626f5fb595f]
Jan 15 15:54:43 pve ceph-osd[29759]: 6: (BlueStore::Collection::get_onode(ghobject_t const&, bool)+0x5ae) [0x5626f5f392ae]
Jan 15 15:54:43 pve ceph-osd[29759]: 7: (BlueStore::read(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int)+0xfc) [0x5626f5f64a9c]
Jan 15 15:54:43 pve ceph-osd[29759]: 8: (ECBackend::handle_sub_read(pg_shard_t, ECSubRead const&, ECSubReadReply*, ZTracer::Trace const&)+0x239) [0x5626f5df1209]
Jan 15 15:54:43 pve ceph-osd[29759]: 9: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x50d) [0x5626f5df29cd]
Jan 15 15:54:43 pve ceph-osd[29759]: 10: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x5626f5cd1be0]
Jan 15 15:54:43 pve ceph-osd[29759]: 11: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x503) [0x5626f5c37a73]
Jan 15 15:54:43 pve ceph-osd[29759]: 12: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3ab) [0x5626f5ab59eb]
Jan 15 15:54:43 pve ceph-osd[29759]: 13: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x5a) [0x5626f5d53eba]
Jan 15 15:54:43 pve ceph-osd[29759]: 14: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x103d) [0x5626f5adcf4d]
Jan 15 15:54:43 pve ceph-osd[29759]: 15: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8ef) [0x5626f60c406f]
Jan 15 15:54:43 pve ceph-osd[29759]: 16: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5626f60c7370]
Jan 15 15:54:43 pve ceph-osd[29759]: 17: (()+0x7494) [0x7f6849965494]
Jan 15 15:54:43 pve ceph-osd[29759]: 18: (clone()+0x3f) [0x7f68489ecaff]
Jan 15 15:54:43 pve ceph-osd[29759]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
I can provide a copy of the dump file if needed, but it is too large to fit within the 1000 KB limit.
I'm happy to provide any other details that would help.
Thanks
Mike