Bug #22102

closed

BlueStore crashed on rocksdb checksum mismatch

Added by Artemy Kapitula over 6 years ago. Updated almost 6 years ago.

Status:
Won't Fix
Priority:
Urgent
Assignee:
-
% Done:
0%
Regression:
No
Severity:
2 - major

Description

BlueStore crashed on a checksum mismatch:

Nov 10 09:53:59 dpr-2a1713-063-crd rcs-custom-daemon[16684]: 2017-11-10 09:53:59.710381 7f1aab83ad80 -1 osd.10 1977 log_to_monitors {default=true}
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon[16684]: 2017-11-10 09:58:51.250054 7f1a8d2e9700 -1 abort: Corruption: block checksum mismatch
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon[16684]: *** Caught signal (Aborted) **
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon[16684]: in thread 7f1a8d2e9700 thread_name:tp_osd_tp
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon[16684]: ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon[16684]: 1: (()+0xa1e5f1) [0x5646f96065f1]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon[16684]: 2: (()+0xf130) [0x7f1aa9139130]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon[16684]: 3: (gsignal()+0x37) [0x7f1aa81645d7]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon[16684]: 4: (abort()+0x148) [0x7f1aa8165cc8]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon[16684]: 5: (RocksDBStore::get(std::string const&, std::string const&, ceph::buffer::list*)+0x1c7) [0x5646f9560997]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon[16684]: 6: (()+0x8c1fa1) [0x5646f94a9fa1]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon[16684]: 7: (()+0x8c0f8f) [0x5646f94a8f8f]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon[16684]: 8: (BlueStore::ExtentMap::fault_range(KeyValueDB*, unsigned int, unsigned int)+0x3bf) [0x5646f950270f]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon[16684]: 9: (BlueStore::_do_read(BlueStore::Collection*, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::list&, unsigned int)+0x293) [0x5646f9513f03]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon[16684]: 10: (BlueStore::read(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int)+0x61a) [0x5646f9516c7a]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon[16684]: 11: (ReplicatedBackend::be_deep_scrub(hobject_t const&, unsigned int, ScrubMap::object&, ThreadPool::TPHandle&)+0x247) [0x5646f9389c87]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon[16684]: 12: (PGBackend::be_scan_list(ScrubMap&, std::vector<hobject_t, std::allocator<hobject_t> > const&, bool, unsigned int, ThreadPool::TPHandle&)+0x290) [0x5646f92c5860]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon[16684]: 13: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, unsigned int, ThreadPool::TPHandle&)+0x215) [0x5646f9174525]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon[16684]: 14: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x5e6) [0x5646f9174e16]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon[16684]: 15: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x720) [0x5646f92311d0]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon[16684]: 16: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9) [0x5646f90c1229]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon[16684]: 17: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x5646f93338d7]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon[16684]: 18: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xfce) [0x5646f90ec86e]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon[16684]: 19: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) [0x5646f964a9a9]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon[16684]: 20: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5646f964c940]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon[16684]: 21: (()+0x7df5) [0x7f1aa9131df5]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon[16684]: 22: (clone()+0x6d) [0x7f1aa82251ad]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon[16684]: 2017-11-10 09:58:51.262719 7f1a8d2e9700 -1 *** Caught signal (Aborted) **

The same problem occurs in ceph-bluestore-tool:

-19> 2017-11-10 11:22:38.495387 7fc0da37bd80  4 rocksdb: [/root/rpmbuild/BUILD/ceph-12.2.1/src/rocksdb/db/version_set.cc:2859] Recovered from manifest file:db/MANIFEST-003120 succeeded,manifest_file_number is 3120, next_file_number is 3122, last_sequence is 169079700, log_number is 0,prev_log_number is 0,max_column_family is 0
-18> 2017-11-10 11:22:38.495409 7fc0da37bd80  4 rocksdb: [/root/rpmbuild/BUILD/ceph-12.2.1/src/rocksdb/db/version_set.cc:2867] Column family [default] (ID 0), log number is 3119
-17> 2017-11-10 11:22:38.495571 7fc0da37bd80  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1510302158495559, "job": 1, "event": "recovery_started", "log_files": [3121]}
-16> 2017-11-10 11:22:38.495578 7fc0da37bd80 4 rocksdb: [/root/rpmbuild/BUILD/ceph-12.2.1/src/rocksdb/db/db_impl_open.cc:482] Recovering log #3121 mode 0
-15> 2017-11-10 11:22:38.602279 7fc0da37bd80 5 rocksdb: [/root/rpmbuild/BUILD/ceph-12.2.1/src/rocksdb/db/db_impl_open.cc:815] [default] [WriteLevel0TableForRecovery] Level-0 table #3122: started
-14> 2017-11-10 11:22:38.666667 7fc0da37bd80 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1510302158666655, "cf_name": "default", "job": 1, "event": "table_file_creation", "file_number": 3122, "file_size": 3335725, "table_properties": {"data_size": 3264496, "index_size": 23355, "filter_size": 46890, "raw_key_size": 518611, "raw_average_key_size": 32, "raw_value_size": 2842391, "raw_average_value_size": 178, "num_data_blocks": 720, "num_entries": 15906, "filter_policy_name": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "20", "kMergeOperands": "5"}}
-13> 2017-11-10 11:22:38.666679 7fc0da37bd80 5 rocksdb: [/root/rpmbuild/BUILD/ceph-12.2.1/src/rocksdb/db/db_impl_open.cc:847] [default] [WriteLevel0TableForRecovery] Level-0 table #3122: 3335725 bytes OK
-12> 2017-11-10 11:22:38.666719 7fc0da37bd80 4 rocksdb: [/root/rpmbuild/BUILD/ceph-12.2.1/src/rocksdb/db/version_set.cc:2395] Creating manifest 3123
-11> 2017-11-10 11:22:38.673343 7fc0da37bd80  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1510302158673341, "job": 1, "event": "recovery_finished"}
-10> 2017-11-10 11:22:38.673398 7fc0da37bd80 5 rocksdb: [/root/rpmbuild/BUILD/ceph-12.2.1/src/rocksdb/db/db_impl_files.cc:307] [JOB 2] Delete db//MANIFEST-003120 type=3 #3120 - OK
-9> 2017-11-10 11:22:38.673404 7fc0da37bd80  5 rocksdb: [/root/rpmbuild/BUILD/ceph-12.2.1/src/rocksdb/db/db_impl_files.cc:307] [JOB 2] Delete db//003121.log type=0 #3121 - OK
-8> 2017-11-10 11:22:38.699144 7fc0da37bd80  4 rocksdb: [/root/rpmbuild/BUILD/ceph-12.2.1/src/rocksdb/db/db_impl_open.cc:1063] DB pointer 0x55b761a2c000
-7> 2017-11-10 11:22:38.699167 7fc0da37bd80 1 bluestore(/var/lib/ceph/osd/dpro63-10) _open_db opened rocksdb path db options compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152
-6> 2017-11-10 11:22:38.788505 7fc0da37bd80 1 freelist init
-5> 2017-11-10 11:22:38.847417 7fc0da37bd80 1 bluestore(/var/lib/ceph/osd/dpro63-10) _open_alloc opening allocation metadata
-4> 2017-11-10 11:23:17.983427 7fc0da37bd80 1 bluestore(/var/lib/ceph/osd/dpro63-10) _open_alloc loaded 4241 G in 225 extents
-3> 2017-11-10 11:23:18.768340 7fc0da37bd80 1 bluefs fsck
-2> 2017-11-10 11:23:18.768355 7fc0da37bd80 1 bluestore(/var/lib/ceph/osd/dpro63-10) fsck walking object keyspace
-1> 2017-11-10 11:24:20.957795 7fc0da37bd80 -1 abort: Corruption: block checksum mismatch
0> 2017-11-10 11:24:20.959164 7fc0da37bd80 -1 *** Caught signal (Aborted) **
in thread 7fc0da37bd80 thread_name:ceph-bluestore
ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
1: (()+0x3dcdd1) [0x55b7602bbdd1]
2: (()+0xf130) [0x7fc0cf400130]
3: (gsignal()+0x37) [0x7fc0ce0095d7]
4: (abort()+0x148) [0x7fc0ce00acc8]
5: (RocksDBStore::get(std::string const&, std::string const&, ceph::buffer::list*)+0x1c7) [0x55b76023b6f7]
6: (()+0x2ad601) [0x55b76018c601]
7: (()+0x2ac5ef) [0x55b76018b5ef]
8: (BlueStore::ExtentMap::fault_range(KeyValueDB*, unsigned int, unsigned int)+0x3bf) [0x55b7601e4d6f]
9: (BlueStore::fsck(bool)+0x1d2a) [0x55b760207b8a]
10: (main()+0xa8e) [0x55b760003bee]
11: (__libc_start_main()+0xf5) [0x7fc0cdff5af5]
12: (()+0x1b852f) [0x55b76009752f]

The only way to resolve this is to destroy the OSD. Perhaps an option to destroy broken objects is required.


Related issues: 2 (0 open, 2 closed)

Related to bluestore - Bug #22678: block checksum mismatch from rocksdb (Duplicate, 01/15/2018)

Related to bluestore - Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block) (Won't Fix)

Actions #1

Updated by Sage Weil over 6 years ago

  • Project changed from Ceph to bluestore
  • Category deleted (OSD)
Actions #2

Updated by Sage Weil over 6 years ago

  • Status changed from New to Need More Info

Have you seen any other instances of this? This is the first time I've heard of this particular crash. It looks like the crc mismatch is in rocksdb, which is a bit concerning.

Actions #3

Updated by Mike O'Connor over 6 years ago

I seem to be getting something like this also; it mostly happens when the system is under write load. I have created the file mentioned, if it's needed.

I can create this on demand by simply writing lots of data via CephFS.

---
root@pve:~# ceph --version
ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)
---

Jan 10 15:56:31 pve ceph-osd[2722]: 2018-01-10 15:56:31.338068 7efe5eac1700 -1 abort: Corruption: block checksum mismatch
Jan 10 15:56:31 pve ceph-osd[2722]: *** Caught signal (Aborted) **
Jan 10 15:56:31 pve ceph-osd[2722]: in thread 7efe5eac1700 thread_name:tp_osd_tp
Jan 10 15:56:31 pve ceph-osd[2722]: ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)
Jan 10 15:56:31 pve ceph-osd[2722]: 1: (()+0xa16664) [0x55a8b396b664]
Jan 10 15:56:31 pve ceph-osd[2722]: 2: (()+0x110c0) [0x7efe796b70c0]
Jan 10 15:56:31 pve ceph-osd[2722]: 3: (gsignal()+0xcf) [0x7efe7867efcf]
Jan 10 15:56:31 pve ceph-osd[2722]: 4: (abort()+0x16a) [0x7efe786803fa]
Jan 10 15:56:31 pve ceph-osd[2722]: 5: (RocksDBStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const*, unsigned long, ceph::buffer::list*)+0x29f) [0x55a8b38a995f]
Jan 10 15:56:31 pve ceph-osd[2722]: 6: (BlueStore::Collection::get_onode(ghobject_t const&, bool)+0x5ae) [0x55a8b382d2ae]
Jan 10 15:56:31 pve ceph-osd[2722]: 7: (BlueStore::getattr(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, char const*, ceph::buffer::ptr&)+0xf6) [0x55a8b382e326]
Jan 10 15:56:31 pve ceph-osd[2722]: 8: (PGBackend::objects_get_attr(hobject_t const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::list*)+0x106) [0x55a8b35bde26]
Jan 10 15:56:31 pve ceph-osd[2722]: 9: (PrimaryLogPG::get_snapset_context(hobject_t const&, bool, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::list> > > const*, bool)+0x3fb) [0x55a8b35081db]
Jan 10 15:56:31 pve ceph-osd[2722]: 10: (PrimaryLogPG::get_object_context(hobject_t const&, bool, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::list> > > const*)+0xc39) [0x55a8b352fec9]
Jan 10 15:56:31 pve ceph-osd[2722]: 11: (PrimaryLogPG::find_object_context(hobject_t const&, std::shared_ptr<ObjectContext>*, bool, bool, hobject_t*)+0x387) [0x55a8b3533687]
Jan 10 15:56:31 pve ceph-osd[2722]: 12: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x2214) [0x55a8b3571694]
Jan 10 15:56:31 pve ceph-osd[2722]: 13: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xec6) [0x55a8b352c436]
Jan 10 15:56:31 pve ceph-osd[2722]: 14: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3ab) [0x55a8b33a99eb]
Jan 10 15:56:31 pve ceph-osd[2722]: 15: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x5a) [0x55a8b3647eba]
Jan 10 15:56:31 pve ceph-osd[2722]: 16: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x103d) [0x55a8b33d0f4d]
Jan 10 15:56:31 pve ceph-osd[2722]: 17: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8ef) [0x55a8b39b806f]
Jan 10 15:56:31 pve ceph-osd[2722]: 18: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55a8b39bb370]
Jan 10 15:56:31 pve ceph-osd[2722]: 19: (()+0x7494) [0x7efe796ad494]
Jan 10 15:56:31 pve ceph-osd[2722]: 20: (clone()+0x3f) [0x7efe78734aff]
Jan 10 15:56:31 pve ceph-osd[2722]: 2018-01-10 15:56:31.343532 7efe5eac1700 -1 *** Caught signal (Aborted) **
Jan 10 15:56:31 pve ceph-osd[2722]: in thread 7efe5eac1700 thread_name:tp_osd_tp
Jan 10 15:56:31 pve ceph-osd[2722]: ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)
Jan 10 15:56:31 pve ceph-osd[2722]: 1: (()+0xa16664) [0x55a8b396b664]
Jan 10 15:56:31 pve ceph-osd[2722]: 2: (()+0x110c0) [0x7efe796b70c0]
Jan 10 15:56:31 pve ceph-osd[2722]: 3: (gsignal()+0xcf) [0x7efe7867efcf]
Jan 10 15:56:31 pve ceph-osd[2722]: 4: (abort()+0x16a) [0x7efe786803fa]
Jan 10 15:56:31 pve ceph-osd[2722]: 5: (RocksDBStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const*, unsigned long, ceph::buffer::list*)+0x29f) [0x55a8b38a995f]
Jan 10 15:56:31 pve ceph-osd[2722]: 6: (BlueStore::Collection::get_onode(ghobject_t const&, bool)+0x5ae) [0x55a8b382d2ae]
Jan 10 15:56:31 pve ceph-osd[2722]: 7: (BlueStore::getattr(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, char const*, ceph::buffer::ptr&)+0xf6) [0x55a8b382e326]
Jan 10 15:56:31 pve ceph-osd[2722]: 8: (PGBackend::objects_get_attr(hobject_t const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::list*)+0x106) [0x55a8b35bde26]
Jan 10 15:56:31 pve ceph-osd[2722]: 9: (PrimaryLogPG::get_snapset_context(hobject_t const&, bool, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::list> > > const*, bool)+0x3fb) [0x55a8b35081db]
Jan 10 15:56:31 pve ceph-osd[2722]: 10: (PrimaryLogPG::get_object_context(hobject_t const&, bool, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::list> > > const*)+0xc39) [0x55a8b352fec9]
Jan 10 15:56:31 pve ceph-osd[2722]: 11: (PrimaryLogPG::find_object_context(hobject_t const&, std::shared_ptr<ObjectContext>*, bool, bool, hobject_t*)+0x387) [0x55a8b3533687]
Jan 10 15:56:31 pve ceph-osd[2722]: 12: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x2214) [0x55a8b3571694]
Jan 10 15:56:31 pve ceph-osd[2722]: 13: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xec6) [0x55a8b352c436]
Jan 10 15:56:31 pve ceph-osd[2722]: 14: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3ab) [0x55a8b33a99eb]
Jan 10 15:56:31 pve ceph-osd[2722]: 15: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x5a) [0x55a8b3647eba]
Jan 10 15:56:31 pve ceph-osd[2722]: 16: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x103d) [0x55a8b33d0f4d]
Jan 10 15:56:31 pve ceph-osd[2722]: 17: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8ef) [0x55a8b39b806f]
Jan 10 15:56:31 pve ceph-osd[2722]: 18: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55a8b39bb370]
Jan 10 15:56:31 pve ceph-osd[2722]: 19: (()+0x7494) [0x7efe796ad494]
Jan 10 15:56:31 pve ceph-osd[2722]: 20: (clone()+0x3f) [0x7efe78734aff]
Jan 10 15:56:31 pve ceph-osd[2722]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Jan 10 15:56:31 pve systemd[1]: ceph-osd@12.service: Main process exited, code=killed, status=6/ABRT
Jan 10 15:56:31 pve systemd[1]: ceph-osd@12.service: Unit entered failed state.
Jan 10 15:56:31 pve systemd[1]: ceph-osd@12.service: Failed with result 'signal'.
Jan 10 15:56:31 pve kernel: [171262.263294] libceph: osd12 down
Jan 10 15:56:51 pve systemd[1]: ceph-osd@12.service: Service hold-off time over, scheduling restart.
Jan 10 15:56:51 pve systemd[1]: Stopped Ceph object storage daemon osd.12.
Jan 10 15:56:51 pve systemd[1]: Starting Ceph object storage daemon osd.12...
Jan 10 15:56:51 pve systemd[1]: Started Ceph object storage daemon osd.12.
Jan 10 15:56:51 pve ceph-osd[26121]: starting osd.12 at - osd_data /var/lib/ceph/osd/ceph-12 /var/lib/ceph/osd/ceph-12/journal

Actions #4

Updated by Mike O'Connor over 6 years ago

Sage Weil wrote:

Have you seen any other instances of this? This is the first time I've heard of this particular crash. It looks like the crc mismatch is in rocksdb, which is a bit concerning.

I have also reported this issue; it is happening with 12.2.2, not just 12.2.1.

Actions #5

Updated by Sage Weil over 6 years ago

  • Related to Bug #22678: block checksum mismatch from rocksdb added
Actions #6

Updated by Sage Weil over 6 years ago

  • Subject changed from BlueStore crashed on checksum mismatch to BlueStore crashed on rocksdb checksum mismatch
  • Priority changed from Normal to Urgent
Actions #7

Updated by Sage Weil about 6 years ago

  • Status changed from Need More Info to In Progress
  • Assignee set to Sage Weil

full logs at 5e38cf1e-532a-4aa4-8289-5b9e9c59632a

Actions #8

Updated by Sage Weil about 6 years ago

  • Priority changed from Urgent to Immediate
Actions #9

Updated by Sage Weil about 6 years ago

Have you seen this bug occur since you filed it? One other user was seeing it, but we haven't been able to generate logs that capture the event. If this is something you can reproduce, that would be extremely helpful.

Also, can you share a bit about your environment? What distro are you using, what version(s), what kind of storage devices, etc.

Actions #10

Updated by Sage Weil about 6 years ago

  • Status changed from In Progress to Need More Info
Actions #11

Updated by Artemy Kapitula about 6 years ago

Sage Weil wrote:

Have you seen this bug occur since you filed it? One other user was seeing it, but we haven't been able to generate logs that capture the event. If this is something you can reproduce, that would be extremely helpful.
Also, can you share a bit about your environment? What distro are you using, what version(s), what kind of storage devices, etc.

The problem has happened twice.

The first occurrence is the one reported; the second was the same day.

We have not encountered this bug again since it was reported.

The node is an x86_64 with ECC RAM installed.

The storage device is an ordinary HDD.

A separate WAL/DB is not used.

No dm-cache/bcache or other such features were used.

The distro is CentOS 7 with EPEL enabled.

Ceph was built from Fedora's SRPM.

Actions #12

Updated by Artemy Kapitula about 6 years ago

Sage Weil wrote:

If this is something you can reproduce, that would be extremely helpful.

We can't reproduce the bug. The cluster is pre-production, so the OSD was rebuilt after several days.

SMART didn't show any errors.

Actions #13

Updated by rory shcramm about 6 years ago

I seem to be getting something like this as well. It knocked out 8 OSDs in our cluster across multiple hosts. We were able to track it down to one OSD and are in the process of migrating the data off of it and replacing it.

OS: Ubuntu 16.04.3
Kernel: 4.13.0-32-generic kernel (from linux-image-generic-hwe-16.04).
CPU: 2x Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
ceph version: 12.2.4 from ceph.com ubuntu apt repo.
osd devices: 2x Intel P3520 NVMe drives per host (12 hosts total).
pool config: EC pool with k=10, m=2, host fault tolerance, and isa plugin

We don't have any bluestore debug logs, but OSD debug output is below. I'm trying to capture a segfault with debug bluestore logging on, but we haven't hit another segfault yet. Initially we were getting segfaults taking down the cluster every 15-30 minutes.

   -28> 2018-03-01 12:29:32.731094 7fea6a728700  5 -- 172.30.5.32:6800/1162487 >> 172.30.5.29:6800/3400092 conn(0x56401a25f000 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=274 cs=1 l=0). rx osd.115 seq 203 0x563f11140000 MOSDECSubOpReadReply(54.3bs0 15506/15504 ECSubReadReply(tid=87, attrs_read=0)) v2
   -27> 2018-03-01 12:29:32.731124 7fea6a728700  1 -- 172.30.5.32:6800/1162487 <== osd.115 172.30.5.29:6800/3400092 203 ==== MOSDECSubOpReadReply(54.3bs0 15506/15504 ECSubReadReply(tid=87, attrs_read=0)) v2 ==== 844065+0+0 (4190074503 0 0) 0x563f11140000 con 0x56401a25f000
   -26> 2018-03-01 12:29:32.732576 7fea6a728700  5 -- 172.30.5.32:6800/1162487 >> 172.30.5.34:6802/835637 conn(0x56401dff5800 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=98 cs=1 l=0). rx osd.114 seq 143 0x563f400a9180 MOSDECSubOpReadReply(54.3bs0 15506/15504 ECSubReadReply(tid=87, attrs_read=0)) v2
   -25> 2018-03-01 12:29:32.732596 7fea6a728700  1 -- 172.30.5.32:6800/1162487 <== osd.114 172.30.5.34:6802/835637 143 ==== MOSDECSubOpReadReply(54.3bs0 15506/15504 ECSubReadReply(tid=87, attrs_read=0)) v2 ==== 844065+0+0 (1177764585 0 0) 0x563f400a9180 con 0x56401dff5800
   -24> 2018-03-01 12:29:32.732646 7fea6af29700  5 -- 172.30.5.32:6800/1162487 >> 172.30.5.28:6800/1410559 conn(0x56401aa68800 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=140 cs=1 l=0). rx osd.135 seq 259 0x5640248aeec0 MOSDECSubOpReadReply(54.3bs0 15506/15504 ECSubReadReply(tid=87, attrs_read=0)) v2
   -23> 2018-03-01 12:29:32.732668 7fea6af29700  1 -- 172.30.5.32:6800/1162487 <== osd.135 172.30.5.28:6800/1410559 259 ==== MOSDECSubOpReadReply(54.3bs0 15506/15504 ECSubReadReply(tid=87, attrs_read=0)) v2 ==== 844065+0+0 (3561801739 0 0) 0x5640248aeec0 con 0x56401aa68800
   -22> 2018-03-01 12:29:32.732799 7fea5070b700  1 -- 172.30.5.32:6800/1162487 --> 172.30.5.30:6800/2857280 -- MOSDECSubOpRead(54.3bs2 15506/15504 ECSubRead(tid=87, to_read={54:dc00384e:::rbd_data.41.6ea73d2ae8944a.00000000000185cd:head=0,839680,0}, attrs_to_read=)) v3 -- 0x56401b0f4000 con 0
   -21> 2018-03-01 12:29:32.732855 7fea5070b700  1 -- 172.30.5.32:6800/1162487 --> 172.30.5.41:6802/664087 -- MOSDECSubOpRead(54.3bs3 15506/15504 ECSubRead(tid=87, to_read={54:dc00384e:::rbd_data.41.6ea73d2ae8944a.00000000000185cd:head=0,839680,0}, attrs_to_read=)) v3 -- 0x56401b0f4500 con 0
   -20> 2018-03-01 12:29:32.732901 7fea5070b700  1 -- 172.30.5.32:6800/1162487 --> 172.30.5.36:6800/400707 -- MOSDECSubOpRead(54.3bs5 15506/15504 ECSubRead(tid=87, to_read={54:dc00384e:::rbd_data.41.6ea73d2ae8944a.00000000000185cd:head=0,839680,0}, attrs_to_read=)) v3 -- 0x5640148d7e00 con 0
   -19> 2018-03-01 12:29:32.732928 7fea5070b700  1 -- 172.30.5.32:6800/1162487 --> 172.30.5.32:6800/1162487 -- MOSDECSubOpRead(54.3bs0 15506/15504 ECSubRead(tid=87, to_read={54:dc00384e:::rbd_data.41.6ea73d2ae8944a.00000000000185cd:head=0,839680,0}, attrs_to_read=)) v3 -- 0x563f47742080 con 0
   -18> 2018-03-01 12:29:32.732957 7fea5070b700  1 -- 172.30.5.32:6800/1162487 --> 172.30.5.38:6802/192689 -- MOSDECSubOpRead(54.3bs6 15506/15504 ECSubRead(tid=87, to_read={54:dc00384e:::rbd_data.41.6ea73d2ae8944a.00000000000185cd:head=0,839680,0}, attrs_to_read=)) v3 -- 0x563f47742300 con 0
   -17> 2018-03-01 12:29:32.732984 7fea5070b700  1 -- 172.30.5.32:6800/1162487 --> 172.30.5.33:6800/898853 -- MOSDECSubOpRead(54.3bs1 15506/15504 ECSubRead(tid=87, to_read={54:dc00384e:::rbd_data.41.6ea73d2ae8944a.00000000000185cd:head=0,839680,0}, attrs_to_read=)) v3 -- 0x563f47742580 con 0
   -16> 2018-03-01 12:29:32.733020 7fea5070b700  1 -- 172.30.5.32:6800/1162487 --> 172.30.5.37:6802/346289 -- MOSDECSubOpRead(54.3bs8 15506/15504 ECSubRead(tid=87, to_read={54:dc00384e:::rbd_data.41.6ea73d2ae8944a.00000000000185cd:head=0,839680,0}, attrs_to_read=)) v3 -- 0x563f47742800 con 0
   -15> 2018-03-01 12:29:32.733008 7fea5ffa4700  1 -- 172.30.5.32:6800/1162487 <== osd.121 172.30.5.32:6800/1162487 0 ==== MOSDECSubOpRead(54.3bs0 15506/15504 ECSubRead(tid=87, to_read={54:dc00384e:::rbd_data.41.6ea73d2ae8944a.00000000000185cd:head=0,839680,0}, attrs_to_read=)) v3 ==== 0+0+0 (0 0 0) 0x563f47742080 con 0x563ef9dcf800
   -14> 2018-03-01 12:29:32.734684 7fea54713700  1 -- 172.30.5.32:6800/1162487 --> 172.30.5.32:6800/1162487 -- MOSDECSubOpReadReply(54.3bs0 15506/15504 ECSubReadReply(tid=87, attrs_read=0)) v2 -- 0x563f400a9180 con 0
   -13> 2018-03-01 12:29:32.734742 7fea5ffa4700  1 -- 172.30.5.32:6800/1162487 <== osd.121 172.30.5.32:6800/1162487 0 ==== MOSDECSubOpReadReply(54.3bs0 15506/15504 ECSubReadReply(tid=87, attrs_read=0)) v2 ==== 0+0+0 (0 0 0) 0x563f400a9180 con 0x563ef9dcf800
   -12> 2018-03-01 12:29:32.736701 7fea6af29700  5 -- 172.30.5.32:6800/1162487 >> 172.30.5.36:6800/400707 conn(0x56401a829800 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=154 cs=1 l=0). rx osd.120 seq 124 0x5640248af180 MOSDECSubOpReadReply(54.3bs0 15506/15504 ECSubReadReply(tid=87, attrs_read=0)) v2
   -11> 2018-03-01 12:29:32.736729 7fea6af29700  1 -- 172.30.5.32:6800/1162487 <== osd.120 172.30.5.36:6800/400707 124 ==== MOSDECSubOpReadReply(54.3bs0 15506/15504 ECSubReadReply(tid=87, attrs_read=0)) v2 ==== 422079+0+0 (431288731 0 0) 0x5640248af180 con 0x56401a829800
   -10> 2018-03-01 12:29:32.737524 7fea6a728700  5 -- 172.30.5.32:6800/1162487 >> 172.30.5.37:6802/346289 conn(0x56401aa67000 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=158 cs=1 l=0). rx osd.129 seq 159 0x563ff74b0100 MOSDECSubOpReadReply(54.3bs0 15506/15504 ECSubReadReply(tid=87, attrs_read=0)) v2
    -9> 2018-03-01 12:29:32.737546 7fea6a728700  1 -- 172.30.5.32:6800/1162487 <== osd.129 172.30.5.37:6802/346289 159 ==== MOSDECSubOpReadReply(54.3bs0 15506/15504 ECSubReadReply(tid=87, attrs_read=0)) v2 ==== 422079+0+0 (3341526987 0 0) 0x563ff74b0100 con 0x56401aa67000
    -8> 2018-03-01 12:29:32.737563 7fea6af29700  5 -- 172.30.5.32:6800/1162487 >> 172.30.5.38:6802/192689 conn(0x56401a260800 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=178 cs=1 l=0). rx osd.124 seq 216 0x5640248afc80 MOSDECSubOpReadReply(54.3bs0 15506/15504 ECSubReadReply(tid=87, attrs_read=0)) v2
    -7> 2018-03-01 12:29:32.737577 7fea6af29700  1 -- 172.30.5.32:6800/1162487 <== osd.124 172.30.5.38:6802/192689 216 ==== MOSDECSubOpReadReply(54.3bs0 15506/15504 ECSubReadReply(tid=87, attrs_read=0)) v2 ==== 422079+0+0 (3434450348 0 0) 0x5640248afc80 con 0x56401a260800
    -6> 2018-03-01 12:29:32.738412 7fea6af29700  5 -- 172.30.5.32:6800/1162487 >> 172.30.5.33:6800/898853 conn(0x56401a5f8800 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=100 cs=1 l=0). rx osd.127 seq 155 0x563ef9e2adc0 MOSDECSubOpReadReply(54.3bs0 15506/15504 ECSubReadReply(tid=87, attrs_read=0)) v2
    -5> 2018-03-01 12:29:32.738430 7fea6af29700  1 -- 172.30.5.32:6800/1162487 <== osd.127 172.30.5.33:6800/898853 155 ==== MOSDECSubOpReadReply(54.3bs0 15506/15504 ECSubReadReply(tid=87, attrs_read=0)) v2 ==== 422079+0+0 (1198211421 0 0) 0x563ef9e2adc0 con 0x56401a5f8800
    -4> 2018-03-01 12:29:32.746551 7fea6af29700  5 -- 172.30.5.32:6800/1162487 >> 172.30.5.30:6800/2857280 conn(0x56401dff7000 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=19 cs=1 l=0). rx osd.118 seq 116 0x563ffd95c000 MOSDECSubOpReadReply(54.3bs0 15506/15504 ECSubReadReply(tid=87, attrs_read=0)) v2
    -3> 2018-03-01 12:29:32.746583 7fea6af29700  1 -- 172.30.5.32:6800/1162487 <== osd.118 172.30.5.30:6800/2857280 116 ==== MOSDECSubOpReadReply(54.3bs0 15506/15504 ECSubReadReply(tid=87, attrs_read=0)) v2 ==== 422079+0+0 (3559853294 0 0) 0x563ffd95c000 con 0x56401dff7000
    -2> 2018-03-01 12:29:32.746686 7fea6a728700  5 -- 172.30.5.32:6800/1162487 >> 172.30.5.41:6802/664087 conn(0x56401f414000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=15 cs=1 l=0). rx osd.119 seq 106 0x5640182e2c00 MOSDECSubOpReadReply(54.3bs0 15506/15504 ECSubReadReply(tid=87, attrs_read=0)) v2
    -1> 2018-03-01 12:29:32.746711 7fea6a728700  1 -- 172.30.5.32:6800/1162487 <== osd.119 172.30.5.41:6802/664087 106 ==== MOSDECSubOpReadReply(54.3bs0 15506/15504 ECSubReadReply(tid=87, attrs_read=0)) v2 ==== 422079+0+0 (2215858979 0 0) 0x5640182e2c00 con 0x56401f414000
     0> 2018-03-01 12:29:32.753521 7fea5070b700 -1 *** Caught signal (Segmentation fault) **
 in thread 7fea5070b700 thread_name:tp_osd_tp

 ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)
 1: (()+0xa74234) [0x563eef380234]
 2: (()+0x11390) [0x7fea6e2a8390]
 3: (std::__cxx11::list<boost::tuples::tuple<unsigned long, unsigned long, unsigned int, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type>, std::allocator<boost::tuples::tuple<unsigned long, unsigned long, unsigned int, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type> > >::list(std::__cxx11::list<boost::tuples::tuple<unsigned long, unsigned long, unsigned int, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type>, std::allocator<boost::tuples::tuple<unsigned long, unsigned long, unsigned int, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type> > > const&)+0x44) [0x563eef102ab4]
 4: (ECBackend::send_all_remaining_reads(hobject_t const&, ECBackend::ReadOp&)+0x308) [0x563eef0ea378]
 5: (ECBackend::handle_sub_read_reply(pg_shard_t, ECSubReadReply&, RecoveryMessages*, ZTracer::Trace const&)+0x11bb) [0x563eef0ebaeb]
 6: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x1c4) [0x563eef0fe1b4]
 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x563eeefe0ca0]
 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x543) [0x563eeef459d3]
 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3a9) [0x563eeedbf3b9]
 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x563eef062047]
 11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x130e) [0x563eeede79ae]
 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x884) [0x563eef3c8664]
 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x563eef3cb6a0]
 14: (()+0x76ba) [0x7fea6e29e6ba]
 15: (clone()+0x6d) [0x7fea6d31541d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Actions #14

Updated by Radoslaw Zarzynski about 6 years ago

@Rory

I can't find a call to RocksDBStore::get() in the attached trace. The process also died because of SIGSEGV, not SIGABRT, so I think it's something different, most likely in the EC pool implementation.

Anyway, there is a branch wip-bug22102-paranoid-checker-v12.2.4 with a paranoid checker that detects the RocksDB corruption just after the write, rather than on a read that can be far away from the real culprit. This applies to corruptions that are permanent, like the one reported by Artemy.
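
To make the idea concrete, here is a minimal sketch of such a read-after-write ("paranoid") verifier. It is only an illustration of the general technique under made-up interfaces (KvStore and paranoid_put are hypothetical, not Ceph's KeyValueDB API or the code in that branch); the point is that a permanent corruption aborts in the write path, while the responsible transaction is still known.

// Hypothetical sketch of a paranoid read-after-write check; NOT the
// actual wip-bug22102-paranoid-checker code. After a batch of writes
// is committed, each key is immediately read back and compared, so a
// permanent on-disk corruption aborts at write time rather than on a
// much later, unrelated read.
#include <cassert>
#include <map>
#include <string>

struct KvStore {                        // stand-in for the real KV interface
  virtual int put(const std::string& key, const std::string& val) = 0;
  virtual int get(const std::string& key, std::string* val) = 0;
  virtual ~KvStore() = default;
};

int paranoid_put(KvStore& db, const std::map<std::string, std::string>& batch) {
  for (const auto& [key, val] : batch) {
    if (int r = db.put(key, val); r != 0)
      return r;                         // the write itself failed
  }
  // read every key back immediately and verify the payload
  for (const auto& [key, val] : batch) {
    std::string readback;
    if (db.get(key, &readback) != 0 || readback != val) {
      // corruption detected right after the write: fail loudly here
      // instead of crashing later in an unrelated read path
      assert(!"paranoid checker: readback mismatch after write");
    }
  }
  return 0;
}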

Actions #15

Updated by Radoslaw Zarzynski about 6 years ago

Packages with the paranoid checker are available. It might substantially slow down an OSD, so please be careful.

Actions #16

Updated by Sage Weil about 6 years ago

Current theory: bluefs is not protecting against a file that is open for read being deleted. Mark observes that he sees this crash during compaction (when sst files are deleted). The crash is relatively rare because the deallocated space needs to be not just deleted but also overwritten by new data before we see a crc error.

Added an assert that the file is not open on unlink to test the theory.
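
A minimal sketch of that kind of assert, assuming a file object that tracks open readers in a num_readers counter (the counter name comes from the follow-up comment below; the rest is illustrative, not the actual BlueFS patch):

// Illustrative only: assert at unlink time that no reader still holds
// the file open, so a use-after-delete is caught at the unlink itself
// rather than surfacing later as a checksum error on reused extents.
#include <atomic>
#include <cassert>

struct File {
  std::atomic<int> num_readers{0};  // assumed count of active readers
  // ... size, extents, etc.
};

void unlink_file(File* f) {
  // If a reader were still active, freeing (and later reusing) the
  // file's extents would corrupt the data under that reader.
  assert(f->num_readers == 0);
  // ... proceed to release the file's allocated extents
}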

Actions #17

Updated by Sage Weil about 6 years ago

Scratch that: Mark didn't hit the assert for num_readers == 0, and the core indicates the file isn't deleted.

_read_random is trying to read past the end of the file, though:

  -188> 2018-04-04 11:10:07.366 7f43fbde0700 10 bluefs _read_random h 0x5608a02ecd00 0x4043e0f~8d925 from file(ino 3028 size 0x40d1769 mtime 2018-04-04 11:08:59.628239 bdev 1 allocated 4100000 extents [1:0x1bab000000+4100000])

$ printf "%x\n" $((0x4043e0f + 0x8d925))
40d1734

Either _read_random should be zeroing the end of its buffer, or the size is wrong?
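
For illustration, zeroing the tail of a read that crosses EOF would look roughly like the sketch below. All names here (read_at, backing) are assumptions for the example, not the real _read_random signature:

// Sketch: clamp a read at EOF and zero-fill the rest of the caller's
// buffer, so bytes past end-of-file are well defined instead of
// whatever the buffer happened to contain.
#include <algorithm>
#include <cstdint>
#include <cstring>

size_t read_at(uint64_t off, char* buf, size_t len,
               uint64_t file_size, const char* backing) {
  if (off >= file_size) {
    std::memset(buf, 0, len);                  // entirely past EOF
    return 0;
  }
  size_t avail = static_cast<size_t>(
      std::min<uint64_t>(len, file_size - off));
  std::memcpy(buf, backing + off, avail);      // real data up to EOF
  if (avail < len)
    std::memset(buf + avail, 0, len - avail);  // zero the tail
  return avail;
}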

Actions #18

Updated by Sage Weil about 6 years ago

Hrm, this run got a crc near EOF, but not past it.

2018-04-04 23:03:34.215 7fec45e5f700  0 bluefs _flush_range size increase, file now file(ino 2965 size 0x41ad345 mtime 2018-04-04 23:03:33.799810 bdev 1 allocated 4200000 extents [1:0x3936e00000+4200000])
2018-04-04 23:03:34.255 7fec45e5f700  0 bluefs _flush_and_sync_log   op_file_update file(ino 2965 size 0x41ad345 mtime 2018-04-04 23:03:34.216706 bdev 1 allocated 4200000 extents [1:0x3936e00000+4200000])
...
  -747> 2018-04-04 23:03:50.271 7fec3363a700 10 bluefs _read_random h 0x56410e79c900 0x40d572b~d7be5 from file(ino 2965 size 0x41ad345 mtime 2018-04-04 23:03:34.216706 bdev 1 allocated 4200000 extents [1:0x3936e00000+4200000])

but
$ printf "%x\n" $((0x40d572b + 0xd7be5))
41ad310

so, that's 0x35 short of eof, and we have a crc error.

A full bluefs log might give us some clue about the final write/append?

Actions #19

Updated by Sage Weil about 6 years ago

Update:

- The bad data appears to be in the buffer immediately after the pread syscall
- pread is returning the full byte count (it's not a "short" read)
- usually the zeros are at the end of the buffer, but once we saw them in 3 distinct ranges within the buffer
- the zeros are always block aligned
- doing a memset prior to pread appears to prevent it from happening, although it is hard to be certain because it takes hours to days to reproduce (see the sketch after this list)
- not able to reproduce on incerta
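
A sketch of that kind of check, under the assumptions stated in the list above (the suspect zeros are 4 KiB aligned, and a memset before pread masks the bug, so the buffer is deliberately not pre-touched here). All names are illustrative:

// Illustrative check: immediately after pread(), scan the buffer for
// block-aligned all-zero regions, then re-read the range and compare.
// If the second read returns non-zero data for the same block, the
// zeros came from the kernel read path, not from the disk.
#include <unistd.h>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

void check_read(int fd, char* buf, size_t len, off_t off) {
  if (pread(fd, buf, len, off) != (ssize_t)len)
    return;                            // short read: a separate problem
  const size_t block = 4096;           // observed zeros are block aligned
  std::vector<char> zero(block, 0), again(len);
  for (size_t i = 0; i + block <= len; i += block) {
    if (std::memcmp(buf + i, zero.data(), block) != 0)
      continue;                        // block holds real data
    // suspicious all-zero block: read the same range a second time
    if (pread(fd, again.data(), len, off) == (ssize_t)len &&
        std::memcmp(again.data() + i, zero.data(), block) != 0)
      std::fprintf(stderr, "transient zero block at 0x%jx\n",
                   (uintmax_t)(off + i));
  }
}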

Current best guess is that it is related to swap. The machine where we are reproducing has memory pressure and swap; the incerta node has gobs of memory and is not swapping.

mocha (the machine where Mark can reproduce) is running kernel 4.10.0-42-generic.

Actions #20

Updated by Sage Weil about 6 years ago

Artemy, is it possible the machine where you saw this was swapping?

Actions #21

Updated by Sergey Malinin about 6 years ago

Sorry for interrupting (I provided related info in the comments to issue #22678), but I must add that no error has appeared in my cluster for a week now, since swap usage on the affected host went down from 97%.

Actions #22

Updated by Artemy Kapitula about 6 years ago

Sage Weil wrote:

Artemy, is it possible the machine where you saw this was swapping?

Yes, the system was in a swap state (nearly 25% of RAM had been swapped).

Actions #23

Updated by Sage Weil about 6 years ago

  • Related to Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block) added
Actions #24

Updated by Sage Weil almost 6 years ago

Artemy Kapitula wrote:

Sage Weil wrote:

Artemy, is it possible the machine where you saw this was swapping?

Yes, the system was in a swap state (nearly 25% of RAM had been swapped).

One other question: did you try rebooting with the same kernel before? (Just hoping for some indication that it wasn't the reboot that made the issue go away.)

Actions #25

Updated by Artemy Kapitula almost 6 years ago

Sage Weil wrote:

Artemy Kapitula wrote:

Sage Weil wrote:

Artemy, is it possible the machine where you saw this was swapping?

Yes, the system was in a swap state (nearly 25% of RAM had been swapped).

One other question: did you try rebooting with the same kernel before? (Just hoping for some indication that it wasn't the reboot that made the issue go away.)

The same kernel binaries were used for several weeks on many hosts.

And because this was a test host, it was rebooted.

After the reboot the issue did not go away, of course.

So we were forced to recreate the OSD, because the bluestore recovery tools crashed with a similar error.

Actions #26

Updated by Sage Weil almost 6 years ago

  • Assignee deleted (Sage Weil)
  • Priority changed from Immediate to Urgent

downgrading the priority here:

- no data loss, "just" an osd crash
- appears to be an issue with the kernel, not ceph

Actions #27

Updated by Sage Weil almost 6 years ago

Artemy: which CentOS release and kernel version is it?

Actions #28

Updated by Artemy Kapitula almost 6 years ago

Sage Weil wrote:

Artemy: which CentOS release and kernel version is it?

CentOS 7.2 with a custom kernel (a rebuilt SRPM of Fedora kernel 4.13.9).

Actions #29

Updated by Sage Weil almost 6 years ago

  • Status changed from Need More Info to Won't Fix

This appears to be a kernel bug related to swapping.

So far there is no indication that it affects distro kernels.
