Bug #22102

closed

BlueStore crashed on rocksdb checksum mismatch

Added by Artemy Kapitula over 6 years ago. Updated almost 6 years ago.

Status: Won't Fix
Priority: Urgent
Assignee: -
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

BlueStore crashed on a RocksDB block checksum mismatch:

Nov 10 09:53:59 dpr-2a1713-063-crd rcs-custom-daemon16684: 2017-11-10 09:53:59.710381 7f1aab83ad80 -1 osd.10 1977 log_to_monitors {default=true}
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon16684: 2017-11-10 09:58:51.250054 7f1a8d2e9700 -1 abort: Corruption: block checksum mismatch
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon16684: *** Caught signal (Aborted) **
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon16684: in thread 7f1a8d2e9700 thread_name:tp_osd_tp
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon16684: ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon16684: 1: (()+0xa1e5f1) [0x5646f96065f1]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon16684: 2: (()+0xf130) [0x7f1aa9139130]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon16684: 3: (gsignal()+0x37) [0x7f1aa81645d7]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon16684: 4: (abort()+0x148) [0x7f1aa8165cc8]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon16684: 5: (RocksDBStore::get(std::string const&, std::string const&, ceph::buffer::list*)+0x1c7) [0x5646f9560997]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon16684: 6: (()+0x8c1fa1) [0x5646f94a9fa1]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon16684: 7: (()+0x8c0f8f) [0x5646f94a8f8f]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon16684: 8: (BlueStore::ExtentMap::fault_range(KeyValueDB*, unsigned int, unsigned int)+0x3bf) [0x5646f950270f]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon16684: 9: (BlueStore::_do_read(BlueStore::Collection*, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::list&, unsigned int)+0x293) [0x5646f9513f03]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon16684: 10: (BlueStore::read(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int)+0x61a) [0x5646f9516c7a]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon16684: 11: (ReplicatedBackend::be_deep_scrub(hobject_t const&, unsigned int, ScrubMap::object&, ThreadPool::TPHandle&)+0x247) [0x5646f9389c87]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon16684: 12: (PGBackend::be_scan_list(ScrubMap&, std::vector<hobject_t, std::allocator<hobject_t> > const&, bool, unsigned int, ThreadPool::TPHandle&)+0x290) [0x5646f92c5860]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon16684: 13: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, unsigned int, ThreadPool::TPHandle&)+0x215) [0x5646f9174525]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon16684: 14: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x5e6) [0x5646f9174e16]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon16684: 15: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x720) [0x5646f92311d0]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon16684: 16: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9) [0x5646f90c1229]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon16684: 17: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x5646f93338d7]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon16684: 18: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xfce) [0x5646f90ec86e]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon16684: 19: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) [0x5646f964a9a9]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon16684: 20: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5646f964c940]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon16684: 21: (()+0x7df5) [0x7f1aa9131df5]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon16684: 22: (clone()+0x6d) [0x7f1aa82251ad]
Nov 10 09:58:51 dpr-2a1713-063-crd rcs-custom-daemon16684: 2017-11-10 09:58:51.262719 7f1a8d2e9700 -1 *** Caught signal (Aborted) **

The same abort occurs in ceph-bluestore-tool during fsck:

-19> 2017-11-10 11:22:38.495387 7fc0da37bd80  4 rocksdb: [/root/rpmbuild/BUILD/ceph-12.2.1/src/rocksdb/db/version_set.cc:2859] Recovered from manifest file:db/MANIFEST-003120 succeeded,manifest_file_number is 3120, next_file_number is 3122, last_sequence is 169079700, log_number is 0,prev_log_number is 0,max_column_family is 0
-18> 2017-11-10 11:22:38.495409 7fc0da37bd80  4 rocksdb: [/root/rpmbuild/BUILD/ceph-12.2.1/src/rocksdb/db/version_set.cc:2867] Column family [default] (ID 0), log number is 3119
-17> 2017-11-10 11:22:38.495571 7fc0da37bd80  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1510302158495559, "job": 1, "event": "recovery_started", "log_files": [3121]}
-16> 2017-11-10 11:22:38.495578 7fc0da37bd80 4 rocksdb: [/root/rpmbuild/BUILD/ceph-12.2.1/src/rocksdb/db/db_impl_open.cc:482] Recovering log #3121 mode 0
-15> 2017-11-10 11:22:38.602279 7fc0da37bd80 5 rocksdb: [/root/rpmbuild/BUILD/ceph-12.2.1/src/rocksdb/db/db_impl_open.cc:815] [default] [WriteLevel0TableForRecovery] Level-0 table #3122: started
-14> 2017-11-10 11:22:38.666667 7fc0da37bd80 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1510302158666655, "cf_name": "default", "job": 1, "event": "table_file_creation", "file_number": 3122, "file_size": 3335725, "table_properties": {"data_size": 3264496, "index_size": 23355, "filter_size": 46890, "raw_key_size": 518611, "raw_average_key_size": 32, "raw_value_size": 2842391, "raw_average_value_size": 178, "num_data_blocks": 720, "num_entries": 15906, "filter_policy_name": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "20", "kMergeOperands": "5"}}
-13> 2017-11-10 11:22:38.666679 7fc0da37bd80 5 rocksdb: [/root/rpmbuild/BUILD/ceph-12.2.1/src/rocksdb/db/db_impl_open.cc:847] [default] [WriteLevel0TableForRecovery] Level-0 table #3122: 3335725 bytes OK
-12> 2017-11-10 11:22:38.666719 7fc0da37bd80 4 rocksdb: [/root/rpmbuild/BUILD/ceph-12.2.1/src/rocksdb/db/version_set.cc:2395] Creating manifest 3123
-11> 2017-11-10 11:22:38.673343 7fc0da37bd80  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1510302158673341, "job": 1, "event": "recovery_finished"}
-10> 2017-11-10 11:22:38.673398 7fc0da37bd80 5 rocksdb: [/root/rpmbuild/BUILD/ceph-12.2.1/src/rocksdb/db/db_impl_files.cc:307] [JOB 2] Delete db//MANIFEST-003120 type=3 #3120 - OK
-9> 2017-11-10 11:22:38.673404 7fc0da37bd80  5 rocksdb: [/root/rpmbuild/BUILD/ceph-12.2.1/src/rocksdb/db/db_impl_files.cc:307] [JOB 2] Delete db//003121.log type=0 #3121 - OK
-8> 2017-11-10 11:22:38.699144 7fc0da37bd80  4 rocksdb: [/root/rpmbuild/BUILD/ceph-12.2.1/src/rocksdb/db/db_impl_open.cc:1063] DB pointer 0x55b761a2c000
-7> 2017-11-10 11:22:38.699167 7fc0da37bd80 1 bluestore(/var/lib/ceph/osd/dpro63-10) _open_db opened rocksdb path db options compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152
-6> 2017-11-10 11:22:38.788505 7fc0da37bd80 1 freelist init
-5> 2017-11-10 11:22:38.847417 7fc0da37bd80 1 bluestore(/var/lib/ceph/osd/dpro63-10) _open_alloc opening allocation metadata
-4> 2017-11-10 11:23:17.983427 7fc0da37bd80 1 bluestore(/var/lib/ceph/osd/dpro63-10) _open_alloc loaded 4241 G in 225 extents
-3> 2017-11-10 11:23:18.768340 7fc0da37bd80 1 bluefs fsck
-2> 2017-11-10 11:23:18.768355 7fc0da37bd80 1 bluestore(/var/lib/ceph/osd/dpro63-10) fsck walking object keyspace
-1> 2017-11-10 11:24:20.957795 7fc0da37bd80 -1 abort: Corruption: block checksum mismatch
0> 2017-11-10 11:24:20.959164 7fc0da37bd80 -1 *** Caught signal (Aborted) **
in thread 7fc0da37bd80 thread_name:ceph-bluestore
ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
1: (()+0x3dcdd1) [0x55b7602bbdd1]
2: (()+0xf130) [0x7fc0cf400130]
3: (gsignal()+0x37) [0x7fc0ce0095d7]
4: (abort()+0x148) [0x7fc0ce00acc8]
5: (RocksDBStore::get(std::string const&, std::string const&, ceph::buffer::list*)+0x1c7) [0x55b76023b6f7]
6: (()+0x2ad601) [0x55b76018c601]
7: (()+0x2ac5ef) [0x55b76018b5ef]
8: (BlueStore::ExtentMap::fault_range(KeyValueDB*, unsigned int, unsigned int)+0x3bf) [0x55b7601e4d6f]
9: (BlueStore::fsck(bool)+0x1d2a) [0x55b760207b8a]
10: (main()+0xa8e) [0x55b760003bee]
11: (__libc_start_main()+0xf5) [0x7fc0cdff5af5]
12: (()+0x1b852f) [0x55b76009752f]
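
For readers comparing the two backtraces, the following is a minimal Python sketch (not part of the original report) that pulls the symbolized frames out of a pasted Ceph backtrace, so the shared path through BlueStore::ExtentMap::fault_range and RocksDBStore::get is easier to see. The function name `named_frames` and the `SAMPLE` excerpt are illustrative assumptions. The name extraction is deliberately naive and will truncate symbols such as `operator()`.

```python
import re

# Matches frames such as "5: (RocksDBStore::get(...)+0x1c7) [0x55b76023b6f7]".
FRAME_RE = re.compile(r"(\d+): \((.*)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]")


def named_frames(log_text):
    """Return (frame_number, function_name) pairs for frames that carry a symbol."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if m is None:
            continue
        symbol = m.group(2)
        # Unsymbolized frames appear as "()" and carry no useful name.
        if symbol == "()":
            continue
        # Keep the qualified function name and drop the argument list.
        name = symbol.split("(", 1)[0]
        frames.append((int(m.group(1)), name))
    return frames


# Excerpt copied from the ceph-bluestore-tool backtrace in this report.
SAMPLE = """\
4: (abort()+0x148) [0x7fc0ce00acc8]
5: (RocksDBStore::get(std::string const&, std::string const&, ceph::buffer::list*)+0x1c7) [0x55b76023b6f7]
6: (()+0x2ad601) [0x55b76018c601]
8: (BlueStore::ExtentMap::fault_range(KeyValueDB*, unsigned int, unsigned int)+0x3bf) [0x55b7601e4d6f]
9: (BlueStore::fsck(bool)+0x1d2a) [0x55b760207b8a]
"""

for number, name in named_frames(SAMPLE):
    print(number, name)
```

Running the same function over the OSD log should list the deep-scrub read path in place of BlueStore::fsck, with the same two innermost named frames.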

The only way to resolve this is to destroy the OSD. An option to destroy only the broken objects may be needed.


Related issues 2 (0 open, 2 closed)

Related to bluestore - Bug #22678: block checksum mismatch from rocksdb (Duplicate, 01/15/2018)

Related to bluestore - Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block) (Won't Fix)
