Bug #34526

OSD crash in KernelDevice::direct_read_unaligned while scrubbing

Added by Michael Yang over 5 years ago. Updated about 5 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Luminous 12.2.7
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
v12.2.7
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Ceph version: Luminous 12.2.7

After I migrated data from one pool to another (by changing the pool's CRUSH rule), I found many OSDs going down while they were doing PG scrubs.
The stack trace is below:

2018-08-20 14:29:58.310716 7f473f0a0700 -1 *** Caught signal (Aborted) **
in thread 7f473f0a0700 thread_name:tp_osd_tp

ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)
1: (()+0xa84aa4) [0x5603e7af5aa4]
2: (()+0x11390) [0x7f475f23b390]
3: (pread64()+0x33) [0x7f475f23ad43]
4: (KernelDevice::direct_read_unaligned(unsigned long, unsigned long, char*)+0x81) [0x5603e7ad0241]
5: (KernelDevice::read_random(unsigned long, unsigned long, char*, bool)+0x563) [0x5603e7ad0d23]
6: (BlueFS::_read_random(BlueFS::FileReader*, unsigned long, unsigned long, char*)+0x4f2) [0x5603e7aa2312]
7: (BlueRocksRandomAccessFile::Read(unsigned long, unsigned long, rocksdb::Slice*, char*) const+0x20) [0x5603e7acc1f0]
8: (rocksdb::RandomAccessFileReader::Read(unsigned long, unsigned long, rocksdb::Slice*, char*) const+0xf8f) [0x5603e7f0bf1f]
9: (rocksdb::ReadBlockContents(rocksdb::RandomAccessFileReader*, rocksdb::Footer const&, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::BlockContents*, rocksdb::ImmutableCFOptions const&, bool, rocksdb::Slice const&, rocksdb::PersistentCacheOptions const&)+0x5f3) [0x5603e7edc8f3]
10: (()+0xe5b836) [0x5603e7ecc836]
11: (rocksdb::BlockBasedTable::MaybeLoadDataBlockToCache(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::Slice, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, bool)+0x2f8) [0x5603e7ece998]
12: (rocksdb::BlockBasedTable::NewDataBlockIterator(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::BlockIter*, bool, rocksdb::Status)+0x2ac) [0x5603e7eced9c]
13: (rocksdb::BlockBasedTable::BlockEntryIteratorState::NewSecondaryIterator(rocksdb::Slice const&)+0x97) [0x5603e7ed7517]
14: (()+0xe91bbe) [0x5603e7f02bbe]
15: (()+0xe91c86) [0x5603e7f02c86]
16: (()+0xe91e01) [0x5603e7f02e01]
17: (rocksdb::MergingIterator::Next()+0x449) [0x5603e7ee6079]
18: (rocksdb::DBIter::FindNextUserEntryInternal(bool, bool)+0x182) [0x5603e7f832e2]
19: (rocksdb::DBIter::Next()+0x1eb) [0x5603e7f8409b]
20: (RocksDBStore::RocksDBWholeSpaceIteratorImpl::next()+0x9a) [0x5603e7a322ca]
21: (BlueStore::_collection_list(BlueStore::Collection*, ghobject_t const&, ghobject_t const&, int, std::vector<ghobject_t, std::allocator<ghobject_t> >, ghobject_t)+0x133e) [0x5603e798defe]
22: (BlueStore::collection_list(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, ghobject_t const&, int, std::vector<ghobject_t, std::allocator<ghobject_t> >, ghobject_t)+0x25a) [0x5603e798f1ca]
23: (PGBackend::objects_list_range(hobject_t const&, hobject_t const&, snapid_t, std::vector<hobject_t, std::allocator<hobject_t> >, std::vector<ghobject_t, std::allocator<ghobject_t> >)+0x192) [0x5603e774b6c2]
24: (PG::build_scrub_map_chunk(ScrubMap&, ScrubMapBuilder&, hobject_t, hobject_t, bool, ThreadPool::TPHandle&)+0x744) [0x5603e75e6794]
25: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x484) [0x5603e7612c54]
26: (PG::scrub(unsigned int, ThreadPool::TPHandle&)+0x2cb) [0x5603e76152db]
27: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x106d) [0x5603e7556add]
28: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x884) [0x5603e7b3d7e4]
29: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5603e7b40820]
30: (()+0x76ba) [0x7f475f2316ba]
31: (clone()+0x6d) [0x7f475e2a841d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

ceph-osd.17.log.gz (969 KB) Michael Yang, 09/13/2018 10:41 AM


Related issues

Duplicates bluestore - Bug #36482: High amount of Read I/O on BlueFS/DB when listing omap keys Resolved 10/17/2018

History

#1 Updated by Michael Yang over 5 years ago

My mistake: the Affected Versions field should be v12.2.7.

#2 Updated by Nathan Cutler over 5 years ago

  • Affected Versions v12.2.7 added
  • Affected Versions deleted (v10.2.7)

#3 Updated by Michael Yang over 5 years ago

I found that this dump is related to a scrub thread timeout, not to BlueStore.
There are logs like the following in the OSD log:
2018-08-20 14:27:25.897029 7f473f0a0700 0 log_channel(cluster) log [DBG] : 1.37f scrub starts
2018-08-20 14:27:42.365984 7f475c0c4700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f473f0a0700' had timed out after 15
...
2018-08-20 14:29:58.300102 7f475a123700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f473f0a0700' had timed out after 15
2018-08-20 14:29:58.300113 7f475a123700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f473f0a0700' had suicide timed out after 150
2018-08-20 14:29:58.310716 7f473f0a0700 -1 *** Caught signal (Aborted) **
in thread 7f473f0a0700 thread_name:tp_osd_tp

And the related configurations are:
"osd_op_thread_timeout": "15”,
"osd_op_thread_suicide_timeout": "150”,

#4 Updated by Michael Yang over 5 years ago

After I first set osd nodown and then triggered the OSD scrubs, everything worked fine.
I have no idea why this happens...

Below is the log of the OSD that usually crashes while scrubbing. The scrub of PG 1.2ff, which usually triggered the crash, finished normally this time, but took about 112s...

root@ceph6:/var/log/ceph# grep -E "scrub starts|ok" ceph-osd.9.log
...
2018-09-11 13:58:38.752384 7fb4a4357700 0 log_channel(cluster) log [DBG] : 1.2ff scrub starts <<< before osd nodown was set, this scrub did not finish within 150s and triggered the OSD suicide
2018-09-11 14:29:24.974260 7fb4a835f700 0 log_channel(cluster) log [DBG] : 1.2ff scrub starts <<< after setting osd nodown, this PG was scrubbed again and finished normally
2018-09-11 14:31:16.194420 7fb4a4357700 0 log_channel(cluster) log [DBG] : 1.2ff scrub ok
2018-09-11 14:34:26.009298 7fb4a5b5a700 0 log_channel(cluster) log [DBG] : 1.1b4 scrub starts
2018-09-11 14:34:48.316811 7fb4a5b5a700 0 log_channel(cluster) log [DBG] : 1.1b4 scrub ok
2018-09-11 14:34:52.004525 7fb4a9361700 0 log_channel(cluster) log [DBG] : 1.3f5 scrub starts
2018-09-11 14:35:16.297578 7fb4a9361700 0 log_channel(cluster) log [DBG] : 1.3f5 scrub ok
...

#5 Updated by John Spray over 5 years ago

  • Project changed from Ceph to RADOS
  • Subject changed from Ceph OSD Crush when do scrub to OSD crash in KernelDevice::direct_read_unaligned while scrubbing
  • Category deleted (OSD)

#6 Updated by Sage Weil over 5 years ago

  • Status changed from New to Need More Info

Is there anything in the kernel log? These sorts of timeouts usually are caused by a media error or other hardware issue.

#7 Updated by Michael Yang over 5 years ago

Sage Weil wrote:

Is there anything in the kernel log? These sorts of timeouts usually are caused by a media error or other hardware issue.

I didn't see any related error in dmesg.
There are both SATA and SSD OSDs on each host of my cluster, and I have only seen this issue on SATA OSDs, so I don't think it is a media error or a hardware issue.

These days the issue still happens when PGs are scrubbing. My guess is that it is caused by the SATA OSDs handling PG scrubs too slowly...

I found the logs below in the OSD log:
2018-09-13 09:12:28.399471 7f19198ad700 0 log_channel(cluster) log [DBG] : 1.33f scrub starts
2018-09-13 09:13:50.177533 7f195a919700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f19158a5700' had timed out after 60
2018-09-13 09:13:50.177595 7f1959917700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f19158a5700' had timed out after 60
2018-09-13 09:13:51.073800 7f1959917700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f19158a5700' had timed out after 60
...
2018-09-13 09:15:34.491109 7f1958978700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f19158a5700' had timed out after 60
2018-09-13 09:15:34.491151 7f1958978700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f19198ad700' had timed out after 60
2018-09-13 09:15:38.339577 7f19158a5700 1 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f19158a5700' had timed out after 60
2018-09-13 09:15:38.339673 7f19198ad700 1 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f19198ad700' had timed out after 60
2018-09-13 09:15:38.340909 7f195a919700 0 -- 192.168.213.25:6828/5063932 >> 192.168.213.28:6813/7669705 conn(0x55b8c3b60800 :6828 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=254 cs=1 l=0).handle_connect_reply connect got RESETSESSION
2018-09-13 09:15:38.387928 7f19278c9700 0 log_channel(cluster) log [WRN] : Monitor daemon marked osd.4 down, but it is still running

I don't know whether it is related to my OSD configuration, shown below:
-----------
[osd]
bluefs_buffered_io = true
bluestore_cache_size_ssd = 2G
bluestore_csum_type = none
osd_op_num_threads_per_shard = 2
osd_scrub_sleep = 2

bluestore_rocksdb_options = compression=kNoCompression,max_write_buffer_number=64,min_write_buffer_number_to_merge=32,recycle_log_file_num=64,compaction_style=kCompactionStyleLevel,write_buffer_size=4MB,target_file_size_base=4MB,max_background_compactions=64,level0_file_num_compaction_trigger=64,level0_slowdown_writes_trigger=128,level0_stop_writes_trigger=256,max_bytes_for_level_base=6GB,compaction_threads=32,flusher_threads=8,compaction_readahead_size=2MB

#8 Updated by Michael Yang over 5 years ago

I found it is easy to reproduce this issue in my cluster with the steps below:
1. Choose one PG that has caused this issue before
2. Trigger a PG scrub with the command: ceph pg scrub <pgid>
3. Then check the OSD log; most of the time I will find the crash info there

So I set debug_osd & debug_bluestore to level 10 on one OSD and collected its logs; you can get them from the attachment.
It seems the PG scrub only does part of its work and is then requeued, and after that it never gets scheduled again...
So after 150s the OSD hits the suicide timeout and crashes.

After the last time the PG did any scrub work, I saw the OSD was still busy handling normal ops for other PGs.
I am not sure whether the problem (the PG scrub job never being scheduled again) is related to osd_scrub_priority being very low...
On this host I have 6 SATA OSDs and 8 SSD OSDs, 48 CPU cores, and 64G of memory, and I only see this issue on the SATA OSDs.

Let me know if any other info is needed.

#9 Updated by Michael Yang over 5 years ago

I think I have found the reason for this issue:
1). A PG scrub starts
2). The scrub job does not finish within 15s (osd_op_thread_timeout)
3). The OSD heartbeat check marks the thread unhealthy
4). The OSD then drops ping requests from other OSDs - HeartbeatMap::is_healthy() is called and returns false, as below:
void OSD::handle_osd_ping(MOSDPing *m) {
  ...
  switch (m->op) {
  case MOSDPing::PING: {
    ...
    if (!cct->get_heartbeat_map()->is_healthy()) {
      dout(10) << "internal heartbeat not healthy, dropping ping request" << dendl;
      break;
    }

    Message *r = new MOSDPing(monc->get_fsid(),
                              curmap->get_epoch(),
                              MOSDPing::PING_REPLY, m->stamp,
                              cct->_conf->osd_heartbeat_min_size);
    m->get_connection()->send_message(r);
    ...
  }
  ...
}
5). Because the OSD drops the other OSDs' ping requests, after a while they report to the monitor to mark this OSD down
6). The monitor marks the scrubbing OSD down and publishes a new osdmap
7). The scrubbing OSD receives the new osdmap and does some work - I am not sure about this process... does it affect the scrub job?
8). If the scrub job still has not finished within 150s (osd_op_thread_suicide_timeout), the OSD commits suicide... (see the sketch after this list)
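
To make the interaction of the two timeouts concrete, here is a minimal, self-contained sketch. This is not Ceph code: the names, the shortened timeouts, and the artificially "stuck" worker are all invented for illustration. A worker thread stops touching its heartbeat timestamp; a checker thread first treats it as unhealthy (the analogue of osd_op_thread_timeout, during which the OSD also drops peer pings) and finally aborts the process (the analogue of osd_op_thread_suicide_timeout):

// Conceptual model only -- not the Ceph implementation.
#include <atomic>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <thread>

using Clock = std::chrono::steady_clock;

int main() {
  // Shortened for the sketch; the values in this report are 15s and 150s.
  const auto grace   = std::chrono::seconds(1);
  const auto suicide = std::chrono::seconds(5);

  std::atomic<Clock::time_point> last_touch{Clock::now()};

  // Worker: resets its "heartbeat" while making progress, then blocks in one
  // long call and never resets it again (like the stuck collection listing).
  std::thread worker([&] {
    for (int i = 0; i < 3; ++i) {
      last_touch = Clock::now();
      std::this_thread::sleep_for(std::chrono::milliseconds(200));
    }
    std::this_thread::sleep_for(std::chrono::minutes(10));
  });

  // Checker: the analogue of the heartbeat_map is_healthy() check.
  while (true) {
    std::this_thread::sleep_for(std::chrono::milliseconds(500));
    const auto idle = Clock::now() - last_touch.load();
    if (idle > suicide) {
      std::cerr << "suicide timed out -> abort (the Aborted signal in the stacks above)\n";
      std::abort();
    } else if (idle > grace) {
      // While unhealthy, the real OSD also stops answering peer pings,
      // so its peers eventually report it down to the monitor.
      std::cerr << "thread unhealthy: dropping ping requests\n";
    }
  }
  worker.join();  // unreachable; the abort above ends the process
}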

Maybe I could increase osd_op_thread_suicide_timeout to avoid this issue.

That's my analysis based on the log attached earlier; if there is any mistake, let me know.

#10 Updated by Michael Yang over 5 years ago

After digging in further, I think it is still a BlueStore issue.
The thread that hit the suicide timeout was stuck in a call to BlueStore::collection_list().

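As an illustration of what such a stuck listing looks like at the RocksDB level, here is a minimal, hypothetical range scan written against the plain RocksDB API. This is not BlueStore's _collection_list(); the path and the key prefix are invented. The point is that the whole wait happens inside the iterator's Seek()/Next() calls, which may have to read many SST blocks through BlueFS from the slow device, and nothing in such a loop resets the op thread's heartbeat, so a long enough scan runs straight into the suicide timeout:

// Hypothetical standalone example, not BlueStore code: enumerate all keys in
// a prefix range with a raw RocksDB iterator, the same access pattern that
// the _collection_list debug log below shows.
#include <cassert>
#include <iostream>
#include <string>

#include <rocksdb/db.h>
#include <rocksdb/options.h>

int main() {
  rocksdb::DB* db = nullptr;
  rocksdb::Options options;
  options.create_if_missing = true;
  // Invented path; in BlueStore the DB actually sits on BlueFS.
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/collection_list_demo", &db);
  assert(s.ok());

  // Stand-in for the binary onode-key prefix range seen in the log (0x7f80...).
  const std::string prefix = "O";

  rocksdb::Iterator* it = db->NewIterator(rocksdb::ReadOptions());
  int count = 0;
  for (it->Seek(prefix); it->Valid() && it->key().starts_with(prefix); it->Next()) {
    // Each Seek()/Next() may fault data and index blocks in from disk; on a
    // busy SATA OSD a single call can stall for a long time.
    ++count;
  }
  assert(it->status().ok());
  std::cout << "listed " << count << " keys\n";

  delete it;
  delete db;
  return 0;
}
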
Below is the related debug log:
---------
2018-09-17 16:59:44.126 7f59f3c80700 20 osd.10 pg_epoch: 35564 pg[1.67f( v 35564'499091 (35479'496085,35564'499091] local-lis/les=35541/35542 n=162 ec=33640/76 lis/c 35541/35541 les/c/f 35542/35542/0 35541/35541/35541) [12,10] r=1 lpr=35541 luod=0'0 lua=35516'499049 crt=35564'499091 lcod 35564'499090 active mbc={}] scrub state BUILD_MAP_REPLICA [1:fe7df207:::rbd_data.51c2e6b8b4567.000000000000b7ff:0,MAX) max_end MAX
2018-09-17 16:59:44.126 7f59f3c80700 10 osd.10 pg_epoch: 35564 pg[1.67f( v 35564'499091 (35479'496085,35564'499091] local-lis/les=35541/35542 n=162 ec=33640/76 lis/c 35541/35541 les/c/f 35542/35542/0 35541/35541/35541) [12,10] r=1 lpr=35541 luod=0'0 lua=35516'499049 crt=35564'499091 lcod 35564'499090 active mbc={}] build_scrub_map_chunk [1:fe7df207:::rbd_data.51c2e6b8b4567.000000000000b7ff:0,MAX) pos (0/0)
2018-09-17 16:59:44.126 7f59f3c80700 15 bluestore(/var/lib/ceph/osd/ceph-10) collection_list 1.67f_head start #1:fe7df207:::rbd_data.51c2e6b8b4567.000000000000b7ff:0# end #MAX# max 2147483647
2018-09-17 16:59:44.126 7f59f3c80700 20 bluestore(/var/lib/ceph/osd/ceph-10) _collection_list range 0x7f7ffffffffffffffdfe600000 to 0x7f7ffffffffffffffdfe800000 and 0x7f8000000000000001fe600000 to 0x7f8000000000000001fe800000 start #1:fe7df207:::rbd_data.51c2e6b8b4567.000000000000b7ff:0#
2018-09-17 16:59:44.126 7f59f3c80700 20 bluestore(/var/lib/ceph/osd/ceph-10) _collection_list start from 0x7f8000000000000001fe7df207217262'd_data.51c2e6b8b4567.000000000000b7ff!='0x0000000000000000ffffffffffffffff'o' temp=0
2018-09-17 16:59:44.126 7f59f3c80700 20 bluestore(/var/lib/ceph/osd/ceph-10) _collection_list pend 0x7f8000000000000001fe800000
2018-09-17 16:59:44.126 7f59f3c80700 30 bluestore(/var/lib/ceph/osd/ceph-10) _collection_list key 0x7f8000000000000001fe7df207217262'd_data.51c2e6b8b4567.000000000000b7ff!='0xfffffffffffffffeffffffffffffffff'o'
2018-09-17 16:59:44.126 7f59f3c80700 20 bluestore(/var/lib/ceph/osd/ceph-10) _collection_list oid #1:fe7df207:::rbd_data.51c2e6b8b4567.000000000000b7ff:head# end #MAX#
2018-09-17 16:59:44.126 7f59f3c80700 30 bluestore(/var/lib/ceph/osd/ceph-10) _collection_list key 0x7f8000000000000001fe7e0000217262'd_data.51c2e6b8b4567.00000000000087bc!='0xfffffffffffffffeffffffffffffffff'o'
2018-09-17 16:59:44.126 7f59f3c80700 20 bluestore(/var/lib/ceph/osd/ceph-10) _collection_list oid #1:fe7e0000:::rbd_data.51c2e6b8b4567.00000000000087bc:head# end #MAX#
2018-09-17 16:59:44.126 7f59f3c80700 30 bluestore(/var/lib/ceph/osd/ceph-10) _collection_list key 0x7f8000000000000001fe7e0c'"!rbd_data.51c2e6b8b4567.00000000000009d3!='0xfffffffffffffffeffffffffffffffff'o'
2018-09-17 16:59:44.126 7f59f3c80700 20 bluestore(/var/lib/ceph/osd/ceph-10) _collection_list oid #1:fe7e0c22:::rbd_data.51c2e6b8b4567.00000000000009d3:head# end #MAX#
2018-09-17 16:59:44.126 7f59f3c80700 30 bluestore(/var/lib/ceph/osd/ceph-10) _collection_list key 0x7f8000000000000001fe7e82be217262'd_data.ae4eb6b8b4567.0000000000002c54!='0xfffffffffffffffeffffffffffffffff'o'
2018-09-17 16:59:44.126 7f59f3c80700 20 bluestore(/var/lib/ceph/osd/ceph-10) _collection_list oid #1:fe7ec476:::rbd_data.51c2e6b8b4567.000000000001126b:head# end #MAX#
2018-09-17 16:59:44.126 7f59f3c80700 30 bluestore(/var/lib/ceph/osd/ceph-10) _collection_list key 0x7f8000000000000001fe7f108a217262'd_data.47fe26b8b4567.0000000000007d2e!='0xfffffffffffffffeffffffffffffffff'o'
2018-09-17 16:59:44.126 7f59f3c80700 20 bluestore(/var/lib/ceph/osd/ceph-10) _collection_list oid #1:fe7f108a:::rbd_data.47fe26b8b4567.0000000000007d2e:head# end #MAX#
2018-09-17 16:59:44.126 7f59f3c80700 30 bluestore(/var/lib/ceph/osd/ceph-10) _collection_list key 0x7f8000000000000001fe7f57'T!rbd_data.51c2e6b8b4567.00000000000004bc!='0xfffffffffffffffeffffffffffffffff'o'
2018-09-17 16:59:44.126 7f59f3c80700 20 bluestore(/var/lib/ceph/osd/ceph-10) _collection_list oid #1:fe7f5754:::rbd_data.51c2e6b8b4567.00000000000004bc:head# end #MAX#
2018-09-17 16:59:44.126 7f59f3c80700 30 bluestore(/var/lib/ceph/osd/ceph-10) _collection_list key 0x7f8000000000000001fe7f92c2217262'd_data.b552c6b8b4567.000000000000403f!='0xfffffffffffffffeffffffffffffffff'o'
2018-09-17 16:59:44.126 7f59f3c80700 20 bluestore(/var/lib/ceph/osd/ceph-10) _collection_list oid #1:fe7f92c2:::rbd_data.b552c6b8b4567.000000000000403f:head# end #MAX#
2018-09-17 16:59:44.126 7f59f3c80700 30 bluestore(/var/lib/ceph/osd/ceph-10) _collection_list key 0x7f8000000000000001fe7fb411217262'd_data.ae4eb6b8b4567.0000000000003a81!='0xfffffffffffffffeffffffffffffffff'o'
2018-09-17 16:59:44.126 7f59f3c80700 20 bluestore(/var/lib/ceph/osd/ceph-10) _collection_list oid #1:fe7fb411:::rbd_data.ae4eb6b8b4567.0000000000003a81:head# end #MAX#

After that, I found no more log lines containing "collection_list" until the crash stack.
-------
2018-09-17 17:02:16.220 7f5a39f0f700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f59f3c80700' had timed out after 15
2018-09-17 17:02:16.220 7f5a39f0f700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f59f3c80700' had suicide timed out after 150
2018-09-17 17:02:16.224 7f59f3c80700 -1 *** Caught signal (Aborted) **
in thread 7f59f3c80700 thread_name:tp_osd_tp

ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)
1: (()+0x9169e0) [0x5649d7ca89e0]
2: (()+0x11390) [0x7f5a3f4fc390]
3: (rocksdb::DBIter::FindNextUserEntryInternal(bool, bool)+0x5b8) [0x5649d7d35288]
4: (rocksdb::DBIter::Next()+0x1ca) [0x5649d7d3768a]
5: (RocksDBStore::RocksDBWholeSpaceIteratorImpl::next()+0x2d) [0x5649d7bf5dbd]
6: (BlueStore::_collection_list(BlueStore::Collection*, ghobject_t const&, ghobject_t const&, int, std::vector<ghobject_t, std::allocator<ghobject_t> >, ghobject_t)+0xcbd) [0x5649d7b58a5d]
7: (BlueStore::collection_list(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, ghobject_t const&, int, std::vector<ghobject_t, std::allocator<ghobject_t> >, ghobject_t)+0x9f) [0x5649d7b59e2f]
8: (PGBackend::objects_list_range(hobject_t const&, hobject_t const&, std::vector<hobject_t, std::allocator<hobject_t> >, std::vector<ghobject_t, std::allocator<ghobject_t> >)+0x15b) [0x5649d794072b]
9: (PG::build_scrub_map_chunk(ScrubMap&, ScrubMapBuilder&, hobject_t, hobject_t, bool, ThreadPool::TPHandle&)+0x278) [0x5649d77f3d58]
10: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x168d) [0x5649d781b71d]
11: (PG::scrub(unsigned int, ThreadPool::TPHandle&)+0xae) [0x5649d781c42e]
12: (PGScrub::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x1a) [0x5649d79c905a]
13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x590) [0x5649d776f000]
14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x46e) [0x7f5a40e6c41e]
15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f5a40e6e4a0]
16: (()+0x76ba) [0x7f5a3f4f26ba]
17: (clone()+0x6d) [0x7f5a3eb0141d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

#11 Updated by Igor Fedotov about 5 years ago

  • Project changed from RADOS to bluestore

IMO this is BlueStore (or more precisely BlueFS and/or RocksDB) related.

And I think it's a duplicate of #36482.

The preliminary explanation is that, under some circumstances (an improper DB layout on disk?), RocksDB takes too long to enumerate the omap records for a specific object.

This triggers the watchdog timeout, which kills the OSD.

#12 Updated by Sage Weil about 5 years ago

  • Status changed from Need More Info to Duplicate

#13 Updated by Sage Weil about 5 years ago

  • Duplicates Bug #36482: High amount of Read I/O on BlueFS/DB when listing omap keys added
