Support #53410


bluestore read rocksdb high latency

Added by yite gu over 2 years ago. Updated over 2 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Tags:
Reviewed:
Affected Versions:
Pull request ID:

Description

Problem detail log: https://paste.openstack.org/show/811291/
Getting a k/v from rocksdb took 48 s, so I went to check my db stats:
    ** DB Stats **
    Uptime(secs): 3.4 total, 3.4 interval
    Cumulative writes: 0 writes, 0 keys, 0 commit groups, 0.0 writes per commit group, ingest: 0.00 GB, 0.00 MB/s
    Cumulative WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 GB, 0.00 MB/s
    Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
    Interval writes: 0 writes, 0 keys, 0 commit groups, 0.0 writes per commit group, ingest: 0.00 MB, 0.00 MB/s
    Interval WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 MB, 0.00 MB/s
    Interval stall: 00:00:0.000 H:M:S, 0.0 percent
    ** Compaction Stats [default] **
    Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
    ----------------------------------------------------------------------------------------------------------------------------------------------------------
    L0 1/1 3.61 MB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
    L1 3/3 150.97 MB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
    L2 42/0 2.14 GB 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
    L3 291/0 18.38 GB 0.9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
    L4 248/0 16.47 GB 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
    Sum 585/4 37.13 GB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
    Int 0/0 0.00 KB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
    Uptime(secs): 3.4 total, 3.4 interval
    Flush(GB): cumulative 0.000, interval 0.000
    AddFile(GB): cumulative 0.000, interval 0.000
    AddFile(Total Files): cumulative 0, interval 0
    AddFile(L0 Files): cumulative 0, interval 0
    AddFile(Keys): cumulative 0, interval 0
    Cumulative compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
    Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
    Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count

This cluster has no write I/O and only a little read I/O, so rocksdb never triggers auto compaction in this state. I suspect the read I/O produces read amplification.
Ceph cluster info:
osd: 114
ceph df detail:
NAME ID QUOTA OBJECTS QUOTA BYTES USED %USED MAX AVAIL OBJECTS DIRTY READ WRITE RAW USED
docker 20 N/A 97910G 77576G 64.46 42769G 33488845 32703k 3438M 124M 227T
average 293761 objects per osd

Can we set level_compaction_dynamic_level_bytes = true for the bluestore rocksdb to trigger compaction?
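For reference, a sketch of the knobs involved. The OSD id and paths below are placeholders from my environment, and the exact commands vary by release, so verify them against your version before use:

```shell
# One-off manual compaction on a running OSD:
ceph tell osd.0 compact          # or via the admin socket: ceph daemon osd.0 compact

# Offline alternative while the OSD is stopped:
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 compact

# level_compaction_dynamic_level_bytes would go into the rocksdb options
# string in ceph.conf. Note that setting bluestore_rocksdb_options replaces
# the whole default string, so append to it rather than starting from scratch:
# [osd]
# bluestore_rocksdb_options = <existing default options>,level_compaction_dynamic_level_bytes=true
```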

Actions #1

Updated by yite gu over 2 years ago

When the OSD scrubs an object's omap, it scans the db to find the target keys; this is a huge expense:
ObjectMap::ObjectMapIterator iter = store->get_omap_iterator(
  ch,
  ghobject_t(
    poid, ghobject_t::NO_GEN, get_parent()->whoami_shard().shard));
ceph_assert(iter);
if (pos.omap_pos.length()) {
  iter->lower_bound(pos.omap_pos);
} else {
  iter->seek_to_first();
}
int max = g_conf()->osd_deep_scrub_keys;
while (iter->status() == 0 && iter->valid()) {
  pos.omap_bytes += iter->value().length();
  ++pos.omap_keys;
  --max;
  // fixme: we can do this more efficiently.
  bufferlist bl;
  encode(iter->key(), bl);
  encode(iter->value(), bl);
  pos.omap_hash << bl;

  iter->next();

  if (iter->valid() && max == 0) {
    pos.omap_pos = iter->key();
    return -EINPROGRESS;
  }
  if (iter->status() < 0) {
    dout(25) << __func__ << " " << poid
             << " on omap scan, db status error" << dendl;
    o.read_error = true;
    return 0;
  }
}

Actions #2

Updated by yite gu over 2 years ago

The source of the high latency is located in the iterator's Next():
#0 0x00007f3b3a688d03 in pread64 () from /lib64/libpthread.so.0
#1 0x00005584574d5858 in pread (__offset=2000951205888, __nbytes=8192, __buf=<optimized out>, __fd=<optimized out>) at /usr/include/bits/unistd.h:99
#2 KernelDevice::direct_read_unaligned (this=this@entry=0x558461743400, off=off@entry=2000951208773, len=len@entry=4311, buf=buf@entry=0x7f3b1c196850 "") at /usr/src/debug/ceph-12.2.8/src/os/bluestore/KernelDevice.cc:755
#3 0x00005584574d6362 in KernelDevice::read_random (this=0x558461743400, off=2000951208773, len=4311, buf=0x7f3b1c196850 "", buffered=<optimized out>) at /usr/src/debug/ceph-12.2.8/src/os/bluestore/KernelDevice.cc:789
#4 0x00005584574aa058 in BlueFS::_read_random (this=0x55846141e700, h=0x558461c1cd00, off=27028293, len=4311, out=out@entry=0x7f3b1c196850 "") at /usr/src/debug/ceph-12.2.8/src/os/bluestore/BlueFS.cc:915
#5 0x00005584574d2400 in read_random (out=0x7f3b1c196850 "", len=<optimized out>, offset=<optimized out>, h=<optimized out>, this=<optimized out>) at /usr/src/debug/ceph-12.2.8/src/os/bluestore/BlueFS.h:425
#6 BlueRocksRandomAccessFile::Read (this=<optimized out>, offset=<optimized out>, n=<optimized out>, result=0x7f3b1c1967b0, scratch=0x7f3b1c196850 "") at /usr/src/debug/ceph-12.2.8/src/os/bluestore/BlueRocksEnv.cc:94
#7 0x00005584578a62bd in rocksdb::RandomAccessFileReader::Read (this=this@entry=0x558461c5f960, offset=27028293, n=n@entry=4311, result=result@entry=0x7f3b1c1967b0, scratch=scratch@entry=0x7f3b1c196850 "") at /usr/src/debug/ceph-12.2.8/src/rocksdb/util/file_reader_writer.cc:92
#8 0x00005584578822ae in ReadBlock (options=..., buf=0x7f3b1c196850 "", contents=0x7f3b1c1967b0, handle=..., footer=..., file=0x558461c5f960) at /usr/src/debug/ceph-12.2.8/src/rocksdb/table/format.cc:266
#9 rocksdb::ReadBlockContents (file=0x558461c5f960, footer=..., read_options=..., handle=..., contents=contents@entry=0x7f3b1c197c40, ioptions=..., decompression_requested=decompression_requested@entry=true, compression_dict=..., cache_options=...) at /usr/src/debug/ceph-12.2.8/src/rocksdb/table/format.cc:372
#10 0x00005584578734c7 in rocksdb::(anonymous namespace)::ReadBlockFromFile (file=<optimized out>, footer=..., options=..., handle=..., result=result@entry=0x7f3b1c197d70, ioptions=..., do_uncompress=true, compression_dict=..., cache_options=..., global_seqno=18446744073709551615, read_amp_bytes_per_bit=0) at /usr/src/debug/ceph-12.2.8/src/rocksdb/table/block_based_table_reader.cc:83
#11 0x00005584578757a8 in rocksdb::BlockBasedTable::MaybeLoadDataBlockToCache (rep=rep@entry=0x558461bf5880, ro=..., handle=..., compression_dict=..., block_entry=block_entry@entry=0x7f3b1c197ec0, is_index=is_index@entry=false) at /usr/src/debug/ceph-12.2.8/src/rocksdb/table/block_based_table_reader.cc:1357
#12 0x0000558457875af6 in rocksdb::BlockBasedTable::NewDataBlockIterator (rep=0x558461bf5880, ro=..., handle=..., input_iter=input_iter@entry=0x0, is_index=<optimized out>, s=...) at /usr/src/debug/ceph-12.2.8/src/rocksdb/table/block_based_table_reader.cc:1271
#13 0x000055845787e458 in rocksdb::BlockBasedTable::BlockEntryIteratorState::NewSecondaryIterator (this=0x5584c1a83dc0, index_value=...) at /usr/src/debug/ceph-12.2.8/src/rocksdb/table/block_based_table_reader.cc:1399
#14 0x00005584578a11de in rocksdb::(anonymous namespace)::TwoLevelIterator::InitDataBlock (this=0x5584bfe3b5c0) at /usr/src/debug/ceph-12.2.8/src/rocksdb/table/two_level_iterator.cc:244
#15 0x00005584578a14ac in rocksdb::(anonymous namespace)::TwoLevelIterator::SkipEmptyDataBlocksForward (this=0x5584bfe3b5c0) at /usr/src/debug/ceph-12.2.8/src/rocksdb/table/two_level_iterator.cc:192
#16 0x00005584578a1546 in Next (this=0x55847c6541d0) at /usr/src/debug/ceph-12.2.8/src/rocksdb/table/iterator_wrapper.h:61
#17 rocksdb::(anonymous namespace)::TwoLevelIterator::Next (this=0x55847c654180) at /usr/src/debug/ceph-12.2.8/src/rocksdb/table/two_level_iterator.cc:171
#18 0x0000558457888575 in Next (this=0x5584c0dce3a0) at /usr/src/debug/ceph-12.2.8/src/rocksdb/table/iterator_wrapper.h:61
#19 rocksdb::MergingIterator::Next (this=0x55849d63dd48) at /usr/src/debug/ceph-12.2.8/src/rocksdb/table/merging_iterator.cc:182
#20 0x000055845790b4d3 in rocksdb::DBIter::FindNextUserEntryInternal (this=this@entry=0x55849d63db38, skipping=skipping@entry=true, prefix_check=prefix_check@entry=false) at /usr/src/debug/ceph-12.2.8/src/rocksdb/db/db_iter.cc:510
#21 0x000055845790c71f in FindNextUserEntry (prefix_check=false, skipping=true, this=0x55849d63db38) at /usr/src/debug/ceph-12.2.8/src/rocksdb/db/db_iter.cc:358
#22 rocksdb::DBIter::Next (this=0x55849d63db38) at /usr/src/debug/ceph-12.2.8/src/rocksdb/db/db_iter.cc:335
#23 0x000055845744cfea in RocksDBStore::RocksDBWholeSpaceIteratorImpl::next (this=0x5584c1ce2cb8) at /usr/src/debug/ceph-12.2.8/src/kv/RocksDBStore.cc:971

Actions #4

Updated by Igor Fedotov over 2 years ago

  • Status changed from New to Closed