Fix #5232: osd: slow peering due to pg log rewrites - Ceph - Ceph

Custom queries

Backports: mimic
Backports: missing release
Backports: nautilus
Bluestore
Bug queue
Bug queue - no subprojects
Bug triage
Ceph backlog
Crash queue
Crash triage
Feature Requests
Feedback
My issues
Need Review
Pending backports
Priority queue
Product Backlog Scrub
Project Triage
Test Failures

Actions

Copy link

Fix #5232

closed

osd: slow peering due to pg log rewrites

Added by Stefan Priebe almost 11 years ago. Updated almost 11 years ago.

Status:

Resolved

Priority:

Urgent

Assignee:

Samuel Just

Category:

OSD

Target version:

v0.66

% Done:

Source:

other

Tags:

Backport:

Reviewed:

Affected Versions:

ceph-qa-suite:

Pull request ID:

Crash signature (v1):

Crash signature (v2):

Description

I noticed that since cuttlefish the osd recovery process is extremely slow. Also client I/o gets stalled to the recovering osd ending up in slow request log messages.

Things I noticed:
- recovering osd uses 100-200% CPU until recovered
- nearly no disk I/o from that osd
- under bobtail I had much lower CPU usage and high disk I/o (which makes sense)
- recovering from a simple osd restart (getting changed objects since stop) is awful slow even my cluster was only degraded by 0.1% (24tb/3repl) took 10-20min
- I'm using xfs

This is a major blocker as recovering is nearly impossible.

Used version current upstream/cuttlefish

I've provided sage a gdb threads bt all.

Greets
Stefan

Files

Download all files

gdb.txt.gz (31.9 KB) gdb.txt.gz		Sage Weil, 06/02/2013 09:38 AM
ceph-osd.23.tar.gz (1.83 MB) ceph-osd.23.tar.gz		Stefan Priebe, 06/02/2013 11:32 AM

Related issues 1 (0 open — 1 closed)

Related to Ceph - Bug #5238: osd: slow recovery (uselessly dirtying pg logs during peering)

Resolved

Samuel Just

06/03/2013

Actions

Issue # Delay: days Cancel

History
Notes
Property changes
Associated revisions

Actions

Copy link

Updated by Sage Weil almost 11 years ago

File gdb.txt.gz gdb.txt.gz added
Category set to OSD

this thread?

No symbol table info available.
#1  0x00007fb90be5d1f5 in ?? () from /usr/lib/libleveldb.so.1
No symbol table info available.
#2  0x00007fb90be4781a in ?? () from /usr/lib/libleveldb.so.1
No symbol table info available.
#3  0x00007fb90be47c3d in ?? () from /usr/lib/libleveldb.so.1
No symbol table info available.
#4  0x000000000079a75a in LevelDBStore::LevelDBWholeSpaceIteratorImpl::seek_to_first(std::string const&) ()
No symbol table info available.
#5  0x0000000000798b1f in LevelDBStore::LevelDBTransactionImpl::rmkeys_by_prefix (this=0x2b4f02c0, prefix=...)
    at os/LevelDBStore.cc:141
        it = {<std::tr1::__shared_ptr<KeyValueDB::IteratorImpl, (__gnu_cxx::_Lock_policy)2>> = {_M_ptr = 0x36889740, 
            _M_refcount = {_M_pi = 0x31ab5b20}}, <No data fields>}
#6  0x000000000078a9f6 in DBObjectMap::clear_header (this=0x328ab00, header=
DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjuction with DW_OP_piece.
) at os/DBObjectMap.cc:1158
        keys = {_M_t = {
            _M_impl = {<std::allocator<std::_Rb_tree_node<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >> = {<__gnu_cxx::new_allocator<std::_Rb_tree_node<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >> = {<No data fields
>}, <No data fields>}, 
              _M_key_compare = {<std::binary_function<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool>> = {<No data fields>}, <No data fields>}, _M_header = {
                _M_color = 7894675, _M_parent = 0x7fb904ac73b8, _M_left = 0x7fb90be11fe0, _M_right = 0x0}, 
              _M_node_count = 13107200}}}
#7  0x000000000078af8a in DBObjectMap::_clear (this=0x328ab00, header=
DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjuction with DW_OP_piece.
) at os/DBObjectMap.cc:597
        parent = {<std::tr1::__shared_ptr<DBObjectMap::_Header, (__gnu_cxx::_Lock_policy)2>> = {_M_ptr = 0x3cb0780, _M_refcount = {
              _M_pi = 0x2e08da00}}, <No data fields>}
        __PRETTY_FUNCTION__ = "int DBObjectMap::_clear(std::tr1::shared_ptr<DBObjectMap::_Header>, std::tr1::shared_ptr<KeyValueDB::TransactionImpl>)" 
#8  0x000000000078b63c in DBObjectMap::clear (this=0x328ab00, hoid=<value optimized out>, spos=<value optimized out>)
    at os/DBObjectMap.cc:583
        t = {<std::tr1::__shared_ptr<KeyValueDB::TransactionImpl, (__gnu_cxx::_Lock_policy)2>> = {_M_ptr = 0x2b4f02c0, 
            _M_refcount = {_M_pi = 0x369a6e80}}, <No data fields>}
        header = {<std::tr1::__shared_ptr<DBObjectMap::_Header, (__gnu_cxx::_Lock_policy)2>> = {_M_ptr = 0x3cb0780, _M_refcount = {
              _M_pi = 0x2e08da00}}, <No data fields>}
        __PRETTY_FUNCTION__ = "virtual int DBObjectMap::clear(const hobject_t&, const SequencerPosition*)" 
        r = <value optimized out>
#9  0x000000000073a0a6 in FileStore::lfn_unlink (this=0x31be000, cid=
DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjuction with DW_OP_piece.
) at os/FileStore.cc:350
        path = {<std::tr1::__shared_ptr<CollectionIndex::Path, (__gnu_cxx::_Lock_policy)2>> = {_M_ptr = 0x32e05920, _M_refcount = {
              _M_pi = 0x31ab5ac0}}, <No data fields>}
        exist = 1
        st = {st_dev = 64770, st_ino = 1610612898, st_nlink = 1, st_mode = 33188, st_uid = 0, st_gid = 0, __pad0 = 0, st_rdev = 0, 
          st_size = 4194304, st_blksize = 4096, st_blocks = 8200, st_atim = {tv_sec = 1369771413, tv_nsec = 196604128}, st_mtim = {
            tv_sec = 1370120522, tv_nsec = 795389833}, st_ctim = {tv_sec = 1370120522, tv_nsec = 795389833}, __unused = {0, 0, 0}}
        index = {<std::tr1::__shared_ptr<CollectionIndex, (__gnu_cxx::_Lock_policy)2>> = {_M_ptr = 0x3b873c0, _M_refcount = {
              _M_pi = 0x29fa4120}}, <No data fields>}
        r = 0
        __PRETTY_FUNCTION__ = "int FileStore::lfn_unlink(coll_t, const hobject_t&, const SequencerPosition&)" 
        __func__ = "lfn_unlink" 
#10 0x000000000073a334 in FileStore::_remove (this=0x31be000, cid=
DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjuction with DW_OP_piece.
) at os/FileStore.cc:2882
        r = <value optimized out>
#11 0x0000000000765ef1 in FileStore::_do_transaction (this=<value optimized out>, t=<value optimized out>, 
    op_seq=<value optimized out>, trans_num=<value optimized out>) at os/FileStore.cc:2408
        cid = {static META_COLL = {static META_COLL = <same as static member of an already seen type>, str = {
              static npos = 18446744073709551615, 
              _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, 
                _M_p = 0x3197258 "meta"}}}, str = {static npos = 18446744073709551615, 
            _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, 
              _M_p = 0x2fa64a38 "4.de0_head"}}}
        oid = {oid = {name = {static npos = 18446744073709551615, 
              _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, 
                _M_p = 0x2a116c48 "rbd_data.57f6766b8b4567.", '0' <repeats 14 times>, "17"}}}, snap = {
            val = 18446744073709551614}, hash = 252288480, max = false, pool = 4, nspace = {static npos = 18446744073709551615, 
            _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, 
              _M_p = 0xc80398 ""}}, key = {static npos = 18446744073709551615, 
            _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, 
              _M_p = 0xc80398 ""}}}
        op = <value optimized out>
        r = <value optimized out>
        i = {p = {bl = 0x35163620, ls = 0x35163620, off = 98, p = {_M_node = 0x31be3920}, p_off = 98}, sobject_encoding = false, 
          pool_override = -1, use_pool_override = false, replica = false, _tolerate_collection_add_enoent = false}
        spos = {seq = 174858271, trans = 0, op = 0}
        __PRETTY_FUNCTION__ = "unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int)" 
#12 0x0000000000769af1 in FileStore::_do_transactions (this=0x31be000, tls=..., op_seq=174858271, handle=0x7fb904ac8ce0)
    at os/FileStore.cc:2151
        r = 201801232
        trans_num = 0
#13 0x0000000000769d74 in FileStore::_do_op (this=0x31be000, osr=0x2a2ff900, handle=...) at os/FileStore.cc:1985
        o = 0x2ae2f7d0
        r = <value optimized out>
#14 0x0000000000847f91 in ThreadPool::worker (this=0x31bea08, wt=<value optimized out>) at common/WorkQueue.cc:119
        tp_handle = {cct = 0x31a2000, hb = 0x42ff450, grace = 60, suicide_grace = 180}
        item = 0x2a2ff900
        wq = 0x31beb90
        ss = <incomplete type>
        hb = 0x42ff450

it's removing an rbd data object, but stuck in the leveldb iterator?

Actions

Copy link

Updated by Sage Weil almost 11 years ago

stefan: can you also do

ceph --admin-daemon /var/run/ceph/ceph-osd.NNN.asok config set debug_ms 1
 ceph --admin-daemon /var/run/ceph/ceph-osd.NNN.asok config set debug_osd 20
 ceph --admin-daemon /var/run/ceph/ceph-osd.NNN.asok config set debug_filestore 20

to capture some logging when the osd is eating cpu?

thanks!

Actions

Copy link

Updated by Stefan Priebe almost 11 years ago

File ceph-osd.23.tar.gz ceph-osd.23.tar.gz added

Attached you'll find a log with debugging enabled in betweek and two new gdb thread all traces.

Actions

Copy link

Updated by Stefan Priebe almost 11 years ago

May be a hint or just luck i could reduce the effect and time to recover by lowering osd op threads to 2 (default) instead of 4 i had normally.

But there are still slow requests.

Actions

Copy link

Updated by Sage Weil almost 11 years ago

Status changed from New to 12

This looks to me like a lot of time is being spent in leveldb clearing and rewriting the pglog. This is probably just due to a naive PG::write_log()...?

Actions

Copy link

Updated by Sage Weil almost 11 years ago

even so, it seems like a lot of time is spent just in the removal phase.. perhaps there is something not quite right with the removal of old keys?

Actions

Copy link

I'm fridudad ;-) Peering works fine but recovery does not. In my initial text of this tracker i also mentioned recovery as the main problem and not peering. Should i open a new report?

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

Ceph

Custom queries

Fix #5232

osd: slow peering due to pg log rewrites

Updated by Sage Weil almost 11 years ago

Updated by Sage Weil almost 11 years ago

Updated by Stefan Priebe almost 11 years ago

Updated by Stefan Priebe almost 11 years ago

Updated by Sage Weil almost 11 years ago

Updated by Sage Weil almost 11 years ago

Updated by Sage Weil almost 11 years ago

Updated by Stefan Priebe almost 11 years ago

Updated by Stefan Priebe almost 11 years ago

Updated by Sage Weil almost 11 years ago

Updated by Ian Colle almost 11 years ago

Updated by Samuel Just almost 11 years ago

Updated by Sage Weil almost 11 years ago

Updated by Stefan Priebe almost 11 years ago

Updated by Samuel Just almost 11 years ago

Updated by Samuel Just almost 11 years ago

Updated by Stefan Priebe almost 11 years ago