Bug #15758


msgr/async: Messenger thread long time lock hold risk

Added by Haomai Wang almost 8 years ago. Updated about 5 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Thread 46 (Thread 0x7fa7365a2700 (LWP 17842)):
#0 0x00007fa73b888705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00000000009e135d in FileStore::op_queue_reserve_throttle(FileStore::Op*, ThreadPool::TPHandle*) ()
#2 0x00000000009f16e5 in FileStore::queue_transactions(ObjectStore::Sequencer*, std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, std::tr1::shared_ptr<TrackedOp>, ThreadPool::TPHandle*) ()
#3 0x00000000006e9b19 in ObjectStore::queue_transaction(ObjectStore::Sequencer*, ObjectStore::Transaction*, Context*, Context*, Context*, std::tr1::shared_ptr<TrackedOp>, ThreadPool::TPHandle*) ()
#4 0x00000000006a4f42 in OSD::dispatch_context(PG::RecoveryCtx&, PG*, std::tr1::shared_ptr<OSDMap const>, ThreadPool::TPHandle*) ()
#5 0x00000000006c4833 in OSD::handle_pg_peering_evt(spg_t, pg_info_t const&, std::map<unsigned int, pg_interval_t, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, pg_interval_t> > >&, unsigned int, pg_shard_t, bool, std::tr1::shared_ptr<PG::CephPeeringEvt>) ()
#6 0x00000000006c67b3 in OSD::handle_pg_log(std::tr1::shared_ptr<OpRequest>) ()
#7 0x00000000006c8cc0 in OSD::dispatch_op(std::tr1::shared_ptr<OpRequest>) ()
#8 0x00000000006c9ede in OSD::_dispatch(Message*) ()
#9 0x00000000006ca5c7 in OSD::ms_dispatch(Message*) ()
#10 0x0000000000d5aa97 in Messenger::ms_deliver_dispatch(Message*) ()
#11 0x0000000000d5b111 in C_handle_dispatch::do_request(int) ()
#12 0x0000000000d109a5 in EventCenter::process_events(int) ()
#13 0x0000000000cece68 in Worker::entry() ()
#14 0x00007fa73b884df5 in start_thread () from /lib64/libpthread.so.0
#15 0x00007fa739c411ad in clone () from /lib64/libc.so.6
Thread 45 (Thread 0x7fa735da1700 (LWP 17843)):
#0 0x00007fa73b88af7d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fa73b886d32 in _L_lock_791 () from /lib64/libpthread.so.0
#2 0x00007fa73b886c38 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000000000bfdbe8 in Mutex::Lock(bool) ()
#4 0x00000000006ca3a5 in OSD::ms_dispatch(Message*) ()
#5 0x0000000000d5aa97 in Messenger::ms_deliver_dispatch(Message*) ()
#6 0x0000000000d5b111 in C_handle_dispatch::do_request(int) ()
#7 0x0000000000d109a5 in EventCenter::process_events(int) ()
#8 0x0000000000cece68 in Worker::entry() ()
#9 0x00007fa73b884df5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007fa739c411ad in clone () from /lib64/libc.so.6

The main problem is osd_lock under heavy IO pressure: it can easily block the messenger thread from dispatching messages for seconds. If we create a big pool under heavy IO pressure, the OSD can easily flap down and back in because heartbeat messages are blocked behind the lock. This is more likely to show up with the async messenger than with the simple messenger.
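For illustration only, here is a minimal C++ sketch of the pattern behind the traces above. It is not Ceph's actual implementation; the names ms_dispatch_inline and DispatchQueue, and the lock/throttle details, are simplified assumptions. It shows why calling the dispatch callback inline from the messenger worker stalls everything on that worker (including heartbeats) whenever the callback blocks on a contended lock or a full throttle, and one common mitigation: hand the message to a dedicated dispatch thread so the messenger worker returns to its event loop immediately.

#include <mutex>
#include <condition_variable>
#include <queue>
#include <functional>
#include <utility>

// Hypothetical sketch: the async messenger worker runs an event loop and
// calls the dispatcher inline, so any blocking inside the dispatch path
// stalls the whole worker, including heartbeat traffic on the same thread.
std::mutex osd_lock;  // coarse OSD-wide lock, as in the trace above

void ms_dispatch_inline(const std::function<void()>& handle_message) {
    std::lock_guard<std::mutex> l(osd_lock);  // may wait for seconds under IO load
    handle_message();  // e.g. queue_transactions() blocking on a full throttle
}

// One possible mitigation (hedged, not necessarily the fix adopted upstream):
// enqueue the message and let a separate thread take osd_lock, so the
// messenger worker only pays the cost of an enqueue.
class DispatchQueue {
    std::queue<std::function<void()>> q;
    std::mutex m;
    std::condition_variable cv;
public:
    void enqueue(std::function<void()> fn) {
        { std::lock_guard<std::mutex> l(m); q.push(std::move(fn)); }
        cv.notify_one();
    }
    void worker_loop() {  // runs in its own thread, not the messenger worker
        for (;;) {
            std::unique_lock<std::mutex> l(m);
            cv.wait(l, [this] { return !q.empty(); });
            auto fn = std::move(q.front());
            q.pop();
            l.unlock();
            fn();  // blocking on osd_lock here does not stall the event loop
        }
    }
};

In the inline pattern, the long wait in FileStore::op_queue_reserve_throttle() is taken on the messenger worker itself, which is why Thread 45 sits in __lll_lock_wait on osd_lock; with a separate dispatch thread, the worker never holds or waits on osd_lock while processing events.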


Related issues 1 (0 open, 1 closed)

Copied to Ceph - Backport #16377: jewel: msgr/async: Messenger thread long time lock hold risk (Resolved, Loïc Dachary)
