Bug #9921
msgr/osd/pg dead lock giant
Status: Closed
Description
2d6980570af2226fdee0edfcfe5a8e7f60fae615
/a/teuthology-2014-10-27_02:32:02-rados-giant-distro-basic-multi/573639
/a/samuelj-2014-10-27_21:32:10-rados-giant-wip-testing-vanilla-fixes-basic-multi/575303
Updated by Samuel Just over 9 years ago
- Status changed from New to Duplicate
just kidding, this appears to be #9898
Updated by Samuel Just over 9 years ago
- Subject changed from sync_entry hang to dead lock
- Status changed from Duplicate to 12
nvm, different deadlock
Updated by Samuel Just over 9 years ago
- Subject changed from dead lock to msgr/osd/pg dead lock giant
Updated by Greg Farnum over 9 years ago
SimpleMessenger lock is held by an accepting Pipe trying to replace an old Pipe:
[Switching to thread 100 (Thread 0x7f248b127700 (LWP 9607))]
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185 ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.
(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000c34575 in Wait (mutex=..., this=0x84c45d8) at ./common/Cond.h:55
#2  Pipe::stop_and_wait (this=this@entry=0x84c43c0) at msg/Pipe.cc:1437
#3  0x0000000000c42b96 in Pipe::accept (this=this@entry=0x84c4940) at msg/Pipe.cc:671
#4  0x0000000000c4654d in Pipe::reader (this=0x84c4940) at msg/Pipe.cc:1448
#5  0x0000000000c4f23d in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:50
#6  0x00007f24b0b79182 in start_thread (arg=0x7f248b127700) at pthread_create.c:312
#7  0x00007f24af0e3fbd in __qfcvt_r (value=0, ndigit=<optimized out>, decpt=0x7f248b1279c0, sign=<optimized out>, buf=0x0, len=0) at efgcvt_r.c:154
#8  0x0000000000000000 in ?? ()
It's waiting for the old Pipe to finish dispatching, which is blocked on the pg_map_lock:
[Switching to thread 94 (Thread 0x7f2485f7d700 (LWP 9512))]
#0  pthread_rwlock_rdlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S:87
87 ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S: No such file or directory.
(gdb) bt
#0  pthread_rwlock_rdlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S:87
#1  0x00000000006bc800 in RWLock::get_read (this=0x4546168) at ./common/RWLock.h:68
#2  0x000000000067a995 in RLocker (lock=..., this=<synthetic pointer>) at ./common/RWLock.h:122
#3  OSD::get_pg_or_queue_for_pg (this=this@entry=0x4545000, pgid=..., op=...) at osd/OSD.cc:2780
#4  0x00000000006e03d6 in OSD::handle_replica_op<MOSDSubOpReply, 77> (this=this@entry=0x4545000, op=..., osdmap=...) at osd/OSD.cc:8182
#5  0x000000000069ad9e in OSD::dispatch_op_fast (this=this@entry=0x4545000, op=..., osdmap=...) at osd/OSD.cc:5752
#6  0x000000000069afb8 in OSD::dispatch_session_waiting (this=this@entry=0x4545000, session=session@entry=0x9714d00, osdmap=...) at osd/OSD.cc:5402
#7  0x000000000069b39e in OSD::ms_fast_dispatch (this=0x4545000, m=<optimized out>) at osd/OSD.cc:5512
#8  0x0000000000c21b46 in ms_fast_dispatch (m=0x13eeedc0, this=0x447e700) at msg/Messenger.h:503
#9  DispatchQueue::fast_dispatch (this=0x447e8b8, m=0x13eeedc0) at msg/DispatchQueue.cc:71
#10 0x0000000000c465c6 in Pipe::reader (this=0x84c43c0) at msg/Pipe.cc:1591
#11 0x0000000000c4f23d in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:50
#12 0x00007f24b0b79182 in start_thread (arg=0x7f2485f7d700) at pthread_create.c:312
#13 0x00007f24af0e3fbd in __qfcvt_r (value=0, ndigit=<optimized out>, decpt=0x7f2485f7d9c0, sign=<optimized out>, buf=0x0, len=0) at efgcvt_r.c:154
#14 0x0000000000000000 in ?? ()
And that's held by the main dispatch thread doing handle_osd_map:
[Switching to thread 98 (Thread 0x7f249e053700 (LWP 3067))]
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135 ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
(gdb) bt
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f24b0b7b657 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f24b0b7b480 in __GI___pthread_mutex_lock (mutex=0xfc19060) at ../nptl/pthread_mutex_lock.c:79
#3  0x0000000000b32a7f in Mutex::Lock (this=this@entry=0xfc19050, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:91
#4  0x00000000007a2030 in PG::lock (this=0xfc18c00, no_lockdep=no_lockdep@entry=false) at osd/PG.cc:229
#5  0x00000000006a49b5 in OSD::consume_map (this=this@entry=0x4545000) at osd/OSD.cc:6621
#6  0x00000000006a6c35 in OSD::handle_osd_map (this=this@entry=0x4545000, m=m@entry=0x674b3c0) at osd/OSD.cc:6376
#7  0x00000000006a8c9b in OSD::_dispatch (this=this@entry=0x4545000, m=m@entry=0x674b3c0) at osd/OSD.cc:5806
#8  0x00000000006a91a7 in OSD::ms_dispatch (this=0x4545000, m=0x674b3c0) at osd/OSD.cc:5386
#9  0x0000000000c22af9 in ms_deliver_dispatch (m=0x674b3c0, this=0x447e700) at msg/Messenger.h:532
#10 DispatchQueue::entry (this=0x447e8b8) at msg/DispatchQueue.cc:185
#11 0x0000000000b5ee4d in DispatchQueue::DispatchThread::entry (this=<optimized out>) at msg/DispatchQueue.h:104
#12 0x00007f24b0b79182 in start_thread (arg=0x7f249e053700) at pthread_create.c:312
#13 0x00007f24af0e3fbd in __qfcvt_r (value=0, ndigit=<optimized out>, decpt=0x7f249e0539c0, sign=<optimized out>, buf=0x0, len=0) at efgcvt_r.c:154
#14 0x0000000000000000 in ?? ()
...which is blocked trying to get a PG::_lock, held by this thread:
[Switching to thread 109 (Thread 0x7f2496043700 (LWP 3083))]
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135 ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
(gdb) bt
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f24b0b7b657 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f24b0b7b480 in __GI___pthread_mutex_lock (mutex=0x447eb88) at ../nptl/pthread_mutex_lock.c:79
#3  0x0000000000b32a7f in Mutex::Lock (this=this@entry=0x447eb78, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:91
#4  0x0000000000b5a3db in Locker (m=..., this=<synthetic pointer>) at ./common/Mutex.h:115
#5  SimpleMessenger::get_connection (this=0x447e700, dest=...) at msg/SimpleMessenger.cc:385
#6  0x0000000000662872 in OSDService::get_con_osd_cluster (this=this@entry=0x4546710, peer=1, from_epoch=<optimized out>) at osd/OSD.cc:700
#7  0x000000000066a8c7 in OSD::compat_must_dispatch_immediately (this=this@entry=0x4545000, pg=pg@entry=0xfc18c00) at osd/OSD.cc:7117
#8  0x0000000000699069 in OSD::process_peering_events (this=0x4545000, pgs=..., handle=...) at osd/OSD.cc:8459
#9  0x00000000006eead8 in OSD::PeeringWQ::_process (this=<optimized out>, pgs=..., handle=...) at osd/OSD.h:1587
#10 0x0000000000b73206 in ThreadPool::worker (this=0x45454b0, wt=0x4634120) at common/WorkQueue.cc:128
#11 0x0000000000b742b0 in ThreadPool::WorkThread::entry (this=<optimized out>) at common/WorkQueue.h:318
#12 0x00007f24b0b79182 in start_thread (arg=0x7f2496043700) at pthread_create.c:312
#13 0x00007f24af0e3fbd in __qfcvt_r (value=0, ndigit=<optimized out>, decpt=0x7f24960439c0, sign=<optimized out>, buf=0x0, len=0) at efgcvt_r.c:154
#14 0x0000000000000000 in ?? ()
which is trying to do a Connection lookup.
So that's an exciting 4-"lock" deadlock cycle. :(
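To make the cycle concrete, here is a minimal stand-alone C++17 model of the four threads above (a sketch with simplified stand-in names and timing, not the actual Ceph code). Each thread takes its first "lock" and then blocks on the next edge of the cycle, so the program hangs by design:

#include <chrono>
#include <condition_variable>
#include <mutex>
#include <shared_mutex>
#include <thread>

// Four stand-in locks, named after their roles in the traces above.
std::mutex msgr_lock;                 // stands in for SimpleMessenger::lock
std::shared_mutex pg_map_lock;        // stands in for OSD::pg_map_lock
std::mutex pg_lock;                   // stands in for PG::_lock
std::mutex cond_mtx;                  // protects old_pipe_done
std::condition_variable pipe_stopped; // stands in for the old Pipe's cond
bool old_pipe_done = false;

int main() {
  using namespace std::chrono_literals;

  // Peering worker: holds PG::_lock, then wants the messenger lock
  // (get_con_osd_cluster -> SimpleMessenger::get_connection).
  std::thread peering([] {
    std::lock_guard pg(pg_lock);
    std::this_thread::sleep_for(100ms);
    std::lock_guard msgr(msgr_lock); // blocks: accepter holds it
  });

  // Main dispatcher (handle_osd_map/consume_map): holds pg_map_lock,
  // then wants PG::_lock.
  std::thread dispatcher([] {
    std::unique_lock map(pg_map_lock);
    std::this_thread::sleep_for(100ms);
    std::lock_guard pg(pg_lock); // blocks: peering worker holds it
  });

  // Old Pipe's reader (fast dispatch): wants pg_map_lock for read; only
  // after dispatch finishes would it signal the accepting Pipe.
  std::thread old_reader([] {
    std::this_thread::sleep_for(50ms);
    std::shared_lock map(pg_map_lock); // blocks: dispatcher holds it
    std::lock_guard l(cond_mtx);
    old_pipe_done = true;
    pipe_stopped.notify_all();
  });

  // Accepting Pipe: holds the messenger lock while stop_and_wait()-ing
  // for the old Pipe, i.e. for a signal that never comes.
  std::thread accepter([] {
    std::lock_guard msgr(msgr_lock);
    std::unique_lock l(cond_mtx);
    pipe_stopped.wait(l, [] { return old_pipe_done; }); // blocks forever
  });

  peering.join();
  dispatcher.join();
  old_reader.join();
  accepter.join();
}

The fourth edge is the subtle one: the accepting Pipe waits on a condition variable rather than a mutex, so a simple lock-ordering analysis wouldn't flag it.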
Updated by Greg Farnum over 9 years ago
From what I recall, none of these are simple locks to get rid of. I'm not actually sure how to go about it; even something extreme like figuring out how to shard up the SimpleMessenger lock and start using RWLocks wouldn't resolve this particular instance.
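For reference, a sharded connection lookup along the lines mentioned above might look something like the following (a purely hypothetical sketch, not Ceph's API). It only reduces contention between lookups for different peers; an accepting Pipe would still hold the shard covering the old connection while cond-waiting, so a lookup that lands on that shard, like the peering thread's above, still blocks, which is why sharding wouldn't break this particular cycle:

#include <array>
#include <map>
#include <mutex>
#include <optional>
#include <string>

// Hypothetical sharded connection map: one lock per shard, so lookups
// for different peers rarely contend with each other.
struct ShardedConnMap {
  static constexpr size_t kShards = 16;
  std::array<std::mutex, kShards> locks;
  std::array<std::map<int, std::string>, kShards> conns; // peer -> conn (stand-in)

  std::optional<std::string> get_connection(int peer) {
    const size_t s = static_cast<size_t>(peer) % kShards;
    std::lock_guard g(locks[s]); // only this shard is locked
    auto it = conns[s].find(peer);
    if (it == conns[s].end())
      return std::nullopt;
    return it->second;
  }
};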
Updated by Greg Farnum over 9 years ago
- Status changed from 12 to Fix Under Review
- Assignee set to Greg Farnum
wip-9921, totally untested.
Updated by Greg Farnum over 9 years ago
- Assignee changed from Greg Farnum to Sage Weil
Giving Sage this ticket since he took the PR.
Updated by Samuel Just over 9 years ago
I think this is another instance:
ubuntu@teuthology:/a/teuthology-2014-11-13_17:33:44-upgrade:giant-x-next-distro-basic-vps/600134/remote
Updated by Sage Weil over 9 years ago
- Assignee changed from Sage Weil to Samuel Just
passed a smoke test, ready for rados run
Updated by Sage Weil over 9 years ago
- Status changed from Fix Under Review to Pending Backport
- Assignee deleted (Samuel Just)
- Priority changed from Urgent to High
- Source changed from other to Q/A
- Backport set to giant
Updated by Loïc Dachary over 9 years ago
- http://pulpito.ceph.com/loic-2015-01-10_11:54:25-rados-giant-backports-testing-basic-vps/693791/ with remote/vpm099/log/ceph-osd.2.log.gz
- http://pulpito.ceph.com/loic-2015-01-10_11:54:25-rados-giant-backports-testing-basic-vps/693986/ with remote/vpm117/log/ceph-osd.3.log.gz
-5> 2015-01-10 20:52:19.108730 7f674ba82700 10 journal commit_start nothing to do
-4> 2015-01-10 20:52:19.108732 7f674ba82700 10 journal commit_start
-3> 2015-01-10 20:52:19.108740 7f674ba82700 20 filestore(/var/lib/ceph/osd/ceph-2) sync_entry waiting for max_interval 5.000000
-2> 2015-01-10 20:52:23.776586 7f6752a90700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6737756700' had timed out after 60
-1> 2015-01-10 20:52:23.776618 7f6752a90700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6737756700' had suicide timed out after 600
 0> 2015-01-10 20:52:23.778171 7f6752a90700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f6752a90700 time 2015-01-10 20:52:23.776640
common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
ceph version 0.87-87-g0ea20e6 (0ea20e6c51208d6710f469454ab3f964bfa7c9d2)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0xaa04bf]
2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x2e1) [0x9dfd61]
3: (ceph::HeartbeatMap::is_healthy()+0xb7) [0x9e0537]
4: (ceph::HeartbeatMap::check_touch_file()+0x23) [0x9e0ba3]
5: (CephContextServiceThread::entry()+0x131) [0xab4501]
6: (()+0x7e9a) [0x7f6755d5ce9a]
7: (clone()+0x6d) [0x7f67547272ed]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
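For readers unfamiliar with the "hit suicide timeout" assert: HeartbeatMap is a watchdog in which worker threads periodically touch a per-thread timestamp, a service thread warns once a thread exceeds its grace period, and the process aborts once it exceeds the suicide grace. A simplified sketch of the pattern (illustrative names and structure, not the actual HeartbeatMap code):

#include <atomic>
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// One handle per worker thread; the worker calls touch() as it makes progress.
struct Handle {
  std::atomic<Clock::duration::rep> last_touch{
      Clock::now().time_since_epoch().count()};
  std::chrono::seconds grace{60};          // "had timed out after 60"
  std::chrono::seconds suicide_grace{600}; // "had suicide timed out after 600"
  void touch() { last_touch = Clock::now().time_since_epoch().count(); }
};

// Roughly what the service thread's health check does for each handle:
// warn past the grace period, abort the whole process past the suicide grace.
void check(const Handle& h) {
  Clock::time_point last{Clock::duration{h.last_touch.load()}};
  auto age = Clock::now() - last;
  if (age > h.grace) {
    // would log: heartbeat_map is_healthy '...' had timed out
  }
  assert(age <= h.suicide_grace && "hit suicide timeout"); // kills the OSD
}

int main() {
  Handle h;
  h.touch();  // worker made progress
  check(h);   // healthy: touched just now
}

So the assert above doesn't identify the root cause itself; it only says an osd_op_tp thread made no progress for the whole suicide grace, e.g. because it was stuck behind a lock as in the deadlock analyzed earlier.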
Updated by Loïc Dachary over 9 years ago
- giant backport https://github.com/ceph/ceph/pull/3356
Updated by Loïc Dachary about 9 years ago
- Status changed from Pending Backport to Resolved
Updated by Wido den Hollander almost 8 years ago
- Release set to hammer
- Affected Versions v0.94.5 added
Although this bug is marked as resolved, I am seeing something similar happening with Hammer 0.94.5.
In this 250 OSD cluster I see OSDs crash due to "hit suicide timeout" when one or more clients start to encounter heavy packet loss. In this case it's mostly caused by a full conntrack table on the client, but it can also happen when a client's network link is highly saturated.
OSDs either crash with the "hit suicide timeout" assert or they start to respond very, very, very slowly.
-4> 2016-07-04 15:38:37.794230 7f8c9b63c700 10 monclient: renew_subs
-3> 2016-07-04 15:38:37.794235 7f8c9b63c700 10 monclient: _send_mon_message to mon.charlie at [2a00:XX:XX:XX::6789:3]:6789/0
-2> 2016-07-04 15:38:39.944116 7f8cb45a3700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f8c95630700' had timed out after 15
-1> 2016-07-04 15:38:39.944146 7f8cb45a3700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f8c95630700' had suicide timed out after 150
 0> 2016-07-04 15:38:39.952054 7f8cb45a3700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f8cb45a3700 time 2016-07-04 15:38:39.944180
common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc60eb]
2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x2a9) [0xb02089]
3: (ceph::HeartbeatMap::is_healthy()+0xd6) [0xb02916]
4: (ceph::HeartbeatMap::check_touch_file()+0x17) [0xb02ff7]
5: (CephContextServiceThread::entry()+0x154) [0xbd61d4]
6: (()+0x8182) [0x7f8cb7a0d182]
7: (clone()+0x6d) [0x7f8cb5f5747d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
As soon as the client is working properly again, the cluster continues just fine.
In this case there are about 30 clients on a 250 OSD cluster doing ~10k IOPS.