Bug #9921

msgr/osd/pg dead lock giant

Added by Samuel Just over 9 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
giant
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2d6980570af2226fdee0edfcfe5a8e7f60fae615

/a/teuthology-2014-10-27_02:32:02-rados-giant-distro-basic-multi/573639
/a/samuelj-2014-10-27_21:32:10-rados-giant-wip-testing-vanilla-fixes-basic-multi/575303

Associated revisions

Revision 1dbe8f5a (diff)
Added by Greg Farnum about 9 years ago

SimpleMessenger: Pipe: do not block on takeover while holding global lock

We previously were able to cause deadlocks:
1) Existing pipe is fast_dispatching
2) Replacement incoming pipe is accepted
*) blocks on stop_and_wait() of existing Pipe
3) External things are blocked on SimpleMessenger::lock() while
blocking completion of the fast dispatch.

To resolve this, if we detect that an existing Pipe we want to take over is
in the process of fast dispatching, we unlock our locks and wait on it to
finish. Then we go back to the lookup step and retry.

The effect of this should be safe:
1) We are not making any changes to the existing Pipe in new ways
2) We have not registered the new Pipe anywhere
3) We have not sent back any replies based on Messenger state to
the remote endpoint.

Backport: giant
Fixes: #9921
Signed-off-by: Greg Farnum <>
(cherry picked from commit 2d6980570af2226fdee0edfcfe5a8e7f60fae615)
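
For illustration, here is a minimal standalone sketch of the approach the commit describes: if the pipe we want to take over is currently fast dispatching, drop the global messenger lock, wait on the pipe, and redo the lookup. The names (FakeMessenger, FakePipe, accept_replacement) are invented for this sketch and do not match the real SimpleMessenger/Pipe code.

    #include <condition_variable>
    #include <map>
    #include <memory>
    #include <mutex>

    // Stand-in for a Pipe: a per-connection lock plus a flag/cond tracking
    // whether the pipe is currently delivering a message via fast dispatch.
    struct FakePipe {
      std::mutex lock;
      std::condition_variable cond;
      bool fast_dispatching = false;
    };

    // Stand-in for the messenger: one global lock guarding the pipe registry.
    struct FakeMessenger {
      std::mutex lock;
      std::map<int, std::shared_ptr<FakePipe>> pipes;  // peer id -> existing pipe

      // Called by an accepting connection that wants to take over an
      // existing pipe to the same peer.
      std::shared_ptr<FakePipe> accept_replacement(int peer) {
        while (true) {
          std::unique_lock<std::mutex> msgr_lk(lock);   // global messenger lock
          auto it = pipes.find(peer);
          if (it == pipes.end())
            return nullptr;                             // nothing to take over

          std::shared_ptr<FakePipe> existing = it->second;
          std::unique_lock<std::mutex> pipe_lk(existing->lock);

          if (existing->fast_dispatching) {
            // The fix described above: never block on the old pipe while
            // still holding the global lock. Drop the global lock, wait for
            // fast dispatch to finish, then redo the lookup from scratch.
            msgr_lk.unlock();
            existing->cond.wait(pipe_lk,
                                [&] { return !existing->fast_dispatching; });
            continue;                                   // retry the lookup
          }

          // Safe to take over: no fast dispatch in flight, and we still hold
          // both the global lock and the pipe lock, so nothing changed under us.
          return existing;
        }
      }
    };

    int main() {
      FakeMessenger msgr;
      msgr.pipes[1] = std::make_shared<FakePipe>();  // existing pipe to peer 1
      auto p = msgr.accept_replacement(1);           // no fast dispatch running,
      return p ? 0 : 1;                              // so this returns at once
    }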

History

#1 Updated by Samuel Just over 9 years ago

  • Status changed from New to Duplicate

just kidding, this appears to be 9898

#2 Updated by Samuel Just over 9 years ago

  • Subject changed from sync_entry hang to dead lock
  • Status changed from Duplicate to 12

nvm, different deadlock

#3 Updated by Samuel Just over 9 years ago

  • Subject changed from dead lock to msgr/osd/pg dead lock giant

#4 Updated by Greg Farnum over 9 years ago

SimpleMessenger lock is held by an accepting Pipe trying to replace an old Pipe:

[Switching to thread 100 (Thread 0x7f248b127700 (LWP 9607))]
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185     ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.
(gdb) bt 
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000c34575 in Wait (mutex=..., this=0x84c45d8) at ./common/Cond.h:55
#2  Pipe::stop_and_wait (this=this@entry=0x84c43c0) at msg/Pipe.cc:1437
#3  0x0000000000c42b96 in Pipe::accept (this=this@entry=0x84c4940) at msg/Pipe.cc:671
#4  0x0000000000c4654d in Pipe::reader (this=0x84c4940) at msg/Pipe.cc:1448
#5  0x0000000000c4f23d in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:50
#6  0x00007f24b0b79182 in start_thread (arg=0x7f248b127700) at pthread_create.c:312
#7  0x00007f24af0e3fbd in __qfcvt_r (value=0, ndigit=<optimized out>, decpt=0x7f248b1279c0, sign=<optimized out>, buf=0x0, len=0) at efgcvt_r.c:154
#8  0x0000000000000000 in ?? ()

It's waiting for the old Pipe to finish dispatching, which is blocked on the pg_map_lock:

[Switching to thread 94 (Thread 0x7f2485f7d700 (LWP 9512))]
#0  pthread_rwlock_rdlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S:87
87      ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S: No such file or directory.
(gdb) bt
#0  pthread_rwlock_rdlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S:87
#1  0x00000000006bc800 in RWLock::get_read (this=0x4546168) at ./common/RWLock.h:68
#2  0x000000000067a995 in RLocker (lock=..., this=<synthetic pointer>) at ./common/RWLock.h:122
#3  OSD::get_pg_or_queue_for_pg (this=this@entry=0x4545000, pgid=..., op=...) at osd/OSD.cc:2780
#4  0x00000000006e03d6 in OSD::handle_replica_op<MOSDSubOpReply, 77> (this=this@entry=0x4545000, op=..., osdmap=...) at osd/OSD.cc:8182
#5  0x000000000069ad9e in OSD::dispatch_op_fast (this=this@entry=0x4545000, op=..., osdmap=...) at osd/OSD.cc:5752
#6  0x000000000069afb8 in OSD::dispatch_session_waiting (this=this@entry=0x4545000, session=session@entry=0x9714d00, osdmap=...) at osd/OSD.cc:5402
#7  0x000000000069b39e in OSD::ms_fast_dispatch (this=0x4545000, m=<optimized out>) at osd/OSD.cc:5512
#8  0x0000000000c21b46 in ms_fast_dispatch (m=0x13eeedc0, this=0x447e700) at msg/Messenger.h:503
#9  DispatchQueue::fast_dispatch (this=0x447e8b8, m=0x13eeedc0) at msg/DispatchQueue.cc:71
#10 0x0000000000c465c6 in Pipe::reader (this=0x84c43c0) at msg/Pipe.cc:1591
#11 0x0000000000c4f23d in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:50
#12 0x00007f24b0b79182 in start_thread (arg=0x7f2485f7d700) at pthread_create.c:312
#13 0x00007f24af0e3fbd in __qfcvt_r (value=0, ndigit=<optimized out>, decpt=0x7f2485f7d9c0, sign=<optimized out>, buf=0x0, len=0) at efgcvt_r.c:154
#14 0x0000000000000000 in ?? ()

And that's held by the main dispatch thread doing handle_osd_map:

[Switching to thread 98 (Thread 0x7f249e053700 (LWP 3067))]
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135     ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
(gdb) bt
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f24b0b7b657 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f24b0b7b480 in __GI___pthread_mutex_lock (mutex=0xfc19060) at ../nptl/pthread_mutex_lock.c:79
#3  0x0000000000b32a7f in Mutex::Lock (this=this@entry=0xfc19050, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:91
#4  0x00000000007a2030 in PG::lock (this=0xfc18c00, no_lockdep=no_lockdep@entry=false) at osd/PG.cc:229
#5  0x00000000006a49b5 in OSD::consume_map (this=this@entry=0x4545000) at osd/OSD.cc:6621
#6  0x00000000006a6c35 in OSD::handle_osd_map (this=this@entry=0x4545000, m=m@entry=0x674b3c0) at osd/OSD.cc:6376
#7  0x00000000006a8c9b in OSD::_dispatch (this=this@entry=0x4545000, m=m@entry=0x674b3c0) at osd/OSD.cc:5806
#8  0x00000000006a91a7 in OSD::ms_dispatch (this=0x4545000, m=0x674b3c0) at osd/OSD.cc:5386
#9  0x0000000000c22af9 in ms_deliver_dispatch (m=0x674b3c0, this=0x447e700) at msg/Messenger.h:532
#10 DispatchQueue::entry (this=0x447e8b8) at msg/DispatchQueue.cc:185
#11 0x0000000000b5ee4d in DispatchQueue::DispatchThread::entry (this=<optimized out>) at msg/DispatchQueue.h:104
#12 0x00007f24b0b79182 in start_thread (arg=0x7f249e053700) at pthread_create.c:312
#13 0x00007f24af0e3fbd in __qfcvt_r (value=0, ndigit=<optimized out>, decpt=0x7f249e0539c0, sign=<optimized out>, buf=0x0, len=0) at efgcvt_r.c:154
#14 0x0000000000000000 in ?? ()

...which is blocked trying to get a PG::_lock, held by this thread:

[Switching to thread 109 (Thread 0x7f2496043700 (LWP 3083))]
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135     ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
(gdb) bt
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f24b0b7b657 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f24b0b7b480 in __GI___pthread_mutex_lock (mutex=0x447eb88) at ../nptl/pthread_mutex_lock.c:79
#3  0x0000000000b32a7f in Mutex::Lock (this=this@entry=0x447eb78, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:91
#4  0x0000000000b5a3db in Locker (m=..., this=<synthetic pointer>) at ./common/Mutex.h:115
#5  SimpleMessenger::get_connection (this=0x447e700, dest=...) at msg/SimpleMessenger.cc:385
#6  0x0000000000662872 in OSDService::get_con_osd_cluster (this=this@entry=0x4546710, peer=1, from_epoch=<optimized out>) at osd/OSD.cc:700
#7  0x000000000066a8c7 in OSD::compat_must_dispatch_immediately (this=this@entry=0x4545000, pg=pg@entry=0xfc18c00) at osd/OSD.cc:7117
#8  0x0000000000699069 in OSD::process_peering_events (this=0x4545000, pgs=..., handle=...) at osd/OSD.cc:8459
#9  0x00000000006eead8 in OSD::PeeringWQ::_process (this=<optimized out>, pgs=..., handle=...) at osd/OSD.h:1587
#10 0x0000000000b73206 in ThreadPool::worker (this=0x45454b0, wt=0x4634120) at common/WorkQueue.cc:128
#11 0x0000000000b742b0 in ThreadPool::WorkThread::entry (this=<optimized out>) at common/WorkQueue.h:318
#12 0x00007f24b0b79182 in start_thread (arg=0x7f2496043700) at pthread_create.c:312
#13 0x00007f24af0e3fbd in __qfcvt_r (value=0, ndigit=<optimized out>, decpt=0x7f24960439c0, sign=<optimized out>, buf=0x0, len=0) at efgcvt_r.c:154
#14 0x0000000000000000 in ?? ()

which is trying to do a Connection lookup.

So that's an exciting 4-"lock" deadlock cycle. :(
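
To make the cycle explicit, here is a tiny self-contained illustration (not real Ceph code; the thread labels are shorthand for the four backtraces above) that encodes the wait-for relationships and walks them back around to the starting thread:

    #include <iostream>
    #include <map>
    #include <string>

    int main() {
      // Each thread waits for a resource held by the next thread in the map:
      //  - Pipe::accept() holds SimpleMessenger::lock and waits in
      //    stop_and_wait() for the old Pipe to finish fast dispatch;
      //  - the old Pipe's reader is in ms_fast_dispatch and waits for
      //    pg_map_lock in OSD::get_pg_or_queue_for_pg();
      //  - the dispatch thread in handle_osd_map/consume_map holds
      //    pg_map_lock and waits for PG::_lock;
      //  - the peering thread holds PG::_lock and waits for
      //    SimpleMessenger::lock in get_connection(), closing the loop.
      std::map<std::string, std::string> waits_on = {
          {"accepting-pipe", "old-pipe-fast-dispatch"},
          {"old-pipe-fast-dispatch", "osd-map-dispatch"},
          {"osd-map-dispatch", "peering-thread"},
          {"peering-thread", "accepting-pipe"},
      };

      std::string cur = "accepting-pipe";
      for (int hop = 0; hop < 4; ++hop) {
        std::cout << cur << " -> ";
        cur = waits_on[cur];
      }
      std::cout << cur << "   <- back where we started, so no thread can make progress\n";
      return 0;
    }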

#5 Updated by Greg Farnum over 9 years ago

From what I recall, none of these are simple locks to get rid of. I'm not actually sure how to go about it; even something extreme like figuring out how to shard up the SimpleMessenger lock and start using RWLocks wouldn't resolve this particular instance.

#6 Updated by Greg Farnum over 9 years ago

  • Status changed from 12 to Fix Under Review
  • Assignee set to Greg Farnum

wip-9921, totally untested.

#8 Updated by Greg Farnum over 9 years ago

  • Assignee changed from Greg Farnum to Sage Weil

Giving Sage this ticket since he took the PR.

#9 Updated by Samuel Just over 9 years ago

I think this is another instance:

ubuntu@teuthology:/a/teuthology-2014-11-13_17:33:44-upgrade:giant-x-next-distro-basic-vps/600134/remote

#10 Updated by Sage Weil over 9 years ago

  • Assignee changed from Sage Weil to Samuel Just

passed a smoke test, ready for rados run

#11 Updated by Sage Weil over 9 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Assignee deleted (Samuel Just)
  • Priority changed from Urgent to High
  • Source changed from other to Q/A
  • Backport set to giant

#12 Updated by Loïc Dachary about 9 years ago

  • Description updated (diff)

#13 Updated by Loïc Dachary about 9 years ago

  • http://pulpito.ceph.com/loic-2015-01-10_11:54:25-rados-giant-backports-testing-basic-vps/693791/ with remote/vpm099/log/ceph-osd.2.log.gz
  • http://pulpito.ceph.com/loic-2015-01-10_11:54:25-rados-giant-backports-testing-basic-vps/693986/ with remote/vpm117/log/ceph-osd.3.log.gz
        -5> 2015-01-10 20:52:19.108730 7f674ba82700 10 journal commit_start nothing to do
        -4> 2015-01-10 20:52:19.108732 7f674ba82700 10 journal commit_start
        -3> 2015-01-10 20:52:19.108740 7f674ba82700 20 filestore(/var/lib/ceph/osd/ceph-2) sync_entry waiting for max_interval 5.000000
        -2> 2015-01-10 20:52:23.776586 7f6752a90700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6737756700' had timed out after 60
        -1> 2015-01-10 20:52:23.776618 7f6752a90700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6737756700' had suicide timed out after 600
         0> 2015-01-10 20:52:23.778171 7f6752a90700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f6752a90700 time 2015-01-10 20:52:23.776640
    common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
    
     ceph version 0.87-87-g0ea20e6 (0ea20e6c51208d6710f469454ab3f964bfa7c9d2)
     1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0xaa04bf]
     2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x2e1) [0x9dfd61]
     3: (ceph::HeartbeatMap::is_healthy()+0xb7) [0x9e0537]
     4: (ceph::HeartbeatMap::check_touch_file()+0x23) [0x9e0ba3]
     5: (CephContextServiceThread::entry()+0x131) [0xab4501]
     6: (()+0x7e9a) [0x7f6755d5ce9a]
     7: (clone()+0x6d) [0x7f67547272ed]
     NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
    
    

#15 Updated by Loïc Dachary about 9 years ago

  • Status changed from Pending Backport to Resolved

#16 Updated by Wido den Hollander over 7 years ago

  • Release set to hammer
  • Affected Versions v0.94.5 added

Although this bug is marked as resolved, I am seeing something similar happening with Hammer 0.94.5.

In this 250 OSD cluster I see OSDs crash due to "hit suicide timeout" when one or more clients start to encounter heavy packet loss. In this case it's mostly caused by a full conntrack table on the client, but it can also happen when a client's link is highly saturated.

OSDs crash with the "hit suicide timeout" assert, or they start to respond very, very slowly.

    -4> 2016-07-04 15:38:37.794230 7f8c9b63c700 10 monclient: renew_subs
    -3> 2016-07-04 15:38:37.794235 7f8c9b63c700 10 monclient: _send_mon_message to mon.charlie at [2a00:XX:XX:XX::6789:3]:6789/0
    -2> 2016-07-04 15:38:39.944116 7f8cb45a3700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f8c95630700' had timed out after 15
    -1> 2016-07-04 15:38:39.944146 7f8cb45a3700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f8c95630700' had suicide timed out after 150
     0> 2016-07-04 15:38:39.952054 7f8cb45a3700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f8cb45a3700 time 2016-07-04 15:38:39.944180
common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc60eb]
 2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x2a9) [0xb02089]
 3: (ceph::HeartbeatMap::is_healthy()+0xd6) [0xb02916]
 4: (ceph::HeartbeatMap::check_touch_file()+0x17) [0xb02ff7]
 5: (CephContextServiceThread::entry()+0x154) [0xbd61d4]
 6: (()+0x8182) [0x7f8cb7a0d182]
 7: (clone()+0x6d) [0x7f8cb5f5747d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

As soon as the client is working properly again, the cluster continues just fine.

In this case there are about 30 clients on a 250 OSD cluster doing ~10k IOPS.
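
For context on the assert itself: the heartbeat map works by having each worker thread periodically touch a per-thread handle, while a service thread checks that every handle has been touched recently and aborts the daemon once the suicide grace is exceeded. The sketch below is a deliberately simplified model of that check, not the real ceph::HeartbeatMap code; the names are illustrative, and the 15/150 values mirror the log above.

    #include <cassert>
    #include <chrono>
    #include <iostream>
    #include <string>
    #include <vector>

    using Clock = std::chrono::steady_clock;

    // One handle per worker thread, e.g. "OSD::osd_op_tp thread 0x...".
    struct Handle {
      std::string name;
      Clock::time_point last_touch = Clock::now();
      std::chrono::seconds grace{15};     // "had timed out after 15"
      std::chrono::seconds suicide{150};  // "had suicide timed out after 150"
    };

    // Worker threads call this whenever they make progress.
    void touch(Handle& h) { h.last_touch = Clock::now(); }

    // A service thread runs this periodically over all registered handles.
    void check(const std::vector<Handle>& handles) {
      const auto now = Clock::now();
      for (const auto& h : handles) {
        const auto stalled = now - h.last_touch;
        if (stalled > h.grace)
          std::cerr << "heartbeat_map is_healthy '" << h.name
                    << "' had timed out\n";
        // If a worker stays wedged past the suicide grace (for example stuck
        // behind a lock, or behind a peer that never answers), the daemon
        // aborts instead of limping along.
        if (stalled > h.suicide)
          assert(0 == "hit suicide timeout");
      }
    }

    int main() {
      Handle h;
      h.name = "OSD::osd_op_tp thread (example)";
      std::vector<Handle> handles{h};
      touch(handles[0]);  // worker just made progress
      check(handles);     // healthy: prints nothing, does not assert
      return 0;
    }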
