Bug #9921

closed

msgr/osd/pg dead lock giant

Added by Samuel Just over 9 years ago. Updated almost 8 years ago.

Status:
Resolved
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
giant
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2d6980570af2226fdee0edfcfe5a8e7f60fae615

/a/teuthology-2014-10-27_02:32:02-rados-giant-distro-basic-multi/573639
/a/samuelj-2014-10-27_21:32:10-rados-giant-wip-testing-vanilla-fixes-basic-multi/575303

Actions #1

Updated by Samuel Just over 9 years ago

  • Status changed from New to Duplicate

Just kidding, this appears to be #9898.

Actions #2

Updated by Samuel Just over 9 years ago

  • Subject changed from sync_entry hang to dead lock
  • Status changed from Duplicate to 12

Never mind, this is a different deadlock.

Actions #3

Updated by Samuel Just over 9 years ago

  • Subject changed from dead lock to msgr/osd/pg dead lock giant
Actions #4

Updated by Greg Farnum over 9 years ago

SimpleMessenger lock is held by an accepting Pipe trying to replace an old Pipe:

[Switching to thread 100 (Thread 0x7f248b127700 (LWP 9607))]
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185     ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.
(gdb) bt 
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000c34575 in Wait (mutex=..., this=0x84c45d8) at ./common/Cond.h:55
#2  Pipe::stop_and_wait (this=this@entry=0x84c43c0) at msg/Pipe.cc:1437
#3  0x0000000000c42b96 in Pipe::accept (this=this@entry=0x84c4940) at msg/Pipe.cc:671
#4  0x0000000000c4654d in Pipe::reader (this=0x84c4940) at msg/Pipe.cc:1448
#5  0x0000000000c4f23d in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:50
#6  0x00007f24b0b79182 in start_thread (arg=0x7f248b127700) at pthread_create.c:312
#7  0x00007f24af0e3fbd in __qfcvt_r (value=0, ndigit=<optimized out>, decpt=0x7f248b1279c0, sign=<optimized out>, buf=0x0, len=0) at efgcvt_r.c:154
#8  0x0000000000000000 in ?? ()

It's waiting for the old Pipe to finish dispatching, which is blocked on the pg_map_lock:

[Switching to thread 94 (Thread 0x7f2485f7d700 (LWP 9512))]
#0  pthread_rwlock_rdlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S:87
87      ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S: No such file or directory.
(gdb) bt
#0  pthread_rwlock_rdlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S:87
#1  0x00000000006bc800 in RWLock::get_read (this=0x4546168) at ./common/RWLock.h:68
#2  0x000000000067a995 in RLocker (lock=..., this=<synthetic pointer>) at ./common/RWLock.h:122
#3  OSD::get_pg_or_queue_for_pg (this=this@entry=0x4545000, pgid=..., op=...) at osd/OSD.cc:2780
#4  0x00000000006e03d6 in OSD::handle_replica_op<MOSDSubOpReply, 77> (this=this@entry=0x4545000, op=..., osdmap=...) at osd/OSD.cc:8182
#5  0x000000000069ad9e in OSD::dispatch_op_fast (this=this@entry=0x4545000, op=..., osdmap=...) at osd/OSD.cc:5752
#6  0x000000000069afb8 in OSD::dispatch_session_waiting (this=this@entry=0x4545000, session=session@entry=0x9714d00, osdmap=...) at osd/OSD.cc:5402
#7  0x000000000069b39e in OSD::ms_fast_dispatch (this=0x4545000, m=<optimized out>) at osd/OSD.cc:5512
#8  0x0000000000c21b46 in ms_fast_dispatch (m=0x13eeedc0, this=0x447e700) at msg/Messenger.h:503
#9  DispatchQueue::fast_dispatch (this=0x447e8b8, m=0x13eeedc0) at msg/DispatchQueue.cc:71
#10 0x0000000000c465c6 in Pipe::reader (this=0x84c43c0) at msg/Pipe.cc:1591
#11 0x0000000000c4f23d in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:50
#12 0x00007f24b0b79182 in start_thread (arg=0x7f2485f7d700) at pthread_create.c:312
#13 0x00007f24af0e3fbd in __qfcvt_r (value=0, ndigit=<optimized out>, decpt=0x7f2485f7d9c0, sign=<optimized out>, buf=0x0, len=0) at efgcvt_r.c:154
#14 0x0000000000000000 in ?? ()

And that's held by the main dispatch thread doing handle_osd_map:

[Switching to thread 98 (Thread 0x7f249e053700 (LWP 3067))]
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135     ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
(gdb) bt
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f24b0b7b657 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f24b0b7b480 in __GI___pthread_mutex_lock (mutex=0xfc19060) at ../nptl/pthread_mutex_lock.c:79
#3  0x0000000000b32a7f in Mutex::Lock (this=this@entry=0xfc19050, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:91
#4  0x00000000007a2030 in PG::lock (this=0xfc18c00, no_lockdep=no_lockdep@entry=false) at osd/PG.cc:229
#5  0x00000000006a49b5 in OSD::consume_map (this=this@entry=0x4545000) at osd/OSD.cc:6621
#6  0x00000000006a6c35 in OSD::handle_osd_map (this=this@entry=0x4545000, m=m@entry=0x674b3c0) at osd/OSD.cc:6376
#7  0x00000000006a8c9b in OSD::_dispatch (this=this@entry=0x4545000, m=m@entry=0x674b3c0) at osd/OSD.cc:5806
#8  0x00000000006a91a7 in OSD::ms_dispatch (this=0x4545000, m=0x674b3c0) at osd/OSD.cc:5386
#9  0x0000000000c22af9 in ms_deliver_dispatch (m=0x674b3c0, this=0x447e700) at msg/Messenger.h:532
#10 DispatchQueue::entry (this=0x447e8b8) at msg/DispatchQueue.cc:185
#11 0x0000000000b5ee4d in DispatchQueue::DispatchThread::entry (this=<optimized out>) at msg/DispatchQueue.h:104
#12 0x00007f24b0b79182 in start_thread (arg=0x7f249e053700) at pthread_create.c:312
#13 0x00007f24af0e3fbd in __qfcvt_r (value=0, ndigit=<optimized out>, decpt=0x7f249e0539c0, sign=<optimized out>, buf=0x0, len=0) at efgcvt_r.c:154
#14 0x0000000000000000 in ?? ()

...which is blocked trying to get a PG::_lock, held by this thread:

[Switching to thread 109 (Thread 0x7f2496043700 (LWP 3083))]
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135     ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
(gdb) bt
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f24b0b7b657 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f24b0b7b480 in __GI___pthread_mutex_lock (mutex=0x447eb88) at ../nptl/pthread_mutex_lock.c:79
#3  0x0000000000b32a7f in Mutex::Lock (this=this@entry=0x447eb78, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:91
#4  0x0000000000b5a3db in Locker (m=..., this=<synthetic pointer>) at ./common/Mutex.h:115
#5  SimpleMessenger::get_connection (this=0x447e700, dest=...) at msg/SimpleMessenger.cc:385
#6  0x0000000000662872 in OSDService::get_con_osd_cluster (this=this@entry=0x4546710, peer=1, from_epoch=<optimized out>) at osd/OSD.cc:700
#7  0x000000000066a8c7 in OSD::compat_must_dispatch_immediately (this=this@entry=0x4545000, pg=pg@entry=0xfc18c00) at osd/OSD.cc:7117
#8  0x0000000000699069 in OSD::process_peering_events (this=0x4545000, pgs=..., handle=...) at osd/OSD.cc:8459
#9  0x00000000006eead8 in OSD::PeeringWQ::_process (this=<optimized out>, pgs=..., handle=...) at osd/OSD.h:1587
#10 0x0000000000b73206 in ThreadPool::worker (this=0x45454b0, wt=0x4634120) at common/WorkQueue.cc:128
#11 0x0000000000b742b0 in ThreadPool::WorkThread::entry (this=<optimized out>) at common/WorkQueue.h:318
#12 0x00007f24b0b79182 in start_thread (arg=0x7f2496043700) at pthread_create.c:312
#13 0x00007f24af0e3fbd in __qfcvt_r (value=0, ndigit=<optimized out>, decpt=0x7f24960439c0, sign=<optimized out>, buf=0x0, len=0) at efgcvt_r.c:154
#14 0x0000000000000000 in ?? ()

...which is trying to do a Connection lookup in the SimpleMessenger, and therefore needs the messenger lock held by the first thread.

So that's an exciting 4-"lock" deadlock cycle. :(
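
To make the shape of the cycle easier to see, here is a schematic, stand-alone sketch of the four blocking edges using plain standard-library primitives (C++17). The names are simplified stand-ins for the structures in the backtraces above, not the real Ceph types, and the program intentionally deadlocks when run:

#include <chrono>
#include <condition_variable>
#include <mutex>
#include <shared_mutex>
#include <thread>

std::mutex msgr_lock;            // stand-in for SimpleMessenger::lock
std::shared_mutex pg_map_lock;   // stand-in for OSD::pg_map_lock
std::mutex pg_lock;              // stand-in for PG::_lock
std::mutex quiesce_mtx;          // protects the "old Pipe finished dispatching" flag
std::condition_variable quiesce_cv;
bool old_pipe_done = false;

// The sleeps only stage the acquisition order so the hang is deterministic.
void pause_briefly() { std::this_thread::sleep_for(std::chrono::milliseconds(200)); }

// Thread A: accepting Pipe. Pipe::accept() runs under the messenger lock and
// calls stop_and_wait() on the old Pipe, waiting for its in-flight dispatch.
void accepting_pipe() {
  std::lock_guard<std::mutex> ml(msgr_lock);
  std::unique_lock<std::mutex> ql(quiesce_mtx);
  quiesce_cv.wait(ql, [] { return old_pipe_done; });    // never signalled
}

// Thread B: the old Pipe's reader in ms_fast_dispatch. It cannot finish (and
// wake thread A) until it gets pg_map_lock for read.
void old_pipe_reader() {
  pause_briefly();
  std::shared_lock<std::shared_mutex> rl(pg_map_lock);  // blocked by thread C
  { std::lock_guard<std::mutex> ql(quiesce_mtx); old_pipe_done = true; }
  quiesce_cv.notify_all();
}

// Thread C: main dispatch thread in handle_osd_map -> consume_map. Holds
// pg_map_lock while it tries to lock each PG.
void map_dispatch_thread() {
  std::unique_lock<std::shared_mutex> wl(pg_map_lock);
  pause_briefly();
  std::lock_guard<std::mutex> pl(pg_lock);              // blocked by thread D
}

// Thread D: peering work queue. Holds the PG lock and ends up in
// SimpleMessenger::get_connection(), which needs the messenger lock.
void peering_thread() {
  std::lock_guard<std::mutex> pl(pg_lock);
  pause_briefly();
  std::lock_guard<std::mutex> ml(msgr_lock);            // blocked by thread A
}

int main() {
  std::thread a(accepting_pipe), b(old_pipe_reader),
              c(map_dispatch_thread), d(peering_thread);
  a.join(); b.join(); c.join(); d.join();               // deadlocked; never returns
}

Breaking any one of the four edges (for example, not holding a lock across the call that takes the next one) is enough to break the cycle.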

Actions #5

Updated by Greg Farnum over 9 years ago

From what I recall, none of these are simple locks to get rid of. I'm not actually sure how to go about it; even something extreme like figuring out how to shard up the SimpleMessenger lock and start using RWLocks wouldn't resolve this particular instance.

Actions #6

Updated by Greg Farnum over 9 years ago

  • Status changed from 12 to Fix Under Review
  • Assignee set to Greg Farnum

wip-9921, totally untested.

Actions #8

Updated by Greg Farnum over 9 years ago

  • Assignee changed from Greg Farnum to Sage Weil

Giving Sage this ticket since he took the PR.

Actions #9

Updated by Samuel Just over 9 years ago

I think this is another instance:

ubuntu@teuthology:/a/teuthology-2014-11-13_17:33:44-upgrade:giant-x-next-distro-basic-vps/600134/remote

Actions #10

Updated by Sage Weil over 9 years ago

  • Assignee changed from Sage Weil to Samuel Just

Passed a smoke test; ready for a rados run.

Actions #11

Updated by Sage Weil over 9 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Assignee deleted (Samuel Just)
  • Priority changed from Urgent to High
  • Source changed from other to Q/A
  • Backport set to giant
Actions #12

Updated by Loïc Dachary over 9 years ago

  • Description updated (diff)
Actions #13

Updated by Loïc Dachary over 9 years ago

  • http://pulpito.ceph.com/loic-2015-01-10_11:54:25-rados-giant-backports-testing-basic-vps/693791/ with remote/vpm099/log/ceph-osd.2.log.gz
  • http://pulpito.ceph.com/loic-2015-01-10_11:54:25-rados-giant-backports-testing-basic-vps/693986/ with remote/vpm117/log/ceph-osd.3.log.gz
        -5> 2015-01-10 20:52:19.108730 7f674ba82700 10 journal commit_start nothing to do
        -4> 2015-01-10 20:52:19.108732 7f674ba82700 10 journal commit_start
        -3> 2015-01-10 20:52:19.108740 7f674ba82700 20 filestore(/var/lib/ceph/osd/ceph-2) sync_entry waiting for max_interval 5.000000
        -2> 2015-01-10 20:52:23.776586 7f6752a90700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6737756700' had timed out after 60
        -1> 2015-01-10 20:52:23.776618 7f6752a90700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6737756700' had suicide timed out after 600
         0> 2015-01-10 20:52:23.778171 7f6752a90700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f6752a90700 time 2015-01-10 20:52:23.776640
    common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
    
     ceph version 0.87-87-g0ea20e6 (0ea20e6c51208d6710f469454ab3f964bfa7c9d2)
     1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0xaa04bf]
     2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x2e1) [0x9dfd61]
     3: (ceph::HeartbeatMap::is_healthy()+0xb7) [0x9e0537]
     4: (ceph::HeartbeatMap::check_touch_file()+0x23) [0x9e0ba3]
     5: (CephContextServiceThread::entry()+0x131) [0xab4501]
     6: (()+0x7e9a) [0x7f6755d5ce9a]
     7: (clone()+0x6d) [0x7f67547272ed]
     NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
    
    
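For context on the assert: the suicide timeout is the OSD's internal watchdog. Worker threads periodically touch a per-thread heartbeat handle; a service thread checks whether any handle has gone untouched for longer than its grace period (which logs the "had timed out after 60" warning) or its suicide grace ("had suicide timed out after 600", which aborts the daemon with the assert above). A minimal sketch of that pattern, with hypothetical simplified names rather than Ceph's actual HeartbeatMap interface:

#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdio>

using Clock = std::chrono::steady_clock;

struct HeartbeatHandle {
  const char* name;
  std::chrono::seconds grace;          // past this: log a warning
  std::chrono::seconds suicide_grace;  // past this: abort the daemon
  std::atomic<Clock::rep> last_touch;  // written by the worker, read by the checker
};

// Called by the worker thread whenever it makes progress on an op.
void touch(HeartbeatHandle& h) {
  h.last_touch = Clock::now().time_since_epoch().count();
}

// Called periodically by a service thread; this is the shape of the
// is_healthy() check that produced the log lines and the assert above.
void check(HeartbeatHandle& h) {
  Clock::time_point last{Clock::duration{h.last_touch.load()}};
  auto idle = std::chrono::duration_cast<std::chrono::seconds>(Clock::now() - last);
  if (idle > h.grace)
    std::printf("heartbeat_map is_healthy '%s' had timed out after %lld\n",
                h.name, static_cast<long long>(idle.count()));
  if (idle > h.suicide_grace)
    assert(0 == "hit suicide timeout");  // one stuck worker takes down the whole OSD
}

int main() {
  // Grace values chosen to mirror the log above (60 s warn, 600 s suicide).
  HeartbeatHandle h{"OSD::osd_op_tp thread", std::chrono::seconds(60),
                    std::chrono::seconds(600), {}};
  touch(h);   // a healthy worker keeps doing this as it processes work
  check(h);   // just touched, so neither threshold fires here
}

In the run above the op thread stopped touching its handle entirely, so the checker eventually crossed the suicide grace and killed the OSD.
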
Actions #15

Updated by Loïc Dachary about 9 years ago

  • Status changed from Pending Backport to Resolved
Actions #16

Updated by Wido den Hollander almost 8 years ago

  • Release set to hammer
  • Affected Versions v0.94.5 added

Although this bug is marked as resolved, I am seeing something similar happening with Hammer 0.94.5.

In this 250-OSD cluster I see OSDs crash due to "hit suicide timeout" when one or more clients start to encounter heavy packet loss. In this case it's mostly caused by a full conntrack table on the client, but it can also happen when a client's link is heavily saturated.

OSDs either crash with "hit suicide timeout" or start to respond very, very slowly.

    -4> 2016-07-04 15:38:37.794230 7f8c9b63c700 10 monclient: renew_subs
    -3> 2016-07-04 15:38:37.794235 7f8c9b63c700 10 monclient: _send_mon_message to mon.charlie at [2a00:XX:XX:XX::6789:3]:6789/0
    -2> 2016-07-04 15:38:39.944116 7f8cb45a3700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f8c95630700' had timed out after 15
    -1> 2016-07-04 15:38:39.944146 7f8cb45a3700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f8c95630700' had suicide timed out after 150
     0> 2016-07-04 15:38:39.952054 7f8cb45a3700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f8cb45a3700 time 2016-07-04 15:38:39.944180
common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc60eb]
 2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x2a9) [0xb02089]
 3: (ceph::HeartbeatMap::is_healthy()+0xd6) [0xb02916]
 4: (ceph::HeartbeatMap::check_touch_file()+0x17) [0xb02ff7]
 5: (CephContextServiceThread::entry()+0x154) [0xbd61d4]
 6: (()+0x8182) [0x7f8cb7a0d182]
 7: (clone()+0x6d) [0x7f8cb5f5747d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

As soon as the client is working properly again, the cluster continues just fine.

In this case there are about 30 clients on a 250-OSD cluster doing ~10k IOPS.
