Bug #21662

closed

existing dependency AsyncMessenger::lock (5) -> MonClient::monc_lock (11) at:

Added by Jeff Layton over 6 years ago. Updated almost 3 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
MonClient
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I was running the LibCephFS.ShutdownRace test in an endless loop and eventually the program crashed. Unfortunately, the subject line of this bug is the last line in the client log, so we don't have the backtrace for the existing dependency. I do have a core, however. Here's the backtrace of the thread that triggered abort():

(gdb) bt
#0  0x00007ffff662969b in raise () from /lib64/libc.so.6
#1  0x00007ffff662b4a0 in abort () from /lib64/libc.so.6
#2  0x00007fffedee146b in lockdep_will_lock (name=<optimized out>, id=<optimized out>, force_backtrace=false) at /home/jlayton/git/ceph/src/common/lockdep.cc:319
#3  0x00007fffedb9d0f9 in Mutex::_will_lock (this=0x7ffde8094770) at /home/jlayton/git/ceph/src/common/Mutex.h:56
#4  Mutex::Lock (this=this@entry=0x7ffde8094770, no_lockdep=no_lockdep@entry=false) at /home/jlayton/git/ceph/src/common/Mutex.cc:92
#5  0x00007fffedd49cd2 in Mutex::Locker::Locker (m=..., this=<synthetic pointer>) at /home/jlayton/git/ceph/src/common/Mutex.h:115
#6  AsyncMessenger::get_connection (this=this@entry=0x7ffde8094260, dest=...) at /home/jlayton/git/ceph/src/msg/async/AsyncMessenger.cc:531
#7  0x00007fffedc1b7e7 in MonClient::_add_conn (this=this@entry=0x7ffde8093c20, rank=0, global_id=global_id@entry=0) at /home/jlayton/git/ceph/src/mon/MonClient.cc:633
#8  0x00007fffedc1e4f3 in MonClient::_add_conns (this=this@entry=0x7ffde8093c20, global_id=0) at /home/jlayton/git/ceph/src/mon/MonClient.cc:665
#9  0x00007fffedc1ef87 in MonClient::_reopen_session (this=this@entry=0x7ffde8093c20, rank=rank@entry=-1) at /home/jlayton/git/ceph/src/mon/MonClient.cc:600
#10 0x00007fffedc1eb1d in MonClient::_renew_subs (this=this@entry=0x7ffde8093c20) at /home/jlayton/git/ceph/src/mon/MonClient.cc:823
#11 0x00007fffedf1bae2 in MonClient::renew_subs (this=0x7ffde8093c20) at /home/jlayton/git/ceph/src/mon/MonClient.h:286
#12 Objecter::_maybe_request_map (this=this@entry=0x7ffde81208d0) at /home/jlayton/git/ceph/src/osdc/Objecter.cc:1978
#13 0x00007fffedf36817 in Objecter::start (this=0x7ffde81208d0, o=o@entry=0x0) at /home/jlayton/git/ceph/src/osdc/Objecter.cc:404
#14 0x00007ffff7ad6aa7 in StandaloneClient::init (this=0x7ffde811f830) at /home/jlayton/git/ceph/src/client/Client.cc:13841
#15 0x00007ffff7ad0c44 in ceph_mount_info::init (this=0x7ffde808be40) at /home/jlayton/git/ceph/src/libcephfs.cc:98
#16 ceph_mount_info::mount (perms=..., mount_root="/", this=0x7ffde808be40) at /home/jlayton/git/ceph/src/libcephfs.cc:119
#17 ceph_mount (cmount=0x7ffde808be40, root=root@entry=0x555555637431 "/") at /home/jlayton/git/ceph/src/libcephfs.cc:448
#18 0x00005555555b0ca1 in shutdown_racer_func () at /home/jlayton/git/ceph/src/test/libcephfs/test.cc:1874
#19 0x00007ffff6fad01f in ?? () from /lib64/libstdc++.so.6
#20 0x00007ffff78a636d in start_thread () from /lib64/libpthread.so.0
#21 0x00007ffff6703bbf in clone () from /lib64/libc.so.6

This is basically just today's master branch + a patch that I have to fix bug #21512.
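
For readers unfamiliar with this kind of abort, here is a minimal sketch of lock-order checking in general. It is not Ceph's lockdep code: the LockOrderGraph class and its add_dependency method are hypothetical names made up for illustration. The idea is that every "A held while taking B" pair is recorded as an edge A -> B, and a new edge is rejected if the reverse path already exists. The backtrace above suggests that pattern: MonClient code calls AsyncMessenger::get_connection, which takes AsyncMessenger::lock, while the recorded order is AsyncMessenger::lock -> MonClient::monc_lock.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified lock-order tracker; NOT Ceph's lockdep implementation.
class LockOrderGraph {
 public:
  // Record that `next` is being acquired while `held` is already held.
  // Returns false if the edge held -> next would close a cycle, i.e. some
  // earlier code path acquired the two locks in the opposite order.
  bool add_dependency(const std::string& held, const std::string& next) {
    assert(held != next);  // recursive locking is out of scope for this sketch
    if (reachable(next, held)) {
      return false;  // a path next -> ... -> held exists already
    }
    edges_[held].insert(next);
    return true;
  }

 private:
  // Iterative depth-first search: is `to` reachable from `from`?
  bool reachable(const std::string& from, const std::string& to) const {
    std::set<std::string> seen;
    std::vector<std::string> stack{from};
    while (!stack.empty()) {
      std::string cur = stack.back();
      stack.pop_back();
      if (cur == to) return true;
      if (!seen.insert(cur).second) continue;  // already visited
      auto it = edges_.find(cur);
      if (it == edges_.end()) continue;
      for (const auto& n : it->second) stack.push_back(n);
    }
    return false;
  }

  std::map<std::string, std::set<std::string>> edges_;
};
```

Feeding it the existing edge AsyncMessenger::lock -> MonClient::monc_lock and then the reverse order makes the second call return false, which is roughly the condition a lock-order checker would abort on.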

Actions #1

Updated by Jeff Layton over 6 years ago

I made the test race even harder and hit several other bugs, e.g.:

2017-10-03 12:31:41.084906 7fffb67fc700 -1 asok(0x7fffa0140530) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/tmp/ceph-asok.8EFKsy/client.admin.9718.asok': (17) File exists

...and...

existing dependency PerfCounters::client (5) -> Client::client_lock (8) at:

2017-10-03 14:32:23.362998 7fff71ffb700 -1 WARNING: all dangerous and experimental features are enabled.
2017-10-03 14:32:23.363213 7ffd637de700 -1 WARNING: all dangerous and experimental features are enabled.
2017-10-03 14:32:23.363213 7fff517fa700 -1 WARNING: all dangerous and experimental features are enabled.
2017-10-03 14:32:23.363580 7fffb3fff700  0 new dependency Client::client_lock (8) -> MonClient::monc_lock (5) creates a cycle at
 ceph version 12.1.2-2633-g7eab8e9212e1 (7eab8e9212e150a053c678a1186012086cd9510f) mimic (dev)
 1: (MonClient::init()+0x187) [0x7fffedc1b027]
 2: (()+0x18a8d) [0x7ffff7ad6a8d]
 3: (ceph_mount()+0x144) [0x7ffff7ad0c44]
 4: (()+0x5cc89) [0x5555555b0c89]
 5: (()+0xbc01f) [0x7ffff6fad01f]
 6: (()+0x736d) [0x7ffff78a636d]
 7: (clone()+0x3f) [0x7ffff6703bbf]

We may just have to settle for putting a big global mutex around ceph_mount/unmount, but first I'll experiment with turning lockdep off. It's possible that it just can't cope with this sort of concurrent setup/teardown.
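
A rough sketch of what the global-mutex fallback could look like, assuming a single process-wide std::mutex taken around both calls. fake_mount and fake_unmount are hypothetical stand-ins so the snippet compiles without libcephfs; a real version would call ceph_mount() and ceph_unmount() at those points. The wrapper names are made up as well.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for ceph_mount()/ceph_unmount(); they only count
// active mounts so the sketch is self-contained.
static int mounts_active = 0;
int fake_mount() { return ++mounts_active; }
int fake_unmount() {
  assert(mounts_active > 0);  // unmount without a matching mount is a caller bug
  return --mounts_active;
}

// One process-wide mutex so that setup and teardown never overlap.
static std::mutex mount_mutex;

int serialized_mount() {
  std::lock_guard<std::mutex> guard(mount_mutex);
  return fake_mount();
}

int serialized_unmount() {
  std::lock_guard<std::mutex> guard(mount_mutex);
  return fake_unmount();
}
```

This trades concurrency during mount/unmount for simplicity: threads that race on setup and teardown are serialized, and everything after the mount completes runs without the global lock.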

Actions #2

Updated by Sage Weil almost 3 years ago

  • Status changed from New to Closed