Bug #10637

closed

librbd: async resize down can deadlock (at least with stub librados)

Added by Josh Durgin over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Urgent
Assignee: Jason Dillaman
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The deadlock appears to occur when a callback running on a finisher thread issues a synchronous operation and then blocks waiting for that operation's completion callback. In the librados test stub, that completion is queued on the same finisher, so it can never run while the first callback is blocked. This also happens at higher levels of concurrency, though less often. Backtraces:

thread 1:
Thread 1 (Thread 0x7fedc83ce760 (LWP 13331)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x000000000092034a in Cond::Wait (this=0x7fff81c21e40, mutex=...) at ./common/Cond.h:55
#2  0x0000000000955204 in librbd::resize (ictx=0x7fedb4001c80, size=2097152, prog_ctx=...) at librbd/internal.cc:1594
#3  0x000000000091c482 in rbd_resize (image=0x7fedb4001c80, size=2097152) at librbd/librbd.cc:1055
#4  0x00000000008c0be6 in TestLibRBD_TestClone_Test::TestBody (this=0x4c6bc30) at test/librbd/test_librbd.cc:1282
#5  0x00000000009f263f in testing::Test::Run (this=0x4c6bc30) at ./src/gtest.cc:2095
#6  0x00000000009f2c99 in testing::internal::TestInfoImpl::Run (this=0x4c57640) at ./src/gtest.cc:2314
#7  0x00000000009f34de in testing::TestCase::Run (this=0x4c55c20) at ./src/gtest.cc:2420
#8  0x00000000009f8ddf in testing::internal::UnitTestImpl::RunAllTests (this=0x4c528f0) at ./src/gtest.cc:4024
#9  0x00000000009f6f9a in testing::UnitTest::Run (this=0xf892c0) at ./src/gtest.cc:3687

Thread 16 (Thread 0x7fedc0dad710 (LWP 13343)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x000000000092034a in Cond::Wait (this=0x4c83068, mutex=...) at ./common/Cond.h:55
#2  0x00000000009a41d8 in librados::AioCompletionImpl::wait_for_safe (this=0x4c83010) at ./librados/AioCompletionImpl.h:85
#3  0x00000000009a3860 in librados::TestIoCtxImpl::operate (this=0x7fedb400e900, oid=..., ops=...) at test/librados_test_stub/TestIoCtxImpl.cc:144
#4  0x000000000099029a in librados::IoCtx::operate (this=0x7fedb4001e00, oid=..., op=0x7fedc0dacb90) at test/librados_test_stub/LibradosTestStub.cc:441
#5  0x0000000000966c79 in librbd::AsyncResizeHelperFinishContext::finish (this=0x7fedb400e0a0, r=0) at librbd/internal.cc:1715
#6  0x00000000009144bb in Context::complete (this=0x7fedb400e0a0, r=0) at ./include/Context.h:65
#7  0x0000000000a950fd in ceph::ContextCompletion::finish_op (this=0x7fedb4001510, r=0) at common/ContextCompletion.cc:44
#8  0x0000000000965372 in ceph::C_ContextCompletion::finish (this=0x7fedb4001aa0, r=0) at ./common/ContextCompletion.h:38
#9  0x00000000009144bb in Context::complete (this=0x7fedb4001aa0, r=0) at ./include/Context.h:65
#10 0x000000000096143e in librbd::rados_ctx_cb (c=0x7fedb401cc90, arg=0x7fedb4001aa0) at librbd/internal.cc:3407
#11 0x00000000009a98dd in finish_aio_completion (c=0x7fedb401cc90, r=0) at test/librados_test_stub/TestRadosClient.cc:47
#12 0x00000000009aab5c in librados::AioFunctionContext::finish (this=0x7fedb401ca30, r=0) at test/librados_test_stub/TestRadosClient.cc:72
#13 0x00000000009144bb in Context::complete (this=0x7fedb401ca30, r=0) at ./include/Context.h:65
#14 0x0000000000a5e308 in Finisher::finisher_thread_entry (this=0x4c6b480) at common/Finisher.cc:61
#15 0x0000000000927ec4 in Finisher::FinisherThread::entry (this=0x4c6b598) at ./common/Finisher.h:46
#16 0x0000000000a4e6e8 in Thread::entry_wrapper (this=0x4c6b598) at common/Thread.cc:61
#17 0x0000000000a4e65a in Thread::_entry_func (arg=0x4c6b598) at common/Thread.cc:45
#18 0x00007fedc7b988ba in start_thread (arg=<value optimized out>) at pthread_create.c:300
#19 0x00007fedc668802d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112

This can always be reproduced by:

LIBRADOS_CONCURRENCY=1 ./unittest_librbd --gtest_filter=TestLibRBD.TestClone
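
The pattern visible in thread 16 above (a finisher callback blocking on a completion that only the same finisher can deliver) can be sketched in isolation. This is a minimal illustration, not Ceph code: `MiniFinisher` and `inner_completes` are hypothetical names, and the wait is bounded by a timeout so the sketch terminates, whereas the real `wait_for_safe` call blocks indefinitely.

```cpp
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a single-threaded finisher: one worker thread
// runs queued callbacks strictly one at a time, in FIFO order.
class MiniFinisher {
public:
  MiniFinisher() : worker_([this] { run(); }) {}
  ~MiniFinisher() {
    {
      std::lock_guard<std::mutex> l(lock_);
      stop_ = true;
    }
    cond_.notify_all();
    worker_.join();  // drains any remaining callbacks before returning
  }
  void queue(std::function<void()> fn) {
    {
      std::lock_guard<std::mutex> l(lock_);
      queue_.push_back(std::move(fn));
    }
    cond_.notify_all();
  }

private:
  void run() {
    std::unique_lock<std::mutex> l(lock_);
    while (true) {
      cond_.wait(l, [this] { return stop_ || !queue_.empty(); });
      if (queue_.empty()) return;  // stop requested and nothing left to run
      auto fn = std::move(queue_.front());
      queue_.pop_front();
      l.unlock();
      fn();  // the next callback cannot start until this one returns
      l.lock();
    }
  }
  std::mutex lock_;
  std::condition_variable cond_;
  std::deque<std::function<void()>> queue_;
  bool stop_ = false;
  std::thread worker_;  // declared last so the other members exist first
};

// Runs an "outer" callback on `outer`; that callback queues an "inner"
// completion on `inner` and waits for it. Returns true if the inner
// completion ran before `timeout` expired.
bool inner_completes(MiniFinisher& outer, MiniFinisher& inner,
                     std::chrono::milliseconds timeout) {
  auto result = std::make_shared<std::promise<bool>>();
  auto result_future = result->get_future();
  outer.queue([result, &inner, timeout] {
    auto inner_done = std::make_shared<std::promise<void>>();
    auto inner_future = inner_done->get_future();
    inner.queue([inner_done] { inner_done->set_value(); });
    // The real code path waits without a timeout; bounded here for the demo.
    bool ok = inner_future.wait_for(timeout) == std::future_status::ready;
    result->set_value(ok);
  });
  return result_future.get();
}
```

Calling `inner_completes(f, f, ...)` with the same finisher for both roles times out, because the only worker thread is the one doing the waiting. Passing two distinct finishers lets the inner completion run. The second case only illustrates why the wait is the problem; it is not a description of the fix that was committed.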
Actions #1

Updated by Jason Dillaman over 9 years ago

  • Status changed from New to Fix Under Review
Actions #2

Updated by Josh Durgin over 9 years ago

  • Status changed from Fix Under Review to Resolved
  • Assignee set to Jason Dillaman

commit:5301b2b7057a49eb336dbe2ab3e49f4d5bbdc0a9
