Project

General

Profile

Bug #13396

use after free of cct in ObjectCacher::flusher

Added by Josh Durgin over 8 years ago. Updated over 8 years ago.

Status:
Resolved
Priority:
High
Assignee:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

So far only seen in hammer-backports, so it may be an issue with a backport. Trying to reproduce now.

This is the run: http://149.202.171.88:8081/ubuntu-2015-10-05_16:45:42-rbd-hammer-backports---basic-openstack/26/

ceph_test_librbd crashed after getting to this point:

2015-10-05T21:55:16.315 INFO:tasks.workunit.client.0.target173053.stdout:[ RUN      ] TestImageWatcher.NotifyAsyncTimedOut

Threads with non-boring backtraces:

Thread 41 (Thread 0x7f98c288c840 (LWP 10270)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000704469 in Wait (mutex=..., this=0x7ffdde41e660) at ./common/Cond.h:55
#2  librados::IoCtxImpl::operate_read (this=this@entry=0x4c7eb20, oid=..., o=o@entry=0x7ffdde41e760, pbl=pbl@entry=0x0, flags=flags@entry=0) at librados/IoCtxImpl.cc:558
#3  0x00000000007082ad in librados::IoCtxImpl::stat (this=0x4c7eb20, oid=..., psize=0x7ffdde41e758, psize@entry=0x0, pmtime=pmtime@entry=0x0) at librados/IoCtxImpl.cc:984
#4  0x00000000006cb584 in librados::IoCtx::stat (this=this@entry=0x4cb2540, oid=..., psize=psize@entry=0x0, pmtime=pmtime@entry=0x0) at librados/librados.cc:1150
#5  0x0000000000681245 in librbd::detect_format (io_ctx=..., name=..., old_format=old_format@entry=0x0, size=size@entry=0x0) at librbd/internal.cc:169
#6  0x00000000006841c5 in librbd::create (io_ctx=..., imgname=imgname@entry=0x4c86fc8 "image21", size=size@entry=2097152, old_format=old_format@entry=false, features=features@entry=1, order=order@entry=0x7ffdde41ec34, stripe_unit=stripe_unit@entry=0, 
    stripe_count=stripe_count@entry=0) at librbd/internal.cc:1050
#7  0x000000000064c1b8 in librbd::RBD::create2 (this=this@entry=0x4cb2548, io_ctx=..., name=0x4c86fc8 "image21", size=size@entry=2097152, features=1, order=order@entry=0x7ffdde41ec34) at librbd/librbd.cc:262
#8  0x000000000064b3dd in create_image_pp (rbd=..., ioctx=..., name=..., size=size@entry=2097152) at test/librbd/test_support.cc:29
#9  0x000000000064ac43 in TestFixture::SetUp (this=0x4cb2530) at test/librbd/test_fixture.cc:40
#10 0x00000000009d7d23 in HandleSehExceptionsInMethodIfSupported<testing::Test, void> (location=0xaab08e "SetUp()", method=<optimized out>, object=<optimized out>) at ./src/gtest.cc:2078
#11 testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=object@entry=0x4cb2530, method=<optimized out>, location=location@entry=0xaab08e "SetUp()") at ./src/gtest.cc:2114
#12 0x00000000009caa90 in testing::Test::Run (this=this@entry=0x4cb2530) at ./src/gtest.cc:2146
#13 0x00000000009cab9e in testing::TestInfo::Run (this=0x4b885e0) at ./src/gtest.cc:2326
#14 0x00000000009caca5 in testing::TestCase::Run (this=0x4b868e0) at ./src/gtest.cc:2444
#15 0x00000000009caf58 in testing::internal::UnitTestImpl::RunAllTests (this=0x4b82dd0) at ./src/gtest.cc:4315
#16 0x00000000009cb204 in HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (location=<optimized out>, method=<optimized out>, object=<optimized out>) at ./src/gtest.cc:2078
#17 HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (location=0xaac438 "auxiliary test code (environments or event listeners)", 
    method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x9cad10 <testing::internal::UnitTestImpl::RunAllTests()>, object=0x4b82dd0) at ./src/gtest.cc:2114
#18 testing::UnitTest::Run (this=<optimized out>) at ./src/gtest.cc:3929
#19 0x00000000005ca51b in RUN_ALL_TESTS () at ../src/gmock/gtest/include/gtest/gtest.h:2288
#20 main (argc=1, argv=<optimized out>) at test/librbd/test_main.cc:45

Thread 32 (Thread 0x7f98b45f6700 (LWP 14654)):
#0  0x00007f98c1f61745 in ?? () from /usr/lib/x86_64-linux-gnu/libnss3.so
#1  0x00007f98c1f6190a in ?? () from /usr/lib/x86_64-linux-gnu/libnss3.so
#2  0x00007f98c1f6266a in PK11_ImportSymKey () from /usr/lib/x86_64-linux-gnu/libnss3.so
#3  0x0000000000930aad in nss_aes_operation (op=260, secret=..., in=..., out=..., error=...) at auth/Crypto.cc:127
#4  0x000000000092f8dd in CryptoKey::encrypt (this=this@entry=0x7f9890000c08, cct=cct@entry=0x4c7c450, in=..., out=..., error=...) at auth/Crypto.cc:358
#5  0x000000000092f2ac in encode_encrypt_enc_bl<ceph::buffer::list> (error=..., out=..., key=..., t=..., cct=0x4c7c450) at auth/cephx/CephxProtocol.h:465
#6  encode_encrypt<ceph::buffer::list> (cct=0x4c7c450, t=..., key=..., out=..., error=...) at auth/cephx/CephxProtocol.h:490
#7  0x000000000092d24f in CephxSessionHandler::check_message_signature (this=0x7f9890000bf0, m=0x7f9888006fe0) at auth/cephx/CephxSessionHandler.cc:98
#8  0x00000000008e0875 in Pipe::read_message (this=this@entry=0x4c8d000, pm=pm@entry=0x7f98b45f5cb0, auth_handler=auth_handler@entry=0x7f9890000bf0) at msg/simple/Pipe.cc:2085
#9  0x00000000008edf89 in Pipe::reader (this=0x4c8d000) at msg/simple/Pipe.cc:1581
#10 0x00000000008f7a0d in Pipe::Reader::entry (this=<optimized out>) at msg/simple/Pipe.h:50
#11 0x00007f98c2258182 in start_thread (arg=0x7f98b45f6700) at pthread_create.c:312
#12 0x00007f98c078f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 1 (Thread 0x7f986fff7700 (LWP 14524)):
#0  ceph::log::SubsystemMap::should_gather (this=0x90, sub=19, level=11) at ./log/SubsystemMap.h:62
#1  0x000000000074b7ce in ObjectCacher::flusher_entry (this=0x4c72b80) at osdc/ObjectCacher.cc:1527
#2  0x000000000075c29d in ObjectCacher::FlusherThread::entry (this=<optimized out>) at osdc/ObjectCacher.h:374
#3  0x00007f98c2258182 in start_thread (arg=0x7f986fff7700) at pthread_create.c:312
#4  0x00007f98c078f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

This ObjectCacher->cct is clearly bogus (zero ref count, nonsense for most other values):

p *this->cct
$3 = {nref = {val = 0}, _conf = 0x0, _log = 0x0, _module_type = 65, _crypto_inited = false, _service_thread = 0x1b, _log_obs = 0x1b, _admin_socket = 0x0, _service_thread_lock = {lock = 1396986703}, _perf_counters_collection = 0x706d6f633a3a6e6f, 
  _perf_counters_conf_obs = 0x6c5f6e6f6974656c, _admin_hook = 0x6b636f <librbd::WatchNotify::NotifyMessage::decode(ceph::buffer::list::iterator&)+3183>, _heartbeat_map = 0x61, _associated_objs_lock = {lock = 79982328}, _associated_objs = {_M_t = {
      _M_impl = {<std::allocator<std::_Rb_tree_node<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, CephContext::AssociatedSingletonObject*> > >> = {<__gnu_cxx::new_allocator<std::_Rb_tree_node<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, CephContext::AssociatedSingletonObject*> > >> = {<No data fields>}, <No data fields>}, 
        _M_key_compare = {<std::binary_function<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool>> = {<No data fields>}, <No data fields>}, _M_header = {_M_color = std::_S_red, 
          _M_parent = 0x0, _M_left = 0x2, _M_right = 0x0}, _M_node_count = 0}}}, _crypto_none = 0x0, _crypto_aes = 0x0, _cct_obs = 0x0, _feature_lock = {lock = 0}, _experimental_features = {_M_t = {
      _M_impl = {<std::allocator<std::_Rb_tree_node<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >> = {<__gnu_cxx::new_allocator<std::_Rb_tree_node<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >> = {<No data fields>}, <No data fields>}, _M_key_compare = {<std::binary_function<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool>> = {<No data fields>}, <No data fields>}, _M_header = {
          _M_color = (std::_S_black | unknown: 26), _M_parent = 0x1b, _M_left = 0x0, _M_right = 0x697373655344534f}, _M_node_count = 8101253776678743663}}}, _lockdep_obs = 0x6c5f6e6f6974656c}

History

#1 Updated by Josh Durgin over 8 years ago

reproduced the first time on hammer-backports, trying against plain hammer

#2 Updated by Josh Durgin over 8 years ago

Doesn't seem to be reproducing on hammer, though I don't see any suspicious changes in hammer-backports. I'm guessing it's an existing race that's made easier to trigger.

In the original core, it's odd that the test-running thread is creating image21, while the object cacher with the invalid cct references image44:

#1  0x000000000074b7ce in ObjectCacher::flusher_entry (this=0x4c72b80) at osdc/ObjectCacher.cc:1527
1527    osdc/ObjectCacher.cc: No such file or directory.
(gdb) p this->name
$6 = {static npos = <optimized out>, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x4c7c608 "librbd--test-rados-api-target173053.teuthology-10270-1/image44"}}
(gdb) thread 41
(gdb) bt
...
#6  0x00000000006841c5 in librbd::create (io_ctx=..., imgname=imgname@entry=0x4c86fc8 "image21", size=size@entry=2097152, old_format=old_format@entry=false, features=features@entry=1, order=order@entry=0x7ffdde41ec34, stripe_unit=stripe_unit@entry=0, 
...

#3 Updated by Josh Durgin over 8 years ago

  • Status changed from New to Resolved

This was only an issue with the backport.

It should be fixed in https://github.com/ceph/ceph/pull/6097 now, with the additional backport of commit:a5e88fcf11ebb16d4a1405b21cf6f895b37f185e

Also available in: Atom PDF