Bug #44362

osd: uninitialized memory in sendmsg

Added by Sage Weil 9 months ago. Updated about 2 months ago.

Status: Can't reproduce
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature:

Description

  <kind>SyscallParam</kind>
  <what>Syscall param sendmsg(msg.msg_iov[1]) points to uninitialised byte(s)</what>
  <stack>
    <frame>
      <ip>0xC24CAF7</ip>
      <obj>/usr/lib64/libpthread-2.28.so</obj>
      <fn>sendmsg</fn>
    </frame>
    <frame>
      <ip>0x10C685F</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>do_sendmsg</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/msg/async</dir>
      <file>PosixStack.cc</file>
      <line>80</line>
    </frame>
    <frame>
      <ip>0x10C685F</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>PosixConnectedSocketImpl::send(ceph::buffer::v14_2_0::list&amp;, bool)</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/msg/async</dir>
      <file>PosixStack.cc</file>
      <line>129</line>
    </frame>
    <frame>
      <ip>0x107C8D0</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>send</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/msg/async</dir>
      <file>Stack.h</file>
      <line>100</line>
    </frame>
    <frame>
      <ip>0x107C8D0</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>AsyncConnection::_try_send(bool)</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/msg/async</dir>
      <file>AsyncConnection.cc</file>
      <line>330</line>
    </frame>
    <frame>
      <ip>0x107CEFA</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>AsyncConnection::write(ceph::buffer::v14_2_0::list&amp;, std::function&lt;void (long)&gt;, bool)</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/msg/async</dir>
      <file>AsyncConnection.cc</file>
      <line>309</line>
    </frame>
    <frame>
      <ip>0x10A7BA5</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ProtocolV2::write(std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const&amp;, CtFun&lt;ProtocolV2&gt;&amp;, ceph::buffer::v14_2_0::list&amp;)</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/msg/async</dir>
      <file>ProtocolV2.cc</file>
      <line>777</line>
    </frame>
...
  <auxwhat>Address 0xfc93680 is 0 bytes inside a block of size 4,096 alloc'd</auxwhat>
  <stack>
    <frame>
      <ip>0xA80751C</ip>
      <obj>/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so</obj>
      <fn>memalign</fn>
      <dir>/builddir/build/BUILD/valgrind-3.15.0/coregrind/m_replacemalloc</dir>
      <file>vg_replace_malloc.c</file>
      <line>908</line>
    </frame>
    <frame>
      <ip>0xA807629</ip>
      <obj>/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so</obj>
      <fn>posix_memalign</fn>
      <dir>/builddir/build/BUILD/valgrind-3.15.0/coregrind/m_replacemalloc</dir>
      <file>vg_replace_malloc.c</file>
      <line>1072</line>
    </frame>
    <frame>
      <ip>0xEE4B8D</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>create</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/common</dir>
      <file>buffer.cc</file>
      <line>120</line>
    </frame>
    <frame>
      <ip>0xEE4B8D</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ceph::buffer::v14_2_0::list::refill_append_space(unsigned int)</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/common</dir>
      <file>buffer.cc</file>
      <line>1324</line>
    </frame>
    <frame>
      <ip>0xEE4FFA</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ceph::buffer::v14_2_0::list::append_hole(unsigned int)</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/common</dir>
      <file>buffer.cc</file>
      <line>1444</line>
    </frame>
    <frame>
      <ip>0x10C7690</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ceph::crypto::onwire::AES128GCM_OnWireTxHandler::authenticated_encrypt_final()</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/msg/async</dir>
      <file>crypto_onwire.cc</file>
      <line>121</line>
    </frame>
    <frame>
      <ip>0x10C48DE</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>get_buffer</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/msg/async</dir>
      <file>frames_v2.h</file>
      <line>274</line>
    </frame>
    <frame>
      <ip>0x10C48DE</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>Ct&lt;ProtocolV2&gt;* ProtocolV2::write&lt;ceph::msgr::v2::AuthSignatureFrame&gt;(std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const&amp;, CtFun&lt;ProtocolV2&gt;&amp;, ceph::msgr::v2::AuthSignatureFrame&amp;)</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/msg/async</dir>
      <file>ProtocolV2.cc</file>
      <line>763</line>
    </frame>
    <frame>
      <ip>0x10B27B8</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ProtocolV2::handle_auth_done(ceph::buffer::v14_2_0::list&amp;)</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/msg/async</dir>
      <file>ProtocolV2.cc</file>
      <line>1873</line>
    </frame>

/a/sage-2020-03-01_17:33:08-rados-wip-sage2-testing-2020-03-01-0811-distro-basic-smithi/4816360

Related issues

Related to RADOS - Bug #38827: valgrind: UninitCondition in ceph::crypto::onwire::AES128GCM_OnWireRxHandler::authenticated_decrypt_update_final() Resolved 03/20/2019

History

#1 Updated by Sage Weil 9 months ago

The regression is between these commits: d27f512d1731988cf7f369559f2fc324f1592047..7b0e18c09eb6060ee23f00c06dac4203a2b99608

bad: http://pulpito.ceph.com/sage-2020-03-02_17:19:00-rados:verify-master-distro-basic-smithi/
good: http://pulpito.ceph.com/sage-2020-03-02_17:20:04-rados:verify-master-distro-basic-smithi/

I assume it's this commit:

commit dc8946199983342217e30032b18ab7b90b8a83c6
Author: Yehuda Sadeh <ysadehwe@redhat.com>
Date:   Tue Feb 11 18:42:39 2020 -0800

    auth: treat mgr the same as mon when selecting auth mode

    Also use mon_cluster_modes (and not cluster_modes) when peer is mon/mgr.

    Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>

diff --git a/src/auth/AuthRegistry.cc b/src/auth/AuthRegistry.cc
index 2229c8ab906..897aa81f089 100644
--- a/src/auth/AuthRegistry.cc
+++ b/src/auth/AuthRegistry.cc
@@ -196,6 +196,7 @@ void AuthRegistry::get_supported_methods(
     if (modes) {
       switch (peer_type) {
       case CEPH_ENTITY_TYPE_MON:
+      case CEPH_ENTITY_TYPE_MGR:
        *modes = mon_client_modes;
        break;
       default:
@@ -204,10 +205,12 @@ void AuthRegistry::get_supported_methods(
     }
     return;
   case CEPH_ENTITY_TYPE_MON:
-    // i am mon
+  case CEPH_ENTITY_TYPE_MGR:
+    // i am mon/mgr
     switch (peer_type) {
     case CEPH_ENTITY_TYPE_MON:
-      // they are mon
+    case CEPH_ENTITY_TYPE_MGR:
+      // they are mon/mgr
       if (methods) {
        *methods = cluster_methods;
       }
@@ -230,6 +233,14 @@ void AuthRegistry::get_supported_methods(
     switch (peer_type) {
     case CEPH_ENTITY_TYPE_MON:
     case CEPH_ENTITY_TYPE_MGR:
+      // they are a mon daemon
+      if (methods) {
+       *methods = cluster_methods;
+      }
+      if (modes) {
+       *modes = mon_cluster_modes;
+      }
+      break;
     case CEPH_ENTITY_TYPE_MDS:
     case CEPH_ENTITY_TYPE_OSD:
       // they are another daemon

which makes sense since the bad read is in the auth code somewhere. i'm not sure why it's happening though.

#2 Updated by Sage Weil 9 months ago

the takeaway from http://pulpito.ceph.com/sage-2020-03-02_17:19:00-rados:verify-master-distro-basic-smithi/ is that the mon_recovery tests (mon thrashing) trigger it, but the rados cls tests do not. both have ms failure injection, but the cls tests have msgr2 disabled... which is probably the reason?

#3 Updated by Yehuda Sadeh 9 months ago

It seems to me that the specific commit just exposed an existing issue that for some reason didn't show up before (likely because the specific conditions weren't tested under valgrind).
Looking at the code here:

ceph::bufferlist AES128GCM_OnWireTxHandler::authenticated_encrypt_final()
{
  int final_len = 0;
  auto filler = buffer.append_hole(AESGCM_BLOCK_LEN);
  if(1 != EVP_EncryptFinal_ex(ectx.get(),
    reinterpret_cast<unsigned char*>(filler.c_str()),
    &final_len)) {
    throw std::runtime_error("EVP_EncryptFinal_ex failed");
  }
  ceph_assert_always(final_len == 0);

The assumption is that final_len is zero, so EVP_EncryptFinal_ex() isn't expected to write anything. That means the buffer created through buffer.append_hole() never gets initialized. I'm not sure why filler is even needed. I will try to reset the buffer before the call.
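
To make the suspicion above concrete, here is a minimal sketch. It is not Ceph code: ToyBuffer, toy_finalize and reserve_zeroed_hole are hypothetical stand-ins for bufferlist, EVP_EncryptFinal_ex() and the handler. It only shows the general pattern: a reserved hole stays uninitialized when the finalizer reports zero bytes written, unless the caller clears it first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical stand-in for a bufferlist: raw storage plus a used length.
struct ToyBuffer {
  std::unique_ptr<unsigned char[]> data;
  std::size_t capacity;
  std::size_t used = 0;

  // new[] of unsigned char leaves the storage uninitialized.
  explicit ToyBuffer(std::size_t cap)
    : data(new unsigned char[cap]), capacity(cap) {}

  // Reserve len bytes at the tail and return a pointer to them.
  // The bytes are NOT initialized here, like the hole in the report.
  unsigned char* append_hole(std::size_t len) {
    if (used + len > capacity) return nullptr;
    unsigned char* p = data.get() + used;
    used += len;
    return p;
  }
};

// Hypothetical stand-in for a finalize call that writes nothing.
inline int toy_finalize(unsigned char* /*out*/, int* out_len) {
  *out_len = 0;
  return 1;
}

// Reserve a hole, zero it defensively, then run the finalizer.
// Returns nullptr if the hole does not fit or the finalizer misbehaves.
inline unsigned char* reserve_zeroed_hole(ToyBuffer& buf, std::size_t len) {
  unsigned char* hole = buf.append_hole(len);
  if (hole == nullptr) return nullptr;
  std::memset(hole, 0, len);  // every byte is defined from here on
  int final_len = -1;
  if (toy_finalize(hole, &final_len) != 1 || final_len != 0) return nullptr;
  return hole;
}
```

Calling reserve_zeroed_hole(buf, 16) on a ToyBuffer of capacity 32 returns a pointer to 16 zeroed bytes. Whether zeroing is the right change for the real handler is a separate question, since a later step may be expected to fill the hole anyway.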

#4 Updated by Sage Weil 9 months ago

  • Status changed from New to In Progress
  • Assignee set to Yehuda Sadeh

#5 Updated by Radoslaw Zarzynski 9 months ago

  • Related to Bug #38827: valgrind: UninitCondition in ceph::crypto::onwire::AES128GCM_OnWireRxHandler::authenticated_decrypt_update_final() added

#6 Updated by Radoslaw Zarzynski 9 months ago

The hole represented by filler is supposed to carry two things:
  • A zero-byte-long ciphertext fragment obtained from EVP_EncryptFinal_ex. Generally speaking, the function should be given one cipher block of extra space for padding. GCM doesn't pad, but the interface doesn't make clear whether passing nullptr is safe in that case. Because we interpret the documentation literally, we pass a valid pointer with enough space and later verify that nothing was actually written to it.
  • The authentication tag, which should be exactly AESGCM_BLOCK_LEN long. This aspect is much more important than the previous one. EVP_CTRL_GCM_GET_TAG should actually fill the buffer.
  auto filler = buffer.append_hole(AESGCM_BLOCK_LEN);
  // ...
  static_assert(AESGCM_BLOCK_LEN == AESGCM_TAG_LEN);
  if(1 != EVP_CIPHER_CTX_ctrl(ectx.get(),
        EVP_CTRL_GCM_GET_TAG, AESGCM_TAG_LEN,
        filler.c_str())) {
    throw std::runtime_error("EVP_CIPHER_CTX_ctrl failed");
  }
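
The two roles of the hole described above can be sketched without OpenSSL. This is only an illustration under stated assumptions: toy_encrypt_final and toy_get_tag are hypothetical stand-ins for EVP_EncryptFinal_ex and the EVP_CTRL_GCM_GET_TAG control call, the tag bytes are dummies, and BLOCK_LEN and TAG_LEN mirror AESGCM_BLOCK_LEN and AESGCM_TAG_LEN with an assumed value of 16.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t BLOCK_LEN = 16;  // assumed value, mirrors AESGCM_BLOCK_LEN
constexpr std::size_t TAG_LEN = 16;    // assumed value, mirrors AESGCM_TAG_LEN
static_assert(BLOCK_LEN == TAG_LEN, "the hole must fit the tag exactly");

// Hypothetical stand-in for the finalize step in GCM mode:
// there is no padding, so nothing is written and the length is 0.
inline int toy_encrypt_final(unsigned char* /*out*/, int* out_len) {
  *out_len = 0;
  return 1;
}

// Hypothetical stand-in for fetching the tag: fills tag_len bytes.
// The bytes are dummies, not a real authentication tag.
inline int toy_get_tag(unsigned char* out, std::size_t tag_len) {
  for (std::size_t i = 0; i < tag_len; ++i) {
    out[i] = static_cast<unsigned char>(0xA0 + i);
  }
  return 1;
}

// Runs both steps against the same hole and returns how many bytes
// of it ended up written, or 0 if any step failed.
inline std::size_t fill_hole(std::array<unsigned char, BLOCK_LEN>& hole) {
  int final_len = -1;
  if (toy_encrypt_final(hole.data(), &final_len) != 1) return 0;
  if (final_len != 0) return 0;  // the finalize step must write nothing
  if (toy_get_tag(hole.data(), TAG_LEN) != 1) return 0;
  return TAG_LEN;                // the tag step covers the whole hole
}
```

In this sketch the hole is fully written only if the tag step actually runs over all TAG_LEN bytes. If it were skipped or wrote fewer bytes, part of the hole would stay untouched.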

#7 Updated by Sage Weil 9 months ago

Merged https://github.com/ceph/ceph/pull/33757 ... should we keep this open or close it?

#8 Updated by Yehuda Sadeh 9 months ago

@sage I think we can close it. It seems that my research tracks @rzarzynski's, so I'll take his original conclusions.

#9 Updated by Sage Weil 9 months ago

hmm, seeing this now on master, after the existing whitelist was updated to the new symbols in 31a7a461382a3a979c12e114c9266366285487ca:

  <kind>SyscallParam</kind>
  <what>Syscall param sendmsg(msg.msg_iov[1]) points to uninitialised byte(s)</what>
  <stack>
    <frame>
      <ip>0xC24FAF7</ip>
      <obj>/usr/lib64/libpthread-2.28.so</obj>
      <fn>sendmsg</fn>
    </frame>
    <frame>
      <ip>0x10C977F</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>do_sendmsg</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1858.gf7e5770.el8.x86_64/src/msg/async</dir>
      <file>PosixStack.cc</file>
      <line>80</line>
    </frame>
    <frame>
      <ip>0x10C977F</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>PosixConnectedSocketImpl::send(ceph::buffer::v15_2_0::list&amp;, bool)</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1858.gf7e5770.el8.x86_64/src/msg/async</dir>
      <file>PosixStack.cc</file>
      <line>129</line>
    </frame>
    <frame>
      <ip>0x107F7D0</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>send</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1858.gf7e5770.el8.x86_64/src/msg/async</dir>
      <file>Stack.h</file>
      <line>100</line>
    </frame>
    <frame>
      <ip>0x107F7D0</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>AsyncConnection::_try_send(bool)</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1858.gf7e5770.el8.x86_64/src/msg/async</dir>
      <file>AsyncConnection.cc</file>
      <line>330</line>
    </frame>
    <frame>
      <ip>0x107FDFA</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>AsyncConnection::write(ceph::buffer::v15_2_0::list&amp;, std::function&lt;void (long)&gt;, bool)</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1858.gf7e5770.el8.x86_64/src/msg/async</dir>
      <file>AsyncConnection.cc</file>
      <line>309</line>
    </frame>
...
  <auxwhat>Address 0xfc805b0 is 0 bytes inside a block of size 4,096 alloc'd</auxwhat>
  <stack>
    <frame>
      <ip>0xA80A51C</ip>
      <obj>/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so</obj>
      <fn>memalign</fn>
      <dir>/builddir/build/BUILD/valgrind-3.15.0/coregrind/m_replacemalloc</dir>
      <file>vg_replace_malloc.c</file>
      <line>908</line>
    </frame>
    <frame>
      <ip>0xA80A629</ip>
      <obj>/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so</obj>
      <fn>posix_memalign</fn>
      <dir>/builddir/build/BUILD/valgrind-3.15.0/coregrind/m_replacemalloc</dir>
      <file>vg_replace_malloc.c</file>
      <line>1072</line>
    </frame>
    <frame>
      <ip>0xEE7A6D</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>create</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1858.gf7e5770.el8.x86_64/src/common</dir>
      <file>buffer.cc</file>
      <line>120</line>
    </frame>
    <frame>
      <ip>0xEE7A6D</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ceph::buffer::v15_2_0::list::refill_append_space(unsigned int)</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1858.gf7e5770.el8.x86_64/src/common</dir>
      <file>buffer.cc</file>
      <line>1324</line>
    </frame>
    <frame>
      <ip>0xEE7EDA</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ceph::buffer::v15_2_0::list::append_hole(unsigned int)</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1858.gf7e5770.el8.x86_64/src/common</dir>
      <file>buffer.cc</file>
      <line>1444</line>
    </frame>
    <frame>
      <ip>0x10CA5B0</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ceph::crypto::onwire::AES128GCM_OnWireTxHandler::authenticated_encrypt_final()</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1858.gf7e5770.el8.x86_64/src/msg/async</dir>
      <file>crypto_onwire.cc</file>
      <line>121</line>
    </frame>

/a/sage-2020-03-07_13:02:09-rados-master-distro-basic-smithi/4834609
/a/sage-2020-03-06_21:03:21-rados-wip-sage2-testing-2020-03-06-1128-distro-basic-smithi/4832076
and others

#10 Updated by Radoslaw Zarzynski 9 months ago

@Yehuda: what's the status of this ticket?
Are you able to reproduce the issue locally, or does it happen solely at sepia?

#11 Updated by Neha Ojha about 2 months ago

  • Status changed from In Progress to Can't reproduce
