Bug #44362
osd: uninitialized memory in sendmsg
Description
<kind>SyscallParam</kind> <what>Syscall param sendmsg(msg.msg_iov[1]) points to uninitialised byte(s)</what> <stack> <frame> <ip>0xC24CAF7</ip> <obj>/usr/lib64/libpthread-2.28.so</obj> <fn>sendmsg</fn> </frame> <frame> <ip>0x10C685F</ip> <obj>/usr/bin/ceph-osd</obj> <fn>do_sendmsg</fn> <dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/msg/async</dir> <file>PosixStack.cc</file> <line>80</line> </frame> <frame> <ip>0x10C685F</ip> <obj>/usr/bin/ceph-osd</obj> <fn>PosixConnectedSocketImpl::send(ceph::buffer::v14_2_0::list&, bool)</fn> <dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/msg/async</dir> <file>PosixStack.cc</file> <line>129</line> </frame> <frame> <ip>0x107C8D0</ip> <obj>/usr/bin/ceph-osd</obj> <fn>send</fn> <dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/msg/async</dir> <file>Stack.h</file> <line>100</line> </frame> <frame> <ip>0x107C8D0</ip> <obj>/usr/bin/ceph-osd</obj> <fn>AsyncConnection::_try_send(bool)</fn> <dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/msg/async</dir> <file>AsyncConnection.cc</file> <line>330</line> </frame> <frame> <ip>0x107CEFA</ip> <obj>/usr/bin/ceph-osd</obj> <fn>AsyncConnection::write(ceph::buffer::v14_2_0::list&, std::function<void (long)>, bool)</fn> <dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/msg/async</dir> <file>AsyncConnection.cc</file> <line>309</line> </frame> <frame> <ip>0x10A7BA5</ip> <obj>/usr/bin/ceph-osd</obj> <fn>ProtocolV2::write(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, CtFun<ProtocolV2>&, ceph::buffer::v14_2_0::list&)</fn> <dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/msg/async</dir> <file>ProtocolV2.cc</file> <line>777</line> </frame> ... 
<auxwhat>Address 0xfc93680 is 0 bytes inside a block of size 4,096 alloc'd</auxwhat> <stack> <frame> <ip>0xA80751C</ip> <obj>/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so</obj> <fn>memalign</fn> <dir>/builddir/build/BUILD/valgrind-3.15.0/coregrind/m_replacemalloc</dir> <file>vg_replace_malloc.c</file> <line>908</line> </frame> <frame> <ip>0xA807629</ip> <obj>/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so</obj> <fn>posix_memalign</fn> <dir>/builddir/build/BUILD/valgrind-3.15.0/coregrind/m_replacemalloc</dir> <file>vg_replace_malloc.c</file> <line>1072</line> </frame> <frame> <ip>0xEE4B8D</ip> <obj>/usr/bin/ceph-osd</obj> <fn>create</fn> <dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/common</dir> <file>buffer.cc</file> <line>120</line> </frame> <frame> <ip>0xEE4B8D</ip> <obj>/usr/bin/ceph-osd</obj> <fn>ceph::buffer::v14_2_0::list::refill_append_space(unsigned int)</fn> <dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/common</dir> <file>buffer.cc</file> <line>1324</line> </frame> <frame> <ip>0xEE4FFA</ip> <obj>/usr/bin/ceph-osd</obj> <fn>ceph::buffer::v14_2_0::list::append_hole(unsigned int)</fn> <dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/common</dir> <file>buffer.cc</file> <line>1444</line> </frame> <frame> <ip>0x10C7690</ip> <obj>/usr/bin/ceph-osd</obj> <fn>ceph::crypto::onwire::AES128GCM_OnWireTxHandler::authenticated_encrypt_final()</fn> <dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/msg/async</dir> <file>crypto_onwire.cc</file> <line>121</line> </frame> <frame> <ip>0x10C48DE</ip> <obj>/usr/bin/ceph-osd</obj> <fn>get_buffer</fn> <dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/msg/async</dir> <file>frames_v2.h</file> <line>274</line> </frame> <frame> <ip>0x10C48DE</ip> <obj>/usr/bin/ceph-osd</obj> <fn>Ct<ProtocolV2>* ProtocolV2::write<ceph::msgr::v2::AuthSignatureFrame>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, CtFun<ProtocolV2>&, 
ceph::msgr::v2::AuthSignatureFrame&)</fn> <dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/msg/async</dir> <file>ProtocolV2.cc</file> <line>763</line> </frame> <frame> <ip>0x10B27B8</ip> <obj>/usr/bin/ceph-osd</obj> <fn>ProtocolV2::handle_auth_done(ceph::buffer::v14_2_0::list&)</fn> <dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/msg/async</dir> <file>ProtocolV2.cc</file> <line>1873</line> </frame>
/a/sage-2020-03-01_17:33:08-rados-wip-sage2-testing-2020-03-01-0811-distro-basic-smithi/4816360
Related issues
History
#1 Updated by Sage Weil over 3 years ago
The regression is between these commits: d27f512d1731988cf7f369559f2fc324f1592047..7b0e18c09eb6060ee23f00c06dac4203a2b99608
bad: http://pulpito.ceph.com/sage-2020-03-02_17:19:00-rados:verify-master-distro-basic-smithi/
good: http://pulpito.ceph.com/sage-2020-03-02_17:20:04-rados:verify-master-distro-basic-smithi/
I assume it's this commit:
commit dc8946199983342217e30032b18ab7b90b8a83c6
Author: Yehuda Sadeh <ysadehwe@redhat.com>
Date:   Tue Feb 11 18:42:39 2020 -0800

    auth: treat mgr the same as mon when selecting auth mode

    Also use mon_cluster_modes (and not cluster_modes) when peer is mon/mgr.

    Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>

diff --git a/src/auth/AuthRegistry.cc b/src/auth/AuthRegistry.cc
index 2229c8ab906..897aa81f089 100644
--- a/src/auth/AuthRegistry.cc
+++ b/src/auth/AuthRegistry.cc
@@ -196,6 +196,7 @@ void AuthRegistry::get_supported_methods(
       if (modes) {
         switch (peer_type) {
         case CEPH_ENTITY_TYPE_MON:
+        case CEPH_ENTITY_TYPE_MGR:
           *modes = mon_client_modes;
           break;
         default:
@@ -204,10 +205,12 @@ void AuthRegistry::get_supported_methods(
       }
       return;
     case CEPH_ENTITY_TYPE_MON:
-      // i am mon
+    case CEPH_ENTITY_TYPE_MGR:
+      // i am mon/mgr
       switch (peer_type) {
       case CEPH_ENTITY_TYPE_MON:
-        // they are mon
+      case CEPH_ENTITY_TYPE_MGR:
+        // they are mon/mgr
         if (methods) {
           *methods = cluster_methods;
         }
@@ -230,6 +233,14 @@ void AuthRegistry::get_supported_methods(
       switch (peer_type) {
       case CEPH_ENTITY_TYPE_MON:
       case CEPH_ENTITY_TYPE_MGR:
+        // they are a mon daemon
+        if (methods) {
+          *methods = cluster_methods;
+        }
+        if (modes) {
+          *modes = mon_cluster_modes;
+        }
+        break;
       case CEPH_ENTITY_TYPE_MDS:
       case CEPH_ENTITY_TYPE_OSD:
         // they are another daemon
which makes sense since the bad read is in the auth code somewhere. i'm not sure why it's happening though.
#2 Updated by Sage Weil over 3 years ago
the takeaway from http://pulpito.ceph.com/sage-2020-03-02_17:19:00-rados:verify-master-distro-basic-smithi/ is that the mon_recovery (mon thrashing tests) trigger it, but the rados cls tests do not. both tests have ms failure injections. but the cls tests have msgr2 disabled... which is probably the reason?
#3 Updated by Yehuda Sadeh over 3 years ago
It seems to me that the specific commit just exposed an existing issue that for some reason didn't show up before (likely because the specific conditions weren't tested under valgrind).
Looking at the code here:
ceph::bufferlist AES128GCM_OnWireTxHandler::authenticated_encrypt_final()
{
  int final_len = 0;
  auto filler = buffer.append_hole(AESGCM_BLOCK_LEN);
  if(1 != EVP_EncryptFinal_ex(ectx.get(),
        reinterpret_cast<unsigned char*>(filler.c_str()),
        &final_len)) {
    throw std::runtime_error("EVP_EncryptFinal_ex failed");
  }
  ceph_assert_always(final_len == 0);
The assumption is that final_len is zero, so EVP_EncryptFinal_ex() isn't expected to write anything. That means the buffer created through buffer.append_hole() isn't getting initialized by this call. I'm not sure why filler is even needed. I will try resetting the buffer before the call.
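To make the reasoning above concrete, here is a minimal, dependency-free sketch of the pattern. It does not use OpenSSL or Ceph's bufferlist: fake_encrypt_final, reserve_hole, untouched_after_final, kBlockLen and kPoison are all hypothetical names invented for this illustration, and the poison byte only exists so that "never written" bytes can be counted. It shows that a hole handed to a final step that emits zero bytes stays untouched unless it is zero-filled first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Illustrative constants (assumptions): kBlockLen stands in for
// AESGCM_BLOCK_LEN, kPoison models bytes that nobody has written yet.
constexpr std::size_t kBlockLen = 16;
constexpr unsigned char kPoison = 0xCD;

// Hypothetical stand-in for a final step that, like GCM, emits no output.
// It deliberately never touches the output pointer.
inline int fake_encrypt_final(unsigned char* /*out*/, int* out_len) {
  *out_len = 0;
  return 1;  // success
}

// Reserve len bytes at the end of buf and return their offset.
// The poison fill only makes untouched bytes observable in this sketch.
inline std::size_t reserve_hole(std::vector<unsigned char>& buf,
                                std::size_t len) {
  const std::size_t off = buf.size();
  buf.resize(off + len, kPoison);
  return off;
}

// Returns how many bytes of the hole are still untouched after the final step.
inline std::size_t untouched_after_final(bool zero_first) {
  std::vector<unsigned char> buf = {'d', 'a', 't', 'a'};
  const std::size_t off = reserve_hole(buf, kBlockLen);
  if (zero_first) {
    // Define the bytes up front so that nothing undefined can be sent.
    std::memset(buf.data() + off, 0, kBlockLen);
  }
  int final_len = -1;
  const int rc = fake_encrypt_final(buf.data() + off, &final_len);
  (void)rc;
  assert(rc == 1 && final_len == 0);
  const auto first = buf.begin() + static_cast<std::ptrdiff_t>(off);
  return static_cast<std::size_t>(std::count(first, buf.end(), kPoison));
}
```

Under these assumptions, untouched_after_final(false) reports the whole block as untouched, while untouched_after_final(true) reports none. Zero-filling is only one way to silence the report; it does not by itself explain why the bytes reach sendmsg.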
#4 Updated by Sage Weil over 3 years ago
- Status changed from New to In Progress
- Assignee set to Yehuda Sadeh
#5 Updated by Radoslaw Zarzynski over 3 years ago
- Related to Bug #38827: valgrind: UninitCondition in ceph::crypto::onwire::AES128GCM_OnWireRxHandler::authenticated_decrypt_update_final() added
#6 Updated by Radoslaw Zarzynski over 3 years ago
filler is supposed to carry two things:
- A zero-byte-long ciphertext fragment acquired from EVP_EncryptFinal_ex. Generally speaking, the function should be provided with one cipher block of extra space for the sake of padding. GCM doesn't pad, but the interface is unclear about whether it's safe to provide nullptr in such a case. Because we interpret the documentation literally, we pass a valid pointer with enough space and later ensure nothing has really been written to it.
- The authentication tag, which should be exactly AESGCM_BLOCK_LEN long. This aspect is much more important than the previous one. EVP_CTRL_GCM_GET_TAG should truly fill the buffer.
auto filler = buffer.append_hole(AESGCM_BLOCK_LEN);
// ...
static_assert(AESGCM_BLOCK_LEN == AESGCM_TAG_LEN);
if(1 != EVP_CIPHER_CTX_ctrl(ectx.get(),
EVP_CTRL_GCM_GET_TAG, AESGCM_TAG_LEN,
filler.c_str())) {
throw std::runtime_error("EVP_CIPHER_CTX_ctrl failed");
}
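The two roles of filler described above can be sketched without OpenSSL. This is an assumption-laden illustration, not the Ceph code: stub_final and stub_get_tag are hypothetical stand-ins for EVP_EncryptFinal_ex and the EVP_CTRL_GCM_GET_TAG control call, and kTagLen, kUnset, FinalResult and finalize_frame are invented names. The sketch checks that the hole is fully overwritten only when the tag step succeeds, so an error path that skips or fails that step would leave unwritten bytes in the frame.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kTagLen = 16;      // stands in for AESGCM_TAG_LEN
constexpr unsigned char kUnset = 0xCD;   // models bytes nobody has written

// Hypothetical stand-in for the final step: succeeds, writes nothing.
inline int stub_final(unsigned char* /*out*/, int* out_len) {
  *out_len = 0;
  return 1;
}

// Hypothetical stand-in for fetching the tag: fills all len bytes on success.
inline int stub_get_tag(unsigned char* out, std::size_t len, bool succeed) {
  if (!succeed) {
    return 0;
  }
  std::fill(out, out + len, static_cast<unsigned char>(0x5A));  // fake tag
  return 1;
}

struct FinalResult {
  bool ok;                  // did both steps succeed?
  std::size_t unset_bytes;  // bytes of the hole still unwritten
};

inline FinalResult finalize_frame(bool tag_step_succeeds) {
  std::vector<unsigned char> frame(32, 0x11);  // pretend ciphertext
  const std::size_t off = frame.size();
  frame.resize(off + kTagLen, kUnset);         // the hole, one block long
  unsigned char* filler = frame.data() + off;

  // Role 1: scratch space for the final step, which must write zero bytes.
  int final_len = -1;
  if (stub_final(filler, &final_len) != 1 || final_len != 0) {
    return {false, kTagLen};
  }

  // Role 2: destination of the authentication tag, which fills the hole.
  const bool ok = stub_get_tag(filler, kTagLen, tag_step_succeeds) == 1;
  const auto first = frame.begin() + static_cast<std::ptrdiff_t>(off);
  const auto unset = std::count(first, frame.end(), kUnset);
  return {ok, static_cast<std::size_t>(unset)};
}
```

In this model, finalize_frame(true) leaves no unset bytes, which matches the expectation that the tag step truly fills the buffer. Whether the real report comes from a path where the tag is never written is not established by this sketch.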
#7 Updated by Sage Weil over 3 years ago
Merged https://github.com/ceph/ceph/pull/33757 ... should we keep this open or close it?
#8 Updated by Yehuda Sadeh over 3 years ago
@sage I think we can close it. It seems that my research tracks @rzarzynski's, so I'll take his original conclusions.
#9 Updated by Sage Weil over 3 years ago
hmm, seeing this now on master, after the existing whitelist was updated to the new symbols in 31a7a461382a3a979c12e114c9266366285487ca:
<kind>SyscallParam</kind> <what>Syscall param sendmsg(msg.msg_iov[1]) points to uninitialised byte(s)</what> <stack> <frame> <ip>0xC24FAF7</ip> <obj>/usr/lib64/libpthread-2.28.so</obj> <fn>sendmsg</fn> </frame> <frame> <ip>0x10C977F</ip> <obj>/usr/bin/ceph-osd</obj> <fn>do_sendmsg</fn> <dir>/usr/src/debug/ceph-15.1.0-1858.gf7e5770.el8.x86_64/src/msg/async</dir> <file>PosixStack.cc</file> <line>80</line> </frame> <frame> <ip>0x10C977F</ip> <obj>/usr/bin/ceph-osd</obj> <fn>PosixConnectedSocketImpl::send(ceph::buffer::v15_2_0::list&, bool)</fn> <dir>/usr/src/debug/ceph-15.1.0-1858.gf7e5770.el8.x86_64/src/msg/async</dir> <file>PosixStack.cc</file> <line>129</line> </frame> <frame> <ip>0x107F7D0</ip> <obj>/usr/bin/ceph-osd</obj> <fn>send</fn> <dir>/usr/src/debug/ceph-15.1.0-1858.gf7e5770.el8.x86_64/src/msg/async</dir> <file>Stack.h</file> <line>100</line> </frame> <frame> <ip>0x107F7D0</ip> <obj>/usr/bin/ceph-osd</obj> <fn>AsyncConnection::_try_send(bool)</fn> <dir>/usr/src/debug/ceph-15.1.0-1858.gf7e5770.el8.x86_64/src/msg/async</dir> <file>AsyncConnection.cc</file> <line>330</line> </frame> <frame> <ip>0x107FDFA</ip> <obj>/usr/bin/ceph-osd</obj> <fn>AsyncConnection::write(ceph::buffer::v15_2_0::list&, std::function<void (long)>, bool)</fn> <dir>/usr/src/debug/ceph-15.1.0-1858.gf7e5770.el8.x86_64/src/msg/async</dir> <file>AsyncConnection.cc</file> <line>309</line> </frame> ... 
<auxwhat>Address 0xfc805b0 is 0 bytes inside a block of size 4,096 alloc'd</auxwhat> <stack> <frame> <ip>0xA80A51C</ip> <obj>/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so</obj> <fn>memalign</fn> <dir>/builddir/build/BUILD/valgrind-3.15.0/coregrind/m_replacemalloc</dir> <file>vg_replace_malloc.c</file> <line>908</line> </frame> <frame> <ip>0xA80A629</ip> <obj>/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so</obj> <fn>posix_memalign</fn> <dir>/builddir/build/BUILD/valgrind-3.15.0/coregrind/m_replacemalloc</dir> <file>vg_replace_malloc.c</file> <line>1072</line> </frame> <frame> <ip>0xEE7A6D</ip> <obj>/usr/bin/ceph-osd</obj> <fn>create</fn> <dir>/usr/src/debug/ceph-15.1.0-1858.gf7e5770.el8.x86_64/src/common</dir> <file>buffer.cc</file> <line>120</line> </frame> <frame> <ip>0xEE7A6D</ip> <obj>/usr/bin/ceph-osd</obj> <fn>ceph::buffer::v15_2_0::list::refill_append_space(unsigned int)</fn> <dir>/usr/src/debug/ceph-15.1.0-1858.gf7e5770.el8.x86_64/src/common</dir> <file>buffer.cc</file> <line>1324</line> </frame> <frame> <ip>0xEE7EDA</ip> <obj>/usr/bin/ceph-osd</obj> <fn>ceph::buffer::v15_2_0::list::append_hole(unsigned int)</fn> <dir>/usr/src/debug/ceph-15.1.0-1858.gf7e5770.el8.x86_64/src/common</dir> <file>buffer.cc</file> <line>1444</line> </frame> <frame> <ip>0x10CA5B0</ip> <obj>/usr/bin/ceph-osd</obj> <fn>ceph::crypto::onwire::AES128GCM_OnWireTxHandler::authenticated_encrypt_final()</fn> <dir>/usr/src/debug/ceph-15.1.0-1858.gf7e5770.el8.x86_64/src/msg/async</dir> <file>crypto_onwire.cc</file> <line>121</line> </frame>
/a/sage-2020-03-07_13:02:09-rados-master-distro-basic-smithi/4834609
/a/sage-2020-03-06_21:03:21-rados-wip-sage2-testing-2020-03-06-1128-distro-basic-smithi/4832076
and others
#10 Updated by Radoslaw Zarzynski over 3 years ago
@Yehuda: what's the status of this ticket?
Are you able to replicate the issue locally, or does it happen solely at sepia?
#11 Updated by Neha Ojha almost 3 years ago
- Status changed from In Progress to Can't reproduce