Bug #44362
closed
osd: uninitialized memory in sendmsg
Added by Sage Weil about 4 years ago.
Updated over 3 years ago.
Description
<kind>SyscallParam</kind>
<what>Syscall param sendmsg(msg.msg_iov[1]) points to uninitialised byte(s)</what>
<stack>
<frame>
<ip>0xC24CAF7</ip>
<obj>/usr/lib64/libpthread-2.28.so</obj>
<fn>sendmsg</fn>
</frame>
<frame>
<ip>0x10C685F</ip>
<obj>/usr/bin/ceph-osd</obj>
<fn>do_sendmsg</fn>
<dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/msg/async</dir>
<file>PosixStack.cc</file>
<line>80</line>
</frame>
<frame>
<ip>0x10C685F</ip>
<obj>/usr/bin/ceph-osd</obj>
<fn>PosixConnectedSocketImpl::send(ceph::buffer::v14_2_0::list&, bool)</fn>
<dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/msg/async</dir>
<file>PosixStack.cc</file>
<line>129</line>
</frame>
<frame>
<ip>0x107C8D0</ip>
<obj>/usr/bin/ceph-osd</obj>
<fn>send</fn>
<dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/msg/async</dir>
<file>Stack.h</file>
<line>100</line>
</frame>
<frame>
<ip>0x107C8D0</ip>
<obj>/usr/bin/ceph-osd</obj>
<fn>AsyncConnection::_try_send(bool)</fn>
<dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/msg/async</dir>
<file>AsyncConnection.cc</file>
<line>330</line>
</frame>
<frame>
<ip>0x107CEFA</ip>
<obj>/usr/bin/ceph-osd</obj>
<fn>AsyncConnection::write(ceph::buffer::v14_2_0::list&, std::function<void (long)>, bool)</fn>
<dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/msg/async</dir>
<file>AsyncConnection.cc</file>
<line>309</line>
</frame>
<frame>
<ip>0x10A7BA5</ip>
<obj>/usr/bin/ceph-osd</obj>
<fn>ProtocolV2::write(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, CtFun<ProtocolV2>&, ceph::buffer::v14_2_0::list&)</fn>
<dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/msg/async</dir>
<file>ProtocolV2.cc</file>
<line>777</line>
</frame>
...
<auxwhat>Address 0xfc93680 is 0 bytes inside a block of size 4,096 alloc'd</auxwhat>
<stack>
<frame>
<ip>0xA80751C</ip>
<obj>/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so</obj>
<fn>memalign</fn>
<dir>/builddir/build/BUILD/valgrind-3.15.0/coregrind/m_replacemalloc</dir>
<file>vg_replace_malloc.c</file>
<line>908</line>
</frame>
<frame>
<ip>0xA807629</ip>
<obj>/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so</obj>
<fn>posix_memalign</fn>
<dir>/builddir/build/BUILD/valgrind-3.15.0/coregrind/m_replacemalloc</dir>
<file>vg_replace_malloc.c</file>
<line>1072</line>
</frame>
<frame>
<ip>0xEE4B8D</ip>
<obj>/usr/bin/ceph-osd</obj>
<fn>create</fn>
<dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/common</dir>
<file>buffer.cc</file>
<line>120</line>
</frame>
<frame>
<ip>0xEE4B8D</ip>
<obj>/usr/bin/ceph-osd</obj>
<fn>ceph::buffer::v14_2_0::list::refill_append_space(unsigned int)</fn>
<dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/common</dir>
<file>buffer.cc</file>
<line>1324</line>
</frame>
<frame>
<ip>0xEE4FFA</ip>
<obj>/usr/bin/ceph-osd</obj>
<fn>ceph::buffer::v14_2_0::list::append_hole(unsigned int)</fn>
<dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/common</dir>
<file>buffer.cc</file>
<line>1444</line>
</frame>
<frame>
<ip>0x10C7690</ip>
<obj>/usr/bin/ceph-osd</obj>
<fn>ceph::crypto::onwire::AES128GCM_OnWireTxHandler::authenticated_encrypt_final()</fn>
<dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/msg/async</dir>
<file>crypto_onwire.cc</file>
<line>121</line>
</frame>
<frame>
<ip>0x10C48DE</ip>
<obj>/usr/bin/ceph-osd</obj>
<fn>get_buffer</fn>
<dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/msg/async</dir>
<file>frames_v2.h</file>
<line>274</line>
</frame>
<frame>
<ip>0x10C48DE</ip>
<obj>/usr/bin/ceph-osd</obj>
<fn>Ct<ProtocolV2>* ProtocolV2::write<ceph::msgr::v2::AuthSignatureFrame>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, CtFun<ProtocolV2>&, ceph::msgr::v2::AuthSignatureFrame&)</fn>
<dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/msg/async</dir>
<file>ProtocolV2.cc</file>
<line>763</line>
</frame>
<frame>
<ip>0x10B27B8</ip>
<obj>/usr/bin/ceph-osd</obj>
<fn>ProtocolV2::handle_auth_done(ceph::buffer::v14_2_0::list&)</fn>
<dir>/usr/src/debug/ceph-15.1.0-1465.g4941ea0.el8.x86_64/src/msg/async</dir>
<file>ProtocolV2.cc</file>
<line>1873</line>
</frame>
/a/sage-2020-03-01_17:33:08-rados-wip-sage2-testing-2020-03-01-0811-distro-basic-smithi/4816360
The regression is between these commits: d27f512d1731988cf7f369559f2fc324f1592047..7b0e18c09eb6060ee23f00c06dac4203a2b99608
bad: http://pulpito.ceph.com/sage-2020-03-02_17:19:00-rados:verify-master-distro-basic-smithi/
good: http://pulpito.ceph.com/sage-2020-03-02_17:20:04-rados:verify-master-distro-basic-smithi/
I assume it's this commit:
commit dc8946199983342217e30032b18ab7b90b8a83c6
Author: Yehuda Sadeh <ysadehwe@redhat.com>
Date: Tue Feb 11 18:42:39 2020 -0800
auth: treat mgr the same as mon when selecting auth mode
Also use mon_cluster_modes (and not cluster_modes) when peer is mon/mgr.
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
diff --git a/src/auth/AuthRegistry.cc b/src/auth/AuthRegistry.cc
index 2229c8ab906..897aa81f089 100644
--- a/src/auth/AuthRegistry.cc
+++ b/src/auth/AuthRegistry.cc
@@ -196,6 +196,7 @@ void AuthRegistry::get_supported_methods(
if (modes) {
switch (peer_type) {
case CEPH_ENTITY_TYPE_MON:
+ case CEPH_ENTITY_TYPE_MGR:
*modes = mon_client_modes;
break;
default:
@@ -204,10 +205,12 @@ void AuthRegistry::get_supported_methods(
}
return;
case CEPH_ENTITY_TYPE_MON:
- // i am mon
+ case CEPH_ENTITY_TYPE_MGR:
+ // i am mon/mgr
switch (peer_type) {
case CEPH_ENTITY_TYPE_MON:
- // they are mon
+ case CEPH_ENTITY_TYPE_MGR:
+ // they are mon/mgr
if (methods) {
*methods = cluster_methods;
}
@@ -230,6 +233,14 @@ void AuthRegistry::get_supported_methods(
switch (peer_type) {
case CEPH_ENTITY_TYPE_MON:
case CEPH_ENTITY_TYPE_MGR:
+ // they are a mon daemon
+ if (methods) {
+ *methods = cluster_methods;
+ }
+ if (modes) {
+ *modes = mon_cluster_modes;
+ }
+ break;
case CEPH_ENTITY_TYPE_MDS:
case CEPH_ENTITY_TYPE_OSD:
// they are another daemon
which makes sense since the bad read is in the auth code somewhere. i'm not sure why it's happening though.
It seems to me that this commit just exposed an existing issue that for some reason didn't show up before (most likely because these specific conditions were never exercised under valgrind).
Looking at the code here:
ceph::bufferlist AES128GCM_OnWireTxHandler::authenticated_encrypt_final()
{
  int final_len = 0;
  auto filler = buffer.append_hole(AESGCM_BLOCK_LEN);
  if(1 != EVP_EncryptFinal_ex(ectx.get(),
        reinterpret_cast<unsigned char*>(filler.c_str()),
        &final_len)) {
    throw std::runtime_error("EVP_EncryptFinal_ex failed");
  }
  ceph_assert_always(final_len == 0);
The assumption is that final_len is zero, so EVP_EncryptFinal_ex() isn't expected to write anything. That means the hole created through buffer.append_hole() is never initialized by this call. I'm not sure why filler is even needed. Will try to reset the buffer before the call.
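One way to read the "reset buffer before call" idea is to zero the hole before handing it to the finalization routine, so its bytes are defined even if the routine writes nothing. The following is only a minimal sketch in plain C++ without OpenSSL: Hole, make_hole, fake_encrypt_final, and finalize_with_zeroed_hole are hypothetical stand-ins and not Ceph or OpenSSL APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical stand-in for the region returned by append_hole():
// freshly allocated and, like the real hole, not initialized.
struct Hole {
  std::unique_ptr<unsigned char[]> data;
  std::size_t len;
};

Hole make_hole(std::size_t len) {
  // new[] without a trailing () leaves the bytes indeterminate.
  return Hole{std::unique_ptr<unsigned char[]>(new unsigned char[len]), len};
}

// Hypothetical stand-in for a GCM finalization step: it reports zero
// bytes written and leaves the output buffer untouched.
int fake_encrypt_final(unsigned char* out, int* out_len) {
  (void)out;
  *out_len = 0;
  return 1;
}

// Zero the hole first, then finalize. Returns true if the finalization
// succeeded, wrote nothing, and every byte of the hole is still zero.
bool finalize_with_zeroed_hole(std::size_t block_len) {
  Hole hole = make_hole(block_len);
  std::memset(hole.data.get(), 0, hole.len);  // the "reset" step

  int final_len = -1;
  if (fake_encrypt_final(hole.data.get(), &final_len) != 1) {
    return false;
  }
  if (final_len != 0) {
    return false;
  }
  for (std::size_t i = 0; i < hole.len; ++i) {
    if (hole.data[i] != 0) {
      return false;
    }
  }
  return true;
}
```

Calling finalize_with_zeroed_hole(16) exercises the path with a 16-byte hole. Zeroing costs one memset per frame, and it would only hide the valgrind report rather than explain which bytes were left unwritten.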
- Status changed from New to In Progress
- Assignee set to Yehuda Sadeh
- Related to Bug #38827: valgrind: UninitCondition in ceph::crypto::onwire::AES128GCM_OnWireRxHandler::authenticated_decrypt_update_final() added
The hole represented by filler is supposed to carry two things:
- The zero-byte-long ciphertext fragment obtained from EVP_EncryptFinal_ex. Generally speaking, the function should be given one cipher block of extra space for padding. GCM doesn't pad, but the documentation is unclear on whether it is safe to pass nullptr in that case. Interpreting the documentation literally, we pass a valid pointer with enough space and later verify that nothing was actually written to it.
- The authentication tag, which should be exactly AESGCM_BLOCK_LEN long. This aspect is much more important than the previous one: EVP_CTRL_GCM_GET_TAG should actually fill the buffer.
auto filler = buffer.append_hole(AESGCM_BLOCK_LEN);
// ...
static_assert(AESGCM_BLOCK_LEN == AESGCM_TAG_LEN);
if(1 != EVP_CIPHER_CTX_ctrl(ectx.get(),
      EVP_CTRL_GCM_GET_TAG, AESGCM_TAG_LEN,
      filler.c_str())) {
  throw std::runtime_error("EVP_CIPHER_CTX_ctrl failed");
}
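The role of the static_assert can be illustrated with a small model: the hole ends up fully written only when the tag is as long as the hole. This is a rough sketch in plain C++ without OpenSSL. fake_get_tag and unwritten_after_tag are hypothetical helpers, the 0xAA sentinel merely marks "not yet written" bytes, and the value 16 for both lengths is an assumption made for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed values mirroring AESGCM_BLOCK_LEN and AESGCM_TAG_LEN.
constexpr std::size_t kBlockLen = 16;
constexpr std::size_t kTagLen = 16;
// Sentinel that models a byte nobody has written yet.
constexpr unsigned char kUnwritten = 0xAA;

static_assert(kBlockLen == kTagLen, "the tag must cover the whole hole");

// Hypothetical stand-in for fetching the tag: writes exactly tag_len
// bytes, none of which equals the sentinel.
void fake_get_tag(unsigned char* out, std::size_t tag_len) {
  for (std::size_t i = 0; i < tag_len; ++i) {
    out[i] = static_cast<unsigned char>(i + 1);
  }
}

// Returns how many bytes of a hole_len-byte hole are still unwritten
// after a tag of tag_len bytes has been stored at its start.
std::size_t unwritten_after_tag(std::size_t hole_len, std::size_t tag_len) {
  assert(tag_len <= hole_len);
  std::vector<unsigned char> hole(hole_len, kUnwritten);
  fake_get_tag(hole.data(), tag_len);

  std::size_t unwritten = 0;
  for (unsigned char byte : hole) {
    if (byte == kUnwritten) {
      ++unwritten;
    }
  }
  return unwritten;
}
```

With equal lengths, unwritten_after_tag(kBlockLen, kTagLen) leaves no sentinel bytes. A shorter tag, for example 12 bytes in a 16-byte hole, leaves 4 bytes unwritten, which is the kind of gap a memcheck report would point at.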
@Sage Weil I think we can close it. It seems that my research tracks @rzarzynski's, so I'll take his original conclusions.
hmm, seeing this now on master, after the existing whitelist was updated to the new symbols in 31a7a461382a3a979c12e114c9266366285487ca:
<kind>SyscallParam</kind>
<what>Syscall param sendmsg(msg.msg_iov[1]) points to uninitialised byte(s)</what>
<stack>
<frame>
<ip>0xC24FAF7</ip>
<obj>/usr/lib64/libpthread-2.28.so</obj>
<fn>sendmsg</fn>
</frame>
<frame>
<ip>0x10C977F</ip>
<obj>/usr/bin/ceph-osd</obj>
<fn>do_sendmsg</fn>
<dir>/usr/src/debug/ceph-15.1.0-1858.gf7e5770.el8.x86_64/src/msg/async</dir>
<file>PosixStack.cc</file>
<line>80</line>
</frame>
<frame>
<ip>0x10C977F</ip>
<obj>/usr/bin/ceph-osd</obj>
<fn>PosixConnectedSocketImpl::send(ceph::buffer::v15_2_0::list&, bool)</fn>
<dir>/usr/src/debug/ceph-15.1.0-1858.gf7e5770.el8.x86_64/src/msg/async</dir>
<file>PosixStack.cc</file>
<line>129</line>
</frame>
<frame>
<ip>0x107F7D0</ip>
<obj>/usr/bin/ceph-osd</obj>
<fn>send</fn>
<dir>/usr/src/debug/ceph-15.1.0-1858.gf7e5770.el8.x86_64/src/msg/async</dir>
<file>Stack.h</file>
<line>100</line>
</frame>
<frame>
<ip>0x107F7D0</ip>
<obj>/usr/bin/ceph-osd</obj>
<fn>AsyncConnection::_try_send(bool)</fn>
<dir>/usr/src/debug/ceph-15.1.0-1858.gf7e5770.el8.x86_64/src/msg/async</dir>
<file>AsyncConnection.cc</file>
<line>330</line>
</frame>
<frame>
<ip>0x107FDFA</ip>
<obj>/usr/bin/ceph-osd</obj>
<fn>AsyncConnection::write(ceph::buffer::v15_2_0::list&, std::function<void (long)>, bool)</fn>
<dir>/usr/src/debug/ceph-15.1.0-1858.gf7e5770.el8.x86_64/src/msg/async</dir>
<file>AsyncConnection.cc</file>
<line>309</line>
</frame>
...
<auxwhat>Address 0xfc805b0 is 0 bytes inside a block of size 4,096 alloc'd</auxwhat>
<stack>
<frame>
<ip>0xA80A51C</ip>
<obj>/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so</obj>
<fn>memalign</fn>
<dir>/builddir/build/BUILD/valgrind-3.15.0/coregrind/m_replacemalloc</dir>
<file>vg_replace_malloc.c</file>
<line>908</line>
</frame>
<frame>
<ip>0xA80A629</ip>
<obj>/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so</obj>
<fn>posix_memalign</fn>
<dir>/builddir/build/BUILD/valgrind-3.15.0/coregrind/m_replacemalloc</dir>
<file>vg_replace_malloc.c</file>
<line>1072</line>
</frame>
<frame>
<ip>0xEE7A6D</ip>
<obj>/usr/bin/ceph-osd</obj>
<fn>create</fn>
<dir>/usr/src/debug/ceph-15.1.0-1858.gf7e5770.el8.x86_64/src/common</dir>
<file>buffer.cc</file>
<line>120</line>
</frame>
<frame>
<ip>0xEE7A6D</ip>
<obj>/usr/bin/ceph-osd</obj>
<fn>ceph::buffer::v15_2_0::list::refill_append_space(unsigned int)</fn>
<dir>/usr/src/debug/ceph-15.1.0-1858.gf7e5770.el8.x86_64/src/common</dir>
<file>buffer.cc</file>
<line>1324</line>
</frame>
<frame>
<ip>0xEE7EDA</ip>
<obj>/usr/bin/ceph-osd</obj>
<fn>ceph::buffer::v15_2_0::list::append_hole(unsigned int)</fn>
<dir>/usr/src/debug/ceph-15.1.0-1858.gf7e5770.el8.x86_64/src/common</dir>
<file>buffer.cc</file>
<line>1444</line>
</frame>
<frame>
<ip>0x10CA5B0</ip>
<obj>/usr/bin/ceph-osd</obj>
<fn>ceph::crypto::onwire::AES128GCM_OnWireTxHandler::authenticated_encrypt_final()</fn>
<dir>/usr/src/debug/ceph-15.1.0-1858.gf7e5770.el8.x86_64/src/msg/async</dir>
<file>crypto_onwire.cc</file>
<line>121</line>
</frame>
/a/sage-2020-03-07_13:02:09-rados-master-distro-basic-smithi/4834609
/a/sage-2020-03-06_21:03:21-rados-wip-sage2-testing-2020-03-06-1128-distro-basic-smithi/4832076
and others
@Yehuda Sadeh: what's the status of this ticket?
Are you able to reproduce the issue locally, or does it happen only on sepia?
- Status changed from In Progress to Can't reproduce