Bug #44430
*: valgrind: UninitCondition
Description
Valgrind: mds (UninitCondition), client (UninitCondition), mon (UninitCondition), osd (UninitCondition)
5 jobs: ['4824922', '4824877', '4824831', '4825012', '4824967']
suites intersection: ['centos_latest.yaml', 'clusters/fixed-2-ucephfs.yaml', 'conf/{client.yaml', 'fs/verify/{begin.yaml', 'mds.yaml', 'mon-debug.yaml', 'mon.yaml', 'mount/fuse.yaml', 'osd.yaml}', 'overrides/{frag_enable.yaml', 'tasks/cfuse_workunit_suites_fsstress.yaml', 'validater/valgrind.yaml}', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}']
suites union: ['centos_latest.yaml', 'clusters/fixed-2-ucephfs.yaml', 'conf/{client.yaml', 'fs/verify/{begin.yaml', 'mds.yaml', 'mon-debug.yaml', 'mon.yaml', 'mount/fuse.yaml', 'objectstore-ec/bluestore-bitmap.yaml', 'objectstore-ec/bluestore-comp-ec-root.yaml', 'objectstore-ec/bluestore-comp.yaml', 'objectstore-ec/bluestore-ec-root.yaml', 'objectstore-ec/filestore-xfs.yaml', 'osd.yaml}', 'overrides/{frag_enable.yaml', 'tasks/cfuse_workunit_suites_fsstress.yaml', 'validater/valgrind.yaml}', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}']
Example:
<error>
  <unique>0x0</unique>
  <tid>4</tid>
  <threadname>msgr-worker-1</threadname>
  <kind>UninitCondition</kind>
  <what>Conditional jump or move depends on uninitialised value(s)</what>
  <stack>
    <frame>
      <ip>0xE8F952D</ip>
      <obj>/usr/lib64/libcrypto.so.1.1.1c</obj>
    </frame>
    <frame>
      <ip>0xE90828F</ip>
      <obj>/usr/lib64/libcrypto.so.1.1.1c</obj>
      <fn>EVP_DecryptFinal_ex</fn>
    </frame>
    <frame>
      <ip>0x5622CE5</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.2</obj>
      <fn>ceph::crypto::onwire::AES128GCM_OnWireRxHandler::authenticated_decrypt_update_final(ceph::buffer::v15_2_0::list&&, unsigned int)</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1621.g38766d1.el8.x86_64/src/msg/async</dir>
      <file>crypto_onwire.cc</file>
      <line>268</line>
    </frame>
    <frame>
      <ip>0x5610301</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.2</obj>
      <fn>ProtocolV2::handle_read_frame_epilogue_main(std::unique_ptr<ceph::buffer::v15_2_0::ptr_node, ceph::buffer::v15_2_0::ptr_node::disposer>&&, int)</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1621.g38766d1.el8.x86_64/src/msg/async</dir>
      <file>ProtocolV2.cc</file>
      <line>1335</line>
    </frame>
    <frame>
      <ip>0x55F716B</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.2</obj>
      <fn>ProtocolV2::run_continuation(Ct<ProtocolV2>&)</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1621.g38766d1.el8.x86_64/src/msg/async</dir>
      <file>ProtocolV2.cc</file>
      <line>45</line>
    </frame>
    <frame>
      <ip>0x55F7336</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.2</obj>
      <fn>ProtocolV2::read_event()</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1621.g38766d1.el8.x86_64/src/msg/async</dir>
      <file>ProtocolV2.cc</file>
      <line>463</line>
    </frame>
    <frame>
      <ip>0x55BE4AC</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.2</obj>
      <fn>AsyncConnection::process()</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1621.g38766d1.el8.x86_64/src/msg/async</dir>
      <file>AsyncConnection.cc</file>
      <line>467</line>
    </frame>
    <frame>
      <ip>0x5618365</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.2</obj>
      <fn>EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1621.g38766d1.el8.x86_64/src/msg/async</dir>
      <file>Event.cc</file>
      <line>433</line>
    </frame>
    <frame>
      <ip>0x561F36B</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.2</obj>
      <fn>operator()</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1621.g38766d1.el8.x86_64/src/msg/async</dir>
      <file>Stack.cc</file>
      <line>53</line>
    </frame>
    <frame>
      <ip>0x561F36B</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.2</obj>
      <fn>std::_Function_handler<void (), NetworkStack::add_thread(unsigned int)::{lambda()#1}>::_M_invoke(std::_Any_data const&)</fn>
      <dir>/usr/include/c++/8/bits</dir>
      <file>std_function.h</file>
      <line>297</line>
    </frame>
    <frame>
      <ip>0xF7C7B22</ip>
      <obj>/usr/lib64/libstdc++.so.6.0.25</obj>
    </frame>
    <frame>
      <ip>0xEC7E2DD</ip>
      <obj>/usr/lib64/libpthread-2.28.so</obj>
      <fn>start_thread</fn>
    </frame>
    <frame>
      <ip>0x10130132</ip>
      <obj>/usr/lib64/libc-2.28.so</obj>
      <fn>clone</fn>
    </frame>
  </stack>
</error>
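When comparing a report like this against suppression rules, only each frame's function name (or, for frames Valgrind could not symbolize, its object path) matters. A minimal Python sketch, assuming the XML shape shown above, pulls those out as suppression-style `fun:`/`obj:` strings; `stack_locations` is a hypothetical helper, not part of Ceph or Valgrind. Note that `<fn>` carries demangled names (Valgrind demangles by default), whereas suppression files use mangled names, so the output is meant for reading, not for pasting into a rule.

```python
import xml.etree.ElementTree as ET


def stack_locations(error_xml: str) -> list[str]:
    """Turn the first <stack> of a Valgrind <error> record into location strings.

    Frames with a resolved <fn> become 'fun:<name>'; unresolved frames fall back
    to 'obj:<path>'. The names are demangled, unlike those in suppression files.
    """
    error = ET.fromstring(error_xml)
    stack = error.find("stack")
    if stack is None:
        return []
    locations = []
    for frame in stack.iter("frame"):
        fn = frame.findtext("fn")
        if fn is not None:
            locations.append(f"fun:{fn}")
        else:
            locations.append(f"obj:{frame.findtext('obj')}")
    return locations


# Trimmed copy of the report above (frames whose names contain '&' or '<' omitted,
# since real Valgrind XML would escape those characters).
SAMPLE = """<error>
  <kind>UninitCondition</kind>
  <stack>
    <frame><ip>0xE8F952D</ip><obj>/usr/lib64/libcrypto.so.1.1.1c</obj></frame>
    <frame><ip>0xE90828F</ip><obj>/usr/lib64/libcrypto.so.1.1.1c</obj><fn>EVP_DecryptFinal_ex</fn></frame>
    <frame><ip>0x55BE4AC</ip><obj>/usr/lib64/ceph/libceph-common.so.2</obj><fn>AsyncConnection::process()</fn></frame>
  </stack>
</error>"""

print(stack_locations(SAMPLE))
# ['obj:/usr/lib64/libcrypto.so.1.1.1c', 'fun:EVP_DecryptFinal_ex', 'fun:AsyncConnection::process()']
```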
History
#1 Updated by Radoslaw Zarzynski about 4 years ago
Hmm, this should be covered by one of the whitelist rules we already have in `qa/valgrind.supp`:
Memcheck:Cond
...
fun:EVP_DecryptFinal_ex
fun:_ZN4ceph6crypto6onwire25AES128GCM_OnWireRxHandler34authenticated_decrypt_update_finalEONS_6buffer7v14_2_04listEj
fun:_ZN10ProtocolV231handle_read_frame_epilogue_mainEOSt10unique_ptrIN4ceph6buffer7v14_2_08ptr_nodeENS4_8disposerEEi
fun:_ZN10ProtocolV216run_continuationER2CtIS_E
...
fun:_ZN15AsyncConnection7processEv
fun:_ZN11EventCenter14process_eventsEjPNSt6chrono8durationImSt5ratioILl1ELl1000000000EEEE
...
Let's strip the noise and process the wildcards manually:
...
<frame> <fn>EVP_DecryptFinal_ex</fn> </frame>
<frame> <fn>ceph::crypto::onwire::AES128GCM_OnWireRxHandler::authenticated_decrypt_update_final(ceph::buffer::v15_2_0::list&&, unsigned int)</fn> </frame>
<frame> <fn>ProtocolV2::handle_read_frame_epilogue_main(std::unique_ptr<ceph::buffer::v15_2_0::ptr_node, ceph::buffer::v15_2_0::ptr_node::disposer>&&, int)</fn> </frame>
<frame> <fn>ProtocolV2::run_continuation(Ct<ProtocolV2>&)</fn> </frame>
...
<frame> <fn>AsyncConnection::process()</fn> </frame>
<frame> <fn>EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)</fn> </frame>
...
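The manual matching above can be modeled in a few lines of Python. This is a simplified sketch, not Valgrind's actual matcher. It assumes that a rule's location lines are compared in order against the innermost frames of the stack, that `...` absorbs zero or more frames, that `*` and `?` act as wildcards inside a name, and that `fun:` lines are compared against mangled names (Valgrind does not demangle when matching suppressions). The mangled names of the reported frames are an assumption: they are built by substituting the namespace version into the rule's own names, and `_ZN10ProtocolV210read_eventEv` is derived by hand rather than read from the binary. `RULE_ANY` is a hypothetical version-agnostic variant, not necessarily what the eventual fix does.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from functools import lru_cache
from typing import Optional


@dataclass(frozen=True)
class Frame:
    fn: Optional[str]  # mangled function name, or None if unresolved
    obj: str           # shared object the frame belongs to


def location_matches(location: str, frame: Frame) -> bool:
    """Match one 'fun:'/'obj:' suppression line against one frame (* and ? wildcards)."""
    kind, _, pattern = location.partition(":")
    if kind == "fun":
        return frame.fn is not None and fnmatchcase(frame.fn, pattern)
    if kind == "obj":
        return fnmatchcase(frame.obj, pattern)
    raise ValueError(f"unsupported location line: {location!r}")


def suppression_matches(locations: list[str], stack: list[Frame]) -> bool:
    """True if the location lines match the top of `stack` (innermost frame first)."""
    @lru_cache(maxsize=None)
    def match(i: int, j: int) -> bool:
        if i == len(locations):
            return True  # every location line consumed; deeper frames are irrelevant
        if locations[i] == "...":
            # '...' may absorb 0, 1, 2, ... frames
            return any(match(i + 1, k) for k in range(j, len(stack) + 1))
        if j == len(stack):
            return False  # stack exhausted before the rule
        return location_matches(locations[i], stack[j]) and match(i + 1, j + 1)

    return match(0, 0)


LIBCRYPTO = "/usr/lib64/libcrypto.so.1.1.1c"
LIBCEPH = "/usr/lib64/ceph/libceph-common.so.2"

# Location lines of the existing rule quoted above (written against v14_2_0).
RULE_V14 = [
    "...",
    "fun:EVP_DecryptFinal_ex",
    "fun:_ZN4ceph6crypto6onwire25AES128GCM_OnWireRxHandler34authenticated_decrypt_update_finalEONS_6buffer7v14_2_04listEj",
    "fun:_ZN10ProtocolV231handle_read_frame_epilogue_mainEOSt10unique_ptrIN4ceph6buffer7v14_2_08ptr_nodeENS4_8disposerEEi",
    "fun:_ZN10ProtocolV216run_continuationER2CtIS_E",
    "...",
    "fun:_ZN15AsyncConnection7processEv",
    "fun:_ZN11EventCenter14process_eventsEjPNSt6chrono8durationImSt5ratioILl1ELl1000000000EEEE",
    "...",
]


def make_stack(version: str) -> list[Frame]:
    """Top of the reported stack; mangled names assumed, only the buffer version varies."""
    return [
        Frame(None, LIBCRYPTO),
        Frame("EVP_DecryptFinal_ex", LIBCRYPTO),
        Frame(f"_ZN4ceph6crypto6onwire25AES128GCM_OnWireRxHandler34authenticated_decrypt_update_finalEONS_6buffer7{version}4listEj", LIBCEPH),
        Frame(f"_ZN10ProtocolV231handle_read_frame_epilogue_mainEOSt10unique_ptrIN4ceph6buffer7{version}8ptr_nodeENS4_8disposerEEi", LIBCEPH),
        Frame("_ZN10ProtocolV216run_continuationER2CtIS_E", LIBCEPH),
        Frame("_ZN10ProtocolV210read_eventEv", LIBCEPH),
        Frame("_ZN15AsyncConnection7processEv", LIBCEPH),
        Frame("_ZN11EventCenter14process_eventsEjPNSt6chrono8durationImSt5ratioILl1ELl1000000000EEEE", LIBCEPH),
    ]


# Hypothetical version-agnostic variant: '*' stands in for the namespace version.
RULE_ANY = [loc.replace("v14_2_0", "v*_2_0") for loc in RULE_V14]

print(suppression_matches(RULE_V14, make_stack("v14_2_0")))  # True
print(suppression_matches(RULE_V14, make_stack("v15_2_0")))  # False
print(suppression_matches(RULE_ANY, make_stack("v15_2_0")))  # True
```

With the original rule, the `v15_2_0` stack falls through at the second `fun:` line. The wildcard variant trades some precision for resilience against future version bumps.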
The problem comes from the recent bump of the symbol version in the buffer library (from v14_2_0 to v15_2_0). Because of it, the mangled names in the rule, e.g.
_ZN4ceph6crypto6onwire25AES128GCM_OnWireRxHandler34authenticated_decrypt_update_finalEONS_6buffer7v14_2_04listEj
no longer match the actual frames, e.g.
ProtocolV2::handle_read_frame_epilogue_main(std::unique_ptr<ceph::buffer::v15_2_0::ptr_node, ceph::buffer::v15_2_0::ptr_node::disposer>&&, int)
#2 Updated by Radoslaw Zarzynski about 4 years ago
- Status changed from New to In Progress
#3 Updated by Radoslaw Zarzynski about 4 years ago
Untested: https://github.com/rzarzynski/ceph/commit/3ed76db2d5ff872dd086b063c7361c2ef53c9c7d. Valgrind explodes on my local machine, so I moved to incerta07 for testing.
#4 Updated by Radoslaw Zarzynski about 4 years ago
- Assignee set to Radoslaw Zarzynski
#5 Updated by Radoslaw Zarzynski about 4 years ago
- Status changed from In Progress to Fix Under Review
#6 Updated by Radoslaw Zarzynski about 4 years ago
BTW (not strictly connected with the whitelist update): I'm trying to isolate the underlying cause of the UninitCondition. At the moment it isn't reproducible with the unit test (https://github.com/ceph/ceph/compare/master...rzarzynski:wip-test-msg-onwire-crypto) on incerta07; a daemon (e.g. `ceph-mon`) is still needed to trigger it.
#7 Updated by Sage Weil about 4 years ago
- Status changed from Fix Under Review to Resolved
- Pull request ID set to 33757