Bug #44430

*: valgrind: UninitCondition

Added by Patrick Donnelly 7 months ago. Updated 7 months ago.

Status: Resolved
Priority: Immediate
Category: -
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: fs
Pull request ID:
Crash signature:

Description

Valgrind: mds (UninitCondition), client (UninitCondition), mon (UninitCondition), osd (UninitCondition)
5 jobs: ['4824922', '4824877', '4824831', '4825012', '4824967']
suites intersection: ['centos_latest.yaml', 'clusters/fixed-2-ucephfs.yaml', 'conf/{client.yaml', 'fs/verify/{begin.yaml', 'mds.yaml', 'mon-debug.yaml', 'mon.yaml', 'mount/fuse.yaml', 'osd.yaml}', 'overrides/{frag_enable.yaml', 'tasks/cfuse_workunit_suites_fsstress.yaml', 'validater/valgrind.yaml}', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}']
suites union: ['centos_latest.yaml', 'clusters/fixed-2-ucephfs.yaml', 'conf/{client.yaml', 'fs/verify/{begin.yaml', 'mds.yaml', 'mon-debug.yaml', 'mon.yaml', 'mount/fuse.yaml', 'objectstore-ec/bluestore-bitmap.yaml', 'objectstore-ec/bluestore-comp-ec-root.yaml', 'objectstore-ec/bluestore-comp.yaml', 'objectstore-ec/bluestore-ec-root.yaml', 'objectstore-ec/filestore-xfs.yaml', 'osd.yaml}', 'overrides/{frag_enable.yaml', 'tasks/cfuse_workunit_suites_fsstress.yaml', 'validater/valgrind.yaml}', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}']

From: http://pulpito.ceph.com/pdonnell-2020-03-04_14:24:35-fs-wip-pdonnell-testing-20200303.175301-distro-basic-smithi/

Example:

<error>
  <unique>0x0</unique>
  <tid>4</tid>
  <threadname>msgr-worker-1</threadname>
  <kind>UninitCondition</kind>
  <what>Conditional jump or move depends on uninitialised value(s)</what>
  <stack>
    <frame>
      <ip>0xE8F952D</ip>
      <obj>/usr/lib64/libcrypto.so.1.1.1c</obj>
    </frame>
    <frame>
      <ip>0xE90828F</ip>
      <obj>/usr/lib64/libcrypto.so.1.1.1c</obj>
      <fn>EVP_DecryptFinal_ex</fn>
    </frame>
    <frame>
      <ip>0x5622CE5</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.2</obj>
      <fn>ceph::crypto::onwire::AES128GCM_OnWireRxHandler::authenticated_decrypt_update_final(ceph::buffer::v15_2_0::list&amp;&amp;, unsigned int)</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1621.g38766d1.el8.x86_64/src/msg/async</dir>
      <file>crypto_onwire.cc</file>
      <line>268</line>
    </frame>
    <frame>
      <ip>0x5610301</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.2</obj>
      <fn>ProtocolV2::handle_read_frame_epilogue_main(std::unique_ptr&lt;ceph::buffer::v15_2_0::ptr_node, ceph::buffer::v15_2_0::ptr_node::disposer&gt;&amp;&amp;, int)</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1621.g38766d1.el8.x86_64/src/msg/async</dir>
      <file>ProtocolV2.cc</file>
      <line>1335</line>
    </frame>
    <frame>
      <ip>0x55F716B</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.2</obj>
      <fn>ProtocolV2::run_continuation(Ct&lt;ProtocolV2&gt;&amp;)</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1621.g38766d1.el8.x86_64/src/msg/async</dir>
      <file>ProtocolV2.cc</file>
      <line>45</line>
    </frame>
    <frame>
      <ip>0x55F7336</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.2</obj>
      <fn>ProtocolV2::read_event()</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1621.g38766d1.el8.x86_64/src/msg/async</dir>
      <file>ProtocolV2.cc</file>
      <line>463</line>
    </frame>
    <frame>
      <ip>0x55BE4AC</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.2</obj>
      <fn>AsyncConnection::process()</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1621.g38766d1.el8.x86_64/src/msg/async</dir>
      <file>AsyncConnection.cc</file>
      <line>467</line>
    </frame>
    <frame>
      <ip>0x5618365</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.2</obj>
      <fn>EventCenter::process_events(unsigned int, std::chrono::duration&lt;unsigned long, std::ratio&lt;1l, 1000000000l&gt; &gt;*)</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1621.g38766d1.el8.x86_64/src/msg/async</dir>
      <file>Event.cc</file>
      <line>433</line>
    </frame>
    <frame>
      <ip>0x561F36B</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.2</obj>
      <fn>operator()</fn>
      <dir>/usr/src/debug/ceph-15.1.0-1621.g38766d1.el8.x86_64/src/msg/async</dir>
      <file>Stack.cc</file>
      <line>53</line>
    </frame>
    <frame>
      <ip>0x561F36B</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.2</obj>
      <fn>std::_Function_handler&lt;void (), NetworkStack::add_thread(unsigned int)::{lambda()#1}&gt;::_M_invoke(std::_Any_data const&amp;)</fn>
      <dir>/usr/include/c++/8/bits</dir>
      <file>std_function.h</file>
      <line>297</line>
    </frame>
    <frame>
      <ip>0xF7C7B22</ip>
      <obj>/usr/lib64/libstdc++.so.6.0.25</obj>
    </frame>
    <frame>
      <ip>0xEC7E2DD</ip>
      <obj>/usr/lib64/libpthread-2.28.so</obj>
      <fn>start_thread</fn>
    </frame>
    <frame>
      <ip>0x10130132</ip>
      <obj>/usr/lib64/libc-2.28.so</obj>
      <fn>clone</fn>
    </frame>
  </stack>
</error>

History

#1 Updated by Radoslaw Zarzynski 7 months ago

Hmm, this should be covered by one of the whitelist rules we already have in `qa/valgrind.supp`:

   Memcheck:Cond
   ...
   fun:EVP_DecryptFinal_ex
   fun:_ZN4ceph6crypto6onwire25AES128GCM_OnWireRxHandler34authenticated_decrypt_update_finalEONS_6buffer7v14_2_04listEj
   fun:_ZN10ProtocolV231handle_read_frame_epilogue_mainEOSt10unique_ptrIN4ceph6buffer7v14_2_08ptr_nodeENS4_8disposerEEi
   fun:_ZN10ProtocolV216run_continuationER2CtIS_E
   ...
   fun:_ZN15AsyncConnection7processEv
   fun:_ZN11EventCenter14process_eventsEjPNSt6chrono8durationImSt5ratioILl1ELl1000000000EEEE
   ...

Let's strip the noise from the trace and expand the wildcards manually:

    ...
    <frame>
      <fn>EVP_DecryptFinal_ex</fn>
    </frame>
    <frame>
      <fn>ceph::crypto::onwire::AES128GCM_OnWireRxHandler::authenticated_decrypt_update_final(ceph::buffer::v15_2_0::list&amp;&amp;, unsigned int)</fn>
    </frame>
    <frame>
      <fn>ProtocolV2::handle_read_frame_epilogue_main(std::unique_ptr&lt;ceph::buffer::v15_2_0::ptr_node, ceph::buffer::v15_2_0::ptr_node::disposer&gt;&amp;&amp;, int)</fn>
    </frame>
    <frame>
      <fn>ProtocolV2::run_continuation(Ct&lt;ProtocolV2&gt;&amp;)</fn>
    </frame>
    ...
    <frame>
      <fn>AsyncConnection::process()</fn>
    </frame>
    <frame>
      <fn>EventCenter::process_events(unsigned int, std::chrono::duration&lt;unsigned long, std::ratio&lt;1l, 1000000000l&gt; &gt;*)</fn>
    </frame>
    ...
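
The manual stripping above can be approximated with a few lines of Python. This is only a sketch: `SAMPLE_ERROR` is an abridged copy of the trace from the description, and `function_names` is a hypothetical helper, not part of any Ceph or valgrind tooling. It keeps the `<fn>` of each frame and skips frames that have none.

```python
import xml.etree.ElementTree as ET
from typing import List

# Abridged copy of the valgrind XML error from the description (assumption:
# three frames are enough to illustrate the idea).
SAMPLE_ERROR = """
<error>
  <kind>UninitCondition</kind>
  <stack>
    <frame>
      <ip>0xE8F952D</ip>
      <obj>/usr/lib64/libcrypto.so.1.1.1c</obj>
    </frame>
    <frame>
      <ip>0xE90828F</ip>
      <obj>/usr/lib64/libcrypto.so.1.1.1c</obj>
      <fn>EVP_DecryptFinal_ex</fn>
    </frame>
    <frame>
      <ip>0x55F716B</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.2</obj>
      <fn>ProtocolV2::run_continuation(Ct&lt;ProtocolV2&gt;&amp;)</fn>
    </frame>
  </stack>
</error>
"""


def function_names(error_xml: str) -> List[str]:
    """Return the <fn> text of every stack frame, in stack order.

    Frames without a <fn> element (no symbol information) are skipped.
    """
    root = ET.fromstring(error_xml)
    names = []
    for frame in root.iter("frame"):
        fn = frame.findtext("fn")
        if fn is not None:
            names.append(fn)
    return names


if __name__ == "__main__":
    for name in function_names(SAMPLE_ERROR):
        print(name)
```

Note that the XML parser decodes the entities, so the names come out with plain `<`, `>` and `&`.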

The problem comes from the recent bump of the symbol version in the buffer library. As a result, the suppression entry

_ZN4ceph6crypto6onwire25AES128GCM_OnWireRxHandler34authenticated_decrypt_update_finalEONS_6buffer7v14_2_04listEj

no longer matches the corresponding frame in the trace,

ceph::crypto::onwire::AES128GCM_OnWireRxHandler::authenticated_decrypt_update_final(ceph::buffer::v15_2_0::list&amp;&amp;, unsigned int)

(v14_2_0 vs v15_2_0). The ProtocolV2::handle_read_frame_epilogue_main entry is affected in the same way.
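
To illustrate the mismatch, here is a rough Python sketch. Valgrind compares `fun:` lines against mangled names and accepts `*` and `?` wildcards; `fnmatchcase` is used here only as a stand-in for that matching, not as valgrind's actual matcher. `RUNTIME_SYMBOL` is an assumption, derived by swapping the version component in the existing rule, and `WILDCARD_RULE` is a hypothetical version-agnostic pattern, not necessarily what the eventual fix uses.

```python
from fnmatch import fnmatchcase

# Suppression entry currently in the file (pins the buffer version to v14_2_0).
OLD_RULE = (
    "_ZN4ceph6crypto6onwire25AES128GCM_OnWireRxHandler"
    "34authenticated_decrypt_update_finalEONS_6buffer7v14_2_04listEj"
)

# Assumption: the mangled name in the new build differs only in the version
# component, as the demangled trace (v15_2_0) suggests.
RUNTIME_SYMBOL = OLD_RULE.replace("v14_2_0", "v15_2_0")

# Hypothetical pattern that wildcards the version component.
WILDCARD_RULE = OLD_RULE.replace("v14_2_0", "v*")


def matches(rule: str, symbol: str) -> bool:
    """Approximate a `fun:` match: case-sensitive, with * and ? wildcards."""
    return fnmatchcase(symbol, rule)


if __name__ == "__main__":
    print("old rule matches:     ", matches(OLD_RULE, RUNTIME_SYMBOL))
    print("wildcard rule matches:", matches(WILDCARD_RULE, RUNTIME_SYMBOL))
```

A wildcarded version would keep matching across future bumps, at the cost of a slightly looser rule.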

#2 Updated by Radoslaw Zarzynski 7 months ago

  • Status changed from New to In Progress

#3 Updated by Radoslaw Zarzynski 7 months ago

Untested: https://github.com/rzarzynski/ceph/commit/3ed76db2d5ff872dd086b063c7361c2ef53c9c7d.
valgrind explodes on my local machine, so I moved to incerta07 for testing.

#4 Updated by Radoslaw Zarzynski 7 months ago

  • Assignee set to Radoslaw Zarzynski

#5 Updated by Radoslaw Zarzynski 7 months ago

  • Status changed from In Progress to Fix Under Review

#6 Updated by Radoslaw Zarzynski 7 months ago

BTW (not strictly connected with the whitelist update): I'm looking for the underlying cause of the UninitCondition in isolation. At the moment it is not reproducible with the unit test (https://github.com/ceph/ceph/compare/master...rzarzynski:wip-test-msg-onwire-crypto) on incerta07; a running daemon (e.g. ceph-mon) is still needed to trigger it.

#7 Updated by Sage Weil 7 months ago

  • Status changed from Fix Under Review to Resolved
  • Pull request ID set to 33757
