Bug #18412

closed

multisite: use after free in RGWCloneMetaLogCoroutine::state_read_shard_status()

Added by Casey Bodley over 7 years ago. Updated about 7 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: kraken
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Valgrind output from http://qa-proxy.ceph.com/teuthology/sage-2016-12-26_18:32:29-rgw-wip-sage-testing---basic-smithi/668021/remote/smithi031/log/valgrind/client.1.log.gz:

  <kind>InvalidWrite</kind>
  <what>Invalid write of size 4</what>
  <stack>
    <frame>
      <ip>0x3B9976</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>finish</fn>
      <dir>/usr/src/debug/ceph-11.1.0-6257-gd4000dd/src/rgw</dir>
      <file>rgw_metadata.cc</file>
      <line>205</line>
    </frame>
    <frame>
      <ip>0x3B9976</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>_mdlog_info_completion(void*, void*)</fn>
      <dir>/usr/src/debug/ceph-11.1.0-6257-gd4000dd/src/rgw</dir>
      <file>rgw_metadata.cc</file>
      <line>228</line>
    </frame>
...
  <auxwhat>Address 0x731263c8 is 1,464 bytes inside a block of size 1,528 free'd</auxwhat>
  <stack>
    <frame>
      <ip>0x98B418D</ip>
      <obj>/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so</obj>
      <fn>operator delete(void*)</fn>
      <dir>/builddir/build/BUILD/valgrind-3.11.0/coregrind/m_replacemalloc</dir>
      <file>vg_replace_malloc.c</file>
      <line>576</line>
    </frame>
    <frame>
      <ip>0x5760E2</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>RGWCloneMetaLogCoroutine::~RGWCloneMetaLogCoroutine()</fn>
      <dir>/usr/src/debug/ceph-11.1.0-6257-gd4000dd/src/rgw</dir>
      <file>rgw_sync.cc</file>
      <line>1220</line>
    </frame>
...
  <auxwhat>Block was alloc'd at</auxwhat>
  <stack>
    <frame>
      <ip>0x98B3203</ip>
      <obj>/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so</obj>
      <fn>operator new(unsigned long)</fn>
      <dir>/builddir/build/BUILD/valgrind-3.11.0/coregrind/m_replacemalloc</dir>
      <file>vg_replace_malloc.c</file>
      <line>334</line>
    </frame>
    <frame>
      <ip>0x5810F1</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>RGWMetaSyncShardCR::incremental_sync()</fn>
      <dir>/usr/src/debug/ceph-11.1.0-6257-gd4000dd/src/rgw</dir>
      <file>rgw_sync.cc</file>
      <line>1562</line>
    </frame>

This coroutine was interrupted by shutdown, according to http://qa-proxy.ceph.com/teuthology/sage-2016-12-26_18:32:29-rgw-wip-sage-testing---basic-smithi/668021/remote/smithi031/log/rgw.client.1.log.gz:

2016-12-26 22:38:40.701980 34a62700 20 cr:s=0x181698a0:op=0x73125e10:24RGWCloneMetaLogCoroutine: operate()
2016-12-26 22:38:40.702042 34a62700 20 meta sync: operate: shard_id=24: reading shard status
2016-12-26 22:38:40.703443 34a62700 20 run: stack=0x181698a0 is io blocked
2016-12-26 22:38:40.703515 34a62700 20 cr:s=0x22f864c0:op=0x72e5cf30:18RGWMetaSyncShardCR: operate()
2016-12-26 22:38:40.703573 34a62700 20 meta sync: incremental_sync:1571: shard_id=61 mdlog_marker= sync_marker.marker=
2016-12-26 22:38:40.703645 34a62700 20 meta sync: incremental_sync:1603: shard_id=61 mdlog_marker= max_marker= sync_marker.marker= period_marker=
2016-12-26 22:38:40.703763 34a62700 20 run: stack=0x22f864c0 is io blocked
2016-12-26 22:38:40.711414 34a62700  0 ERROR: failed to clone shard, completion_mgr.get_next() returned ret=-125
2016-12-26 22:38:40.712671 34a62700  5 run(): was stopped, exiting
2016-12-26 22:38:40.713967 34a62700 20 clearing stack on run() exit: stack=0x18131230 nref=1
2016-12-26 22:38:40.715969 34a62700 20 clearing stack on run() exit: stack=0x18137780 nref=3
2016-12-26 22:38:40.716024 34a62700 20 clearing stack on run() exit: stack=0x18137d30 nref=3
2016-12-26 22:38:40.716064 34a62700 20 clearing stack on run() exit: stack=0x18138b10 nref=3
2016-12-26 22:38:40.716102 34a62700 20 clearing stack on run() exit: stack=0x18139890 nref=3

RGWCloneMetaLogCoroutine::state_read_shard_status() calls RGWMetadataLog::get_info_async(), passing pointers to some of its own member variables. RGWMetadataLogInfoCompletion stores these pointers and dereferences them on completion, but nothing appears to hold a reference on RGWCloneMetaLogCoroutine to make this safe. If the coroutine is destroyed (here, during shutdown) before the completion fires, the completion writes into freed memory.
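
The lifetime problem described above can be illustrated with a minimal sketch. This is not the RGW code and not necessarily how the bug was fixed: ShardInfo, InfoCompletion, CloneCoroutine and completion_after_shutdown_is_safe() are hypothetical stand-ins invented for illustration. The sketch shows one common way to avoid this class of bug, assuming the completion may co-own its output buffer: the result lives in a shared_ptr held by both the coroutine and the completion, so a completion that fires after the coroutine is destroyed still writes into live memory.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for the shard status that an async read fills in.
struct ShardInfo {
  int marker = 0;
  bool filled = false;
};

// Hypothetical completion. It co-owns the result buffer through a shared_ptr
// instead of storing a raw pointer into the object that issued the request.
class InfoCompletion {
  std::shared_ptr<ShardInfo> result;
public:
  explicit InfoCompletion(std::shared_ptr<ShardInfo> r) : result(std::move(r)) {}

  // Called when the (simulated) async I/O finishes.
  void finish(int marker) {
    result->marker = marker;
    result->filled = true;
  }
};

// Hypothetical coroutine that issues the async read.
class CloneCoroutine {
  std::shared_ptr<ShardInfo> info = std::make_shared<ShardInfo>();
public:
  // Hands the completion shared ownership of the output buffer.
  std::shared_ptr<InfoCompletion> read_shard_status() {
    return std::make_shared<InfoCompletion>(info);
  }

  // Lets the caller observe the buffer's lifetime without extending it.
  std::weak_ptr<ShardInfo> observe() const { return info; }
};

// Simulates the shutdown race: the coroutine is destroyed while the request
// is still in flight, and the completion fires afterwards.
// Returns true if the late completion wrote into memory that was still alive.
bool completion_after_shutdown_is_safe() {
  auto cr = std::make_unique<CloneCoroutine>();
  std::shared_ptr<InfoCompletion> completion = cr->read_shard_status();
  std::weak_ptr<ShardInfo> watcher = cr->observe();

  cr.reset();  // coroutine destroyed before the completion runs

  const bool alive_before = !watcher.expired();  // completion keeps it alive
  completion->finish(42);                        // late completion

  std::shared_ptr<ShardInfo> info = watcher.lock();
  return alive_before && info && info->filled && info->marker == 42;
}
```

Calling completion_after_shutdown_is_safe() from a test or from main() exercises the race in a deterministic order. With raw pointers into the coroutine's members, as in the report, the same sequence would be the invalid write valgrind flagged. Other designs, such as cancelling the completion in the coroutine's destructor, address the same problem without sharing ownership.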


Related issues 1 (0 open, 1 closed)

Copied to rgw - Backport #18613: kraken: multisite: use after free in RGWCloneMetaLogCoroutine::state_read_shard_status() (Resolved, Casey Bodley)
#1

Updated by Casey Bodley over 7 years ago

  • Subject changed from multisite: invalid read in RGWCloneMetaLogCoroutine::state_read_shard_status() to multisite: use after free in RGWCloneMetaLogCoroutine::state_read_shard_status()
#3

Updated by Abhishek Lekshmanan over 7 years ago

  • Status changed from New to Pending Backport
#4

Updated by Nathan Cutler about 7 years ago

  • Backport set to kraken
#5

Updated by Nathan Cutler about 7 years ago

  • Copied to Backport #18613: kraken: multisite: use after free in RGWCloneMetaLogCoroutine::state_read_shard_status() added
#6

Updated by Nathan Cutler about 7 years ago

  • Status changed from Pending Backport to Resolved