Bug #9294

invalid read of size 8 in ReplicatedPG::start_flush()

Added by Sage Weil over 9 years ago. Updated over 9 years ago.

Status: Rejected
Priority: Urgent
Assignee:
Category: OSD
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport: firefly
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

<error>
  <unique>0x2</unique>
  <tid>46</tid>
  <kind>InvalidRead</kind>
  <what>Invalid read of size 8</what>
  <stack>
    <frame>
      <ip>0x80FAE3</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ReplicatedPG::start_flush(std::tr1::shared_ptr&lt;OpRequest&gt;, std::tr1::shared_ptr&lt;ObjectContext&gt;, bool, hobject_t*, Context*)</fn>
      <dir>/srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-0.84-823-gf807a7e/src/./include</dir>
      <file>object.h</file>
      <line>117</line>
    </frame>
    <frame>
      <ip>0x81D6A8</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector&lt;OSDOp, std::allocator&lt;OSDOp&gt; &gt;&amp;)</fn>
      <dir>/srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-0.84-823-gf807a7e/src/osd</dir>
      <file>ReplicatedPG.cc</file>
      <line>3446</line>
    </frame>
    <frame>
      <ip>0x823712</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)</fn>
      <dir>/srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-0.84-823-gf807a7e/src/osd</dir>
      <file>ReplicatedPG.cc</file>
      <line>5295</line>
    </frame>
    <frame>
      <ip>0x8245AE</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)</fn>
      <dir>/srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-0.84-823-gf807a7e/src/osd</dir>
      <file>ReplicatedPG.cc</file>
      <line>1856</line>
    </frame>
    <frame>
      <ip>0x82F43B</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ReplicatedPG::do_op(std::tr1::shared_ptr&lt;OpRequest&gt;&amp;)</fn>
      <dir>/srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-0.84-823-gf807a7e/src/osd</dir>
      <file>ReplicatedPG.cc</file>
      <line>1559</line>
    </frame>
    <frame>
      <ip>0x7CA3AE</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ReplicatedPG::do_request(std::tr1::shared_ptr&lt;OpRequest&gt;&amp;, ThreadPool::TPHandle&amp;)</fn>
      <dir>/srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-0.84-823-gf807a7e/src/osd</dir>
      <file>ReplicatedPG.cc</file>
      <line>1129</line>
    </frame>
    <frame>
      <ip>0x64B41E</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>OSD::dequeue_op(boost::intrusive_ptr&lt;PG&gt;, std::tr1::shared_ptr&lt;OpRequest&gt;, ThreadPool::TPHandle&amp;)</fn>
      <dir>/srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-0.84-823-gf807a7e/src/osd</dir>
      <file>OSD.cc</file>
      <line>8373</line>
    </frame>
    <frame>
      <ip>0x64BEE0</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)</fn>
      <dir>/srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-0.84-823-gf807a7e/src/osd</dir>
      <file>OSD.cc</file>
      <line>8267</line>
    </frame>
    <frame>
      <ip>0xA8C831</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ShardedThreadPool::shardedthreadpool_worker(unsigned int)</fn>
      <dir>/srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-0.84-823-gf807a7e/src/common</dir>

ubuntu@teuthology:/a/sage-2014-08-28_16:08:59-rados-master-testing-basic-multi/458989

Associated revisions

Revision ded1cf4a (diff)
Added by Sage Weil over 9 years ago

osd/ReplicatedPG: avoid dereferencing iterator at end()

The preceding loop could terminate with p == snapset.snaps.end(), which
we assign to dnewest. We can't dereference the iterator in that case.

For example:

start_flush ffe627f3/foo/a/test-rados-api-plana05-22080-18/83 v430'42 uv130 blocking
snapset b=[b,a]:[a,b]+head
start_flush no older clones

prev_snapc will be 0, oi.snaps will be [a], p will end up at end(), get
assigned to dnewest, and we'll dereference. It's only sometimes harmful,
though, because we may still take the right (else) branch...

Fixes: #9294
Signed-off-by: Sage Weil <>

History

#1 Updated by Sage Weil over 9 years ago

This is somewhere in start_flush(). A quick re-read of the code for places where we dereference a snapid_t iterator turns up this:

diff --git a/src/osd/ReplicatedPG.cc b/src/osd/ReplicatedPG.cc
index b52f2e7..1f1cdd6 100644
--- a/src/osd/ReplicatedPG.cc
+++ b/src/osd/ReplicatedPG.cc
@@ -6441,7 +6441,7 @@ int ReplicatedPG::start_flush(
     vector<snapid_t>::iterator dnewest = p;

     // we may need to send a delete first
-    if (prev_snapc + 1 < *dnewest) {
+    if (dnewest != snapset.snaps.end() && prev_snapc + 1 < *dnewest) {
       while (p != snapset.snaps.end() && *p > prev_snapc)
        ++p;
       dsnapc.snaps = vector<snapid_t>(p, snapset.snaps.end());
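
For illustration only, here is a minimal standalone sketch of the pattern the diff guards against. It is not taken from ReplicatedPG.cc: snap_t, skip_newer() and need_delete_first() are hypothetical names, and the shape of the skipping loop is an assumption based on the commit message (snaps stored newest-first, with the loop advancing past every snap at or above the object's oldest snap). With the values from the example (snapset snaps [b,a], oi.snaps [a], prev_snapc 0) the loop runs to end(), so the check has to test for end() before dereferencing.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for snapid_t, which wraps a uint64_t.
using snap_t = std::uint64_t;

// Assumed shape of the "preceding loop": skip every snap that is
// >= oldest_obj_snap. Snaps are ordered newest-first, so this can
// walk all the way to end().
std::vector<snap_t>::const_iterator
skip_newer(const std::vector<snap_t>& snaps, snap_t oldest_obj_snap) {
  auto p = snaps.begin();
  while (p != snaps.end() && *p >= oldest_obj_snap)
    ++p;
  return p;
}

// Guarded form of the condition, mirroring the patched line: the end()
// test comes first, so && short-circuits before *dnewest is evaluated.
bool need_delete_first(const std::vector<snap_t>& snaps,
                       std::vector<snap_t>::const_iterator dnewest,
                       snap_t prev_snapc) {
  return dnewest != snaps.end() && prev_snapc + 1 < *dnewest;
}
```

Calling skip_newer() on {0xb, 0xa} with an oldest object snap of 0xa returns end(), and need_delete_first() then returns false without touching the iterator. The unguarded form would dereference end(), which is undefined behavior and is the kind of access Valgrind reports as an invalid read.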

#2 Updated by Sage Weil over 9 years ago

  • Status changed from New to Fix Under Review
  • Backport set to firefly

#3 Updated by Sage Weil over 9 years ago

(gdb) bt
#0  0x00007fd29286620b in raise (sig=11) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
#1  0x0000000000a503da in reraise_fatal (signum=11) at global/signal_handler.cc:59
#2  handle_fatal_signal (signum=11) at global/signal_handler.cc:105
#3  <signal handler called>
#4  operator uint64_t (this=0x0) at ./include/object.h:117
#5  ReplicatedPG::start_flush (this=this@entry=0x2a8ec00, op=..., obc=..., blocking=blocking@entry=false, pmissing=pmissing@entry=0x0, on_flush=on_flush@entry=0x2f35d40) at osd/ReplicatedPG.cc:6310
#6  0x00000000008325ff in ReplicatedPG::agent_maybe_flush (this=this@entry=0x2a8ec00, obc=...) at osd/ReplicatedPG.cc:11431
#7  0x0000000000833784 in ReplicatedPG::agent_work (this=0x2a8ec00, start_max=4) at osd/ReplicatedPG.cc:11271
#8  0x0000000000639149 in OSDService::agent_entry (this=0x27b96a0) at osd/OSD.cc:532
#9  0x00000000006a26fd in OSDService::AgentThread::entry (this=<optimized out>) at osd/OSD.h:565
#10 0x00007fd29285e182 in start_thread (arg=0x7fd26f964700) at pthread_create.c:312
#11 0x00007fd290dca38d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) f 5
#5  ReplicatedPG::start_flush (this=this@entry=0x2a8ec00, op=..., obc=..., blocking=blocking@entry=false, pmissing=pmissing@entry=0x0, on_flush=on_flush@entry=0x2f35d40) at osd/ReplicatedPG.cc:6310
6310    osd/ReplicatedPG.cc: No such file or directory.

ubuntu@teuthology:/a/teuthology-2014-08-31_02:30:01-rados-next-testing-basic-multi/463029 (saved log and core file)

#4 Updated by Sage Weil over 9 years ago

  • Status changed from Fix Under Review to Resolved

#5 Updated by Sage Weil over 9 years ago

  • Status changed from Resolved to Pending Backport

#6 Updated by Sage Weil over 9 years ago

  • Status changed from Pending Backport to Rejected

closed, this patch got reverted
