Bug #6346: OSD: do not crash on bad client op input - Ceph - Ceph

Actions

Copy link

Bug #6346

closed

OSD: do not crash on bad client op input

Added by Greg Farnum over 10 years ago. Updated over 10 years ago.

Status:

Resolved

Priority:

Normal

Assignee:

Category:

OSD

Target version:

% Done:

Source:

Development

Tags:

Backport:

Regression:

Severity:

3 - minor

Reviewed:

Affected Versions:

ceph-qa-suite:

Pull request ID:

Crash signature (v1):

Crash signature (v2):

Description

(gdb) bt
#0  0x00007f578c77ab7b in raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
#1  0x00000000009fb6ae in reraise_fatal (signum=6) at global/signal_handler.cc:59
#2  handle_fatal_signal (signum=6) at global/signal_handler.cc:105
#3  <signal handler called>
#4  0x00007f578a848425 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#5  0x00007f578a84bb8b in __GI_abort () at abort.c:91
#6  0x00007f578b19a69d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007f578b198846 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007f578b198873 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007f578b19896e in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x000000000087277f in ceph::__ceph_assert_fail (assertion=0xa1fbb1 "len == data.length()", file=<optimized out>, line=424, 
    func=0xa26360 "void ObjectStore::Transaction::write(coll_t, const hobject_t&, uint64_t, uint64_t, const bufferlist&)") at common/assert.cc:77
#11 0x00000000005d6ee1 in ObjectStore::Transaction::write (data=..., len=<optimized out>, off=<optimized out>, oid=..., cid=..., this=<optimized out>) at ./os/ObjectStore.h:424
#12 0x000000000063ac68 in ObjectStore::Transaction::write (this=0x3ad1f40, cid=..., oid=..., off=0, len=2459525, data=...) at ./os/ObjectStore.h:424
#13 0x0000000000744e70 in ReplicatedPG::do_osd_ops (this=0x2f7b000, ctx=0x3ad1c00, ops=...) at osd/ReplicatedPG.cc:2672
#14 0x0000000000758bac in ReplicatedPG::prepare_transaction (this=0x2f7b000, ctx=0x3ad1c00) at osd/ReplicatedPG.cc:3956
#15 0x000000000075aabd in ReplicatedPG::execute_ctx (this=0x2f7b000, ctx=0x3ad1c00) at osd/ReplicatedPG.cc:994
#16 0x000000000075f7b5 in ReplicatedPG::do_op (this=0x2f7b000, op=...) at osd/ReplicatedPG.cc:887
#17 0x000000000069f419 in PG::do_request (this=0x2f7b000, op=..., handle=...) at osd/PG.cc:1428
#18 0x00000000005e8580 in OSD::dequeue_op (this=0x2e94000, pg=..., op=..., handle=...) at osd/OSD.cc:7204
#19 0x00000000005fea60 in OSD::OpWQ::_process (this=0x2e94e00, pg=..., handle=...) at osd/OSD.cc:7176
#20 0x000000000063d8fc in ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process (this=0x2e94e00, handle=...)
    at ./common/WorkQueue.h:190
#21 0x0000000000865626 in ThreadPool::worker (this=0x2e94468, wt=0x2e9fb80) at common/WorkQueue.cc:125
#22 0x0000000000867430 in ThreadPool::WorkThread::entry (this=<optimized out>) at common/WorkQueue.h:317
#23 0x00007f578c772e9a in start_thread (arg=0x7f57786b4700) at pthread_create.c:308
#24 0x00007f578a905ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#25 0x0000000000000000 in ?? ()

The values it's looking at are pulled straight from a client-provided bufferlist and we don't seem to do any validation prior to this point. See teuthology run at /a/teuthology-2013-09-17_23:01:16-fs-next-testing-basic-plana/1237.

Actions

Copy link

Updated by Sage Weil over 10 years ago

    case CEPH_OSD_OP_WRITE:
      ++ctx->num_write;
      { // write
    if (op.extent.length != osd_op.indata.length()) {
      result = -EINVAL;
      break;
    }

is in master... but it is new, first appears in 0.69

Actions

Copy link

Updated by Greg Farnum over 10 years ago

Well there's something weird going on then, because this is post-0.69 and appears to have passed those checks:

./os/ObjectStore.h: 425: FAILED assert(len == data.length())

 ceph version 0.69-252-g08c386f (08c386f54254bb5652d811e4caf339b619a39109)
 1: ceph-osd() [0x642492]
 2: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&)+0x91d7) [0x822637]
 3: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x6c) [0x82dfbc]
 4: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x8b9) [0x82fea9]
 5: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x20b5) [0x834745]
 6: (PG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x619) [0x7701e9]
 7: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x338) [0x5ee128]
 8: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x4a0) [0x604d40]
 9: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0x9c) [0x6452dc]
 10: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x937426]
 11: (ThreadPool::WorkThread::entry()+0x10) [0x939230]
 12: (()+0x7e9a) [0x7f6dfe7a6e9a]
 13: (clone()+0x6d) [0x7f6dfcba6ccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

And now I'm just confused, because it sure looks like both the callers of Transaction::write() in ReplicatedPG::do_osd_ops() are validating exactly what they pass in. And gdb says this is backtrace is via the CEPH_OSD_OP_WRITE path.

A-hah! We are resetting op.extent.length:

        if (seq && (seq > op.extent.truncate_seq) &&
            (op.extent.offset + op.extent.length > oi.size)) {
      // old write, arrived after trimtrunc
      op.extent.length = (op.extent.offset > oi.size ? 0 : oi.size - op.extent.offset);
      dout(10) << " old truncate_seq " << op.extent.truncate_seq << " < current " << seq
           << ", adjusting write length to " << op.extent.length << dendl;
        }

Looking at the core file indicates that op.extent.length is zero at the time of the crash. Which means the OSD is breaking itself with this combination of the assert and the op updates.

Anyway, this comes from /a/teuthology-2013-09-23_23:01:30-fs-master-testing-basic-plana/14298.

Actions

Copy link