Bug #16596

closed

osd: memory corruption resulting in invalid CollectionHandle reference

Added by Josh Durgin almost 8 years ago. Updated over 7 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

From http://qa-proxy.ceph.com/teuthology/joshd-2016-07-01_23:34:47-rados-wip-pg-log-errors-9---basic-smithi/287851/remote/smithi054/log/ceph-osd.4.log.gz

in thread 2c839700 thread_name:tp_osd_tp

 ceph version v11.0.0-246-g5f0d0f4 (5f0d0f4e67f2bdbd38e1e9cf08b499d6932921bf)
 1: (ceph::BackTrace::BackTrace(int)+0x2d) [0x1ce05b3]
 2: ceph-osd() [0x1eb4acd]
 3: (()+0x10340) [0xd694340]
 4: (gsignal()+0x39) [0xea7bcc9]
 5: (abort()+0x148) [0xea7f0d8]
 6: (()+0x2fb86) [0xea74b86]
 7: (()+0x2fc32) [0xea74c32]
 8: (boost::intrusive_ptr<ObjectStore::CollectionImpl>::operator->() const+0x37) [0x1c9d5d1]
 9: (ObjectStore::getattr(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, char const*, ceph::buffer::ptr&)+0x39) [0x1c9cefb]
 10: (PGBackend::objects_get_attr(hobject_t const&, std::string const&, ceph::buffer::list*)+0xcd) [0x1b16dd5]
 11: (ReplicatedPG::get_object_context(hobject_t const&, bool, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >*)+0x584) [0x1a63eb6]
 12: (ReplicatedPG::find_object_context(hobject_t const&, std::shared_ptr<ObjectContext>*, bool, bool, hobject_t*)+0xcf) [0x1a64e83]
 13: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x284d) [0x1a1ae27]
 14: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x97c) [0x1a17d7e]
 15: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x4e1) [0x183ab5d]
 16: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest> const&)+0x5d) [0x17e2a3f]

Job running again here: http://pulpito.ceph.com/joshd-2016-07-05_16:18:39-rados-wip-pg-log-errors-9---basic-smithi/

Actions #1

Updated by Josh Durgin almost 8 years ago

Possibly related (this looks like another memory corruption), from http://qa-proxy.ceph.com/teuthology/joshd-2016-07-01_23:26:44-rados-wip-pg-log-errors-9---basic-mira/287614/remote/mira110/log/ceph-osd.0.log.gz:

     0> 2016-07-03 03:46:18.155826 7fcd9949a700 -1 /srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-11.0.0-246-g5f0d0f4/src/osd/ReplicatedPG.cc: In function 'void ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)' thread 7fcd9949a700 time 2016-07-03 03:46:18.127644
/srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-11.0.0-246-g5f0d0f4/src/osd/ReplicatedPG.cc: 8525: FAILED assert(repop_queue.front() == repop)

 ceph version v11.0.0-246-g5f0d0f4 (5f0d0f4e67f2bdbd38e1e9cf08b499d6932921bf)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x95) [0x203fde9]
 2: (ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)+0x11ef) [0x1a5eba7]
 3: (ReplicatedPG::repop_all_committed(ReplicatedPG::RepGather*)+0x20d) [0x1a5d525]
 4: (ReplicatedPG::do_update_log_missing_reply(std::shared_ptr<OpRequest>&)+0x3a8) [0x1a6c9f8]
 5: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xb64) [0x1a1825c]
 6: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x4e1) [0x183ad5b]
 7: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest> const&)+0x5d) [0x17e2c01]

Both this job and the one originally reported here used the async messenger.

Actions #2

Updated by Sage Weil over 7 years ago

  • Status changed from New to Can't reproduce