Bug #38024

segv, heap corruption in ec encode_and_write

Added by Sage Weil about 5 years ago. Updated about 5 years ago.

Status: Resolved
Priority: Urgent
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2019-01-23 03:26:21.293 7ffb62fc0700 20 osd.0 pg_epoch: 385 pg[4.3s0( v 385'4482 (334'600,385'4482] local-lis/les=365/366 n=2672 ec=328/328 lis/c 365/365 les/c/f 366/366/0 328/365/328) [0,5,7]p0(0) r=0 lpr=365 luod=385'4481 crt=385'4481 lcod 385'4480 mlcod 385'4480 active+clean] operator(): new_size start 0
2019-01-23 03:26:21.293 7ffb62fc0700 20 osd.0 pg_epoch: 385 pg[4.3s0( v 385'4482 (334'600,385'4482] local-lis/les=365/366 n=2672 ec=328/328 lis/c 365/365 les/c/f 366/366/0 328/365/328) [0,5,7]p0(0) r=0 lpr=365 luod=385'4481 crt=385'4481 lcod 385'4480 mlcod 385'4480 active+clean] operator(): adding buffer_update 0,65536
2019-01-23 03:26:21.293 7ffb62fc0700 20 osd.0 pg_epoch: 385 pg[4.3s0( v 385'4482 (334'600,385'4482] local-lis/les=365/366 n=2672 ec=328/328 lis/c 365/365 les/c/f 366/366/0 328/365/328) [0,5,7]p0(0) r=0 lpr=365 luod=385'4481 crt=385'4481 lcod 385'4480 mlcod 385'4480 active+clean] operator(): to_overwrite: {}
2019-01-23 03:26:21.293 7ffb62fc0700 20 osd.0 pg_epoch: 385 pg[4.3s0( v 385'4482 (334'600,385'4482] local-lis/les=365/366 n=2672 ec=328/328 lis/c 365/365 les/c/f 366/366/0 328/365/328) [0,5,7]p0(0) r=0 lpr=365 luod=385'4481 crt=385'4481 lcod 385'4480 mlcod 385'4480 active+clean] operator(): to_append: {0~65536(65536)}
2019-01-23 03:26:21.293 7ffb62fc0700 20 osd.0 pg_epoch: 385 pg[4.3s0( v 385'4482 (334'600,385'4482] local-lis/les=365/366 n=2672 ec=328/328 lis/c 365/365 les/c/f 366/366/0 328/365/328) [0,5,7]p0(0) r=0 lpr=365 luod=385'4481 crt=385'4481 lcod 385'4480 mlcod 385'4480 active+clean] operator(): appending 0~65536

then crash.
(gdb) bt
#0  0x00007ffb8c36eee3 in tc_malloc () from /lib64/libtcmalloc.so.4
#1  0x0000560215984a9d in ceph::BackTrace::print (this=this@entry=0x7ffb62fb52a0, out=...) at /usr/src/debug/ceph-14.0.1-2862-gd4c4082/src/common/BackTrace.cc:43
#2  0x000056021596aed6 in handle_fatal_signal (signum=11) at /usr/src/debug/ceph-14.0.1-2862-gd4c4082/src/global/signal_handler.cc:169
#3  <signal handler called>
#4  0x00007ffb8c36fe63 in tc_newarray () from /lib64/libtcmalloc.so.4
#5  0x0000560215365822 in allocate (this=<optimized out>, __n=89) at /opt/rh/devtoolset-7/root/usr/include/c++/7/ext/new_allocator.h:111
#6  std::string::_Rep::_S_create (__capacity=64, __old_capacity=<optimized out>, __alloc=...) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/basic_string.tcc:1057
#7  0x0000560215365868 in std::string::_Rep::_M_clone (this=0x56022a572700, __res=<optimized out>, __alloc=...) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/basic_string.tcc:1078
#8  0x0000560215367338 in std::string::reserve (this=0x7ffb62fbb090, __res=<optimized out>) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/basic_string.tcc:960
#9  0x0000560215a3d2db in push_back (__c=56 '8', this=0x7ffb62fbb090) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/basic_string.h:4238
#10 append_out_escaped (in="benchmark_data_smithi104_141058_object83671", out=out@entry=0x7ffb62fbb090) at /usr/src/debug/ceph-14.0.1-2862-gd4c4082/src/common/hobject.cc:212
#11 0x0000560215a3d505 in operator<< (out=..., o=...) at /usr/src/debug/ceph-14.0.1-2862-gd4c4082/src/common/hobject.cc:257
#12 0x00005602156ec5b5 in encode_and_write (pgid=..., oid=..., sinfo=..., ecimpl=..., want=..., offset=0, bl=..., flags=0, hinfo=warning: RTTI symbol not found for class 'std::_Sp_counted_deleter<ECUtil::HashInfo*, SharedPtrRegistry<hobject_t, ECUtil::HashInfo, std::less<hobject_t> >::OnRemoval, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>'
warning: RTTI symbol not found for class 'std::_Sp_counted_deleter<ECUtil::HashInfo*, SharedPtrRegistry<hobject_t, ECUtil::HashInfo, std::less<hobject_t> >::OnRemoval, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>'
std::shared_ptr (count 3, weak 1) 0x5602214af110, written=..., transactions=0x7ffb62fbbd40, 
    dpp=0x5602270a5000) at /usr/src/debug/ceph-14.0.1-2862-gd4c4082/src/osd/ECTransaction.cc:50
#13 0x00005602156efaa1 in ECTransaction::<lambda(std::pair<const hobject_t, PGTransaction::ObjectOperation>&)>::operator()(std::pair<hobject_t const, PGTransaction::ObjectOperation> &) const (
    __closure=__closure@entry=0x7ffb62fbbb50, opair=...) at /usr/src/debug/ceph-14.0.1-2862-gd4c4082/src/osd/ECTransaction.cc:599
#14 0x00005602156f3c62 in safe_create_traverse<ECTransaction::generate_transactions(ECTransaction::WritePlan&, ceph::ErasureCodeInterfaceRef&, pg_t, const ECUtil::stripe_info_t&, const std::map<hobject_t, interval_map<long unsigned int, ceph::buffer::list, bl_split_merge> >&, std::vector<pg_log_entry_t>&, std::map<hobject_t, interval_map<long unsigned int, ceph::buffer::list, bl_split_merge> >*, std::map<shard_id_t, ObjectStore::Transaction>*, std::set<hobject_t>*, std::set<hobject_t>*, DoutPrefixProvider*)::<lambda(std::pair<const hobject_t, PGTransaction::ObjectOperation>&)> > (
    t=<unknown type in /usr/lib/debug/usr/bin/ceph-osd.debug, CU 0x476433f, DIE 0x4a45aae>, this=0x560228abfce0) at /usr/src/debug/ceph-14.0.1-2862-gd4c4082/src/osd/PGTransaction.h:564
#15 ECTransaction::generate_transactions (plan=..., ecimpl=warning: RTTI symbol not found for class 'std::_Sp_counted_ptr<ErasureCodeJerasure*, (__gnu_cxx::_Lock_policy)2>'
warning: RTTI symbol not found for class 'std::_Sp_counted_ptr<ErasureCodeJerasure*, (__gnu_cxx::_Lock_policy)2>'
Python Exception <type 'exceptions.ValueError'> Cannot find type const std::map<hobject_t, interval_map<unsigned long, ceph::buffer::list, bl_split_merge>, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, interval_map<unsigned long, ceph::buffer::list, bl_split_merge> > > >::_Rep_type: 
std::shared_ptr (count 3, weak 0) 0x56022a6d1380, pgid=..., sinfo=..., partial_extents=std::map with 0 elements, 
    entries=std::vector of length 1, capacity 1 = {...}, written_map=<optimized out>, transactions=<optimized out>, temp_added=<optimized out>, temp_removed=<optimized out>, dpp=<optimized out>)
    at /usr/src/debug/ceph-14.0.1-2862-gd4c4082/src/osd/ECTransaction.cc:124
#16 0x00005602156cf15d in ECBackend::try_reads_to_commit (this=this@entry=0x5602293adb00) at /usr/src/debug/ceph-14.0.1-2862-gd4c4082/src/osd/ECBackend.cc:1965
#17 0x00005602156d2b0c in ECBackend::check_ops (this=this@entry=0x5602293adb00) at /usr/src/debug/ceph-14.0.1-2862-gd4c4082/src/osd/ECBackend.cc:2132
#18 0x00005602156d39b1 in ECBackend::start_rmw(ECBackend::Op*, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&) (this=this@entry=0x5602293adb00, op=op@entry=0x560229387728, 
    t=t@entry=<unknown type in /usr/lib/debug/usr/bin/ceph-osd.debug, CU 0x423853e, DIE 0x46e5a4a>) at /usr/src/debug/ceph-14.0.1-2862-gd4c4082/src/osd/ECBackend.cc:1846
#19 0x00005602156d528b in ECBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, boost::optional<pg_hit_set_history_t>&, Context*, unsigned long, osd_reqid_t, boost::intrusive_ptr<OpRequest>) (this=0x5602293adb00, hoid=..., 
    delta_stats=..., at_version=..., t=<unknown type in /usr/lib/debug/usr/bin/ceph-osd.debug, CU 0x423853e, DIE 0x46fa527>, trim_to=..., roll_forward_to=..., log_entries=std::vector of length 1, capacity 1 = {...}, 
    hset_history=..., on_all_commit=0x560223c352c0, tid=<optimized out>, reqid=..., client_op=...) at /usr/src/debug/ceph-14.0.1-2862-gd4c4082/src/osd/ECBackend.cc:1502
#20 0x0000560215503dfa in PrimaryLogPG::issue_repop (this=this@entry=0x5602270a5000, repop=repop@entry=0x56022832ea20, ctx=ctx@entry=0x5602288c7500) at /usr/src/debug/ceph-14.0.1-2862-gd4c4082/src/osd/PrimaryLogPG.cc:10578
#21 0x000056021555d43e in PrimaryLogPG::execute_ctx (this=this@entry=0x5602270a5000, ctx=ctx@entry=0x5602288c7500) at /usr/src/debug/ceph-14.0.1-2862-gd4c4082/src/osd/PrimaryLogPG.cc:4111
#22 0x00005602155616d5 in PrimaryLogPG::do_op (this=this@entry=0x5602270a5000, op=...) at /usr/src/debug/ceph-14.0.1-2862-gd4c4082/src/osd/PrimaryLogPG.cc:2431
#23 0x0000560215563164 in PrimaryLogPG::do_request (this=0x5602270a5000, op=..., handle=...) at /usr/src/debug/ceph-14.0.1-2862-gd4c4082/src/osd/PrimaryLogPG.cc:1868
#24 0x00005602153a9009 in OSD::dequeue_op (this=this@entry=0x560220c66000, pg=..., op=..., handle=...) at /usr/src/debug/ceph-14.0.1-2862-gd4c4082/src/osd/OSD.cc:9629
#25 0x0000560215638e12 in PGOpItem::run (this=<optimized out>, osd=0x560220c66000, sdata=<optimized out>, pg=..., handle=...) at /usr/src/debug/ceph-14.0.1-2862-gd4c4082/src/osd/OpQueueItem.cc:24
#26 0x00005602153c5abc in run (handle=..., pg=..., sdata=<optimized out>, osd=<optimized out>, this=0x7ffb62fbd8b0) at /usr/src/debug/ceph-14.0.1-2862-gd4c4082/src/osd/OpQueueItem.h:134
#27 OSD::ShardedOpWQ::_process (this=0x560220c67000, thread_index=<optimized out>, hb=<optimized out>) at /usr/src/debug/ceph-14.0.1-2862-gd4c4082/src/osd/OSD.cc:10804
#28 0x00005602159c0ce3 in ShardedThreadPool::shardedthreadpool_worker (this=0x560220c669f8, thread_index=<optimized out>) at /usr/src/debug/ceph-14.0.1-2862-gd4c4082/src/common/WorkQueue.cc:311
#29 0x00005602159c3d80 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at /usr/src/debug/ceph-14.0.1-2862-gd4c4082/src/common/WorkQueue.h:699
#30 0x00007ffb897a9e25 in start_thread () from /lib64/libpthread.so.0
#31 0x00007ffb88672bad in clone () from /lib64/libc.so.6

/a/nojha-2019-01-23_02:37:14-rados:thrash-erasure-code-master-distro-basic-smithi/3494070
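
When comparing this run against the related crashes, it can help to pull the extents out of the `to_append` / `to_overwrite` debug lines. A minimal sketch, assuming the `offset~length(n)` form seen in the log above and assuming the parenthesised number is the buffer length (the helper name `parse_extents` is hypothetical and not part of Ceph):

```python
import re

# Assumed format, taken from the log lines in this report: offset~length(n)
EXTENT_RE = re.compile(r"(\d+)~(\d+)\((\d+)\)")


def parse_extents(line):
    """Return a list of (offset, length, n) integer tuples found in one log line."""
    return [tuple(int(g) for g in m.groups()) for m in EXTENT_RE.finditer(line)]


if __name__ == "__main__":
    sample = "operator(): to_append: {0~65536(65536)}"
    for off, length, n in parse_extents(sample):
        print(f"offset={off} length={length} n={n}")
```

An empty map such as `to_overwrite: {}` yields an empty list, so the same helper can be run over every `operator():` line in the OSD log.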


Related issues

Related to RADOS - Bug #38023: segv on FileJournal::prepare_entry in bufferlist Closed 01/23/2019
Related to bluestore - Bug #38230: segv in onode lookup Resolved 02/07/2019
Related to RADOS - Bug #38172: segv in rocksdb NewIterator New

History

#1 Updated by Sage Weil about 5 years ago

  • Related to Bug #38023: segv on FileJournal::prepare_entry in bufferlist added

#2 Updated by Sage Weil about 5 years ago

related? submit_transaction and bufferlist::rebuild()...

/a/sage-2019-02-06_15:56:08-rados-wip-sage-testing-2019-02-06-0659-distro-basic-smithi/3557100

     0> 2019-02-06 17:23:59.158 7f7bff8dc700 -1 *** Caught signal (Aborted) **
 in thread 7f7bff8dc700 thread_name:tp_osd_tp

 ceph version 14.0.1-3321-gfd53e1f (fd53e1ffd80f0d4899aa83670d0ea4dbcc67e734) nautilus (dev)
 1: (()+0x12890) [0x7f7c23b46890]
 2: (gsignal()+0xc7) [0x7f7c227f3e97]
 3: (abort()+0x141) [0x7f7c227f5801]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x5610613b47ed]
 5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x5610613b4977]
 6: (ceph::buffer::ptr::end_c_str() const+0) [0x561061ce72f0]
 7: (ceph::buffer::list::rebuild(std::unique_ptr<ceph::buffer::ptr_node, ceph::buffer::ptr_node::disposer>)+0x48) [0x561061ce92b8]
 8: (ceph::buffer::list::rebuild()+0x11e) [0x561061ceb6ae]
 9: (ceph::buffer::list::c_str()+0x17) [0x561061ceb737]
 10: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x200) [0x5610619a1870]
 11: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x5f7) [0x5610619a8157]
 12: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x54) [0x5610616f78a4]
 13: (ECBackend::handle_sub_write(pg_shard_t, boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&)+0x850) [0x56106181a710]
 14: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x23d) [0x5610618326fd]
 15: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x97) [0x56106170a777]
 16: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x705) [0x5610616b9db5]
 17: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1b3) [0x5610614e5d23]
 18: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x56106178f5c2]
 19: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xbf5) [0x561061503a35]
 20: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0x561061b1ea3c]
 21: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561061b21b10]
 22: (()+0x76db) [0x7f7c23b3b6db]
 23: (clone()+0x3f) [0x7f7c228d688f]

#3 Updated by Sage Weil about 5 years ago

  • Related to Bug #38230: segv in onode lookup added

#4 Updated by Sage Weil about 5 years ago

  • Related to Bug #38172: segv in rocksdb NewIterator added

#5 Updated by Sage Weil about 5 years ago

  • Status changed from 12 to Resolved
