
Bug #36372

closed

OSD:Segmentation fault thread_name:tp_osd_tp--10.2.10

Added by lin zhou over 5 years ago. Updated over 5 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have a radosgw cluster that uses SSDs for the index pool.
This is the second time that three SSD OSDs have gone down with the error: Caught signal (Segmentation fault) ** in thread 7f40a31ed700 thread_name:tp_osd_tp

Based on the log, these OSDs appear to have been running GC and a LevelDB compaction at the time.
The omap of every OSD is below 1 GB.
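The gc_iterate_entries line in the log below records end_key=1_01539069851.802036688. Reading that key as an index prefix, then zero-padded Unix seconds, then nanoseconds (an assumption inferred from the key's shape, not checked against the cls_rgw source), it decodes to 2018-10-09 07:24:11 UTC. That matches the line's own 15:24:11.802 timestamp if the node logs in local time at UTC+8 (also an assumption), which would mean the GC listing was bounded at roughly "now". A minimal sketch of the decoding; GcTimeKey, parse_gc_time_key and to_utc are hypothetical names:

```cpp
#include <cstdint>
#include <ctime>
#include <optional>
#include <string>

// Hypothetical decoded form of a GC time-index key.
struct GcTimeKey {
  std::string prefix;     // text before '_', e.g. "1"
  std::uint64_t seconds;  // assumed: Unix epoch seconds, zero-padded to 11 digits
  std::uint32_t nanos;    // assumed: nanoseconds, zero-padded to 9 digits
};

// Splits "<prefix>_<11 digits>.<9 digits>"; returns nullopt for any other shape.
std::optional<GcTimeKey> parse_gc_time_key(const std::string& key) {
  const auto us = key.find('_');
  if (us == std::string::npos) return std::nullopt;
  const auto dot = key.find('.', us + 1);
  if (dot == std::string::npos) return std::nullopt;
  const std::string sec = key.substr(us + 1, dot - us - 1);
  const std::string ns = key.substr(dot + 1);
  if (sec.size() != 11 || ns.size() != 9) return std::nullopt;
  for (char c : sec + ns) {
    if (c < '0' || c > '9') return std::nullopt;  // digits only
  }
  return GcTimeKey{key.substr(0, us),
                   static_cast<std::uint64_t>(std::stoull(sec)),
                   static_cast<std::uint32_t>(std::stoul(ns))};
}

// Renders epoch seconds as "YYYY-MM-DD HH:MM:SS" in UTC.
std::string to_utc(std::uint64_t seconds) {
  const std::time_t t = static_cast<std::time_t>(seconds);
  const std::tm* tm = std::gmtime(&t);
  if (tm == nullptr) return {};
  char buf[32];
  std::strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S", tm);
  return buf;
}
```

For example, parse_gc_time_key("1_01539069851.802036688") yields seconds 1539069851 and nanoseconds 802036688, and to_utc(1539069851) returns "2018-10-09 07:24:11".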

2018-10-09 15:24:10.928644 7f40a49f0700  0 <cls> cls/rgw/cls_rgw.cc:962: rgw_bucket_complete_op(): entry.name=_multipart_qyygvlnzsysbivasdazt5q/backup/_realm_data.tar.gz.0.2~xhamSQArn7VklJbokArSW8Juz-Y4Ne9.6 entry.instance= entry.meta.category=1

2018-10-09 15:24:11.802039 7f40a31ed700  0 <cls> cls/rgw/cls_rgw.cc:3223: gc_iterate_entries end_key=1_01539069851.802036688

2018-10-09 15:24:11.939506 7f40b2415700  1 leveldb: Compacting 1@0 + 8@1 files
2018-10-09 15:24:12.000742 7f40b2415700  1 leveldb: Generated table #293566: 66090 keys, 1570112 bytes
2018-10-09 15:24:12.095546 7f40b2415700  1 leveldb: Generated table #293567: 110772 keys, 2138845 bytes
2018-10-09 15:24:12.119825 7f40b2415700  1 leveldb: Generated table #293568: 19291 keys, 773711 bytes
2018-10-09 15:24:12.183109 7f40b2415700  1 leveldb: Generated table #293569: 10398 keys, 2123221 bytes
2018-10-09 15:24:12.185634 7f40b2415700  1 leveldb: Generated table #293570: 3080 keys, 54454 bytes
2018-10-09 15:24:12.188960 7f40b2415700  1 leveldb: Generated table #293571: 4822 keys, 85400 bytes
2018-10-09 15:24:12.244189 7f40b2415700  1 leveldb: Generated table #293572: 26542 keys, 2133394 bytes
2018-10-09 15:24:12.248450 7f40b2415700  1 leveldb: Generated table #293573: 4820 keys, 72908 bytes
2018-10-09 15:24:12.248462 7f40b2415700  1 leveldb: Compacted 1@0 + 8@1 files => 8952045 bytes
2018-10-09 15:24:12.248750 7f40b2415700  1 leveldb: compacted to: files[ 0 8 63 285 0 0 0 ]
2018-10-09 15:24:12.568564 7f40a31ed700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f40a31ed700 thread_name:tp_osd_tp

 ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
 1: (()+0x961ee7) [0x5579b753eee7]
 2: (()+0xf890) [0x7f40c88e7890]
 3: (std::string::assign(std::string const&)+0x14) [0x7f40c727b2e4]
 4: (()+0xacb8c) [0x7f40b3ebbb8c]
 5: (ClassHandler::ClassMethod::exec(void*, ceph::buffer::list&, ceph::buffer::list&)+0x34) [0x5579b6fdd414]
 6: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&)+0x1df8) [0x5579b70dc808]
 7: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x61) [0x5579b70ec851]
 8: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x966) [0x5579b70f4996]
 9: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x323a) [0x5579b70f927a]
 10: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x727) [0x5579b70b13a7]
 11: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x420) [0x5579b6f57b10]
 12: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x6a) [0x5579b6f57d6a]
 13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x787) [0x5579b6f72a57]
 14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8b6) [0x5579b76349a6]
 15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5579b7636960]
 16: (()+0x8064) [0x7f40c88e0064]
 17: (clone()+0x6d) [0x7f40c69e162d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
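Some arithmetic can narrow things down before the binaries are used. Assume each unnamed frame `(()+0xOFF) [0xADDR]` gives OFF relative to the load base of the object that contains it. That is how glibc's backtrace_symbols reports frames it cannot name, but it is still an assumption here. Under it, ADDR - OFF is that base, and frames that share a base come from the same object. For frame 4, 0x7f40b3ebbb8c - 0xacb8c = 0x7f40b3e0f000. Frame 5 is ClassHandler::ClassMethod::exec, so frame 4 is plausibly inside the dynamically loaded cls_rgw class library, though that is an inference. A sketch; RawFrame, parse_raw_frame and group_by_base are hypothetical:

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <regex>
#include <string>
#include <vector>

// One unsymbolized frame such as " 4: (()+0xacb8c) [0x7f40b3ebbb8c]".
struct RawFrame {
  int index;              // frame number in the backtrace
  std::uint64_t offset;   // assumed: offset from the containing object's load base
  std::uint64_t address;  // absolute runtime address
  std::uint64_t base() const { return address - offset; }
};

// Matches only frames whose symbol part is empty, i.e. "(()+0x...)".
std::optional<RawFrame> parse_raw_frame(const std::string& line) {
  static const std::regex re(R"(^\s*(\d+): \(\(\)\+0x([0-9a-f]+)\) \[0x([0-9a-f]+)\])");
  std::smatch m;
  if (!std::regex_search(line, m, re)) return std::nullopt;
  return RawFrame{std::stoi(m[1].str()),
                  static_cast<std::uint64_t>(std::stoull(m[2].str(), nullptr, 16)),
                  static_cast<std::uint64_t>(std::stoull(m[3].str(), nullptr, 16))};
}

// Groups frame numbers by inferred load base; named frames are ignored.
std::map<std::uint64_t, std::vector<int>> group_by_base(const std::vector<std::string>& lines) {
  std::map<std::uint64_t, std::vector<int>> groups;
  for (const auto& l : lines) {
    if (auto f = parse_raw_frame(l)) groups[f->base()].push_back(f->index);
  }
  return groups;
}
```

On this backtrace, frames 2 and 16 share base 0x7f40c88d8000, and frame 1 maps to 0x5579b6bdd000. The offset itself (0xacb8c for frame 4) is the value to look up with addr2line -e or objdump against the matching 10.2.10 binary or library.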

--- begin dump of recent events ---
-10000> 2018-10-09 15:23:28.334705 7f4098fad700  1 -- 10.204.22.37:6859/142746 <== osd.155 10.204.22.23:0/29542 5316124 ==== osd_ping(ping e16677 stamp 2018-10-09 15:23:28.330696) v3 ==== 2004+0+0 (3938300591 0 0) 0x5579d6a94600 con 0x5579d036d080
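The excerpt suggests that the crash is not inside the LevelDB compaction. In these OSD log lines, the third field is the thread id. The crashing thread, 7f40a31ed700, is the one that logged gc_iterate_entries. The compaction ran on 7f40b2415700 and finished at 15:24:12.248, about 0.3 s before the segfault. A minimal sketch that groups lines by thread; lines_by_thread is a hypothetical helper, and it assumes the `<date> <time> <thread> <level> <message>` layout seen above:

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Buckets timestamped log lines by thread id (third whitespace-separated field).
// Lines without a leading YYYY-MM-DD date (backtrace frames, "-10000>" dump
// entries, wrapped continuations) are skipped.
std::map<std::string, std::vector<std::string>>
lines_by_thread(const std::vector<std::string>& log) {
  std::map<std::string, std::vector<std::string>> out;
  for (const auto& line : log) {
    std::istringstream in(line);
    std::string date, hms, tid;
    if (!(in >> date >> hms >> tid)) continue;
    const bool looks_like_date = date.size() == 10 && date[4] == '-' && date[7] == '-';
    if (!looks_like_date) continue;
    out[tid].push_back(line);
  }
  return out;
}
```

When the gc_iterate_entries, compaction and "Caught signal" lines are fed in, 7f40a31ed700 collects both the GC line and the segfault line.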


Files

ceph-osd.323.log.gz (955 KB), lin zhou, 10/10/2018 09:04 AM

Related issues: 1 (1 open, 0 closed)

Related to rgw - Bug #26882: jewel: cls_rgw: avoid undefined iterator access (Fix Under Review, Matt Benjamin)

#2

Updated by John Spray over 5 years ago

  • Project changed from Ceph to RADOS
  • Category deleted (OSD)
#3

Updated by Greg Farnum over 5 years ago

  • Project changed from RADOS to rgw

Looks like a cls_rgw bug?

#4

Updated by Casey Bodley over 5 years ago

  • Related to Bug #26882: jewel: cls_rgw: avoid undefined iterator access added
#5

Updated by Casey Bodley over 5 years ago

  • Status changed from New to Duplicate

This bug only showed up in jewel, and there's a fix staged at https://github.com/ceph/ceph/pull/23495
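The related Bug #26882 is titled "avoid undefined iterator access". The sketch below illustrates that general bug class only. It is not the cls_rgw code and not the staged fix. In a paged listing over an ordered index, the resume key must be read while the iterator still points at an element. After the loop, the iterator may equal end(), and dereferencing end() is undefined behavior. Copying a string out of such a position is one way a crash could surface in std::string::assign, as in frame 3 of the backtrace, but the report does not confirm that this is what happened. next_marker and its parameters are hypothetical:

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>

// Hypothetical paged listing over an ordered key/value index.
// Visits at most `max` entries after `start_after` and returns the last key
// visited (the marker for the next call), or nullopt if nothing was visited.
std::optional<std::string> next_marker(const std::map<std::string, std::string>& index,
                                       const std::string& start_after, std::size_t max) {
  std::optional<std::string> last;  // copied while the iterator is known to be valid
  auto it = index.upper_bound(start_after);
  for (std::size_t n = 0; it != index.end() && n < max; ++it, ++n) {
    last = it->first;  // ... process it->second here ...
  }
  // Pattern to avoid: `return it->first;` at this point. If the loop stopped
  // because it == index.end(), that dereference is undefined behavior.
  return last;
}
```

Returning std::optional makes the "nothing listed" case explicit, so the caller never has to read a key from an iterator that might be past the end.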

