Bug #23198

osd coredump ClassHandler::ClassMethod::exec

Added by Yong Wang 9 months ago. Updated 9 months ago.

Status: Triaged
Priority: Normal
Assignee:
Target version:
Start date: 03/02/2018
Due date:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: rados
Pull request ID:

Description

ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
1: (()+0x92b18a) [0x7fe6696f018a]
2: (()+0xf370) [0x7fe66774e370]
3: (std::string::assign(std::string const&)+0x2c) [0x7fe666671fdc]
4: (()+0xb0e4d) [0x7fe62fd40e4d]
5: (ClassHandler::ClassMethod::exec(void*, ceph::buffer::list&, ceph::buffer::list&)+0x34) [0x7fe6691dd514]
6: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&)+0x3563) [0x7fe6692d0b83]
7: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0xbf) [0x7fe6692e568f]
8: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x920) [0x7fe6692e6570]
9: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x2843) [0x7fe6692ea4e3]
10: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x747) [0x7fe6692a63f7]
11: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x41d) [0x7fe6691596fd]
12: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x6d) [0x7fe66915994d]
13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x77b) [0x7fe66915d32b]
14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x887) [0x7fe6697dd9f7]
15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7fe6697df960]
16: (()+0x7dc5) [0x7fe667746dc5]
17: (clone()+0x6d) [0x7fe665dd173d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
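
As a starting point for interpreting the trace, the frames can be split into index, symbol, offset and address. The sketch below is only an illustration, not part of Ceph: the helper names `parse_frame` and `module_base` are hypothetical, and it assumes that in an unnamed `()+0x...` frame the offset is relative to the load base of the containing module, as in glibc's `backtrace_symbols` output.

```python
import re

# One frame looks like: "4: (()+0xb0e4d) [0x7fe62fd40e4d]"
FRAME_RE = re.compile(r"^\s*(\d+): \((.*)\+0x([0-9a-f]+)\) \[0x([0-9a-f]+)\]$")


def parse_frame(line):
    """Return a dict describing one backtrace frame, or None if the line is not a frame."""
    m = FRAME_RE.match(line)
    if m is None:
        return None
    index, symbol, offset, address = m.groups()
    return {
        "index": int(index),
        "symbol": symbol,
        "offset": int(offset, 16),
        "address": int(address, 16),
    }


def module_base(frame):
    """Estimate the module load base for an unnamed frame.

    Assumption: for "()" frames the offset is module-relative. For named
    frames the offset is relative to the symbol, so no base is returned.
    """
    if frame["symbol"] != "()":
        return None
    return frame["address"] - frame["offset"]


if __name__ == "__main__":
    lines = [
        "1: (()+0x92b18a) [0x7fe6696f018a]",
        "3: (std::string::assign(std::string const&)+0x2c) [0x7fe666671fdc]",
        "4: (()+0xb0e4d) [0x7fe62fd40e4d]",
    ]
    for line in lines:
        frame = parse_frame(line)
        base = module_base(frame)
        print(frame["index"], frame["symbol"], hex(base) if base is not None else "-")
```

Under that assumption, frames 1 and 4 give different base addresses (0x7fe668dc5000 and 0x7fe62fc90000), which would suggest that frame 4 lies in a separately loaded object. Confirming which object that is still requires the memory map of the crashed process.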

osd_op(client.18415392.0:115 7.eba9ff7b gc.27 [call rgw.gc_list] snapc 0=[] RETRY=31 ack+retry+read+known_if_redirected e19567)
-49> 2018-03-02 10:38:09.715214 7f1ad87fe700 0 <cls> cls/rgw/cls_rgw.cc:3223: gc_iterate_entries end_key=1_01519958289.715210348

=============
call rgw.gc_remove

call rgw.gc_list

===========

It seems that libcls_rgw.so has a bug.

History

#1 Updated by John Spray 9 months ago

  • Project changed from Ceph to rgw
  • Category deleted (OSD)

#2 Updated by Yehuda Sadeh 9 months ago

Is that the same pool that you removed?

#3 Updated by Yong Wang 9 months ago

No pool was removed in this environment.
According to my colleague, he only deleted objects with s3cmd and then ran radosgw-admin gc list.

It has happened only once; it is not frequent like the radosgw one.

From the backtrace, it looks like a std::string holds an invalid external pointer.

From the raw addresses alone, it is difficult to confirm which cls and method were involved.
Ceph's own backtrace does not include pmap -d output.
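
If a memory map of the process were available (for example the text of /proc/<pid>/maps on Linux), an address from the backtrace could be matched to the mapped file that contains it. The following is a minimal sketch of that lookup. The helper names and the sample map lines are hypothetical and are not taken from the crashed OSD.

```python
# Hypothetical sample in /proc/<pid>/maps format:
# start-end perms file_offset dev inode pathname
SAMPLE_MAPS = """\
00400000-00452000 r-xp 00000000 08:02 173521 /path/to/example-binary
7f0000000000-7f0000100000 r-xp 00002000 08:02 135522 /path/to/libexample.so
7f0000100000-7f0000121000 rw-p 00000000 00:00 0
"""


def parse_maps(text):
    """Parse maps text into (start, end, file_offset, path) tuples."""
    mappings = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # not a mapping line
        start_s, end_s = fields[0].split("-")
        path = fields[5].strip() if len(fields) > 5 else ""  # anonymous mappings have no path
        mappings.append((int(start_s, 16), int(end_s, 16), int(fields[2], 16), path))
    return mappings


def find_mapping(mappings, address):
    """Return (path, offset_in_file) for the mapping containing address, else None."""
    for start, end, file_offset, path in mappings:
        if start <= address < end:
            return path, address - start + file_offset
    return None


if __name__ == "__main__":
    maps = parse_maps(SAMPLE_MAPS)
    hit = find_mapping(maps, 0x7F0000000E4D)
    if hit is not None:
        print(hit[0], hex(hit[1]))
```

The returned file offset could then be looked up in the matching shared object with a disassembler or symbol tool to narrow down the cls method.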

I tested sending kill -11 to my own a.out started by systemd, but the same approach does not work for radosgw and osd.

Below is the new setting in the systemd configuration files:
LimitCORE=infinity
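
One way to check whether a core-size limit is in effect for a process is to read its RLIMIT_CORE value. The sketch below uses Python's standard `resource` module and reports the limit of the process that runs it, so it would only reflect the OSD's limit if started under the same unit settings. The function name is hypothetical. On Linux, /proc/<pid>/limits shows the same values for another running process.

```python
import resource


def core_limit_description(limits=None):
    """Describe the core file size limit as 'soft=... hard=...'.

    limits: optional (soft, hard) tuple; defaults to this process's RLIMIT_CORE.
    """
    if limits is None:
        limits = resource.getrlimit(resource.RLIMIT_CORE)
    soft, hard = limits

    def fmt(value):
        # RLIM_INFINITY corresponds to LimitCORE=infinity in a systemd unit
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return "soft={} hard={}".format(fmt(soft), fmt(hard))


if __name__ == "__main__":
    print(core_limit_description())
```

A soft limit of 0 means no core file is written, even if the hard limit is unlimited.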

#4 Updated by Yong Wang 9 months ago

They are in the same environment, but happened at different times, more than one month apart.

#5 Updated by Yong Wang 9 months ago

I have successfully configured core dumps with systemd.
If I capture a core dump that shows the cls and method in the future, I will paste it here.

#6 Updated by Matt Benjamin 9 months ago

  • Status changed from New to Triaged
  • Assignee set to Matt Benjamin
