Bug #23198
osd coredump ClassHandler::ClassMethod::exec
Description
ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
1: (()+0x92b18a) [0x7fe6696f018a]
2: (()+0xf370) [0x7fe66774e370]
3: (std::string::assign(std::string const&)+0x2c) [0x7fe666671fdc]
4: (()+0xb0e4d) [0x7fe62fd40e4d]
5: (ClassHandler::ClassMethod::exec(void*, ceph::buffer::list&, ceph::buffer::list&)+0x34) [0x7fe6691dd514]
6: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&)+0x3563) [0x7fe6692d0b83]
7: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0xbf) [0x7fe6692e568f]
8: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x920) [0x7fe6692e6570]
9: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x2843) [0x7fe6692ea4e3]
10: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x747) [0x7fe6692a63f7]
11: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x41d) [0x7fe6691596fd]
12: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x6d) [0x7fe66915994d]
13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x77b) [0x7fe66915d32b]
14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x887) [0x7fe6697dd9f7]
15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7fe6697df960]
16: (()+0x7dc5) [0x7fe667746dc5]
17: (clone()+0x6d) [0x7fe665dd173d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
osd_op(client.18415392.0:115 7.eba9ff7b gc.27 [call rgw.gc_list] snapc 0=[] RETRY=31 ack+retry+read+known_if_redirected e19567)
-49> 2018-03-02 10:38:09.715214 7f1ad87fe700 0 <cls> cls/rgw/cls_rgw.cc:3223: gc_iterate_entries end_key=1_01519958289.715210348
=============
call rgw.gc_remove
call rgw.gc_list
===========
It seems that libcls_rgw.so has a bug.
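The unresolved frames in the backtrace (for example `4: (()+0xb0e4d)`) show only an offset and an absolute address. A minimal sketch for narrowing them down follows. It assumes the offset is relative to the load base of the containing object, which is how glibc's `backtrace_symbols` reports frames with no symbol. The names `parse_frames` and `unresolved_load_bases` are hypothetical helpers, not part of Ceph.

```python
import re

# A few frames copied from the backtrace in this report.
SAMPLE = """\
2: (()+0xf370) [0x7fe66774e370]
4: (()+0xb0e4d) [0x7fe62fd40e4d]
5: (ClassHandler::ClassMethod::exec(void*, ceph::buffer::list&, ceph::buffer::list&)+0x34) [0x7fe6691dd514]
16: (()+0x7dc5) [0x7fe667746dc5]
"""

# "<n>: (<symbol>+0x<offset>) [0x<address>]"
FRAME_RE = re.compile(r"^\s*(\d+): \((.*)\+0x([0-9a-f]+)\) \[0x([0-9a-f]+)\]\s*$")


def parse_frames(text):
    """Return one dict per backtrace line that matches the frame layout."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if m is None:
            continue
        frames.append({
            "index": int(m.group(1)),
            "symbol": m.group(2),
            "offset": int(m.group(3), 16),
            "address": int(m.group(4), 16),
        })
    return frames


def unresolved_load_bases(frames):
    """Map frame index to the presumed load base (address - offset)
    for frames whose symbol is empty, i.e. printed as "()"."""
    return {
        f["index"]: f["address"] - f["offset"]
        for f in frames
        if f["symbol"] == "()"
    }


if __name__ == "__main__":
    bases = unresolved_load_bases(parse_frames(SAMPLE))
    for idx, base in sorted(bases.items()):
        print(f"frame {idx}: presumed load base {base:#x}")
```

Under that assumption, frames 2 and 16 share one load base, while frame 4 sits in a different object whose base is page-aligned at 0x7fe62fc90000. If that object turns out to be libcls_rgw.so, the offset could be resolved with `addr2line -C -f -e <path to the matching libcls_rgw.so> 0xb0e4d`, provided the file is from the same build.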
History
#1 Updated by John Spray about 6 years ago
- Project changed from Ceph to rgw
- Category deleted (OSD)
#2 Updated by Yehuda Sadeh about 6 years ago
Is that the same pool that you removed?
#3 Updated by Yong Wang about 6 years ago
No pool was removed in this environment.
According to my colleague, he only deleted objects with s3cmd and ran radosgw-admin gc list.
It has happened only once; it is not frequent like the radosgw issue.
From the backtrace, it seems that a string has an invalid external pointer.
From the assembly addresses alone, it is difficult to confirm which cls and method are involved.
The pmap -d output is not included in Ceph's own backtrace.
I tested sending kill -11 to my own a.out started by systemd, but the same approach does not work for radosgw and osd.
Below is the new setting in the systemd configuration files:
LimitCORE=infinity
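One way to check whether the `LimitCORE=infinity` setting has reached a running daemon is to read the core file size limit from `/proc/<pid>/limits` on Linux. The sketch below parses that file; `core_limits` and `read_core_limits` are hypothetical helper names, and the sample text only illustrates the file layout, it is not taken from this environment.

```python
import re

# Illustrative text in the layout of /proc/<pid>/limits (values are made up).
SAMPLE_LIMITS = """\
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max core file size        0                    unlimited            bytes
Max open files            1024                 4096                 files
"""


def core_limits(limits_text):
    """Return (soft, hard) for "Max core file size", or None if absent."""
    for line in limits_text.splitlines():
        m = re.match(r"Max core file size\s+(\S+)\s+(\S+)", line)
        if m:
            return m.group(1), m.group(2)
    return None


def read_core_limits(pid):
    """Read the limits of a live process; requires permission to read /proc/<pid>/limits."""
    with open(f"/proc/{pid}/limits") as fh:
        return core_limits(fh.read())


if __name__ == "__main__":
    print(core_limits(SAMPLE_LIMITS))
```

A soft limit of `0` means the kernel will not write a core file for that process. With `LimitCORE=infinity` applied, both columns would be expected to read `unlimited` after the service is restarted.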
#4 Updated by Yong Wang about 6 years ago
They are in the same environment,
but occurred at different times, more than one month apart.
#5 Updated by Yong Wang about 6 years ago
I have successfully configured coredumps with systemd.
If I capture a coredump that shows the cls and method in the future, I will paste it here.
#6 Updated by Matt Benjamin about 6 years ago
- Status changed from New to Triaged
- Assignee set to Matt Benjamin