Bug #21006
closed
assert in can_discard_replica_op
% Done: 0%
Regression: No
Severity: 3 - minor
Description
Recently we hit an assert in can_discard_replica_op() while an OSD
was handling a replica op reply. The assert is triggered by
get_down_at(), which checks that the source OSD still exists() and
asserts otherwise.
In our testing environment, it seems the source OSD sent an op reply
to the primary OSD and then died.
Should we check exists() first to avoid the assert in get_down_at(),
or is the source OSD expected to always exist at this point?
1: (()+0x9322fd) [0x7f6c7bbe52fd]
2: (()+0xf100) [0x7f6c79a1e100]
3: (gsignal()+0x37) [0x7f6c77fe05f7]
4: (abort()+0x148) [0x7f6c77fe1ce8]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x256) [0x7f6c7bce2b16]
6: (()+0x30cc20) [0x7f6c7b5bfc20]
7: (bool PG::can_discard_replica_op<MOSDRepOpReply, 113>(std::shared_ptr<OpRequest>&)+0xd5) [0x7f6c7b73a595]
8: (PG::can_discard_request(std::shared_ptr<OpRequest>&)+0x1c5) [0x7f6c7b6f5095]
9: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x99) [0x7f6c7b797419]
10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x405) [0x7f6c7b6493b5]
11: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x6d) [0x7f6c7b6495cd]
12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x869) [0x7f6c7b64e1e9]
13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x887) [0x7f6c7bcd2907]
14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f6c7bcd4870]
15: (()+0x7dc5) [0x7f6c79a16dc5]
16: (clone()+0x6d) [0x7f6c780a1ced]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
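The guard asked about in the description could look roughly like the sketch below. It is not the Ceph code: MockOSDMap and can_discard_from are hypothetical stand-ins written for illustration, the comparison against the message epoch is an assumption about how the down-at value is used, and whether a reply from a non-existent OSD may safely be discarded is exactly the open question of this report.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using epoch_t = std::uint32_t;

// Hypothetical stand-in for the OSD map; NOT the real Ceph OSDMap class.
struct MockOSDMap {
  // osd id -> epoch at which it was last marked down (only for existing OSDs)
  std::map<int, epoch_t> down_at;

  bool exists(int osd) const { return down_at.count(osd) > 0; }

  // Mirrors the behaviour described in the report: asserts if the OSD
  // is not present in the map.
  epoch_t get_down_at(int osd) const {
    assert(exists(osd));
    return down_at.at(osd);
  }
};

// Hypothetical guarded check. Returns true if a reply from `from`,
// sent at `msg_epoch`, should be discarded.
bool can_discard_from(const MockOSDMap& osdmap, int from, epoch_t msg_epoch) {
  // Assumption: a reply from an OSD that is no longer in the map can be
  // dropped. Checking exists() first keeps get_down_at() from asserting.
  if (!osdmap.exists(from))
    return true;
  // Assumption: discard if the source was marked down at or after the
  // epoch in which the message was sent.
  return osdmap.get_down_at(from) >= msg_epoch;
}
```

With a map that contains only osd 1 (marked down at epoch 10), can_discard_from(map, 2, 5) returns true without reaching the assert, while calling get_down_at(2) directly would abort as in the backtrace above.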