Bug #18021 (closed)

Assertion "needs_recovery" fails when a balance_read reaches a replica OSD where the target object has not been recovered yet.

Added by Xuehan Xu over 7 years ago. Updated almost 7 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
Category:
Dev Interfaces
Target version:
-
% Done:
0%
Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
OSD, Objecter, librados
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2016-10-25 19:00:00.626567 7f9a63bff700 -1 error_msg osd/ReplicatedPG.cc: In function 'void ReplicatedPG::wait_for_unreadable_object(const hobject_t&, OpRequestRef)' thread 7f9a63bff700 time 2016-10-25 19:00:00.624499
osd/ReplicatedPG.cc: 387: FAILED assert(needs_recovery)

ceph version 0.94.5-12-g83f56a1 (83f56a1c84e3dbd95a4c394335a7b1dc926dd1c4)
1: (ReplicatedPG::wait_for_unreadable_object(hobject_t const&, std::tr1::shared_ptr<OpRequest>)+0x3f5) [0x8b5a65]
2: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>&)+0x5e9) [0x8f0c79]
3: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x4e3) [0x87fdc3]
4: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x178) [0x66b3f8]
5: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x59e) [0x66f8ee]
6: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x795) [0xa76d85]
7: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xa7a610]
8: /lib64/libpthread.so.0() [0x3471407a51]
9: (clone()+0x6d) [0x34710e893d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
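
For context, a simplified sketch of the failing check in ReplicatedPG::wait_for_unreadable_object, reconstructed from the backtrace (the exact 0.94.5 source may differ slightly):

void ReplicatedPG::wait_for_unreadable_object(
  const hobject_t& soid, OpRequestRef op)
{
  // do_op() has already decided the object is unreadable locally...
  assert(is_unreadable_object(soid));

  // ...but missing_loc no longer reports it as needing recovery,
  // so the assert fires on the OSD serving the balanced read.
  eversion_t v;
  bool needs_recovery = missing_loc.needs_recovery(soid, &v);
  assert(needs_recovery);  // ReplicatedPG.cc:387: FAILED assert(needs_recovery)

  // Normally the op would be queued on a per-object wait list and
  // requeued once recovery of soid completes (details omitted).
}
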
#1

Updated by Xuehan Xu over 7 years ago

In my test, when there are a large number of "balance_reads", the OSDs can become so busy that they fail to send heartbeats in time, which can lead the monitors to wrongly mark them down and trigger other OSDs to go through peering and recovery. During that process, on the replica OSDs, the assertion "needs_recovery" at ReplicatedPG.cc:387 fails with high probability.
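
For reference, balanced reads are requested client-side with the librados OPERATION_BALANCE_READS flag, which lets the read be dispatched to a replica OSD instead of the primary. A minimal sketch (pool and object names are placeholders):

#include <rados/librados.hpp>
#include <iostream>

int main()
{
  librados::Rados cluster;
  cluster.init("admin");                          // connect as client.admin
  cluster.conf_read_file("/etc/ceph/ceph.conf");
  if (cluster.connect() < 0)
    return 1;

  librados::IoCtx ioctx;
  cluster.ioctx_create("testpool", ioctx);        // example pool name

  // Build a read op and submit it with BALANCE_READS so it may be
  // sent to a replica OSD rather than the primary.
  librados::ObjectReadOperation op;
  librados::bufferlist bl;
  op.read(0, 4096, &bl, NULL);

  librados::AioCompletion *c = librados::Rados::aio_create_completion();
  ioctx.aio_operate("some-object", c, &op,
                    librados::OPERATION_BALANCE_READS, NULL);
  c->wait_for_complete();
  int r = c->get_return_value();
  c->release();

  std::cout << "balanced read returned " << r << std::endl;
  cluster.shutdown();
  return 0;
}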

To find the cause of this, I did some extra testing. If I add extra code to make the recovery of an object wait for the in-flight ops of type "CEPH_MSG_OSD_OP" targeting that object to finish, the assertion "needs_recovery" at ReplicatedPG.cc:387 always fails. On the other hand, if I make the ops of type "CEPH_MSG_OSD_OP" targeting an object wait for the corresponding recovery to finish, the assertion is never triggered.
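
A hypothetical sketch of that second ordering (client ops wait for recovery), with placeholder helper names, not the actual Ceph code:

// Sketch only: queue incoming CEPH_MSG_OSD_OP ops on a still-missing
// object until its recovery finishes, instead of letting recovery
// complete while client ops on that object are still in flight.
void handle_client_op_sketch(const hobject_t& soid, OpRequestRef op)
{
  if (is_missing_object(soid) || is_unreadable_object(soid)) {
    // Park the op; it is requeued when recovery of soid completes.
    waiting_for_unreadable_object[soid].push_back(op);
    kick_object_recovery(soid);   // placeholder: start recovery of soid
    return;
  }
  serve_op(op);                   // placeholder: object is readable locally
}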

#2

Updated by Samuel Just over 7 years ago

  • Priority changed from Urgent to Normal
#3

Updated by Greg Farnum almost 7 years ago

  • Category set to OSD
#4

Updated by Greg Farnum almost 7 years ago

  • Project changed from Ceph to RADOS
  • Category changed from OSD to Dev Interfaces
  • Status changed from New to Duplicate
  • Component(RADOS) OSD, Objecter, librados added

These are the same thing, right?
