Bug #12523
osd suicide timeout during peering - search for missing objects
Status:
Closed
% Done:
0%
Source:
Community (dev)
Tags:
Backport:
hammer
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
The peering thread hit the suicide timeout. The logs show that the thread spent more than 150 seconds in PG::MissingLoc::add_source_info, which should reset the timeout as it runs.
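The following is a simplified, hypothetical model of a suicide-timeout check. It is not Ceph's actual heartbeat code: the `Heartbeat` struct, `run_worker`, and the simulated clock are illustrative assumptions, and the 150-second grace is assumed to match the duration in the log. It shows why a long loop survives only if it resets the timeout as it makes progress.

```cpp
#include <cassert>

// Hypothetical heartbeat record, loosely modeled on a thread-pool watchdog.
// Time is simulated in whole seconds.
struct Heartbeat {
  long last_reset = 0;       // time of the most recent timeout reset
  long suicide_grace = 150;  // assumed threshold, matching the log duration
  void reset(long now) { last_reset = now; }
  bool suicide_expired(long now) const {
    return now - last_reset > suicide_grace;
  }
};

// Simulates a worker doing `steps` units of work, each taking `step_secs`.
// If `resets` is true, the worker resets the timeout after every unit.
// Returns true if the watchdog would have killed the thread.
bool run_worker(Heartbeat& hb, int steps, long step_secs, bool resets) {
  long now = hb.last_reset;
  for (int i = 0; i < steps; ++i) {
    now += step_secs;                          // one unit of work
    if (hb.suicide_expired(now)) return true;  // watchdog check
    if (resets) hb.reset(now);
  }
  return false;
}
```

With 20 steps of 10 seconds each, a worker that resets after every step never exceeds the grace, while one that never resets is killed once it passes 150 seconds.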
In PG::RecoveryState::start_handle, if there are messages to flush (messages_pending_flush is true), a new RecoveryCtx is created, and it loses the original thread handle. As a result, add_source_info cannot reset the timeout, and the suicide timeout triggers the crash.
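A minimal C++ sketch of the handle-loss pattern, assuming heavily simplified types. `TPHandle`, `RecoveryCtx`, `add_source_info`, and both `start_handle_*` functions here are hypothetical stand-ins that mirror the names in the report but not Ceph's real signatures. The "fixed" variant, which copies the handle into the new context, is one plausible remedy and is not taken from this report.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a thread-pool handle: counts timeout resets.
struct TPHandle {
  int resets = 0;
  void reset_tp_timeout() { ++resets; }
};

// Hypothetical, simplified recovery context: only the handle pointer.
struct RecoveryCtx {
  TPHandle* handle = nullptr;
};

// Long-running loop loosely modeled on add_source_info: it resets the
// timeout once per missing object, but only if a handle is present.
void add_source_info(const std::vector<int>& missing, RecoveryCtx* ctx) {
  for (std::size_t i = 0; i < missing.size(); ++i) {
    if (ctx->handle) ctx->handle->reset_tp_timeout();
    // ... per-object work would go here ...
  }
}

// Buggy variant: with messages pending flush, a fresh context is built
// and the original thread handle is dropped (handle stays nullptr).
RecoveryCtx start_handle_buggy(const RecoveryCtx* orig, bool messages_pending_flush) {
  if (messages_pending_flush) {
    return RecoveryCtx{};
  }
  return *orig;
}

// Fixed variant (assumed remedy): carry the handle into the new context.
RecoveryCtx start_handle_fixed(const RecoveryCtx* orig, bool messages_pending_flush) {
  if (messages_pending_flush) {
    RecoveryCtx rctx;
    rctx.handle = orig->handle;  // preserve the thread handle
    return rctx;
  }
  return *orig;
}
```

With the buggy variant, a long search over missing objects never calls `reset_tp_timeout()`, so the watchdog eventually fires. The fixed variant lets every iteration reset it.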