Bug #12523

closed

osd suicide timeout during peering - search for missing objects

Added by Guang Yang almost 9 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Community (dev)
Tags:
Backport: hammer
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The peering thread hit the suicide timeout. The logs show that it had spent more than 150 seconds in PG::MissingLoc::add_source_info, which is expected to reset the timeout while it works.
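
To make the timeout mechanism concrete, here is a toy Python model (not Ceph code) of a watchdog with a suicide grace and a long per-object scan that reports progress through a thread handle. `Heartbeat`, `TPHandle`, `scan_objects`, the integer tick clock, and the reset interval are hypothetical simplifications; the 150-tick grace only mirrors the 150 seconds seen in the logs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Heartbeat:
    """Hypothetical watchdog: fires if no progress is reported for too long."""
    suicide_grace: int   # ticks allowed without a progress report
    last_reset: int = 0  # tick of the most recent progress report

    def expired(self, now: int) -> bool:
        return now - self.last_reset > self.suicide_grace


@dataclass
class TPHandle:
    """Hypothetical stand-in for the handle a worker uses to report progress."""
    hb: Heartbeat

    def reset_tp_timeout(self, now: int) -> None:
        self.hb.last_reset = now


def scan_objects(n_objects: int, hb: Heartbeat, handle: Optional[TPHandle],
                 reset_every: int = 100) -> bool:
    """Simulate a long per-object search costing one tick per object.

    Returns True if the watchdog would have killed the thread.
    """
    for now in range(1, n_objects + 1):
        # Report progress periodically, but only if a handle is available.
        if handle is not None and now % reset_every == 0:
            handle.reset_tp_timeout(now)
        if hb.expired(now):
            return True
    return False
```

With a handle, the gap between resets never exceeds the grace, so `scan_objects(1000, hb, TPHandle(hb))` returns `False`; with `handle=None`, the same scan returns `True` as soon as 150 ticks pass without a reset.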

In PG::RecoveryState::start_handle, if there are messages to flush (messages_pending_flush = true), a new RecoveryCtx is created that does not carry over the original thread handle. As a result, add_source_info has no handle through which to reset the timeout, so the timeout expired and triggered the crash.
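
The failure mode can be sketched as a toy Python model (not the actual Ceph code or the upstream patch). `RecoveryCtx`, `start_handle`, and `add_source_info` borrow names from the report but are heavily simplified; `TPHandle` here just counts progress reports, and the `keep_handle` flag is hypothetical, toggling between the reported behavior (handle dropped) and one possible fix (handle carried over into the new context).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TPHandle:
    """Hypothetical thread handle; counts how often progress was reported."""
    resets: int = 0

    def reset_tp_timeout(self) -> None:
        self.resets += 1


@dataclass
class RecoveryCtx:
    """Hypothetical, simplified recovery context with an optional handle."""
    handle: Optional[TPHandle] = None


def start_handle(rctx: RecoveryCtx, messages_pending_flush: bool,
                 keep_handle: bool) -> RecoveryCtx:
    """Model of the context swap described in the report.

    When messages are pending a flush, a fresh context is used. With
    keep_handle=False the handle is lost (the reported bug); with
    keep_handle=True it is copied into the new context (one possible fix).
    """
    if not messages_pending_flush:
        return rctx
    return RecoveryCtx(handle=rctx.handle if keep_handle else None)


def add_source_info(rctx: RecoveryCtx, n_objects: int,
                    reset_every: int = 100) -> None:
    """Long per-object search that reports progress via rctx.handle, if any."""
    for i in range(1, n_objects + 1):
        if rctx.handle is not None and i % reset_every == 0:
            rctx.handle.reset_tp_timeout()
```

Scanning 1000 objects through a context returned by `start_handle(..., True, keep_handle=False)` leaves `h.resets` at 0, so nothing keeps the watchdog at bay; with `keep_handle=True` the same scan records 10 resets.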


Related issues: 2 (0 open, 2 closed)

Related to Ceph - Bug #9128: Newly-restarted OSD may suicide itself after hitting suicide time out value because it may need to search huge amount of objects (Resolved, 08/15/2014)
Copied to Ceph - Backport #12844: osd suicide timeout during peering - search for missing objects (Resolved, Loïc Dachary, 07/29/2015)