Bug #45388
Insufficient monitor logging to diagnose downed OSDs
Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Community (user)
Tags:
low-hanging-fruit
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
We just had a case in a Ceph Luminous cluster where the monitor forced newly started OSDs to commit suicide. Communication between the monitor and the OSDs was fine, but each OSD went down with a log message saying that the monitor had forced it to commit suicide. Only after increasing the debug level did we find that other OSDs had reported the OSD as down, so the monitor took action and forced the OSD process to stop.
If a monitor forces an OSD to commit suicide, the reason must be reported in the monitor log at the default log level, including which OSDs reported the OSD down.
Impact: Troubleshooting on production clusters always happens under time pressure, so having the reasons reported often makes the difference between maintaining SLAs and breaking them.
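To illustrate the kind of log line requested above, here is a minimal sketch in Python. It is not Ceph code: the `FailureTracker` class, the `report_failure` method, the `min_reporters` threshold, and the message wording are all hypothetical, chosen only to show the reporters being named at the moment the down decision is made.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class FailureTracker:
    """Collects failure reports per target OSD (hypothetical model, not Ceph code)."""

    min_reporters: int = 2  # assumed threshold, chosen for illustration
    reports: Dict[int, Set[int]] = field(default_factory=dict)

    def report_failure(self, target: int, reporter: int) -> Optional[str]:
        """Record a report; return a log line once enough distinct OSDs have reported."""
        reporters = self.reports.setdefault(target, set())
        reporters.add(reporter)
        if len(reporters) < self.min_reporters:
            return None  # not enough distinct reporters yet, nothing to log
        names = ", ".join(f"osd.{r}" for r in sorted(reporters))
        return (
            f"marking osd.{target} down: "
            f"reported failed by {len(reporters)} OSDs ({names})"
        )


tracker = FailureTracker()
first = tracker.report_failure(target=7, reporter=3)
second = tracker.report_failure(target=7, reporter=12)
print(second)
```

The point of the sketch is that the decision and its evidence are emitted together in one message, so an operator reading the log at the default level would see both why the OSD was marked down and which peers reported it.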
Updated by Sage Weil almost 3 years ago
- Project changed from Ceph to RADOS
- Category deleted (Monitor)
Updated by Laura Flores almost 2 years ago
- Tags set to low-hanging-fruit
Updated by Laura Flores 10 months ago
- Tags changed from low-hanging-fruit to low-hanging-fruit, open-source-day