Bug #40601

osd: OSD wrongly reported down because getloadavg() holds heartbeat_lock for too long

Added by dongdong tao over 4 years ago. Updated over 4 years ago.

Status:
Fix Under Review
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Currently, OSD::heartbeat() calls getloadavg() to get the load information.
getloadavg() simply opens the file /proc/loadavg and reads it.
Most of the time getloadavg() returns very quickly, but it can take a long time, since it does not set O_NONBLOCK for the open and read.

In our case, we found that open("/proc/loadavg") can take a long time (multiple seconds).

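We wrote a simple C program to reproduce this:
---
#define _BSD_SOURCE      /* exposes getloadavg() on older glibc */
#define _DEFAULT_SOURCE  /* replaces _BSD_SOURCE on newer glibc */
#include <stdlib.h>
#include <stdio.h>

int main() {
    double loadavgs[1];

    /* getloadavg() returns the number of samples written, or -1 on failure */
    if (getloadavg(loadavgs, 1) != 1)
        return 1;
    printf("loadavgs: %f\n", loadavgs[0]);

    return 0;
}
---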

From time to time, we get a result like the one below:

$ time ./getloadavg
loadavgs: 9.390000

real 0m5.078s
user 0m0.000s
sys 0m0.012s
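
One way to narrow down which call is slow is to time open() and read() on /proc/loadavg separately. The following is only a minimal sketch, not the program used in this report; the function names elapsed_sec and time_open_read are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Seconds elapsed between two CLOCK_MONOTONIC samples (hypothetical helper). */
double elapsed_sec(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec)
         + (double)(b->tv_nsec - a->tv_nsec) / 1e9;
}

/*
 * Time open() and read() on `path` separately (hypothetical helper).
 * On success, stores the two durations in seconds and returns 0.
 * Returns -1 if either call fails.
 */
int time_open_read(const char *path, double *open_sec, double *read_sec)
{
    char buf[128];
    struct timespec t0, t1, t2;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    int fd = open(path, O_RDONLY);      /* blocking open, as in the report */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (fd < 0)
        return -1;

    ssize_t n = read(fd, buf, sizeof(buf));
    clock_gettime(CLOCK_MONOTONIC, &t2);
    close(fd);
    if (n < 0)
        return -1;

    *open_sec = elapsed_sec(&t0, &t1);
    *read_sec = elapsed_sec(&t1, &t2);
    return 0;
}
```

Calling time_open_read("/proc/loadavg", &o, &r) in a loop and logging every sample where o or r exceeds a threshold (for example one second) would show which of the two calls accounts for the delay.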

We found that it can stall at open("/proc/loadavg"). It is not clear why it takes so long; it might be that many processes on the machine were trying to access the /proc directory at the same time.
Regardless of the cause, we now know that getloadavg() can get stuck in open or read, either of which might block for some reason.
So we should not call getloadavg() in OSD::heartbeat(), which runs in a time-sensitive thread and holds the heartbeat_lock.
While the heartbeat_lock is held, heartbeat_check() has to wait for it and cannot process the osd_ping_reply messages in time, which causes the OSD to report other OSDs as failed.
This in turn causes OSD flapping.
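
The general idea of keeping a potentially blocking call out of a critical section can be sketched as below. This is only an illustration in plain C with pthreads, not Ceph code and not necessarily what the linked pull request does; struct hb_state, its fields, and heartbeat_tick are hypothetical stand-ins for the state guarded by heartbeat_lock.

```c
#define _DEFAULT_SOURCE
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the state protected by heartbeat_lock. */
struct hb_state {
    pthread_mutex_t lock;
    double last_loadavg;   /* most recent 1-minute load average */
    int have_loadavg;      /* nonzero once a sample has been stored */
};

/*
 * Sample the load average BEFORE taking the lock, then publish the
 * result under the lock. If getloadavg() stalls, this thread is still
 * delayed, but other threads waiting on the lock are not.
 */
void heartbeat_tick(struct hb_state *hb)
{
    double load[1];
    int n = getloadavg(load, 1);   /* may block; no lock is held here */

    pthread_mutex_lock(&hb->lock);
    if (n == 1) {
        hb->last_loadavg = load[0];
        hb->have_loadavg = 1;
    }
    /* ... the remaining work that needs the lock would go here ... */
    pthread_mutex_unlock(&hb->lock);
}
```

With this ordering, a slow open("/proc/loadavg") delays only the thread that takes the sample. A thread in the role of heartbeat_check() can still acquire the lock and process ping replies in the meantime.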

History

#2 Updated by Kefu Chai over 4 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to dongdong tao
  • Pull request ID set to 28799
