Bug #65660
open
mds: drop client metrics during recovery
Description
When the rank is coming up, there's little reason to record historical metrics from the clients. We've also seen floods of these metrics messages slow down up:rejoin significantly.
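A minimal sketch of the proposed behavior, assuming the handler can see the rank's current state and that metrics are only of interest once the rank is active. All names here (`RankState`, `ClientMetricsMsg`, `MetricsHandler`) are hypothetical and do not correspond to the actual MDS code; the state list is also simplified.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical, simplified stand-in for the rank states (not the real MDS enum).
enum class RankState { Replay, Reconnect, Rejoin, Active };

// Hypothetical metrics message carrying a single sample from one client.
struct ClientMetricsMsg {
  std::uint64_t client_id;
  std::uint64_t read_latency_us;
};

class MetricsHandler {
public:
  explicit MetricsHandler(RankState s) : state_(s) {}

  void set_state(RankState s) { state_ = s; }

  // Returns true if the sample was recorded, false if it was dropped.
  // Assumption: anything received before the rank is active is discarded
  // cheaply, so a flood of messages cannot slow down recovery.
  bool handle(const ClientMetricsMsg& m) {
    if (state_ != RankState::Active) {
      ++dropped_;
      return false;
    }
    latest_latency_us_[m.client_id] = m.read_latency_us;
    ++recorded_;
    return true;
  }

  std::uint64_t dropped() const { return dropped_; }
  std::uint64_t recorded() const { return recorded_; }

private:
  RankState state_;
  std::uint64_t dropped_ = 0;
  std::uint64_t recorded_ = 0;
  // Only the most recent sample per client is kept (real-time view).
  std::unordered_map<std::uint64_t, std::uint64_t> latest_latency_us_;
};
```

With this shape, a handler constructed in `RankState::Rejoin` drops every message until `set_state(RankState::Active)` is called, after which samples are recorded as usual. The check sits before any per-client bookkeeping so that dropped messages cost as little as possible.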
Updated by Christopher Hoffman 9 days ago
there's little reason to record historical metrics from the clients
Can you expand on this? Are we losing anything by dropping them?
Updated by Patrick Donnelly 9 days ago
Xiubo Li wrote in #note-1:
Is this new in upstream master? As I remember, we improved this so that clients only send metrics once the MDS is ready, that is, in the active state.
That may be, but the MDS should be resilient to older clients.
Updated by Patrick Donnelly 9 days ago
Christopher Hoffman wrote in #note-2:
there's little reason to record historical metrics from the clients
Can you expand on this? Are we losing anything by dropping them?
The metrics are there to provide a real-time view of performance for clients. Jos or Venky may correct me if I'm wrong but I don't think there is any use for past metrics.
Updated by Venky Shankar 2 days ago
Patrick Donnelly wrote in #note-4:
The metrics are there to provide a real-time view of performance for clients. Jos or Venky may correct me if I'm wrong but I don't think there is any use for past metrics.
Unless the metrics are persisted by the exporter daemon, which is currently in the works. So we do not lose anything by dropping them.
Updated by Dhairya Parmar 2 days ago
Venky Shankar wrote in #note-5:
Unless the metrics are persisted by the exporter daemon, which is currently in the works. So we do not lose anything by dropping them.
So you mean it would be fine to keep past metrics once they are being persisted by the exporter daemon? In that case, once the exporter daemon is operational, we would need to adjust this code again, right?