Actions
Bug #62529
openPrimaryLogPG::log_op_stats uses `now` time vs op end time when calculating op latency
Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Tags:
low-hanging-fruit
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
- With `debug_osd` 15 or higher, a log line called `log_op_stats` is output, which ends with a reported latency value `lat`.
- This line can be output several 10's of milliseconds after the op actually completes.
- However, the `PrimaryLogPG::log_op_stats` uses `ceph_clock_now` as the time for calculating the op duration (lat) instead of the time when the op completed.
- This can result in an arbitrarily higher latency value being reported.
https://github.com/ceph/ceph/blob/main/src/osd/PrimaryLogPG.cc#L4384-L4387
Updated by Radoslaw Zarzynski 8 months ago
IIUC the linked method is being called asynchronously, after the op got completed.
However, the debug there claims the op has the latency calculated in the following way:
const utime_t now = ceph_clock_now(); const utime_t latency = now - m->get_recv_stamp();
now
should be actually the op's completion time.
Updated by Laura Flores 8 months ago
- Translation missing: en.field_tag_list set to low-hanging-fruit
Updated by Laura Flores 8 months ago
I'll bring this to Grace Hopper Open Source Day!
Actions