Feature #1217
Closed
identify key performance/health metrics for osd
Description
per node:
- ops/sec
- bw/sec
- read/write latency
per cluster:
- missing/lost/degraded objects
- pg status over time
- osd status over time
- recovery progress
Updated by Sage Weil almost 13 years ago
- Position set to 702
Updated by Sage Weil almost 13 years ago
- Story points set to 5
- Position deleted (702)
- Position set to 702
Updated by Sage Weil almost 13 years ago
the current set of osd metrics:
  osd_logtype.add_set(l_osd_opq, "opq"); // op queue length (waiting to be processed yet)
  osd_logtype.add_set(l_osd_op_wip, "op_wip"); // rep ops currently being processed (primary)
  osd_logtype.add_inc(l_osd_op, "op"); // client ops
  osd_logtype.add_inc(l_osd_op_inb, "op_inb"); // client op in bytes (writes)
  osd_logtype.add_inc(l_osd_op_outb, "op_outb"); // client op out bytes (reads)
  osd_logtype.add_inc(l_osd_op_lat, "op_lat"); // client op latency
  osd_logtype.add_inc(l_osd_op_r, "op_r"); // client reads
  osd_logtype.add_inc(l_osd_op_r_outb, "op_r_outb"); // client read out bytes
  osd_logtype.add_inc(l_osd_op_r_lat, "op_r_lat"); // client read latency
  osd_logtype.add_inc(l_osd_op_w, "op_w"); // client writes
  osd_logtype.add_inc(l_osd_op_w_inb, "op_w_inb"); // client write in bytes
  osd_logtype.add_inc(l_osd_op_w_rlat, "op_w_rlat"); // client write readable/applied latency
  osd_logtype.add_inc(l_osd_op_w_lat, "op_w_lat"); // client write latency
  osd_logtype.add_inc(l_osd_op_rw, "op_rw"); // client rmw
  osd_logtype.add_inc(l_osd_op_rw_inb, "op_rw_inb"); // client rmw in bytes
  osd_logtype.add_inc(l_osd_op_rw_outb, "op_rw_outb"); // client rmw out bytes
  osd_logtype.add_inc(l_osd_op_rw_rlat, "op_rw_rlat"); // client rmw readable/applied latency
  osd_logtype.add_inc(l_osd_op_rw_lat, "op_rw_lat"); // client rmw latency
  osd_logtype.add_inc(l_osd_sop, "sop"); // subops
  osd_logtype.add_inc(l_osd_sop_inb, "sop_inb"); // subop in bytes
  osd_logtype.add_inc(l_osd_sop_lat, "sop_lat"); // subop latency
  osd_logtype.add_inc(l_osd_sop_w, "sop_w"); // replicated (client) writes
  osd_logtype.add_inc(l_osd_sop_w_inb, "sop_w_inb"); // replicated write in bytes
  osd_logtype.add_inc(l_osd_sop_w_lat, "sop_w_lat"); // replicated write latency
  osd_logtype.add_inc(l_osd_sop_pull, "sop_pull"); // pull request
  osd_logtype.add_inc(l_osd_sop_pull_lat, "sop_pull_lat");
  osd_logtype.add_inc(l_osd_sop_push, "sop_push"); // push (write)
  osd_logtype.add_inc(l_osd_sop_push_inb, "sop_push_inb");
  osd_logtype.add_inc(l_osd_sop_push_lat, "sop_push_lat");
  osd_logtype.add_inc(l_osd_pull, "pull"); // pull requests sent
  osd_logtype.add_inc(l_osd_push, "push"); // push messages
  osd_logtype.add_inc(l_osd_push_outb, "push_outb"); // pushed bytes
  osd_logtype.add_inc(l_osd_rop, "rop"); // recovery ops (started)
  osd_logtype.add_set(l_osd_loadavg, "loadavg");
  osd_logtype.add_set(l_osd_buf, "buf"); // total ceph::buffer bytes
  osd_logtype.add_set(l_osd_pg, "numpg"); // num pgs
  osd_logtype.add_set(l_osd_pg_primary, "numpg_primary"); // num primary pgs
  osd_logtype.add_set(l_osd_pg_replica, "numpg_replica"); // num replica pgs
  osd_logtype.add_set(l_osd_pg_stray, "numpg_stray"); // num stray pgs
  osd_logtype.add_set(l_osd_hb_to, "hbto"); // heartbeat peers we send to
  osd_logtype.add_set(l_osd_hb_from, "hbfrom"); // heartbeat peers we recv from
  osd_logtype.add_inc(l_osd_map, "map"); // osdmap messages
  osd_logtype.add_inc(l_osd_mape, "mape"); // osdmap epochs
  osd_logtype.add_inc(l_osd_mape_dup, "mape_dup"); // dup osdmap epochs
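As a rough illustration of how the per-node ops/sec and bw/sec figures from the description might be derived from the counters listed above, here is a minimal sketch. It assumes the add_inc counters ("op", "op_inb", "op_outb") accumulate monotonically between samples, which this ticket does not confirm. The OsdSample and OsdRates structs and the compute_rates function are hypothetical names made up for this example; they are not part of the OSD code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of three counters from the list above, taken at t_sec.
struct OsdSample {
  double t_sec;           // sample time in seconds
  std::uint64_t op;       // "op": client ops
  std::uint64_t op_inb;   // "op_inb": client op in bytes (writes)
  std::uint64_t op_outb;  // "op_outb": client op out bytes (reads)
};

// Hypothetical result type: per-second rates over one sampling interval.
struct OsdRates {
  double ops_per_sec;
  double in_bytes_per_sec;
  double out_bytes_per_sec;
};

// Rate = (counter delta) / (elapsed seconds) between two samples.
// Assumption: counters only grow between samples (no daemon restart or reset).
inline OsdRates compute_rates(const OsdSample& prev, const OsdSample& cur) {
  const double dt = cur.t_sec - prev.t_sec;
  assert(dt > 0.0);
  assert(cur.op >= prev.op);
  assert(cur.op_inb >= prev.op_inb);
  assert(cur.op_outb >= prev.op_outb);
  return OsdRates{
      static_cast<double>(cur.op - prev.op) / dt,
      static_cast<double>(cur.op_inb - prev.op_inb) / dt,
      static_cast<double>(cur.op_outb - prev.op_outb) / dt,
  };
}
```

Sampling twice, a known interval apart, and passing both snapshots to compute_rates would give ops/sec and read/write bandwidth for that interval. An average latency could be derived in a similar way if "op_lat" turns out to be a running sum, but that is also an assumption.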
Updated by Sage Weil almost 13 years ago
- Status changed from New to Resolved
the per-cluster stats are all included in pg dump: per-pg missing, degraded, unfound. sampling that and tracking per-pg progress over time is more easily done outside the monitor.
the current pg dump is tab-separated. we probably want to dump json or something (same goes for proflogger)...
anyway, closing this piece.
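The idea of dumping JSON instead of tab-separated text could be sketched roughly as follows. This is only an illustration under stated assumptions: the column names used in the usage note ("pgid", "degraded", "unfound") are hypothetical, since the actual pg dump column layout is not given in this ticket, and split_tabs, json_escape and row_to_json are made-up helper names. Every value is emitted as a JSON string so that no type guessing is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Split one line on tab characters, keeping empty fields.
inline std::vector<std::string> split_tabs(const std::string& line) {
  std::vector<std::string> out;
  std::size_t start = 0;
  while (true) {
    const std::size_t pos = line.find('\t', start);
    if (pos == std::string::npos) {
      out.push_back(line.substr(start));
      break;
    }
    out.push_back(line.substr(start, pos - start));
    start = pos + 1;
  }
  return out;
}

// Escape quotes, backslashes and control characters for a JSON string.
inline std::string json_escape(const std::string& s) {
  std::string out;
  for (const char c : s) {
    const unsigned char u = static_cast<unsigned char>(c);
    if (c == '"' || c == '\\') {
      out += '\\';
      out += c;
    } else if (u < 0x20) {
      char buf[7];
      std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(u));
      out += buf;
    } else {
      out += c;
    }
  }
  return out;
}

// Turn one tab-separated row into a flat JSON object keyed by column name.
// The caller supplies the header; row and header must have the same width.
inline std::string row_to_json(const std::vector<std::string>& header,
                               const std::string& row) {
  const std::vector<std::string> fields = split_tabs(row);
  assert(fields.size() == header.size());
  std::string out = "{";
  for (std::size_t i = 0; i < fields.size(); ++i) {
    if (i != 0) out += ",";
    out += "\"" + json_escape(header[i]) + "\":\"" + json_escape(fields[i]) + "\"";
  }
  out += "}";
  return out;
}
```

With a hypothetical header of pgid, degraded, unfound, a row such as "0.1<TAB>3<TAB>0" would become one JSON object per pg. An external sampler could then store these objects over time to track recovery progress outside the monitor.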