Feature #1217

closed

identify key performance/health metrics for osd

Added by Sage Weil almost 13 years ago. Updated almost 13 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: OSD
Target version:
% Done: 0%
Source:
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

per node:
ops/sec
bw/sec
read/write latency

per cluster:
missing/lost/degraded objects
pg status over time
osd status over time
recovery progress

Actions #1

Updated by Sage Weil almost 13 years ago

  • Position set to 702
Actions #2

Updated by Sage Weil almost 13 years ago

  • Story points set to 5
  • Position deleted (702)
  • Position set to 702
Actions #3

Updated by Sage Weil almost 13 years ago

  • Assignee set to Sage Weil
Actions #4

Updated by Sage Weil almost 13 years ago

the current set of osd metrics:


    osd_logtype.add_set(l_osd_opq, "opq");       // op queue length (ops waiting to be processed)
    osd_logtype.add_set(l_osd_op_wip, "op_wip");   // rep ops currently being processed (primary)

    osd_logtype.add_inc(l_osd_op,       "op");           // client ops
    osd_logtype.add_inc(l_osd_op_inb,   "op_inb");       // client op in bytes (writes)
    osd_logtype.add_inc(l_osd_op_outb,  "op_outb");      // client op out bytes (reads)
    osd_logtype.add_inc(l_osd_op_lat,   "op_lat");       // client op latency

    osd_logtype.add_inc(l_osd_op_r,      "op_r");        // client reads
    osd_logtype.add_inc(l_osd_op_r_outb, "op_r_outb");   // client read out bytes
    osd_logtype.add_inc(l_osd_op_r_lat,  "op_r_lat");    // client read latency
    osd_logtype.add_inc(l_osd_op_w,      "op_w");        // client writes
    osd_logtype.add_inc(l_osd_op_w_inb,  "op_w_inb");    // client write in bytes
    osd_logtype.add_inc(l_osd_op_w_rlat, "op_w_rlat");   // client write readable/applied latency
    osd_logtype.add_inc(l_osd_op_w_lat,  "op_w_lat");    // client write latency
    osd_logtype.add_inc(l_osd_op_rw,     "op_rw");       // client rmw
    osd_logtype.add_inc(l_osd_op_rw_inb, "op_rw_inb");   // client rmw in bytes
    osd_logtype.add_inc(l_osd_op_rw_outb,"op_rw_outb");  // client rmw out bytes
    osd_logtype.add_inc(l_osd_op_rw_rlat,"op_rw_rlat");  // client rmw readable/applied latency
    osd_logtype.add_inc(l_osd_op_rw_lat, "op_rw_lat");   // client rmw latency

    osd_logtype.add_inc(l_osd_sop,       "sop");         // subops
    osd_logtype.add_inc(l_osd_sop_inb,   "sop_inb");     // subop in bytes
    osd_logtype.add_inc(l_osd_sop_lat,   "sop_lat");     // subop latency

    osd_logtype.add_inc(l_osd_sop_w,     "sop_w");          // replicated (client) writes
    osd_logtype.add_inc(l_osd_sop_w_inb, "sop_w_inb");      // replicated write in bytes
    osd_logtype.add_inc(l_osd_sop_w_lat, "sop_w_lat");      // replicated write latency
    osd_logtype.add_inc(l_osd_sop_pull,     "sop_pull");       // pull request
    osd_logtype.add_inc(l_osd_sop_pull_lat, "sop_pull_lat");   // pull request latency
    osd_logtype.add_inc(l_osd_sop_push,     "sop_push");       // push (write)
    osd_logtype.add_inc(l_osd_sop_push_inb, "sop_push_inb");   // push in bytes
    osd_logtype.add_inc(l_osd_sop_push_lat, "sop_push_lat");   // push latency

    osd_logtype.add_inc(l_osd_pull,      "pull");       // pull requests sent
    osd_logtype.add_inc(l_osd_push,      "push");       // push messages
    osd_logtype.add_inc(l_osd_push_outb, "push_outb");  // pushed bytes

    osd_logtype.add_inc(l_osd_rop, "rop");       // recovery ops (started)

    osd_logtype.add_set(l_osd_loadavg, "loadavg");   // load average
    osd_logtype.add_set(l_osd_buf, "buf");       // total ceph::buffer bytes

    osd_logtype.add_set(l_osd_pg, "numpg");   // num pgs
    osd_logtype.add_set(l_osd_pg_primary, "numpg_primary"); // num primary pgs
    osd_logtype.add_set(l_osd_pg_replica, "numpg_replica"); // num replica pgs
    osd_logtype.add_set(l_osd_pg_stray, "numpg_stray");   // num stray pgs
    osd_logtype.add_set(l_osd_hb_to, "hbto");     // heartbeat peers we send to
    osd_logtype.add_set(l_osd_hb_from, "hbfrom"); // heartbeat peers we recv from
    osd_logtype.add_inc(l_osd_map, "map");           // osdmap messages
    osd_logtype.add_inc(l_osd_mape, "mape");         // osdmap epochs
    osd_logtype.add_inc(l_osd_mape_dup, "mape_dup"); // dup osdmap epochs
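
As a rough illustration of how the per-node numbers in the description (ops/sec, bw/sec, latency) could be derived from these counters, here is a minimal sketch that differences two samples. It assumes the add_inc counters are cumulative and that op_lat accumulates total latency in seconds; OsdSample, OsdRates and rates_between are hypothetical names, not part of the OSD code.

```cpp
#include <cassert>
#include <cstdint>

// One sample of a few cumulative OSD counters (names mirror the list above).
// Assumption: each counter only grows, and op_lat is the summed latency of
// all client ops so far, in seconds.
struct OsdSample {
  double   t;        // sample time, seconds
  uint64_t op;       // client ops
  uint64_t op_inb;   // client bytes written
  uint64_t op_outb;  // client bytes read
  double   op_lat;   // summed client op latency, seconds
};

struct OsdRates {
  double ops_per_sec;
  double bytes_per_sec;  // in + out
  double avg_latency;    // seconds per op over the interval; 0 if no ops
};

// Difference two samples (a taken before b) to get per-interval rates.
OsdRates rates_between(const OsdSample& a, const OsdSample& b) {
  assert(b.t > a.t);
  assert(b.op >= a.op && b.op_inb >= a.op_inb && b.op_outb >= a.op_outb);
  const double dt = b.t - a.t;
  const uint64_t dops = b.op - a.op;
  const uint64_t dbytes = (b.op_inb - a.op_inb) + (b.op_outb - a.op_outb);
  OsdRates r;
  r.ops_per_sec = dops / dt;
  r.bytes_per_sec = dbytes / dt;
  r.avg_latency = dops ? (b.op_lat - a.op_lat) / dops : 0.0;
  return r;
}
```

The same differencing would apply to the op_r/op_w/op_rw families to split read and write latency, under the same assumptions.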

Actions #5

Updated by Sage Weil almost 13 years ago

  • Status changed from New to Resolved

the per-cluster stats are all included in pg dump: per-pg missing, degraded, unfound. sampling that and tracking per-pg progress over time is more easily done outside the monitor.
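
A minimal sketch of what tracking progress outside the monitor might look like: an external sampler sums the per-pg degraded counts from each pg dump and estimates time remaining from the trend. DegradedSample and recovery_eta are hypothetical names, and the linear estimate is only an assumption for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical cluster-wide sample: a timestamp plus the sum of per-pg
// degraded object counts taken from one pg dump.
struct DegradedSample {
  double   t;         // seconds
  uint64_t degraded;  // total degraded objects at time t
};

// Estimate seconds until degraded reaches zero, using the average recovery
// rate between the first and last sample. Returns 0 if already clean and
// -1 if no progress is visible over the window.
double recovery_eta(const std::vector<DegradedSample>& s) {
  assert(s.size() >= 2);
  const DegradedSample& first = s.front();
  const DegradedSample& last = s.back();
  assert(last.t > first.t);
  if (last.degraded == 0) return 0.0;
  if (last.degraded >= first.degraded) return -1.0;  // stalled or getting worse
  const double rate = (first.degraded - last.degraded) / (last.t - first.t);
  return last.degraded / rate;
}
```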

the current pg dump is tab-separated. we probably want to dump json or something (same goes for proflogger)...
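
Until there is a native json dump, an external tool could convert the tab-separated output itself. The sketch below is generic: it assumes the first line holds column names and makes no claim about the actual pg dump columns; tsv_to_json and its helpers are hypothetical, and every value is emitted as a JSON string.

```cpp
#include <cstddef>
#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

// Split one line on tab characters, keeping empty fields.
std::vector<std::string> split_tabs(const std::string& line) {
  std::vector<std::string> out;
  std::string::size_type start = 0, pos;
  while ((pos = line.find('\t', start)) != std::string::npos) {
    out.push_back(line.substr(start, pos - start));
    start = pos + 1;
  }
  out.push_back(line.substr(start));
  return out;
}

// Quote a string for JSON, escaping quotes, backslashes and control chars.
std::string json_quote(const std::string& s) {
  std::string out = "\"";
  for (unsigned char c : s) {
    if (c == '"' || c == '\\') {
      out += '\\';
      out += static_cast<char>(c);
    } else if (c < 0x20) {
      char buf[8];
      std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
      out += buf;
    } else {
      out += static_cast<char>(c);
    }
  }
  return out + "\"";
}

// Convert tab-separated text (first line = column names) to a JSON array of
// objects. Blank lines are skipped; fields beyond the header are dropped and
// short rows simply omit the missing columns.
std::string tsv_to_json(const std::string& text) {
  std::istringstream in(text);
  std::string line;
  if (!std::getline(in, line)) return "[]";
  const std::vector<std::string> cols = split_tabs(line);
  std::string out = "[";
  bool first_row = true;
  while (std::getline(in, line)) {
    if (line.empty()) continue;
    const std::vector<std::string> vals = split_tabs(line);
    out += first_row ? "{" : ",{";
    first_row = false;
    for (std::size_t i = 0; i < cols.size() && i < vals.size(); ++i) {
      if (i) out += ",";
      out += json_quote(cols[i]) + ":" + json_quote(vals[i]);
    }
    out += "}";
  }
  return out + "]";
}
```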

anyway, closing this piece.
