Feature #50614: [pwl] enhance "rbd status" output and periodically update it - rbd - Ceph

Feature #50614

closed

[pwl] enhance "rbd status" output and periodically update it

Added by Ilya Dryomov almost 3 years ago. Updated almost 2 years ago.

Status: Resolved

Priority: Normal

Assignee: CONGMIN YIN

Target version:

% Done:

Source:

Tags:

Backport: pacific,quincy

Reviewed:

Affected Versions:

Pull request ID: 45684

Description

The "Image cache state" section of the "rbd status" output is very confusing because it is effectively a snapshot from the time the cache was loaded. It is not updated until the cache is closed in an orderly fashion, so a dirty cache can be reported as clean, and so on.

Also, no metrics of any kind are included. It shouldn't take an admin socket, two different configuration options, and a grep through debug output and/or raw perf counters to get an idea of how the cache is doing.

Related issues 3 (0 open — 3 closed)

Updated by Ilya Dryomov almost 3 years ago

Related to Bug #50613: [pwl] "rbd status" output is incorrect added

Updated by CONGMIN YIN over 2 years ago

Assignee set to CONGMIN YIN

Updated by CONGMIN YIN over 2 years ago

no metrics of any kind are included. It shouldn't take an admin socket, two different configuration options and a grep through debug output and/or raw perf counters to get an idea of how the cache is doing.

Hi @Ilya Dryomov, I don't quite understand the second sentence in the description. Can you explain the problem more?

Updated by Ilya Dryomov over 2 years ago

There is some useful data collected in the form of perf counters, such as the number of hits, the number of bytes read from the cache, various latencies, etc. See AbstractWriteLog::perf_start(). But perf counters are rather hard to access on the client side: an admin socket may not be set up, and if it is set up, one needs to find the right one and then manually grab the data with "ceph --admin-daemon ... perf dump" or similar. If the workload is restarted, a different admin socket (usually) gets created, so automating the collection and aggregation with external tools is a pain.
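To illustrate the manual step described above, here is a minimal Python sketch of what an external collector has to do today. It assumes the caller already knows the admin socket path (which is exactly the hard part), and that "perf dump" returns a JSON object keyed by section name. The helper names, the "pwl" substring filter, and the section and counter names in the sample are hypothetical and chosen only for illustration; they are not taken from the actual counters registered in AbstractWriteLog::perf_start().

```python
import json
import subprocess


def perf_dump(asok_path):
    """Run `ceph --admin-daemon <asok_path> perf dump` and parse its JSON output.

    asok_path must be supplied by the caller; finding the right socket
    is not handled here.
    """
    out = subprocess.run(
        ["ceph", "--admin-daemon", asok_path, "perf", "dump"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def select_sections(dump, needle):
    """Keep only top-level perf counter sections whose name contains `needle`."""
    return {name: counters for name, counters in dump.items() if needle in name}


# Hypothetical sample shaped like a perf dump (section name -> counters).
# The section and counter names are made up for this example.
sample = {
    "librbd-pwl-example": {"rd_hit_req": 10, "rd_req": 40},
    "objecter": {"op": 7},
}
selected = select_sections(sample, "pwl")
```

Against a live client this would be called as select_sections(perf_dump(path), "pwl"), and it would have to be repeated with a new path every time the workload restarts.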

A "grep through debug output" refers to AbstractWriteLog::periodic_stats(). Again, very useful data, but in order to get to it, one needs to set "debug rbd pwl = 1" and "rbd_persistent_cache_log_periodic_stats = true" and grep the log file for "STATS:". And again, the log file may not be set up, etc.
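The "grep" step amounts to something like the following Python sketch. It assumes only what is stated above: the periodic stats lines contain the marker "STATS:". The helper name and the sample log lines are hypothetical placeholders, not real log output.

```python
def stats_lines(lines, marker="STATS:"):
    """Return the lines that contain `marker`, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if marker in line]


# Hypothetical placeholder lines; real log lines have a different layout.
sample_log = [
    "hypothetical log line without the marker\n",
    "hypothetical prefix STATS: placeholder payload\n",
]
found = stats_lines(sample_log)
```

With a real client log this would be used as stats_lines(open(log_path)), where log_path is whatever log file the client was configured to write, if any.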

"rbd status" should be taught to report some of this data. Not all of it -- just what is immediately useful to the end user, some kind of "at a glance" view.
