Feature #50614

closed

[pwl] enhance "rbd status" output and periodically update it

Added by Ilya Dryomov almost 3 years ago. Updated almost 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
pacific,quincy
Reviewed:
Affected Versions:
Pull request ID:

Description

"Image cache state" section is very confusing because it is effectively a snapshot from the time the cache was loaded. It is not updated until the cache is orderly closed. A dirty cache can be reported as clean and so on...

Also, no metrics of any kind are included. It shouldn't take an admin socket, two different configuration options and a grep through debug output and/or raw perf counters to get an idea of how the cache is doing.


Related issues 3 (0 open, 3 closed)

Related to rbd - Bug #50613: [pwl] "rbd status" output is incorrect (Resolved, Ilya Dryomov)
Copied to rbd - Backport #55292: pacific: [pwl] enhance "rbd status" output and periodically update it (Resolved, Ilya Dryomov)
Copied to rbd - Backport #55293: quincy: [pwl] enhance "rbd status" output and periodically update it (Resolved, Ilya Dryomov)
Actions #1

Updated by Ilya Dryomov almost 3 years ago

  • Related to Bug #50613: [pwl] "rbd status" output is incorrect added
Actions #2

Updated by CONGMIN YIN over 2 years ago

  • Assignee set to CONGMIN YIN
Actions #3

Updated by CONGMIN YIN over 2 years ago

no metrics of any kind are included. It shouldn't take an admin socket, two different configuration options and a grep through debug output and/or raw perf counters to get an idea of how the cache is doing.

Hi @Ilya Dryomov, I don't quite understand the second sentence in the description. Can you explain the problem more?

Actions #4

Updated by Ilya Dryomov over 2 years ago

There is some useful data collected in the form of perf counters, such as the number of hits, the number of bytes read from the cache, various latencies, etc. See AbstractWriteLog::perf_start(). But perf counters are rather hard to access on the client side: an admin socket may not be set up, and if it is, one needs to find the right one and then manually grab the data with "ceph --admin-daemon ... perf dump" or similar. If the workload is restarted, a different admin socket (usually) gets created, so automating the collection and aggregation with external tools is a pain.
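To illustrate the manual collection described here, a rough Python sketch might run the perf dump command against one admin socket and flatten the resulting JSON into dotted counter names. The helper names and the section and counter names used in any sample data are hypothetical; the real ones depend on what AbstractWriteLog::perf_start() registers, and the admin socket path has to be found by hand.

```python
import json
import subprocess


def flatten_counters(perf_dump, prefix=""):
    """Flatten nested perf dump JSON into {"section.counter": value}."""
    flat = {}
    for key, value in perf_dump.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Latency-style counters are nested objects; recurse into them.
            flat.update(flatten_counters(value, name))
        else:
            flat[name] = value
    return flat


def read_perf_dump(asok_path):
    """Run "ceph --admin-daemon <asok_path> perf dump" and flatten the output.

    asok_path is whatever admin socket the client happened to create;
    it is an input here because there is no reliable way to guess it.
    """
    result = subprocess.run(
        ["ceph", "--admin-daemon", asok_path, "perf", "dump"],
        check=True,
        capture_output=True,
        text=True,
    )
    return flatten_counters(json.loads(result.stdout))
```

Even with such a helper, the caller still has to locate the right socket after every workload restart, which is the pain point being described.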

A "grep through debug output" refers to AbstractWriteLog::periodic_stats(). Again, very useful data, but in order to get to it, one needs to set "debug rbd pwl = 1" and "rbd_persistent_cache_log_periodic_stats = true" and grep the log file for "STATS:". And again, the log file may not be set up, etc.
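The log-scraping step can be sketched in Python as well. This is only an illustration of grepping for the "STATS:" marker once the two options above are set; the function names are hypothetical, and nothing is assumed about the format of the text that follows the marker.

```python
def extract_stats_lines(lines, marker="STATS:"):
    """Return the text after the marker for every line that contains it."""
    results = []
    for line in lines:
        _, found, rest = line.partition(marker)
        if found:
            results.append(rest.strip())
    return results


def stats_from_log(log_path):
    """Scan a client log file (path supplied by the caller) for STATS: lines."""
    with open(log_path, errors="replace") as log_file:
        return extract_stats_lines(log_file)
```

This only works if a client log file is configured in the first place, which is the second half of the complaint.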

"rbd status" should be taught to report some of this data. Not all of it -- just what is immediately useful to the end user, some kind of "at a glance" view.

Actions #5

Updated by Ilya Dryomov about 2 years ago

  • Status changed from New to In Progress
  • Pull request ID set to 45684

The "dirty cache can be reported as clean" part has been addressed in https://github.com/ceph/ceph/pull/45660.

Actions #6

Updated by Ilya Dryomov about 2 years ago

  • Backport set to pacific,quincy
Actions #7

Updated by Ilya Dryomov about 2 years ago

  • Status changed from In Progress to Pending Backport
Actions #8

Updated by Backport Bot about 2 years ago

  • Copied to Backport #55292: pacific: [pwl] enhance "rbd status" output and periodically update it added
Actions #9

Updated by Backport Bot about 2 years ago

  • Copied to Backport #55293: quincy: [pwl] enhance "rbd status" output and periodically update it added
Actions #10

Updated by Ilya Dryomov almost 2 years ago

  • Status changed from Pending Backport to Resolved