Feature #44055

cephadm: make 'ls' faster

Added by Sage Weil 8 months ago. Updated 20 days ago.

Status:
Fix Under Review
Priority:
Normal
Assignee:
Category:
cephadm (binary)
Target version:
-
% Done:

80%

Source:
Tags:
performance
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

For both podman and docker, 'ps' tells you the image name but not its hash.

With podman, you can do:

- podman image list -a --format=json
- podman ps -a --format=json

and cross-reference against the image history list to see if there are multiple images with the same name+tag. If so, inspect those containers.

With docker, you can do:

- docker ps --format='{{.ID}},{{.Image}},{{.Names}}'

Unlike podman, the .Image field here is the name+tag (podman only shows this if it is the most recent, AFAICS; otherwise you have to look in the history list). Similarly, though, if we see multiple images with the same name+tag, then we can inspect just those containers.

The net of this would be two very fast (~100ms) commands instead of an inspect for every container (which nets out to ~3.5 seconds on my host with ~25 containers).
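
A minimal sketch of the cross-referencing idea above, in Python. The record shapes (`id`, `image`, `name`, `names`) and the function names are illustrative assumptions, not the real podman/docker JSON schema or existing cephadm code; actual command output would need to be mapped into this shape first.

```python
from collections import defaultdict


def parse_docker_ps(output):
    """Parse 'ID,IMAGE,NAMES' lines as produced by the docker --format string above."""
    rows = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # NAMES is last, so any commas it contains survive the split.
        cid, image, name = line.split(",", 2)
        rows.append({"id": cid, "image": image, "name": name})
    return rows


def ambiguous_image_names(images):
    """Return the name+tag strings that map to more than one image ID.

    `images` is assumed to be a list of {"id": str, "names": [str, ...]}.
    """
    ids_by_name = defaultdict(set)
    for img in images:
        for name in img["names"]:
            ids_by_name[name].add(img["id"])
    return {name for name, ids in ids_by_name.items() if len(ids) > 1}


def containers_needing_inspect(containers, images):
    """Only containers whose image name+tag is ambiguous need an inspect call."""
    ambiguous = ambiguous_image_names(images)
    return [c for c in containers if c["image"] in ambiguous]
```

With this shape, the common case (every name+tag maps to exactly one image) needs no per-container inspect at all.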


Related issues

Related to Orchestrator - Feature #47139: Require a minimum version for podman/docker New

History

#1 Updated by Sebastian Wagner 8 months ago

  • Description updated (diff)
  • Category set to cephadm (binary)

#2 Updated by Sebastian Wagner 6 months ago

  • Tags set to low-hanging-fruit

#3 Updated by Adam King 4 months ago

  • Status changed from New to In Progress
  • Assignee set to Adam King
  • Pull request ID set to 35491

#4 Updated by Sebastian Wagner 3 months ago

  • Tags changed from low-hanging-fruit to performance

#5 Updated by Sebastian Wagner 3 months ago

  • Status changed from In Progress to New
  • Assignee deleted (Adam King)

I think the requirements for any future refactoring here are:

  • having very good pytest coverage, with example outputs from the different tools and versions.
  • reading PR 35491

#6 Updated by Paul Cuzner 28 days ago

  • Assignee set to Paul Cuzner
  • Pull request ID deleted (35491)

I think there are a couple of things that we need to change in our approach:
1. Don't inspect the containers or exec into them to pick up the ceph version
2. Use the ps -a approach to gather the bulk of the info that we need in one hit
3. Add the ceph version information back within the mgr module. The mgr modules have access to the ceph versions for every daemon, so we don't need to exec into the containers at all
4. Put more stringent version dependencies on docker/podman to ensure the output we need is consistent. This narrows the testing effort, and has the potential to reduce the code complexity that supporting both old and new versions of podman/docker would otherwise introduce.
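
As a rough illustration of point 2, a sketch in Python that turns a single ps listing into per-daemon records. It assumes one JSON object per line (as `docker ps -a --format '{{json .}}'` emits), the keys `ID`, `Image`, `Names` and `Status`, and container names of the form `ceph-<fsid>-<type>.<id>`. The function names are hypothetical and this is not the code in the gist or the PR.

```python
import json


def parse_ps_json_lines(output):
    """Parse a ps listing that has one JSON object per line."""
    return [json.loads(line) for line in output.splitlines() if line.strip()]


def ceph_daemons(containers, fsid):
    """Keep only the containers that belong to the given cluster fsid.

    Assumes the container name is 'ceph-<fsid>-<daemon_type>.<daemon_id>'.
    """
    prefix = f"ceph-{fsid}-"
    daemons = []
    for c in containers:
        name = c.get("Names", "")
        if not name.startswith(prefix):
            continue  # not a daemon of this cluster
        daemons.append({
            "name": name[len(prefix):],  # e.g. 'osd.3'
            "container_id": c.get("ID"),
            "image": c.get("Image"),
            "status": c.get("Status"),
        })
    return daemons
```

Everything here comes from one listing call, so the cost no longer grows with one inspect or exec per container.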

I'll add a link here to a gist for people to try. By adopting the above approach, the ls command on my physical test system (10 osds, mon, mgr, prometheus, node-exporter, alertmanager, etc.) goes from 10s to 0.5s.

#7 Updated by Paul Cuzner 28 days ago

  • Status changed from New to In Progress

Here's the gist: https://gist.github.com/pcuzner/c8940e4af5f817b817640e97bff50e91

To test, just download it to a ceph host and run it (it will pick the first fsid it finds in /var/lib/ceph).

image_id is visible from podman but not docker, so I've excluded it from the output for consistency. I don't think we actually need image_id for the orchestrator and UI interactions.

ceph_version is also missing. Getting it without a label on the container is the main issue with the elapsed time of the existing code. My suggestion is that we don't supply it in the 'ls' output at all. The call to 'ls' is from orch, which is a mgr module, so the ceph metadata can be added to the 'ls' data, back in the mgr/cephadm code.
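
To make the suggestion concrete, a sketch in Python of how the mgr side might add the version back onto the 'ls' data. The `name` and `version` keys, the `versions_by_daemon` mapping and the function name are assumptions for illustration only; how mgr/cephadm would actually obtain the per-daemon versions is not shown.

```python
def add_versions(ls_entries, versions_by_daemon):
    """Return copies of the 'ls' entries with a 'version' field filled in.

    `versions_by_daemon` is assumed to map daemon names such as 'osd.1'
    to a ceph version string; daemons without a known version get None.
    """
    merged = []
    for entry in ls_entries:
        enriched = dict(entry)  # leave the caller's data untouched
        enriched["version"] = versions_by_daemon.get(entry["name"])
        merged.append(enriched)
    return merged
```

Daemons with no ceph version of their own (prometheus, node-exporter and so on) would simply carry `None` in this sketch.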

If we can keep the runtime low, we can run the 'ls' more frequently - perhaps like a prometheus scrape (every 10s!).

#8 Updated by Sebastian Wagner 28 days ago

  • Related to Feature #47139: Require a minimum version for podman/docker added

#9 Updated by Paul Cuzner 21 days ago

  • % Done changed from 0 to 50

#10 Updated by Paul Cuzner 20 days ago

  • Status changed from In Progress to Fix Under Review
  • % Done changed from 50 to 80
  • Pull request ID set to 36958

PR submitted
