cephadm: Support upgrading alertmanager, grafana, prometheus and node_exporter
Right now, we're simply downloading :latest, which might even differ between daemons on different hosts.
#3 Updated by Patrick Seidensal 5 months ago
This might not be an issue for minor version upgrades of Grafana and Prometheus, although that is hard to guarantee if minor versions can be upgraded without us being able to verify that they work as expected. Still, I think we need to be able to upgrade minor versions for security reasons. I'm not even sure whether we can use some kind of tag or label to achieve that. We should prevent upgrades to new major versions that we have not tested ourselves. If there's no mechanism to achieve that, we might need to stick to fixed versions. But then it is our responsibility to upgrade promptly once security issues have been fixed and the fixed versions have been released.
I'm a little bit concerned about the Node exporter, though. Minor version upgrades have broken metric names in the past. On the other hand, there's a pre-release of v1.0.0 available, which might indicate that things such as metric names will stay stable across minor version upgrades in the future. Of course, this won't be an issue if we decide to use fixed versions and test all upgrades (including minor version upgrades) beforehand.
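To make the "minor upgrades yes, untested major upgrades no" policy concrete, here is a minimal sketch of such a check. It is not part of cephadm; the function names `parse_version` and `upgrade_allowed` are hypothetical, and it assumes that image tags are plain `MAJOR.MINOR.PATCH` versions with an optional leading `v`.

```python
import re
from typing import Tuple


def parse_version(tag: str) -> Tuple[int, int, int]:
    """Parse a tag such as 'v2.18.1' or '6.6.2' into (major, minor, patch)."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        # Tags like 'latest' carry no version information and are rejected.
        raise ValueError(f"not a plain version tag: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def upgrade_allowed(current: str, candidate: str) -> bool:
    """Allow only newer versions that stay within the current major version."""
    cur = parse_version(current)
    new = parse_version(candidate)
    return new[0] == cur[0] and new > cur
```

With this rule, a step from `6.6.2` to `6.7.0` would pass, while `6.6.2` to `7.0.0` would be rejected until someone has tested the new major version. Note that it would not catch the Node exporter case described above, where a minor version changed metric names.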
#5 Updated by Alfonso Martínez 5 months ago
These are the monitoring stack versions that we use in our nautilus-based releases:
Not sure, but we can assume that today's latest versions of the plugins mentioned here are a good start:
#7 Updated by Patrick Seidensal 5 months ago
It currently seems that using fixed versions for the monitoring stack containers is the only way to ensure that major versions of those applications aren't updated automatically. Being responsible for publishing updates that won't break functionality includes a responsibility to upgrade those applications when security issues arise.
To at least be notified about newly discovered security vulnerabilities, Clair can be used to check those images, even automatically.
Clair is a tool that checks container images for security vulnerabilities. It uses a PostgreSQL database, which is by default populated with CVEs from various sources such as the Debian Security Bug Tracker, Ubuntu CVE Tracker, Red Hat Security Data, SUSE OVAL Descriptions and others.
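As a rough illustration of what "fixed versions" means at the image reference level, the sketch below flags references that would float with `:latest`. The helper names `image_tag` and `is_pinned` are hypothetical, and the image names and versions in the usage note are examples only, not the versions we ship.

```python
def image_tag(image: str) -> str:
    """Return the tag of an image reference, or 'latest' if no tag is given."""
    # Drop a digest suffix such as '@sha256:...' before looking for a tag.
    name = image.split("@", 1)[0]
    # Only a colon in the last path component separates the tag; a colon
    # earlier in the reference belongs to the registry's host:port.
    last_component = name.rsplit("/", 1)[-1]
    if ":" in last_component:
        return last_component.rsplit(":", 1)[1]
    return "latest"


def is_pinned(image: str) -> bool:
    """True if the reference names a digest or a tag other than 'latest'."""
    if "@sha256:" in image:
        return True
    return image_tag(image) != "latest"
```

For example, `is_pinned("prom/prometheus:v2.18.1")` is true, while `is_pinned("grafana/grafana:latest")` and `is_pinned("grafana/grafana")` are both false. A tag other than `latest` can still be moved by the publisher, so only a digest is strictly immutable.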
It is used in products like quay.io from Red Hat.
Clair can be integrated into container registries for automatic security checks, as in the aforementioned product.
Through tools like klar, Clair can also be used without a registry integration and is capable of checking local and remote container images for security vulnerabilities.
This is how such a check might look:
user@home ~ » CLAIR_ADDR=localhost klar grafana/grafana:latest
clair timeout 1m0s
docker timeout: 1m0s
no whitelist file
Analysing 8 layers
Got results from Clair API v1
Found 0 vulnerabilities
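If we want to run such a check automatically, the result has to be read by a script. The following is a minimal sketch that extracts the vulnerability count from output like the above. The function name `vulnerability_count` is hypothetical, and the sketch assumes the summary line has exactly the "Found N vulnerabilities" form shown here; other klar versions or output modes may differ.

```python
import re


def vulnerability_count(klar_output: str) -> int:
    """Extract N from the 'Found N vulnerabilities' summary line."""
    match = re.search(r"Found (\d+) vulnerabilities", klar_output)
    if match is None:
        # Fail loudly rather than treat unparseable output as "no findings".
        raise ValueError("no vulnerability summary found in klar output")
    return int(match.group(1))


# Output copied from the example run above.
SAMPLE_OUTPUT = """clair timeout 1m0s
docker timeout: 1m0s
no whitelist file
Analysing 8 layers
Got results from Clair API v1
Found 0 vulnerabilities"""
```

A scheduled job could call `vulnerability_count` on the captured output for each pinned image and notify us whenever the count is greater than zero.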