Feature #45410

closed

cephadm: Support upgrading alertmanager, grafana, prometheus and node_exporter

Added by Sebastian Wagner almost 4 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: cephadm/monitoring
Target version: -
% Done: 100%

Source:
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

Right now, we're simply downloading :latest, which might even differ between daemons on different hosts.


Subtasks 5 (0 open, 5 closed)

Feature #45463: cephadm: allow custom images for grafana, prometheus, alertmanager and node_exporter (Resolved, Patrick Seidensal)
Feature #45859: cephadm: use fixed versions (Resolved, Patrick Seidensal)
Documentation #45860: cephadm: document upgrades of monitoring components (Rejected, Patrick Seidensal)
Feature #45864: cephadm: include monitoring components in usual upgrade process (Resolved)
Feature #46499: Requesting a "ceph orch redeploy monitoring" command, as an option, so user does not have to issue four separate commands to update the monitoring stack to the latest versions (Rejected)

Related issues 2 (0 open, 2 closed)

Related to Orchestrator - Documentation #45411: cephadm: add section about container images (Resolved, Zac Dover)
Related to Dashboard - Bug #45908: monitoring: Status Panel breaks with Grafana 6.7.0 (maybe 7.x too) (Can't reproduce)

Actions #1

Updated by Sebastian Wagner almost 4 years ago

  • Category changed from cephadm to cephadm/monitoring
Actions #2

Updated by Juan Miguel Olmo Martínez almost 4 years ago

It would be nice to have these two things:
1. Use fixed-version images by default for the different components of the monitoring stack.
2. Provide an easy way to use other images.

Actions #3

Updated by Patrick Seidensal almost 4 years ago

This might not be an issue for minor version upgrades of Grafana and Prometheus, although that is hard to guarantee if minor versions can be upgraded without us verifying that they work as expected. Still, I think we need to be able to upgrade minor versions for security reasons. I'm not sure whether some kind of tag or label can achieve that. We should prevent upgrades to new major versions that we have not tested. If there's no mechanism for that, we might need to stick to fixed versions. But then it is our responsibility to upgrade promptly once security fixes have been released.

I'm a little bit concerned about the Node exporter, though. Minor version upgrades have broken metric names in the past. On the other hand, a pre-release of v1.0.0 is available, which might indicate that things such as metric names will stay stable across minor version upgrades in the future. Of course, this won't be an issue if we decide to use fixed versions and test all upgrades (including minor version upgrades) beforehand.
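
The policy discussed above (allow upgrades within a major version, block untested major versions) can be sketched as a small check. This is only an illustration, not something cephadm provides: `parse_version` and `upgrade_allowed` are hypothetical helper names, and the sketch assumes plain `MAJOR.MINOR.PATCH` tags with an optional leading `v`.

```python
import re
from typing import Tuple


def parse_version(tag: str) -> Tuple[int, int, int]:
    """Parse a tag such as 'v2.7.2' or '5.4.3' into (major, minor, patch)."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"not a plain MAJOR.MINOR.PATCH tag: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def upgrade_allowed(current: str, candidate: str) -> bool:
    """Allow only strictly newer versions that keep the current major version."""
    cur = parse_version(current)
    new = parse_version(candidate)
    return new[0] == cur[0] and new > cur


# Minor upgrade within Prometheus 2.x is accepted, a Grafana major jump is not.
print(upgrade_allowed("v2.7.2", "v2.11.1"))
print(upgrade_allowed("5.4.3", "6.7.0"))
```

For a 0.x series such as the Node exporter, where minor releases have broken metric names, a stricter rule that also pins the minor version would probably be needed.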

Actions #4

Updated by Sebastian Wagner almost 4 years ago

Actions #5

Updated by Alfonso Martínez almost 4 years ago

These are the monitoring stack versions that we use in our nautilus-based releases:
grafana: 5.4.3
prometheus: v2.7.2
alertmanager: 0.16.2
node_exporter: 0.17.0

grafana plugins:
https://docs.ceph.com/docs/master/mgr/dashboard/#enabling-the-embedding-of-grafana-dashboards
Not sure, but we can assume that today's latest versions of the plugins mentioned there are a good start:
grafana-piechart-panel: 1.4.0
vonage-status-panel: 1.0.9

Actions #6

Updated by Patrick Seidensal almost 4 years ago

These are our current versions:

Grafana 5.3.3
Alertmanager 0.16.2
Prometheus 2.11.1
Node exporter 0.17.0

grafana-piechart-panel 1.3.6
grafana-status-panel 1.0.9
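
The two requests from comment #2 (fixed versions by default, plus an easy way to use other images) could be combined roughly as follows. This is a sketch under assumptions: the version numbers are the ones listed above, the repository names and tag spellings are assumed Docker Hub names, and `DEFAULT_IMAGES` and `resolve_image` are hypothetical names that do not come from cephadm.

```python
from typing import Dict, Optional

# Versions from the list above; repository names and tag format are assumptions.
DEFAULT_IMAGES: Dict[str, str] = {
    "grafana": "grafana/grafana:5.3.3",
    "alertmanager": "prom/alertmanager:v0.16.2",
    "prometheus": "prom/prometheus:v2.11.1",
    "node_exporter": "prom/node-exporter:v0.17.0",
}


def resolve_image(component: str, overrides: Optional[Dict[str, str]] = None) -> str:
    """Return the override for a component if given, else the pinned default."""
    if component not in DEFAULT_IMAGES:
        raise KeyError(f"unknown monitoring component: {component!r}")
    image = (overrides or {}).get(component, DEFAULT_IMAGES[component])
    # Look at the last path segment only, so a registry port is not taken for a tag.
    name = image.rsplit("/", 1)[-1]
    if ":" not in name or name.endswith(":latest"):
        raise ValueError(f"image must carry a fixed tag, got {image!r}")
    return image


print(resolve_image("prometheus"))
print(resolve_image("grafana", {"grafana": "registry.example.com/grafana:6.0.0"}))
```

Rejecting untagged and `:latest` images keeps every host on the same version, which is the problem named in the description.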

Actions #7

Updated by Patrick Seidensal almost 4 years ago

It currently seems that using fixed versions for the monitoring stack containers is the only way to ensure that major versions of those applications aren't updated automatically. Being responsible for publishing updates that won't break functionality includes a responsibility to upgrade those applications when security issues arise.

To be at least notified about upcoming security vulnerabilities, Clair can be used to have those images checked, even automatically.

Clair is a tool that checks for security vulnerabilities in container images. It uses a PostgreSQL database, which by default is populated with CVEs from various sources such as the Debian Security Bug Tracker, Ubuntu CVE Tracker, Red Hat Security Data, SUSE OVAL Descriptions and others.

It is used in products like quay.io from Red Hat.

Clair can be integrated into container registries for automatic security checks, like the aforementioned product.

Through tools like klar, Clair can also be used without a registry integration and is capable of checking local and remote container images for security vulnerabilities.

This is how such a check might look:

user@home ~ » CLAIR_ADDR=localhost klar grafana/grafana:latest

clair timeout 1m0s
docker timeout: 1m0s
no whitelist file
Analysing 8 layers
Got results from Clair API v1
Found 0 vulnerabilities
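
To run such checks automatically, the summary line could be extracted from the output. The sketch below only parses text and does not invoke klar. It assumes the output format matches the transcript above, and `vulnerability_count` is a hypothetical helper name.

```python
import re

# Output copied from the transcript above, used here as sample input.
SAMPLE_OUTPUT = """clair timeout 1m0s
docker timeout: 1m0s
no whitelist file
Analysing 8 layers
Got results from Clair API v1
Found 0 vulnerabilities
"""


def vulnerability_count(klar_output: str) -> int:
    """Return N from the 'Found N vulnerabilities' summary line."""
    match = re.search(r"^Found (\d+) vulnerabilities\s*$", klar_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Found N vulnerabilities' line in output")
    return int(match.group(1))


print(vulnerability_count(SAMPLE_OUTPUT))
```

A periodic job could call this for each pinned image and raise a notification when the count is non-zero.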
Actions #8

Updated by Patrick Seidensal almost 4 years ago

  • Related to Bug #45908: monitoring: Status Panel breaks with Grafana 6.7.0 (maybe 7.x too) added
Actions #9

Updated by Sebastian Wagner almost 3 years ago

  • Status changed from New to Resolved