Bug #45628

cephadm qa: smoke should verify daemons are actually running

Added by Sebastian Wagner 8 months ago. Updated 5 days ago.

Status: Fix Under Review
Priority: Urgent
Assignee: -
Category: cephadm
Target version: -
% Done: 0%
Source: Q/A
Tags: low-hanging-fruit
Backport: pacific,octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

RGW failed:

2020-05-20T13:08:09.186 INFO:teuthology.orchestra.run.smithi203.stdout:NAME              HOST       STATUS          REFRESHED  AGE   VERSION                   IMAGE NAME                                                     IMAGE ID      CONTAINER ID
2020-05-20T13:08:09.186 INFO:teuthology.orchestra.run.smithi203.stdout:alertmanager.a    smithi203  running (47s)   33s ago    75s   0.20.0                    docker.io/prom/alertmanager:latest                             0881eb8f169f  9bcf1765c9f6
2020-05-20T13:08:09.187 INFO:teuthology.orchestra.run.smithi203.stdout:grafana.a         smithi060  running (58s)   31s ago    58s   6.6.2                     docker.io/ceph/ceph-grafana:latest                             87a51ecf0b1c  8731e3e51a0c
2020-05-20T13:08:09.187 INFO:teuthology.orchestra.run.smithi203.stdout:mgr.x             smithi060  running (4m)    31s ago    4m    16.0.0-1734-gc1cc5045b00  quay.io/ceph-ci/ceph:c1cc5045b00842201e98ed965e87b16c8b2acec8  06207838c6be  1cd43976a17e
2020-05-20T13:08:09.187 INFO:teuthology.orchestra.run.smithi203.stdout:mgr.y             smithi203  running (5m)    33s ago    5m    16.0.0-1734-gc1cc5045b00  quay.io/ceph-ci/ceph:c1cc5045b00842201e98ed965e87b16c8b2acec8  06207838c6be  678f88e3c420
2020-05-20T13:08:09.187 INFO:teuthology.orchestra.run.smithi203.stdout:mon.a             smithi203  running (5m)    33s ago    6m    16.0.0-1734-gc1cc5045b00  quay.io/ceph-ci/ceph:c1cc5045b00842201e98ed965e87b16c8b2acec8  06207838c6be  68e1b9162747
2020-05-20T13:08:09.187 INFO:teuthology.orchestra.run.smithi203.stdout:mon.b             smithi060  running (4m)    31s ago    4m    16.0.0-1734-gc1cc5045b00  quay.io/ceph-ci/ceph:c1cc5045b00842201e98ed965e87b16c8b2acec8  06207838c6be  d1383c8a0cf6
2020-05-20T13:08:09.188 INFO:teuthology.orchestra.run.smithi203.stdout:mon.c             smithi203  running (4m)    33s ago    4m    16.0.0-1734-gc1cc5045b00  quay.io/ceph-ci/ceph:c1cc5045b00842201e98ed965e87b16c8b2acec8  06207838c6be  27a1a4d7af30
2020-05-20T13:08:09.188 INFO:teuthology.orchestra.run.smithi203.stdout:node-exporter.a   smithi203  running (80s)   33s ago    85s   0.18.1                    docker.io/prom/node-exporter:latest                            e5a616e4b9cf  e725ba55bfd7
2020-05-20T13:08:09.188 INFO:teuthology.orchestra.run.smithi203.stdout:node-exporter.b   smithi060  running (82s)   31s ago    86s   0.18.1                    docker.io/prom/node-exporter:latest                            e5a616e4b9cf  da71c458ed71
2020-05-20T13:08:09.188 INFO:teuthology.orchestra.run.smithi203.stdout:osd.0             smithi203  running (3m)    33s ago    3m    16.0.0-1734-gc1cc5045b00  quay.io/ceph-ci/ceph:c1cc5045b00842201e98ed965e87b16c8b2acec8  06207838c6be  fbd8df58b740
2020-05-20T13:08:09.188 INFO:teuthology.orchestra.run.smithi203.stdout:osd.1             smithi203  running (3m)    33s ago    3m    16.0.0-1734-gc1cc5045b00  quay.io/ceph-ci/ceph:c1cc5045b00842201e98ed965e87b16c8b2acec8  06207838c6be  f82a0984e8cb
2020-05-20T13:08:09.188 INFO:teuthology.orchestra.run.smithi203.stdout:osd.2             smithi203  running (3m)    33s ago    3m    16.0.0-1734-gc1cc5045b00  quay.io/ceph-ci/ceph:c1cc5045b00842201e98ed965e87b16c8b2acec8  06207838c6be  885fb5dfd287
2020-05-20T13:08:09.189 INFO:teuthology.orchestra.run.smithi203.stdout:osd.3             smithi203  running (2m)    33s ago    2m    16.0.0-1734-gc1cc5045b00  quay.io/ceph-ci/ceph:c1cc5045b00842201e98ed965e87b16c8b2acec8  06207838c6be  4e6e5b008f2e
2020-05-20T13:08:09.189 INFO:teuthology.orchestra.run.smithi203.stdout:osd.4             smithi060  running (2m)    31s ago    2m    16.0.0-1734-gc1cc5045b00  quay.io/ceph-ci/ceph:c1cc5045b00842201e98ed965e87b16c8b2acec8  06207838c6be  f1714bd9a240
2020-05-20T13:08:09.189 INFO:teuthology.orchestra.run.smithi203.stdout:osd.5             smithi060  running (2m)    31s ago    2m    16.0.0-1734-gc1cc5045b00  quay.io/ceph-ci/ceph:c1cc5045b00842201e98ed965e87b16c8b2acec8  06207838c6be  e00f2801348c
2020-05-20T13:08:09.189 INFO:teuthology.orchestra.run.smithi203.stdout:osd.6             smithi060  running (2m)    31s ago    2m    16.0.0-1734-gc1cc5045b00  quay.io/ceph-ci/ceph:c1cc5045b00842201e98ed965e87b16c8b2acec8  06207838c6be  73b01fddb7dd
2020-05-20T13:08:09.189 INFO:teuthology.orchestra.run.smithi203.stdout:osd.7             smithi060  running (107s)  31s ago    110s  16.0.0-1734-gc1cc5045b00  quay.io/ceph-ci/ceph:c1cc5045b00842201e98ed965e87b16c8b2acec8  06207838c6be  cebb5c6bf000
2020-05-20T13:08:09.190 INFO:teuthology.orchestra.run.smithi203.stdout:prometheus.a      smithi060  running (42s)   31s ago    88s   2.18.1                    docker.io/prom/prometheus:latest                               de242295e225  34d837c4f530
2020-05-20T13:08:09.190 INFO:teuthology.orchestra.run.smithi203.stdout:rgw.realm.zone.a  smithi203  unknown         33s ago    102s  <unknown>                 quay.io/ceph-ci/ceph:c1cc5045b00842201e98ed965e87b16c8b2acec8  <unknown>     <unknown>

Still, the job succeeded:

http://pulpito.ceph.com/swagner-2020-05-20_12:38:40-rados:cephadm-wip-swagner3-testing-2020-05-20-1009-distro-basic-smithi/5072816/
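For illustration only, here is a minimal sketch of the kind of check the smoke test could run over the table above. It assumes the input is just the table rows as they appear in the teuthology log (optional `...stdout:` prefix, then NAME, HOST, STATUS columns). The function name `daemons_not_running` is hypothetical and not part of teuthology or cephadm.

```python
import re

# Assumption: table rows may carry the teuthology log prefix seen above,
# which ends in ".stdout:". Rows without the prefix are accepted as well.
_PREFIX = re.compile(r"^.*?\.stdout:")


def daemons_not_running(table_lines):
    """Return (name, host, status) for each daemon whose STATUS is not 'running'.

    Expects only the rows of the daemon table, not arbitrary log lines.
    """
    bad = []
    for line in table_lines:
        row = _PREFIX.sub("", line, count=1).split()
        if len(row) < 3 or row[0] == "NAME":
            continue  # skip blank lines and the header row
        name, host, status = row[0], row[1], row[2]
        # "running (47s)" splits into "running" and "(47s)", so the third
        # token is the bare status word.
        if status != "running":
            bad.append((name, host, status))
    return bad
```

A test step could fail the job whenever this list is non-empty. For the table above it would report only `rgw.realm.zone.a`.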

History

#1 Updated by Sebastian Wagner 8 months ago

  • Description updated (diff)

#2 Updated by Sebastian Wagner 7 months ago

I think we should solve this by raising a HEALTH_WARN if a daemon enters the error state.
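As a rough sketch of that idea, and not the actual mgr/cephadm code: the function below builds a health-check dictionary from a list of daemon states. The check name `CEPHADM_FAILED_DAEMON`, the message texts and the helper name are illustrative assumptions. In a mgr module, a dictionary of this general shape (check name mapped to `severity`, `summary` and `detail`) is what would be passed to `MgrModule.set_health_checks`.

```python
def failed_daemon_health_check(daemons):
    """Build a health-check dict for daemons in the 'error' state.

    daemons: iterable of (name, host, status) tuples.
    Returns an empty dict when nothing has failed, which would clear the check.
    """
    failed = [(name, host) for name, host, status in daemons if status == "error"]
    if not failed:
        return {}
    return {
        # Hypothetical check name, chosen for this example only.
        "CEPHADM_FAILED_DAEMON": {
            "severity": "warning",
            "summary": f"{len(failed)} failed cephadm daemon(s)",
            "detail": [
                f"daemon {name} on {host} is in error state"
                for name, host in failed
            ],
        }
    }
```

With a warning like this in place, a test that already fails on unexpected health warnings would catch the broken daemon without needing to parse `ceph orch ps`.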

#3 Updated by Sebastian Wagner 6 months ago

  • Category changed from cephadm to teuthology

#4 Updated by Nathan Cutler 3 months ago

  • Category changed from teuthology to cephadm

The teuthology project is for tracking issues in teuthology itself, not for tracking missing test cases.

According to https://docs.ceph.com/en/latest/releases/general/#active-stable-releases the fix is to raise a health warning. That would require a code change in mgr/cephadm itself.

#5 Updated by Sebastian Wagner 10 days ago

  • Priority changed from High to Urgent

latest output looks like

2021-01-15T10:05:26.839 INFO:teuthology.orchestra.run.smithi035.stdout:NAME             HOST       STATUS          REFRESHED  AGE   VERSION                IMAGE NAME                                                          IMAGE ID      CONTAINER ID
2021-01-15T10:05:26.839 INFO:teuthology.orchestra.run.smithi035.stdout:alertmanager.a   smithi035  running (38s)   28s ago    65s   0.20.0                 docker.io/prom/alertmanager:v0.20.0                                 0881eb8f169f  f30efcc1075f
2021-01-15T10:05:26.840 INFO:teuthology.orchestra.run.smithi035.stdout:grafana.a        smithi125  running (49s)   28s ago    48s   6.7.4                  docker.io/ceph/ceph-grafana:6.7.4                                   80728b29ad3f  6b54958a9e18
2021-01-15T10:05:26.840 INFO:teuthology.orchestra.run.smithi035.stdout:iscsi.iscsi.a    smithi125  running (91s)   28s ago    91s   3.4                    quay.ceph.io/ceph-ci/ceph:4702eacd94ba284fdee7d4d250d9c47e84979954  d661bae1eb43  99f4b7c46a82
2021-01-15T10:05:26.840 INFO:teuthology.orchestra.run.smithi035.stdout:mgr.x            smithi125  running (4m)    28s ago    4m    16.0.0-9028-g4702eacd  quay.ceph.io/ceph-ci/ceph:4702eacd94ba284fdee7d4d250d9c47e84979954  d661bae1eb43  334d8d30ff86
2021-01-15T10:05:26.840 INFO:teuthology.orchestra.run.smithi035.stdout:mgr.y            smithi035  running (6m)    28s ago    6m    16.0.0-9028-g4702eacd  quay.ceph.io/ceph-ci/ceph:4702eacd94ba284fdee7d4d250d9c47e84979954  d661bae1eb43  e28da1cdab3c
2021-01-15T10:05:26.841 INFO:teuthology.orchestra.run.smithi035.stdout:mon.a            smithi035  running (6m)    28s ago    6m    16.0.0-9028-g4702eacd  quay.ceph.io/ceph-ci/ceph:4702eacd94ba284fdee7d4d250d9c47e84979954  d661bae1eb43  13807f3848a4
2021-01-15T10:05:26.841 INFO:teuthology.orchestra.run.smithi035.stdout:mon.b            smithi125  running (4m)    28s ago    4m    16.0.0-9028-g4702eacd  quay.ceph.io/ceph-ci/ceph:4702eacd94ba284fdee7d4d250d9c47e84979954  d661bae1eb43  93d11ac3423c
2021-01-15T10:05:26.841 INFO:teuthology.orchestra.run.smithi035.stdout:mon.c            smithi035  running (4m)    28s ago    4m    16.0.0-9028-g4702eacd  quay.ceph.io/ceph-ci/ceph:4702eacd94ba284fdee7d4d250d9c47e84979954  d661bae1eb43  c79366aaf0ba
2021-01-15T10:05:26.841 INFO:teuthology.orchestra.run.smithi035.stdout:node-exporter.a  smithi035  stopped         28s ago    -     <unknown>              <unknown>                                                           <unknown>     <unknown>
2021-01-15T10:05:26.841 INFO:teuthology.orchestra.run.smithi035.stdout:node-exporter.b  smithi125  stopped         28s ago    -     <unknown>              <unknown>                                                           <unknown>     <unknown>
2021-01-15T10:05:26.842 INFO:teuthology.orchestra.run.smithi035.stdout:osd.0            smithi035  running (3m)    28s ago    3m    16.0.0-9028-g4702eacd  quay.ceph.io/ceph-ci/ceph:4702eacd94ba284fdee7d4d250d9c47e84979954  d661bae1eb43  9820ee2b551e
2021-01-15T10:05:26.842 INFO:teuthology.orchestra.run.smithi035.stdout:osd.1            smithi035  running (3m)    28s ago    3m    16.0.0-9028-g4702eacd  quay.ceph.io/ceph-ci/ceph:4702eacd94ba284fdee7d4d250d9c47e84979954  d661bae1eb43  705e4092a285
2021-01-15T10:05:26.842 INFO:teuthology.orchestra.run.smithi035.stdout:osd.2            smithi035  running (3m)    28s ago    3m    16.0.0-9028-g4702eacd  quay.ceph.io/ceph-ci/ceph:4702eacd94ba284fdee7d4d250d9c47e84979954  d661bae1eb43  bfea90c8a61f
2021-01-15T10:05:26.842 INFO:teuthology.orchestra.run.smithi035.stdout:osd.3            smithi035  running (2m)    28s ago    2m    16.0.0-9028-g4702eacd  quay.ceph.io/ceph-ci/ceph:4702eacd94ba284fdee7d4d250d9c47e84979954  d661bae1eb43  ca2162b1ddf4
2021-01-15T10:05:26.842 INFO:teuthology.orchestra.run.smithi035.stdout:osd.4            smithi125  running (2m)    28s ago    2m    16.0.0-9028-g4702eacd  quay.ceph.io/ceph-ci/ceph:4702eacd94ba284fdee7d4d250d9c47e84979954  d661bae1eb43  bb847cfc58ca
2021-01-15T10:05:26.843 INFO:teuthology.orchestra.run.smithi035.stdout:osd.5            smithi125  running (2m)    28s ago    2m    16.0.0-9028-g4702eacd  quay.ceph.io/ceph-ci/ceph:4702eacd94ba284fdee7d4d250d9c47e84979954  d661bae1eb43  c64afead358d
2021-01-15T10:05:26.843 INFO:teuthology.orchestra.run.smithi035.stdout:osd.6            smithi125  running (2m)    28s ago    2m    16.0.0-9028-g4702eacd  quay.ceph.io/ceph-ci/ceph:4702eacd94ba284fdee7d4d250d9c47e84979954  d661bae1eb43  9efad6b8e4ac
2021-01-15T10:05:26.843 INFO:teuthology.orchestra.run.smithi035.stdout:osd.7            smithi125  running (112s)  28s ago    112s  16.0.0-9028-g4702eacd  quay.ceph.io/ceph-ci/ceph:4702eacd94ba284fdee7d4d250d9c47e84979954  d661bae1eb43  9ac86af9eaa1
2021-01-15T10:05:26.843 INFO:teuthology.orchestra.run.smithi035.stdout:prometheus.a     smithi125  running (32s)   28s ago    77s   2.18.1                 docker.io/prom/prometheus:v2.18.1                                   de242295e225  812a3710e9e5
2021-01-15T10:05:27.403 DEBUG:teuthology.orchestra.run.smithi035:> sudo /home/ubuntu/cephtest/cephadm --image quay.ceph.io/ceph-ci/ceph:4702eacd94ba284fdee7d4d250d9c47e84979954 shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid 34bcb16a-5718-11eb-8f90-001a4aab830c -- bash -c 'ceph orch ls'
2021-01-15T10:05:29.215 INFO:teuthology.orchestra.run.smithi035.stdout:NAME            RUNNING  REFRESHED  AGE  PLACEMENT                        IMAGE NAME                                                          IMAGE ID
2021-01-15T10:05:29.215 INFO:teuthology.orchestra.run.smithi035.stdout:alertmanager        1/1  30s ago    84s  smithi035=a;count:1              docker.io/prom/alertmanager:v0.20.0                                 0881eb8f169f
2021-01-15T10:05:29.215 INFO:teuthology.orchestra.run.smithi035.stdout:grafana             1/1  30s ago    80s  smithi125=a;count:1              docker.io/ceph/ceph-grafana:6.7.4                                   80728b29ad3f
2021-01-15T10:05:29.215 INFO:teuthology.orchestra.run.smithi035.stdout:iscsi.iscsi         1/1  30s ago    96s  smithi125=iscsi.a;count:1        quay.ceph.io/ceph-ci/ceph:4702eacd94ba284fdee7d4d250d9c47e84979954  d661bae1eb43
2021-01-15T10:05:29.216 INFO:teuthology.orchestra.run.smithi035.stdout:mgr                 2/3  30s ago    4m   smithi035=y;smithi125=x;count:3  quay.ceph.io/ceph-ci/ceph:4702eacd94ba284fdee7d4d250d9c47e84979954  d661bae1eb43
2021-01-15T10:05:29.216 INFO:teuthology.orchestra.run.smithi035.stdout:mon                 3/0  30s ago    -    <unmanaged>                      quay.ceph.io/ceph-ci/ceph:4702eacd94ba284fdee7d4d250d9c47e84979954  d661bae1eb43
2021-01-15T10:05:29.216 INFO:teuthology.orchestra.run.smithi035.stdout:node-exporter       0/2  30s ago    90s  smithi035=a;smithi125=b;count:2  <unknown>                                                           <unknown>
2021-01-15T10:05:29.216 INFO:teuthology.orchestra.run.smithi035.stdout:osd.None            8/0  30s ago    -    <unmanaged>                      quay.ceph.io/ceph-ci/ceph:4702eacd94ba284fdee7d4d250d9c47e84979954  d661bae1eb43
2021-01-15T10:05:29.217 INFO:teuthology.orchestra.run.smithi035.stdout:prometheus          1/1  30s ago    93s  smithi125=a;count:1              docker.io/prom/prometheus:v2.18.1                                   de242295e225
2021-01-15T10:05:29.217 INFO:teuthology.orchestra.run.smithi035.stdout:rgw.realm.zone      0/1  -          -    smithi035=realm.zone.a;count:1   <unknown>                                                           <unknown>

which is super broken.

https://pulpito.ceph.com/swagner-2021-01-15_09:42:49-rados:cephadm-wip-swagner-testing-2021-01-14-1551-distro-basic-smithi/5788840/
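To make the problem in the `ceph orch ls` output concrete, here is a small sketch that flags services whose RUNNING column shows fewer daemons than expected. It assumes the `running/expected` format seen in the log above and treats `3/0`-style rows (unmanaged services) as fine. The function name `understaffed_services` is hypothetical.

```python
import re

# Assumption: rows may carry the teuthology log prefix ending in ".stdout:".
_PREFIX = re.compile(r"^.*?\.stdout:")
_RATIO = re.compile(r"(\d+)/(\d+)")


def understaffed_services(log_lines):
    """Return (service, running, expected) where running < expected."""
    short = []
    for line in log_lines:
        row = _PREFIX.sub("", line, count=1).split()
        if len(row) < 2 or row[0] == "NAME":
            continue  # skip blank lines and the header row
        match = _RATIO.fullmatch(row[1])
        if match is None:
            continue  # not a service row, e.g. an interleaved DEBUG line
        running, expected = int(match.group(1)), int(match.group(2))
        if running < expected:
            short.append((row[0], running, expected))
    return short
```

Applied to the listing above, this would flag `mgr` (2/3), `node-exporter` (0/2) and `rgw.realm.zone` (0/1), while leaving `mon` (3/0) and `osd.None` (8/0) alone.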

#6 Updated by Sage Weil 5 days ago

  • Status changed from New to Fix Under Review
  • Backport set to pacific,octopus
  • Pull request ID set to 38978
