Bug #44934

cephadm RGW: scary remove-deploy loop

Added by Sebastian Wagner 7 months ago. Updated 6 months ago.

Status: Resolved
Priority: Urgent
Category: cephadm
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

[ceph: root@ceph-001 /]# ceph log last cephadm
2020-04-03T15:10:09.084085+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17284 : cephadm [INF] Deploying daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.pnrlnt on ceph-001
2020-04-03T15:10:14.405833+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17288 : cephadm [INF] Removing orphan daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.pnrlnt...
2020-04-03T15:10:14.406263+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17289 : cephadm [INF] Removing daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.pnrlnt from ceph-001
2020-04-03T15:10:17.515543+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17291 : cephadm [INF] Deploying daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.alszjq on ceph-001
2020-04-03T15:10:22.387848+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17295 : cephadm [INF] Removing orphan daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.alszjq...
2020-04-03T15:10:22.388287+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17296 : cephadm [INF] Removing daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.alszjq from ceph-001
2020-04-03T15:10:25.758392+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17298 : cephadm [INF] Deploying daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.skcssh on ceph-001
2020-04-03T15:10:31.296152+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17302 : cephadm [INF] Removing orphan daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.skcssh...
2020-04-03T15:10:31.296668+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17303 : cephadm [INF] Removing daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.skcssh from ceph-001
2020-04-03T15:10:34.460408+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17306 : cephadm [INF] Deploying daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.vdavwx on ceph-001
2020-04-03T15:10:39.835795+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17309 : cephadm [INF] Removing orphan daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.vdavwx...
2020-04-03T15:10:39.836176+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17310 : cephadm [INF] Removing daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.vdavwx from ceph-001
2020-04-03T15:10:43.532714+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17313 : cephadm [INF] Deploying daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.bnvnby on ceph-001
2020-04-03T15:10:48.353039+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17317 : cephadm [INF] Removing orphan daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.bnvnby...
2020-04-03T15:10:48.353354+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17318 : cephadm [INF] Removing daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.bnvnby from ceph-001
2020-04-03T15:10:51.440358+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17320 : cephadm [INF] Deploying daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.wggojs on ceph-001
2020-04-03T15:10:56.439351+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17324 : cephadm [INF] Removing orphan daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.wggojs...
2020-04-03T15:10:56.439577+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17325 : cephadm [INF] Removing daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.wggojs from ceph-001
2020-04-03T15:10:59.756791+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17327 : cephadm [INF] Deploying daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.oazpdw on ceph-001
2020-04-03T15:11:05.694262+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17331 : cephadm [INF] Removing orphan daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.oazpdw...
2020-04-03T15:11:05.694781+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17332 : cephadm [INF] Removing daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.oazpdw from ceph-001
[ceph: root@ceph-001 /]# ceph orch ls --format json

[
    {
        "container_image_id": "0881eb8f169f5556a292b4e2c01d683172b12830a62a9225a98a8e206bb734f0",
        "container_image_name": "docker.io/prom/alertmanager:latest",
        "service_name": "alertmanager",
        "size": 1,
        "running": 1,
        "spec": {
            "placement": {
                "count": 1
            },
            "service_type": "alertmanager" 
        },
        "last_refresh": "2020-04-03T15:12:49.199779",
        "created": "2020-04-02T19:22:48.831535" 
    },
    {
        "container_image_id": "204a01f9b0b6710dd0c0af7f37ce7139c47ff0f0105d778d7104c69282dfbbf1",
        "container_image_name": "docker.io/ceph/ceph:v15",
        "service_name": "crash",
        "size": 1,
        "running": 1,
        "spec": {
            "placement": {
                "host_pattern": "*" 
            },
            "service_type": "crash" 
        },
        "last_refresh": "2020-04-03T15:12:49.199859",
        "created": "2020-04-02T19:22:39.377773" 
    },
    {
        "container_image_id": "87a51ecf0b1c9a7b187b21c1b071425dafea0d765a96d5bc371c791169b3d7f4",
        "container_image_name": "docker.io/ceph/ceph-grafana:latest",
        "service_name": "grafana",
        "size": 1,
        "running": 1,
        "spec": {
            "placement": {
                "count": 1
            },
            "service_type": "grafana" 
        },
        "last_refresh": "2020-04-03T15:12:49.199939",
        "created": "2020-04-02T19:22:47.226348" 
    },
    {
        "container_image_id": "204a01f9b0b6710dd0c0af7f37ce7139c47ff0f0105d778d7104c69282dfbbf1",
        "container_image_name": "docker.io/ceph/ceph:v15",
        "service_name": "mgr",
        "size": 2,
        "running": 1,
        "spec": {
            "placement": {
                "count": 2
            },
            "service_type": "mgr" 
        },
        "last_refresh": "2020-04-03T15:12:49.199696",
        "created": "2020-04-02T19:22:38.502266" 
    },
    {
        "container_image_id": "204a01f9b0b6710dd0c0af7f37ce7139c47ff0f0105d778d7104c69282dfbbf1",
        "container_image_name": "docker.io/ceph/ceph:v15",
        "service_name": "mon",
        "size": 5,
        "running": 1,
        "spec": {
            "placement": {
                "count": 5
            },
            "service_type": "mon" 
        },
        "last_refresh": "2020-04-03T15:12:49.199561",
        "created": "2020-04-02T19:22:37.710117" 
    },
    {
        "container_image_id": "e5a616e4b9cf68dfcad7782b78e118be4310022e874d52da85c55923fb615f87",
        "container_image_name": "docker.io/prom/node-exporter:latest",
        "service_name": "node-exporter",
        "size": 1,
        "running": 1,
        "spec": {
            "placement": {
                "host_pattern": "*" 
            },
            "service_type": "node-exporter" 
        },
        "last_refresh": "2020-04-03T15:12:49.200019",
        "created": "2020-04-02T19:22:47.999166" 
    },
    {
        "container_image_id": "358a0d2395fe711bb8258e8fb4b2d7865c0a9a6463969bcd1452ee8869ea6653",
        "container_image_name": "docker.io/prom/prometheus:latest",
        "service_name": "prometheus",
        "size": 1,
        "running": 1,
        "spec": {
            "placement": {
                "count": 1
            },
            "service_type": "prometheus" 
        },
        "last_refresh": "2020-04-03T15:12:49.200098",
        "created": "2020-04-02T19:22:46.398827" 
    },
    {
        "service_name": "rgw.default-rgw-realm.eu-central-1.1",
        "size": 1,
        "running": 0,
        "spec": {
            "placement": {
                "hosts": [
                    {
                        "hostname": "ceph-001",
                        "network": "",
                        "name": "" 
                    }
                ]
            },
            "service_type": "rgw",
            "service_id": "default-rgw-realm.eu-central-1.1",
            "rgw_realm": "default-rgw-realm",
            "rgw_zone": "eu-central-1",
            "subcluster": "1" 
        }
    }
]
[ceph: root@ceph-001 /]# ceph orch ps
NAME                    HOST      STATUS        REFRESHED  AGE  VERSION  IMAGE NAME                           IMAGE ID      CONTAINER ID
alertmanager.ceph-001   ceph-001  running (7h)  0s ago     19h  0.20.0   docker.io/prom/alertmanager:latest   0881eb8f169f  d94d7969094d
crash.ceph-001          ceph-001  running (7h)  0s ago     19h  15.2.0   docker.io/ceph/ceph:v15              204a01f9b0b6  c4b036202241
grafana.ceph-001        ceph-001  running (7h)  0s ago     19h  6.6.2    docker.io/ceph/ceph-grafana:latest   87a51ecf0b1c  5b7b94b48f31
mgr.ceph-001.gkjwqp     ceph-001  running (7h)  0s ago     19h  15.2.0   docker.io/ceph/ceph:v15              204a01f9b0b6  9ca007280456
mon.ceph-001            ceph-001  running (7h)  0s ago     19h  15.2.0   docker.io/ceph/ceph:v15              204a01f9b0b6  3d1ba9a2b697
node-exporter.ceph-001  ceph-001  running (7h)  0s ago     19h  0.18.1   docker.io/prom/node-exporter:latest  e5a616e4b9cf  36d026c68ba1
osd.0                   ceph-001  running (7h)  0s ago     18h  15.2.0   docker.io/ceph/ceph:v15              204a01f9b0b6  faf76193cbfe
osd.1                   ceph-001  running (7h)  0s ago     18h  15.2.0   docker.io/ceph/ceph:v15              204a01f9b0b6  f82505bae0f1
prometheus.ceph-001     ceph-001  running (7h)  0s ago     19h  2.17.1   docker.io/prom/prometheus:latest     358a0d2395fe  2708d84cd484
[ceph: root@ceph-001 /]# ceph orch host ls
HOST      ADDR      LABELS  STATUS
ceph-001  ceph-001

The cephadm log is full of entries like:

"4/3/20 4:51:09 PM[INF]Removing orphan daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.awxkio..." 
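To gauge how often the loop fires, the output of `ceph log last cephadm` can be scanned for daemons that were deployed and then removed as orphans. The following is a minimal sketch, not part of cephadm: the function name `find_churn` and the regular expression are assumptions made for illustration, and the sample lines are copied from the log excerpt above.

```python
import re

# Matches the two message shapes seen in the log excerpt:
#   "[INF] Deploying daemon <name> on <host>"
#   "[INF] Removing orphan daemon <name>..."
EVENT_RE = re.compile(
    r"\[INF\]\s*(Deploying daemon|Removing orphan daemon)\s+([\w.-]+)"
)


def find_churn(lines):
    """Return names of daemons that were deployed and later removed as orphans."""
    deployed = set()
    churned = []
    for line in lines:
        match = EVENT_RE.search(line)
        if match is None:
            continue
        action = match.group(1)
        # The "Removing orphan daemon" message ends in "...", so strip trailing dots.
        name = match.group(2).rstrip(".")
        if action == "Deploying daemon":
            deployed.add(name)
        elif name in deployed:
            churned.append(name)
    return churned


# Sample lines taken from the log excerpt in this report.
SAMPLE_LOG = [
    "2020-04-03T15:10:09.084085+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17284 : cephadm [INF] Deploying daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.pnrlnt on ceph-001",
    "2020-04-03T15:10:14.405833+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17288 : cephadm [INF] Removing orphan daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.pnrlnt...",
    "2020-04-03T15:10:14.406263+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17289 : cephadm [INF] Removing daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.pnrlnt from ceph-001",
    "2020-04-03T15:10:17.515543+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17291 : cephadm [INF] Deploying daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.alszjq on ceph-001",
]

if __name__ == "__main__":
    for name in find_churn(SAMPLE_LOG):
        print(name)
```

Each name it reports is one deploy-then-remove cycle. The random suffix changes on every cycle, so counting the names counts the loop iterations.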

History

#1 Updated by Hurz Hurz 7 months ago

Caught the intermittent container (at the very bottom):

[ceph: root@ceph-001 /]# ceph orch ps --format json

[
    {
        "hostname": "ceph-001",
        "container_id": "d94d7969094d",
        "container_image_id": "0881eb8f169f5556a292b4e2c01d683172b12830a62a9225a98a8e206bb734f0",
        "container_image_name": "docker.io/prom/alertmanager:latest",
        "daemon_id": "ceph-001",
        "daemon_type": "alertmanager",
        "version": "0.20.0",
        "status": 1,
        "status_desc": "running",
        "last_refresh": "2020-04-03T15:31:48.725856",
        "created": "2020-04-02T19:23:08.829543",
        "started": "2020-04-03T07:29:16.932838" 
    },
    {
        "hostname": "ceph-001",
        "container_id": "c4b036202241",
        "container_image_id": "204a01f9b0b6710dd0c0af7f37ce7139c47ff0f0105d778d7104c69282dfbbf1",
        "container_image_name": "docker.io/ceph/ceph:v15",
        "daemon_id": "ceph-001",
        "daemon_type": "crash",
        "version": "15.2.0",
        "status": 1,
        "status_desc": "running",
        "last_refresh": "2020-04-03T15:31:48.725903",
        "created": "2020-04-02T19:23:11.390694",
        "started": "2020-04-03T07:29:16.910897" 
    },
    {
        "hostname": "ceph-001",
        "container_id": "5b7b94b48f31",
        "container_image_id": "87a51ecf0b1c9a7b187b21c1b071425dafea0d765a96d5bc371c791169b3d7f4",
        "container_image_name": "docker.io/ceph/ceph-grafana:latest",
        "daemon_id": "ceph-001",
        "daemon_type": "grafana",
        "version": "6.6.2",
        "status": 1,
        "status_desc": "running",
        "last_refresh": "2020-04-03T15:31:48.725950",
        "created": "2020-04-02T19:23:52.025088",
        "started": "2020-04-03T07:29:16.847972" 
    },
    {
        "hostname": "ceph-001",
        "container_id": "9ca007280456",
        "container_image_id": "204a01f9b0b6710dd0c0af7f37ce7139c47ff0f0105d778d7104c69282dfbbf1",
        "container_image_name": "docker.io/ceph/ceph:v15",
        "daemon_id": "ceph-001.gkjwqp",
        "daemon_type": "mgr",
        "version": "15.2.0",
        "status": 1,
        "status_desc": "running",
        "last_refresh": "2020-04-03T15:31:48.725807",
        "created": "2020-04-02T19:22:18.648584",
        "started": "2020-04-03T07:29:16.856153" 
    },
    {
        "hostname": "ceph-001",
        "container_id": "3d1ba9a2b697",
        "container_image_id": "204a01f9b0b6710dd0c0af7f37ce7139c47ff0f0105d778d7104c69282dfbbf1",
        "container_image_name": "docker.io/ceph/ceph:v15",
        "daemon_id": "ceph-001",
        "daemon_type": "mon",
        "version": "15.2.0",
        "status": 1,
        "status_desc": "running",
        "last_refresh": "2020-04-03T15:31:48.725715",
        "created": "2020-04-02T19:22:13.863300",
        "started": "2020-04-03T07:29:17.206024" 
    },
    {
        "hostname": "ceph-001",
        "container_id": "36d026c68ba1",
        "container_image_id": "e5a616e4b9cf68dfcad7782b78e118be4310022e874d52da85c55923fb615f87",
        "container_image_name": "docker.io/prom/node-exporter:latest",
        "daemon_id": "ceph-001",
        "daemon_type": "node-exporter",
        "version": "0.18.1",
        "status": 1,
        "status_desc": "running",
        "last_refresh": "2020-04-03T15:31:48.725996",
        "created": "2020-04-02T19:23:53.880197",
        "started": "2020-04-03T07:29:16.880044" 
    },
    {
        "hostname": "ceph-001",
        "container_id": "faf76193cbfe",
        "container_image_id": "204a01f9b0b6710dd0c0af7f37ce7139c47ff0f0105d778d7104c69282dfbbf1",
        "container_image_name": "docker.io/ceph/ceph:v15",
        "daemon_id": "0",
        "daemon_type": "osd",
        "version": "15.2.0",
        "status": 1,
        "status_desc": "running",
        "last_refresh": "2020-04-03T15:31:48.726088",
        "created": "2020-04-02T20:35:02.991435",
        "started": "2020-04-03T07:29:19.373956" 
    },
    {
        "hostname": "ceph-001",
        "container_id": "f82505bae0f1",
        "container_image_id": "204a01f9b0b6710dd0c0af7f37ce7139c47ff0f0105d778d7104c69282dfbbf1",
        "container_image_name": "docker.io/ceph/ceph:v15",
        "daemon_id": "1",
        "daemon_type": "osd",
        "version": "15.2.0",
        "status": 1,
        "status_desc": "running",
        "last_refresh": "2020-04-03T15:31:48.726134",
        "created": "2020-04-02T20:35:17.142272",
        "started": "2020-04-03T07:29:19.374002" 
    },
    {
        "hostname": "ceph-001",
        "container_id": "2708d84cd484",
        "container_image_id": "358a0d2395fe711bb8258e8fb4b2d7865c0a9a6463969bcd1452ee8869ea6653",
        "container_image_name": "docker.io/prom/prometheus:latest",
        "daemon_id": "ceph-001",
        "daemon_type": "prometheus",
        "version": "2.17.1",
        "status": 1,
        "status_desc": "running",
        "last_refresh": "2020-04-03T15:31:48.726042",
        "created": "2020-04-02T19:24:10.281163",
        "started": "2020-04-03T07:29:16.926292" 
    },
    {
        "hostname": "ceph-001",
        "daemon_id": "default-rgw-realm.eu-central-1.1.ceph-001.ytywjo",
        "daemon_type": "rgw",
        "status": 1,
        "status_desc": "starting" 
    }
]
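The short-lived RGW entry differs from the others in that it has no `container_id` yet. Below is a small sketch for picking such entries out of `ceph orch ps --format json`. The helper name `daemons_without_container` is hypothetical, and the sample input is a trimmed copy of two entries from the output above.

```python
import json


def daemons_without_container(ps_json):
    """Return '<daemon_type>.<daemon_id>' for entries that lack a container_id."""
    daemons = json.loads(ps_json)
    return [
        f"{d['daemon_type']}.{d['daemon_id']}"
        for d in daemons
        if "container_id" not in d
    ]


# Trimmed sample based on the JSON output in this comment.
SAMPLE_PS = """
[
    {
        "hostname": "ceph-001",
        "container_id": "2708d84cd484",
        "daemon_id": "ceph-001",
        "daemon_type": "prometheus",
        "status": 1,
        "status_desc": "running"
    },
    {
        "hostname": "ceph-001",
        "daemon_id": "default-rgw-realm.eu-central-1.1.ceph-001.ytywjo",
        "daemon_type": "rgw",
        "status": 1,
        "status_desc": "starting"
    }
]
"""

if __name__ == "__main__":
    for name in daemons_without_container(SAMPLE_PS):
        print(name)
```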

#2 Updated by Sebastian Wagner 7 months ago

  • Status changed from New to Fix Under Review
  • Assignee set to Sebastian Wagner
  • Target version set to v15.2.1
  • Backport set to octopus
  • Pull request ID set to 34415

#3 Updated by Hurz Hurz 7 months ago

The GitHub PR states "turns out users put dot into their RGW service names", but all the dots visible in the log stem from a Ceph naming convention. I only used characters from [a-zA-Z0-9-] in the names (for realm, zonegroup, and zone).
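To illustrate where the dots come from and why they can confuse name matching, here is a hypothetical sketch. It is not taken from cephadm's source. The three helper functions are invented for illustration, and the composition rule (realm, zone, and subcluster joined by dots) is only inferred from the `service_id`, `rgw_realm`, `rgw_zone` and `subcluster` values in the `ceph orch ls` output above.

```python
def compose_service_id(realm, zone, subcluster=None):
    """Join realm, zone and optional subcluster with dots (inferred convention)."""
    parts = [realm, zone]
    if subcluster:
        parts.append(subcluster)
    return ".".join(parts)


def naive_service_id(daemon_id):
    """Hypothetical parser: assume the service id ends at the first dot."""
    return daemon_id.split(".", 1)[0]


def suffix_aware_service_id(daemon_id, hostname):
    """Hypothetical parser: strip '.<hostname>.<random>' from the right instead."""
    head, _random_suffix = daemon_id.rsplit(".", 1)
    host_suffix = "." + hostname
    if head.endswith(host_suffix):
        return head[: -len(host_suffix)]
    return head


if __name__ == "__main__":
    # Values as they appear in this report.
    service_id = compose_service_id("default-rgw-realm", "eu-central-1", "1")
    daemon_id = "default-rgw-realm.eu-central-1.1.ceph-001.ytywjo"

    print(service_id)
    # The naive parser disagrees with the service id, so a daemon matched
    # this way would look like it belongs to no known service.
    print(naive_service_id(daemon_id) == service_id)
    print(suffix_aware_service_id(daemon_id, "ceph-001") == service_id)
```

Under these assumptions, names built only from [a-zA-Z0-9-] still give a dotted service id, because the dots are added when the parts are joined.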

#4 Updated by Sebastian Wagner 6 months ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport deleted (octopus)

#5 Updated by Sebastian Wagner 6 months ago

  • Status changed from Pending Backport to Resolved

#6 Updated by Sebastian Wagner 6 months ago

  • Target version changed from v15.2.1 to v15.2.2
