Bug #44934 (closed)
cephadm RGW: scary remove-deploy loop
Status: Resolved
Priority: Urgent
Assignee:
Category: cephadm
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 1 - critical
Reviewed:
Description
[ceph: root@ceph-001 /]# ceph log last cephadm
2020-04-03T15:10:09.084085+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17284 : cephadm [INF] Deploying daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.pnrlnt on ceph-001
2020-04-03T15:10:14.405833+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17288 : cephadm [INF] Removing orphan daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.pnrlnt...
2020-04-03T15:10:14.406263+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17289 : cephadm [INF] Removing daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.pnrlnt from ceph-001
2020-04-03T15:10:17.515543+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17291 : cephadm [INF] Deploying daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.alszjq on ceph-001
2020-04-03T15:10:22.387848+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17295 : cephadm [INF] Removing orphan daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.alszjq...
2020-04-03T15:10:22.388287+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17296 : cephadm [INF] Removing daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.alszjq from ceph-001
2020-04-03T15:10:25.758392+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17298 : cephadm [INF] Deploying daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.skcssh on ceph-001
2020-04-03T15:10:31.296152+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17302 : cephadm [INF] Removing orphan daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.skcssh...
2020-04-03T15:10:31.296668+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17303 : cephadm [INF] Removing daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.skcssh from ceph-001
2020-04-03T15:10:34.460408+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17306 : cephadm [INF] Deploying daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.vdavwx on ceph-001
2020-04-03T15:10:39.835795+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17309 : cephadm [INF] Removing orphan daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.vdavwx...
2020-04-03T15:10:39.836176+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17310 : cephadm [INF] Removing daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.vdavwx from ceph-001
2020-04-03T15:10:43.532714+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17313 : cephadm [INF] Deploying daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.bnvnby on ceph-001
2020-04-03T15:10:48.353039+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17317 : cephadm [INF] Removing orphan daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.bnvnby...
2020-04-03T15:10:48.353354+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17318 : cephadm [INF] Removing daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.bnvnby from ceph-001
2020-04-03T15:10:51.440358+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17320 : cephadm [INF] Deploying daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.wggojs on ceph-001
2020-04-03T15:10:56.439351+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17324 : cephadm [INF] Removing orphan daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.wggojs...
2020-04-03T15:10:56.439577+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17325 : cephadm [INF] Removing daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.wggojs from ceph-001
2020-04-03T15:10:59.756791+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17327 : cephadm [INF] Deploying daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.oazpdw on ceph-001
2020-04-03T15:11:05.694262+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17331 : cephadm [INF] Removing orphan daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.oazpdw...
2020-04-03T15:11:05.694781+0000 mgr.ceph-001.gkjwqp (mgr.24114) 17332 : cephadm [INF] Removing daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.oazpdw from ceph-001
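The pattern in this log (each RGW daemon is deployed, flagged as an orphan about five seconds later, removed, and replaced by a new daemon with a different random suffix) can be checked mechanically. The following Python sketch is not part of cephadm; it assumes only the two [INF] message formats shown above, and the function names are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# The two cephadm [INF] message formats visible in the log above.
DEPLOY_RE = re.compile(r"Deploying daemon (\S+) on (\S+)")
# The daemon name is followed by a literal "...", so match lazily up to it.
ORPHAN_RE = re.compile(r"Removing orphan daemon (\S+?)\.\.\.")


def find_deploy_remove_cycles(log_text: str) -> list[str]:
    """Return daemon names that were deployed and later removed as orphans."""
    deployed: set[str] = set()
    cycles: list[str] = []
    for line in log_text.splitlines():
        deploy = DEPLOY_RE.search(line)
        if deploy:
            deployed.add(deploy.group(1))
            continue
        orphan = ORPHAN_RE.search(line)
        if orphan and orphan.group(1) in deployed:
            cycles.append(orphan.group(1))
    return cycles


def cycles_per_service(cycles: list[str], host: str) -> Counter:
    """Group cycled daemon names by everything before ".<host>." (hypothetical)."""
    return Counter(name.rsplit(f".{host}.", 1)[0] for name in cycles)
```

Run over the excerpt above, this should report seven cycled daemons (pnrlnt, alszjq, skcssh, vdavwx, bnvnby, wggojs, oazpdw), all grouped under rgw.default-rgw-realm.eu-central-1.1.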
[ceph: root@ceph-001 /]# ceph orch ls --format json
[
  {
    "container_image_id": "0881eb8f169f5556a292b4e2c01d683172b12830a62a9225a98a8e206bb734f0",
    "container_image_name": "docker.io/prom/alertmanager:latest",
    "service_name": "alertmanager",
    "size": 1,
    "running": 1,
    "spec": {
      "placement": {
        "count": 1
      },
      "service_type": "alertmanager"
    },
    "last_refresh": "2020-04-03T15:12:49.199779",
    "created": "2020-04-02T19:22:48.831535"
  },
  {
    "container_image_id": "204a01f9b0b6710dd0c0af7f37ce7139c47ff0f0105d778d7104c69282dfbbf1",
    "container_image_name": "docker.io/ceph/ceph:v15",
    "service_name": "crash",
    "size": 1,
    "running": 1,
    "spec": {
      "placement": {
        "host_pattern": "*"
      },
      "service_type": "crash"
    },
    "last_refresh": "2020-04-03T15:12:49.199859",
    "created": "2020-04-02T19:22:39.377773"
  },
  {
    "container_image_id": "87a51ecf0b1c9a7b187b21c1b071425dafea0d765a96d5bc371c791169b3d7f4",
    "container_image_name": "docker.io/ceph/ceph-grafana:latest",
    "service_name": "grafana",
    "size": 1,
    "running": 1,
    "spec": {
      "placement": {
        "count": 1
      },
      "service_type": "grafana"
    },
    "last_refresh": "2020-04-03T15:12:49.199939",
    "created": "2020-04-02T19:22:47.226348"
  },
  {
    "container_image_id": "204a01f9b0b6710dd0c0af7f37ce7139c47ff0f0105d778d7104c69282dfbbf1",
    "container_image_name": "docker.io/ceph/ceph:v15",
    "service_name": "mgr",
    "size": 2,
    "running": 1,
    "spec": {
      "placement": {
        "count": 2
      },
      "service_type": "mgr"
    },
    "last_refresh": "2020-04-03T15:12:49.199696",
    "created": "2020-04-02T19:22:38.502266"
  },
  {
    "container_image_id": "204a01f9b0b6710dd0c0af7f37ce7139c47ff0f0105d778d7104c69282dfbbf1",
    "container_image_name": "docker.io/ceph/ceph:v15",
    "service_name": "mon",
    "size": 5,
    "running": 1,
    "spec": {
      "placement": {
        "count": 5
      },
      "service_type": "mon"
    },
    "last_refresh": "2020-04-03T15:12:49.199561",
    "created": "2020-04-02T19:22:37.710117"
  },
  {
    "container_image_id": "e5a616e4b9cf68dfcad7782b78e118be4310022e874d52da85c55923fb615f87",
    "container_image_name": "docker.io/prom/node-exporter:latest",
    "service_name": "node-exporter",
    "size": 1,
    "running": 1,
    "spec": {
      "placement": {
        "host_pattern": "*"
      },
      "service_type": "node-exporter"
    },
    "last_refresh": "2020-04-03T15:12:49.200019",
    "created": "2020-04-02T19:22:47.999166"
  },
  {
    "container_image_id": "358a0d2395fe711bb8258e8fb4b2d7865c0a9a6463969bcd1452ee8869ea6653",
    "container_image_name": "docker.io/prom/prometheus:latest",
    "service_name": "prometheus",
    "size": 1,
    "running": 1,
    "spec": {
      "placement": {
        "count": 1
      },
      "service_type": "prometheus"
    },
    "last_refresh": "2020-04-03T15:12:49.200098",
    "created": "2020-04-02T19:22:46.398827"
  },
  {
    "service_name": "rgw.default-rgw-realm.eu-central-1.1",
    "size": 1,
    "running": 0,
    "spec": {
      "placement": {
        "hosts": [
          {
            "hostname": "ceph-001",
            "network": "",
            "name": ""
          }
        ]
      },
      "service_type": "rgw",
      "service_id": "default-rgw-realm.eu-central-1.1",
      "rgw_realm": "default-rgw-realm",
      "rgw_zone": "eu-central-1",
      "subcluster": "1"
    }
  }
]
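In this output the RGW service reports "size": 1 but "running": 0. A small sketch for pulling such services out of the JSON; the helper name is hypothetical and it reads only the service_name, size and running keys visible above.

```python
import json
from typing import List, Tuple


def under_deployed(orch_ls_json: str) -> List[Tuple[str, int, int]]:
    """Return (service_name, running, size) for every service whose
    "running" count is below its "size" in `ceph orch ls --format json`."""
    services = json.loads(orch_ls_json)
    result = []
    for svc in services:
        running = svc.get("running", 0)
        size = svc.get("size", 0)
        if running < size:
            result.append((svc["service_name"], running, size))
    return result


# Trimmed sample taken from the output above.
SAMPLE = """[
  {"service_name": "mgr", "size": 2, "running": 1},
  {"service_name": "prometheus", "size": 1, "running": 1},
  {"service_name": "rgw.default-rgw-realm.eu-central-1.1", "size": 1, "running": 0}
]"""

print(under_deployed(SAMPLE))
# [('mgr', 1, 2), ('rgw.default-rgw-realm.eu-central-1.1', 0, 1)]
```

On this single-host cluster mgr (2 requested, 1 running) and mon (5 requested, 1 running) would also be listed, so only the RGW entry, with zero running daemons, points at the loop.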
[ceph: root@ceph-001 /]# ceph orch ps
NAME                    HOST      STATUS        REFRESHED  AGE  VERSION  IMAGE NAME                           IMAGE ID      CONTAINER ID
alertmanager.ceph-001   ceph-001  running (7h)  0s ago     19h  0.20.0   docker.io/prom/alertmanager:latest   0881eb8f169f  d94d7969094d
crash.ceph-001          ceph-001  running (7h)  0s ago     19h  15.2.0   docker.io/ceph/ceph:v15              204a01f9b0b6  c4b036202241
grafana.ceph-001        ceph-001  running (7h)  0s ago     19h  6.6.2    docker.io/ceph/ceph-grafana:latest   87a51ecf0b1c  5b7b94b48f31
mgr.ceph-001.gkjwqp     ceph-001  running (7h)  0s ago     19h  15.2.0   docker.io/ceph/ceph:v15              204a01f9b0b6  9ca007280456
mon.ceph-001            ceph-001  running (7h)  0s ago     19h  15.2.0   docker.io/ceph/ceph:v15              204a01f9b0b6  3d1ba9a2b697
node-exporter.ceph-001  ceph-001  running (7h)  0s ago     19h  0.18.1   docker.io/prom/node-exporter:latest  e5a616e4b9cf  36d026c68ba1
osd.0                   ceph-001  running (7h)  0s ago     18h  15.2.0   docker.io/ceph/ceph:v15              204a01f9b0b6  faf76193cbfe
osd.1                   ceph-001  running (7h)  0s ago     18h  15.2.0   docker.io/ceph/ceph:v15              204a01f9b0b6  f82505bae0f1
prometheus.ceph-001     ceph-001  running (7h)  0s ago     19h  2.17.1   docker.io/prom/prometheus:latest     358a0d2395fe  2708d84cd484
[ceph: root@ceph-001 /]# ceph orch host ls
HOST      ADDR      LABELS  STATUS
ceph-001  ceph-001
The log is full of:
"4/3/20 4:51:09 PM[INF]Removing orphan daemon rgw.default-rgw-realm.eu-central-1.1.ceph-001.awxkio..."
Updated by Hurz Hurz about 4 years ago
Caught the short-lived RGW daemon (the last entry, with status "starting"):
[ceph: root@ceph-001 /]# ceph orch ps --format json
[
  {
    "hostname": "ceph-001",
    "container_id": "d94d7969094d",
    "container_image_id": "0881eb8f169f5556a292b4e2c01d683172b12830a62a9225a98a8e206bb734f0",
    "container_image_name": "docker.io/prom/alertmanager:latest",
    "daemon_id": "ceph-001",
    "daemon_type": "alertmanager",
    "version": "0.20.0",
    "status": 1,
    "status_desc": "running",
    "last_refresh": "2020-04-03T15:31:48.725856",
    "created": "2020-04-02T19:23:08.829543",
    "started": "2020-04-03T07:29:16.932838"
  },
  {
    "hostname": "ceph-001",
    "container_id": "c4b036202241",
    "container_image_id": "204a01f9b0b6710dd0c0af7f37ce7139c47ff0f0105d778d7104c69282dfbbf1",
    "container_image_name": "docker.io/ceph/ceph:v15",
    "daemon_id": "ceph-001",
    "daemon_type": "crash",
    "version": "15.2.0",
    "status": 1,
    "status_desc": "running",
    "last_refresh": "2020-04-03T15:31:48.725903",
    "created": "2020-04-02T19:23:11.390694",
    "started": "2020-04-03T07:29:16.910897"
  },
  {
    "hostname": "ceph-001",
    "container_id": "5b7b94b48f31",
    "container_image_id": "87a51ecf0b1c9a7b187b21c1b071425dafea0d765a96d5bc371c791169b3d7f4",
    "container_image_name": "docker.io/ceph/ceph-grafana:latest",
    "daemon_id": "ceph-001",
    "daemon_type": "grafana",
    "version": "6.6.2",
    "status": 1,
    "status_desc": "running",
    "last_refresh": "2020-04-03T15:31:48.725950",
    "created": "2020-04-02T19:23:52.025088",
    "started": "2020-04-03T07:29:16.847972"
  },
  {
    "hostname": "ceph-001",
    "container_id": "9ca007280456",
    "container_image_id": "204a01f9b0b6710dd0c0af7f37ce7139c47ff0f0105d778d7104c69282dfbbf1",
    "container_image_name": "docker.io/ceph/ceph:v15",
    "daemon_id": "ceph-001.gkjwqp",
    "daemon_type": "mgr",
    "version": "15.2.0",
    "status": 1,
    "status_desc": "running",
    "last_refresh": "2020-04-03T15:31:48.725807",
    "created": "2020-04-02T19:22:18.648584",
    "started": "2020-04-03T07:29:16.856153"
  },
  {
    "hostname": "ceph-001",
    "container_id": "3d1ba9a2b697",
    "container_image_id": "204a01f9b0b6710dd0c0af7f37ce7139c47ff0f0105d778d7104c69282dfbbf1",
    "container_image_name": "docker.io/ceph/ceph:v15",
    "daemon_id": "ceph-001",
    "daemon_type": "mon",
    "version": "15.2.0",
    "status": 1,
    "status_desc": "running",
    "last_refresh": "2020-04-03T15:31:48.725715",
    "created": "2020-04-02T19:22:13.863300",
    "started": "2020-04-03T07:29:17.206024"
  },
  {
    "hostname": "ceph-001",
    "container_id": "36d026c68ba1",
    "container_image_id": "e5a616e4b9cf68dfcad7782b78e118be4310022e874d52da85c55923fb615f87",
    "container_image_name": "docker.io/prom/node-exporter:latest",
    "daemon_id": "ceph-001",
    "daemon_type": "node-exporter",
    "version": "0.18.1",
    "status": 1,
    "status_desc": "running",
    "last_refresh": "2020-04-03T15:31:48.725996",
    "created": "2020-04-02T19:23:53.880197",
    "started": "2020-04-03T07:29:16.880044"
  },
  {
    "hostname": "ceph-001",
    "container_id": "faf76193cbfe",
    "container_image_id": "204a01f9b0b6710dd0c0af7f37ce7139c47ff0f0105d778d7104c69282dfbbf1",
    "container_image_name": "docker.io/ceph/ceph:v15",
    "daemon_id": "0",
    "daemon_type": "osd",
    "version": "15.2.0",
    "status": 1,
    "status_desc": "running",
    "last_refresh": "2020-04-03T15:31:48.726088",
    "created": "2020-04-02T20:35:02.991435",
    "started": "2020-04-03T07:29:19.373956"
  },
  {
    "hostname": "ceph-001",
    "container_id": "f82505bae0f1",
    "container_image_id": "204a01f9b0b6710dd0c0af7f37ce7139c47ff0f0105d778d7104c69282dfbbf1",
    "container_image_name": "docker.io/ceph/ceph:v15",
    "daemon_id": "1",
    "daemon_type": "osd",
    "version": "15.2.0",
    "status": 1,
    "status_desc": "running",
    "last_refresh": "2020-04-03T15:31:48.726134",
    "created": "2020-04-02T20:35:17.142272",
    "started": "2020-04-03T07:29:19.374002"
  },
  {
    "hostname": "ceph-001",
    "container_id": "2708d84cd484",
    "container_image_id": "358a0d2395fe711bb8258e8fb4b2d7865c0a9a6463969bcd1452ee8869ea6653",
    "container_image_name": "docker.io/prom/prometheus:latest",
    "daemon_id": "ceph-001",
    "daemon_type": "prometheus",
    "version": "2.17.1",
    "status": 1,
    "status_desc": "running",
    "last_refresh": "2020-04-03T15:31:48.726042",
    "created": "2020-04-02T19:24:10.281163",
    "started": "2020-04-03T07:29:16.926292"
  },
  {
    "hostname": "ceph-001",
    "daemon_id": "default-rgw-realm.eu-central-1.1.ceph-001.ytywjo",
    "daemon_type": "rgw",
    "status": 1,
    "status_desc": "starting"
  }
]
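The RGW daemon appears in ceph orch ps only for a few seconds per cycle, with "status_desc": "starting" and no container fields. A hypothetical helper (not a cephadm API) that filters such entries out of one ceph orch ps --format json snapshot:

```python
import json
from typing import List


def non_running_daemons(orch_ps_json: str) -> List[str]:
    """Return "<daemon_type>.<daemon_id>" for every daemon in
    `ceph orch ps --format json` whose status_desc is not "running"."""
    daemons = json.loads(orch_ps_json)
    return [
        f'{d["daemon_type"]}.{d["daemon_id"]}'
        for d in daemons
        if d.get("status_desc") != "running"
    ]


# Trimmed sample taken from the output above.
SAMPLE = """[
  {"hostname": "ceph-001", "daemon_id": "0", "daemon_type": "osd",
   "status": 1, "status_desc": "running"},
  {"hostname": "ceph-001",
   "daemon_id": "default-rgw-realm.eu-central-1.1.ceph-001.ytywjo",
   "daemon_type": "rgw", "status": 1, "status_desc": "starting"}
]"""

print(non_running_daemons(SAMPLE))
# ['rgw.default-rgw-realm.eu-central-1.1.ceph-001.ytywjo']
```

Because each daemon lives only a few seconds, a single snapshot will usually show nothing; running this against repeated snapshots is more likely to catch one, as happened here.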
Updated by Sebastian Wagner about 4 years ago
- Status changed from New to Fix Under Review
- Assignee set to Sebastian Wagner
- Target version set to v15.2.1
- Backport set to octopus
- Pull request ID set to 34415
Updated by Hurz Hurz about 4 years ago
The GitHub PR says "turns out users put dot into their RGW service names", but all the dots you can see in the log come from Ceph's own naming convention. I only used [a-zA-Z0-9-] in names (for realm, zonegroup, and zone).
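This entry does not show the cephadm code involved, so the following is only a hypothetical illustration of how a dot-based parser could produce exactly this loop. It assumes RGW daemon ids have the form <service_id>.<host>.<random suffix> (as in default-rgw-realm.eu-central-1.1.ceph-001.ytywjo above) and that the orchestrator recovers the service id by keeping a fixed number of dot-separated fields. The spec's service_id is realm.zone.subcluster (default-rgw-realm.eu-central-1.1), so it already contains two dots even though the realm and zone names contain none. A parser expecting realm.zone then computes default-rgw-realm.eu-central-1, finds no matching service, and treats the freshly deployed daemon as an orphan. Anchoring on the hostname avoids counting dots. Neither function is claimed to be the actual fix from PR 34415.

```python
def naive_service_id(daemon_id: str) -> str:
    # Hypothetical parser: assumes the service id is exactly "<realm>.<zone>".
    return ".".join(daemon_id.split(".")[:2])


def host_anchored_service_id(daemon_id: str, hostname: str) -> str:
    # Hypothetical parser: strip the trailing ".<host>.<suffix>" instead of
    # counting dots, so dots inside the service id are preserved.
    host = hostname.split(".")[0]  # compare against the short hostname
    marker = f".{host}."
    if marker not in daemon_id:
        raise ValueError(f"host {host!r} not found in daemon id {daemon_id!r}")
    service_id, _suffix = daemon_id.rsplit(marker, 1)
    return service_id


# service_id from the rgw spec in `ceph orch ls` above
KNOWN = {"default-rgw-realm.eu-central-1.1"}
DAEMON = "default-rgw-realm.eu-central-1.1.ceph-001.ytywjo"

naive = naive_service_id(DAEMON)
anchored = host_anchored_service_id(DAEMON, "ceph-001")
# A daemon whose derived service id matches no spec would be treated as an orphan.
print(naive in KNOWN, anchored in KNOWN)  # False True
```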
Updated by Sebastian Wagner about 4 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport deleted (octopus)
Updated by Sebastian Wagner about 4 years ago
- Status changed from Pending Backport to Resolved
Updated by Sebastian Wagner about 4 years ago
- Target version changed from v15.2.1 to v15.2.2