https://tracker.ceph.com/https://tracker.ceph.com/favicon.ico2022-10-11T01:19:41ZCeph Orchestrator - Bug #57800: ceph orch upgrade does not appear to work with FQNDs.https://tracker.ceph.com/issues/57800?journal_id=2272442022-10-11T01:19:41ZAdam King
<ul></ul><p>what does `ceph orch host ls` report for this host? This error should only be raised if we can't find any IP stored for the host. You could also look at "ceph config-key get mgr/cephadm/inventory" which should be a json struct that includes all the hosts with their names, addresses, etc. and see if it lists an actual address for that host (as opposed to just listing the hostname as the addr). If it does look like there is no address for the host, the `ceph orch host set-addr` command might be able to fix it.</p> Orchestrator - Bug #57800: ceph orch upgrade does not appear to work with FQNDs.https://tracker.ceph.com/issues/57800?journal_id=2272462022-10-11T02:16:39ZBrian Woods
<ul></ul><p>So, I did notice that I had set the domain name on one of the nodes to "oldname.local" (when I was doing the find/replace to scrub this), but that shouldn't impact DNS. I confirmed that all names resolve from all hosts (DNS provided by DHCP in this case), and it looks like it is seeing all the correct IPs.</p>
<p>ceph orch host ls:<br /><pre>
HOST ADDR LABELS STATUS
ceph03.domain.local 192.168.10.80 rbd
ceph01.domain.local 192.168.10.210 _admin rgw grafana mds
ceph02.oldname.local 192.168.10.51 mon mgr mds _admin
3 hosts in cluster
</pre></p>
<p>ceph config-key get mgr/cephadm/inventory:<br /><pre>
{
"ceph01.domain.local":{
"hostname":"ceph01.domain.local",
"addr":"192.168.10.210",
"labels":[
"_admin",
"rgw",
"grafana",
"mds"
],
"status":""
},
"ceph03.domain.local":{
"hostname":"ceph03.domain.local",
"addr":"192.168.10.80",
"labels":[
"rbd"
],
"status":""
},
"ceph02.oldname.local":{
"hostname":"ceph02.oldname.local",
"addr":"192.168.10.51",
"labels":[
"mon",
"mgr",
"mds",
"_admin"
],
"status":""
}
}
</pre></p>
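For a dump like the one above, the failure mode Adam described (the addr slot holding no real IP, just the hostname repeated) can be spotted mechanically. This is only a sketch of mine; the helper name is made up and nothing here is part of cephadm:

```python
import json

def hosts_without_real_addr(inventory_json: str) -> list:
    """Flag hosts whose stored addr is empty or merely repeats the
    hostname -- the situation `ceph orch host set-addr` is meant to fix."""
    inventory = json.loads(inventory_json)
    return [
        name for name, info in inventory.items()
        if not info.get("addr") or info["addr"] == info.get("hostname")
    ]

# Hypothetical inventory with one bad entry for illustration:
sample = json.dumps({
    "ceph01.domain.local": {"hostname": "ceph01.domain.local",
                            "addr": "192.168.10.210", "labels": [], "status": ""},
    "ceph02.oldname.local": {"hostname": "ceph02.oldname.local",
                             "addr": "ceph02.oldname.local", "labels": [], "status": ""},
})
print(hosts_without_real_addr(sample))
```

All three hosts in the real dump above show numeric addrs, so this check passes here; if it ever flagged a host, `ceph orch host set-addr` would be the suggested fix per the earlier comment.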
<p>I did health checks from both the host and the cephadm shell container:<br /><pre>
root@ceph01:/var/log# ceph cephadm check-host ceph03.domain.local
ceph03.domain.local (None) ok
docker (/usr/bin/docker) is present
systemctl is present
lvcreate is present
Unit chrony.service is enabled and running
Hostname "ceph03.domain.local" matches what is expected.
Host looks OK
root@ceph01:/var/log# ceph cephadm check-host ceph01.domain.local
ceph01.domain.local (None) ok
docker (/usr/bin/docker) is present
systemctl is present
lvcreate is present
Unit chrony.service is enabled and running
Hostname "ceph01.domain.local" matches what is expected.
Host looks OK
root@ceph01:/var/log# ceph cephadm check-host ceph02.oldname.local
ceph02.oldname.local (None) ok
docker (/usr/bin/docker) is present
systemctl is present
lvcreate is present
Unit chrony.service is enabled and running
Hostname "ceph02.oldname.local" matches what is expected.
Host looks OK
</pre></p>
<p>Thoughts?</p> Orchestrator - Bug #57800: ceph orch upgrade does not appear to work with FQNDs.https://tracker.ceph.com/issues/57800?journal_id=2274282022-10-13T16:08:36ZAdam King
<ul></ul><p>It's odd that the hostname it reports not having an address for, "ceph02.domain.local", isn't even a hostname it has stored. At this point it seems it failed to find an address for "ceph02.domain.local" simply because there is no entry for it at all. The question is why it was trying to reach that hostname in the first place; perhaps something was cached that shouldn't have been there anymore. Does stopping the upgrade, running "ceph mgr fail", and starting the upgrade again make this happen again? It might also be worth checking in `orch ps` output that "ceph02.domain.local" isn't reported as the hostname for any of the daemons, and that no service spec placements explicitly reference "ceph02.domain.local" either.</p> Orchestrator - Bug #57800: ceph orch upgrade does not appear to work with FQNDs.https://tracker.ceph.com/issues/57800?journal_id=2274532022-10-14T16:03:49ZBrian Woods
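The `orch ps` check suggested above can be scripted against `ceph orch ps --format json`; note the field names used here (`hostname`, `daemon_name`) are my assumption about that JSON's shape, so verify them against your own output first:

```python
import json

def daemons_on(orch_ps_json: str, host: str) -> list:
    """Names of daemons that orch ps reports as placed on the given
    host -- useful for checking nothing still claims the stale name."""
    return [
        d.get("daemon_name") or d.get("name", "?")
        for d in json.loads(orch_ps_json)
        if d.get("hostname") == host
    ]

# Hypothetical two-daemon listing for illustration:
ps = json.dumps([
    {"daemon_name": "mon.ceph02", "hostname": "ceph02.oldname.local"},
    {"daemon_name": "crash.ceph01", "hostname": "ceph01.domain.local"},
])
print(daemons_on(ps, "ceph02.domain.local"))  # -> []
```

An empty result for the stale name means no daemon record is pinning it, which pushes the suspicion back onto caching or service spec placements.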
<ul></ul><p>I added DNS entries for all combinations, so both ceph02.oldname.local and ceph02.domain.local are now valid names, but the host is still configured as "oldname.local".</p>
<p>After confirming all combos worked, I then ran these, waiting about a minute between each command:<br /><pre>
root@ceph01# ceph orch upgrade stop
Stopped upgrade to quay.io/ceph/ceph:v17.2.4
root@ceph01# ceph mgr fail
root@ceph01# ceph orch upgrade start --image quay.io/ceph/ceph:v17.2.4
Initiating upgrade to quay.io/ceph/ceph:v17.2.4
root@ceph01# ceph orch upgrade status
{
"target_image": "quay.io/ceph/ceph:v17.2.4",
"in_progress": true,
"which": "Upgrading all daemon types on all hosts",
"services_complete": [],
"progress": "",
"message": ""
}
</pre></p>
<p>No movement in the process after about a half hour.</p>
<p>Current ceph ps:<br /><pre>
# ceph orch ps
NAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID
alertmanager.ceph01 ceph01.domain.local *:9093,9094 running (3d) 4m ago 13d 29.0M - ba2b418f427c 16eaa667487b
crash.ceph03 ceph03.domain.local running (3d) 4m ago 13d 8999k - 17.2.3 0912465dcea5 60d65abff255
crash.ceph01 ceph01.domain.local running (3d) 4m ago 13d 8928k - 17.2.3 0912465dcea5 2463cf388cd7
crash.ceph02 ceph02.oldname.local running (11d) 4m ago 11d 10.2M - 17.2.3 0912465dcea5 a24e3c222e35
grafana.ceph01 ceph01.domain.local *:3000 running (3d) 4m ago 13d 86.0M - 8.3.5 dad864ee21e9 50ed1c829566
mds.mds-default.ceph03.ptkjle ceph03.domain.local running (3d) 4m ago 13d 1994M - 17.2.3 0912465dcea5 0db1f329b706
mds.mds-default.ceph01.zrrptd ceph01.domain.local running (3d) 4m ago 13d 30.3M - 17.2.3 0912465dcea5 496c9f753bc7
mgr.ceph03.haayqy ceph03.domain.local *:8443,9283 running (3d) 4m ago 13d 111M - 17.2.3 0912465dcea5 3fe9424055ee
mgr.ceph01.domain.local.miydsy ceph01.domain.local *:9283 running (3d) 4m ago 13d 469M - 17.2.3 0912465dcea5 5bfed8d219fd
mon.ceph03 ceph03.domain.local running (3d) 4m ago 13d 476M 2048M 17.2.3 0912465dcea5 1119bcfc84af
mon.ceph01.domain.local ceph01.domain.local running (3d) 4m ago 13d 472M 2048M 17.2.3 0912465dcea5 3da27dc943f4
mon.ceph02 ceph02.oldname.local running (11d) 4m ago 11d 578M 2048M 17.2.3 0912465dcea5 88d7bfbcd9f5
node-exporter.ceph03 ceph03.domain.local *:9100 running (3d) 4m ago 13d 20.2M - 1dbe0e931976 8e6130b088db
node-exporter.ceph01 ceph01.domain.local *:9100 running (3d) 4m ago 13d 21.3M - 1dbe0e931976 8d01b76dda13
node-exporter.ceph02 ceph02.oldname.local *:9100 running (11d) 4m ago 11d 4620k - 1dbe0e931976 fa2b46930880
osd.0 ceph01.domain.local running (3d) 4m ago 13d 3022M 4096M 17.2.3 0912465dcea5 d23b707f5f44
osd.1 ceph01.domain.local running (3d) 4m ago 11d 6731M 4096M 17.2.3 0912465dcea5 af7b429509e7
osd.2 ceph01.domain.local running (3d) 4m ago 11d 4897M 4096M 17.2.3 0912465dcea5 2bf8a273ffa9
osd.3 ceph01.domain.local running (3d) 4m ago 11d 4897M 4096M 17.2.3 0912465dcea5 57e198c87d82
osd.4 ceph01.domain.local running (3d) 4m ago 11d 4842M 4096M 17.2.3 0912465dcea5 90023164d14d
osd.5 ceph01.domain.local running (3d) 4m ago 11d 4460M 4096M 17.2.3 0912465dcea5 0c6a9a34ff72
osd.6 ceph03.domain.local running (3d) 4m ago 11d 2241M 4096M 17.2.3 0912465dcea5 537b839a31b7
osd.7 ceph03.domain.local running (3d) 4m ago 11d 3894M 4096M 17.2.3 0912465dcea5 8a30f14aa72c
osd.8 ceph03.domain.local running (3d) 4m ago 11d 3191M 4096M 17.2.3 0912465dcea5 5bcc089677a6
osd.9 ceph03.domain.local running (3d) 4m ago 11d 3717M 4096M 17.2.3 0912465dcea5 6e42ca8325d8
osd.10 ceph03.domain.local running (3d) 4m ago 11d 2406M 4096M 17.2.3 0912465dcea5 95858a805de8
osd.12 ceph03.domain.local running (3d) 4m ago 12d 3355M 4096M 17.2.3 0912465dcea5 3c8cc41e1dce
prometheus.ceph01 ceph01.domain.local *:9095 running (3d) 4m ago 13d 138M - 514e6a882f6e 95e532fae898
</pre></p>
<p>Also saw this in the logs; not sure what I was doing at the time, but:</p>
<p><pre>
2022-10-11T04:23:59.481304+0000 mgr.ceph03.haayqy (mgr.5537070) 1331 : cephadm [ERR] check-host failed for '192.168.10.210'
Traceback (most recent call last):
File "/usr/share/ceph/mgr/cephadm/module.py", line 1042, in check_host
error_ok=True, no_fsid=True))
File "/usr/share/ceph/mgr/cephadm/module.py", line 590, in wait_async
return self.event_loop.get_result(coro)
File "/usr/share/ceph/mgr/cephadm/ssh.py", line 48, in get_result
return asyncio.run_coroutine_threadsafe(coro, self._loop).result()
File "/lib64/python3.6/concurrent/futures/_base.py", line 432, in result
return self.__get_result()
File "/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/usr/share/ceph/mgr/cephadm/serve.py", line 1273, in _run_cephadm
await self.mgr.ssh._remote_connection(host, addr)
File "/usr/share/ceph/mgr/cephadm/ssh.py", line 66, in _remote_connection
raise OrchestratorError("host address is empty")
orchestrator._interface.OrchestratorError: host address is empty
</pre></p> Orchestrator - Bug #57800: ceph orch upgrade does not appear to work with FQNDs.https://tracker.ceph.com/issues/57800?journal_id=2274542022-10-14T16:04:35ZBrian Woods
<ul></ul><p>Oh, by all combinations, I mean I created DNS entries for all hosts, not just ceph02.</p> Orchestrator - Bug #57800: ceph orch upgrade does not appear to work with FQNDs.https://tracker.ceph.com/issues/57800?journal_id=2275132022-10-17T19:29:19ZAdam King
<ul></ul><p>alright, looking back at the original traceback</p>
<pre>
Traceback (most recent call last):
File "/usr/share/ceph/mgr/cephadm/module.py", line 1042, in check_host
error_ok=True, no_fsid=True))
</pre>
<p>that particular check_host function is the one that is called when directly running "ceph cephadm check-host". I hadn't read it carefully earlier and confused it with the _check_host elsewhere that gets called internally.</p>
<p>I think the FQDNs and the hostnames are likely not the cause of the real issue here, which is that the upgrade stalled out. If you try another upgrade and wait for a bit, what does "ceph log last 100 info cephadm" say? If that doesn't give anything useful, it could be worth setting the log level to debug with "ceph config set mgr mgr/cephadm/log_to_cluster_level debug", then doing the same thing but running "ceph log last 200 debug cephadm" instead. I think we need to go back and try to diagnose generally why the upgrade is getting stuck rather than continuing to look at this hostname stuff.</p> Orchestrator - Bug #57800: ceph orch upgrade does not appear to work with FQNDs.https://tracker.ceph.com/issues/57800?journal_id=2276892022-10-22T03:16:25ZBrian Woods
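Once the debug level is set as described above, the `ceph log last 200 debug cephadm` output can be filtered down to the interesting lines. A trivial helper of my own (nothing cephadm-specific; it just matches the `[ERR]`/`[WRN]`/`[DBG]` level tags visible in these logs):

```python
def cephadm_log_lines(text: str, levels=("ERR", "WRN")) -> list:
    """Keep only cluster-log lines carrying the given level tags,
    e.g. the '[ERR]' entries, from a `ceph log last ... cephadm` dump."""
    tags = tuple(f"[{lvl}]" for lvl in levels)
    return [line for line in text.splitlines()
            if any(tag in line for tag in tags)]

# Two sample lines modeled on the output in this thread:
sample = (
    "2022-10-21T22:45:14 mgr.x (mgr.1) 1 : cephadm [DBG] mgr option mode = root\n"
    "2022-10-11T04:23:59 mgr.y (mgr.2) 2 : cephadm [ERR] check-host failed for '192.168.10.210'"
)
print(cephadm_log_lines(sample))
```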
<ul></ul><p>Seems I haven't seen the "host address is empty" error in about 10 days now; not sure if that is because of DNS or what. So, good news?</p>
<p>The bad news, even with debug logging enabled, and restarting the upgrade, even hours later, zero new entries:</p>
<pre>
2022-10-21T22:45:14.884185+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8661 : cephadm [DBG] mgr option ssh_config_file = None
2022-10-21T22:45:14.884260+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8662 : cephadm [DBG] mgr option device_cache_timeout = 1800
2022-10-21T22:45:14.884307+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8663 : cephadm [DBG] mgr option device_enhanced_scan = False
2022-10-21T22:45:14.884350+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8664 : cephadm [DBG] mgr option daemon_cache_timeout = 600
2022-10-21T22:45:14.884392+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8665 : cephadm [DBG] mgr option facts_cache_timeout = 60
2022-10-21T22:45:14.884434+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8666 : cephadm [DBG] mgr option host_check_interval = 600
2022-10-21T22:45:14.884474+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8667 : cephadm [DBG] mgr option mode = root
2022-10-21T22:45:14.884516+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8668 : cephadm [DBG] mgr option container_image_base = quay.io/ceph/ceph
2022-10-21T22:45:14.884557+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8669 : cephadm [DBG] mgr option container_image_prometheus = quay.io/prometheus/prometheus:v2.33.4
2022-10-21T22:45:14.884597+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8670 : cephadm [DBG] mgr option container_image_grafana = quay.io/ceph/ceph-grafana:8.3.5
2022-10-21T22:45:14.884636+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8671 : cephadm [DBG] mgr option container_image_alertmanager = quay.io/prometheus/alertmanager:v0.23.0
2022-10-21T22:45:14.884675+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8672 : cephadm [DBG] mgr option container_image_node_exporter = quay.io/prometheus/node-exporter:v1.3.1
2022-10-21T22:45:14.884714+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8673 : cephadm [DBG] mgr option container_image_loki = docker.io/grafana/loki:2.4.0
2022-10-21T22:45:14.884753+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8674 : cephadm [DBG] mgr option container_image_promtail = docker.io/grafana/promtail:2.4.0
2022-10-21T22:45:14.884793+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8675 : cephadm [DBG] mgr option container_image_haproxy = docker.io/library/haproxy:2.3
2022-10-21T22:45:14.884832+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8676 : cephadm [DBG] mgr option container_image_keepalived = docker.io/arcts/keepalived
2022-10-21T22:45:14.884871+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8677 : cephadm [DBG] mgr option container_image_snmp_gateway = docker.io/maxwo/snmp-notifier:v1.2.1
2022-10-21T22:45:14.884911+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8678 : cephadm [DBG] mgr option warn_on_stray_hosts = True
2022-10-21T22:45:14.884950+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8679 : cephadm [DBG] mgr option warn_on_stray_daemons = True
2022-10-21T22:45:14.884989+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8680 : cephadm [DBG] mgr option warn_on_failed_host_check = True
2022-10-21T22:45:14.885028+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8681 : cephadm [DBG] mgr option log_to_cluster = True
2022-10-21T22:45:14.885066+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8682 : cephadm [DBG] mgr option allow_ptrace = False
2022-10-21T22:45:14.885108+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8683 : cephadm [DBG] mgr option container_init = True
2022-10-21T22:45:14.885149+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8684 : cephadm [DBG] mgr option prometheus_alerts_path = /etc/prometheus/ceph/ceph_default_alerts.yml
2022-10-21T22:45:14.885190+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8685 : cephadm [DBG] mgr option migration_current = 5
2022-10-21T22:45:14.885230+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8686 : cephadm [DBG] mgr option config_dashboard = True
2022-10-21T22:45:14.885270+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8687 : cephadm [DBG] mgr option manage_etc_ceph_ceph_conf = False
2022-10-21T22:45:14.885309+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8688 : cephadm [DBG] mgr option manage_etc_ceph_ceph_conf_hosts = *
2022-10-21T22:45:14.885349+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8689 : cephadm [DBG] mgr option registry_url = None
2022-10-21T22:45:14.885389+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8690 : cephadm [DBG] mgr option registry_username = None
2022-10-21T22:45:14.885428+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8691 : cephadm [DBG] mgr option registry_password = None
2022-10-21T22:45:14.885467+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8692 : cephadm [DBG] mgr option registry_insecure = False
2022-10-21T22:45:14.885506+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8693 : cephadm [DBG] mgr option use_repo_digest = True
2022-10-21T22:45:14.885545+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8694 : cephadm [DBG] mgr option config_checks_enabled = False
2022-10-21T22:45:14.885588+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8695 : cephadm [DBG] mgr option default_registry = docker.io
2022-10-21T22:45:14.885629+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8696 : cephadm [DBG] mgr option max_count_per_host = 10
2022-10-21T22:45:14.885673+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8697 : cephadm [DBG] mgr option autotune_memory_target_ratio = 0.7
2022-10-21T22:45:14.885715+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8698 : cephadm [DBG] mgr option autotune_interval = 600
2022-10-21T22:45:14.885756+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8699 : cephadm [DBG] mgr option use_agent = False
2022-10-21T22:45:14.885795+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8700 : cephadm [DBG] mgr option agent_refresh_rate = 20
2022-10-21T22:45:14.885834+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8701 : cephadm [DBG] mgr option agent_starting_port = 4721
2022-10-21T22:45:14.885875+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8702 : cephadm [DBG] mgr option agent_down_multiplier = 3.0
2022-10-21T22:45:14.885915+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8703 : cephadm [DBG] mgr option max_osd_draining_count = 10
2022-10-21T22:45:14.885955+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8704 : cephadm [DBG] mgr option log_level =
2022-10-21T22:45:14.885995+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8705 : cephadm [DBG] mgr option log_to_file = False
2022-10-21T22:45:14.886036+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 8706 : cephadm [DBG] mgr option log_to_cluster_level = debug
2022-10-21T22:51:09.306208+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 9031 : cephadm [INF] Upgrade: Stopped
2022-10-21T22:51:26.531006+0000 mgr.ceph01.domain.local.miydsy (mgr.8880077) 9050 : cephadm [INF] Upgrade: Started with target quay.io/ceph/ceph:v17.2.4
</pre>
<p>Ideas? Those are the only entries sense enabling debug loggings.</p> Orchestrator - Bug #57800: ceph orch upgrade does not appear to work with FQNDs.https://tracker.ceph.com/issues/57800?journal_id=2277332022-10-25T12:14:01ZAdam King
<ul></ul><p>This seems to imply the cephadm service loop just isn't running at all. Does the REFRESHED column in `ceph orch device ls` report a relatively recent refresh time? If not, something got stuck somewhere, the most common culprit being a hung `cephadm ceph-volume` process on one of the nodes. If it does report a recent refresh, perhaps the orchestrator is paused? What does `ceph orch status` spit out?</p> Orchestrator - Bug #57800: ceph orch upgrade does not appear to work with FQNDs.https://tracker.ceph.com/issues/57800?journal_id=2277522022-10-25T19:00:17ZBrian Woods
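The REFRESHED staleness test described above can be mechanized; this assumes ages are printed in the "4m ago" / "13h ago" / "3d ago" form seen earlier in the thread, and the helper name and 15-minute threshold are my own choices:

```python
def refreshed_is_stale(age: str, max_minutes: int = 15) -> bool:
    """True if a REFRESHED value like '4m ago' or '13h ago' is older
    than the threshold, suggesting the cephadm loop isn't cycling."""
    token = age.split()[0]                       # "4m ago" -> "4m"
    minutes_per_unit = {"m": 1, "h": 60, "d": 1440, "w": 10080}
    return int(token[:-1]) * minutes_per_unit[token[-1]] > max_minutes

print(refreshed_is_stale("4m ago"))   # recent refresh, loop is cycling
print(refreshed_is_stale("13h ago"))  # stale; loop likely stuck
```

Per the comment above, a stale value points at something like a hung `cephadm ceph-volume` process on one of the nodes.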
<ul><li><strong>File</strong> <a href="/attachments/download/6227/log.txt">log.txt</a> <a class="icon-only icon-magnifier" title="View" href="/attachments/6227/log.txt">View</a> added</li></ul><p>I rebooted last night; all items report a refreshed time of about 13 hours ago, i.e. since the reboot.</p>
<pre>
# ceph orch status
Backend: cephadm
Available: Yes
Paused: No
</pre>
<p>Re-enabled debug (just in case something resets it).<br />Restarted the upgrade.</p>
<p>And though I see this:<br /><pre>
# ceph orch upgrade status
{
"target_image": "quay.io/ceph/ceph:v17.2.4",
"in_progress": true,
"which": "Upgrading all daemon types on all hosts",
"services_complete": [],
"progress": "",
"message": ""
}
</pre></p>
<p>It is NOT showing in the "ceph -s" output! So I think you are onto something. I thought it did this the other day, but after running the upgrade command again, I just figured I hadn't hit enter or something.</p>
<p>So I ran it again, but it still doesn't show up...</p>
<p>Also wrote a quick script to scrub my logs for easy posting, so I have attached a good long chunk of it.</p> Orchestrator - Bug #57800: ceph orch upgrade does not appear to work with FQNDs.https://tracker.ceph.com/issues/57800?journal_id=2278622022-10-29T17:32:31ZBrian Woods
<ul></ul><p>I am getting ready to add another node to the cluster. Is there anything you can think of I can check, pre or post?</p> Orchestrator - Bug #57800: ceph orch upgrade does not appear to work with FQNDs.https://tracker.ceph.com/issues/57800?journal_id=2278842022-10-31T05:04:35ZBrian Woods
<ul></ul><p>And suddenly the upgrade is happening!!!</p>
<p>Today I rebooted ceph02, a node that only had the MDS, and suddenly things started upgrading!</p>
<p>No idea why... Will report back if it finishes, and will try to capture logs (not tonight though...).</p> Orchestrator - Bug #57800: ceph orch upgrade does not appear to work with FQNDs.https://tracker.ceph.com/issues/57800?journal_id=2319852023-02-23T00:06:53ZBrian Woods
<ul></ul><p>Something was stuck on one of the nodes. Can't debug further. This ticket can be canceled.</p>