bootstrap mgr timeout is too short
While doing a test installation of Ceph Quincy on our rock64 ARM64 machines, I spotted a problem with the timeout for waiting for the manager to come up. I'd like to propose making it wait a little longer on slower machines. The mgr came up eventually, with only 2 ticks left. Since the rock64 is running with an attached SSD, I would suggest raising the number of ticks to 30.
#2 Updated by Andreas Elvers 3 months ago
root@rock64:~# cephadm bootstrap --mon-ip 192.168.50.36
Creating directory /etc/ceph for ceph.conf
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chrony.service is enabled and running
Repeating the final host check...
docker (/usr/bin/docker) is present
systemctl is present
lvcreate is present
Unit chrony.service is enabled and running
Host looks OK
Cluster fsid: cb682766-252f-11ed-a9f6-2e18419c566b
Verifying IP 192.168.50.36 port 3300 ...
Verifying IP 192.168.50.36 port 6789 ...
Mon IP `192.168.50.36` is in CIDR network `192.168.50.0/23`
Mon IP `192.168.50.36` is in CIDR network `192.168.50.0/23`
Internal network (--cluster-network) has not been provided, OSD replication will default to the public_network
Pulling container image quay.io/ceph/ceph:v17...
Ceph version: ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy (stable)
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Waiting for mon to start...
Waiting for mon...
mon is available
Assimilating anything we can from ceph.conf...
Generating new minimal ceph.conf...
Restarting the monitor...
Setting mon public_network to 192.168.50.0/23
Wrote config to /etc/ceph/ceph.conf
Wrote keyring to /etc/ceph/ceph.client.admin.keyring
Creating mgr...
Verifying port 9283 ...
Waiting for mgr to start...
Waiting for mgr...
mgr not available, waiting (1/15)...
mgr not available, waiting (2/15)...
mgr not available, waiting (3/15)...
mgr not available, waiting (4/15)...
mgr not available, waiting (5/15)...
mgr not available, waiting (6/15)...
mgr not available, waiting (7/15)...
mgr not available, waiting (8/15)...
mgr not available, waiting (9/15)...
mgr not available, waiting (10/15)...
mgr not available, waiting (11/15)...
mgr not available, waiting (12/15)...
mgr not available, waiting (13/15)...
mgr is available
[ ... ]
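For readers unfamiliar with the "waiting (n/15)" pattern in the log, here is a minimal Python sketch of a bounded polling loop of that kind. It is not cephadm's actual code: the function name `wait_until_available`, its parameters, and the 2-second delay are assumptions made purely for illustration.

```python
import time
from typing import Callable


def wait_until_available(
    what: str,
    check: Callable[[], bool],
    retries: int = 15,          # number of "ticks"; 15 mirrors the log above
    delay: float = 2.0,         # seconds between polls (assumed value)
    sleep: Callable[[float], None] = time.sleep,
    log: Callable[[str], None] = print,
) -> int:
    """Poll check() until it returns True.

    Returns the number of waits that were needed. Raises TimeoutError
    if the service is still unavailable after `retries` waits.
    """
    for attempt in range(1, retries + 1):
        if check():
            log(f"{what} is available")
            return attempt - 1
        log(f"{what} not available, waiting ({attempt}/{retries})...")
        sleep(delay)
    # One last check after the final wait has elapsed.
    if check():
        log(f"{what} is available")
        return retries
    raise TimeoutError(f"{what} not available after {retries} tries")
```

In this sketch the worst-case wait is roughly `retries * delay` seconds, so a service that needs 13 waits succeeds with `retries=15` but fails with `retries=10`.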
#7 Updated by Andreas Elvers 3 months ago
Redouane Kachach Elhichou wrote:
Thanks. Did you try using the --timeout / --retry arguments to adapt the timeout to your specific case and see whether they help solve your issue?
It worked for me anyway. The mgr was created just in time. I am currently testing, so I will redo the setup and set the timeout to 30. But I think raising the default timeout a bit could be helpful. Thanks for pointing out the --timeout option.
#8 Updated by Redouane Kachach Elhichou 3 months ago
Raising the default timeout may also lead to a slower reaction in case of a real issue. IMHO no changes are needed, since the default timeout is already high enough for most use cases and is adjustable with the arguments I mentioned earlier. Since there's no issue with the current behavior, I'd suggest closing this tracker.