Containerized osd config must be updated when adding/removing mons
- bootstrap a cluster (1 mon, 1 mgr)
- add a bunch of osds (ceph orch apply osd --all-available-devices)
- add some more mons and mgrs (ceph orch apply mon 3 ; ceph orch apply mgr 3)
At this point, ceph.conf as seen by the osds (/var/lib/ceph/$FSID/osd.$ID/config) still only lists the first mon. If you restart any osds while that first mon is down for some reason, the osds can't join the cluster, because they don't know the other mons exist. They'll just sit there logging "monclient(hunting): authenticate timed out after 300" every five minutes until that first mon comes back.
I can think of two potential ways to address this:
1) Have cephadm mon apply go out and update every single osd config file with the new list of mons. This would of course not work completely if some osd hosts were down at the time. Also it might take a while...
2) Have the osds update their own config file automatically based on current monmaps.
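Either option comes down to rewriting the mon_host line in each osd's config file. A minimal sketch of that rewrite in Python, assuming the config is a plain ini-style file with a single mon_host line; update_mon_host is a hypothetical helper, not existing cephadm code, and how the current mon list is obtained (e.g. from the monmap) is left out:

```python
import re
from typing import Sequence

# Matches the "mon_host = ..." line (also "mon host") and captures everything
# up to and including the "=" so the original indentation is preserved.
_MON_HOST_LINE = re.compile(r"^([ \t]*mon[_ ]host[ \t]*=[ \t]*).*$", re.MULTILINE)


def update_mon_host(config_text: str, mon_addrs: Sequence[str]) -> str:
    """Return config_text with the mon_host value replaced by mon_addrs.

    Hypothetical helper for illustration only; it does not touch any files.
    """
    new_value = ",".join(mon_addrs)
    updated, count = _MON_HOST_LINE.subn(
        lambda m: m.group(1) + new_value, config_text
    )
    if count == 0:
        raise ValueError("no mon_host line found in config")
    return updated


if __name__ == "__main__":
    # Placeholder fsid and addresses, not taken from a real cluster.
    sample = "[global]\n\tfsid = 1234\n\tmon_host = 10.0.0.1\n"
    print(update_mon_host(sample, ["10.0.0.1", "10.0.0.2", "10.0.0.3"]))
```

For option 1 something like this would have to run against every osd config on every host; for option 2 the osd (or its wrapper) would do it locally whenever the monmap changes.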
This probably also needs to go into the troubleshooting docs (check mon_host in each containerized osd's individual config file).
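For the troubleshooting check, a rough sketch of a script that prints mon_host from every containerized osd config on a host. It assumes the /var/lib/ceph/$FSID/osd.$ID/config layout described above; read_mon_host and report are made-up names for illustration, and reading the files may need root:

```python
import glob
import re

# Captures the value of the "mon_host = ..." line (also "mon host"),
# without trailing whitespace.
MON_HOST_RE = re.compile(
    r"^[ \t]*mon[_ ]host[ \t]*=[ \t]*(.*?)[ \t]*$", re.MULTILINE
)


def read_mon_host(config_text):
    """Return the mon_host value from a config file's text, or None."""
    match = MON_HOST_RE.search(config_text)
    return match.group(1) if match else None


def report(base="/var/lib/ceph"):
    """Map each osd config path under base to its mon_host value."""
    results = {}
    for path in sorted(glob.glob(f"{base}/*/osd.*/config")):
        with open(path) as f:
            results[path] = read_mon_host(f.read())
    return results


if __name__ == "__main__":
    for path, mon_host in report().items():
        print(f"{path}: {mon_host or 'mon_host not set'}")
```

If the output only lists the first mon after more mons have been added, that osd is affected by this issue.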