mgr refuses connections from a monitor that starts after it
For example, in a 3-monitor cluster, mon-A and mon-B are active and in quorum. Now start a mgr, then start mon-C. The mgr does not recognize mon-C. When we send a query command such as 'ceph pg dump' to mon-C, mon-C tries to connect to the mgr, but the mgr marks this session down after a few steps. If we are lucky, the query result reaches mon-C before the mgr marks the session down. If not, mon-C keeps retrying the connection to the mgr until it succeeds.
#2 Updated by Xinying Song 8 months ago
It seems no one cares about this issue, but I'd still like to give a more detailed description.
I'm using `juju` to deploy a Ceph environment running Luminous. The charm files for juju are downloaded from the official charm store, and we use the ceph_exporter provided by DigitalOcean for Prometheus.
Once all those components were deployed, we found that queries to ceph_exporter were dramatically slow. Further investigation showed that ceph_exporter sends a `pg dump` command to mon-A through the 'rados_mon_command' interface in librados. mon-A then delegates the query to mgr-A and waits for mgr-A to return the result. However, mon-A always failed to read the result from mgr-A, even though mgr-A had successfully returned it, so mon-A kept resending the query until it finally got the result. According to the monitor log, when mon-A tried to read the result returned from mgr-A, it got a 'peer close connection' error.
After reading a lot of the related source code, we found the root cause. mgr-A was started before mon-A joined the quorum, so mgr-A doesn't know about mon-A. When mon-A tries to establish a connection to mgr-A, mgr-A first accepts it (in DaemonServer::handle_open()). Later, mon-A sends an MMgrReport message to mgr-A; strictly speaking, the sender is the mgr client inside mon-A. mgr-A handles the report (in DaemonServer::handle_report()) and finds that it has no knowledge of mon-A in DaemonServer::daemon_state. As a result, mgr-A closes the connection on its own.
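To make the sequence above concrete, here is a toy Python model of the accept-then-reject behavior. It is not Ceph's C++ code. ToyDaemonServer, open_sessions, the (type, id) keys and the peer mon-B are all hypothetical. Only the order of events follows the description above: handle_open() accepts the session, and handle_report() then rejects a daemon that is missing from daemon_state.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDaemonServer:
    """Hypothetical toy model of the mgr's DaemonServer, not Ceph code."""

    # Stand-in for DaemonServer::daemon_state: (type, id) keys of daemons
    # whose metadata the mgr already holds.
    daemon_state: set = field(default_factory=set)
    # Sessions the mgr currently keeps open.
    open_sessions: set = field(default_factory=set)

    def handle_open(self, key):
        # As described above: the session is accepted without consulting
        # daemon_state.
        self.open_sessions.add(key)
        return True

    def handle_report(self, key):
        # As described above: a report from a daemon the mgr has no
        # metadata for is rejected and the session is dropped.
        if key not in self.daemon_state:
            self.open_sessions.discard(key)
            return False
        return True


# mgr-A started while only mon-B (a hypothetical peer) was known to it.
server = ToyDaemonServer(daemon_state={("mon", "B")})
mon_a = ("mon", "A")
server.handle_open(mon_a)             # accepted
server.handle_report(mon_a)           # rejected: no metadata for mon-A
print(mon_a in server.open_sessions)  # False: the mgr closed the session
```

The model shows why retries alone don't help. Every new session from mon-A passes handle_open() and is then dropped by handle_report(), because nothing ever adds mon-A to daemon_state.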
Although this problem can be avoided by starting the Ceph daemons in strictly the right order, we still think Ceph could handle it more gracefully. All that is needed is for the mgr to handle monmap changes.
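A minimal sketch of the proposed direction follows. It is written in Python rather than Ceph's C++. When the mgr sees a new monmap, it diffs the monitor names and registers the new ones in its daemon state, so later reports from those monitors are no longer rejected. The function apply_monmap_change and the set-of-(type, id)-keys representation are hypothetical. A real fix in Ceph would also have to fetch each new monitor's metadata (the kind of information `ceph mon metadata` reports), which this sketch does not model.

```python
def apply_monmap_change(daemon_state, old_mons, new_mons):
    """Hypothetical mgr-side reaction to a monmap change.

    daemon_state is a set of (type, id) keys the mgr knows about; old_mons
    and new_mons are the monitor names in the previous and current monmap.
    Returns the sorted names of monitors that were added and removed.
    """
    added = sorted(set(new_mons) - set(old_mons))
    removed = sorted(set(old_mons) - set(new_mons))
    for name in added:
        # A real fix would also fetch this monitor's metadata; here we
        # only record that the mgr now knows about it.
        daemon_state.add(("mon", name))
    for name in removed:
        daemon_state.discard(("mon", name))
    return added, removed


# mon-C joins after the mgr already knows mon-A and mon-B.
state = {("mon", "A"), ("mon", "B")}
added, removed = apply_monmap_change(state, ["A", "B"], ["A", "B", "C"])
print(added, removed)  # ['C'] []
```

Diffing old and new monmaps, rather than rebuilding all state on every change, keeps the existing entries for monitors that did not change.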
#3 Updated by Xinying Song 7 months ago
Here is a simple way to observe the problem from the mgr side:
1. Prepare ceph.conf with 3 monitors.
2. Init and start mon.A and mon.B.
3. Init and start mgr.A with debug_mgr=5.
4. Init and start mon.C.
5. Run `tail -f /var/log/ceph/ceph-mgr.A.log | grep 'mon,'`
You will then see log lines like 'mgr.server handle_report rejecting report from mon,C, since we do not have its metadata now.' appear periodically. This shows that the mgr does not update its daemon_state after mon.C joins the cluster.
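As an alternative to watching `tail -f`, a small hypothetical Python helper can count the rejection messages per daemon. It assumes the message text matches the line quoted above. The timestamp and other prefix fields in the real log are ignored because the pattern is searched for anywhere in the line.

```python
import re
import sys
from collections import Counter

# Matches e.g. "rejecting report from mon,C," and captures ("mon", "C").
REJECT_RE = re.compile(r"rejecting report from (\w+),([^,\s]+),")


def count_rejections(lines):
    """Count rejected reports per daemon, keyed like 'mon.C'."""
    counts = Counter()
    for line in lines:
        m = REJECT_RE.search(line)
        if m:
            counts[f"{m.group(1)}.{m.group(2)}"] += 1
    return counts


if __name__ == "__main__":
    # Usage: python count_rejections.py /var/log/ceph/ceph-mgr.A.log
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            for daemon, n in count_rejections(f).most_common():
                print(f"{daemon}: {n} rejected reports")
```

A count that keeps growing for mon.C, while other daemons stay at zero, matches the behavior described above.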