test_scrub_pause_and_resume (tasks.cephfs.test_scrub_checks.TestScrubControls) fails intermittently
Greg saw this in nautilus during testing: http://pulpito.front.sepia.ceph.com/gregf-2020-03-13_20:56:54-fs-wip-greg-testing-nautilus-313-distro-basic-smithi/
Happens in master too: http://pulpito.ceph.com/vshankar-2020-03-16_10:45:29-fs-master-testing-basic-smithi/
The problem appears to arise when a laggy (active) MDS receives an mgrmap. While an active MDS is laggy, incoming messages are queued to be processed later, once the MDS is no longer laggy. If the incoming message is an mgrmap, the laggy MDS queues it like any other message and returns `true` from `ms_dispatch()`, which marks the message as processed. Later, when the MDS works through the message queue, it does not handle this message and so drops it.
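To make the sequence concrete, here is a minimal toy model of the queue-then-drop behaviour described above. It is not Ceph code: the class, the `drain()` method, and the set of message types are hypothetical names chosen for illustration; only `ms_dispatch()` and its `true` return value come from the description.

```python
from collections import deque


class ToyDispatcher:
    """Toy model of a daemon that defers messages while laggy (not Ceph code)."""

    # Hypothetical set of message types this dispatcher knows how to handle.
    HANDLED_TYPES = {"client_request", "mdsmap"}

    def __init__(self):
        self.laggy = False
        self.waiting = deque()  # messages deferred while laggy
        self.handled = []
        self.dropped = []

    def ms_dispatch(self, msg_type):
        """Return True if this dispatcher claims the message."""
        if self.laggy:
            # Deferred without checking the type, yet reported as claimed,
            # so no other dispatcher gets a chance to see the message.
            self.waiting.append(msg_type)
            return True
        return self._handle(msg_type)

    def _handle(self, msg_type):
        if msg_type in self.HANDLED_TYPES:
            self.handled.append(msg_type)
            return True
        return False  # not ours; another dispatcher could take it

    def drain(self):
        """Process deferred messages once the daemon is no longer laggy."""
        while self.waiting:
            msg_type = self.waiting.popleft()
            if not self._handle(msg_type):
                # Nobody else can pick it up any more: it is lost.
                self.dropped.append(msg_type)
```

In this model a non-laggy dispatcher returns `False` for `"mgrmap"`, leaving it for someone else, while a laggy one claims it and then loses it in `drain()`.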
The side effect is that the mgr client instance in the MDS never gets a chance to process the mgrmap, which is what kickstarts things like periodic report updates to the manager. The mgr report (`MMgrReport`) carries `task_status`, which contains the MDS scrub status. Since the updated scrub status is not sent to ceph-mgr, it does not get recorded in the service map (and is not displayed in `ceph status`), causing the test to fail.
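The knock-on effect can be sketched the same way. This is a toy model under stated assumptions, not the real mgr client: `ToyMgrClient`, `handle_mgr_map()`, `tick()`, `service_map_view()` and the `"scrub status"` key are all hypothetical names; the only things taken from the description are that reports start after the map is processed and that the report carries `task_status`.

```python
class ToyMgrClient:
    """Toy stand-in for a mgr client that reports only after seeing a map."""

    def __init__(self):
        self.have_map = False
        self.sent_reports = []

    def handle_mgr_map(self):
        # Processing the map is what enables periodic reports.
        self.have_map = True

    def tick(self, task_status):
        """Send one periodic report; return False if none could be sent."""
        if not self.have_map:
            return False  # map never processed, so nothing is reported
        self.sent_reports.append({"task_status": dict(task_status)})
        return True


def service_map_view(client):
    """What a status query would show: the last reported task_status."""
    if not client.sent_reports:
        return {}
    return client.sent_reports[-1]["task_status"]
```

If the map is dropped before `handle_mgr_map()` is ever called, `service_map_view()` stays empty no matter how often the scrub state changes, which is the situation the test runs into.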