Aah, this looks like it has the same root cause as https://tracker.ceph.com/issues/44518.
On the monitor side:
2020-03-16T19:53:42.160+0000 7f3bfc664700 10 mon.a@0(leader).log v82 logging 2020-03-16T19:53:42.161223+0000 mon.a (mon.0) 142 : audit [INF] from='client.? 127.0.0.1:0/2347291138' entity='client.admin' cmd=[{"prefix": "osd down", "ids": ["0"]}]: dispatch
2020-03-16T19:53:42.160+0000 7f3bfc664700 10 mon.a@0(leader).paxosservice(logm 1..82) setting proposal_timer 0x559f39316540 with delay of 0.0124679
2020-03-16T19:53:42.160+0000 7f3bfc664700 20 -- v2:127.0.0.1:7108/0 done calling dispatch on 0x559f393468c0
2020-03-16T19:53:42.173+0000 7f3bfee69700 10 mon.a@0(leader).paxosservice(logm 1..82) propose_pending
2020-03-16T19:53:42.173+0000 7f3bfee69700 10 mon.a@0(leader).log v82 encode_full log v 82
2020-03-16T19:53:42.173+0000 7f3bfee69700 10 mon.a@0(leader).log v82 encode_pending v83
2020-03-16T19:53:42.173+0000 7f3bfee69700 5 mon.a@0(leader).paxos(paxos active c 1..135) queue_pending_finisher 0x559f37ef7850
2020-03-16T19:53:42.173+0000 7f3bfee69700 10 mon.a@0(leader).paxos(paxos active c 1..135) trigger_propose active, proposing now
2020-03-16T19:53:42.173+0000 7f3bfee69700 10 mon.a@0(leader).paxos(paxos active c 1..135) propose_pending 136 20090 bytes
2020-03-16T19:53:42.173+0000 7f3bfee69700 10 mon.a@0(leader).paxos(paxos updating c 1..135) begin for 136 20090 bytes
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).paxos(paxos updating c 1..135) commit_start 136
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).paxosservice(osdmap 1..29) propose_pending
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 encode_pending e 30
2020-03-16T19:53:42.198+0000 7f3bfee69700 1 mon.a@0(leader).osd e29 do_prune osdmap full prune enabled
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 should_prune currently holding only 28 epochs (min osdmap epochs: 500); do not prune.
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 maybe_prime_pg_temp estimate 4 pgs on 1 osds >= 0.25 of total 4 pgs, all
2020-03-16T19:53:42.198+0000 7f3bfee69700 20 queue 0x7f3bfee64cc0 1 [0,4)
2020-03-16T19:53:42.198+0000 7f3bfce65700 20 _process 0x7f3bfee64cc0 pool 1 [0,4) pgs []
2020-03-16T19:53:42.198+0000 7f3bfce65700 20 mon.a@0(leader).osd e29 prime_pg_temp next_up == next_acting now, clear pg_temp
2020-03-16T19:53:42.198+0000 7f3bfce65700 20 mon.a@0(leader).osd e29 prime_pg_temp 1.0 [1,0,2]/[] -> [1,2]/[1,2], priming []
2020-03-16T19:53:42.198+0000 7f3bfce65700 20 mon.a@0(leader).osd e29 prime_pg_temp next_up == next_acting now, clear pg_temp
2020-03-16T19:53:42.198+0000 7f3bfce65700 20 mon.a@0(leader).osd e29 prime_pg_temp 1.1 [2,0,1]/[] -> [2,1]/[2,1], priming []
2020-03-16T19:53:42.198+0000 7f3bfce65700 20 mon.a@0(leader).osd e29 prime_pg_temp next_up == next_acting now, clear pg_temp
2020-03-16T19:53:42.198+0000 7f3bfce65700 20 mon.a@0(leader).osd e29 prime_pg_temp 1.2 [0,1,2]/[] -> [1,2]/[1,2], priming []
2020-03-16T19:53:42.198+0000 7f3bfce65700 20 mon.a@0(leader).osd e29 prime_pg_temp next_up == next_acting now, clear pg_temp
2020-03-16T19:53:42.198+0000 7f3bfce65700 20 mon.a@0(leader).osd e29 prime_pg_temp 1.3 [1,2,0]/[] -> [1,2]/[1,2], priming []
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 maybe_prime_pg_temp done in 0.000100
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 update_pending_pgs
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 scan_for_creating_pgs already created 1
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 update_pending_pgs 0 pools queued
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 update_pending_pgs 0 pgs removed because they're created
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 update_pending_pgs queue remaining: 0 pools
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 update_pending_pgs 0/0 pgs added from queued pools
2020-03-16T19:53:42.198+0000 7f3bfee69700 2 mon.a@0(leader).osd e29 osd.0 DOWN
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 encode_pending encoding full map with octopus features 1080873256688364036
2020-03-16T19:53:42.198+0000 7f3bfee69700 20 mon.a@0(leader).osd e29 full_crc 113280054 inc_crc 473227981
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader) e1 log_health updated 1 previous 0
2020-03-16T19:53:42.198+0000 7f3bfee69700 0 log_channel(cluster) log [WRN] : Health check failed: 1 osds down (OSD_DOWN)
I.e., osd.0 was marked down at e29. On the osd.0 side, however, the daemon was still stuck at e28:
2020-03-16T19:54:07.999+0000 7f944a1b8700 10 osd.0 pg_epoch: 28 pg[1.2( v 22'2 (0'0,22'2] local-lis/les=26/27 n=1 ec=17/17 lis/c=26/26 les/c/f=27/27/0 sis=26 pruub=11.761122524s) [0,1,2] r=0 lpr=26 crt=22'2 lcod 21'1 mlcod 0'0 active+clean ps=[2~1]] do_peering_event: epoch_sent: 28 epoch_requested: 28 MLeaseAck epoch 28 from osd.1 pg_lease_ack(ruub 71.698279127s)
2020-03-16T19:54:07.999+0000 7f944a1b8700 20 osd.0 pg_epoch: 28 pg[1.2( v 22'2 (0'0,22'2] local-lis/les=26/27 n=1 ec=17/17 lis/c=26/26 les/c/f=27/27/0 sis=26 pruub=11.761122524s) [0,1,2] r=0 lpr=26 crt=22'2 lcod 21'1 mlcod 0'0 active+clean ps=[2~1]] recalc_readable_until peer osd.1 ruub 71.698279127s
2020-03-16T19:54:07.999+0000 7f944a1b8700 20 osd.0 pg_epoch: 28 pg[1.2( v 22'2 (0'0,22'2] local-lis/les=26/27 n=1 ec=17/17 lis/c=26/26 les/c/f=27/27/0 sis=26 pruub=11.761122524s) [0,1,2] r=0 lpr=26 crt=22'2 lcod 21'1 mlcod 0'0 active+clean ps=[2~1]] recalc_readable_until peer osd.2 ruub 71.698279127s
2020-03-16T19:54:07.999+0000 7f944a1b8700 20 osd.0 pg_epoch: 28 pg[1.2( v 22'2 (0'0,22'2] local-lis/les=26/27 n=1 ec=17/17 lis/c=26/26 les/c/f=27/27/0 sis=26 pruub=11.761122524s) [0,1,2] r=0 lpr=26 crt=22'2 lcod 21'1 mlcod 0'0 active+clean ps=[2~1]] recalc_readable_until readable_until[_ub] 71.698279127s (sent 71.698279127s)
2020-03-16T19:54:07.999+0000 7f944a1b8700 20 osd.0 pg_epoch: 28 pg[1.2( v 22'2 (0'0,22'2] local-lis/les=26/27 n=1 ec=17/17 lis/c=26/26 les/c/f=27/27/0 sis=26 pruub=11.761122524s) [0,1,2] r=0 lpr=26 crt=22'2 lcod 21'1 mlcod 0'0 active+clean ps=[2~1]] recheck_readable wasn't wait or laggy
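To make the race concrete, here is a minimal, hypothetical Python model of the epoch lag visible in the two log excerpts (none of these class or method names are Ceph APIs): the monitor publishes a newer osdmap epoch marking osd.0 down, but osd.0 has only consumed the older epoch, so from its own point of view it is still up and keeps acking pg leases.

```python
# Hypothetical sketch, NOT Ceph code: models an osd that lags behind the
# monitor's osdmap epoch and therefore does not yet know it was marked down.

class Mon:
    def __init__(self, epoch):
        self.epoch = epoch      # current published osdmap epoch
        self.down = set()       # osd ids marked down in the current map

    def mark_down(self, osd_id):
        # Publishing a new map bumps the epoch and records the osd as down.
        self.epoch += 1
        self.down.add(osd_id)

class Osd:
    def __init__(self, osd_id, epoch):
        self.id = osd_id
        self.epoch = epoch      # last osdmap epoch this osd has consumed

    def thinks_self_up(self, mon):
        # The osd only learns it is down once it catches up to the epoch
        # in which the monitor marked it down; on a stale map it still
        # believes it is up (and keeps acking leases, as in the logs).
        if self.epoch >= mon.epoch:
            return self.id not in mon.down
        return True

mon = Mon(epoch=28)
osd0 = Osd(0, epoch=28)
mon.mark_down(0)                   # mon publishes e29 with osd.0 down
print(osd0.thinks_self_up(mon))    # → True: osd.0 is stuck at e28
osd0.epoch = mon.epoch             # once osd.0 consumes e29...
print(osd0.thinks_self_up(mon))    # → False: it sees itself marked down
```

In this toy model the mismatch resolves as soon as osd.0 consumes the new map; the bug is that it never does, matching the stuck-at-e28 log lines above.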