ops stuck on "wait for new map" for no apparent reason
I'd like to report a problem we've hit on one of our mimic clusters
(13.2.6). Any manipulation with an OSD (e.g. a restart) causes a lot of
slow ops stuck on "waiting for new map". The slowdown seems to come from the SATA
OSDs, which stay 100% busy reading for a long time until all the ops are gone,
blocking ops on unrelated NVMe pools - the SATA pools are completely unused at the moment.
Is it possible that those maps are being requested from the slow SATA OSDs,
and that's why they take so long to arrive? Why would fetching them take so long?
The cluster is very small, with a very light load.
When we restarted one of the nodes, it literally took hours for peering to finish
due to waiting for maps. We've done all the network checks we could, as well as
hard-drive checks, and everything seems to be in order.
We can easily reproduce the problem. I'll have a maintenance window soon, so I'll try
to gather as much debug info as possible.
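For anyone trying to reproduce this, a minimal sketch of how the stuck ops can be inspected via the OSD admin socket (assuming `osd.0` as a placeholder for an affected OSD; the daemon must be running on the node where the command is issued):

```shell
# List ops currently in flight on the OSD; entries blocked on map
# delivery show "wait for new map" in their event history.
ceph daemon osd.0 dump_ops_in_flight

# Recently completed slow ops, with per-event timestamps, which helps
# show where the time was actually spent.
ceph daemon osd.0 dump_historic_ops
```

The per-event timestamps in `dump_historic_ops` should make it possible to confirm whether the waiting really is on map delivery rather than on the disk or network.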
#1 Updated by Nikola Ciprich 4 months ago
While digging deeper, I noticed that when the cluster gets into this
state, osd_map_cache_miss on the OSDs starts growing rapidly. Even when
I increased the OSD map cache size to 500 (which was the default at least
for luminous), it behaves the same.
I think this could be related.
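A quick sketch of how the cache-miss growth can be watched, again using `osd.0` as a placeholder (exact counter names may differ slightly between releases):

```shell
# Dump the OSD's perf counters and pull out the map cache statistics.
# A rapidly growing miss count while the cluster is otherwise idle
# suggests the cached epoch range is smaller than what peers request.
ceph daemon osd.0 perf dump | jq '.osd | {osd_map_cache_hit, osd_map_cache_miss}'
```

Running this a few seconds apart during an OSD restart should show whether the misses track the slow ops.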
#2 Updated by Nikola Ciprich 4 months ago
So I can confirm that, at least in my case, the problem is caused
by old OSD maps not being pruned for some reason, and therefore not fitting
into the cache. When I increased the OSD map cache size to 5000, the problem went away.
The question is why they're not being pruned, even though the cluster is in a
healthy state and there are no down OSDs.
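Two checks that may help narrow this down (a sketch, not a fix; field and option names are as I understand them for mimic and may need adjusting):

```shell
# Compare the oldest and newest committed osdmap epochs held by the
# monitors. A very large gap means old maps are not being trimmed,
# which is consistent with the cache being too small to cover the range.
ceph report 2>/dev/null | jq '.osdmap_first_committed, .osdmap_last_committed'

# Temporarily raise the map cache on all OSDs at runtime, as described
# above, to confirm the workaround without restarting daemons.
ceph tell osd.* injectargs '--osd_map_cache_size 5000'
```

If the epoch gap is in the thousands, the real question becomes why the monitors are holding on to that many maps despite the cluster being healthy.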