Bug #44419


ops stuck on "wait for new map" for no apparent reason

Added by Nikola Ciprich about 4 years ago. Updated about 4 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
OSDMap
Target version:
% Done:
0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I'd like to report a problem we've hit on one of our Mimic clusters
(13.2.6 as well as 13.2.6). Any manipulation of an OSD (e.g. a restart) causes a
lot of slow ops stuck waiting for a new map. Those seem to be slowed down by the SATA
OSDs, which stay 100% busy reading for a long time until all the ops are gone,
blocking ops on unrelated NVMe pools - the SATA pools are completely unused at the moment.

Is it possible that those maps are being requested from the slow SATA OSDs,
and that is why it takes such a long time? Why would it take so long?
The cluster is very small and under very light load.

When we restarted one of the nodes, it literally took hours for peering to finish
because of waiting for maps. We've done all possible network checks, as well as
hard drive checks, and everything seems to be in order.

We can easily reproduce the problem. I'll have a maintenance window soon, so I'll try
to gather as much debug info as possible.
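
For reference, a minimal sketch of how the stuck ops and the SATA read load could be confirmed (osd.0 and /dev/sdb are placeholders, and the commands assume access to the OSD admin sockets):

  # list in-flight ops on one OSD and look for map-related wait states
  ceph daemon osd.0 dump_ops_in_flight | grep -i map

  # watch whether the suspect SATA device is busy with reads while ops are stuck
  iostat -x 1 /dev/sdb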


Related issues 1 (0 open, 1 closed)

Is duplicate of Ceph - Bug #45400: mon/OSDMonitor: maps not trimmed if osds are down (Resolved, assignee: Joao Eduardo Luis)

Actions #1

Updated by Nikola Ciprich about 4 years ago

While digging deeper, I noticed that when the cluster gets into this
state, osd_map_cache_miss on the OSDs starts growing rapidly. Even when
I increased the osd map cache size to 500 (which was the default at least
for Luminous), it behaves the same.

I think this could be related.
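
For reference, the counter and the setting can be checked and changed roughly like this (osd.0 is a placeholder, jq is assumed to be available, and the config-database form assumes Mimic or later):

  # watch the map cache miss counter on one OSD
  ceph daemon osd.0 perf dump | jq '.osd.osd_map_cache_miss'

  # raise the cache size on a running daemon...
  ceph daemon osd.0 config set osd_map_cache_size 500

  # ...or cluster-wide via the config database
  ceph config set osd osd_map_cache_size 500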

Actions #2

Updated by Nikola Ciprich about 4 years ago

So I can confirm that, at least in my case, the problem is caused
by old osdmaps not being pruned for some reason and thus not fitting
into the cache. When I increased the osd map cache size to 5000, the problem went away.

The question is why they're not being pruned, even though the cluster is in a
healthy state and there are no down OSDs.
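
A rough way to see how many osdmap epochs are being kept around (and hence whether trimming has stopped) is to compare the monitors' committed range with what a single OSD still holds; osd.0 is a placeholder and jq is assumed to be available:

  # range of osdmap epochs the monitors still keep
  ceph report 2>/dev/null | jq '.osdmap_first_committed, .osdmap_last_committed'

  # range a single OSD keeps on disk
  ceph daemon osd.0 status | jq '.oldest_map, .newest_map'

On a healthy cluster the monitor range should normally stay on the order of a few hundred epochs; a gap of many thousands would match the cache-miss behaviour described above.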

Actions #3

Updated by Greg Farnum about 4 years ago

  • Status changed from New to Duplicate
Actions #4

Updated by Greg Farnum about 4 years ago

  • Is duplicate of Bug #37875: osdmaps aren't being cleaned up automatically on healthy cluster added
Actions #5

Updated by Nathan Cutler almost 4 years ago

  • Is duplicate of Bug #45400: mon/OSDMonitor: maps not trimmed if osds are down added
Actions #6

Updated by Nathan Cutler almost 4 years ago

  • Is duplicate of deleted (Bug #37875: osdmaps aren't being cleaned up automatically on healthy cluster)
