OSD: shutdown of an OSD host causes slow requests
While stopping all OSDs on a host, I get slow ops for a few seconds. Sometimes this doesn't happen, and stopping a single OSD mostly works as expected.
I found the osd_fast_shutdown parameter, which is true by default. In this case the OSD doesn't announce its shutdown to the mons before it stops.
If I stop all OSDs on a host (24 OSDs), I see the following several hundred times (with different OSD numbers, of course) in my ceph.log:
cluster [DBG] osd.317 reported immediately failed by osd.202
The first message like this appears about 5 to 7 seconds after stopping the OSD. After down detection the cluster starts peering, and after that all is good. But the long detection delay causes slow ops.
If I set osd_fast_shutdown to false, I see the following in the ceph.log more or less immediately (one message per OSD):
cluster [INF] osd.837 marked itself down
In this case the whole process of detection and peering is much faster and I don't get any slow ops.
After reading the conversation, I don't understand why the fast_shutdown should be faster and better (with regard to down detection) than the normal one. But I can understand that it is not necessary to stop all subsystems.
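To compare the two behaviours on a real cluster, the two message shapes quoted above can be counted per OSD. The following is only a minimal sketch: the regular expressions are built from the two log lines in this report, the function name `summarize` is made up for illustration, and the second sample line (reporter osd.203) is invented sample data, not taken from my ceph.log.

```python
import re
from collections import Counter

# Patterns derived from the two message shapes quoted in this report.
FAILED_RE = re.compile(r"\bosd\.(\d+) reported immediately failed by osd\.(\d+)")
SELF_DOWN_RE = re.compile(r"\bosd\.(\d+) marked itself down")


def summarize(lines):
    """Count peer failure reports and self-announced shutdowns per OSD id.

    Hypothetical helper, not part of Ceph.
    """
    reported = Counter()   # OSD id -> number of "reported immediately failed" lines
    self_down = Counter()  # OSD id -> number of "marked itself down" lines
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            reported[int(match.group(1))] += 1
            continue
        match = SELF_DOWN_RE.search(line)
        if match:
            self_down[int(match.group(1))] += 1
    return reported, self_down


if __name__ == "__main__":
    # Illustrative input; the osd.203 line is invented for the example.
    sample = [
        "cluster [DBG] osd.317 reported immediately failed by osd.202",
        "cluster [DBG] osd.317 reported immediately failed by osd.203",
        "cluster [INF] osd.837 marked itself down",
    ]
    reported, self_down = summarize(sample)
    print("failure reports per OSD:", dict(reported))
    print("self-announced shutdowns per OSD:", dict(self_down))
```

With osd_fast_shutdown at its default, I would expect many failure reports per stopped OSD and no self-announced shutdowns; with it set to false, one "marked itself down" line per OSD.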
I wonder if it would make sense to send the shutdown message to the mons before stopping the OSD even in the fast shutdown process.
What do you think?
Each node is connected via a 2x10G LACP channel.
#13 Updated by Mauricio Oliveira over 1 year ago
I'd like to / can work on submitting the backport PRs, if that's OK.
In the future, if I want to open backport tracker issues with the
script, is it possible to get access so as not to get a ForbiddenError?
(I tried slightly before it was done, and hit this.)
INFO:root:Processing issue list ->46978<-
INFO:root:Processing 1 issues with status Pending Backport
Traceback (most recent call last):
  File "./src/script/backport-create-issue", line 343, in <module>
    ...
redminelib.exceptions.ForbiddenError: Requested resource is forbidden
#20 Updated by Mauricio Oliveira over 1 year ago
Hi @singuliere _,
Could you please revert the backport field to include Octopus and Nautilus?
Those backports have been done (see 'Related issues'), but are awaiting review.
Only Pacific has been merged at this time, perhaps due to release timing/focus.
There's no other comment or discussion in the backport bugs or PRs that suggests
these are no longer needed, so I'd like to ask for the backports to be kept.