PR #14886 creates a SafeTimer thread per PG
This appears to be causing huge numbers of idle safe_timer threads to be created on the OSD. While that is likely not ideal for a variety of reasons, the immediate fallout is that wallclock profiling becomes ridiculously slow and gathering backtraces from gdb becomes really irritating.
#1 Updated by Brad Hubbard 2 months ago
Hi Mark, thanks for the report. The intention was to increase performance: sleeping asynchronously frees up the work queue to make progress, whereas prior to 14886 the sleep was synchronous. I'll see what can be done about the issues you are seeing. Do you see the same sort of issue (to a lesser extent) when snap trimming is in progress? The sleep in that case is implemented in a very similar fashion.
#3 Updated by Greg Farnum 2 months ago
I guess in particular, the issue is that there is a cost per thread (not a huge one, but it exists). Mark is seeing it pretty aggressively because the change dramatically increases the number of threads and he's doing hacks with libunwind on every thread, but it also manifests as higher VSS and RSS that we don't want. It should be pretty simple to have a shared Timer object instead of one per PG.
(I'm not sure if there's any particular reason we have more than one Timer to begin with; I guess just because they interact with different threads and locks and we don't want them to contend?)
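
For illustration only, here is a minimal sketch of what "one shared timer instead of one per PG" could look like. It is not Ceph's SafeTimer and does not use any Ceph API: the `SharedTimer` class, its `schedule_after` method, and the `demo_shared_timer` helper are all hypothetical names, and the sketch assumes callbacks are short and do their own locking. The point is just that many PGs can queue delayed callbacks on a single thread, so the thread count stays constant as the PG count grows.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <map>
#include <mutex>
#include <thread>

// Hypothetical process-wide timer: one thread serves every caller.
class SharedTimer {
public:
  using Clock = std::chrono::steady_clock;

  SharedTimer() : worker_([this] { run(); }) {}

  // Stops the thread; events that have not fired yet are dropped.
  ~SharedTimer() {
    {
      std::lock_guard<std::mutex> l(lock_);
      stopping_ = true;
    }
    cond_.notify_one();
    worker_.join();
  }

  // Queue cb to run on the timer thread once `delay` has elapsed.
  void schedule_after(std::chrono::milliseconds delay,
                      std::function<void()> cb) {
    {
      std::lock_guard<std::mutex> l(lock_);
      events_.emplace(Clock::now() + delay, std::move(cb));
    }
    cond_.notify_one();
  }

private:
  void run() {
    std::unique_lock<std::mutex> l(lock_);
    while (!stopping_) {
      if (events_.empty()) {
        cond_.wait(l);  // woken by schedule_after or the destructor
        continue;
      }
      auto next = events_.begin();
      const auto deadline = next->first;
      if (Clock::now() < deadline) {
        cond_.wait_until(l, deadline);  // an earlier event may arrive meanwhile
        continue;
      }
      auto cb = std::move(next->second);
      events_.erase(next);
      l.unlock();  // run the callback without holding the timer lock
      cb();
      l.lock();
    }
  }

  std::mutex lock_;
  std::condition_variable cond_;
  std::multimap<Clock::time_point, std::function<void()>> events_;
  bool stopping_ = false;
  std::thread worker_;  // declared last so the state above exists before it runs
};

// Simulates 100 PGs, each queueing one short delayed callback on the same
// timer, and returns how many callbacks ran. Only one timer thread exists.
int demo_shared_timer() {
  const int num_pgs = 100;
  int fired = 0;  // protected by m
  std::mutex m;
  std::condition_variable done;
  {
    SharedTimer timer;
    for (int pg = 0; pg < num_pgs; ++pg) {
      timer.schedule_after(std::chrono::milliseconds(1 + pg % 5), [&] {
        std::lock_guard<std::mutex> l(m);
        ++fired;
        done.notify_one();
      });
    }
    std::unique_lock<std::mutex> l(m);
    done.wait(l, [&] { return fired == num_pgs; });
  }
  return fired;
}
```

The trade-off is the one raised above: with a single thread and a single lock, a slow callback delays every other PG's events, so a real implementation would have to decide how much contention is acceptable.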