Large RGW GC queue might prevent OSD from starting
It looks like rgw_gc_queue_list_entries might cause an HDD-based OSD to spend more than half an hour loading the queue.
This causes the OSD to be marked as down shortly after restart.
The root cause is that the queue (or rather some entry in it) takes approx. 60 MB, and rgw_gc_queue_list_entries reads it through the rather inefficient CLS queue_list_entries, which uses 1K (sic!) read chunks. This in turn permits only 7-8 reads per second from a spinning drive.
The relevant OSD log is attached.
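To illustrate how the chunk size drives the number of reads, here is a minimal Python sketch. It does not use any Ceph API: an in-memory buffer stands in for the queue entry, "60 MB" is taken as 60 MiB by assumption, and count_chunked_reads is a hypothetical helper name.

```python
import io


def count_chunked_reads(stream, chunk_size):
    """Read the stream to EOF in fixed-size chunks; return the number of reads."""
    reads = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty result means EOF
            break
        reads += 1
    return reads


# Assumption: the "approx. 60 MB" entry is modelled as exactly 60 MiB of zeros.
ENTRY_SIZE = 60 * 1024 * 1024

if __name__ == "__main__":
    # Compare the 1K chunk size from the report with two larger, arbitrary ones.
    for chunk_size in (1024, 64 * 1024, 4 * 1024 * 1024):
        reads = count_chunked_reads(io.BytesIO(bytes(ENTRY_SIZE)), chunk_size)
        print(f"chunk={chunk_size:>8} B -> {reads} reads")
```

On a spinning drive each of those reads may cost a seek, so the read count rather than the byte count is what dominates the load time.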
#3 Updated by Igor Fedotov about 2 months ago
Matt Benjamin wrote:
I think this could also be interacting with the issue being addressed here:
Matt, IIUC your point is that queue_list_entries might take too long due to a large bufferlist and the resulting copying overhead, is that correct?
IMO that's not the case for this ticket - the issue is that reading a 60MB entry using 1K chunks from a spinning drive is absolutely inappropriate - each read takes approx. 150ms, hence reading the full entry takes more than half an hour!!! And even worse - for some unknown reason this stalls all other communication with the OSD too...
Migrating the pool to SSD drives resolved the issue - so it's [disk] read performance that is crucial here...
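As a rough back-of-the-envelope check of these numbers, the sketch below multiplies the number of chunked reads by the per-read latency. It assumes a 60 MiB entry, 1 KiB per read, and a flat 150 ms per read with no caching or readahead; estimated_read_time_s is a hypothetical name, not a Ceph function.

```python
def estimated_read_time_s(entry_size, chunk_size, latency_s):
    """Total time to read entry_size bytes in chunk_size pieces at latency_s per read."""
    reads = -(-entry_size // chunk_size)  # ceiling division: a partial chunk still costs a read
    return reads * latency_s


# Assumptions taken from the figures quoted in this ticket.
ENTRY_SIZE = 60 * 1024 * 1024  # "60MB" modelled as 60 MiB
READ_LATENCY_S = 0.150         # approx. 150 ms per read on the spinning drive

if __name__ == "__main__":
    # 1K is the chunk size from the report; 1 MiB is an arbitrary comparison point.
    for chunk_size in (1024, 1024 * 1024):
        seconds = estimated_read_time_s(ENTRY_SIZE, chunk_size, READ_LATENCY_S)
        print(f"chunk={chunk_size} B: about {seconds / 60:.1f} minutes")
```

Under these assumptions the 1K case comes out to roughly 2.5 hours, which is consistent with "more than half an hour"; the real figure depends on the actual entry size and drive behaviour.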