Bug #10431

Updated by Loic Dachary about 5 years ago

We were debugging a PG stuck in peering. It may be due to a peering event being lost or never handled.

We found that some threads call osd->peering_queue.push_back without holding osd_lock. This can cause a race condition when another thread (usually a dispatcher thread) calls push_back on peering_queue at the same time.

We found at least one such path: when handling a FlushedEvt, the thread calls push_back on the OSD's peering_queue.

Can we add a check to ensure the thread holds the lock when calling osd->peering_wq.queue(PG*)?
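
A rough sketch of the kind of check being asked for, not the actual OSD code: the mutex records which thread owns it, and the queue asserts ownership before touching the container. OwnedMutex, PeeringQueue, and the PG struct here are hypothetical stand-ins written only for illustration; the real osd_lock and peering_wq types may look quite different.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a PG; only here so the sketch compiles.
struct PG {
  int id;
};

// Hypothetical mutex wrapper that remembers the owning thread.
class OwnedMutex {
public:
  void lock() {
    m_.lock();
    owner_.store(std::this_thread::get_id());
  }
  void unlock() {
    owner_.store(std::thread::id());  // reset to "no thread" before releasing
    m_.unlock();
  }
  // True only when the calling thread currently holds the mutex.
  bool is_locked_by_me() const {
    return owner_.load() == std::this_thread::get_id();
  }

private:
  std::mutex m_;
  std::atomic<std::thread::id> owner_{std::thread::id()};
};

// Hypothetical queue that refuses to be modified without the lock held.
class PeeringQueue {
public:
  explicit PeeringQueue(OwnedMutex& lock) : lock_(lock) {}

  void queue(PG* pg) {
    // The check: abort in debug builds if the caller forgot the lock.
    assert(lock_.is_locked_by_me());
    q_.push_back(pg);
  }

  std::size_t size() const { return q_.size(); }

private:
  OwnedMutex& lock_;
  std::deque<PG*> q_;
};

// Correct usage: take the lock, then queue.
inline std::size_t queue_with_lock(OwnedMutex& lock, PeeringQueue& q, PG* pg) {
  std::lock_guard<OwnedMutex> guard(lock);
  q.queue(pg);
  return q.size();
}
```

With a check like this, a path such as the FlushedEvt handler would trip the assert in a debug build as soon as it queued without the lock, instead of racing silently. The assert is compiled out when NDEBUG is defined, so it costs nothing in release builds.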

* firefly equivalent change commit:852d7b5b3c019c02c042b767fc88916088e1a94d