Bug #10431

Updated by Loïc Dachary about 9 years ago

We were debugging a PG stuck in peering. It may be caused by a peering event that was lost or never handled.

We found that some threads call osd->peering_queue.push_back without holding osd_lock. This can cause a race condition when another thread (usually a dispatcher thread) calls push_back on peering_queue at the same time.
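The invariant at stake can be illustrated with a minimal C++ sketch, assuming a simplified queue. If every producer takes the same mutex before push_back, concurrent pushes cannot interleave inside the container, and no event is lost. PeeringQueue, PeeringEvent and run_producers are hypothetical names, not Ceph's actual classes. The report implies that osd_lock is meant to play the role of the shared mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a peering event; not Ceph's actual type.
struct PeeringEvent {
  int pg_id;
};

// Every producer takes the same mutex before push_back, so concurrent
// producers cannot corrupt the deque or drop events.
class PeeringQueue {
 public:
  void push_back(const PeeringEvent& ev) {
    std::lock_guard<std::mutex> guard(lock_);
    queue_.push_back(ev);
  }
  std::size_t size() const {
    std::lock_guard<std::mutex> guard(lock_);
    return queue_.size();
  }

 private:
  mutable std::mutex lock_;
  std::deque<PeeringEvent> queue_;
};

// Spawns `threads` producers, each pushing `per_thread` events, and
// returns the final queue length after all producers have joined.
std::size_t run_producers(int threads, int per_thread) {
  PeeringQueue q;
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&q, t, per_thread] {
      for (int i = 0; i < per_thread; ++i) q.push_back(PeeringEvent{t});
    });
  }
  for (auto& w : workers) w.join();
  return q.size();
}
```

With the lock in place, run_producers(4, 1000) always yields 4000 queued events. Removing the lock_guard in push_back makes this a data race on std::deque, which is undefined behavior and can lose or corrupt entries.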

We found that this happens at least when handling a FlushedEvt: the thread calls push_back on the OSD's peering_queue.

Can we add a check that asserts the calling thread holds the lock when calling osd->peering_wq.queue(PG*)?
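One possible shape for such a check, sketched in C++ under stated assumptions: a mutex wrapper that records its owning thread, and a queue() that asserts ownership before touching the work queue. OwnedMutex, PeeringWQ and is_locked_by_me are hypothetical names for illustration only. This is not Ceph's Mutex class or its real peering_wq interface.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical mutex wrapper that remembers which thread holds it,
// so callers can assert ownership. Not Ceph's Mutex class.
class OwnedMutex {
 public:
  void lock() {
    m_.lock();
    owner_.store(std::this_thread::get_id());
  }
  void unlock() {
    owner_.store(std::thread::id());  // empty id means "no owner"
    m_.unlock();
  }
  // True only if the calling thread currently holds the mutex.
  bool is_locked_by_me() const {
    return owner_.load() == std::this_thread::get_id();
  }

 private:
  std::mutex m_;
  std::atomic<std::thread::id> owner_{std::thread::id()};
};

struct PG {
  int id;
};

// Hypothetical work queue that enforces the "osd_lock held" precondition.
class PeeringWQ {
 public:
  explicit PeeringWQ(OwnedMutex& osd_lock) : osd_lock_(osd_lock) {}

  void queue(PG* pg) {
    // The checker: abort in debug builds if the caller skipped the lock.
    assert(osd_lock_.is_locked_by_me() && "osd_lock must be held");
    pending_.push_back(pg);
  }
  // Unsynchronized; intended for single-threaded inspection only.
  std::size_t size() const { return pending_.size(); }

 private:
  OwnedMutex& osd_lock_;
  std::deque<PG*> pending_;
};
```

Because OwnedMutex provides lock() and unlock(), callers can hold it with std::lock_guard<OwnedMutex> around queue(). Note that assert is compiled out when NDEBUG is defined, so a check meant to fire in release builds would need an unconditional abort instead.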

 * firefly equivalent change commit:852d7b5b3c019c02c042b767fc88916088e1a94d 
