Backport #14592

Updated by Kefu Chai about 8 years ago

see http://tracker.ceph.com/issues/13990#note-39. 

 reproduce steps: 

 # The monitor sends pg-create messages whose @pg_create.created@ is @pool.last_change@, and for a newly created pool @pool.last_change@ is the @OSDMonitor.pending_inc.epoch@ at that moment. But these PGs fail to be created because 

   
   * some osd is down but not out, or 
   * some osd's osd_debug_drop_pg_create_probability is 1.0 // this option is only available in hammer 
 # Meanwhile, other changes keep updating the osdmap. Once the number of osdmap epochs reaches the threshold, the monitor starts to trim them. 
 # Then the OSDs are back in business and start to process the pg-create messages. These messages reference old osdmaps that were already trimmed by the mon and the OSDs, so when an OSD tries to build the prior set, those maps are missing and an assertion fails. 
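
 The sequence above can be illustrated with a toy model. This is not Ceph code: @ToyMonitor@, @max_kept_epochs@ and @missing_epochs@ are hypothetical names, and only the monitor-side trimming is modelled. It is a sketch of why a late pg-create can reference epochs that no longer exist.

```python
# Toy model of the reproduce steps; all names are hypothetical, not Ceph APIs.

class ToyMonitor:
    def __init__(self, max_kept_epochs):
        self.max_kept_epochs = max_kept_epochs  # assumed trim threshold
        self.osdmaps = {1: "osdmap e1"}         # epoch -> stored map
        self.last_committed = 1

    def commit_new_epoch(self):
        """Any osdmap change produces a new epoch, then trimming runs."""
        self.last_committed += 1
        self.osdmaps[self.last_committed] = f"osdmap e{self.last_committed}"
        self.trim()

    def create_pool(self):
        """Step 1: the new pool's last_change is the epoch being committed.
        The returned value plays the role of pg_create.created."""
        self.commit_new_epoch()
        return self.last_committed

    def trim(self):
        """Step 2: drop epochs once more than max_kept_epochs are stored.
        Note that pending pg-creates are not consulted here."""
        floor = self.last_committed - self.max_kept_epochs + 1
        for epoch in [e for e in self.osdmaps if e < floor]:
            del self.osdmaps[epoch]


def missing_epochs(mon, created):
    """Step 3: epochs from `created` onward that an OSD would need
    for the prior set but that have already been trimmed."""
    return [e for e in range(created, mon.last_committed + 1)
            if e not in mon.osdmaps]


mon = ToyMonitor(max_kept_epochs=3)
created = mon.create_pool()      # pg-create is sent but not processed
for _ in range(5):               # unrelated osdmap updates in the meantime
    mon.commit_new_epoch()
lost = missing_epochs(mon, created)
```

 In this model a non-empty @lost@ list corresponds to the situation in which the real OSD hits the assertion.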

 a possible fix could be: 

 The monitor should not trim the osdmaps until the pg-create messages that reference them have been processed by the OSDs.
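
 A minimal sketch of that idea, under the assumption that the monitor can track the @created@ epochs of pg-create messages that are still outstanding. The function @trim_floor@ and its parameters are hypothetical and only illustrate the rule; they are not the actual OSDMonitor interface.

```python
# Hypothetical helper, not Ceph code: compute the lowest epoch that must be kept.

def trim_floor(last_committed, max_kept_epochs, pending_pg_create_epochs):
    """Return the first epoch that must NOT be trimmed.

    Epochs strictly below the returned value are safe to remove.
    pending_pg_create_epochs holds the `created` epoch of every
    pg-create that no OSD has processed yet (an assumed bookkeeping set).
    """
    # Ordinary policy: keep only the newest max_kept_epochs maps.
    floor = last_committed - max_kept_epochs + 1
    # Proposed fix: never trim past the oldest outstanding pg-create.
    if pending_pg_create_epochs:
        floor = min(floor, min(pending_pg_create_epochs))
    # Epoch numbering starts at 1 in this sketch.
    return max(floor, 1)


with_pending = trim_floor(7, 3, [2])     # a pg-create from epoch 2 is pending
without_pending = trim_floor(7, 3, [])   # nothing pending, normal trimming
```

 With a pending pg-create the floor stays at its @created@ epoch, so the maps needed to build the prior set remain available. Once the set of pending epochs is empty, trimming falls back to the usual threshold.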
