
Backport #13060

Updated by Loïc Dachary over 8 years ago

https://github.com/ceph/ceph/pull/5892 

 On Fri, 11 Sep 2015, Haomai Wang wrote: 
 > On Fri, Sep 11, 2015 at 8:56 PM, Sage Weil <sage@newdream.net> wrote: 
 >         On Fri, 11 Sep 2015, ?? wrote: 
 >         > Thank Sage Weil: 
 >         >  
 >         > 1. I deleted some testing pools in the past, but that was a 
 >         long time ago (maybe two months ago); no pools were deleted 
 >         during the recent upgrade. 
 >         > 2. ceph osd dump: please see the attachment 
 >         (ceph.osd.dump.log) 
 >         > 3. 'debug osd = 20' and 'debug filestore = 20': attachment 
 >         (ceph.osd.5.log.tar.gz) 
 >        
 >         This one is failing on pool 54, which has been deleted. In this 
 >         case you can work around it by renaming current/54.* out of 
 >         the way. 
 >        
 >         > 4. I installed ceph-test, but the command outputs an error: 
 >         > ceph-kvstore-tool /ceph/data5/current/db list 
 >         > Invalid argument: /ceph/data5/current/db: does not exist 
 >         (create_if_missing is false) 
 > 
 >         Sorry, I should have said current/omap, not current/db. I'm 
 >         still curious to see the key dump. I'm not sure why the leveldb 
 >         key for these pgs is missing... 
 >        
 >        
 > Yesterday I had a chat with wangrui, and the reason is that "infos" (the 
 > legacy oid) is missing. I'm not sure why it's missing. 
   
 Probably 
   
 https://github.com/ceph/ceph/blob/hammer/src/osd/OSD.cc#L2908 
   
 Oh, I think I see what happened: 
   
 - the pg removal was aborted pre-hammer. On pre-hammer, this means that 
load_pgs skips it here: 
   
  https://github.com/ceph/ceph/blob/firefly/src/osd/OSD.cc#L2121 
        
 - we upgrade to hammer. We skip this pg (same reason), don't upgrade it, 
but delete the legacy infos object 
 
  https://github.com/ceph/ceph/blob/hammer/src/osd/OSD.cc#L2908 
 
  - now we see this crash... 
        
 I think the fix is, in hammer, to bail out of peek_map_epoch if the infos 
 object isn't present, here 
        
  https://github.com/ceph/ceph/blob/hammer/src/osd/PG.cc#L2867 
        
 Probably we should restructure so we can return a 'fail' value  
 instead of a magic epoch_t meaning the same... 
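
The restructuring suggested above might look roughly like this. It is a simplified sketch, not the real Ceph code: `InfosOmap`, the string pgid key, and the function signature are all stand-ins; the point is only that peek_map_epoch reports failure explicitly instead of encoding it in a magic epoch_t value.

```cpp
#include <cstdint>
#include <map>
#include <string>

using epoch_t = uint32_t;

// Hypothetical stand-in for the omap of the legacy "infos" object.
using InfosOmap = std::map<std::string, epoch_t>;

// Return an explicit success flag and pass the epoch through an
// out-parameter, so the caller can bail out cleanly when the infos
// entry for this pg is missing.
bool peek_map_epoch(const InfosOmap& infos,
                    const std::string& pgid,
                    epoch_t* out_epoch) {
  auto it = infos.find(pgid);
  if (it == infos.end())
    return false;  // infos object/key missing: let load_pgs skip this pg
  *out_epoch = it->second;
  return true;
}
```

A caller in load_pgs would then skip the pg when the call returns false, rather than comparing the result against a sentinel epoch.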
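
The manual workaround mentioned earlier (renaming current/54.* out of the way) amounts to moving the leftover PG directories of the deleted pool aside so load_pgs no longer finds them. A minimal sketch, with assumed directory names and an assumed ".removed" suffix (in practice a simple mv in the OSD's current/ directory does the same thing):

```cpp
#include <filesystem>
#include <iostream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Move every directory belonging to the given (deleted) pool out of the
// OSD's current/ directory by appending a ".removed" suffix. Pool id and
// paths are illustrative; run only with the OSD stopped.
void quarantine_pool_dirs(const fs::path& current, const std::string& pool) {
  const std::string prefix = pool + ".";
  std::vector<fs::path> victims;
  for (const auto& entry : fs::directory_iterator(current)) {
    const std::string name = entry.path().filename().string();
    if (name.rfind(prefix, 0) == 0)  // name starts with e.g. "54."
      victims.push_back(entry.path());
  }
  for (const auto& p : victims) {
    fs::rename(p, fs::path(p.string() + ".removed"));
    std::cout << "moved " << p.filename().string() << '\n';
  }
}
```

Collecting the paths before renaming avoids mutating the directory while a directory_iterator is still walking it.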
