Bug #11814: implicit erasure code crush ruleset is not validated

Updated by Loïc Dachary almost 9 years ago

h3. Context 

 * RHEL6 
 * Hammer 0.94.1 
 * 3 Mons 
 * 315 OSDs 

 h3. Steps to reproduce (crushmap is attached to the ticket) 

 * $profile = k8m4isa 
 * ceph osd erasure-code-profile set k8m4isa plugin=isa k=8 m=4 technique=reed_sol_van ruleset-root=bigbang ruleset-failure-domain=host 
 * $pool = castor-ec-isa 
 * ceph osd pool create $pool 4096 4096 erasure k8m4isa castor-ec-isa 

 The mon should crash immediately. We got this backtrace:
 <pre> 
 #0    crush_choose_indep (map=0x363fbc0, bucket=0x0, weight=0x36fa300, weight_max=315, x=-733087052, left=12, numrep=12, type=1, out=0x7fffffffbcb0, outpos=0, tries=100, recurse_tries=5, recurse_to_leaf=1, out2=0x7fffffffbce0, parent_r=0) at crush/mapper.c:664 
 #1    0x000000000079ec61 in crush_do_rule (map=0x363fbc0, ruleno=<value optimized out>, x=-733087052, result=0x7fffffffbd20, result_max=12, weight=0x36fa300, weight_max=315, scratch=0x7fffffffbc80) at crush/mapper.c:930 
 #2    0x000000000080cdc5 in CrushWrapper::do_rule (this=<value optimized out>, rule=10, x=-733087052, out=std::vector of length 0, capacity 0, maxout=12, weight=std::vector of length 315, capacity 315 = {...}) at crush/CrushWrapper.h:1025 
 #3    0x0000000000836c06 in OSDMap::_pg_to_osds (this=0x3888988, pool=..., pg=..., osds=0x7fffffffbea0, primary=0x7fffffffbecc, ppps=0x7fffffffbec4) at osd/OSDMap.cc:1521 
 #4    0x0000000000837044 in OSDMap::_pg_to_up_acting_osds (this=0x3888988, pg=..., up=0x7fffffffc330, up_primary=0x7fffffffc36c, acting=0x7fffffffc0d0, acting_primary=0x7fffffffc368) at osd/OSDMap.cc:1702 
 #5    0x000000000065c154 in pg_to_up_acting_osds (this=0x3740e00) at osd/OSDMap.h:677 
 #6    PGMonitor::map_pg_creates (this=0x3740e00) at mon/PGMonitor.cc:1127 
 #7    0x000000000065cd7d in PGMonitor::post_paxos_update (this=0x3740e00) at mon/PGMonitor.cc:311 
 #8    0x0000000000583431 in Monitor::refresh_from_paxos (this=0x3878000, need_bootstrap=0x0) at mon/Monitor.cc:791 
 #9    0x00000000005836d5 in Monitor::init_paxos (this=0x3878000) at mon/Monitor.cc:766 
 #10 0x000000000059a411 in Monitor::preinit (this=0x3878000) at mon/Monitor.cc:651 
 #11 0x000000000055519a in main (argc=<value optimized out>, argv=0x36b00b0) at ceph_mon.cc:731 
 </pre> 

 In fact, I just noticed that the probable cause of the crash is that we created the erasure-code-profile with ruleset-root=bigbang, but this root has been decommissioned. Had I noticed that earlier, I would have fixed the parameter, and the MONs would not have crashed (as far as I can tell).
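
 Frame #0 of the backtrace shows crush_choose_indep being called with bucket=0x0, which is consistent with the rule pointing at a root that no longer exists. As a stopgap until the monitor validates this itself, the check below is a minimal sketch of what an operator could run before creating the pool. It is not Ceph code. The function names are hypothetical, and it assumes that the CRUSH dump is a JSON object with a top-level "buckets" list whose entries carry a "name" field, and that the profile is a flat key/value object in which a missing ruleset-root means "default".

```python
def root_exists(crush_dump: dict, root_name: str) -> bool:
    """Return True if a bucket called root_name appears in the CRUSH dump.

    Assumption: crush_dump has a top-level "buckets" list and each
    bucket is a dict with a "name" key.
    """
    return any(b.get("name") == root_name
               for b in crush_dump.get("buckets", []))


def check_profile_root(profile: dict, crush_dump: dict) -> str:
    """Return the profile's ruleset-root, or raise ValueError if it is missing.

    Assumption: a profile without ruleset-root falls back to "default".
    """
    root = profile.get("ruleset-root", "default")
    if not root_exists(crush_dump, root):
        raise ValueError(
            "ruleset-root %r not found in CRUSH map; "
            "do not create a pool from this profile" % root)
    return root


# Illustrative data only: a CRUSH map without the decommissioned root,
# and the profile from the steps to reproduce.
sample_crush = {"buckets": [{"name": "default"}, {"name": "host-a"}]}
sample_profile = {
    "plugin": "isa",
    "k": "8",
    "m": "4",
    "technique": "reed_sol_van",
    "ruleset-root": "bigbang",
    "ruleset-failure-domain": "host",
}

try:
    check_profile_root(sample_profile, sample_crush)
except ValueError as err:
    print("refusing to create pool:", err)
```

 On a live cluster the two dictionaries could presumably be loaded from the JSON output of "ceph osd crush dump" and "ceph osd erasure-code-profile get k8m4isa"; the exact output layout should be checked against the running release before relying on this.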
