Bug #12876
monitor crashed in CrushWrapper::do_rule()
% Done:
0%
Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
(gdb) bt
#0  0x00007f071a05020b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00000000009a996d in reraise_fatal (signum=11) at global/signal_handler.cc:59
#2  handle_fatal_signal (signum=11) at global/signal_handler.cc:109
#3  <signal handler called>
#4  crush_do_rule (map=0x52b0d40, ruleno=<optimized out>, x=211857128, result=0x7fff2f78dcc0, result_max=8, weight=0x53ae5a0, weight_max=120, scratch=<optimized out>) at crush/mapper.c:937
#5  0x00000000007a85cb in do_rule (weight=..., maxout=8, out=..., x=211857128, rule=2, this=0x536a680) at ./crush/CrushWrapper.h:1026
#6  OSDMap::_pg_to_osds (this=this@entry=0x53ec088, pool=..., pg=..., osds=osds@entry=0x7fff2f78dd80, primary=primary@entry=0x7fff2f78de40, ppps=ppps@entry=0x7fff2f78dd74) at osd/OSDMap.cc:1521
#7  0x00000000007a8a64 in OSDMap::pg_to_raw_up (this=this@entry=0x53ec088, pg=..., up=up@entry=0x7fff2f78de60, primary=primary@entry=0x7fff2f78de40) at osd/OSDMap.cc:1676
#8  0x00000000007ab8f7 in OSDMap::remove_redundant_temporaries (cct=0x5272000, osdmap=..., pending_inc=pending_inc@entry=0x53ec298) at osd/OSDMap.cc:1198
#9  0x000000000060fdb9 in OSDMonitor::create_pending (this=0x53ec000) at mon/OSDMonitor.cc:885
#10 0x00000000006047b9 in PaxosService::_active (this=this@entry=0x53ec000) at mon/PaxosService.cc:272
#11 0x0000000000604ad7 in PaxosService::election_finished (this=0x53ec000) at mon/PaxosService.cc:250
#12 0x00000000005c34a6 in Monitor::win_election (this=this@entry=0x52bab00, epoch=epoch@entry=1, active=..., features=features@entry=1125899906842623, cmdset=0xd14f80 <mon_commands>, cmdsize=168, classic_monitors=classic_monitors@entry=0x0) at mon/Monitor.cc:1848
#13 0x00000000005c388c in Monitor::win_standalone_election (this=this@entry=0x52bab00) at mon/Monitor.cc:1803
#14 0x00000000005c42eb in Monitor::bootstrap (this=this@entry=0x52bab00) at mon/Monitor.cc:929
#15 0x00000000005c4645 in Monitor::init (this=0x52bab00) at mon/Monitor.cc:742
#16 0x00000000005769c0 in main (argc=<optimized out>, argv=<optimized out>) at ceph_mon.cc:750
This looks more like a CRUSH bug than a monitor bug: the segfault (signal 11) is raised in crush_do_rule().
Related issues
History
#1 Updated by Kefu Chai over 8 years ago
- Description updated (diff)
#2 Updated by Kefu Chai over 8 years ago
to reproduce this issue:
tar xzvf ~/eino-utu-storedb.tar.gz -C /tmp/mon.a
./ceph-monstore-tool /tmp/mon.a get monmap -- --out /tmp/monmap
./monmaptool --print /tmp/monmap # find out the mon name
./monmaptool --rm cephmon-test-02 /tmp/monmap # remove it
./monmaptool --add a 127.0.0.1:6789 /tmp/monmap # replace it with mine
./ceph-mon -i a -c /home/kefu/dev/ceph/src/ceph.conf --mon-data /tmp/mon.a --inject-monmap /tmp/monmap # inject the cooked monmap into the monstore
./ceph-mon -i a --mon-data /tmp/mon.a -f
#3 Updated by Kefu Chai over 8 years ago
- Status changed from New to In Progress
#4 Updated by Kefu Chai over 8 years ago
- File crush.src added
#5 Updated by Kefu Chai over 8 years ago
rule ecpool {
    ruleset 2
    type erasure
    min_size 3
    max_size 20
    step set_chooseleaf_tries 20
    step take default
    step choose indep 5 type host
    step chooseleaf indep 0 type osd
    step emit
}
This rule has two choose steps (choose followed by chooseleaf). Without it, the check passes.
./crushtool --compile /tmp/crush.src -o /tmp/crushmap.compiled
./crushtool -i /tmp/crushmap.compiled --test --check 60 --min-x 1 --max-x 50
So we need to validate CRUSH rules before injecting them into the osdmap.
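One way to script such a check is to run the same crushtool test shown above in a child process and reject the map on any non-zero exit or fatal signal, so that a crash in the mapper kills only the child. This is a minimal sketch, not the fix that went into Ceph: the helper names (build_check_cmd, crush_map_passes), the default ./crushtool path, and the 10 second timeout are assumptions for illustration. Only the crushtool flags are taken from the commands in this report, and the sketch assumes a failed check shows up as a non-zero exit status.

```python
import subprocess


def build_check_cmd(compiled_map, max_id, min_x, max_x, crushtool="./crushtool"):
    """Build the crushtool smoke-test command used earlier in this report."""
    return [
        crushtool, "-i", compiled_map,
        "--test", "--check", str(max_id),
        "--min-x", str(min_x), "--max-x", str(max_x),
    ]


def crush_map_passes(compiled_map, max_id=60, min_x=1, max_x=50,
                     crushtool="./crushtool", timeout=10, run=subprocess.run):
    """Return True only if the crushtool test exits cleanly.

    subprocess reports a child killed by a signal as a negative return code
    (for example -11 for SIGSEGV), so a crash counts as a failed check.
    The `run` parameter is a hypothetical hook so the logic can be exercised
    without a crushtool binary.
    """
    cmd = build_check_cmd(compiled_map, max_id, min_x, max_x, crushtool)
    try:
        proc = run(cmd, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        # A hung or missing tool is treated as "do not inject this map".
        return False
    return proc.returncode == 0
```

A caller would compile the candidate map first, then inject it only if crush_map_passes("/tmp/crushmap.compiled") returns True.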
#6 Updated by Kefu Chai over 8 years ago
- Status changed from In Progress to Duplicate