Bug #7487
mon: crashes when moving CRUSH items in zero-weighted tree?
Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
Monitor
Target version:
-
% Done:
0%
Source:
Community (user)
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
See the thread "[ceph-users] ceph-mon segmentation fault"
I have compiled and installed ceph from sources on debian/jessie:

git clone --recursive -b v0.75 https://github.com/ceph/ceph.git
cd ceph/ && ./autogen.sh && ./configure && make && make install

/usr/local/bin/ceph-authtool --create-keyring /data/ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *'
/usr/local/bin/ceph-authtool --create-keyring /ceph.client.admin.keyring --gen-key -n client.admin --set-uid=0 --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow'
/usr/local/bin/ceph-authtool /data/ceph.mon.keyring --import-keyring /ceph.client.admin.keyring
/usr/local/bin/monmaptool --create --fsid e90dfd37-98d1-45bb-a847-8590a5ed8e71 /data/monmap
/usr/local/bin/ceph-mon --mkfs -i ceph-mon.dkctl --monmap /data/monmap --keyring /data/ceph.mon.keyring

My ceph.conf is (I have configured a local TLD dkctl. with a ceph-mon A record):

[global]
fsid = e90dfd37-98d1-45bb-a847-8590a5ed8e71
mon initial members = ceph-mon.dkctl
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
keyring = /ceph.client.admin.keyring
osd pool default size = 2
osd pool default min size = 2
osd pool default pg num = 333
osd pool default pgp num = 333
osd crush chooseleaf type = 1
osd journal size = 1000
filestore xattr use omap = true
mon host = ceph-mon.dkctl
mon addr = ceph-mon.dkctl
log file = /data/logs/ceph.log

[mon]
mon data = /data/mon
keyring = /data/ceph.mon.keyring
log file = /data/logs/mon.log

[osd.0]
osd host = osd0
osd data = /data/osd
osd journal = /data/osd.journal
log file = /data/logs/osd.log
keyring = /data/ceph.osd.keyring

I started ceph-mon:

/usr/local/bin/ceph-mon -c /ceph.conf --public-addr `grep ceph-mon /etc/hosts | awk '{print $1}'` -i ceph-mon.dkctl

After that, the following commands crashed the ceph-mon daemon:

root@ceph-mon:/# ceph osd crush add-bucket osd-host host
added bucket osd-host type host to crush map
root@ceph-mon:/# ceph osd crush move osd-host root=default
moved item id -2 name 'osd-host' to location {root=default} in crush map
root@ceph-mon:/# ceph osd crush add-bucket osd.0 osd
added bucket osd.0 type osd to crush map
root@ceph-mon:/# ceph osd tree
# id    weight  type name       up/down reweight
-3      0       osd osd.0
-1      0       root default
-2      0               host osd-host
root@ceph-mon:/# ceph osd crush move osd.0 host=osd-host
2014-02-18 16:00:14.093243 7ff077fff700  0 monclient: hunting for new mon
2014-02-18 16:00:14.093781 7ff07c130700  0 -- 172.17.0.160:0/1000148 >> 172.17.0.160:6789/0 pipe(0x7ff06c004770 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7ff06c0049d0).fault
2014-02-18 16:00:16.996981 7ff07c231700  0 -- 172.17.0.160:0/1000148 >> 172.17.0.160:6789/0 pipe(0x7ff060000c00 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7ff060000e60).fault
2014-02-18 16:00:19.998108 7ff07c130700  0 -- 172.17.0.160:0/1000148 >> 172.17.0.160:6789/0 pipe(0x7ff060003010 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7ff060001e70).fault

The log file of ceph-mon shows:

2014-02-18 16:00:14.088851 7f09109dd700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f09109dd700

 ceph version 0.75 (946d60369589d6a269938edd65c0a6a7b1c3ef5c)
 1: /usr/local/bin/ceph-mon() [0x83457e]
 2: (()+0xf210) [0x7f0915772210]
 3: /usr/local/bin/ceph-mon() [0x7c398a]
 4: /usr/local/bin/ceph-mon() [0x7c3c9c]
 5: /usr/local/bin/ceph-mon() [0x7c3d31]
 6: (crush_do_rule()+0x20a) [0x7c448a]
 7: (OSDMap::_pg_to_osds(pg_pool_t const&, pg_t, std::vector<int, std::allocator<int> >&) const+0xdd) [0x725add]
 8: (OSDMap::pg_to_acting_osds(pg_t, std::vector<int, std::allocator<int> >&) const+0x81) [0x725da1]
 9: (PGMonitor::map_pg_creates()+0x15f) [0x610abf]
 10: (PGMonitor::post_paxos_update()+0x25) [0x611205]
 11: (Monitor::refresh_from_paxos(bool*)+0x95) [0x543205]
 12: (Paxos::do_refresh()+0x24) [0x590c24]
 13: (Paxos::begin(ceph::buffer::list&)+0x99e) [0x59b54e]
 14: (Paxos::propose_queued()+0xdd) [0x59b92d]
 15: (Paxos::propose_new_value(ceph::buffer::list&, Context*)+0x150) [0x59ca30]
 16: (PaxosService::propose_pending()+0x6d9) [0x5a3099]
 17: (PaxosService::dispatch(PaxosServiceMessage*)+0xd77) [0x5a4347]
 18: (Monitor::handle_command(MMonCommand*)+0x1073) [0x56e253]
 19: (Monitor::dispatch(MonSession*, Message*, bool)+0x2e8) [0x571168]
 20: (Monitor::_ms_dispatch(Message*)+0x1e4) [0x571774]
 21: (Monitor::ms_dispatch(Message*)+0x20) [0x590050]
 22: (DispatchQueue::entry()+0x56a) [0x80a65a]
 23: (DispatchQueue::DispatchThread::entry()+0xd) [0x73e75d]
 24: (()+0x7e0e) [0x7f091576ae0e]
 25: (clone()+0x6d) [0x7f0913d1c0fd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
  -395> 2014-02-18 15:59:09.388974 7f0915dfb7c0  5 asok(0x354af50) register_command perfcounters_dump hook 0x3542010
  -394> 2014-02-18 15:59:09.389006 7f0915dfb7c0  5 asok(0x354af50) register_command 1 hook 0x3542010
  -393> 2014-02-18 15:59:09.389011 7f0915dfb7c0  5 asok(0x354af50) register_command perf dump hook 0x3542010
  -392> 2014-02-18 15:59:09.389016 7f0915dfb7c0  5 asok(0x354af50) register_command perfcounters_schema hook 0x3542010
  -391> 2014-02-18 15:59:09.389020 7f0915dfb7c0  5 asok(0x354af50) register_command 2 hook 0x3542010
  -390> 2014-02-18 15:59:09.389021 7f0915dfb7c0  5 asok(0x354af50) register_command perf schema hook 0x3542010
  -389> 2014-02-18 15:59:09.389023 7f0915dfb7c0  5 asok(0x354af50) register_command config show hook 0x3542010
  -388> 2014-02-18 15:59:09.389028 7f0915dfb7c0  5 asok(0x354af50) register_command config set hook 0x3542010
  -387> 2014-02-18 15:59:09.389029 7f0915dfb7c0  5 asok(0x354af50) register_command config get hook 0x3542010
  -386> 2014-02-18 15:59:09.389031 7f0915dfb7c0  5 asok(0x354af50) register_command log flush hook 0x3542010
  -385> 2014-02-18 15:59:09.389035 7f0915dfb7c0  5 asok(0x354af50) register_command log dump hook 0x3542010
  -384> 2014-02-18 15:59:09.389037 7f0915dfb7c0  5 asok(0x354af50) register_command log reopen hook 0x3542010
  -383> 2014-02-18 15:59:09.390539 7f0915dfb7c0  0 ceph version 0.75 (946d60369589d6a269938edd65c0a6a7b1c3ef5c), process ceph-mon, pid 6
  -382> 2014-02-18 15:59:09.390870 7f0915dfb7c0  5 asok(0x354af50) init /var/run/ceph/ceph-mon.ceph-mon.dkctl.asok
  -381> 2014-02-18 15:59:09.390898 7f0915dfb7c0  5 asok(0x354af50) bind_and_listen /var/run/ceph/ceph-mon.ceph-mon.dkctl.asok
  -380> 2014-02-18 15:59:09.391018 7f0915dfb7c0  5 asok(0x354af50) register_command 0 hook 0x353e038
  -379> 2014-02-18 15:59:09.391043 7f0915dfb7c0  5 asok(0x354af50) register_command version hook 0x353e038
  -378> 2014-02-18 15:59:09.391046 7f0915dfb7c0  5 asok(0x354af50) register_command git_version hook 0x353e038
  -377> 2014-02-18 15:59:09.391049 7f0915dfb7c0  5 asok(0x354af50) register_command help hook 0x3542050
  -376> 2014-02-18 15:59:09.391051 7f0915dfb7c0  5 asok(0x354af50) register_command get_command_descriptions hook 0x3542040
  -375> 2014-02-18 15:59:09.391104 7f09121e0700  5 asok(0x354af50) entry start
  -374> 2014-02-18 15:59:09.459305 7f0915dfb7c0  1 -- 172.17.0.160:6789/0 learned my addr 172.17.0.160:6789/0
  -373> 2014-02-18 15:59:09.459333 7f0915dfb7c0  1 accepter.accepter.bind my_inst.addr is 172.17.0.160:6789/0 need_addr=0
  -372> 2014-02-18 15:59:09.459359 7f0915dfb7c0  5 adding auth protocol: cephx
  -371> 2014-02-18 15:59:09.459363 7f0915dfb7c0  5 adding auth protocol: cephx
  -370> 2014-02-18 15:59:09.459451 7f0915dfb7c0  1 mon.ceph-mon.dkctl@-1(probing) e1 preinit fsid e90dfd37-98d1-45bb-a847-8590a5ed8e71
  -369> 2014-02-18 15:59:09.459512 7f0915dfb7c0  1 mon.ceph-mon.dkctl@-1(probing) e1 initial_members ceph-mon.dkctl, filtering seed monmap
  -368> 2014-02-18 15:59:09.459524 7f0915dfb7c0  1 keeping ceph-mon.dkctl 172.17.0.160:6789/0
  -367> 2014-02-18 15:59:09.459812 7f0915dfb7c0  2 auth: KeyRing::load: loaded key file /data/mon/keyring
  -366> 2014-02-18 15:59:09.459832 7f0915dfb7c0  5 asok(0x354af50) register_command mon_status hook 0x35420e0
  -365> 2014-02-18 15:59:09.459838 7f0915dfb7c0  5 asok(0x354af50) register_command quorum_status hook 0x35420e0
  -364> 2014-02-18 15:59:09.459840 7f0915dfb7c0  5 asok(0x354af50) register_command sync_force hook 0x35420e0
  -363> 2014-02-18 15:59:09.459842 7f0915dfb7c0  5 asok(0x354af50) register_command add_bootstrap_peer_hint hook 0x35420e0
  -362> 2014-02-18 15:59:09.459844 7f0915dfb7c0  5 asok(0x354af50) register_command quorum enter hook 0x35420e0
  -361> 2014-02-18 15:59:09.459845 7f0915dfb7c0  5 asok(0x354af50) register_command quorum exit hook 0x35420e0
  -360> 2014-02-18 15:59:09.459851 7f0915dfb7c0  1 -- 172.17.0.160:6789/0 messenger.start
  -359> 2014-02-18 15:59:09.459917 7f0915dfb7c0  2 mon.ceph-mon.dkctl@-1(probing) e1 init
  -358> 2014-02-18 15:59:09.459979 7f0915dfb7c0  1 accepter.accepter.start
  -357> 2014-02-18 15:59:09.460029 7f0915dfb7c0  0 mon.ceph-mon.dkctl@-1(probing) e1 my rank is now 0 (was -1)
  -356> 2014-02-18 15:59:09.460033 7f0915dfb7c0  1 -- 172.17.0.160:6789/0 mark_down_all
  -355> 2014-02-18 15:59:09.460045 7f0915dfb7c0  1 mon.ceph-mon.dkctl@0(probing) e1 win_standalone_election
  -354> 2014-02-18 15:59:09.482424 7f0915dfb7c0  0 log [INF] : mon.ceph-mon.dkctl@0 won leader election with quorum 0
  -353> 2014-02-18 15:59:09.482450 7f0915dfb7c0 10 send_log to self
  -352> 2014-02-18 15:59:09.482453 7f0915dfb7c0 10 log_queue is 1 last_log 1 sent 0 num 1 unsent 1 sending 1
  -351> 2014-02-18 15:59:09.482457 7f0915dfb7c0 10 will send 2014-02-18 15:59:09.482449 mon.0 172.17.0.160:6789/0 1 : [INF] mon.ceph-mon.dkctl@0 won leader election with quorum 0
  -350> 2014-02-18 15:59:09.482491 7f0915dfb7c0  1 -- 172.17.0.160:6789/0 --> mon.0 172.17.0.160:6789/0 -- log(1 entries) v1 -- ?+0 0x35866c0
  -349> 2014-02-18 15:59:09.482564 7f09109dd700  1 -- 172.17.0.160:6789/0 <== mon.0 172.17.0.160:6789/0 0 ==== log(1 entries) v1 ==== 0+0+0 (0 0 0) 0x35866c0 con 0x359a420
  -348> 2014-02-18 15:59:09.482598 7f0915dfb7c0  5 mon.ceph-mon.dkctl@0(leader).paxos(paxos active c 0..0) queue_proposal bl 398 bytes; ctx = 0x35420c0
  -347> 2014-02-18 15:59:09.530752 7f0915dfb7c0  0 log [INF] : pgmap v1: 0 pgs: ; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
  -346> 2014-02-18 15:59:09.530776 7f0915dfb7c0 10 send_log to self
  -345> 2014-02-18 15:59:09.530778 7f0915dfb7c0 10 log_queue is 2 last_log 2 sent 1 num 2 unsent 1 sending 1
  -344> 2014-02-18 15:59:09.530781 7f0915dfb7c0 10 will send 2014-02-18 15:59:09.482449 mon.0 172.17.0.160:6789/0 1 : [INF] mon.ceph-mon.dkctl@0 won leader election with quorum 0
  -343> 2014-02-18 15:59:09.530808 7f0915dfb7c0  1 -- 172.17.0.160:6789/0 --> mon.0 172.17.0.160:6789/0 -- log(1 entries) v1 -- ?+0 0x3586d80
  -342> 2014-02-18 15:59:09.530898 7f0915dfb7c0  5 mon.ceph-mon.dkctl@0(leader).paxos(paxos active c 1..1) queue_proposal bl 477 bytes; ctx = 0x35420c0
  -341> 2014-02-18 15:59:09.578860 7f0915dfb7c0  4 mon.ceph-mon.dkctl@0(leader).mds e1 new map
  -340> 2014-02-18 15:59:09.578888 7f0915dfb7c0  0 mon.ceph-mon.dkctl@0(leader).mds e1 print_map
I'm just speculating about the actual crash issue.
Updated by Sage Weil about 10 years ago
- Status changed from New to Fix Under Review
The problem is adding a bucket of type 'osd'; I'm fixing the mon to error out at that stage. I tested this with a normal bucket type and all is well.
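The guard described above can be sketched as follows. This is an illustrative Python model only, not the actual monitor patch (which is C++): it assumes the CRUSH convention that the device/leaf type has type id 0 ('osd'), and the type table and function names are invented for demonstration.

```python
# Sketch of the validation the mon fix adds: reject 'osd crush add-bucket'
# when the requested type is the leaf (device) type. Devices should enter
# the map via 'ceph osd crush add', never as buckets.

TYPE_IDS = {"osd": 0, "host": 1, "rack": 3, "root": 10}  # assumed subset

def validate_add_bucket(bucket_type):
    """Return (errno, message); errno 0 means the request may proceed."""
    type_id = TYPE_IDS.get(bucket_type)
    if type_id is None:
        return (-22, "unknown type '%s'" % bucket_type)            # -EINVAL
    if type_id == 0:
        return (-22, "cannot add bucket of leaf type '%s'" % bucket_type)
    return (0, "")
```

With a check like this in place, the reporter's `ceph osd crush add-bucket osd.0 osd` would fail with EINVAL instead of creating the malformed negative-id item that later crashes the placement walk.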
Note: I did not see your crash on firefly, but I am too lazy to build 0.75 and confirm the problem is there. There have been several CRUSH changes since then, though, so that is not terribly surprising.
Updated by Sage Weil about 10 years ago
- Status changed from Fix Under Review to Resolved