Bug #11680

mon crashes when "ceph osd tree 85 --format json"

Added by Kefu Chai about 5 years ago. Updated almost 5 years ago.

Status: Can't reproduce
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport: firefly,hammer
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
 1: /usr/bin/ceph-mon() [0x9017e2]
 2: (()+0xf130) [0x7f30fab97130]
 3: (gsignal()+0x37) [0x7f30f95b15d7]
 4: (abort()+0x148) [0x7f30f95b2cc8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f30f9eb59b5]
 6: (()+0x5e926) [0x7f30f9eb3926]
 7: (()+0x5e953) [0x7f30f9eb3953]
 8: (()+0x5eb73) [0x7f30f9eb3b73]
 9: (std::__throw_logic_error(char const*)+0x77) [0x7f30f9f08717]
 10: (char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag)+0xa1) [0x7f30f9f14561]
 11: (std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&)+0x38) [0x7f30f9f14918]
 12: (CrushTreeDumper::dump_item_fields(CrushWrapper const*, CrushTreeDumper::Item const&, ceph::Formatter*)+0xb9) [0x6175c9]
 13: (OSDMap::print_tree(std::ostream*, ceph::Formatter*) const+0x10d9) [0x787ab9]
 14: (OSDMonitor::preprocess_command(MMonCommand*)+0xe55) [0x60a0b5]
 15: (OSDMonitor::preprocess_query(PaxosServiceMessage*)+0x20b) [0x60f75b]
 16: (PaxosService::dispatch(PaxosServiceMessage*)+0x833) [0x5cacd3]
 17: (Monitor::handle_command(MMonCommand*)+0x147c) [0x591a4c]
 18: (Monitor::dispatch(MonSession*, Message*, bool)+0xf9) [0x594c89]
 19: (Monitor::_ms_dispatch(Message*)+0x1a6) [0x595936]
 20: (Monitor::ms_dispatch(Message*)+0x23) [0x5b5403]
 21: (DispatchQueue::entry()+0x64a) [0x8a1d9a]
 22: (DispatchQueue::DispatchThread::entry()+0xd) [0x79bd9d]
 23: (()+0x7df5) [0x7f30fab8fdf5]
 24: (clone()+0x6d) [0x7f30f96721ad]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

The most recent log messages:

  -25> 2015-05-18 06:20:09.143639 7f09817a0700 10 log_client  log_queue is 5 last_log 24 sent 23 num 5 unsent 1 sending 1
   -24> 2015-05-18 06:20:09.143642 7f09817a0700 10 log_client  will send 2015-05-18 06:20:09.143637 mon.0 10.8.128.33:6789/0 24 : audit [DBG] from='client.? 10.8.128.33:0/1004078' entity='client.admin' cmd=[{"prefix": "status", "format": "json"}]: dispatch
   -23> 2015-05-18 06:20:09.143652 7f09817a0700  1 -- 10.8.128.33:6789/0 --> 10.8.128.33:6789/0 -- log(1 entries) v1 -- ?+0 0x47f2400 con 0x4609600
   -22> 2015-05-18 06:20:09.143795 7f09817a0700  1 -- 10.8.128.33:6789/0 --> 10.8.128.33:0/1004078 -- mon_command_ack([{"prefix": "status", "format": "json"}]=0  v0) v1 -- ?+1108 0x466b000 con 0x460aaa0
   -21> 2015-05-18 06:20:09.143817 7f09817a0700  1 -- 10.8.128.33:6789/0 <== mon.0 10.8.128.33:6789/0 0 ==== log(1 entries) v1 ==== 0+0+0 (0 0 0) 0x47f2400 con 0x4609600
   -20> 2015-05-18 06:20:09.143827 7f09817a0700  5 mon.Mon@0(leader).paxos(paxos active c 78062..78631) is_readable = 1 - now=2015-05-18 06:20:09.143828 lease_expire=0.000000 has v0 lc 78631
   -19> 2015-05-18 06:20:09.145454 7f09817a0700  1 -- 10.8.128.33:6789/0 <== client.184098 10.8.128.33:0/1004078 8 ==== mon_command({"prefix": "osd dump", "format": "json"} v 0) v1 ==== 82+0+0 (3917390200 0 0) 0x466a000 con 0x460aaa0
   -18> 2015-05-18 06:20:09.145494 7f09817a0700  0 mon.Mon@0(leader) e1 handle_command mon_command({"prefix": "osd dump", "format": "json"} v 0) v1
   -17> 2015-05-18 06:20:09.145520 7f09817a0700  0 log_channel(audit) log [DBG] : from='client.? 10.8.128.33:0/1004078' entity='client.admin' cmd=[{"prefix": "osd dump", "format": "json"}]: dispatch
   -16> 2015-05-18 06:20:09.145522 7f09817a0700 10 log_client _send_to_monlog to self
   -15> 2015-05-18 06:20:09.145523 7f09817a0700 10 log_client  log_queue is 6 last_log 25 sent 24 num 6 unsent 1 sending 1
   -14> 2015-05-18 06:20:09.145527 7f09817a0700 10 log_client  will send 2015-05-18 06:20:09.145522 mon.0 10.8.128.33:6789/0 25 : audit [DBG] from='client.? 10.8.128.33:0/1004078' entity='client.admin' cmd=[{"prefix": "osd dump", "format": "json"}]: dispatch
   -13> 2015-05-18 06:20:09.145535 7f09817a0700  1 -- 10.8.128.33:6789/0 --> 10.8.128.33:6789/0 -- log(1 entries) v1 -- ?+0 0x47f6540 con 0x4609600
   -12> 2015-05-18 06:20:09.145546 7f09817a0700  5 mon.Mon@0(leader).paxos(paxos active c 78062..78631) is_readable = 1 - now=2015-05-18 06:20:09.145547 lease_expire=0.000000 has v0 lc 78631
   -11> 2015-05-18 06:20:09.145779 7f09817a0700  1 -- 10.8.128.33:6789/0 --> 10.8.128.33:0/1004078 -- mon_command_ack([{"prefix": "osd dump", "format": "json"}]=0  v85) v1 -- ?+3436 0x4669800 con 0x460aaa0
   -10> 2015-05-18 06:20:09.145805 7f09817a0700  1 -- 10.8.128.33:6789/0 <== mon.0 10.8.128.33:6789/0 0 ==== log(1 entries) v1 ==== 0+0+0 (0 0 0) 0x47f6540 con 0x4609600
    -9> 2015-05-18 06:20:09.145814 7f09817a0700  5 mon.Mon@0(leader).paxos(paxos active c 78062..78631) is_readable = 1 - now=2015-05-18 06:20:09.145815 lease_expire=0.000000 has v0 lc 78631
    -8> 2015-05-18 06:20:09.147249 7f09817a0700  1 -- 10.8.128.33:6789/0 <== client.184098 10.8.128.33:0/1004078 9 ==== mon_command({"prefix": "osd tree", "epoch": 85, "format": "json"} v 0) v1 ==== 95+0+0 (942654298 0 0) 0x466b800 con 0x460aaa0
    -7> 2015-05-18 06:20:09.147309 7f09817a0700  0 mon.Mon@0(leader) e1 handle_command mon_command({"prefix": "osd tree", "epoch": 85, "format": "json"} v 0) v1
    -6> 2015-05-18 06:20:09.147337 7f09817a0700  0 log_channel(audit) log [DBG] : from='client.? 10.8.128.33:0/1004078' entity='client.admin' cmd=[{"prefix": "osd tree", "epoch": 85, "format": "json"}]: dispatch
    -5> 2015-05-18 06:20:09.147343 7f09817a0700 10 log_client _send_to_monlog to self
    -4> 2015-05-18 06:20:09.147345 7f09817a0700 10 log_client  log_queue is 7 last_log 26 sent 25 num 7 unsent 1 sending 1
    -3> 2015-05-18 06:20:09.147347 7f09817a0700 10 log_client  will send 2015-05-18 06:20:09.147343 mon.0 10.8.128.33:6789/0 26 : audit [DBG] from='client.? 10.8.128.33:0/1004078' entity='client.admin' cmd=[{"prefix": "osd tree", "epoch": 85, "format": "json"}]: dispatch
    -2> 2015-05-18 06:20:09.147355 7f09817a0700  1 -- 10.8.128.33:6789/0 --> 10.8.128.33:6789/0 -- log(1 entries) v1 -- ?+0 0x47f3cc0 con 0x4609600
    -1> 2015-05-18 06:20:09.147365 7f09817a0700  5 mon.Mon@0(leader).paxos(paxos active c 78062..78631) is_readable = 1 - now=2015-05-18 06:20:09.147366 lease_expire=0.000000 has v0 lc 78631
     0> 2015-05-18 06:20:09.149437 7f09817a0700 -1 *** Caught signal (Aborted) **
 in thread 7f09817a0700

Originally reported at https://bugzilla.redhat.com/show_bug.cgi?id=1222509

file_1222509.log.txt View - /var/log/ceph/ceph-mon.Mon.log (83 KB) Kefu Chai, 05/19/2015 05:13 AM

ceph-mon.tgz - /var/lib/ceph/mon/ (425 KB) Kefu Chai, 05/19/2015 05:13 AM

1029402.log.gz - mon log after a modified crush map is posted (187 KB) Kefu Chai, 05/25/2015 04:32 PM

1029386.log.gz - mon log after an empty crush map of "#begin #end" is consumed (181 KB) Kefu Chai, 05/25/2015 04:32 PM

CalAPITester.py View - the python script used to send crush map (5.46 KB) Kefu Chai, 05/25/2015 04:32 PM


Related issues

Related to Ceph - Feature #11815: mon: allow injecting new crushmap Resolved 05/29/2015
Duplicated by Ceph - Bug #11757: Failure to load the osdmap Duplicate 05/26/2015
Duplicated by Ceph - Bug #12047: monitor segmentation fault on faulty crushmap Duplicate 06/17/2015
Duplicated by Ceph - Bug #12876: monitor crashed in CrushWrapper::do_rule() Duplicate 08/31/2015
Copied to Ceph - Backport #11975: mon crashes when "ceph osd tree 85 --format json" Resolved 05/19/2015
Copied to Ceph - Backport #11977: mon crashes when "ceph osd tree 85 --format json" Rejected 05/19/2015

Associated revisions

Revision a955f36a (diff)
Added by Kefu Chai about 5 years ago

mon: validate new crush for unknown names

  • the "osd tree dump" command enumerates all buckets/osds found in either the
    crush map or the osd map. but the newly set crushmap is not validated for
    the dangling references, so we need to check to see if any item in new crush
    map is referencing unknown type/name when a new crush map is sent to
    monitor, reject it if any.

Fixes: #11680
Signed-off-by: Kefu Chai <>

Revision 22e6bd6e (diff)
Added by Kefu Chai about 5 years ago

mon: check the new crush map against osdmap.max_osd

Fixes: #11680
Signed-off-by: Kefu Chai <>

Revision c6e63487 (diff)
Added by Kefu Chai about 5 years ago

mon: add "--check" to CrushTester::test_with_crushtool()

so we don't need to call CrushTester::check_name_maps() in OSDMonitor.cc
anymore.

Fixes: #11680
Signed-off-by: Kefu Chai <>

Revision cc1cc033 (diff)
Added by Kefu Chai about 5 years ago

mon: validate new crush for unknown names

  • the "osd tree dump" command enumerates all buckets/osds found in either the
    crush map or the osd map. but the newly set crushmap is not validated for
    the dangling references, so we need to check to see if any item in new crush
    map is referencing unknown type/name when a new crush map is sent to
    monitor, reject it if any.

Fixes: #11680
Signed-off-by: Kefu Chai <>
(cherry picked from commit a955f36a509e5412b1f72632a1a956d99e768e35)

Revision 2a8fe886 (diff)
Added by Kefu Chai about 5 years ago

mon: check the new crush map against osdmap.max_osd

Fixes: #11680
Signed-off-by: Kefu Chai <>
(cherry picked from commit 22e6bd6e01d5df3f3e897562597e22ca1737f8c8)

Revision 51413011 (diff)
Added by Kefu Chai about 5 years ago

mon: add "--check" to CrushTester::test_with_crushtool()

so we don't need to call CrushTester::check_name_maps() in OSDMonitor.cc
anymore.

Fixes: #11680
Signed-off-by: Kefu Chai <>
(cherry picked from commit c6e634875316cf17368d497e6dc4f6f4b5dd65d2)

History

#1 Updated by Kefu Chai about 5 years ago

#2 Updated by Kefu Chai about 5 years ago

  • Description updated (diff)

#3 Updated by Kefu Chai about 5 years ago

  • Description updated (diff)

#4 Updated by Kefu Chai about 5 years ago

And I am able to dump the osd map using:

ceph-monstore-tool /var/lib/ceph/mon/ceph-Mon get osdmap -- --out /tmp/osdmap.85
osdmaptool /tmp/osdmap.85 --print --tree --dump-json

the output is attached:

epoch 85
fsid 283bbd8b-9205-4cfc-83fa-80546ba2fe36
created 2015-05-06 15:39:42.874568
modified 2015-05-18 06:20:07.885802
flags

pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0

max_osd 4
osd.0 down in  weight 1 up_from 78 up_thru 78 down_at 85 last_clean_interval [46,72) 10.8.128.40:6800/25974 10.8.128.40:6804/25974 10.8.128.40:6808/25974 10.8.128.40:6809/25974 exists 4d27fc96-3a2f-478f-bca5-a4e55ec7e53c
osd.1 up   in  weight 1 up_from 69 up_thru 80 down_at 63 last_clean_interval [8,62) 10.8.128.86:6800/19549 10.8.128.86:6801/19549 10.8.128.86:6802/19549 10.8.128.86:6803/19549 exists,up 61f9fbec-687e-4d89-8900-fc7d9ff5f88a
osd.2 up   in  weight 1 up_from 84 up_thru 80 down_at 83 last_clean_interval [80,83) 10.8.128.40:6810/26274 10.8.128.40:6801/1026274 10.8.128.40:6802/1026274 10.8.128.40:6803/1026274 exists,up 7823abe2-208c-4381-9020-54b6bbf11b07
osd.3 down in  weight 1 up_from 71 up_thru 80 down_at 85 last_clean_interval [17,65) 10.8.128.86:6804/19849 10.8.128.86:6805/19849 10.8.128.86:6808/19849 10.8.128.86:6809/19849 exists 8551c2bb-82ed-4b15-ad6e-305e6d560b59

{
    "epoch": 85,
    "fsid": "283bbd8b-9205-4cfc-83fa-80546ba2fe36",
    "created": "2015-05-06 15:39:42.874568",
    "modified": "2015-05-18 06:20:07.885802",
    "flags": "",
    "cluster_snapshot": "",
    "pool_max": 0,
    "max_osd": 4,
    "pools": [
        {
            "pool": 0,
            "pool_name": "rbd",
            "flags": 1,
            "flags_names": "hashpspool",
            "type": 1,
            "size": 2,
            "min_size": 1,
            "crush_ruleset": 0,
            "object_hash": 2,
            "pg_num": 64,
            "pg_placement_num": 64,
            "crash_replay_interval": 0,
            "last_change": "1",
            "last_force_op_resend": "0",
            "auid": 0,
            "snap_mode": "selfmanaged",
            "snap_seq": 0,
            "snap_epoch": 0,
            "pool_snaps": [],
            "removed_snaps": "[]",
            "quota_max_bytes": 0,
            "quota_max_objects": 0,
            "tiers": [],
            "tier_of": -1,
            "read_tier": -1,
            "write_tier": -1,
            "cache_mode": "none",
            "target_max_bytes": 0,
            "target_max_objects": 0,
            "cache_target_dirty_ratio_micro": 0,
            "cache_target_full_ratio_micro": 0,
            "cache_min_flush_age": 0,
            "cache_min_evict_age": 0,
            "erasure_code_profile": "",
            "hit_set_params": {
                "type": "none" 
            },
            "hit_set_period": 0,
            "hit_set_count": 0,
            "min_read_recency_for_promote": 0,
            "stripe_width": 0,
            "expected_num_objects": 0
        }
    ],
    "osds": [
        {
            "osd": 0,
            "uuid": "4d27fc96-3a2f-478f-bca5-a4e55ec7e53c",
            "up": 0,
            "in": 1,
            "weight": 1.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 46,
            "last_clean_end": 72,
            "up_from": 78,
            "up_thru": 78,
            "down_at": 85,
            "lost_at": 0,
            "public_addr": "10.8.128.40:6800\/25974",
            "cluster_addr": "10.8.128.40:6804\/25974",
            "heartbeat_back_addr": "10.8.128.40:6808\/25974",
            "heartbeat_front_addr": "10.8.128.40:6809\/25974",
            "state": [
                "exists" 
            ]
        },
        {
            "osd": 1,
            "uuid": "61f9fbec-687e-4d89-8900-fc7d9ff5f88a",
            "up": 1,
            "in": 1,
            "weight": 1.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 8,
            "last_clean_end": 62,
            "up_from": 69,
            "up_thru": 80,
            "down_at": 63,
            "lost_at": 0,
            "public_addr": "10.8.128.86:6800\/19549",
            "cluster_addr": "10.8.128.86:6801\/19549",
            "heartbeat_back_addr": "10.8.128.86:6802\/19549",
            "heartbeat_front_addr": "10.8.128.86:6803\/19549",
            "state": [
                "exists",
                "up" 
            ]
        },
        {
            "osd": 2,
            "uuid": "7823abe2-208c-4381-9020-54b6bbf11b07",
            "up": 1,
            "in": 1,
            "weight": 1.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 80,
            "last_clean_end": 83,
            "up_from": 84,
            "up_thru": 80,
            "down_at": 83,
            "lost_at": 0,
            "public_addr": "10.8.128.40:6810\/26274",
            "cluster_addr": "10.8.128.40:6801\/1026274",
            "heartbeat_back_addr": "10.8.128.40:6802\/1026274",
            "heartbeat_front_addr": "10.8.128.40:6803\/1026274",
            "state": [
                "exists",
                "up" 
            ]
        },
        {
            "osd": 3,
            "uuid": "8551c2bb-82ed-4b15-ad6e-305e6d560b59",
            "up": 0,
            "in": 1,
            "weight": 1.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 17,
            "last_clean_end": 65,
            "up_from": 71,
            "up_thru": 80,
            "down_at": 85,
            "lost_at": 0,
            "public_addr": "10.8.128.86:6804\/19849",
            "cluster_addr": "10.8.128.86:6805\/19849",
            "heartbeat_back_addr": "10.8.128.86:6808\/19849",
            "heartbeat_front_addr": "10.8.128.86:6809\/19849",
            "state": [
                "exists" 
            ]
        }
    ],
    "osd_xinfo": [
        {
            "osd": 0,
            "down_stamp": "2015-05-18 06:20:07.885802",
            "laggy_probability": 0.249900,
            "laggy_interval": 47,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 1,
            "down_stamp": "2015-05-14 04:58:18.937390",
            "laggy_probability": 0.357000,
            "laggy_interval": 933,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 2,
            "down_stamp": "2015-05-18 05:32:05.357656",
            "laggy_probability": 0.474930,
            "laggy_interval": 8,
            "features": 1125899906842623,
            "old_weight": 0
        },
        {
            "osd": 3,
            "down_stamp": "2015-05-18 06:20:07.885802",
            "laggy_probability": 0.357000,
            "laggy_interval": 928,
            "features": 1125899906842623,
            "old_weight": 65536
        }
    ],
    "pg_temp": [],
    "primary_temp": [],
    "blacklist": [],
    "erasure_code_profiles": {
        "default": {
            "directory": "\/usr\/lib64\/ceph\/erasure-code",
            "k": "2",
            "m": "1",
            "plugin": "jerasure",
            "technique": "reed_sol_van" 
        }
    }
}
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
 0      0 osd.0        down  1.00000          1.00000
 1      0 osd.1          up  1.00000          1.00000
 2      0 osd.2          up  1.00000          1.00000
 3      0 osd.3        down  1.00000          1.00000

#5 Updated by Joao Eduardo Luis about 5 years ago

kefu, we need logs from osdmap epoch 81 to 82. Last working crushmap is from epoch 81, all subsequent maps are broken.

#6 Updated by Joao Eduardo Luis about 5 years ago

Right now, the only way I can easily reproduce this is by setting an empty crushmap with 'ceph osd setcrushmap -i some-empty.crushmap'.

I have no idea whether this was deliberately performed or if this was an unfortunate side effect of some operation.

#7 Updated by Kefu Chai about 5 years ago

diff --git a/src/tools/osdmaptool.cc b/src/tools/osdmaptool.cc
index 14331c3..a628454 100644
--- a/src/tools/osdmaptool.cc
+++ b/src/tools/osdmaptool.cc
@@ -465,9 +465,14 @@ int main(int argc, const char **argv)
     osdmap.print(cout);
   if (print_json)
     osdmap.dump_json(cout);
-  if (tree)
-    osdmap.print_tree(&cout, NULL);
-
+  if (tree) {
+    boost::scoped_ptr<Formatter> f(Formatter::create("json"));
+    f->open_object_section("tree");
+    osdmap.print_tree(NULL, f.get());
+    f->close_section();
+    f->flush(cout);
+    // osdmap.print_tree(&cout, NULL);
+  }
   if (modified) {
     bl.clear();
     osdmap.encode(bl, CEPH_FEATURES_SUPPORTED_DEFAULT | CEPH_FEATURE_RESERVED);

With the patch above, we are able to reproduce the crash using:

./osdmaptool /tmp/osdmap.85 --tree  # where /tmp/osdmap.85 is the osdmap extracted using ceph-monstore-tool

#8 Updated by Kefu Chai about 5 years ago

yes, "osd setcrushmap" was issued before we ran into this issue.

$ zgrep setcrushmap *
ceph.audit.log-20150519.gz:2015-05-18 05:14:12.613798 mon.0 10.8.128.33:6789/0 224816 : audit [INF] from='client.? 10.8.128.33:0/1023292' entity='client.admin' cmd=[{"prefix": "osd setcrushmap"}]: dispatch
ceph.audit.log-20150519.gz:2015-05-18 05:14:12.613798 mon.0 10.8.128.33:6789/0 224816 : audit [INF] from='client.? 10.8.128.33:0/1023292' entity='client.admin' cmd=[{"prefix": "osd setcrushmap"}]: dispatch

#9 Updated by Joao Eduardo Luis about 5 years ago

Yeah, I noticed that a while ago. The annoying thing is that I can't find logs for the period when the osdmap moves from epoch 81 (the last correctly formed crushmap) to epoch 82 (first wrong map). The 'ceph osd setcrushmap' commands do fall in this gap, so I'm assuming this was indeed an incorrectly formed crushmap that was pushed to the cluster -- and by 'incorrectly formed', in this case, I mean an empty map.

I think the real bug here is that we are allowing setting an empty map without requiring the user to be really sure about that.

#10 Updated by Kefu Chai about 5 years ago

echoing joao

(gdb) p *this->crush
$9 = {mapper_lock = {name = 0xb298b8 "CrushWrapper::mapper_lock", id = -1, recursive = false, lockdep = true,
    backtrace = false, _m = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 2,
        __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}},
      __size = '\000' <repeats 16 times>, "\002", '\000' <repeats 22 times>, __align = 0}, nlock = 0,
    locked_by = 0, cct = 0x0, logger = 0x0}, type_map = std::map with 0 elements,
  name_map = std::map with 0 elements, rule_name_map = std::map with 0 elements, crush = 0x2cfeff0,
  have_rmaps = false, type_rmap = std::map with 0 elements, name_rmap = std::map with 0 elements,
  rule_name_rmap = std::map with 0 elements}

So in osdmap#85 we have 4 OSDs, but since the crushmap is empty, crush->get_typename(0) returns NULL, hence the crash.
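
To make that concrete, here is a minimal standalone illustration (not Ceph code; the variable name is made up): with libstdc++, constructing a std::string from a NULL char const* goes through the std::__throw_logic_error/_S_construct path seen in frames 9-12 of the backtrace in the description, and because nothing in the monitor catches the exception, abort() follows.

    // Minimal illustration of the failure mode, assuming libstdc++ (the
    // standard leaves a NULL argument undefined; libstdc++ throws).
    #include <cstdio>
    #include <stdexcept>
    #include <string>

    int main() {
      const char* type_name = nullptr;  // stands in for the NULL the empty crush map hands back
      try {
        std::string s(type_name);       // libstdc++ rejects the NULL via std::__throw_logic_error()
        std::printf("constructed: %s\n", s.c_str());
      } catch (const std::logic_error& e) {
        std::printf("caught: %s\n", e.what());  // the monitor has no such catch, so it aborts
      }
      return 0;
    }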

> I think the real bug here is that we are allowing setting an empty map without requiring the user to be really sure about that.

We perform some sanity tests when handling "osd setcrushmap" or "osd crush set", but they fail to detect a crush map that does not cover all OSDs in the osdmap.

Will add a test for it.
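
A sketch of roughly that kind of check, with simplified stand-ins (the std::map plays the role of the crush name map and max_osd the role of OSDMap::get_max_osd(); this is not the actual patch, see the associated revisions for that):

    // Sketch: a candidate crush map should name every OSD the osdmap knows
    // about, otherwise it is rejected before it can be committed.
    #include <cstdio>
    #include <map>
    #include <string>

    static bool crush_covers_all_osds(const std::map<int, std::string>& crush_names,
                                      int max_osd) {
      for (int osd = 0; osd < max_osd; ++osd) {
        if (crush_names.find(osd) == crush_names.end()) {
          std::printf("reject: osd.%d is in the osdmap but has no name in the new crush map\n", osd);
          return false;
        }
      }
      return true;
    }

    int main() {
      std::map<int, std::string> empty_crush;                        // the "#begin #end" case
      std::map<int, std::string> full_crush = {{0, "osd.0"}, {1, "osd.1"},
                                               {2, "osd.2"}, {3, "osd.3"}};
      std::printf("empty crush accepted? %d\n", crush_covers_all_osds(empty_crush, 4));
      std::printf("full crush accepted?  %d\n", crush_covers_all_osds(full_crush, 4));
      return 0;
    }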

#11 Updated by Kefu Chai about 5 years ago

  • Status changed from New to Fix Under Review

#12 Updated by Kefu Chai about 5 years ago

#13 Updated by Kefu Chai about 5 years ago

in http://tracker.ceph.com/attachments/download/1716/1029402.log.gz, we have

    -1> 2015-05-25 05:33:57.741951 7ffe076b2700  5 mon.Mon@0(leader).paxos(paxos active c 1..531) is_readable = 1 - now=2015-05-25 05:33:57.741956 lease_expire=0.000000 has v0 lc 531
     0> 2015-05-25 05:33:57.748845 7ffe076b2700 -1 osd/OSDMap.h: In function 'unsigned int OSDMap::get_weight(int) const' thread 7ffe076b2700 time 2015-05-25 05:33:57.742429
osd/OSDMap.h: 374: FAILED assert(o < max_osd)

 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7b3a95]
 2: /usr/bin/ceph-mon() [0x773223]
 3: (OSDMap::print_tree(std::ostream*, ceph::Formatter*) const+0x192a) [0x78835a]
 4: (OSDMonitor::preprocess_command(MMonCommand*)+0xe55) [0x60a105]
 5: (OSDMonitor::preprocess_query(PaxosServiceMessage*)+0x20b) [0x60f7ab]
 6: (PaxosService::dispatch(PaxosServiceMessage*)+0x833) [0x5cad23]
 7: (Monitor::handle_command(MMonCommand*)+0x147c) [0x591a9c]
 8: (Monitor::dispatch(MonSession*, Message*, bool)+0xf9) [0x594cd9]
 9: (Monitor::_ms_dispatch(Message*)+0x1a6) [0x595986]

where the crush map references an OSD which is missing from the osdmap.
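
That converse case, a crush map referencing an osd id at or above osdmap.max_osd, is what revision 22e6bd6e guards against. As a hedged sketch only (simplified stand-ins, not the real monitor code):

    // Sketch: reject a candidate crush map that references osd ids the osdmap
    // has no room for; otherwise OSDMap::get_weight() later trips
    // "FAILED assert(o < max_osd)" as in the trace above.
    #include <cstdio>
    #include <vector>

    static bool crush_within_max_osd(const std::vector<int>& crush_osd_ids, int max_osd) {
      for (int id : crush_osd_ids) {
        if (id >= max_osd) {
          std::printf("reject: crush map references osd.%d but max_osd is %d\n", id, max_osd);
          return false;
        }
      }
      return true;
    }

    int main() {
      // e.g. a crush map that still carries osd.5 after the osdmap shrank to max_osd=4
      std::vector<int> crush_osds = {0, 1, 2, 3, 5};
      std::printf("accepted? %d\n", crush_within_max_osd(crush_osds, 4));
      return 0;
    }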

BTW, per http://ceph.com/docs/master/rados/operations/add-or-rm-osds/ and http://ceph.com/docs/master/install/manual-deployment/, the suggested practice of adding an OSD manually is:

  1. add the bucket hosting the OSD to the crush map
  2. add the OSD to the host
  3. start the OSD instance

#14 Updated by Kefu Chai about 5 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to firefly,hammer

#15 Updated by Nathan Cutler about 5 years ago

  • Status changed from Pending Backport to Resolved

#16 Updated by Kefu Chai almost 5 years ago

  • Status changed from Resolved to 12

Reopening this ticket, as we have a reported failure due to a bad crushmap.

>  ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
>  1: /usr/bin/ceph-mon() [0x9a98aa]
>  2: (()+0x10340) [0x7f8ab3a3d340]
>  3: (crush_do_rule()+0x292) [0x85ada2]
>  4: (OSDMap::_pg_to_osds(pg_pool_t const&, pg_t, std::vector<int, std::allocator<int> >*, int*, unsigned int*) const+0xeb) [0x7a85cb]
>  5: (OSDMap::pg_to_raw_up(pg_t, std::vector<int, std::allocator<int> >*, int*) const+0x94) [0x7a8a64]
>  6: (OSDMap::remove_redundant_temporaries(CephContext*, OSDMap const&, OSDMap::Incremental*)+0x317) [0x7ab8f7]
>  7: (OSDMonitor::create_pending()+0xf69) [0x60fdb9]
>  8: (PaxosService::_active()+0x709) [0x6047b9]
>  9: (PaxosService::election_finished()+0x67) [0x604ad7]
>  10: (Monitor::win_election(unsigned int, std::set<int, std::less<int>, std::allocator<int> >&, unsigned long, MonCommand const*, int, std::set<int, std::less<int>, std::allocator<int> > const*)+0x236) [0x5c34a6]
>  11: (Monitor::win_standalone_election()+0x1cc) [0x5c388c]
>  12: (Monitor::bootstrap()+0x9bb) [0x5c42eb]
>  13: (Monitor::init()+0xd5) [0x5c4645]
>  14: (main()+0x2470) [0x5769c0]
>  15: (__libc_start_main()+0xf5) [0x7f8ab1ec7ec5]
>  16: /usr/bin/ceph-mon() [0x5984f7]

#17 Updated by Sage Weil almost 5 years ago

  • Priority changed from Normal to High

#18 Updated by Kefu Chai almost 5 years ago

  • Status changed from 12 to Need More Info

quote Eino's report:

I'm getting a segmentation fault error from the monitor of our test cluster. The cluster was in a bad state because I have recently removed three hosts from it. Now I started cleaning it up and first marked the removed osd's as lost (ceph osd lost), and then I tried to remove the osd's from the crush map (ceph osd crush remove). After a few successful commands the cluster ceased to respond. One monitor seemed to stay up (it was responding through the admin socket), so I stopped it and used monmaptool to remove the failed monitor from the monmap. But now also the second monitor segfaults when I try to start it.

The cluster does not have any important data, but I'd like to get the monitors up as a practice. How do I debug this further?

Linux cephmon-test-02 3.13.0-24-generic #47-Ubuntu SMP Fri May 2 23:30:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

The output:

#19 Updated by Kefu Chai almost 5 years ago

sent mail to Eino for more information.

#20 Updated by Kefu Chai almost 5 years ago

quote of Kefu's inquiry

Eino, I was looking at your issue at http://tracker.ceph.com/issues/12876. It seems it is due to a faulty crush rule, see http://tracker.ceph.com/issues/12876#note-5. May I know how you managed to inject it into the monitor? I tried using

$ ceph osd setcrushmap -i new-crush-map
Error EINVAL: Failed to parse crushmap: *** Caught signal (Segmentation fault) **

but no luck.

quote from Eino's reply

I'm pretty sure I did it just like you were trying to do. The cluster has since been upgraded a couple of times. Unfortunately I can't remember when I created that particular faulty rule.

#21 Updated by Kefu Chai almost 5 years ago

  • Status changed from Need More Info to Can't reproduce
