Ceph - Bug #19989: "OSDMonitor.cc: 3545: FAILED assert(num_down_in_osds <= num_in_osds)" in rados run
https://tracker.ceph.com/issues/19989
Sage Weil (sage@newdream.net), 2017-05-20T22:17:30Z
<p>This appears to be related to the code below, which assumes the OSD is in, but that may not be true. A fix might look something like the following, but I didn't look at this very carefully:<br /><pre>
diff --git a/src/osd/OSDMap.cc b/src/osd/OSDMap.cc
index 5503fb4..a89e82e 100644
--- a/src/osd/OSDMap.cc
+++ b/src/osd/OSDMap.cc
@@ -298,10 +298,12 @@ bool OSDMap::subtree_type_is_down(CephContext *cct, int id, int subtree_type, se
 {
   if (id >= 0) {
     bool is_down_ret = is_down(id);
-    if (is_down_ret) {
-      down_in_osds->insert(id);
-    } else {
-      up_in_osds->insert(id);
+    if (!is_out(id)) {
+      if (is_down_ret) {
+        down_in_osds->insert(id);
+      } else {
+        up_in_osds->insert(id);
+      }
     }
     return is_down_ret;
   }
</pre></p>
Sage Weil (sage@newdream.net), 2017-05-24T22:05:02Z
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Resolved</i></li></ul>
Sebastian Wagner, 2021-09-08T09:07:33Z
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-11 priority-5 priority-high3" href="/issues/52535">Bug #52535</a>: monitor crashes after an OSD got destroyed: OSDMap.cc: 5686: FAILED ceph_assert(num_down_in_osds <= num_in_osds)</i> added</li></ul>
Jan Horacek, 2024-02-09T16:00:32Z
<p>We had a similar issue and noticed that we had lots of OSDs marked with the status "new".<br />After a couple of tests, it looks like getting rid of the "new" status (by marking every OSD out and back in, with norebalance/nobackfill set during the operation) also got rid of the problem.</p>
<p>Could I ask you to check <code>ceph osd status</code> for your OSDs?</p>
Radoslaw Zarzynski (rzarzyns@redhat.com), 2024-02-12T18:46:14Z
<ul><li><strong>Tags</strong> set to <i>medium-hanging-fruit</i></li></ul><p>There is a starting point for the fix: <a class="external" href="https://tracker.ceph.com/issues/19989#note-1">https://tracker.ceph.com/issues/19989#note-1</a>.<br />Tagging as medium-hanging-fruit.</p>