Bug #19989

"OSDMonitor.cc: 3545: FAILED assert(num_down_in_osds <= num_in_osds)" in rados run

Added by Yuri Weinstein 11 months ago. Updated 11 months ago.

Status: Resolved
Priority: Immediate
Assignee: -
Category: -
Target version: -
Start date: 05/19/2017
Due date:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: rados

Description

Run: http://qa-proxy.ceph.com/teuthology/yuriw-2017-05-19_03:47:57-rados-wip-yuri-testing_2017_5_19---basic-smithi
Job: 1194864
Logs: http://qa-proxy.ceph.com/teuthology/yuriw-2017-05-19_03:47:57-rados-wip-yuri-testing_2017_5_19---basic-smithi/1194864/teuthology.log

2017-05-19T03:57:24.271 INFO:tasks.ceph.mon.a.smithi077.stderr:/build/ceph-12.0.2-1341-g406a26a/src/mon/OSDMonitor.cc: In function 'virtual void OSDMonitor::get_health(std::list<std::pair<health_status_t, std::basic_string<char> > >&, std::list<std::pair<health_status_t, std::basic_string<char> > >*, CephContext*) const' thread 7fcb8da81700 time 2017-05-19 03:57:23.973763
2017-05-19T03:57:24.271 INFO:tasks.ceph.mon.a.smithi077.stderr:/build/ceph-12.0.2-1341-g406a26a/src/mon/OSDMonitor.cc: 3545: FAILED assert(num_down_in_osds <= num_in_osds)
2017-05-19T03:57:24.272 INFO:tasks.ceph.mon.a.smithi077.stderr: ceph version 12.0.2-1341-g406a26a (406a26a1c327a13df48890994379a5ebe7ccda97)
2017-05-19T03:57:24.272 INFO:tasks.ceph.mon.a.smithi077.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x10e) [0x56528de789fe]
2017-05-19T03:57:24.272 INFO:tasks.ceph.mon.a.smithi077.stderr: 2: (OSDMonitor::get_health(std::list<std::pair<health_status_t, std::string>, std::allocator<std::pair<health_status_t, std::string> > >&, std::list<std::pair<health_status_t, std::string>, std::allocator<std::pair<health_status_t, std::string> > >*, CephContext*) const+0x2831) [0x56528dcfd471]
2017-05-19T03:57:24.273 INFO:tasks.ceph.mon.a.smithi077.stderr: 3: (Monitor::get_health(std::list<std::string, std::allocator<std::string> >&, ceph::buffer::list*, ceph::Formatter*)+0xca) [0x56528dc924da]
2017-05-19T03:57:24.273 INFO:tasks.ceph.mon.a.smithi077.stderr: 4: (MgrMonitor::send_digests()+0x324) [0x56528dd9ab04]
2017-05-19T03:57:24.273 INFO:tasks.ceph.mon.a.smithi077.stderr: 5: (C_MonContext::finish(int)+0x27) [0x56528dc7b9f7]
2017-05-19T03:57:24.274 INFO:tasks.ceph.mon.a.smithi077.stderr: 6: (Context::complete(int)+0x9) [0x56528dcb46a9]
2017-05-19T03:57:24.274 INFO:tasks.ceph.mon.a.smithi077.stderr: 7: (SafeTimer::timer_thread()+0xec) [0x56528de754bc]
2017-05-19T03:57:24.274 INFO:tasks.ceph.mon.a.smithi077.stderr: 8: (SafeTimerThread::entry()+0xd) [0x56528de76e4d]
2017-05-19T03:57:24.277 INFO:tasks.ceph.mon.a.smithi077.stderr: 9: (()+0x8184) [0x7fcb92ee1184]
2017-05-19T03:57:24.277 INFO:tasks.ceph.mon.a.smithi077.stderr: 10: (clone()+0x6d) [0x7fcb917a2bed]
2017-05-19T03:57:24.277 INFO:tasks.ceph.mon.a.smithi077.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

History

#1 Updated by Sage Weil 11 months ago

Appears to be related to this code, which inserts the OSD into down_in_osds or up_in_osds without checking whether it is actually in. The fix might be something like the diff below, but I didn't look at this very carefully.

diff --git a/src/osd/OSDMap.cc b/src/osd/OSDMap.cc
index 5503fb4..a89e82e 100644
--- a/src/osd/OSDMap.cc
+++ b/src/osd/OSDMap.cc
@@ -298,10 +298,12 @@ bool OSDMap::subtree_type_is_down(CephContext *cct, int id, int subtree_type, se
 {
   if (id >= 0) {
     bool is_down_ret = is_down(id);
-    if (is_down_ret) {
-      down_in_osds->insert(id);
-    } else {
-      up_in_osds->insert(id);
+    if (!is_out(id)) {
+      if (is_down_ret) {
+       down_in_osds->insert(id);
+      } else {
+       up_in_osds->insert(id);
+      }
     }
     return is_down_ret;
   }
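
To illustrate why the unchecked insert can trip assert(num_down_in_osds <= num_in_osds), here is a minimal standalone sketch. It is not Ceph code: OsdState, ToyMap, count_in, collect_unchecked, collect_checked and invariant_holds are hypothetical names invented for this example, and the toy map only models the up/down and in/out flags. It assumes the health check compares the size of the "down and in" set against a separately computed count of in OSDs.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>

// Hypothetical stand-in for the per-OSD flags kept by the real OSDMap.
struct OsdState {
  bool up;  // false means the OSD is down
  bool in;  // false means the OSD is out
};

// Hypothetical toy map: OSD id -> flags.
typedef std::map<int, OsdState> ToyMap;

// Number of OSDs currently marked in.
int count_in(const ToyMap& m) {
  int n = 0;
  for (ToyMap::const_iterator it = m.begin(); it != m.end(); ++it) {
    if (it->second.in) ++n;
  }
  return n;
}

// Mirrors the pre-patch branch: classifies every OSD as down or up
// and never looks at the in/out flag.
void collect_unchecked(const ToyMap& m, std::set<int>* down_in,
                       std::set<int>* up_in) {
  assert(down_in && up_in);
  for (ToyMap::const_iterator it = m.begin(); it != m.end(); ++it) {
    if (!it->second.up) {
      down_in->insert(it->first);
    } else {
      up_in->insert(it->first);
    }
  }
}

// Mirrors the proposed patch: OSDs that are out are skipped entirely.
void collect_checked(const ToyMap& m, std::set<int>* down_in,
                     std::set<int>* up_in) {
  assert(down_in && up_in);
  for (ToyMap::const_iterator it = m.begin(); it != m.end(); ++it) {
    if (!it->second.in) continue;  // the added !is_out(id) guard
    if (!it->second.up) {
      down_in->insert(it->first);
    } else {
      up_in->insert(it->first);
    }
  }
}

// The condition the failing assert expresses, returned as a bool so a
// caller can inspect it without aborting.
bool invariant_holds(std::size_t num_down_in_osds, int num_in_osds) {
  return static_cast<int>(num_down_in_osds) <= num_in_osds;
}
```

With one OSD that is up and in plus two that are down and out, collect_unchecked puts both out OSDs into the "down in" set while count_in reports 1, so invariant_holds returns false. collect_checked leaves the set empty and the invariant holds.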

#2 Updated by Sage Weil 11 months ago

  • Status changed from New to Resolved
