Bug #19989

"OSDMonitor.cc: 3545: FAILED assert(num_down_in_osds <= num_in_osds)" in rados run

Added by Yuri Weinstein 4 months ago. Updated 4 months ago.

Status: Resolved
Priority: Immediate
Assignee: -
Category: -
Target version: -
Start date: 05/19/2017
Due date:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: rados
Release: master
Needs Doc: No

Description

Run: http://qa-proxy.ceph.com/teuthology/yuriw-2017-05-19_03:47:57-rados-wip-yuri-testing_2017_5_19---basic-smithi
Job: 1194864
Logs: http://qa-proxy.ceph.com/teuthology/yuriw-2017-05-19_03:47:57-rados-wip-yuri-testing_2017_5_19---basic-smithi/1194864/teuthology.log

2017-05-19T03:57:24.271 INFO:tasks.ceph.mon.a.smithi077.stderr:/build/ceph-12.0.2-1341-g406a26a/src/mon/OSDMonitor.cc: In function 'virtual void OSDMonitor::get_health(std::list<std::pair<health_status_t, std::basic_string<char> > >&, std::list<std::pair<health_status_t, std::basic_string<char> > >*, CephContext*) const' thread 7fcb8da81700 time 2017-05-19 03:57:23.973763
2017-05-19T03:57:24.271 INFO:tasks.ceph.mon.a.smithi077.stderr:/build/ceph-12.0.2-1341-g406a26a/src/mon/OSDMonitor.cc: 3545: FAILED assert(num_down_in_osds <= num_in_osds)
2017-05-19T03:57:24.272 INFO:tasks.ceph.mon.a.smithi077.stderr: ceph version 12.0.2-1341-g406a26a (406a26a1c327a13df48890994379a5ebe7ccda97)
2017-05-19T03:57:24.272 INFO:tasks.ceph.mon.a.smithi077.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x10e) [0x56528de789fe]
2017-05-19T03:57:24.272 INFO:tasks.ceph.mon.a.smithi077.stderr: 2: (OSDMonitor::get_health(std::list<std::pair<health_status_t, std::string>, std::allocator<std::pair<health_status_t, std::string> > >&, std::list<std::pair<health_status_t, std::string>, std::allocator<std::pair<health_status_t, std::string> > >*, CephContext*) const+0x2831) [0x56528dcfd471]
2017-05-19T03:57:24.273 INFO:tasks.ceph.mon.a.smithi077.stderr: 3: (Monitor::get_health(std::list<std::string, std::allocator<std::string> >&, ceph::buffer::list*, ceph::Formatter*)+0xca) [0x56528dc924da]
2017-05-19T03:57:24.273 INFO:tasks.ceph.mon.a.smithi077.stderr: 4: (MgrMonitor::send_digests()+0x324) [0x56528dd9ab04]
2017-05-19T03:57:24.273 INFO:tasks.ceph.mon.a.smithi077.stderr: 5: (C_MonContext::finish(int)+0x27) [0x56528dc7b9f7]
2017-05-19T03:57:24.274 INFO:tasks.ceph.mon.a.smithi077.stderr: 6: (Context::complete(int)+0x9) [0x56528dcb46a9]
2017-05-19T03:57:24.274 INFO:tasks.ceph.mon.a.smithi077.stderr: 7: (SafeTimer::timer_thread()+0xec) [0x56528de754bc]
2017-05-19T03:57:24.274 INFO:tasks.ceph.mon.a.smithi077.stderr: 8: (SafeTimerThread::entry()+0xd) [0x56528de76e4d]
2017-05-19T03:57:24.277 INFO:tasks.ceph.mon.a.smithi077.stderr: 9: (()+0x8184) [0x7fcb92ee1184]
2017-05-19T03:57:24.277 INFO:tasks.ceph.mon.a.smithi077.stderr: 10: (clone()+0x6d) [0x7fcb917a2bed]
2017-05-19T03:57:24.277 INFO:tasks.ceph.mon.a.smithi077.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

History

#1 Updated by Sage Weil 4 months ago

This appears to be related to the code below, which assumes the OSD is in, which may not be true. The fix might be something like the following, but I didn't look at this very carefully.

diff --git a/src/osd/OSDMap.cc b/src/osd/OSDMap.cc
index 5503fb4..a89e82e 100644
--- a/src/osd/OSDMap.cc
+++ b/src/osd/OSDMap.cc
@@ -298,10 +298,12 @@ bool OSDMap::subtree_type_is_down(CephContext *cct, int id, int subtree_type, se
 {
   if (id >= 0) {
     bool is_down_ret = is_down(id);
-    if (is_down_ret) {
-      down_in_osds->insert(id);
-    } else {
-      up_in_osds->insert(id);
+    if (!is_out(id)) {
+      if (is_down_ret) {
+       down_in_osds->insert(id);
+      } else {
+       up_in_osds->insert(id);
+      }
     }
     return is_down_ret;
   }
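
To illustrate why a missing in/out check could trip the assert, here is a minimal toy model. It is not Ceph code: ToyOsd, Counts, count_without_out_check and count_with_out_check are hypothetical names invented for this sketch, and it only assumes that the "down in" set was being filled with every down OSD regardless of whether the OSD was in. Whether the real failure followed exactly this path is not confirmed in this ticket.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical, simplified per-OSD state; NOT Ceph's OSDMap.
struct ToyOsd {
  bool up;  // false means the OSD is down
  bool in;  // false means the OSD is out
};

struct Counts {
  int num_in_osds;
  int num_down_in_osds;
};

// Mimics the pre-patch behaviour: every down OSD lands in the
// "down and in" set, even if it is actually out.
Counts count_without_out_check(const std::vector<ToyOsd>& osds) {
  std::set<int> down_in;
  int num_in = 0;
  for (int id = 0; id < static_cast<int>(osds.size()); ++id) {
    if (osds[id].in) ++num_in;
    if (!osds[id].up) down_in.insert(id);
  }
  return {num_in, static_cast<int>(down_in.size())};
}

// Mimics the proposed patch: OSDs that are out are skipped, so the
// "down and in" set can only be a subset of the in OSDs.
Counts count_with_out_check(const std::vector<ToyOsd>& osds) {
  std::set<int> down_in;
  int num_in = 0;
  for (int id = 0; id < static_cast<int>(osds.size()); ++id) {
    if (osds[id].in) ++num_in;
    if (osds[id].in && !osds[id].up) down_in.insert(id);
  }
  return {num_in, static_cast<int>(down_in.size())};
}
```

With two OSDs that are both down and out plus one that is up and in, the first function reports more "down in" OSDs than in OSDs, which is the condition the assert rejects; the second keeps num_down_in_osds <= num_in_osds by construction.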

#2 Updated by Sage Weil 4 months ago

  • Status changed from New to Resolved
