Bug #24524

Newly added OSDs do not start in Mimic

Added by Vitaliy Filippov over 3 years ago. Updated over 3 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
Yes
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi!

In my test cluster with Ceph Mimic installed (from scratch, not upgraded from Luminous), newly added OSDs fail to start with the following error:

-8> 2018-06-14 12:51:04.616 7f58cf8a0700  3 osd.3 0 handle_osd_map epochs [533,533], i have 0, src has [533,1182]
-7> 2018-06-14 12:51:04.616 7f58cf8a0700 -1 osd.3 0 failed to load OSD map for epoch 532, got 0 bytes

From reading the code at https://github.com/ceph/ceph/blob/master/src/osd/OSD.cc#L7330, I suspect the following: when a newly added OSD loads the osdmaps from oldest to newest (533-1182 in my case), for each Nth map it first loads the (N-1)th map and compares the two. There is no (N-1)th map for the oldest one (epoch 532 here, which matches the log above), so the OSD dies with 'assertion failed'.

I'm trying to fix it like this:

diff --git a/src/osd/OSD.cc b/src/osd/OSD.cc
index a6cf188..b026401 100644
--- a/src/osd/OSD.cc
+++ b/src/osd/OSD.cc
@@ -7357,6 +7357,9 @@ void OSD::handle_osd_map(MOSDMap *m)
   // check for deleted pools
   OSDMapRef lastmap;
   for (auto& i : added_maps) {
+    if (i.first <= first) {
+      continue;
+    }
     if (!lastmap) {
       lastmap = get_map(i.first - 1);
     }

Please tell me whether I'm correct and, if so, push this fix to your repository :)
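
To make the suspected failure mode concrete, here is a small standalone sketch. It is not Ceph code: `MapStore`, `try_get_map` and `missing_predecessors` are hypothetical names, and the loop is a simplified model that looks up the predecessor of every newly received epoch, which is only an approximation of what `OSD::handle_osd_map` does. It shows that, with only epochs 533-1182 stored, the single failing lookup is for epoch 532, and that skipping the oldest epoch (as in the patch above) avoids it.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

using epoch_t = std::uint32_t;

// Hypothetical stand-in for the OSD's local map store: epoch -> encoded map bytes.
using MapStore = std::map<epoch_t, std::string>;

// Returns the stored map bytes, or std::nullopt when the epoch is not stored.
std::optional<std::string> try_get_map(const MapStore& store, epoch_t e) {
  auto it = store.find(e);
  if (it == store.end()) {
    return std::nullopt;
  }
  return it->second;
}

// Simplified model (an assumption, not the real loop): for each newly added
// epoch in [first, last], look up its predecessor and record every epoch
// whose lookup fails. With skip_oldest set, the oldest epoch is skipped,
// mirroring the `if (i.first <= first) continue;` guard from the patch.
std::vector<epoch_t> missing_predecessors(const MapStore& store,
                                          epoch_t first, epoch_t last,
                                          bool skip_oldest) {
  assert(first >= 1 && first <= last);
  std::vector<epoch_t> missing;
  for (epoch_t e = first; e <= last; ++e) {
    if (skip_oldest && e <= first) {
      continue;  // no predecessor was ever received for the oldest epoch
    }
    if (!try_get_map(store, e - 1)) {
      missing.push_back(e - 1);
    }
  }
  return missing;
}
```

Filling a `MapStore` with epochs 533 through 1182 and calling `missing_predecessors(store, 533, 1182, false)` reports epoch 532 as the only missing map; the same call with `skip_oldest` set to `true` reports nothing. Whether skipping the oldest epoch is the right fix for the real code is exactly the question asked above.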


Related issues

Duplicates RADOS - Bug #24423: failed to load OSD map for epoch X, got 0 bytes Resolved 06/05/2018

History

#1 Updated by Vitaliy Filippov over 3 years ago

diff --git a/src/osd/OSD.cc b/src/osd/OSD.cc
index a6cf188..b026401 100644
--- a/src/osd/OSD.cc
+++ b/src/osd/OSD.cc
@@ -7357,6 +7357,9 @@ void OSD::handle_osd_map(MOSDMap *m)
   // check for deleted pools
   OSDMapRef lastmap;
   for (auto& i : added_maps) {
+    if (i.first <= first) {
+      continue;
+    }
     if (!lastmap) {
       lastmap = get_map(i.first - 1);
     }

#3 Updated by Greg Farnum over 3 years ago

  • Duplicates Bug #24423: failed to load OSD map for epoch X, got 0 bytes added

#4 Updated by Greg Farnum over 3 years ago

  • Status changed from New to Duplicate

#5 Updated by Igor Fedotov over 3 years ago

  • Duplicated by Bug #24450: OSD Caught signal (Aborted) added

#6 Updated by Igor Fedotov over 3 years ago

  • Duplicated by deleted (Bug #24450: OSD Caught signal (Aborted))
