Bug #8371

closed

osd not booting

Added by Samuel Just almost 10 years ago. Updated almost 10 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport: firefly, dumpling
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I'm not entirely clear on how the monmap subscriptions work, but it
looks from the log as if

OSD::_maybe_boot with

osdmap->get_epoch() == 0,
oldest == 1, newest == 253 (_maybe_boot mon has osdmaps 1..253)

causes _maybe_boot to fall through to the else branch at the end:
osdmap_subscribe(oldest - 1, true);

osdmap_subscribe then observes osdmap->get_epoch() >= (oldest - 1), here 0 >= 0,
and skips the subscription, causing boot to hang.

The check in osdmap_subscribe was not present before
4584f60653bee0305e85418323d80332ceecd0cf.

Actions #1

Updated by Greg Farnum almost 10 years ago

  • Status changed from New to 7

wip-osdmap-sub-bug
wip-dumpling-osdmap-sub-bug

It'll also need a firefly backport. Alternatively, maybe we should just remove the if-else entirely, since I believe we can subscribe from osdmap->get_epoch()+1 in both cases with the same result.

Actions #2

Updated by Greg Farnum almost 10 years ago

The master version passed my local test (I got master to hang, then ran the same test with the fix and it did not hang). I scheduled a suite run as well; then we can merge and do proper backports.

Actions #3

Updated by Greg Farnum almost 10 years ago

  • Status changed from 7 to Pending Backport

Sage put this in master (290ac818696414758978b78517b137c226110bb4), and it passed the suite run overnight.

Actions #4

Updated by Greg Farnum almost 10 years ago

  • Status changed from Pending Backport to Resolved

This is in the dumpling branch as bd5d6f116416d1b410d57ce00cb3e2abf6de102b, and Sage has it in firefly-next as 5c8afaa8861345efbcc5488e0336327a8a38d3bc.
