Bug #6605

closed

mon: remove full osd state on "osd rm"

Added by Greg Farnum over 10 years ago. Updated over 10 years ago.

Status: Resolved
Priority: High
Assignee: Joao Eduardo Luis
Category: Monitor
Target version:
% Done: 0%
Source: Community (dev)
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

It turns out that "osd rm" does not eliminate the osd's auth keys from the monitor. That's liable to cause issues. Correct the oversight and check for any other osd state that might be left around.

This is particularly unfortunate because it looks like preprocess_boot will not block "old" instances of the OSD from joining and booting the "new" one out. :/

Actions #1

Updated by Greg Farnum over 10 years ago

We should also be checking that the osd matches what we expect based on more than the ID. I think we have enough information in the monitor and boot message to do that...

Actions #2

Updated by Samuel Just over 10 years ago

  • Assignee set to Joao Eduardo Luis
Actions #3

Updated by Joao Eduardo Luis over 10 years ago

  • Status changed from New to Fix Under Review

wip-6605, pull request 787, e02740ac5da7c9f5e4c1fdd603918e56c05123de

Greg Farnum wrote:

It turns out that "osd rm" does not eliminate the osd's auth keys from the monitor. That's liable to cause issues. Correct the oversight and check for any other osd state that might be left around.

This was not fixed. Fixing it would require us to seamlessly pack a change to the AuthMonitor state into the OSDMonitor's paxos proposal. Although that would be possible with some effort, it would be a new feature rather than a bug fix. Proposing the two changes at different times isn't an option either, as we would lose the expected atomicity. Instead, we continue to delegate to the user the responsibility of removing the osd key from the keyring upon osd removal.
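
Since key removal stays with the user, the manual procedure could be sketched roughly as below. This is only an illustration: the helper name `osd_removal_commands` is hypothetical, and the ordering (remove the osd, then delete its key) is an assumption based on the wording above. The sketch only builds the `ceph osd rm` and `ceph auth del` invocations; it does not run them.

```python
from typing import List


def osd_removal_commands(osd_id: int) -> List[List[str]]:
    """Build the two commands a user would run to fully remove an osd.

    Hypothetical helper: it returns argument lists and executes nothing.
    """
    if osd_id < 0:
        raise ValueError("osd id must be non-negative")
    return [
        # Remove the osd from the osdmap (handled by the OSDMonitor).
        ["ceph", "osd", "rm", str(osd_id)],
        # Remove the osd's key (handled by the AuthMonitor); "osd rm"
        # does not do this, so it is a separate, non-atomic step.
        ["ceph", "auth", "del", f"osd.{osd_id}"],
    ]


if __name__ == "__main__":
    for cmd in osd_removal_commands(3):
        print(" ".join(cmd))
```

Each returned list could be passed to `subprocess.run` on a node with admin credentials. Because the two commands are separate proposals, a failure between them leaves the key behind.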

Greg Farnum wrote:

This is particularly unfortunate because it looks like preprocess_boot will not block "old" instances of the OSD from joining and booting the "new" one out. :/

This was indeed a bug, though unrelated to the above. The behavior is fixed by wip-6605, and we will now let an osd boot in either of the following circumstances:

  • the osd exists and its fsid is the same as the one we have on record (in the osdmap)
  • the osd does not exist and the fsid on record is nil
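
The two conditions above can be sketched as a small predicate. This is not the actual monitor code from wip-6605: the function name `may_boot`, its parameters, and the use of an all-zero UUID to stand for a nil fsid are assumptions made for illustration.

```python
import uuid

# Assumption: a "nil" fsid is represented as the all-zero UUID.
NIL_FSID = uuid.UUID(int=0)


def may_boot(osd_exists: bool, recorded_fsid: uuid.UUID, boot_fsid: uuid.UUID) -> bool:
    """Decide whether a booting osd should be let in.

    osd_exists    -- whether the osd id exists in the osdmap
    recorded_fsid -- fsid the monitor has on record for that id
    boot_fsid     -- fsid carried by the boot message
    """
    if osd_exists:
        # Existing osd: the fsid on record must match the one presented.
        return recorded_fsid == boot_fsid
    # Osd does not exist: allowed only if nothing is on record for the id.
    return recorded_fsid == NIL_FSID


if __name__ == "__main__":
    old, new = uuid.uuid4(), uuid.uuid4()
    print(may_boot(True, new, new))       # matching fsid
    print(may_boot(True, new, old))       # stale instance with another fsid
    print(may_boot(False, NIL_FSID, new))  # fresh id, nothing on record
```

Under these assumptions, an "old" instance that presents a different fsid than the one on record is refused instead of booting the "new" one out.
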
Actions #4

Updated by Joao Eduardo Luis over 10 years ago

  • Status changed from Fix Under Review to Resolved

merged into master
