Bug #6605
mon: remove full osd state on "osd rm"
Description
It turns out that "osd rm" does not eliminate the osd's auth keys from the monitor. That's liable to cause issues. Correct the oversight and check for any other osd state that might be left around.
This is particularly unfortunate because it looks like preprocess_boot will not block "old" instances of the OSD from joining and booting the "new" one out. :/
Associated revisions
mon: OSDMonitor: only allow an osd to boot iff it has the fsid on record
Fixes: #6605
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Merge pull request #788 from ceph/wip-6605
mon: OSDMonitor: only allow an osd to boot iff it has the fsid on record
Fixes: #6605
Reviewed-by: Sage Weil <sage@inktank.com>
History
#1 Updated by Greg Farnum over 9 years ago
We should also be checking that the osd matches what we expect based on more than the ID. I think we have enough information in the monitor and boot message to do that...
#2 Updated by Samuel Just over 9 years ago
- Assignee set to Joao Eduardo Luis
#3 Updated by Joao Eduardo Luis over 9 years ago
- Status changed from New to Fix Under Review
wip-6605, pull request 787, e02740ac5da7c9f5e4c1fdd603918e56c05123de
Greg Farnum wrote:
It turns out that "osd rm" does not eliminate the osd's auth keys from the monitor. That's liable to cause issues. Correct the oversight and check for any other osd state that might be left around.
This was not fixed. Fixing it would require us to seamlessly pack a change to the AuthMonitor state into the OSDMonitor's paxos proposal. Although this would be possible with a bit of effort, it would be a new feature rather than a bug fix. Proposing the two changes at different times isn't an option either, as we would lose the expected atomicity. Instead, we continue to delegate to the user the responsibility of removing the osd key from the keyring upon osd removal.
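For reference, the manual cleanup left to the user can be sketched as follows. This is only an illustration: the helper name `osd_removal_commands` is hypothetical, it merely builds the argument lists for `ceph auth del` and `ceph osd rm` without running them, and the ordering (key first, then the osd entry) is an assumption based on the usual manual removal procedure. Other steps, such as removing the osd from the CRUSH map, are left out.

```python
from typing import List


def osd_removal_commands(osd_id: int) -> List[List[str]]:
    """Build (but do not run) the CLI calls for removing an osd and its key.

    Hypothetical helper for illustration only; it is not part of Ceph.
    """
    if osd_id < 0:
        raise ValueError("osd id must be non-negative")
    name = f"osd.{osd_id}"
    return [
        # The monitor does not drop the key on "osd rm", so delete it explicitly.
        ["ceph", "auth", "del", name],
        # Then remove the osd itself from the osdmap.
        ["ceph", "osd", "rm", str(osd_id)],
    ]


if __name__ == "__main__":
    for argv in osd_removal_commands(3):
        print(" ".join(argv))
```

The commands are returned as argument lists so that a caller can review them first or pass them to `subprocess.run` one at a time.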
Greg Farnum wrote:
This is particularly unfortunate because it looks like preprocess_boot will not block "old" instances of the OSD from joining and booting the "new" one out. :/
This was indeed a bug, although unrelated to the above. This behavior is fixed by wip-6605, and we will now let an osd boot only under one of the following circumstances:
- osd exists and its fsid is the same as the one we have on record (in the osdmap)
- osd does not exist and the fsid on record is nil
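The two boot conditions above can be sketched as a small predicate. This is a simplified model, not the actual OSDMonitor code: `OsdMapSketch`, `may_boot`, and `NIL_FSID` are hypothetical names, and the check follows the two listed conditions literally.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Set

# Assumption: a "nil" fsid is modelled as the all-zero UUID.
NIL_FSID = uuid.UUID(int=0)


@dataclass
class OsdMapSketch:
    """Toy stand-in for the osdmap: which osd ids exist and their recorded fsids."""
    existing: Set[int] = field(default_factory=set)
    fsids: Dict[int, uuid.UUID] = field(default_factory=dict)

    def exists(self, osd_id: int) -> bool:
        return osd_id in self.existing

    def fsid_on_record(self, osd_id: int) -> uuid.UUID:
        # An id with no recorded fsid is treated as having a nil fsid.
        return self.fsids.get(osd_id, NIL_FSID)


def may_boot(osdmap: OsdMapSketch, osd_id: int, boot_fsid: uuid.UUID) -> bool:
    """Return True if a boot message carrying boot_fsid should be accepted."""
    recorded = osdmap.fsid_on_record(osd_id)
    if osdmap.exists(osd_id):
        # Condition 1: the osd exists and its fsid matches the one on record.
        return recorded == boot_fsid
    # Condition 2: the osd does not exist and the fsid on record is nil.
    return recorded == NIL_FSID


if __name__ == "__main__":
    old, new = uuid.UUID(int=1), uuid.UUID(int=2)
    osdmap = OsdMapSketch(existing={0}, fsids={0: new})
    print(may_boot(osdmap, 0, new))  # current instance of osd.0
    print(may_boot(osdmap, 0, old))  # stale instance reusing id 0
```

In this model, a stale instance that reuses an id but carries a different fsid is rejected, which is the "old instance boots the new one out" case described in the report.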
#4 Updated by Joao Eduardo Luis over 9 years ago
- Status changed from Fix Under Review to Resolved
merged into master