Fix #4708

MDS: journaler pre-zeroing is dangerous

Added by Greg Farnum almost 11 years ago. Updated almost 6 years ago.

Status: Rejected
Priority: High
Assignee: -
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Development
Tags:
Backport:
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS, osdc
Labels (FS): multimds
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

See http://pastebin.com/NJd0UCfF

At first glance it looks like there's one short log object and one missing log object, and then several of the follow-on objects that did exist got deleted. Perhaps we should be more careful to confirm that startup succeeded before pre-zeroing.
(Of course, it's also possible that my quick diagnosis is just wrong, but either way this is conceptually an issue.)
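
The ordering concern can be illustrated with a toy model. This is only a sketch, not Ceph's Journaler: `ToyJournal`, `recover`, and `prezero` are hypothetical names, and a plain dict stands in for the log objects. It shows pre-zeroing refusing to run until recovery has confirmed that the log has no gaps.

```python
class JournalRecoveryError(Exception):
    """Raised when the toy log has a gap (a missing object)."""


class ToyJournal:
    """Hypothetical model: log objects are dict entries keyed by index."""

    def __init__(self, objects):
        self.objects = objects   # index -> bytes; stands in for log objects
        self.recovered = False
        self.write_pos = None    # index of the next object to be written

    def recover(self):
        # Walk the log from object 0 until the first missing index.
        idx = 0
        while idx in self.objects:
            idx += 1
        # Objects beyond the first gap mean the log is damaged, not finished.
        if any(i > idx for i in self.objects):
            raise JournalRecoveryError(
                f"object {idx} is missing but later objects exist")
        self.write_pos = idx
        self.recovered = True

    def prezero(self, count):
        # Guard: never delete ahead of the write position on an unverified log.
        if not self.recovered:
            raise RuntimeError("refusing to pre-zero before successful recovery")
        for i in range(self.write_pos, self.write_pos + count):
            self.objects.pop(i, None)
```

With a damaged log such as `{0: ..., 2: ..., 3: ...}`, `recover()` raises and `prezero()` refuses to run, so objects 2 and 3 stay available for inspection instead of being deleted.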

History

#1 Updated by Greg Farnum almost 11 years ago

  • Description updated (diff)

#2 Updated by Greg Farnum over 10 years ago

Possibly related to #6548.

#4 Updated by Greg Farnum over 10 years ago

Like Sage said, blacklisting. :)
It's been a while, but I think the scenario I envisioned here is one in which the order in which we propagate the blacklist to the OSDs isn't safe enough, so the blacklisted MDS keeps writing successfully to some of them while failing on others.

#5 Updated by Greg Farnum about 10 years ago

  • Priority changed from Normal to High

#6458 could be a result of this issue, so I'm bumping up the priority.

#6 Updated by Greg Farnum about 10 years ago

  • Tracker changed from Bug to Fix

#7 Updated by Greg Farnum over 7 years ago

  • Category changed from 47 to Correctness/Safety
  • Component(FS) MDS, osdc added

#8 Updated by Patrick Donnelly almost 6 years ago

  • Target version set to v13.0.0

#9 Updated by Zheng Yan almost 6 years ago

  • Status changed from New to Need More Info

I don't think this is still a problem. A new MDS takes over a rank only after it sees that the old MDS is blacklisted in the osdmap. There is no way the old MDS can delete or modify objects that the new MDS has modified.
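
A minimal sketch of that argument, as a toy model rather than Ceph code. `ToyOSD` and its `write` method are hypothetical. The sketch assumes that every request carries the sender's osdmap epoch and that an OSD catches up to that epoch before applying the request, so any OSD the new MDS has written to already knows about the blacklist entry.

```python
class ToyOSD:
    """Hypothetical OSD that tracks an osdmap epoch and enforces the blacklist."""

    def __init__(self, maps):
        self.maps = maps     # shared: epoch -> set of blacklisted client names
        self.epoch = 1       # newest osdmap epoch this OSD has applied
        self.objects = {}

    def write(self, client, client_epoch, name, data):
        # Assumption: the OSD first catches up to the sender's map epoch.
        self.epoch = max(self.epoch, client_epoch)
        # A client blacklisted in the OSD's current map is fenced off.
        if client in self.maps[self.epoch]:
            return False
        self.objects[name] = data
        return True


# Epoch 2 adds the old MDS to the blacklist.
maps = {1: set(), 2: {"old_mds"}}
osd_a, osd_b = ToyOSD(maps), ToyOSD(maps)

# The old MDS can still reach an OSD that has not seen epoch 2 yet.
old_on_a = osd_a.write("old_mds", 1, "log.0", b"old")
# The new MDS only starts after seeing epoch 2, so its write carries epoch 2.
new_on_b = osd_b.write("new_mds", 2, "log.1", b"new")
# From then on, the old MDS is rejected on every OSD the new MDS has touched.
old_on_b = osd_b.write("old_mds", 1, "log.1", b"stale")
```

In this model the old MDS may still write to OSDs that the new MDS has not contacted, but it cannot change any object that the new MDS has written.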

#10 Updated by Patrick Donnelly almost 6 years ago

  • Status changed from Need More Info to Rejected
  • Labels (FS) multimds added

Thanks for explaining, Zheng. Closing this.
