Feature #11815

mon: allow injecting new crushmap

Added by Joao Eduardo Luis almost 9 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: Monitor
Target version:
% Done: 0%
Source: other
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

Whenever a crushmap is faulty and the monitor, for some reason, did not pick up on it, we may need to inject a new version.

Theoretically, I think it would be possible to revert to the latest good osdmap by using ceph-kvstore-tool and setting osdmap:last_committed to the latest good version, but I worry that this may lead to data loss, especially on a big cluster and considering that we squash a bunch of updates into the same incremental.

The appropriate solution, imo, would be to use ceph-monstore-tool and have it grab the latest osdmap and incremental and fix crush in place. We would then reencode the osdmap and put it back in the store.


Related issues

Related to Ceph - Bug #11680: mon crashes when "ceph osd tree 85 --format json" Can't reproduce 05/19/2015
Related to Ceph - Bug #11814: implicit erasure code crush ruleset is not validated Resolved 05/29/2015

Associated revisions

Revision 30637342 (diff)
Added by Kefu Chai over 8 years ago

tools/ceph-monstore-tools: add rewrite command

"rewrite" command will
- add a new osdmap version to update current osdmap held by OSDMonitor
- add a new paxos version, as a proposal it will
  * rewrite all osdmap epochs from specified epoch to the last_committed
    one with the specified crush map.
  * add the new osdmap which is added just now
  so the leader monitor can trigger a recovery process to apply the transaction
  to all monitors in quorum, and hence bring them back to normal after being
  injected with a faulty crushmap.

Fixes: #11815
Signed-off-by: Kefu Chai <>

Revision 39e25b97 (diff)
Added by Kefu Chai over 8 years ago

tools: add ceph-monstore-update-crush.sh

Fixes: #11815
Signed-off-by: Kefu Chai <>

Revision 9d8b6d85 (diff)
Added by Kefu Chai over 8 years ago

test: add a test to exercise ceph-monstore-update-crush.sh

Fixes: #11815
Signed-off-by: Kefu Chai <>

Revision 50a33dea (diff)
Added by Kefu Chai over 8 years ago

package ceph-monstore-update-crush.sh

Fixes: #11815
Signed-off-by: Kefu Chai <>

Revision 9035c694 (diff)
Added by Kefu Chai about 8 years ago

tools/ceph-monstore-tools: add rewrite command

"rewrite" command will
- add a new osdmap version to update current osdmap held by OSDMonitor
- add a new paxos version, as a proposal it will
  * rewrite all osdmap epochs from specified epoch to the last_committed
    one with the specified crush map.
  * add the new osdmap which is added just now
  so the leader monitor can trigger a recovery process to apply the transaction
  to all monitors in quorum, and hence bring them back to normal after being
  injected with a faulty crushmap.

Fixes: #11815
Signed-off-by: Kefu Chai <>
(cherry picked from commit 306373427836ca0c2418dbe6caab26d74d94d12e)

History

#1 Updated by Greg Farnum almost 9 years ago

How are you thinking of doing this? I imagine you mean this for a map which broke the monitor so it failed to propagate — but if so we need to make sure this is set up to guarantee that. It would be really bad if we created a new epoch N that conflicted with a previously-circulated one...

#2 Updated by Loïc Dachary almost 9 years ago

  • Status changed from New to 12
  • Priority changed from High to Urgent

What if the mon could be started with --use-crushmap false, a mode that makes sure the crushmap is not used (no pgmap loaded, no osd connections accepted?). Then a good crush map can be injected, and ceph injectargs mon.* --use-crushmap true can be used to resume operations.

#3 Updated by Loïc Dachary almost 9 years ago

  • Subject changed from mon: allow injecting new crushmap to mon: add --use-crushmap false to recover from a corrupted crushmap

#4 Updated by Loïc Dachary almost 9 years ago

  • Subject changed from mon: add --use-crushmap false to recover from a corrupted crushmap to mon: allow injecting new crushmap

as per greg & joao this is not a good idea (--use-crushmap false)

#5 Updated by Loïc Dachary almost 9 years ago

Although the previous crushmap is likely to be correct, it is entirely possible that it is corrupt. For instance you inject a crushmap with a bad ruleset (one that crashes). But the ruleset is not used by any pool just yet. And you inject the crushmap a few times, which is common when tuning the crushmap. Then you create a pool to use the ruleset. And boom. And you have an unknown series of invalid crushmaps and a pool that is using it.
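To illustrate why "just take the previous crushmap" is not enough, here is a minimal sketch of walking back through the history until a crushmap passes validation. Everything in it is hypothetical: the dict-based crushmap, the `rules_ok` check and the function names are toy stand-ins, not Ceph structures or APIs.

```python
from typing import Callable, Dict, Optional


def latest_good_epoch(crush_by_epoch: Dict[int, dict],
                      is_valid: Callable[[dict], bool]) -> Optional[int]:
    """Walk epochs from newest to oldest; return the first one that validates."""
    for epoch in sorted(crush_by_epoch, reverse=True):
        if is_valid(crush_by_epoch[epoch]):
            return epoch
    return None  # every known crushmap is invalid


def rules_ok(crush: dict) -> bool:
    # Hypothetical validity check: every rule must reference a known bucket.
    return all(rule["root"] in crush["buckets"] for rule in crush["rules"])


# The bad rule (root "ssd") was injected at epoch 49 and re-injected at 50,
# so the immediately preceding crushmap is broken as well.
history = {
    48: {"buckets": {"default"}, "rules": [{"root": "default"}]},
    49: {"buckets": {"default"}, "rules": [{"root": "default"}, {"root": "ssd"}]},
    50: {"buckets": {"default"}, "rules": [{"root": "default"}, {"root": "ssd"}]},
}

print(latest_good_epoch(history, rules_ok))  # 48
```

The search only helps if the validity check actually catches the failure, which is the hard part in the scenario above: the bad ruleset goes unnoticed until a pool starts using it.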

#6 Updated by Kefu Chai almost 9 years ago

per the discussion of Joao, Greg and Loïc,

we need to have a CLI tool which will be designed to add a new paxos commit to update the leader's monitor storage. say we have broken osdmap epoch 50, and we crashed at epoch 60:

the cli tool will fix both the incremental and full maps from version 50 to version 60. to be specific, it will

  1. extract the latest good crush map #50
  2. fix each individual osdmap and incremental from #50 to #60; only the incremental containing the bad crushmap should be changed
  3. create a new paxos version with those fixes (in fact, if we have many broken versions we may need to break this into several paxos versions, dunno)

after the leader's store is patched, we should restart the monitors:

  1. mon->store->apply_transaction() will overwrite the bad bits,
  2. so the leader can send that newly added paxos version to the rest of the quorum before the services init
  3. then each peon will get the new paxos versions and apply them before they init their services
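A rough sketch of the procedure above, using an in-memory toy model. `ToyMonStore`, `rewrite_crush` and `apply_paxos` are hypothetical names invented for illustration; they do not mirror the real monitor store layout or the ceph-monstore-tool code. The good crushmap is assumed to be supplied by the caller.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyMonStore:
    """Hypothetical stand-in for a monitor store; not the real on-disk layout."""
    full: Dict[int, dict] = field(default_factory=dict)  # epoch -> full osdmap
    inc: Dict[int, dict] = field(default_factory=dict)   # epoch -> incremental
    osdmap_last_committed: int = 0
    paxos: Dict[int, List[Tuple[str, int, dict]]] = field(default_factory=dict)


def rewrite_crush(store: ToyMonStore, first_bad: int, good_crush: dict) -> int:
    """Queue one paxos version that fixes epochs first_bad..last_committed."""
    txn: List[Tuple[str, int, dict]] = []
    last = store.osdmap_last_committed
    for epoch in range(first_bad, last + 1):
        fixed_full = copy.deepcopy(store.full[epoch])
        fixed_full["crush"] = good_crush
        txn.append(("full", epoch, fixed_full))
        # Only incrementals that actually carry a crushmap are touched.
        if "crush" in store.inc[epoch]:
            fixed_inc = copy.deepcopy(store.inc[epoch])
            fixed_inc["crush"] = good_crush
            txn.append(("inc", epoch, fixed_inc))
    # Add a brand-new osdmap epoch on top, so no existing epoch number is reused.
    new_epoch = last + 1
    txn.append(("full", new_epoch, {"epoch": new_epoch, "crush": good_crush}))
    txn.append(("inc", new_epoch, {"epoch": new_epoch, "crush": good_crush}))
    version = max(store.paxos, default=0) + 1
    store.paxos[version] = txn
    return version


def apply_paxos(store: ToyMonStore, version: int) -> None:
    """Models what a monitor does on restart: apply the queued transaction."""
    for kind, epoch, value in store.paxos[version]:
        (store.full if kind == "full" else store.inc)[epoch] = value
        store.osdmap_last_committed = max(store.osdmap_last_committed, epoch)


good = {"rules": ["replicated"]}
bad = {"rules": ["replicated", "broken"]}
store = ToyMonStore()
for e in range(49, 61):
    store.full[e] = {"epoch": e, "crush": good if e < 50 else bad}
    store.inc[e] = {"epoch": e}
store.inc[50]["crush"] = bad  # the incremental that introduced the bad map
store.osdmap_last_committed = 60

version = rewrite_crush(store, first_bad=50, good_crush=good)
apply_paxos(store, version)
print(store.osdmap_last_committed)  # 61
```

In this toy model a peon would call `apply_paxos` with the same version after receiving it from the leader, which is how every store ends up with identical, repaired maps.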

#7 Updated by Kefu Chai almost 9 years ago

  • Assignee set to Kefu Chai

#8 Updated by Kefu Chai almost 9 years ago

  • Status changed from 12 to Fix Under Review

#9 Updated by Samuel Just almost 9 years ago

  • Target version set to v9.0.6

#10 Updated by Kefu Chai over 8 years ago

  • Status changed from Fix Under Review to Resolved
