Feature #2939
chef: Write up how cluster shrinking should work
0%
Description
Expanding the cluster is fairly trivial and practically identical to the initial install, but shrinking needs a bit more care.
If I want to remove physical server node1234, which OSDs do I "ceph osd rm"?
Something should probably run on node1234 to bring the relevant disks back to the "prepared" state, so they don't keep trying to talk to the cluster. Something like ceph-disk-VERB_HERE that stops the daemons and re-prepares the disks? Would that command also talk to ceph-mon to do the "ceph osd rm"?
Note that in actual usage, admins will probably want to step the CRUSH weight down to 0 gradually first, so data migrates off in stages.
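The drain-then-remove sequence being asked for can be sketched as follows. This is a hedged sketch assuming the usual manual OSD-removal commands (ceph osd out, stop the daemon, ceph osd crush remove, ceph auth del, ceph osd rm); the helper function, the weight-step schedule, and the systemd unit name are illustrative assumptions, not anything this ticket settled on:

```python
# Sketch of a per-OSD removal sequence, assuming the standard manual-removal
# commands. removal_commands() and its weight_steps parameter are hypothetical;
# the systemd unit name is an assumption about the deployment.

def removal_commands(osd_id, weight_steps=(0.5, 0.25, 0.0)):
    """Return the shell commands to drain and then remove one OSD."""
    cmds = []
    # Step the CRUSH weight down gradually so data migrates off in stages.
    for w in weight_steps:
        cmds.append(f"ceph osd crush reweight osd.{osd_id} {w}")
    cmds.append(f"ceph osd out {osd_id}")               # mark out of the data distribution
    cmds.append(f"systemctl stop ceph-osd@{osd_id}")    # stop the daemon (assumed systemd)
    cmds.append(f"ceph osd crush remove osd.{osd_id}")  # drop it from the CRUSH map
    cmds.append(f"ceph auth del osd.{osd_id}")          # delete its cephx key
    cmds.append(f"ceph osd rm {osd_id}")                # finally remove the OSD id
    return cmds

if __name__ == "__main__":
    for cmd in removal_commands(3):
        print(cmd)
```

Whatever ceph-disk-VERB_HERE ends up being would presumably run the tail of this sequence on the node itself and re-prepare the disk afterwards.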
History
#1 Updated by Anonymous over 11 years ago
- Description updated (diff)
#2 Updated by Anonymous over 11 years ago
- Category set to chef
#3 Updated by Anonymous over 11 years ago
- Story points set to 8
#4 Updated by Anonymous over 11 years ago
Moving content from duplicate #3119:
DH cookbooks do this by setting a node attribute that maps osd.id -> desired action; one of the actions is destroy.
That does run into the annoyances of using Chef as an RPC mechanism, requires the admin to manage the id->node mapping, etc.
Try to solve this with a core product feature, then make that interop well with Chef/ceph-deploy/Juju.
Once destroyed, osd hotplugging MUST NOT create a new OSD on that disk automatically. That is, the lifecycle is
blank --ceph-disk-prepare--> prepared
prepared --ceph-disk-activate--> active
{prepared, active} --ceph-disk-destroy?--> blank
(the last arrow goes to blank, not to prepared)
and the logic that would automatically trigger ceph-disk-prepare for listed block devices (#2554) MUST NOT re-prepare it.
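The lifecycle and the hotplug invariant above can be made explicit with a minimal state-machine sketch. The names here are illustrative, not actual ceph-disk code; the sticky "destroyed" flag is one assumed way of recording that only an explicit admin action may re-prepare the disk:

```python
# Minimal sketch of the disk lifecycle: destroy sends a disk back to blank,
# and the auto-prepare logic (#2554) must never re-prepare a destroyed disk.
# Disk, apply(), and hotplug_auto_prepare() are hypothetical names.

BLANK, PREPARED, ACTIVE = "blank", "prepared", "active"

TRANSITIONS = {
    (BLANK, "prepare"): PREPARED,
    (PREPARED, "activate"): ACTIVE,
    (PREPARED, "destroy"): BLANK,  # destroy goes to blank, not to prepared
    (ACTIVE, "destroy"): BLANK,
}

class Disk:
    def __init__(self):
        self.state = BLANK
        self.destroyed = False  # sticky: cleared only by explicit admin action

    def apply(self, verb):
        nxt = TRANSITIONS.get((self.state, verb))
        if nxt is None:
            raise ValueError(f"cannot {verb} from {self.state}")
        if verb == "destroy":
            self.destroyed = True
        self.state = nxt

def hotplug_auto_prepare(disk):
    """What the #2554 auto-prepare logic may do: prepare blank disks,
    but never ones that were explicitly destroyed."""
    if disk.state == BLANK and not disk.destroyed:
        disk.apply("prepare")

if __name__ == "__main__":
    d = Disk()
    hotplug_auto_prepare(d)  # blank -> prepared
    d.apply("activate")      # prepared -> active
    d.apply("destroy")       # active -> blank
    hotplug_auto_prepare(d)  # no-op: destroyed disks stay blank
    print(d.state)
```

The design point is that "blank after destroy" is a distinct condition from "blank out of the box": the state machine alone cannot express the MUST NOT, so some persistent marker like the flag above (or an on-disk label) is needed.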
#5 Updated by Sage Weil almost 6 years ago
- Status changed from New to Rejected