Bug #54142 (closed): quincy cephadm-purge-cluster needs work

Added by Tim Wilkinson over 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Category: cephadm
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

For the sake of tracking ...

The purge process in Quincy is not yet ready for prime time at this early stage. The preflight & purge playbooks were used, but ultimately I went back through the manual steps I've used previously when the process fails somewhere ...
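For reference, the playbook side of the attempt was roughly the following (a hedged sketch only; the inventory file name and extra vars shown are assumptions based on typical cephadm-ansible usage, not copied from this run):

# Preflight: prepare the hosts (repos, podman/lvm2/chrony, etc.) -- assumed invocation
ansible-playbook -i hosts cephadm-preflight.yml

# Purge: remove the cluster identified by its fsid -- assumed invocation
ansible-playbook -i hosts cephadm-purge-cluster.yml -e fsid=<cluster-fsid>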

# Clean all hosts excluding bootstrap (assumes $fsid is set to the cluster fsid)
cephadm_in_host=$(ls /var/lib/ceph/"$fsid"/cephadm*)   # cephadm copy cached under the cluster dir
python3 "$cephadm_in_host" rm-cluster --fsid "$fsid" --force
systemctl stop ceph.target
systemctl disable ceph.target
rm -f /etc/systemd/system/ceph.target
systemctl daemon-reload
systemctl reset-failed
rm -rf /var/log/ceph/*
rm -rf /var/lib/ceph/*

# Clean bootstrap host (also wipes /etc/ceph; assumes $fsid is set)
cephadm_in_host=$(ls /var/lib/ceph/"$fsid"/cephadm*)
python3 "$cephadm_in_host" rm-cluster --fsid "$fsid" --force
#cephadm rm-cluster --fsid $fsid --force
systemctl stop ceph.target
systemctl disable ceph.target
rm -f /etc/systemd/system/ceph.target
systemctl daemon-reload
systemctl reset-failed
rm -rf /etc/ceph/*
rm -rf /var/log/ceph/*
rm -rf /var/lib/ceph/*

# On OSD nodes: zap the data devices and clean up leftover ceph units and device-mapper entries
declare -a devList=("/dev/nvme0n1" "/dev/nvme1n1" "/dev/sdc" "/dev/sdd" "/dev/sde" "/dev/sdf" "/dev/sdg" "/dev/sdh" "/dev/sdi" "/dev/sdj" "/dev/sdk" "/dev/sdl" "/dev/sdm" "/dev/sdn" "/dev/sdo" "/dev/sdp" "/dev/sdq" "/dev/sdr" "/dev/sds" "/dev/sdt" "/dev/sdu" "/dev/sdv" "/dev/sdw" "/dev/sdx" "/dev/sdy" "/dev/sdz" "/dev/sdaa" "/dev/sdab" "/dev/sdac" "/dev/sdad" "/dev/sdae" "/dev/sdaf" "/dev/sdag" "/dev/sdah" "/dev/sdai" "/dev/sdaj" "/dev/sdak" "/dev/sdal")
for device in "${devList[@]}"; do
  echo "$device"
  sgdisk --zap-all "$device"
done
# The fsid is the last field of each ceph target's unit description ("Ceph cluster <fsid>")
for fsid in `systemctl list-units 'ceph*.target' | grep target | grep -v services | awk '{print $NF}'` ; do
  echo "$fsid"
  /perf1/tim/tools/svc-clean.sh "$fsid"
done
# Catch targets that are still on disk but no longer loaded
for fsid in `ls /etc/systemd/system/ceph-*.target | cut -c 26- | cut -d. -f1` ; do
  echo "$fsid"
  /perf1/tim/tools/svc-clean.sh "$fsid"
done
# Remove leftover ceph device-mapper (LVM) mappings
for i in `lsblk -ro NAME | grep ceph` ; do
  echo "$i"
  dmsetup remove -f "$i"
done

... but that was insufficient. Subsequent Pacific deployments would fail due to remnant pods still running and holding onto ports, etc. Those had to be searched out and stopped. A couple of purge output examples are included FWIW.
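For anyone hitting the same thing, the leftover pods were hunted down along these lines (a hedged sketch assuming podman as the container engine; the name filter and port list are illustrative, not taken from the attached purge output):

# List any ceph containers the purge left behind (names usually include the fsid)
podman ps -a --filter name=ceph

# Check which processes are still holding typical Ceph ports (mon/dashboard/mgr metrics)
ss -tlnp | grep -E ':(3300|6789|8443|9283)'

# Stop and remove the stragglers so the next bootstrap can bind its ports
for c in $(podman ps -aq --filter name=ceph); do
  podman stop "$c"
  podman rm -f "$c"
done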


Files


Related issues 3 (0 open, 3 closed)

Related to Orchestrator - Bug #54018: Suspicious behavior when deleting a cluster (by running cephadm rm-cluster) (Resolved, Redouane Kachach Elhichou)

Related to Orchestrator - Feature #53815: cephadm rm-cluster should delete log files (Resolved, Redouane Kachach Elhichou)

Related to Orchestrator - Bug #53010: cephadm rm-cluster does not clean up /var/run/ceph (Resolved, Redouane Kachach Elhichou)
