Bug #54142
closed
quincy cephadm-purge-cluster needs work
Added by Tim Wilkinson over 2 years ago.
Updated over 1 year ago.
Description
For the sake of tracking ...
The purge process in quincy is not yet ready for prime time at this early stage. The preflight & purge playbooks were used, but ultimately I fell back on the manual steps I've used previously when the process fails somewhere ...
# Clean all hosts excluding bootstrap ($fsid must be set beforehand)
cephadm_in_host=$(ls /var/lib/ceph/"$fsid"/cephadm*)
python3 "$cephadm_in_host" rm-cluster --fsid "$fsid" --force
systemctl stop ceph.target
systemctl disable ceph.target
rm -f /etc/systemd/system/ceph.target
systemctl daemon-reload
systemctl reset-failed
rm -rf /var/log/ceph/*
rm -rf /var/lib/ceph/*
# Clean the bootstrap host (also removes /etc/ceph)
cephadm_in_host=$(ls /var/lib/ceph/"$fsid"/cephadm*)
python3 "$cephadm_in_host" rm-cluster --fsid "$fsid" --force
#cephadm rm-cluster --fsid $fsid --force
systemctl stop ceph.target
systemctl disable ceph.target
rm -f /etc/systemd/system/ceph.target
systemctl daemon-reload
systemctl reset-failed
rm -rf /etc/ceph/*
rm -rf /var/log/ceph/*
rm -rf /var/lib/ceph/*
# on OSD nodes
declare -a devList=("/dev/nvme0n1" "/dev/nvme1n1" "/dev/sdc" "/dev/sdd" "/dev/sde" "/dev/sdf" "/dev/sdg" "/dev/sdh" "/dev/sdi" "/dev/sdj" "/dev/sdk" "/dev/sdl" "/dev/sdm" "/dev/sdn" "/dev/sdo" "/dev/sdp" "/dev/sdq" "/dev/sdr" "/dev/sds" "/dev/sdt" "/dev/sdu" "/dev/sdv" "/dev/sdw" "/dev/sdx" "/dev/sdy" "/dev/sdz" "/dev/sdaa" "/dev/sdab" "/dev/sdac" "/dev/sdad" "/dev/sdae" "/dev/sdaf" "/dev/sdag" "/dev/sdah" "/dev/sdai" "/dev/sdaj" "/dev/sdak" "/dev/sdal")
for device in "${devList[@]}"; do
  echo "$device"
  sgdisk --zap-all "$device"
done
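As an aside, a device list like the one above could also be derived from lsblk instead of being hard-coded. This is a sketch that was not part of the original procedure; treating "sda" as the OS disk to exclude is an assumption and must be adjusted per host:

```shell
# Sketch (not from the report): enumerate whole disks via lsblk and zap
# everything except the assumed OS disk. Verify the exclusion before running.
os_disk="sda"
for device in $(lsblk -dno NAME,TYPE | awk -v os="$os_disk" '$2 == "disk" && $1 != os {print "/dev/"$1}'); do
  echo "$device"
  sgdisk --zap-all "$device"
done
```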
# Derive fsids from loaded ceph targets and run the per-fsid cleanup
for fsid in $(systemctl list-units 'ceph*.target' | grep target | grep -v services | awk '{print $NF}'); do
  echo "$fsid"
  /perf1/tim/tools/svc-clean.sh "$fsid"
done
# Same, from unit files on disk (cut -c 26- strips the /etc/systemd/system/ceph- prefix)
for fsid in $(ls /etc/systemd/system/ceph-*.target | cut -c 26- | cut -d. -f1); do
  echo "$fsid"
  /perf1/tim/tools/svc-clean.sh "$fsid"
done
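The svc-clean.sh helper above is a site-local script that was not posted. A hypothetical reconstruction of what such a per-fsid systemd cleanup might look like, based on cephadm's ceph-&lt;fsid&gt;@&lt;daemon&gt;.service unit naming convention:

```shell
# Hypothetical sketch only -- the actual svc-clean.sh may differ.
# Stops and removes the systemd units belonging to one cluster fsid.
fsid="$1"
systemctl stop "ceph-$fsid.target" 2>/dev/null
for unit in $(systemctl list-units --all --plain --no-legend "ceph-$fsid@*" | awk '{print $1}'); do
  systemctl stop "$unit"
  systemctl disable "$unit"
done
rm -f "/etc/systemd/system/ceph-$fsid.target"
rm -rf "/etc/systemd/system/ceph-$fsid.target.wants"
systemctl daemon-reload
systemctl reset-failed
```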
# Remove leftover ceph device-mapper mappings
for i in $(lsblk -rno NAME | grep ceph); do
  echo "$i"
  dmsetup remove -f "$i"
done
... but that was insufficient. Subsequent Pacific deployments would fail because remnant pods were still running and holding onto ports, etc.; those had to be searched out and stopped. A couple of purge output examples are included FWIW.
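The search-and-stop step for those remnant pods could be sketched as follows, assuming podman (the cephadm container runtime on EL8) and relying on cephadm's ceph-&lt;fsid&gt;-&lt;daemon&gt; container naming; the port check at the end is illustrative:

```shell
# Sketch: find and force-remove leftover ceph containers that keep
# ports bound after a purge, then confirm the mon ports are free.
for ctr in $(podman ps -a --format '{{.Names}}' | grep '^ceph-'); do
  echo "removing leftover container: $ctr"
  podman rm -f "$ctr"
done
# mon ports shown; OSDs use the 6800-7300 range
ss -tlnp | grep -E ':(3300|6789)' || echo "no ceph mon listeners left"
```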
- Category changed from orchestrator to cephadm
- Related to Bug #54018: Suspicious behavior when deleting a cluster (by running cephadm rm-cluster) added
- Related to Feature #53815: cephadm rm-cluster should delete log files added
- Related to Bug #53010: cephadm rm-cluster does not clean up /var/run/ceph added
Logs and other issues were fixed as part of the related bugs, but I'm not sure about the OSDs part.
- Assignee set to Redouane Kachach Elhichou
- Status changed from New to In Progress
I was able to return to quincy deployments and purges using cephadm-17.2.0-0.el8.noarch and have had no problems running the preflight/purge/preflight/bootstrap procedure. There was no need to manually prepare any previously used devices or search & destroy remnant pods.
My only comment would be that /var/run/ceph is not wiped, so older cluster fsids remain ...
root@f22-h01-000-6048r:~
# ll /var/{run,lib,log}/ceph
/var/lib/ceph:
total 8.0K
drwxr-x--- 3 ceph ceph 50 Apr 18 22:36 .
drwxr-xr-x. 37 root root 4.0K May 12 13:09 ..
drwx------ 30 ceph ceph 4.0K May 12 13:08 c56d7946-d1f2-11ec-8d0b-000af7995d6c
/var/log/ceph:
total 4.0M
drwxrws--T 3 ceph ceph 69 Apr 18 22:36 .
drwxr-xr-x. 11 root root 4.0K May 12 12:54 ..
drwxrwx--- 2 ceph ceph 4.0K May 12 13:08 c56d7946-d1f2-11ec-8d0b-000af7995d6c
-rw-r--r-- 1 root ceph 4.0M May 12 19:36 cephadm.log
/var/run/ceph:
total 0
drwxrwx--- 4 root root 80 May 12 13:00 .
drwxr-xr-x 38 root root 2.2K May 12 13:08 ..
drwxrwx--- 2 ceph ceph 540 May 11 23:00 aa8ec022-ca1c-11ec-a5a0-000af7995d6c # old deployment
drwxrwx--- 2 ceph ceph 540 May 12 13:08 c56d7946-d1f2-11ec-8d0b-000af7995d6c
root@f22-h01-000-6048r:~
# ll /var/run/ceph/aa8ec022-ca1c-11ec-a5a0-000af7995d6c
total 0
drwxrwx--- 2 ceph ceph 540 May 11 23:00 .
drwxrwx--- 4 root root 80 May 12 13:00 ..
srwxr-xr-x 1 ceph ceph 0 May 11 23:00 ceph-client.rgw.rgws.f22-h01-000-6048r.nbegha.7.94591718512160.asok
srwxr-xr-x 1 ceph ceph 0 May 2 13:45 ceph-osd.107.asok
srwxr-xr-x 1 ceph ceph 0 May 2 13:45 ceph-osd.114.asok
srwxr-xr-x 1 ceph ceph 0 May 2 13:45 ceph-osd.122.asok
srwxr-xr-x 1 ceph ceph 0 May 2 13:45 ceph-osd.12.asok
srwxr-xr-x 1 ceph ceph 0 May 2 13:45 ceph-osd.130.asok
srwxr-xr-x 1 ceph ceph 0 May 2 13:45 ceph-osd.137.asok
srwxr-xr-x 1 ceph ceph 0 May 2 13:45 ceph-osd.145.asok
srwxr-xr-x 1 ceph ceph 0 May 2 13:46 ceph-osd.153.asok
srwxr-xr-x 1 ceph ceph 0 May 2 13:46 ceph-osd.161.asok
srwxr-xr-x 1 ceph ceph 0 May 2 13:46 ceph-osd.169.asok
srwxr-xr-x 1 ceph ceph 0 May 2 13:46 ceph-osd.177.asok
srwxr-xr-x 1 ceph ceph 0 May 2 13:46 ceph-osd.185.asok
srwxr-xr-x 1 ceph ceph 0 May 2 13:46 ceph-osd.20.asok
srwxr-xr-x 1 ceph ceph 0 May 2 13:46 ceph-osd.28.asok
srwxr-xr-x 1 ceph ceph 0 May 2 13:47 ceph-osd.35.asok
srwxr-xr-x 1 ceph ceph 0 May 2 13:47 ceph-osd.42.asok
srwxr-xr-x 1 ceph ceph 0 May 2 13:47 ceph-osd.49.asok
srwxr-xr-x 1 ceph ceph 0 May 2 13:47 ceph-osd.57.asok
srwxr-xr-x 1 ceph ceph 0 May 2 13:47 ceph-osd.5.asok
srwxr-xr-x 1 ceph ceph 0 May 2 13:47 ceph-osd.66.asok
srwxr-xr-x 1 ceph ceph 0 May 2 13:47 ceph-osd.74.asok
srwxr-xr-x 1 ceph ceph 0 May 2 13:48 ceph-osd.82.asok
srwxr-xr-x 1 ceph ceph 0 May 2 13:48 ceph-osd.90.asok
srwxr-xr-x 1 ceph ceph 0 May 2 13:48 ceph-osd.97.asok
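Given the listing above, one possible workaround sketch (not an official cephadm step) is to prune /var/run/ceph entries whose fsid no longer has a matching /var/lib/ceph directory. Note that /var/run is normally tmpfs, so these stale socket directories also disappear on reboot:

```shell
# Sketch: remove runtime dirs for fsids that no longer exist under
# /var/lib/ceph (i.e. clusters that have already been purged).
for d in /var/run/ceph/*/; do
  fsid=$(basename "$d")
  if [ ! -d "/var/lib/ceph/$fsid" ]; then
    echo "pruning stale runtime dir for $fsid"
    rm -rf "$d"
  fi
done
```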
- Status changed from In Progress to Resolved
I'm no longer able to reproduce these issues with the code on the main branch. Please feel free to re-open if you think the related bug is still valid.