Bug #52919


ceph orch device zap validation can result in osd issues and problematic error messages

Added by Paul Cuzner over 2 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
cephadm
Target version:
% Done:

100%

Source:
Tags:
Backport:
pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

A fat-fingered hostname or device path can cause zap to do things it shouldn't.

For example:
1. bogus host name
[ceph: root@f34cluster /]# ceph orch device zap orac disk --force
Error EINVAL: host address is empty

---> the hostname was clearly not empty; the message should say that the host is not a member of the cluster

2. host in maintenance is not checked for. If the host is in maintenance, we should not attempt any actions against it.
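
The two host-level checks above could be combined into a single validation step. A minimal sketch, assuming a simple hostname-to-status inventory (the function name and data structure are illustrative, not cephadm's actual API):

```python
def validate_zap_host(hostname: str, hosts: dict) -> None:
    """Reject zap requests for unknown or in-maintenance hosts.

    `hosts` maps hostname -> status, e.g. 'online' or 'maintenance'
    (an assumed structure, standing in for cephadm's host inventory).
    """
    if hostname not in hosts:
        # Case 1: report cluster membership, not "host address is empty"
        raise ValueError(
            f"host '{hostname}' is not a member of the cluster; "
            f"known hosts: {sorted(hosts)}"
        )
    if hosts[hostname] == "maintenance":
        # Case 2: refuse all actions against a host in maintenance
        raise ValueError(
            f"host '{hostname}' is in maintenance mode; "
            "no actions will be attempted against it"
        )

# Reproduce case 1 from the report: a bogus host name
hosts = {"f34cluster": "online"}
try:
    validate_zap_host("orac", hosts)
except ValueError as err:
    print(err)
```

With a check like this, the error in case 1 names the offending host instead of claiming an empty address.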

3. valid host, but a bogus device
[ceph: root@f34cluster /]# ceph orch device zap f34cluster disk --force
Error EINVAL: Zap failed: ceph-volume lvm list d
i
s
k

---> not a very clear error message for the admin
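
Case 3 suggests validating the device name against the host's known devices before shelling out to ceph-volume, so the admin gets a clear "not found" message. A hedged sketch, with an assumed hostname-to-device-list mapping standing in for the orchestrator's device inventory:

```python
def validate_zap_device(hostname: str, device: str, host_devices: dict) -> None:
    """Fail early, with a readable message, if `device` is unknown on `hostname`.

    `host_devices` maps hostname -> list of device paths (illustrative data,
    not cephadm's real inventory structure).
    """
    known = host_devices.get(hostname, [])
    if device not in known:
        raise ValueError(
            f"device '{device}' not found on host '{hostname}'; "
            f"known devices: {', '.join(known)}"
        )

# Reproduce case 3 from the report: a valid host, but a bogus device
host_devices = {"f34cluster": ["/dev/sdb", "/dev/sdc"]}
try:
    validate_zap_device("f34cluster", "disk", host_devices)
except ValueError as err:
    print(err)
```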

4. osd stopped, but still valid. Because the OSD is down, it passes the "is-active" check, allowing the zap to proceed against a valid osd that just happened to be down at that point. The result looks like this (acting against osd.3):
[ceph: root@f34cluster /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME            STATUS  REWEIGHT  PRI-AFF
-1         4.00000  root default
-3         4.00000      host f34cluster
 0  hdd    1.00000          osd.0        up      1.00000   1.00000
 1  hdd    1.00000          osd.1        up      1.00000   1.00000
 2  hdd    1.00000          osd.2        up      1.00000   1.00000
 3  hdd    1.00000          osd.3        down    1.00000   1.00000

---> osd.3 was zapped, but to Ceph it still exists, and the host still has an entry for it in systemd. ceph orch osd rm should have been run first
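
The fix case 4 implies is to check whether the device still backs a registered OSD at all, regardless of whether the daemon is currently active, and refuse to zap until the OSD has been removed. A minimal sketch, assuming a (hostname, device) to OSD-id mapping that is hypothetical, not cephadm's real data model:

```python
def validate_zap_osd(hostname: str, device: str, osd_map: dict) -> None:
    """Refuse to zap a device that still backs a registered OSD.

    `osd_map` maps (hostname, device) -> osd id for devices that back an
    OSD known to the cluster (an assumed structure for illustration).
    A down-but-registered OSD must be removed before its device is zapped.
    """
    osd_id = osd_map.get((hostname, device))
    if osd_id is not None:
        raise RuntimeError(
            f"device '{device}' on host '{hostname}' still backs osd.{osd_id} "
            f"(up or down); run 'ceph orch osd rm {osd_id}' first"
        )

# Reproduce case 4: osd.3 is down but still registered with the cluster
osd_map = {("f34cluster", "/dev/sdd"): 3}
try:
    validate_zap_osd("f34cluster", "/dev/sdd", osd_map)
except RuntimeError as err:
    print(err)
```

Unlike the "is-active" check, this guard also catches OSDs that are stopped but not yet removed, which is exactly the scenario in the osd tree above.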


Related issues: 1 (0 open, 1 closed)

Related to Orchestrator - Bug #51028: device zap doesn't perform any checks (Closed, Paul Cuzner)
