Bug #45792

cephadm: zapped OSD gets re-added to the cluster.

Added by David Capone almost 4 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: cephadm
Target version: -
% Done: 0%
Source:
Tags: ux
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID: 35744
Crash signature (v1):
Crash signature (v2):

Description

Using version 15.2.1 (Octopus) on a cluster running CentOS 8.

When the cluster was initially deployed, OSDs were created using

ceph orch apply osd --all-available-devices

Two weeks later, any drive added to the server is immediately provisioned as an OSD by the cephadm process. The orchestrator apparently never stops looking for new drives to provision. This also presents a problem if you attempt to zap a drive after removing it from the cluster with the intent to physically remove it from the server: the moment the zap completes successfully, cephadm sees the drive as available and reprovisions it as a new OSD.

If this is by design, some documentation on how to stop this background process when a user no longer wants it to deploy new drives would be appreciated.
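
For reference, a minimal sketch of the sequence that triggers this for us (the OSD id, host name, and device path are placeholders; the commands themselves are the standard orchestrator commands):

# OSDs were originally created with the always-on spec:
ceph orch apply osd --all-available-devices

# Later, remove an OSD and zap its device in preparation for pulling the disk:
ceph orch osd rm 5
ceph orch device zap host1 /dev/sdb --force   # host1 and /dev/sdb are placeholders

# Shortly afterwards cephadm sees the zapped device as "available" again
# and redeploys an OSD on it; the OSD count goes back up.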


Related issues 3 (2 open, 1 closed)

Related to Dashboard - Bug #44808: mgr/dashboard: Allow users to specify an unmanaged ServiceSpec when creating OSDs (New)

Related to ceph-volume - Feature #45374: Add support for BLACKLISTED_DEVICES env var parsing (New)

Related to Orchestrator - Bug #45907: cephadm: daemon rm for managed services is completely broken (Resolved)

Actions #1

Updated by David Capone almost 4 years ago

One thing I didn't mention is that cephadm's attempts to apply OSDs to available devices SURVIVE disabling and re-enabling the cephadm module.

If you run ceph mgr module disable cephadm, the behavior stops WHILE cephadm remains disabled; however, the moment cephadm is re-enabled with ceph mgr module enable cephadm, it immediately starts looking for available devices to apply as OSDs.
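
The exact cycle we tried, for reference (only the ordering and the comments are mine; the commands are the ones mentioned above):

ceph mgr module disable cephadm   # auto-provisioning stops while the module is off
# ... remove / zap drives here ...
ceph mgr module enable cephadm    # auto-provisioning resumes almost immediately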

Actions #2

Updated by Kiefer Chang almost 4 years ago

  • Related to Bug #44808: mgr/dashboard: Allow users to specify an unmanaged ServiceSpec when creating OSDs added
Actions #3

Updated by Kiefer Chang almost 4 years ago

When using the command `ceph orch apply osd --all-available-devices` to create OSDs, an OSDSpec is created that applies all usable devices as OSDs.
The spec is "managed" within cephadm, which means cephadm re-applies it repeatedly (every 10 minutes by default).

A PR (https://github.com/ceph/ceph/pull/35084) was created to allow setting the "managed" state to false.
Unfortunately it has not been backported to v15 yet.
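
Once that change is available, the state should be settable at creation time; a rough sketch, assuming the flag lands as --unmanaged (as in current upstream docs) and noting it is not in 15.2.1:

# Create the spec but leave it unmanaged, so cephadm records it
# without continuously re-applying it to newly available devices:
ceph orch apply osd --all-available-devices --unmanaged=true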

Actions #4

Updated by Sebastian Wagner almost 4 years ago

  • Subject changed from cephadm: apply osd all available devices never stops running to cephadm: zapped OSD gets re-added to the cluster.
  • Description updated (diff)
  • Category changed from cephadm (binary) to cephadm
Actions #5

Updated by Sebastian Wagner almost 4 years ago

  • Priority changed from Normal to Urgent
Actions #6

Updated by Sebastian Wagner almost 4 years ago

  • Priority changed from Urgent to High
Actions #7

Updated by Joshua Schmid almost 4 years ago

I'd say that this is the intended behavior. There is a tracking issue for adding a `blacklisting` command that allows you to 'lock' (or make unavailable) a certain disk.

See: https://tracker.ceph.com/issues/45374

Actions #8

Updated by Jan Fajerski almost 4 years ago

Hmm, there is something racy going on. I also have a cluster deployed with --all-available-devices. I remove an OSD via ceph orch osd rm 5, which works fine; the OSD count goes from 20 to 19. Then I zap that device with ceph orch device zap master /dev/vdb --force, and while that runs, ceph -s reports that I have 20 OSDs in but only 19 up, and it is rebalancing data. The zap comes back successful, but the cluster ends up with 20 OSDs, i.e. my removed and zapped OSD was resurrected.
So the apply seems to race with remove or zap? Certainly not what I expected.

OK, I should have thought for two minutes longer before commenting. I suppose that is the design. While I see (and like) the logic behind it, it is very surprising to unsuspecting users. Maybe blacklisting a device by default after zapping could make this more predictable?

Actions #9

Updated by David Capone almost 4 years ago

That is exactly the behavior we see.

I do not think that is intuitive or should be the expected behavior.

Maybe add an option like --once that can be used with --all-available-devices, which would apply the spec once and then remove it so it doesn't run forever.

Actions #10

Updated by Igor Fedotov almost 4 years ago

We ran into this too; IMO it is quite non-intuitive behavior. One removes an OSD and it reappears again shortly after...

Actions #11

Updated by David Capone almost 4 years ago

As an additional comment on this...

I also think similar logic should apply to any drivespec that is used to apply OSDs. There should be some time limit for which cephadm tries to match the drivespec, after which it should stop.

I foresee a situation where a particular drivespec is applied when a cluster is initially deployed. Six months down the road, new OSD nodes are added to the cluster, and at that point it might be more appropriate to provision the new OSDs manually. The admin does not recall that the initial OSD drivespec was used to configure the cluster and wonders why all of the new drives being added to the cluster have suddenly started autoprovisioning when that was never intended.

An expensive rebalance is then needed to correct the errant / unexpected OSD deployments.

Actions #12

Updated by Joshua Schmid almost 4 years ago

There is an `unmanaged` flag that can be set for any ServiceSpec:

service_type: osd
service_id: foo
...
unmanaged: True

That would prevent cephadm from automatically provisioning new or available disks (any new daemons really).
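
For an already-deployed spec, a possible sketch (assuming `ceph orch ls --export` and `ceph orch apply -i` are available in your release):

# Export the currently deployed OSD spec(s) to a file:
ceph orch ls osd --export > osd-spec.yaml

# Edit osd-spec.yaml and add "unmanaged: true", then re-apply it:
ceph orch apply -i osd-spec.yaml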

re:

I foresee a situation where a particular drivespec is applied when a cluster is initially deployed. Six months down the road, new OSD nodes are added to the cluster, and at that point it might be more appropriate to provision the new OSDs manually. The admin does not recall that the initial OSD drivespec was used to configure the cluster and wonders why all of the new drives being added to the cluster have suddenly started autoprovisioning when that was never intended.

That can also be prevented by setting the `placement` specification so that it explicitly matches the existing hosts (and not newly added ones), or by using `unmanaged: True`.
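
A sketch of such a pinned placement, in the same spec format as above (host names are placeholders):

service_type: osd
service_id: foo
placement:
  hosts:
    - host1   # explicitly list only the existing OSD hosts
    - host2
...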

However, with the increasing number of reports that we get regarding this specific behavior, we probably have to rethink the workflow.
It feels like the 'rolling' fashion in which cephadm acts is counter-intuitive to most users. I'm not sure whether this is due to missing documentation or whether the behavior is just genuinely counter-intuitive.

Actions #13

Updated by Sebastian Wagner almost 4 years ago

  • Related to Feature #45374: Add support for BLACKLISTED_DEVICES env var parsing. added
Actions #14

Updated by Sebastian Wagner almost 4 years ago

  • Tags set to ux
Actions #15

Updated by Sebastian Wagner almost 4 years ago

  • Related to Bug #45907: cephadm: daemon rm for managed services is completely broken added
Actions #16

Updated by Joshua Schmid almost 4 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 35744

As this is the intended behavior, this needs to be documented. See https://github.com/ceph/ceph/pull/35744

Actions #17

Updated by David Capone almost 4 years ago

I am still providing additional input and requesting that, if this behavior remains as is, a way to stop the process be implemented.

In development we have run into two issues with this orchestration and the inability to stop individual orchestration events.

1. https://tracker.ceph.com/issues/44749 - because of that bug, new db drives and data drives cannot be added without first pausing the entire orchestration engine. It would be much better to be able to pause/stop just a drive spec (or any spec) deployment.

2. With orch osd rm ... there is a similar issue of being unable to delete queued orchestration events. I have run into a situation on 3-node clusters where the OSDs do not fully rebalance and the rm "hangs" because the rebalance cannot complete. As an administrator I want to be able to forcibly destroy that OSD, since I explicitly understand the risks of doing so. You can forcibly destroy it by stopping the container and marking it destroyed, but once a new drive is replaced, added, and available, it reuses the OSD id and is deployed, and cephadm then continues with the previously issued remove command on the empty, newly redeployed OSD.

Functionality and control would be greatly improved by implementing some way to stop individual orchestration events.
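
For what it's worth, the only controls I have found so far are cluster-wide; a sketch (these commands exist upstream, but whether they fully cover the cases above on 15.2.x is exactly the open question):

# Pause ALL background orchestrator activity (too coarse, but stops re-provisioning):
ceph orch pause
ceph orch status            # reports whether the orchestrator is paused

# Try to force-remove the stuck OSD and watch the removal queue:
ceph orch osd rm 5 --force
ceph orch osd rm status

# Resume background activity when done:
ceph orch resume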

Actions #18

Updated by Sebastian Wagner almost 4 years ago

  • Priority changed from High to Urgent
Actions #19

Updated by Sebastian Wagner over 3 years ago

  • Status changed from Fix Under Review to Resolved