Documentation #24091

closed

"ceph osd purge": meaning of "down" is ambiguous

Added by Jesse Williamson almost 6 years ago. Updated almost 6 years ago.

Status:
Closed
Priority:
Low
Category:
documentation
Target version:
-
% Done:

0%

Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

The documentation for "ceph osd purge" says little about the command and its expectations. In
particular, the meaning of the following message is unclear.

In response to the command "ceph osd purge 1 --yes-i-really-mean-it", we get:
2018-05-10 15:18:03.444 7f29c0ae2700 2 mon.a@0(leader) e1 send_reply 0x556281707d40 0x5562818cdcc0 mon_command_ack([{"prefix": "osd purge", "sure": "--yes-i-really-mean-it", "id": 0}]=-16 osd.1 is not `down`. v16) v1

...notice that "down" is in quotes, suggesting it has a special meaning. The command requires that the OSD process not be running, and this is what
is meant by "down". However, it's entirely reasonable for a user to think that it means the process should be /marked/ down, e.g. with the "osd down"
command, which does not satisfy the requirement.
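
For anyone reading the log line above: the "=-16" in the reply looks like a negated errno value, and 16 is EBUSY on Linux. Below is a minimal Python sketch that pulls the code and message out of such a line. The regular expression is an assumption fitted to this one log line, not a documented reply format, and decode_ack is a hypothetical helper name.

```python
import errno
import re

# The mon_command_ack portion of the log line quoted in this ticket.
ACK = ('mon_command_ack([{"prefix": "osd purge", '
       '"sure": "--yes-i-really-mean-it", "id": 0}]=-16 '
       'osd.1 is not `down`. v16) v1')


def decode_ack(line):
    """Return (errno name, message) from a mon_command_ack line, or None.

    Assumes the layout seen in this ticket: "]=<code> <message> v<N>)".
    """
    m = re.search(r"\]=(-?\d+) (.*) v\d+\)", line)
    if m is None:
        return None
    code = int(m.group(1))
    # Negative codes are treated as negated errno values (an assumption).
    name = errno.errorcode.get(-code, "UNKNOWN") if code < 0 else "OK"
    return name, m.group(2)


print(decode_ack(ACK))
```

So the monitor is reporting "busy", which fits a refusal to purge an OSD it still considers up, but says nothing about how the user is expected to fix it.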

At least two things would improve this:
1) a brief mention in the documentation that "osd purge" requires the OSD process to not be running;
2) a more explicit response, ideally one that suggests the solution, perhaps along the lines of "cannot purge, osd.1 process [PID] is still running"

It may also be worth considering a couple of new features for the "ceph" command:
1) consistently print user-facing console messages, especially on failure (i.e. print the above message to the user rather than requiring them to go to the logfile), and provide a --quiet option to disable this for scripting;
2) print helpful suggestions, similar to gcc and clang: "cannot purge... Did you try killing process [PID] first?"

Actions #1

Updated by Joao Eduardo Luis almost 6 years ago

notice that "down" is in quotes, suggesting it has a special meaning. The command requires that the OSD process not be running, and this is what is meant by "down"

"down" is a concept in Ceph; it means the OSD is marked "down" in the osdmap. It doesn't necessarily mean the osd process is not running.

An OSD may be down due to other factors, such as the OSD not reaching, or not being reachable by, the monitors and other OSDs.
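
To illustrate the distinction, here is a rough Python sketch that reads the "down" state from the osdmap rather than from the process table. It assumes the JSON form of "ceph osd dump" carries an "osds" list whose entries have "osd" and "up" fields; the sample data is made up, and is_marked_down is a hypothetical helper, not part of any Ceph tooling.

```python
import json

# Made-up, trimmed sample in the assumed shape of `ceph osd dump --format json`.
SAMPLE_DUMP = json.loads("""
{"osds": [
  {"osd": 0, "up": 1, "in": 1},
  {"osd": 1, "up": 0, "in": 1}
]}
""")


def is_marked_down(osd_dump, osd_id):
    """Return True if the osdmap entry for osd_id has up == 0.

    This reflects only what the osdmap says; the ceph-osd process
    may or may not still be running on its host.
    """
    for entry in osd_dump.get("osds", []):
        if entry["osd"] == osd_id:
            return entry["up"] == 0
    raise KeyError(f"osd.{osd_id} not in osdmap")


print(is_marked_down(SAMPLE_DUMP, 1))
```

A check like this answers "is it down in the osdmap?", which is a different question from "is the process running?".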

If the documentation is not clear on the meaning of "down", then it should be adjusted. However, tying "down" to just the concept of "process is not running" is likely going to cause a lot of confusion in degenerated cluster states. Whoever handles this ticket, please take that into consideration.

Also, all the other bits of this ticket, regarding CLI behavior, should be filed under the RADOS tracker, category usability or something.

Actions #2

Updated by Greg Farnum almost 6 years ago

  • Status changed from New to Closed