Bug #20995

closed

ceph -w does not exit, but it doesn't display any messages either

Added by Nathan Cutler over 6 years ago. Updated over 6 years ago.

Status: Won't Fix
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

As of Luminous, "ceph -w" no longer updates in real time. This is mentioned in the release notes.

As far as I can tell, "ceph -w" in luminous does the following:

1. print the same info as "ceph -s"
2. go into an infinite wait loop, doing nothing

Can we fix that somehow? Either make the behavior the same as "ceph -s" (i.e. exit instead of entering the infinite wait loop) or just deprecate/drop the command?

#1

Updated by Greg Farnum over 6 years ago

  • Priority changed from Normal to Urgent

This seems pretty not-great to me. :( Do we know where the change came from?
I suspect it came from the central log and paxos version dump changes, but I've found ceph -w to be pretty useful...

#2

Updated by Neha Ojha over 6 years ago

I'm not sure what the expected output of ceph -w should be now, since after https://github.com/ceph/ceph/pull/16345 part of its job is meant to be done by --watch-channel.

#3

Updated by Greg Farnum over 6 years ago

I don't think it does anything else besides that and printing the same stuff as "ceph -s"?

I didn't dig in to see exactly why it behaves this way. I think it should print cluster state updates by default, so we could just have it turn on the "cluster" watch channel by default. That would probably come closest to preserving the old behavior we wanted it to have.

Another option is to have it print a message about the watch-channel flag and then exit if the user doesn't pass it. As best I can tell, without that flag it is just a worse version of "ceph -s" that probably nobody is going to use that way on purpose! John may have thoughts on this if you ping him.

#4

Updated by Neha Ojha over 6 years ago

  • Status changed from New to 12

#5

Updated by Neha Ojha over 6 years ago

  • Status changed from 12 to Need More Info

I wouldn't say that ceph -w only shows the ceph status. I ran an osd out command, and ceph -w logged it along with health check updates for the resulting PG state changes. However, ceph -w logged less information than ceph -w --watch-channel=*.

For other commands, such as osd in and osd pool create, I don't see any information logged by ceph -w.

I guess we need to find out whether this is by design or an issue with the implementation.

#6

Updated by John Spray over 6 years ago

From ceph-devel

On Thu, Aug 17, 2017 at 2:27 AM, Neha Ojha <nojha@redhat.com> wrote:
> Hi John,
>
> I am working on the following tracker issue:
> http://tracker.ceph.com/issues/20995.
> I was wondering if you could help me figure out a couple of things -
>
> 1. What should be the expected output of only using ceph -w? Adding
> --watch-channel=* logs things properly, but as for ceph -w alone, I am
> unable to understand why I see very few log messages (in some cases
> none).

"ceph -w" just follows the cluster log at "info" severity, so if there
is nothing going to the cluster log, the command doesn't output
anything.

It used to also include the audit log, but that is now filtered out by
default, because it's usually not interesting (the administrator
doesn't need to be told what they just typed), and because it's very
hard to read (the log lines are basically JSON dumps).  All that
content is still going to the log file, it's just not visible by
default in ceph -w.

This is causing some confusion, because historically we had code that
wrote the PG status to the cluster log continuously (every 5 seconds
or so), so people are accustomed to always seeing something.  Also the
old documentation talked about this command "watching the cluster's
events" which created the impression that this was something other
than a log tailing command.

> 2. For which cases would it show log messages? Like for example, for
> "ceph osd out" it does, but not for "ceph osd in", "ceph osd pool
> create" etc.

In general you'll see messages about bad or unexpected things (every
failing health check now comes with log messages).  For actions that
are done by the administrator, the audit log is still present (if
hidden by default).  If you look at the log at "debug" level of
severity (--watch-debug), you can still see the summary prints of the
cluster maps (excluding the pg map) when they change.

The motivation behind all this is to have a cluster log that tells a
somewhat human-readable story about failures and recovery.  There is
certainly scope for adding in some more logging at carefully chosen
points, such as:
 - if a user marks an OSD out which was down, we could log that to
indicate that the user pre-empted the timeout-based "out" path.
 - when a pool is created we could print a message to say that it is
being created, and then when all its PGs are out of "creating" we
could print another message to that effect.

The key thing is to make sure the messages are intelligible to
non-expert users and that they are not overwhelming: there is a set
of guidelines here: http://docs.ceph.com/docs/master/dev/logging/
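The default filtering John describes can be summarized in a small model. This Python sketch is hypothetical: `LogEntry`, `watch_filter`, the severity table and the sample log messages are all invented for illustration and are not Ceph's implementation. It only assumes what the thread states: plain `ceph -w` shows the "cluster" channel at "info" severity and above, `--watch-channel=*` also shows other channels such as "audit", and `--watch-debug` lowers the threshold to "debug".

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# Hypothetical severity ordering used only by this model.
SEVERITY = {"debug": 0, "info": 1, "warn": 2, "error": 3}


@dataclass
class LogEntry:
    channel: str   # e.g. "cluster" or "audit"
    severity: str  # one of the SEVERITY keys
    message: str


def watch_filter(entries: Iterable[LogEntry],
                 channels: str = "cluster",
                 min_severity: str = "info") -> Iterator[LogEntry]:
    """Yield only the entries a 'ceph -w'-style watcher would print.

    channels="*" stands in for --watch-channel=*;
    min_severity="debug" stands in for --watch-debug.
    """
    threshold = SEVERITY[min_severity]
    for entry in entries:
        if channels != "*" and entry.channel != channels:
            continue  # audit entries are skipped unless all channels are requested
        if SEVERITY[entry.severity] < threshold:
            continue  # debug-level entries (e.g. map summaries) are skipped by default
        yield entry


# Invented sample entries, not real Ceph log output.
log = [
    LogEntry("audit", "info", '{"prefix": "osd out", "ids": ["0"]}'),
    LogEntry("cluster", "info", "osd.0 marked out"),
    LogEntry("cluster", "debug", "osdmap summary after change"),
]

if __name__ == "__main__":
    for e in watch_filter(log):
        print(e.channel, e.severity, e.message)
```

With an empty or quiet log the default filter yields nothing, which matches the original report: the command keeps waiting but prints no messages.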

#7

Updated by Sage Weil over 6 years ago

  • Status changed from Need More Info to Won't Fix

I don't think there is anything to fix here? It would be nice to have something new that gives you the kind of "iostat"-style functionality that -w used to (sort of) give you. There is a card for this in Trello, although I don't think anybody is working on it. Something in the style of daemonperf (which is modeled after dstat?) would be great.
