Bug #20995
Closed: ceph -w does not exit, but it doesn't display any messages either
Description
So, as of Luminous, "ceph -w" doesn't update in real time anymore. This is mentioned in the release notes.
As far as I can tell, "ceph -w" in Luminous does the following:
1. print the same info as "ceph -s"
2. go into an infinite wait loop, doing nothing
Can we fix that somehow? Either make the behavior the same as "ceph -s" (i.e. exit instead of entering the infinite wait loop) or just deprecate/drop the command?
Updated by Greg Farnum over 6 years ago
- Priority changed from Normal to Urgent
This seems pretty not-great to me. :( Do we know where the change came from?
I suspect it was because of the central log and paxos version dump changes, but I've found ceph -w to be pretty useful...
Updated by Neha Ojha over 6 years ago
I'm not sure what the expected output of ceph -w should be now, since after https://github.com/ceph/ceph/pull/16345, part of its job is meant to be done by --watch-channel.
Updated by Greg Farnum over 6 years ago
I don't think it does anything else besides that and printing the same stuff as "ceph -s"?
I didn't dig in to see exactly why it's behaving the way it is. From my perspective I think it should print out cluster state updates by default — so we could just have it turn on the "cluster" watch channel by default. That would probably come closest to preserving the old behavior we wanted it to have.
Another option is to have it print a message about using the watch-channel flag and then exit if the user doesn't specify anything. As best I can tell, without that flag it is a worse invocation of "ceph -s" that probably nobody is going to use that way on purpose! John may have thoughts on this if you ping him.
Updated by Neha Ojha over 6 years ago
- Status changed from 12 to Need More Info
I wouldn't say that ceph -w only shows the ceph status. I ran an osd out command, and ceph -w logged it along with health check updates for the PG state changes. However, ceph -w logged less information than ceph -w --watch-channel=*.
For other commands, like osd in, osd pool create, etc., I don't see any information being logged by ceph -w.
I guess we need to find out whether this was done by design or there is some issue with the implementation.
Updated by John Spray over 6 years ago
From ceph-devel
On Thu, Aug 17, 2017 at 2:27 AM, Neha Ojha <nojha@redhat.com> wrote:
> Hi John,
>
> I am working on the following tracker issue:
> http://tracker.ceph.com/issues/20995.
> I was wondering if you could help me figure out a couple of things -
>
> 1. What should be the expected output of only using ceph -w? Adding
> --watch-channel=* logs things properly, but as for ceph -w alone, I am
> unable to understand why I see very few log messages (in some cases
> none).

"ceph -w" just follows the cluster log at "info" severity, so if there is nothing going to the cluster log, the command doesn't output anything. It used to also include the audit log, but that is now filtered out by default, because it's usually not interesting (the administrator doesn't need to be told what they just typed), and because it's very hard to read (the log lines are basically JSON dumps). All that content is still going to the log file, it's just not visible by default in ceph -w.

This is causing some confusion, because historically we had code that wrote the PG status to the cluster log continuously (every 5 seconds or so), so people are accustomed to always seeing something. Also the old documentation talked about this command "watching the cluster's events" which created the impression that this was something other than a log tailing command.

> 2. For which cases would it show log messages? Like for example, for
> "ceph osd out" it does, but not for "ceph osd in", "ceph osd pool
> create" etc.

In general you'll see messages about bad or unexpected things (every failing health check now comes with log messages). For actions that are done by the administrator, the audit log is still present (if hidden by default).

If you look at the log at "debug" level of severity (--watch-debug), you can still see the summary prints of the cluster maps (excluding the pg map) when they change.
The motivation behind all this is to have a cluster log that tells a somewhat human-readable story about failures and recovery. There is certainly scope for adding in some more logging at carefully chosen points, such as:
- if a user marks an OSD out which was down, we could log that to indicate that the user pre-empted the timeout-based "out" path.
- when a pool is created we could print a message to say that it is being created, and then when all its PGs are out of "creating" we could print another message to that effect.

The key thing is to make sure the messages are intelligible to non-expert users and that they are not overwhelming: there are a set of guidelines here http://docs.ceph.com/docs/master/dev/logging/
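John's explanation amounts to a filter over cluster log entries: by default only the "cluster" channel at "info" severity is shown (modeled here as info and above), --watch-channel=* widens the set of channels, and --watch-debug lowers the severity threshold. The Python below is a toy model of that filter under those assumptions, not Ceph's implementation. The names LogEntry, is_visible and watch, the four-level severity ordering, and the placeholder log messages are all hypothetical and exist only to illustrate the behavior described in this thread.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Assumed severity ordering for this sketch only; Ceph's real log
# levels and their internal values are not modeled here.
SEVERITY_RANK = {"debug": 0, "info": 1, "warn": 2, "error": 3}


@dataclass(frozen=True)
class LogEntry:
    channel: str   # e.g. "cluster" or "audit"
    severity: str  # a key of SEVERITY_RANK
    message: str


def is_visible(entry: LogEntry, channel: str = "cluster",
               min_severity: str = "info") -> bool:
    """Toy model of whether a watcher prints an entry (hypothetical).

    channel="*" stands in for --watch-channel=*, and
    min_severity="debug" stands in for --watch-debug.
    """
    if channel != "*" and entry.channel != channel:
        return False  # e.g. the audit channel is hidden by default
    return SEVERITY_RANK[entry.severity] >= SEVERITY_RANK[min_severity]


def watch(entries: Iterable[LogEntry], **opts) -> List[str]:
    """Return the messages a watcher with the given options would print."""
    return [e.message for e in entries if is_visible(e, **opts)]


# Hypothetical placeholder entries, not real Ceph log lines.
LOG = [
    LogEntry("cluster", "warn", "health check failed (placeholder)"),
    LogEntry("audit", "info", "admin ran a command (placeholder)"),
    LogEntry("cluster", "debug", "map summary (placeholder)"),
]

if __name__ == "__main__":
    print(watch(LOG))                        # default: cluster channel, info and above
    print(watch(LOG, channel="*"))           # every channel, including audit
    print(watch(LOG, min_severity="debug"))  # cluster channel, debug entries too
```

When nothing that passes the filter reaches the cluster log, watch returns an empty list; in the real command, that corresponds to ceph -w sitting idle with no output, which is the behavior reported in this issue.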
Updated by Sage Weil over 6 years ago
- Status changed from Need More Info to Won't Fix
I don't think there is anything to fix here? It would be nice to have some new thing that gives you similar "iostat" type functionality that -w used to (sort of) give you. There is a card for this in Trello, although I don't think anybody is working on it. Something in the style of daemonperf (which is modeled after dstat?) would be great.