Bug #22662
closed
ceph osd df json output validation reported invalid numbers (-nan) (jewel)
100%
Description
Hi,
we have a monitoring script which parses the 'ceph osd df -f json' output. From time to time one or more OSDs are down, and the JSON output is then invalid.
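A minimal sketch of the failure, assuming a hand-written sample rather than real cluster output (the field names `name` and `utilization` are illustrative only). A strict JSON parser such as Python's `json.loads` rejects the bare `-nan` token named in the bug title:

```python
import json

# Hypothetical excerpt shaped like 'ceph osd df -f json' output; the field
# names and values are illustrative, not copied from a real cluster.
sample = '{"nodes": [{"name": "osd.0", "utilization": -nan}]}'


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json(sample))
```

A monitoring script that calls the parser without such a guard would fail on the whole document, not only on the affected OSD entry.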
Updated by Greg Farnum over 6 years ago
- Project changed from Ceph to mgr
- Category deleted (ceph cli)
Updated by Sage Weil over 6 years ago
- Project changed from mgr to RADOS
- Subject changed from ceph osd df json output validation reported invalid numbers to ceph osd df json output validation reported invalid numbers (-nan)
- Status changed from New to 12
- Priority changed from Normal to Urgent
1. It's not valid JSON; the Formatter shouldn't allow it.
2. We should have a valid value (or 0) to use.
Updated by Sage Weil over 6 years ago
- Subject changed from ceph osd df json output validation reported invalid numbers (-nan) to ceph osd df json output validation reported invalid numbers (-nan) (jewel)
Updated by Chang Liu about 6 years ago
This bug has been fixed by https://github.com/ceph/ceph/pull/13531. We should backport it to Jewel.
Updated by Chang Liu about 6 years ago
Sage Weil wrote:
1. It's not valid JSON; the Formatter shouldn't allow it.
2. We should have a valid value (or 0) to use.
When there is a NaN or Inf in JsonFormatter, what should we do? Throw an exception directly?
Updated by Enrico Labedzki about 6 years ago
- File Bildschirmfoto 2018-01-25 um 09.39.38.jpg added
- File Bildschirmfoto 2018-01-25 um 09.50.04.png added
Chang Liu wrote:
When there is a NaN or Inf in JsonFormatter, what should we do? Throw an exception directly?
Why not simply set those values to zero, as Sage suggested earlier? I think that should be OK.
Those zero values can then be handled separately.
It would look like this (see attachment), so I can see all OSDs that are in the down state.
Before we worked around this ourselves with a JSON validation hack, the graphs looked like this (note the gaps where there is no data), which isn't very helpful.
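The validation hack itself is not shown in this thread. As one possible sketch of such a client-side workaround, the snippet below rewrites bare `nan`/`inf` tokens to `null` before parsing. The helper name `load_lenient`, the regular expression, and the sample field names are assumptions made for illustration.

```python
import json
import re

# Matches a bare, optionally signed nan/inf token sitting where a JSON value
# is expected: after ':', '[' or ',' and before ',', '}' or ']'.
# Caveat: this is a purely textual hack, so a string value that happens to
# contain such a sequence (for example "a,nan,b") would be rewritten too.
_NON_FINITE = re.compile(
    r'(?<=[:\[,])(\s*)-?(?:nan|inf(?:inity)?)(?=\s*[,}\]])',
    re.IGNORECASE,
)


def load_lenient(text):
    """Parse JSON text after replacing bare nan/inf tokens with null."""
    return json.loads(_NON_FINITE.sub(r'\1null', text))


# Hypothetical sample; field names are illustrative only.
raw = (
    '{"nodes": [{"name": "osd.0", "utilization": -nan, "var": -nan},'
    ' {"name": "osd.1", "utilization": 41.5, "var": 1.0}]}'
)
data = load_lenient(raw)
down = [n["name"] for n in data["nodes"] if n["utilization"] is None]
print(down)
```

This keeps the rest of the document usable, at the cost of depending on the exact spelling of the invalid tokens.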
Updated by Chang Liu about 6 years ago
Enrico Labedzki wrote:
Why not simply set those values to zero, as Sage suggested earlier? I think that should be OK.
Those zero values can then be handled separately.
It would look like this (see attachment), so I can see all OSDs that are in the down state.
Before we worked around this ourselves with a JSON validation hack, the graphs looked like this (note the gaps where there is no data), which isn't very helpful.
Thanks. I'm afraid that using zero for NaN/Inf is not a perfect solution. In some cases 0 is a valid value (wr_io_rate, for example), so we would hide the real issue.
Updated by Enrico Labedzki about 6 years ago
Chang Liu wrote:
Thanks. I'm afraid that using zero for NaN/Inf is not a perfect solution. In some cases 0 is a valid value (wr_io_rate, for example), so we would hide the real issue.
Yes, you are right. What about -1 as an indicator value? Could that be a solution?
Or would that also clash with any valid values?
Updated by Chang Liu about 6 years ago
Enrico Labedzki wrote:
Yes, you are right. What about -1 as an indicator value? Could that be a solution?
Or would that also clash with any valid values?
I do not think using an ordinary integer to stand for an invalid number is a good solution. For comparison, Python's json.dumps function has a parameter called allow_nan. When allow_nan is False, json.dumps raises a ValueError on a NaN/Inf number; when allow_nan is True (the default), it dumps NaN directly.
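The `json.dumps` behaviour described above can be checked with a short snippet. The dictionary key `utilization` is just an illustrative name:

```python
import json

value = {"utilization": float("nan")}

# Default (allow_nan=True): the non-standard token NaN is written as-is,
# so the result is not strictly valid JSON.
lenient = json.dumps(value)
print(lenient)

# allow_nan=False: serialization fails instead of emitting the token.
try:
    json.dumps(value, allow_nan=False)
    strict_error = None
except ValueError as exc:
    strict_error = exc
print(type(strict_error).__name__)
```

So the Python library offers both of the options discussed here (emit the token, or raise), but neither one produces valid JSON with a usable placeholder value.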
Updated by Enrico Labedzki about 6 years ago
Chang Liu wrote:
I do not think using an ordinary integer to stand for an invalid number is a good solution. For comparison, Python's json.dumps function has a parameter called allow_nan. When allow_nan is False, json.dumps raises a ValueError on a NaN/Inf number; when allow_nan is True (the default), it dumps NaN directly.
OK, you are right, integer values are probably not a good choice as an error indicator.
I did some reading and experimented a little with the Python, Perl and Ruby JSON parsers (we mainly use Ruby as our preferred language at my company).
According to the JSON specification, the valid values are string, number, object, array, true, false and null. So why not emit null as the value? That should work in Python, Perl and Ruby (giving undef in Perl, None in Python, nil in Ruby) and can be handled by the programmer. Best of all, the JSON object stays intact, and there is no need to raise an exception or anything else.
What do you think?
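A minimal Python sketch of the null proposal, to show how it looks from both the producing and the consuming side. This is not the Ceph Formatter code (which is C++) and not the change in the linked pull request; the helper `finite_or_none`, the field names, and the sample values are assumptions made for illustration.

```python
import json
import math


def finite_or_none(x):
    """Map NaN and +/-Inf to None so that json.dumps writes null."""
    return x if math.isfinite(x) else None


# Hypothetical per-OSD values: osd.0 is down, so its ratio is undefined,
# while osd.2 genuinely reports 0.0.
raw_values = {"osd.0": float("nan"), "osd.1": 41.5, "osd.2": 0.0}

doc = json.dumps(
    {
        "nodes": [
            {"name": name, "utilization": finite_or_none(v)}
            for name, v in raw_values.items()
        ]
    },
    allow_nan=False,  # would raise if a non-finite value slipped through
)

parsed = json.loads(doc)
unknown = [n["name"] for n in parsed["nodes"] if n["utilization"] is None]
zero = [n["name"] for n in parsed["nodes"] if n["utilization"] == 0]
print(unknown, zero)
```

With null, a consumer can tell "no value" (osd.0) apart from a real zero (osd.2), which addresses the objection raised earlier against using 0 or -1 as a placeholder.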
Updated by Nathan Cutler about 6 years ago
+1 for null, which is an English word and hence far more comprehensible than "NaN", which is what I would call "Programmer Slang".
"undef", "undefined", or "out of range" are other candidates (from a purely linguistic perspective)
Or (re-reading the bug description) possibly "error", "OSD down", "n/a" (which stands for "not applicable" or "not available")
Oh, wait, never mind: this is just a request to backport https://github.com/ceph/ceph/pull/13531 to jewel. Marking appropriately.
Updated by Nathan Cutler about 6 years ago
- Backport changed from jewel luminous to jewel
Updated by Nathan Cutler about 6 years ago
- Status changed from 12 to Pending Backport
Updated by Nathan Cutler about 6 years ago
- Copied to Backport #22866: jewel: ceph osd df json output validation reported invalid numbers (-nan) (jewel) added
Updated by Nathan Cutler about 6 years ago
- Status changed from Pending Backport to Resolved