Bug #22662

ceph osd df json output validation reported invalid numbers (-nan) (jewel)

Added by Enrico Labedzki 3 months ago. Updated 2 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 12/13/2016
Due date:
% Done: 100%
Source:
Tags:
Backport: jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):

Description

Hi,

we have a monitoring script that parses the output of 'ceph osd df -f json'. From time to time one or more OSDs are down, and the JSON output is then invalid because it contains -nan where a number is expected.
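For illustration, the failure can be reproduced with any strict JSON parser. The sketch below uses Python's standard json module on a hand-written fragment; the field names and values are assumptions modelled on this report, not captured output, and is_valid_json is a hypothetical helper.

```python
import json

# Hand-written sample (an assumption, not real `ceph osd df -f json` output):
# a down OSD reports -nan where a number is expected.
SAMPLE = '{"nodes": [{"id": 3, "name": "osd.3", "utilization": -nan, "var": -nan}]}'


def is_valid_json(text):
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


if __name__ == "__main__":
    # The bare -nan token makes the whole document unparseable.
    print(is_valid_json(SAMPLE))
```

A single -nan is enough to make the parser reject the entire document, which is why the monitoring script loses data for all OSDs and not only for the one that is down.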

Bildschirmfoto 2018-01-11 um 11.06.01.png View - json validator (125 KB) Enrico Labedzki, 01/11/2018 10:29 AM

Bildschirmfoto 2018-01-25 um 09.39.38.jpg View (517 KB) Enrico Labedzki, 01/25/2018 08:47 AM

Bildschirmfoto 2018-01-25 um 09.50.04.png View (760 KB) Enrico Labedzki, 01/25/2018 08:52 AM


Related issues

Copied to RADOS - Backport #22866: jewel: ceph osd df json output validation reported invalid numbers (-nan) (jewel) Resolved

History

#1 Updated by Greg Farnum 3 months ago

  • Project changed from Ceph to mgr
  • Category deleted (ceph cli)

#2 Updated by Sage Weil 3 months ago

  • Project changed from mgr to RADOS
  • Subject changed from ceph osd df json output validation reported invalid numbers to ceph osd df json output validation reported invalid numbers (-nan)
  • Status changed from New to Verified
  • Priority changed from Normal to Urgent

1. It's not valid JSON; the Formatter shouldn't allow it.
2. We should have a valid value (or 0) to use.

#3 Updated by Sage Weil 3 months ago

  • Subject changed from ceph osd df json output validation reported invalid numbers (-nan) to ceph osd df json output validation reported invalid numbers (-nan) (jewel)

#4 Updated by Nathan Cutler 3 months ago

  • Backport set to jewel luminous

#5 Updated by Chang Liu 3 months ago

  • Assignee set to Chang Liu

#6 Updated by Chang Liu 3 months ago

This bug has been fixed by https://github.com/ceph/ceph/pull/13531. We should backport it to Jewel.

#7 Updated by Chang Liu 3 months ago

Sage Weil wrote:

1. It's not valid JSON; the Formatter shouldn't allow it.
2. We should have a valid value (or 0) to use.

When JsonFormatter encounters a NaN or Inf, what should we do? Throw an exception directly?

#8 Updated by Enrico Labedzki 3 months ago

Chang Liu wrote:

When JsonFormatter encounters a NaN or Inf, what should we do? Throw an exception directly?

Why not simply set those values to zero, as Sage mentioned before? That should be OK, I think.

Those zero values can then be handled separately by the consumer.

It would look like this (see attachment), so I can see all OSDs that are in the down state.

Before we fixed this ourselves with a JSON validation step (a hack), the graphs looked like this (note the gaps where there is no data), which isn't very helpful.
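The thread does not show the workaround itself. Purely as an assumption, a client-side hack of this kind could rewrite the offending tokens before parsing. The helper name load_lenient and the regular expression below are hypothetical, and the sketch assumes that bare nan/inf tokens only ever appear as values, never inside strings.

```python
import json
import re

# Matches a bare nan/inf token (optionally signed) that follows a key's colon.
# Assumption: such tokens only appear as values, never inside quoted strings.
_NON_FINITE = re.compile(r'(:\s*)-?(?:nan|inf)\b', re.IGNORECASE)


def load_lenient(text):
    """Parse JSON text after rewriting bare nan/inf values to null."""
    return json.loads(_NON_FINITE.sub(r'\1null', text))


if __name__ == "__main__":
    raw = '{"nodes": [{"id": 3, "utilization": -nan, "var": -nan}]}'
    print(load_lenient(raw))
```

Text-level rewriting like this is fragile, which is a good argument for fixing the output at the source.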

#9 Updated by Chang Liu 3 months ago

Enrico Labedzki wrote:

Why not simply set those values to zero, as Sage mentioned before? That should be OK, I think.

Those zero values can then be handled separately by the consumer.

It would look like this (see attachment), so I can see all OSDs that are in the down state.

Before we fixed this ourselves with a JSON validation step (a hack), the graphs looked like this (note the gaps where there is no data), which isn't very helpful.

Thanks. I'm afraid that substituting zero for NaN/Inf is not a perfect solution. In some cases 0 is a valid value (for example wr_io_rate), so we would hide the real issue.
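A small sketch of that concern, using hypothetical records and a hypothetical zero_fill helper: once a non-finite value has been replaced by zero, a consumer can no longer tell it apart from a genuine zero reading.

```python
import math


def zero_fill(value):
    """Replace a non-finite float with 0.0 (the substitution under discussion)."""
    return value if math.isfinite(value) else 0.0


# Hypothetical records: one OSD is genuinely idle, the other had no defined value.
idle_osd = {"wr_io_rate": 0.0}
down_osd = {"wr_io_rate": zero_fill(float("nan"))}

if __name__ == "__main__":
    # Both records are now identical, so the undefined value is hidden.
    print(idle_osd == down_osd)
```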

#10 Updated by Enrico Labedzki 3 months ago

Chang Liu wrote:

Thanks. I'm afraid that substituting zero for NaN/Inf is not a perfect solution. In some cases 0 is a valid value (for example wr_io_rate), so we would hide the real issue.

Yes, you are right. What about -1 as an indicator value; could that be a solution?

Or would that also clash with any valid values?

#11 Updated by Chang Liu 3 months ago

Enrico Labedzki wrote:

Yes, you are right. What about -1 as an indicator value; could that be a solution?

Or would that also clash with any valid values?

I do not think using an ordinary integer to represent an invalid number is a good solution. Python's json.dumps function has a parameter called allow_nan: when it is False, json.dumps raises a ValueError on a NaN/Inf number; when it is True (the default), json.dumps dumps NaN directly.
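The json.dumps behaviour described here can be checked with a few lines. The wrapper names dump_default and try_dump_strict are only illustrative; allow_nan is the real parameter of json.dumps.

```python
import json


def dump_default(obj):
    """Serialise with the default allow_nan=True (emits non-standard NaN tokens)."""
    return json.dumps(obj)


def try_dump_strict(obj):
    """Serialise with allow_nan=False; return None if a NaN/Inf is rejected."""
    try:
        return json.dumps(obj, allow_nan=False)
    except ValueError:
        return None


if __name__ == "__main__":
    record = {"utilization": float("nan")}
    print(dump_default(record))     # NaN is written as-is, which is not valid JSON
    print(try_dump_strict(record))  # strict mode refuses the value
```

So Python offers the two behaviours under discussion: pass the value through as a non-standard token, or fail loudly.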

#12 Updated by Enrico Labedzki 3 months ago

Chang Liu wrote:

I do not think using an ordinary integer to represent an invalid number is a good solution. Python's json.dumps function has a parameter called allow_nan: when it is False, json.dumps raises a ValueError on a NaN/Inf number; when it is True (the default), json.dumps dumps NaN directly.

OK, you are right; integer values are probably not a good choice as an error indicator.

I read up and experimented a little with the Python, Perl and Ruby JSON parsers (we mainly use Ruby as our preferred language at my company).

As the JSON specification says, the valid values are string, number, object, array, true, false and null. So why not use null as the value? It should work in Python, Perl and Ruby (giving Perl undef, Python None, Ruby nil) and can be handled by the programmer. Best of all, the JSON object stays intact, and there is no need to raise an exception or anything else.

What do you think?
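A minimal sketch of the null proposal, written in Python for illustration. The helper names finite_or_none and dump_osd and the sample record are assumptions; this shows the idea only and makes no claim about how the Ceph formatter handles the case.

```python
import json
import math


def finite_or_none(value):
    """Map NaN/Inf to None so that json.dumps emits null."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    return value


def dump_osd(stats):
    """Serialise one flat OSD record, replacing non-finite floats with null."""
    cleaned = {key: finite_or_none(val) for key, val in stats.items()}
    # allow_nan=False guarantees that no NaN/Inf token can slip through.
    return json.dumps(cleaned, allow_nan=False)


if __name__ == "__main__":
    text = dump_osd({"id": 3, "utilization": float("nan")})
    print(text)
    # The consumer receives None and can decide how to treat a down OSD.
    print(json.loads(text)["utilization"] is None)
```

Unlike 0 or -1, null cannot be mistaken for a measured value, and the document stays parseable by any strict JSON parser.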

#13 Updated by Nathan Cutler 3 months ago

+1 for null, which is an English word and hence far more comprehensible than "NaN", which is what I would call "Programmer Slang".

"undef", "undefined", or "out of range" are other candidates (from a purely linguistic perspective)

Or (re-reading the bug description) possibly "error", "OSD down", "n/a" (which stands for "not applicable" or "not available")

Oh, wait - never mind, this is just a request to backport https://github.com/ceph/ceph/pull/13531 to jewel. Marking appropriately.

#14 Updated by Nathan Cutler 3 months ago

  • Backport changed from jewel luminous to jewel

#15 Updated by Nathan Cutler 3 months ago

  • Status changed from Verified to Pending Backport

#16 Updated by Nathan Cutler 3 months ago

  • Copied to Backport #22866: jewel: ceph osd df json output validation reported invalid numbers (-nan) (jewel) added

#17 Updated by Nathan Cutler 2 months ago

  • Status changed from Pending Backport to Resolved
