Fix #6780

monitor errors when checking for quorum status

Added by Tamilarasi muthamizhan over 10 years ago. Updated about 10 years ago.

Status:
Closed
Priority:
Urgent
Assignee:
Joao Eduardo Luis
Category:
Monitor
Target version:
-
% Done:

0%

Source:
Q/A

Description

logs: ubuntu@teuthology:/a/teuthology-2013-11-13_14:42:07-upgrade-parallel-next-testing-basic-vps/97245

Pasting the output from the mon below:

2013-11-13T17:35:50.096 DEBUG:teuthology.orchestra.run:Running [10.214.138.59]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool set metadata pgp_num 34'
2013-11-13T17:35:50.125 DEBUG:teuthology.task.ceph:Quorum: [u'a', u'b', u'c']
2013-11-13T17:35:50.409 INFO:teuthology.task.ceph.mon.a.err:[10.214.138.59]: 2013-11-13 20:35:50.406340 7fad97f76700 -1 bad boost::get: key val is not type long
2013-11-13T17:35:50.409 INFO:teuthology.task.ceph.mon.a.err:[10.214.138.59]: 2013-11-13 20:35:50.408206 7fad97f76700 -1 0x7fad97f73368
2013-11-13T17:35:50.410 INFO:teuthology.task.ceph.mon.a.err:[10.214.138.59]: 2013-11-13 20:35:50.408439 7fad97f76700 -1 bad boost::get: key val is not type float
2013-11-13T17:35:50.411 INFO:teuthology.task.ceph.mon.a.err:[10.214.138.59]: 2013-11-13 20:35:50.410450 7fad97f76700 -1 0x7fad97f73368

History

#1 Updated by Tamilarasi muthamizhan over 10 years ago

  • Subject changed from monitor warnings when checking for quorum status to monitor errors when checking for quorum status
  • Priority changed from High to Urgent

#2 Updated by Tamilarasi muthamizhan over 10 years ago

  • Priority changed from Urgent to High

#3 Updated by Joao Eduardo Luis over 10 years ago

What version was this on? I think Sage fixed this particular issue last sprint.

#4 Updated by Joao Eduardo Luis over 10 years ago

  • Status changed from New to Need More Info

#5 Updated by Tamilarasi muthamizhan over 10 years ago

  • Status changed from Need More Info to In Progress

This happens when some OSDs and mons are upgraded to the next branch [emperor].

recent logs: ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-11-19_19:40:02-upgrade-parallel-master-testing-basic-plana/109584

#6 Updated by Tamilarasi muthamizhan over 10 years ago

  • Priority changed from High to Urgent

#7 Updated by Joao Eduardo Luis over 10 years ago

  • Status changed from In Progress to 4

The reason for this is the code put in place to keep compatibility with previous versions of the monitor with regard to the CephString change that triggered #6796.

The monitor currently first attempts to read an integer out of the provided value; if that fails, it treats the value as a string and runs strict_strtoll() on it. It then assumes a float is also a possibility and does the very same thing. This is meant to keep compatibility with previous versions of the monitor that may supply said values, although the float is never used.
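As a rough illustration of why two messages show up per lookup, the fallback described above can be sketched in Python. This is not the monitor's code: `get_typed`, `parse_int`, `parse_float` and `read_numeric` are hypothetical names, the command map is modeled as a plain dict, and the message text only imitates the log lines in the description.

```python
import re
import sys


def _to_stderr(msg):
    print(msg, file=sys.stderr)


def get_typed(cmdmap, key, expected_type, type_name, report=_to_stderr):
    """Hypothetical typed getter: returns (ok, value) and complains on a type mismatch."""
    value = cmdmap.get(key)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, expected_type) and not isinstance(value, bool):
        return True, value
    report(f"bad get: key {key} is not type {type_name}")
    return False, None


def parse_int(text):
    """Accept only an optionally signed run of decimal digits."""
    return int(text) if re.fullmatch(r"[+-]?[0-9]+", text) else None


def parse_float(text):
    """Looser than a strict parser; good enough for this sketch."""
    try:
        return float(text)
    except ValueError:
        return None


def read_numeric(cmdmap, key="val", report=_to_stderr):
    """Try an integer first, then a float, each with a string fallback."""
    ok, n = get_typed(cmdmap, key, int, "long", report)
    if not ok:
        ok_s, s = get_typed(cmdmap, key, str, "string", report)
        n = parse_int(s) if ok_s else None

    ok, f = get_typed(cmdmap, key, float, "float", report)
    if not ok:
        ok_s, s = get_typed(cmdmap, key, str, "string", report)
        f = parse_float(s) if ok_s else None
    return n, f
```

With a string value such as `{"val": "34"}`, both typed lookups fail before the string fallback succeeds, so two complaints are reported even though the value parses fine.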

So, because 'cmd_getval()' always writes to stderr when it cannot parse a given value as a given type, we end up outputting this error every time the client obtains the command descriptions from an Emperor monitor. I propose whitelisting these messages for the time being, although that sucks if some other place in the monitor happens to misinterpret a value, because the whitelist would hide that too. With the fix for #6796 being released we may not have another choice anyway, considering that we really want those calls to be there to interpret current Emperor, patched-for-6796 Emperor, and Dumpling.
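A minimal sketch of what such a whitelist could look like on the test side, assuming the check runs over captured stderr lines. The `KNOWN_BENIGN` list and `unexpected_errors` function are hypothetical, and the pattern is deliberately limited to the two messages quoted in the description so that other type mismatches still surface.

```python
import re

# Hypothetical ignore list: only the "long" and "float" variants seen in this report.
KNOWN_BENIGN = [
    re.compile(r"bad boost::get: key \w+ is not type (long|float)$"),
]


def unexpected_errors(lines, ignore=KNOWN_BENIGN):
    """Return the stderr lines that match none of the ignore patterns."""
    return [ln for ln in lines if not any(p.search(ln) for p in ignore)]
```

Keeping the pattern this narrow is one way to limit the downside mentioned above: a mismatch on any other type would not be filtered out.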

We could however force 'cmd_getval()' to output to dout() instead of derr, but I don't know if that is eligible for backport.

#8 Updated by Joao Eduardo Luis over 10 years ago

  • Tracker changed from Bug to Fix

#9 Updated by Joao Eduardo Luis over 10 years ago

  • Status changed from 4 to In Progress

#10 Updated by Joao Eduardo Luis about 10 years ago

Can no longer reproduce this on Firefly. Any objections to closing?

#11 Updated by Joao Eduardo Luis about 10 years ago

  • Status changed from In Progress to Closed
