Fix #6780

monitor errors when checking for quorum status

Added by Tamilarasi muthamizhan over 10 years ago. Updated about 10 years ago.

Status: Closed
Priority: Urgent
Category: Monitor
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Logs: ubuntu@teuthology:/a/teuthology-2013-11-13_14:42:07-upgrade-parallel-next-testing-basic-vps/97245

Pasting the output from mon.a below:

2013-11-13T17:35:50.096 DEBUG:teuthology.orchestra.run:Running [10.214.138.59]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool set metadata pgp_num 34'
2013-11-13T17:35:50.125 DEBUG:teuthology.task.ceph:Quorum: [u'a', u'b', u'c']
2013-11-13T17:35:50.409 INFO:teuthology.task.ceph.mon.a.err:[10.214.138.59]: 2013-11-13 20:35:50.406340 7fad97f76700 -1 bad boost::get: key val is not type long
2013-11-13T17:35:50.409 INFO:teuthology.task.ceph.mon.a.err:[10.214.138.59]: 2013-11-13 20:35:50.408206 7fad97f76700 -1 0x7fad97f73368
2013-11-13T17:35:50.410 INFO:teuthology.task.ceph.mon.a.err:[10.214.138.59]: 2013-11-13 20:35:50.408439 7fad97f76700 -1 bad boost::get: key val is not type float
2013-11-13T17:35:50.411 INFO:teuthology.task.ceph.mon.a.err:[10.214.138.59]: 2013-11-13 20:35:50.410450 7fad97f76700 -1 0x7fad97f73368

Note #1

Updated by Tamilarasi muthamizhan over 10 years ago

  • Subject changed from monitor warnings when checking for quorum status to monitor errors when checking for quorum status
  • Priority changed from High to Urgent

Note #2

Updated by Tamilarasi muthamizhan over 10 years ago

  • Priority changed from Urgent to High

Note #3

Updated by Joao Eduardo Luis over 10 years ago

What version was this on? I think Sage fixed this particular issue last sprint.

Note #4

Updated by Joao Eduardo Luis over 10 years ago

  • Status changed from New to Need More Info

Note #5

Updated by Tamilarasi muthamizhan over 10 years ago

  • Status changed from Need More Info to In Progress

This happens when some OSDs and mons are upgraded to the next branch (Emperor).

Recent logs: ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-11-19_19:40:02-upgrade-parallel-master-testing-basic-plana/109584

Note #6

Updated by Tamilarasi muthamizhan over 10 years ago

  • Priority changed from High to Urgent

Note #7

Updated by Joao Eduardo Luis over 10 years ago

  • Status changed from In Progress to 4

The reason for this is the code in place to keep compatibility with previous versions of the monitor with regard to the CephString change that triggered #6796.

The monitor currently attempts to read an integer out of the provided value first; if that fails, it treats the value as a string and converts it with strict_strtoll(). It then assumes a float is also a possibility and does the very same thing. This is meant to keep compatibility with previous versions of the monitor that may supply such values, although the float is never used.
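
The fallback described above can be sketched in C++17 as follows. This is a hypothetical illustration, not Ceph's code: std::variant stands in for the boost::variant implied by the "bad boost::get" messages, and getval(), parse_int(), parse_float(), read_numeric_compat() and the message text are made-up names. It shows why a value supplied as a string triggers one complaint from the integer lookup and one from the float lookup, matching the pair of errors in the description.

```cpp
#include <cassert>   // assert() in the usage checks
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <map>
#include <optional>
#include <ostream>
#include <sstream>   // std::ostringstream in the usage checks
#include <string>
#include <variant>

// Hypothetical stand-in for the monitor's parsed-command map.
using ArgValue = std::variant<std::string, std::int64_t, double>;
using CmdMap = std::map<std::string, ArgValue>;

// Analogue of cmd_getval(): succeed only if the stored value already holds
// type T; on a type mismatch, complain on `err` (the monitor uses derr).
template <typename T>
std::optional<T> getval(const CmdMap& cmd, const std::string& key,
                        const char* type_name, std::ostream& err) {
  auto it = cmd.find(key);
  if (it == cmd.end()) return std::nullopt;  // absent key: no complaint
  if (const T* p = std::get_if<T>(&it->second)) return *p;
  err << "bad get: key " << key << " is not type " << type_name << '\n';
  return std::nullopt;
}

// Whole-string conversions that reject trailing garbage, in the spirit of
// strict_strtoll().
std::optional<std::int64_t> parse_int(const std::string& s) {
  if (s.empty()) return std::nullopt;
  errno = 0;
  char* end = nullptr;
  long long v = std::strtoll(s.c_str(), &end, 10);
  if (errno != 0 || *end != '\0') return std::nullopt;
  return v;
}

std::optional<double> parse_float(const std::string& s) {
  if (s.empty()) return std::nullopt;
  errno = 0;
  char* end = nullptr;
  double v = std::strtod(s.c_str(), &end);
  if (errno != 0 || *end != '\0') return std::nullopt;
  return v;
}

struct NumericArg {
  std::optional<std::int64_t> as_int;
  std::optional<double> as_float;
};

// Try the typed value first (newer senders), then fall back to parsing a
// string (older senders). Both lookups run, so a string value produces two
// complaints: one for the integer type and one for the float type.
NumericArg read_numeric_compat(const CmdMap& cmd, const std::string& key,
                               std::ostream& err) {
  NumericArg r;
  r.as_int = getval<std::int64_t>(cmd, key, "long", err);
  if (!r.as_int) {
    if (auto s = getval<std::string>(cmd, key, "string", err))
      r.as_int = parse_int(*s);
  }
  r.as_float = getval<double>(cmd, key, "float", err);
  if (!r.as_float) {
    if (auto s = getval<std::string>(cmd, key, "string", err))
      r.as_float = parse_float(*s);
  }
  return r;
}
```

For example, a map holding the string "34" under "val" yields 34 in both fields, and err receives two lines, one naming long and one naming float.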

So, since 'cmd_getval()' always writes to stderr when it cannot parse a value as the requested type, we always end up emitting this error when the client obtains the command descriptions from an Emperor monitor. I propose whitelisting these messages for the time being, although that is unfortunate if some other place in the monitor happens to misinterpret a value. With the fix for #6796 being released, we may not have another choice anyway, because we really want those calls in place to interpret current Emperor, Emperor patched for #6796, and Dumpling.
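
To illustrate the whitelisting idea, here is a hypothetical C++ filter that drops log lines matching any whitelisted regular expression and returns whatever is left. It is not teuthology's implementation; unexpected_lines() and the pattern used with it are assumptions made for the example.

```cpp
#include <cassert>  // assert() in the usage checks
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: return the log lines that no whitelist pattern
// matches. Anything returned would still be treated as a real error.
std::vector<std::string> unexpected_lines(
    const std::vector<std::string>& lines,
    const std::vector<std::string>& whitelist) {
  std::vector<std::regex> patterns;
  for (const auto& w : whitelist) patterns.emplace_back(w);

  std::vector<std::string> out;
  for (const auto& line : lines) {
    bool allowed = false;
    for (const auto& re : patterns) {
      if (std::regex_search(line, re)) {  // pattern may match anywhere
        allowed = true;
        break;
      }
    }
    if (!allowed) out.push_back(line);
  }
  return out;
}
```

A pattern such as "bad boost::get: key val is not type (long|float)" names the key and both types instead of whitelisting every "bad boost::get" line, which keeps the blind spot mentioned above small: a mismatch on any other key would still be reported.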

We could, however, make 'cmd_getval()' write to dout() instead of derr, but I don't know whether that change would be eligible for backport.
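
A minimal sketch of the dout()-instead-of-derr alternative, assuming a simple leveled logger where a message is printed only when its level is at or below the configured debug level. Logger, report_type_mismatch() and the level 10 are hypothetical; the only borrowed convention is that derr output carries priority -1, which is what the "-1" in the log excerpt appears to be.

```cpp
#include <cassert>   // assert() in the usage checks
#include <ostream>
#include <sstream>   // std::ostringstream in the usage checks
#include <string>

// derr-style priority: printed regardless of the configured debug level.
constexpr int kErrLevel = -1;

// Hypothetical leveled logger: emit only if level <= debug_level.
struct Logger {
  int debug_level;     // configured verbosity for the subsystem
  std::ostream& sink;  // where emitted messages go
  void log(int level, const std::string& msg) const {
    if (level <= debug_level) sink << level << ' ' << msg << '\n';
  }
};

// Report a type mismatch at a caller-chosen level: kErrLevel mimics the
// current always-on behavior, a debug level (e.g. 10) hides it by default.
void report_type_mismatch(const Logger& log, const std::string& key,
                          const std::string& type, int level) {
  log.log(level, "bad get: key " + key + " is not type " + type);
}
```

With a low configured debug level, the same mismatch reported at level 10 produces no output, while reporting it at kErrLevel reproduces the current behavior seen in the description.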

Note #8

Updated by Joao Eduardo Luis over 10 years ago

  • Tracker changed from Bug to Fix

Note #9

Updated by Joao Eduardo Luis over 10 years ago

  • Status changed from 4 to In Progress

Note #10

Updated by Joao Eduardo Luis about 10 years ago

I can no longer reproduce this on Firefly. Any objections to closing?

Note #11

Updated by Joao Eduardo Luis about 10 years ago

  • Status changed from In Progress to Closed