Fix #6780
closed
monitor errors when checking for quorum status
Added by Tamilarasi muthamizhan over 10 years ago.
Updated about 10 years ago.
Description
logs: ubuntu@teuthology:/a/teuthology-2013-11-13_14:42:07-upgrade-parallel-next-testing-basic-vps/97245
pasting the output from mon below,
2013-11-13T17:35:50.096 DEBUG:teuthology.orchestra.run:Running [10.214.138.59]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool set metadata pgp_num 34'
2013-11-13T17:35:50.125 DEBUG:teuthology.task.ceph:Quorum: [u'a', u'b', u'c']
2013-11-13T17:35:50.409 INFO:teuthology.task.ceph.mon.a.err:[10.214.138.59]: 2013-11-13 20:35:50.406340 7fad97f76700 -1 bad boost::get: key val is not type long
2013-11-13T17:35:50.409 INFO:teuthology.task.ceph.mon.a.err:[10.214.138.59]: 2013-11-13 20:35:50.408206 7fad97f76700 -1 0x7fad97f73368
2013-11-13T17:35:50.410 INFO:teuthology.task.ceph.mon.a.err:[10.214.138.59]: 2013-11-13 20:35:50.408439 7fad97f76700 -1 bad boost::get: key val is not type float
2013-11-13T17:35:50.411 INFO:teuthology.task.ceph.mon.a.err:[10.214.138.59]: 2013-11-13 20:35:50.410450 7fad97f76700 -1 0x7fad97f73368
- Subject changed from monitor warnings when checking for quorum status to monitor errors when checking for quorum status
- Priority changed from High to Urgent
- Priority changed from Urgent to High
What version was this on? I think Sage fixed this particular issue last sprint.
- Status changed from New to Need More Info
- Status changed from Need More Info to In Progress
This happens when some OSDs and mons are upgraded to the next branch [emperor].
recent logs: ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-11-19_19:40:02-upgrade-parallel-master-testing-basic-plana/109584
- Priority changed from High to Urgent
- Status changed from In Progress to 4
The reason for this is the code in place to keep compatibility with previous versions of the monitor with regard to the CephString change that triggered #6796.
What the monitor currently does is first attempt to read an integer out of the provided value; if that fails, it treats the value as a string and runs strict_strtoll() on it. It then assumes a float is also a possibility and does the very same thing. This is meant to keep compatibility with previous versions of the monitor that may supply said values, although the float is never used.
So, considering that 'cmd_getval()' always writes to stderr when it cannot parse a given value as a given type, we always end up emitting this error when the client obtains the command descriptions from an Emperor monitor. I propose whitelisting these messages for the time being, although that sucks if there is some other place in the monitor where some value happens to be misinterpreted. With the fix for #6796 being released we may not have another choice anyway, considering that we really want those calls to be there to interpret current Emperor, patched-for-6796-Emperor and Dumpling.
We could however force 'cmd_getval()' to output to dout() instead of derr, but I don't know if that is eligible for backport.
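The fallback sequence described above can be sketched roughly as follows. This is an illustration in Python, not the monitor's C++ code: `try_get` and `read_value` are hypothetical names, `try_get` only imitates the reported behaviour of 'cmd_getval()' (log to the error stream on a type mismatch), and `int()` / `float()` stand in for the strict string parsing.

```python
import sys

# Type names as they appear in the log lines quoted in this ticket.
_TYPE_NAMES = {int: "long", float: "float"}


def try_get(cmdmap, key, wanted, err=sys.stderr):
    """Hypothetical stand-in for cmd_getval(): return the value if it is
    stored with the wanted type, otherwise log a complaint and return None."""
    val = cmdmap[key]
    if type(val) is wanted:
        return val
    print(f"bad boost::get: key {key} is not type {_TYPE_NAMES[wanted]}", file=err)
    return None


def read_value(cmdmap, key="val", err=sys.stderr):
    """Try a typed integer read, fall back to parsing the string, then
    repeat the same steps for a float (assumed flow, per the comment above)."""
    n = try_get(cmdmap, key, int, err)
    if n is None:
        # Fallback: treat the value as a string and parse it.
        # int() stands in for strict_strtoll() here.
        n = int(cmdmap[key])

    f = try_get(cmdmap, key, float, err)
    if f is None:
        # Same fallback for the float case.
        f = float(cmdmap[key])

    return n, f
```

With a string-typed value such as `{"val": "34"}`, both typed reads fail before the fallbacks succeed, so every such call logs one "not type long" and one "not type float" line even though the value is parsed correctly. Passing a different `err` stream in this sketch corresponds to the idea of routing the message somewhere other than stderr.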
- Tracker changed from Bug to Fix
- Status changed from 4 to In Progress
Can no longer reproduce this on firefly. Any objections to closing?
- Status changed from In Progress to Closed