Bug #3212
librados: failed to decode message of type 59 v1: buffer::end_of_buffer
Status: Resolved
Priority: High
Assignee: -
Category: librados
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
from ML:
Date: Mon, 24 Sep 2012 22:14:39 +0100
From: John Leach <john@brightbox.co.uk>
To: ceph-devel@vger.kernel.org
Subject: librados: failed to decode message of type 59 v1: buffer::end_of_buffer

Hi,

I'm calling rados_ioctx_pool_stat and it's hanging. logs show:

> 2012-09-24 21:30:08.411947 7f0041251700 failed to decode message of type 59 v1: buffer::end_of_buffer
> 2012-09-24 21:30:08.412286 7f0043255700 monclient: hunting for new mon

my local client is the Ubuntu Precise provided librados2 package (0.41-1ubuntu2.1)

my cluster is running the unstable packages provided by Ceph (0.51-1precise).

If I upgrade my client just up to the stable 0.48.1argonaut-1precise package, it fixes the problem.

If the protocol changed, then I'd expect librados would let me know. Is there some way to check this? rados_version returns the version of the library, but I can't see how to get the version of the cluster (or quite how I'd compare them in a meaningful way).

Thanks,

John.
Associated revisions
osd: make pool_stat_t encoding backward compatible with v0.41 and older
In particular, this is the encoding that is used in precise.
Fixes: #3212
Signed-off-by: Sage Weil <sage@inktank.com>
History
#1 Updated by Sage Weil over 11 years ago
see wip-3212
We weren't encoding using the pre-v0.42 pool_stat_t format, or at least that's what it looks like; I wasn't able to reproduce this.
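To illustrate the kind of fix described here, the following is a minimal sketch of feature-gated encoding: the sender picks a layout based on what the peer advertises, so an old client still receives bytes it can decode. Everything in it is an assumption made for illustration: the feature bit, the field names, and both layouts are hypothetical and are not the real pool_stat_t encoding or the actual Ceph code.

```python
import struct

# Hypothetical feature bit: set by peers that understand the newer layout.
# This is NOT a real Ceph feature flag.
FEATURE_NEW_ENCODING = 1 << 0


def encode_stats(num_bytes, num_objects, num_rd, peer_features):
    """Encode stats for one peer, choosing a layout that peer can decode."""
    if peer_features & FEATURE_NEW_ENCODING:
        # Newer (hypothetical) layout: 1-byte version, then three
        # little-endian unsigned 64-bit fields.
        return struct.pack("<BQQQ", 2, num_bytes, num_objects, num_rd)
    # Legacy (hypothetical) layout: two unsigned 64-bit fields, no version byte.
    return struct.pack("<QQ", num_bytes, num_objects)


def decode_legacy(buf):
    """Decoder as an old client might have it: accepts only the legacy layout.

    struct.unpack raises struct.error when the buffer size does not match.
    """
    return struct.unpack("<QQ", buf)


# An old peer advertises no new features, so it gets the legacy layout.
old_wire = encode_stats(10, 2, 5, peer_features=0)
print(decode_legacy(old_wire))

# Sending the newer layout to the legacy decoder fails, which is loosely
# analogous to the decode error in this report.
new_wire = encode_stats(10, 2, 5, peer_features=FEATURE_NEW_ENCODING)
try:
    decode_legacy(new_wire)
except struct.error as exc:
    print("legacy decoder rejected newer layout:", exc)
```

The design point is that the encoder, not the decoder, carries the compatibility burden: an already-deployed client cannot be changed, so the server has to keep emitting the older layout to peers that do not advertise the newer one.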
#2 Updated by John Leach over 11 years ago
Ok, with packages 0.51-700-g1a9c8c7-1precise from wip-3212 and back to 0.41-1ubuntu2.1 on the client, it now refuses to connect entirely (but works fine with 0.48.1argonaut-1precise on the client). I presume that was the aim:
2012-09-25 10:35:33.119441 7fe5a607d700 -- 10.132.138.182:0/1002987 >> 10.232.29.142:6789/0 pipe(0xde91f0 sd=3 pgs=0 cs=0 l=0).connect protocol feature mismatch, my 1ffa < peer 41ffa missing 40000
I've got questions about detecting this in code, but I'll do that back on the mailing list.
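The numbers in the connect line above can be checked by hand. The sketch below assumes that "my" and "peer" are hexadecimal feature bitmasks and that "missing" is the set of bits present in the peer's mask but absent from the local one; that reading is an assumption, though it is consistent with the values logged. The function names are hypothetical helpers, not part of librados.

```python
def missing_features(mine: int, peer: int) -> int:
    """Return the bits set in the peer's mask that are not set in ours."""
    return peer & ~mine


def bit_positions(mask: int) -> list:
    """List the zero-based positions of the bits set in mask."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


# Values copied from the log line: "my 1ffa < peer 41ffa missing 40000".
mine = int("1ffa", 16)
peer = int("41ffa", 16)

missing = missing_features(mine, peer)
print(f"{missing:x}", bit_positions(missing))  # prints "40000 [18]"
```

Under that assumption, the 0.41 client lacks exactly one feature bit (bit 18) that the 0.51 monitor advertises, which matches the "missing 40000" value in the log.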
#3 Updated by Sage Weil over 11 years ago
- Status changed from Fix Under Review to Resolved