Bug #3212

librados: failed to decode message of type 59 v1: buffer::end_of_buffer

Added by Sage Weil over 11 years ago. Updated over 11 years ago.

Status:
Resolved
Priority:
High
Assignee:
-
Category:
librados
Target version:
-
% Done:
0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

from ML:


Date: Mon, 24 Sep 2012 22:14:39 +0100
From: John Leach <john@brightbox.co.uk>
To: ceph-devel@vger.kernel.org
Subject: librados: failed to decode message of type 59 v1: buffer::end_of_buffer

Hi,

I'm calling rados_ioctx_pool_stat and it's hanging.

logs show:

> 2012-09-24 21:30:08.411947 7f0041251700 failed to decode message of type 59 v1: buffer::end_of_buffer
> 2012-09-24 21:30:08.412286 7f0043255700 monclient: hunting for new mon

My local client is the librados2 package provided by Ubuntu Precise
(0.41-1ubuntu2.1).

My cluster is running the unstable packages provided by Ceph
(0.51-1precise).

If I upgrade my client just up to the stable 0.48.1argonaut-1precise
package, it fixes the problem.

If the protocol changed, then I'd expect librados would let me know. Is
there some way to check this? rados_version returns the version of the
library, but I can't see how to get the version of the cluster (or quite
how I'd compare them in a meaningful way).

Thanks,

John.

Associated revisions

Revision 25ea0696 (diff)
Added by Sage Weil over 11 years ago

osd: make pool_stat_t encoding backward compatible with v0.41 and older

In particular, this is the encoding that is used in precise.

Fixes: #3212
Signed-off-by: Sage Weil <>
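
A minimal sketch of the kind of backward-compatible encoding the commit message describes. This is not the actual Ceph code: the field names, the byte layouts, and the `FEATURE_NEW_ENCODING` bit are all hypothetical. It only illustrates the idea that the sender picks the legacy layout when the peer does not advertise support for the newer one, so that an old decoder is never handed bytes it cannot parse.

```python
import struct

# Hypothetical feature bit, set by peers that understand the newer layout.
FEATURE_NEW_ENCODING = 1 << 18


def encode_stats(num_bytes, num_objects, peer_features):
    """Encode two counters, choosing the layout from the peer's feature bits."""
    body = struct.pack("<QQ", num_bytes, num_objects)
    if peer_features & FEATURE_NEW_ENCODING:
        # Newer (hypothetical) layout: version, compat version, payload length.
        return struct.pack("<BBI", 2, 1, len(body)) + body
    # Legacy (hypothetical) layout: one version byte, then the payload.
    return struct.pack("<B", 1) + body


def decode_legacy(buf):
    """Decode the legacy layout only, as an old client would."""
    version, num_bytes, num_objects = struct.unpack_from("<BQQ", buf)
    if version != 1:
        raise ValueError("unexpected encoding version %d" % version)
    return num_bytes, num_objects


# An old peer advertises no new-encoding bit, so it receives the legacy layout.
old_peer_features = 0
print(decode_legacy(encode_stats(4096, 3, old_peer_features)))
```

In this sketch, handing the newer layout to `decode_legacy` raises an error, which is the class of failure the old client in this ticket ran into.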

History

#1 Updated by Sage Weil over 11 years ago

see wip-3212

We weren't encoding pool_stat_t in the pre-v0.42 format, or at least that's what it looks like; I wasn't able to reproduce this.

#2 Updated by John Leach over 11 years ago

Ok, with packages 0.51-700-g1a9c8c7-1precise from wip-3212 and back to 0.41-1ubuntu2.1 on the client, it now refuses to connect entirely (but works fine with 0.48.1argonaut-1precise on the client). I presume that was the aim:

2012-09-25 10:35:33.119441 7fe5a607d700 -- 10.132.138.182:0/1002987 >> 10.232.29.142:6789/0 pipe(0xde91f0 sd=3 pgs=0 cs=0 l=0).connect protocol feature mismatch, my 1ffa < peer 41ffa missing 40000

I've got questions about detecting this in code, but I'll do that back on the mailing list.
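
For reference, the numbers in the feature mismatch line above are consistent with the "missing" value being the bits the peer has set that the client does not. The sketch below only reproduces that arithmetic from the logged hex values; the helper names are made up for illustration, and it does not say which Ceph feature the bit corresponds to.

```python
def missing_features(mine, peer):
    """Return the feature bits set in `peer` that are absent from `mine`."""
    return peer & ~mine


def bit_positions(mask):
    """List the positions (0 = least significant) of the bits set in `mask`."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


mine = 0x1ffa    # "my 1ffa" from the log line
peer = 0x41ffa   # "peer 41ffa" from the log line

missing = missing_features(mine, peer)
print(hex(missing))            # 0x40000, matching "missing 40000"
print(bit_positions(missing))  # [18]
```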

#3 Updated by Sage Weil over 11 years ago

  • Status changed from Fix Under Review to Resolved
