Project

General

Profile

Actions

Bug #2298

closed

rbd: broken encode_op for big-endian hosts?

Added by Sage Weil about 12 years ago. Updated about 12 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
rbd
Target version:
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Date: Sat, 14 Apr 2012 03:34:20 +0100
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Sage Weil <sage@newdream.net>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: osd_req_encode_op() breakage?

static void osd_req_encode_op(struct ceph_osd_request *req,
                              struct ceph_osd_op *dst,
                              struct ceph_osd_req_op *src)
{
        dst->op = cpu_to_le16(src->op);

        switch (dst->op) {
        case CEPH_OSD_OP_READ:
        case CEPH_OSD_OP_WRITE:

is an interesting thing to say, seeing that CEPH_OSD_OP_READ et.al. are
all host-endian...  Should that be "switch (src->op)" instead?  AFAICS,
that sucker had appeared in that form back in
commit 68b4476b0bc13fef18266b4140309a30e86739d2
Author: Yehuda Sadeh <yehuda@hq.newdream.net>
Date:   Tue Apr 6 15:01:27 2010 -0700

    ceph: messenger and osdc changes for rbd

and it seems to be broken on big-endian hosts.  Doesn't look like a misspelled
le16_to_cpu() either, since dst->op ends up going on the wire...

I'm really mystified by that - it looks like it must've shown up immediately
on big-endian hosts; it's not like it was an obscure codepath, after all...
Comments?
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Actions #1

Updated by Alex Elder about 12 years ago

I haven't looked at this in any detail but I presume Al is correct.
We don't have any big endian hardware anywhere, do we? Might be
worth finding an old PowerPC or MIPS machine for occasional tests
to verify these sorts of things.

Actions #2

Updated by Sage Weil about 12 years ago

there are some old g5's in the closet here at aon that we can use.

in the past we've found/fixed these issues with sparse.. yehuda runs it periodically and it picks up the __le32 vs u32 eerrors.

Actions #3

Updated by Alex Elder about 12 years ago

  • Status changed from New to In Progress
  • Assignee set to Alex Elder

I sent a note to the various lists Al Viro posted to, to confirm the
bug (wasn't sure whether Sage had or not).

I also implemented his suggested fix and posted that to the lists and
will be running it through tests as soon as it's built.

We'll want this one to go to Linus for the current release cycle (3.4).

I also promised to ensure we were going to be doing endianness testing
for our software on a regular basis.

Actions #4

Updated by Alex Elder about 12 years ago

  • Status changed from In Progress to Resolved

This has been fixed. I have been testing it in a private branch
and will shortly be updating the ceph-client testing to have it.

commit d8bc680e64a856ed2489f2df89e9561bc3aa70b2
Author: Alex Elder <>
Date: Fri Apr 20 15:49:43 2012 -0500

ceph: osd_client: fix endianness bug in osd_req_encode_op()
Actions

Also available in: Atom PDF