Project

General

Profile

Actions

Bug #64611

open

Inconsistent usage of the return codes in the MDS code base

Added by Leonid Usov 2 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
task(easy), task(intern)
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Ceph cluster may comprise of daemons running on different platforms with incompatible numeric values of the errno definitions. To maintain coherency, the approach taken by Ceph is that daemons must exchange messages where the error codes on the wire are always encoded using a common error namespace. Historically, this namespace was chosen to be the Linux errno list, and it's often seen in the source code as a set of CEPHFS_xxx error code defines.

When writing a daemon, it's reasonable to always rely on the native platform errno definitions, as interacting with the underlying OS will inevitably respond with native codes. The way Ceph helps developers is by defining a set of conversion routines that map native error codes to the linux error codes, or in other words, the CEPHFS_* namespace, and back. These functions are

ceph_to_hostos_errno
hostos_to_ceph_errno

Furthermore, Ceph provides a custom type errorcode32_t which calls the corresponding conversion functions during encode and decode:

  void encode(ceph::buffer::list &bl) const {
    using ceph::encode;
    __s32 newcode = hostos_to_ceph_errno(code);
    encode(newcode, bl);
  }
  void decode(ceph::buffer::list::const_iterator &bl) {
    using ceph::decode;
    decode(code, bl);
    code = ceph_to_hostos_errno(code);
  }

This convenient type allows for consistent interop between daemons running on different host platforms without an explicit errno conversion required from the daemon's code.

The MCommandReply class already makes use of the errorcode32_t for the result code, meaning that when constructing the response the daemon must provide its native error code and not the one from the CEPHFS_* namespace.

However, MClientReply doesn't do this automatically. The good news is that the message follows the requirement of having the linux error code on the wire, but it's only doing the conversion to the host system in the getter:


  int get_result() const {
    #ifdef _WIN32
    // libclient and libcephfs return CEPHFS_E* errors, which are basically
    // Linux errno codes. If we convert mds errors to host errno values, we
    // end up mixing error codes.
    //
    // For Windows, we'll preserve the original error value, which is expected
    // to be a linux (CEPHFS_E*) error. It may be worth doing the same for
    // other platforms.
    return head.result;
    #else
    return ceph_to_hostos_errno((__s32)(__u32)head.result);
    #endif
  }

  void set_result(int r) { head.result = r; }

The code above is suboptimal and someone unaware of the internals will be surprised to get a different code than what's set. This unfortunate imbalance has spread over the MDS code base and currently, all places that construct an MClientReply have to provide a CEPHFS_* error code.

While it is handled correctly in most cases, this inconsistency makes it impossible to write a generic internal MDS handler that produces a result code usable both in MClientReply and MCommandReply.

We should change MClientReply to use the errorcode32_t just like MCommandReply does, and update all places in the MDS to use native error codes.

No data to display

Actions

Also available in: Atom PDF