Cleanup #2093: ceph-client: messenger: the "to" parameter to read_partial() needs to go - Linux kernel client - Ceph

Actions

Copy link

Cleanup #2093

closed

ceph-client: messenger: the "to" parameter to read_partial() needs to go

Added by Alex Elder about 12 years ago. Updated almost 12 years ago.

Status:

Resolved

Priority:

Normal

Assignee:

Alex Elder

Category:

libceph

Target version:

% Done:

Tags:

Backport:

Reviewed:

Affected Versions:

Description

I have been doing some refactoring of the net/ceph/messenger.c. One of
my aims was to understand the how (and why) the "to" argument to
read_partial() is used. I've become fairly certain the code
is only correct if the value passed in "to" is the same as the
value of con->in_base_pos at the time of the call.

I wanted to verify this, however, so I put in a warning if that
condition did not hold. I ran some tests, and eventually the warning
tripped.

Based on the information I displayed on that event, I know that
the errant call to read_partial() is the one at the beginning of
read_partial_message(), which is reading in a message header.
(Specifically, I know that the number of bytes to be read was 53,
which is sizeof con->in_hdr.) At the time the warning tripped,
the passed-in "to" value was 0, but the value of con->in_base_pos
was 53.

The result of this is that read_partial() reads nothing, but
passes back a 1 (indicating success). As a result, read_partial_message()
performs a CRC on what it thinks is a successfully-read header. For
all I know, that could succeed (based on an old message header still
in the message buffer). Or if the CRC fails, -EBADMSG is returned
(which we ought to be robust enough to handle, but I haven't checked).

If an old header were considered good, I expect things might
proceed with reading in successive bytes as if the header were
legitimate.

But I'm just not sure. And therein lies a part of the problem. I've
been chasing this particular "to" parameter because it just looks
like trouble, and I believe I have confirmed that it is.

This little experiment has convinced me that I should finish my
quest to kill off the "to" parameter, despite the fact it may be
a little bit tricky to do so.

Actions

Copy link

Updated by Sage Weil about 12 years ago

I think it's right as is... all of those read calls are non-blocking. So the first time around in_base_pos is 0 and we'll read however much is in the socket buffer, say 430. The next time through, *to will start out at 0, the first few read_partial calls with add in the size of the structures we expect and add to *to, but see that we already read them (in_base_pos == 430).. until we get to the one we haven't (completely) read, and copy what is in the socket buffer into the correct position.

Does that make sense?

Actions

Copy link