Project

General

Profile

Bug #4324

kernel messenger: leaks in ceph_con_in_msg_alloc()

Added by Alex Elder about 11 years ago. Updated about 11 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Target version:
-
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I posted some code for review and Greg caught what looks
like a memory leak in error handling. I looked a bit into
it but am giving up for now, because I think this function
is a bit of a mess and it's going to take a bit more focus
to get this right.

I'll get to this soon, but in the mean time I wanted to
document it so I can move on.

Here's a message that documents the discussion.
http://www.mail-archive.com/ceph-devel@vger.kernel.org/msg12910.html

History

#1 Updated by Greg Farnum about 11 years ago

FYI I mentioned this to Sage and he said the code has been refactored incorrectly a couple of times so it's definitely become a mess. But my money is that a BUG_ON(con->in_msg) is actually appropriate. :)

#2 Updated by Alex Elder about 11 years ago

  • Status changed from New to In Progress

FYI I mentioned this to Sage and he said the code has been
refactored incorrectly a couple of times so it's definitely
become a mess.

That would likely be my handiwork. It's unfortunate if it
was known to be incorrect and allowed to get committed.

But my money is that a BUG_ON(con->in_msg) is actually appropriate. :)

The caller actually does that in this case. I'm looking at
it a little more broadly at the moment though, to see if I can
straighten things out a bit. The logic with the skip parameter
is a little janky.

#3 Updated by Alex Elder about 11 years ago

  • Status changed from In Progress to Fix Under Review

The following patch has been posted for review.

(This patch is available as the top commit in branch
"review/wip-4324" in the ceph-client git repository.)

    [PATCH] libceph: clean up skipped message logic

    In ceph_con_in_msg_alloc() it is possible for a connection's
    alloc_msg method to indicate an incoming message should be skipped.
    By default, read_partial_message() initializes the skip variable
    to 0 before it gets provided to ceph_con_in_msg_alloc().

    The osd client, mon client, and mds client each supply an alloc_msg
    method.  The mds client always assigns skip to be 0.

    The other two leave the skip value of as-is, or assigns it to zero,
    except:
        - if no (osd or mon) request having the given tid is found, in
          which case skip is set to 1 and NULL is returned; or
        - in the osd client, if the data of the reply message is not
          adequate to hold the message to be read, it assigns skip
          value 1 and returns NULL.
    So the returned message pointer will always be NULL if skip is ever
    non-zero.

    Clean up the logic a bit in ceph_con_in_msg_alloc() to make this
    state of affairs more obvious.  Add a comment explaining how a null
    message pointer can mean either a message that should be skipped or
    a problem allocating a message.

    This resolves:
        http://tracker.ceph.com/issues/4324

    Reported-by: Greg Farnum <greg@inktank.com>
    Signed-off-by: Alex Elder <elder@inktank.com>

#4 Updated by Sage Weil about 11 years ago

I seem to remember having a hand in it as well. :)

#5 Updated by Alex Elder about 11 years ago

  • Status changed from Fix Under Review to Resolved

This has been committed to the ceph-client testing branch:

d11db70 libceph: clean up skipped message logic

Also available in: Atom PDF