Bug #14120: Pipe::do_recv() may loop infinitely - Messengers - Ceph

Actions

Copy link

Bug #14120

closed

Pipe::do_recv() may loop infinitely

Added by Zheng Yan over 8 years ago. Updated about 5 years ago.

Status:

Resolved

Priority:

Normal

Assignee:

Greg Farnum

Category:

Target version:

% Done:

Source:

other

Tags:

Backport:

jewel

Regression:

Severity:

3 - minor

Reviewed:

Affected Versions:

ceph-qa-suite:

Pull request ID:

Crash signature (v1):

Crash signature (v2):

Description

After cutting down network between osd and client, I saw one OSD thread use 100% CPU, The thread loops Pipe::do_recv

#0 Pipe::do_recv (this=0x22d09000, buf=0x23262000 "", len=4096, flags=64) at /home/zhyan/Ceph/ceph/src/msg/simple/Pipe.cc:2508
#1 0x000000000205de08 in Pipe::buffered_recv (this=0x22d09000, buf=0x7f78a4c297fc "\001", len=53, flags=64) at /home/zhyan/Ceph/ceph/src/msg/simple/Pipe.cc:2552
#2 0x000000000205debe in Pipe::tcp_read_nonblocking (this=0x22d09000, buf=0x7f78a4c297f0 "\265", <incomplete sequence \356>, len=53) at /home/zhyan/Ceph/ceph/src/msg/simple/Pipe.cc:2570
#3 0x000000000205d927 in Pipe::tcp_read (this=0x22d09000, buf=0x7f78a4c297f0 "\265", <incomplete sequence \356>, len=53) at /home/zhyan/Ceph/ceph/src/msg/simple/Pipe.cc:2457
#4 0x000000000205970f in Pipe::read_message (this=0x22d09000, pm=0x7f78a4c29c18, auth_handler=0x226aaca0) at /home/zhyan/Ceph/ceph/src/msg/simple/Pipe.cc:1901
#5 0x00000000020564b8 in Pipe::reader (this=0x22d09000) at /home/zhyan/Ceph/ceph/src/msg/simple/Pipe.cc:1581
#6 0x000000000205e788 in Pipe::Reader::entry (this=0x22d09018) at /home/zhyan/Ceph/ceph/src/msg/simple/Pipe.h:50
#7 0x0000000001f66b60 in Thread::entry_wrapper (this=0x22d09018) at /home/zhyan/Ceph/ceph/src/common/Thread.cc:84
#8 0x0000000001f66aae in Thread::_entry_func (arg=0x22d09018) at /home/zhyan/Ceph/ceph/src/common/Thread.cc:66
#9 0x00007f78cfef460a in start_thread () from /lib64/libpthread.so.0
#10 0x00007f78ce521a9d in clone () from /lib64/libc.so.6

Files

perf.out.txt (844 KB) perf.out.txt

Zheng Yan, 05/05/2016 12:35 PM

Related issues 2 (0 open — 2 closed)

Actions

Copy link

Updated by Zheng Yan over 8 years ago

Status changed from New to Fix Under Review

https://github.com/ceph/ceph/pull/6971

Actions

Copy link

Updated by Zheng Yan almost 8 years ago

File perf.out.txt perf.out.txt added

I encountered this again. I ran 'perf record -g -e cycles:u -p xxx', perf report > perf.out.txt

Actions

Copy link

Updated by Sage Weil almost 8 years ago

Status changed from Fix Under Review to Resolved

Actions

Copy link

Updated by Sage Weil over 7 years ago

Status changed from Resolved to Pending Backport
Backport set to jewel

Actions

Copy link

Updated by Abhishek Lekshmanan over 7 years ago

Copied to Backport #18155: Pipe::do_recv() may loop infinitely added

Actions

Copy link

Updated by Greg Farnum over 7 years ago

Status changed from Pending Backport to In Progress

So, I think this patch was a mistake and we need to revert it wherever it's appeared.

From "man 2 recv":

RETURN VALUE
       These calls return the number of bytes received, or -1 if an error occurred.  In the
event of an error, errno is set to indicate the error.  The return value will be 0 when the
peer has performed an orderly shutdown.

ERRORS
       These are some standard errors generated by the socket layer.  Additional errors may
be generated and returned from the underlying protocol modules; see their manual pages.

       EAGAIN or EWOULDBLOCK
              The socket is marked nonblocking and the receive operation would block, or a
receive timeout had been set and the timeout expired before data was received.  POSIX.1-2001
allows either error to be returned for this case, and does not require these constants to
have the same value, so a portable application should check for both  possibilities.

So, the only way we get EAGAIN is if we have no data available when making the ::recv() call, or it would block for another undetermined reason. But we always make sure data is available before doing invoking this function (by calling tcp_read_wait())! So marking a failure on EAGAIN is either blatantly incorrect, or masking some other kind of failure as turned out to be the root cause here (see #18184 for more details).

Actions

Copy link

Updated by Greg Farnum over 7 years ago

According to http://pubs.opengroup.org/onlinepubs/009695399/functions/recv.html,

[EAGAIN] or [EWOULDBLOCK]
The socket's file descriptor is marked O_NONBLOCK and no data is waiting to be received; or
MSG_OOB is set and no out-of-band data is available and either the socket's file descriptor
is marked O_NONBLOCK or the socket does not support blocking to await out-of-band data.

That makes it a little more explicit than the Linux manpage does that we shouldn't get EAGAIN if we have pre-checked for data, but I'm nervous about the edges. Sage suggests printing out a warning if we get EAGAIN here; we could also make it some kind of config that defaults to looping on EAGAIN but lets us disable that if we find errors in the future?

Actions

Copy link

Updated by Greg Farnum over 7 years ago

Revert PR at https://github.com/ceph/ceph/pull/12374 may get more discussion of options.

Actions

Copy link

Updated by Greg Farnum over 7 years ago

Assignee set to Greg Farnum

Actions

Copy link

#10

Updated by Zheng Yan over 7 years ago

Greg Farnum wrote:

So, I think this patch was a mistake and we need to revert it wherever it's appeared.

From "man 2 recv":
[...]

So, the only way we get EAGAIN is if we have no data available when making the ::recv() call, or it would block for another undetermined reason. But we always make sure data is available before doing invoking this function (by calling tcp_read_wait())! So marking a failure on EAGAIN is either blatantly incorrect, or masking some other kind of failure as turned out to be the root cause here (see #18184 for more details).

As I stated in commit message, tcp_read_wait() does not guarantee the socket has data (it return 0 if there are pending data). For example, the loop happens if Pipe::tcp_read() want to read 100 bytes, there are only 50 bytes pending data and the socket has no data.

Actions

Copy link

#11

Updated by Alexey Sheplyakov over 7 years ago

Greg Farnum wrote:

That makes it a little more explicit than the Linux manpage does that we shouldn't get EAGAIN if we have pre-checked for data

It's perfectly fine to get EAGAIN in non-blocking mode (even if the socket is reportedly ready for read):

Under Linux, select() may report a socket file descriptor as "ready for reading",
while nevertheless a subsequent read blocks. This could for example happen when data
has arrived but upon examination has wrong checksum and is discarded. There may be other
circumstances in which a file descriptor is spuriously reported as ready.
Thus it may be safer to use O_NONBLOCK on sockets that should not block

https://linux.die.net/man/2/select

Actions

Copy link

#12