Bug #18184

SimpleMessenger Pipe threads are spinning when idle

Added by Greg Farnum over 2 years ago. Updated 5 months ago.

Status: Resolved
Priority: Immediate
Assignee:
Category: -
Target version: -
Start date: 12/07/2016
Due date:
% Done: 0%
Source:
Tags:
Backport: jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

See the ceph-users thread at https://www.mail-archive.com/ceph-users@lists.ceph.com/msg34275.html
After upgrading, users are seeing CPU usage rise sharply about 15 minutes after their OSDs boot, with sock_recvmsg at the top of the perf output.

We got a hint of this issue in #14120, but didn't realize how critical the bug was, or that it was a new issue rather than a rare, long-standing one that had simply never been tested.

The actual problem is that in https://github.com/ceph/ceph/pull/8416, we changed Pipe::tcp_read_wait() to return -errno instead of "-1" when poll() returns a value <=0. The intention was to return the error on failure, but the actual return value spec for poll() is:

On success, a positive number is returned; this is the number of structures which have nonzero revents fields (in other words, those descriptors with events or errors reported).  A value of 0 indicates that the call timed out and no file descriptors were ready.  On error, -1 is returned, and errno is set appropriately.
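The timeout case in that spec is easy to observe in isolation. The following is a minimal sketch, not Ceph code: the helper names are hypothetical, and it polls an ordinary pipe instead of a messenger socket. An idle descriptor yields 0 from poll(), while a descriptor with pending data yields 1.

```cpp
#include <cassert>
#include <poll.h>
#include <unistd.h>

// Hypothetical helper: polls the read end of a fresh pipe and returns
// poll()'s raw result. If write_first is true, one byte is written before
// polling so the descriptor is readable. Returns -2 on setup failure.
int poll_pipe(bool write_first, int timeout_ms) {
  int fds[2];
  if (pipe(fds) != 0) return -2;
  if (write_first && write(fds[1], "x", 1) != 1) {
    close(fds[0]);
    close(fds[1]);
    return -2;
  }
  struct pollfd pfd;
  pfd.fd = fds[0];
  pfd.events = POLLIN;
  pfd.revents = 0;
  int r = poll(&pfd, 1, timeout_ms);
  close(fds[0]);
  close(fds[1]);
  return r;
}

// Usage example: the two non-error outcomes described in the spec.
void check_poll_results() {
  int idle = poll_pipe(false, 10);   // nothing to read: times out
  int ready = poll_pipe(true, 10);   // one byte pending: one fd ready
  assert(idle == 0);
  assert(ready == 1);
  (void)idle;
  (void)ready;
}
```

Note that the timeout result is not a failure, so errno carries no information about it.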

This means poll() returns 0 on timed-out sockets! Returning -errno in that case means returning -0, i.e. 0 (assuming errno happened to be 0; otherwise we negate whatever stale value an earlier failed call left behind and report an unrelated error).
Then, Pipe::tcp_read() errors out only if tcp_read_wait()'s return value is <0; otherwise it continues into tcp_read_nonblocking(), which loops on recv() for as long as it gets back EAGAIN or EINTR (because it expects that we have already validated that there is data available to read). On an idle socket that loop never blocks and never exits, so the Pipe thread spins.
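The chain of events can be reproduced outside Ceph. The sketch below is an illustration under stated assumptions, not the Pipe code: wait_readable_buggy() and count_retries() are hypothetical stand-ins for tcp_read_wait() and tcp_read_nonblocking(), a Unix socketpair replaces the TCP connection, errno is zeroed before poll() to make the timeout case deterministic, and the retry loop is bounded so the demo terminates (the loop described above has no such bound).

```cpp
#include <cassert>
#include <cerrno>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical stand-in for the wait step: any poll() result <= 0 is
// reported as -errno, which is the behavior described in this ticket.
int wait_readable_buggy(int fd, int timeout_ms) {
  struct pollfd pfd;
  pfd.fd = fd;
  pfd.events = POLLIN;
  pfd.revents = 0;
  errno = 0;  // demo only: pin errno so the timeout case is deterministic
  int r = poll(&pfd, 1, timeout_ms);
  if (r <= 0) return -errno;  // timeout: r == 0 and errno == 0, so this is 0
  return 0;
}

// Hypothetical stand-in for the read step: retries non-blocking recv() on
// EAGAIN/EINTR. Bounded by max_attempts so the demo can finish.
int count_retries(int fd, int max_attempts) {
  char buf[16];
  int retries = 0;
  while (retries < max_attempts) {
    ssize_t got = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);
    if (got < 0 && (errno == EAGAIN || errno == EINTR)) {
      ++retries;  // no data yet: the caller assumed data was guaranteed
      continue;
    }
    break;  // data, EOF, or a real error ends the loop
  }
  return retries;
}

// Usage example: an idle socket passes the "< 0" check and then burns
// every allowed retry without ever blocking.
void check_idle_socket_spins() {
  int sv[2];
  int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
  assert(rc == 0);
  int waited = wait_readable_buggy(sv[0], 10);
  assert(waited == 0);  // timeout is indistinguishable from "ready"
  int retries = count_retries(sv[0], 1000);
  assert(retries == 1000);
  close(sv[0]);
  close(sv[1]);
  (void)rc;
  (void)waited;
  (void)retries;
}
```

Each recv() call returns immediately, so without the bound the loop would consume a full core until data arrives.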

To fix this, we need to translate the "0" response from poll() into an error code.
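One way to express that translation is sketched below. This is not the merged patch: wait_readable_fixed() is a hypothetical name, and the choice of -EAGAIN for the timeout case is an assumption, since the ticket only says that the 0 must become an error code.

```cpp
#include <cassert>
#include <cerrno>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical corrected wait step: handles the three poll() outcomes
// separately instead of folding "0" and "-1" together.
int wait_readable_fixed(int fd, int timeout_ms) {
  struct pollfd pfd;
  pfd.fd = fd;
  pfd.events = POLLIN;
  pfd.revents = 0;
  int r = poll(&pfd, 1, timeout_ms);
  if (r < 0) return -errno;    // real failure: errno is meaningful here
  if (r == 0) return -EAGAIN;  // timeout: errno is not set, so name a code
                               // explicitly (-EAGAIN is an assumed choice)
  return 0;                    // at least one event reported on fd
}

// Usage example: an idle socket now fails the caller's "< 0" check, and a
// socket with pending data still reports success.
void check_fixed_wait() {
  int sv[2];
  int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
  assert(rc == 0);
  int idle = wait_readable_fixed(sv[0], 10);
  assert(idle == -EAGAIN);
  ssize_t sent = send(sv[1], "x", 1, 0);
  assert(sent == 1);
  int ready = wait_readable_fixed(sv[0], 10);
  assert(ready == 0);
  close(sv[0]);
  close(sv[1]);
  (void)rc;
  (void)idle;
  (void)sent;
  (void)ready;
}
```

With a negative return on timeout, a caller that checks for <0 never reaches the retry-on-EAGAIN read loop for an idle socket.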


Related issues

Copied to Ceph - Backport #18185: jewel: SimpleMessenger Pipe threads are spinning when idle Resolved

History

#2 Updated by Loic Dachary over 2 years ago

  • Copied to Backport #18185: jewel: SimpleMessenger Pipe threads are spinning when idle added

#3 Updated by Loic Dachary over 2 years ago

  • Backport set to jewel

#4 Updated by Loic Dachary over 2 years ago

  • Status changed from In Progress to Resolved

#5 Updated by Vikhyat Umrao over 2 years ago

We discussed this yesterday, and it needs to go to kraken.

#6 Updated by Vikhyat Umrao over 2 years ago

  • Backport changed from jewel to jewel,kraken

#7 Updated by Nathan Cutler over 2 years ago

Kraken backports have not started yet. Master is still being merged into kraken.

#8 Updated by Vikhyat Umrao over 2 years ago

Thank you, Nathan, for the information. I have removed 'kraken' from the backport field.

#9 Updated by Vikhyat Umrao over 2 years ago

  • Backport changed from jewel,kraken to jewel

#10 Updated by Greg Farnum 5 months ago

  • Project changed from Ceph to Messengers
  • Category deleted (msgr)
