Bug #18184

closed

SimpleMessenger Pipe threads are spinning when idle

Added by Greg Farnum over 7 years ago. Updated about 5 years ago.

Status: Resolved
Priority: Immediate
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

See the ceph-users thread at https://www.mail-archive.com/ceph-users@lists.ceph.com/msg34275.html
After upgrading, users are seeing their CPU usage climb sharply 15 minutes after OSDs boot, and sock_recvmsg is at the top of the perf output.

We got a hint of this issue in #14120, but didn't realize how critical the bug was or that it was a new issue rather than a rare and untested one.

The actual problem is that in https://github.com/ceph/ceph/pull/8416, we changed Pipe::tcp_read_wait() to return -errno instead of "-1" when calling poll() and getting a return value <=0. The intention was to return the error on failure, but the actual return value spec for poll() is:

On success, a positive number is returned; this is the number of structures which have nonzero revents fields (in other words, those descriptors with events or errors reported).  A value of 0 indicates that the call timed out and no file descriptors were ready.  On error, -1 is returned, and errno is set appropriately.

This means we get 0 on timed-out sockets! Returning -errno in that case means returning -0 (assuming errno was initialized; otherwise we negate a garbage value that makes the outcome positive half the time!).
Then Pipe::tcp_read() checks whether tcp_read_wait()'s return value is <0 to error out; otherwise it continues into tcp_read_nonblocking(), which loops on calling recv() for as long as it gets back EAGAIN or EINTR (because it expects that we have already validated there is data available to read).
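
To make the spin concrete, here is a minimal sketch of a retry-on-EAGAIN read loop like the one described above. It is not the Ceph code: count_idle_retries() and the max_retries cap are hypothetical, and the cap exists only so the demonstration terminates. Without it, the loop would retry for as long as the socket stays idle.

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical stand-in for a read loop that assumes the caller has already
// confirmed the socket is readable. Returns how many times recv() reported
// "nothing to read" before data arrived, a real error occurred, or the
// illustrative cap was reached.
int count_idle_retries(int fd, int max_retries) {
  char buf[64];
  int retries = 0;
  while (retries < max_retries) {
    ssize_t got = ::recv(fd, buf, sizeof(buf), MSG_DONTWAIT);
    if (got >= 0)
      break;  // data was read, or the peer closed the connection
    if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
      break;  // a real error: stop retrying
    ++retries;  // idle socket: retry immediately, burning CPU
  }
  return retries;
}
```

On an idle socket every recv() call fails with EAGAIN, so the loop uses up the whole retry budget. Once a byte is queued it exits on the first call.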

To fix this, we need to translate the "0" response from poll() into an error code.


Related issues: 1 (0 open, 1 closed)

Copied to Ceph - Backport #18185: jewel: SimpleMessenger Pipe threads are spinning when idle (Resolved, Sage Weil)
Actions #2

Updated by Loïc Dachary over 7 years ago

  • Copied to Backport #18185: jewel: SimpleMessenger Pipe threads are spinning when idle added
Actions #3

Updated by Loïc Dachary over 7 years ago

  • Backport set to jewel
Actions #4

Updated by Loïc Dachary over 7 years ago

  • Status changed from In Progress to Resolved
Actions #5

Updated by Vikhyat Umrao over 7 years ago

We discussed this yesterday, and it needs to go to kraken.

Actions #6

Updated by Vikhyat Umrao over 7 years ago

  • Backport changed from jewel to jewel,kraken
Actions #7

Updated by Nathan Cutler over 7 years ago

Kraken backports have not started yet. Master is still being merged into kraken.

Actions #8

Updated by Vikhyat Umrao over 7 years ago

Thank you, Nathan, for the information. I have removed 'kraken' from the backport field.

Actions #9

Updated by Vikhyat Umrao over 7 years ago

  • Backport changed from jewel,kraken to jewel
Actions #10

Updated by Greg Farnum about 5 years ago

  • Project changed from Ceph to Messengers
  • Category deleted (msgr)