Bug #18184
SimpleMessenger Pipe threads are spinning when idle (Closed)
Description
See the ceph-users thread at https://www.mail-archive.com/ceph-users@lists.ceph.com/msg34275.html
After upgrading, users are seeing their CPU usage rise sharply about 15 minutes after OSDs boot, with sock_recvmesg at the top of the perf profile.
We got a hint of this issue in #14120, but didn't realize how critical the bug was or that it was a new issue rather than a rare and untested one.
The actual problem is that in https://github.com/ceph/ceph/pull/8416, we changed Pipe::tcp_read_wait() to return -errno instead of -1 whenever poll() returns a value <= 0. The intention was to return the error on failure, but poll()'s documented return-value specification is:
On success, a positive number is returned; this is the number of structures which have nonzero revents fields (in other words, those descriptors with events or errors reported). A value of 0 indicates that the call timed out and no file descriptors were ready. On error, -1 is returned, and errno is set appropriately.
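The timeout case is easy to observe on an idle socket. The sketch below uses Python's `select.poll` as a stand-in for the C call (an illustration only; Ceph's Pipe code calls poll() directly, and the helper name `poll_idle_socket` is hypothetical). Python reports a timeout as an empty list of ready descriptors rather than 0, and raises `OSError` rather than returning -1.

```python
import select
import socket

def poll_idle_socket(timeout_ms: int = 50) -> list:
    """Poll one end of an idle socket pair for readability and return the ready list."""
    a, b = socket.socketpair()
    try:
        poller = select.poll()
        poller.register(a, select.POLLIN)
        # Nothing is ever written to b, so no descriptor becomes ready and the call times out.
        return poller.poll(timeout_ms)
    finally:
        a.close()
        b.close()

ready = poll_idle_socket()
print(len(ready))  # 0 ready descriptors: the equivalent of poll() returning 0 in C
```

Note that a timeout is not an error from poll()'s point of view, so nothing sets errno.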
This means poll() returns 0 on timed-out sockets, and returning -errno in that case yields -0, i.e. 0 (assuming errno was initialized; otherwise we negate a garbage value that makes the outcome positive half the time!).
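A simplified model of the post-#8416 check makes the problem concrete. This is a sketch, not Ceph's code: it assumes tcp_read_wait() returns 0 when data is ready, ignores any further checks the real function performs, and the name `tcp_read_wait_buggy` is hypothetical.

```python
import errno

def tcp_read_wait_buggy(poll_ret: int, saved_errno: int) -> int:
    """Model of the post-#8416 check: every poll() result <= 0 returns -errno.

    poll_ret    -- what poll() returned: > 0 ready, 0 timed out, -1 error
    saved_errno -- whatever errno holds afterwards (poll() does not set it on timeout)
    """
    if poll_ret <= 0:
        return -saved_errno  # bug: on a timeout this value has nothing to do with poll()
    return 0                 # descriptor ready

# The error path behaves as intended.
assert tcp_read_wait_buggy(-1, errno.EINTR) == -errno.EINTR
# A timeout with errno == 0 yields -0 == 0, which looks exactly like "data ready".
print(tcp_read_wait_buggy(0, 0))  # 0
```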
Then, Pipe::tcp_read() checks whether tcp_read_wait()'s return value is < 0 and errors out if so; otherwise it continues into tcp_read_nonblocking(), which loops calling recv() for as long as it gets back EAGAIN or EINTR (because it expects that we have already verified there is data available to read).
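The resulting busy loop can be reproduced with Python's `socket` module instead of Ceph's tcp_read_nonblocking(). In this sketch, the helper `spin_on_idle_socket` and its `max_attempts` bound are hypothetical; the bound exists only so the example terminates, whereas a loop that retries on EAGAIN with no bound never exits while the socket stays idle.

```python
import socket

def spin_on_idle_socket(max_attempts: int = 1000) -> int:
    """Retry recv() on an idle non-blocking socket, counting calls that fail immediately."""
    a, b = socket.socketpair()
    a.setblocking(False)
    failed = 0
    try:
        for _ in range(max_attempts):
            try:
                a.recv(4096)
                break  # recv() returned data or EOF; neither happens here since b stays idle
            except (BlockingIOError, InterruptedError):
                # EAGAIN/EWOULDBLOCK maps to BlockingIOError, EINTR to InterruptedError.
                failed += 1
    finally:
        a.close()
        b.close()
    return failed

print(spin_on_idle_socket())  # 1000: every attempt returned at once with EAGAIN
```

Each recv() returns without sleeping, so the thread burns CPU in the socket receive path, matching the perf symptom described above.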
To fix this, we need to translate the "0" response from poll() into an error code.
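A sketch of the corrected logic, again as a simplified Python model rather than the actual patch: it treats poll()'s three outcomes separately. The timeout code used here (`EAGAIN`) is an assumption for illustration; this page only says "an error code", and the real fix may use a different one. The names `tcp_read_wait_fixed`, `TIMEOUT_ERROR`, and `tcp_read_should_continue` are hypothetical.

```python
import errno

# Hypothetical choice of error code reported when poll() times out.
TIMEOUT_ERROR = errno.EAGAIN

def tcp_read_wait_fixed(poll_ret: int, saved_errno: int) -> int:
    """Model of tcp_read_wait() that distinguishes poll()'s three outcomes.

    Returns 0 when data is ready, and a negative error code otherwise.
    """
    if poll_ret == 0:
        return -TIMEOUT_ERROR  # timed out: errno was not set, so do not use it
    if poll_ret < 0:
        return -saved_errno    # real failure: poll() set errno
    return 0                   # poll_ret > 0: descriptor ready

def tcp_read_should_continue(wait_ret: int) -> bool:
    """Caller-side check as described for Pipe::tcp_read(): error out when the value is < 0."""
    return wait_ret >= 0

print(tcp_read_should_continue(tcp_read_wait_fixed(0, 0)))  # False: a timeout now aborts the read
```

Because the timeout path no longer depends on errno, an idle socket can never be mistaken for a readable one.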
Updated by Greg Farnum over 7 years ago
Updated by Loïc Dachary over 7 years ago
- Copied to Backport #18185: jewel: SimpleMessenger Pipe threads are spinning when idle added
Updated by Loïc Dachary over 7 years ago
- Status changed from In Progress to Resolved
Updated by Vikhyat Umrao over 7 years ago
We discussed yesterday and this needs to go to kraken.
Updated by Vikhyat Umrao over 7 years ago
- Backport changed from jewel to jewel,kraken
Updated by Nathan Cutler over 7 years ago
Kraken backports have not started yet. Master is still being merged into kraken.
Updated by Vikhyat Umrao over 7 years ago
Thank you, Nathan, for the information. I have removed 'kraken' from the backport field.
Updated by Vikhyat Umrao over 7 years ago
- Backport changed from jewel,kraken to jewel
Updated by Greg Farnum about 5 years ago
- Project changed from Ceph to Messengers
- Category deleted (msgr)