Bug #41195

[msg/simple] in_seq_ack is not reset to zero when the pipe session is reset; as a result, messages may never be released from the sent queue

Added by 相洋 于 3 months ago. Updated 3 months ago.

Status: Need Review
Priority: Normal
Assignee:
Category: SimpleMessenger
Target version:
Start date: 08/12/2019
Due date: 08/31/2019
% Done: 0%
Source: Community (user)
Tags:
Backport: mimic
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

Ceph version: 10.2.10

In our production environment, we have hit a monitor memory leak several times.
We enabled jemalloc's debug mode and collected memory allocation information.
We found the leak in pipe:write. Memory usage grew steadily, day and night, until the monitor process was eventually killed by the OOM killer.

Consider an example with two monitors, A and B, where monitor A is the leader.

Initially, everything works correctly.
At time C, monitor A's pipe to monitor B has in_seq 3000 and in_seq_ack 3000. Then monitor B is restarted.
Because the pipe is lossless and its state is open, monitor A tries to reconnect to monitor B several times.
At time D, monitor B comes back up; monitor A connects to monitor B, and monitor B accepts the request.
Also at time D, monitor B probes monitor A, so monitor B tries to connect to monitor A (this pipe connection is abandoned later).
When monitor B accepts monitor A's request and finds that a pipe to monitor A already exists (state=connecting) with an existing connect_seq of 0, monitor B sends a session reset reply to monitor A.
Monitor A then resets the session, but in was_session_reset only connect_seq and in_seq are reset to zero; in_seq_ack keeps its old value.
As a result, on monitor A the pipe has in_seq 0 but in_seq_ack 3000, while on monitor B out_seq is 0.
So messages with sequence numbers below 3000 stay in the sent queue of the monitor B process.
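
To make the sequence above concrete, here is a minimal, hypothetical C++ model of the ack bookkeeping. It is not Ceph's Pipe code: the Receiver and Sender structs and the maybe_ack, session_reset, and stuck_after_reset functions are illustrative names. The model assumes that the receiver only sends an ack once in_seq has moved past in_seq_ack, and that the sender drops sent-queue entries whose sequence number is at or below the acked value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical, simplified model of the receiving side (monitor A).
struct Receiver {
  uint64_t in_seq = 0;      // highest sequence number received
  uint64_t in_seq_ack = 0;  // highest sequence number already acked

  void receive(uint64_t seq) { in_seq = seq; }

  // Assumption: an ack is only sent when in_seq has moved past in_seq_ack.
  bool maybe_ack(uint64_t* ack) {
    if (in_seq > in_seq_ack) {
      *ack = in_seq;
      in_seq_ack = in_seq;
      return true;
    }
    return false;
  }

  // reset_ack == false models the reported behavior (only in_seq is reset);
  // reset_ack == true models also clearing in_seq_ack, as the title suggests.
  void session_reset(bool reset_ack) {
    in_seq = 0;
    if (reset_ack) in_seq_ack = 0;
  }
};

// Hypothetical, simplified model of the sending side (monitor B).
struct Sender {
  uint64_t out_seq = 0;        // restarted monitor starts from 0
  std::deque<uint64_t> sent;   // messages still waiting for an ack

  uint64_t send() {
    sent.push_back(++out_seq);
    return out_seq;
  }

  // Release every queued message whose sequence number is <= ack.
  void handle_ack(uint64_t ack) {
    while (!sent.empty() && sent.front() <= ack) sent.pop_front();
  }
};

// Returns how many messages are still in the sender's queue after
// n messages are exchanged following a session reset.
std::size_t stuck_after_reset(uint64_t prior_seq, uint64_t n, bool reset_ack) {
  Receiver a;
  // State at "time C": everything up to prior_seq was received and acked.
  a.in_seq = prior_seq;
  a.in_seq_ack = prior_seq;

  a.session_reset(reset_ack);
  Sender b;

  for (uint64_t i = 0; i < n; ++i) {
    a.receive(b.send());
    uint64_t ack = 0;
    if (a.maybe_ack(&ack)) b.handle_ack(ack);
  }
  return b.sent.size();
}
```

In this model, stuck_after_reset(3000, 100, false) leaves all 100 messages in the sender's queue, while stuck_after_reset(3000, 100, true) leaves none. With the stale in_seq_ack, the backlog is only released once the receiver's in_seq climbs past 3000.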

Although the community plans to remove SimpleMessenger in the next release, many clusters are still running with it.
I hope the PR will be merged.

History

#2 Updated by Kefu Chai 3 months ago

  • Status changed from New to Need Review
  • Assignee set to 相洋 于
  • Backport set to mimic
  • Pull request ID set to 29592

Also available in: Atom PDF