Bug #41195
[msg/simple] in_seq_ack is not reset to zero when pipe session is reset, as a result, messages may not be released in sent queue
Description
Ceph version: 10.2.10
In our production environment, we have hit a monitor memory leak several times.
We enabled the jemalloc debug mode and collected the memory allocation information.
We found a memory leak in pipe:write; the memory kept growing day and night until the monitor process was finally killed by OOM.
Let's take an example: there are two monitors, A and B, and monitor A is the leader.
At the beginning, everything works correctly.
At time C, the pipe sending messages from monitor A to B has in_seq 3000 and in_seq_ack 3000. Then monitor B is restarted.
Monitor A tries to reconnect to monitor B several times, because the pipe is lossless and its state is open.
At time D, monitor B is up; monitor A connects to monitor B, and monitor B accepts the request.
Also at time D, monitor B probes monitor A, so monitor B tries to connect to monitor A. (This pipe connection will be abandoned later.)
When monitor B handles the accept request and finds that a pipe already exists (state=connecting) and that the existing connect seq is 0, monitor B sends a session reset reply to monitor A.
Monitor A then resets the session; in the function was_session_reset, only connect_seq and in_seq are reset to zero.
In the end, the pipe in monitor A has in_seq 0 and in_seq_ack 3000, while in monitor B the out_seq is 0.
So messages with seq below 3000 stay in the sent queue of the monitor B process.
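The effect can be illustrated with a small model. This is only a sketch in Python, not Ceph's actual code: the class and function names (Receiver, Sender, simulate) are hypothetical, and it assumes that the receiving side sends an ack only when in_seq is greater than in_seq_ack and that the sender drops messages from its sent queue only when they are acked.

```python
from collections import deque


class Receiver:
    """Receiving side of the pipe (monitor A in the example). Hypothetical model."""

    def __init__(self):
        self.in_seq = 0      # seq of the last message received
        self.in_seq_ack = 0  # seq of the last ack sent back to the peer

    def receive(self, seq):
        """Record a message; return the seq to ack, or None if no ack is sent."""
        self.in_seq = seq
        # Assumed ack condition: ack only when in_seq moved past in_seq_ack.
        if self.in_seq > self.in_seq_ack:
            self.in_seq_ack = self.in_seq
            return self.in_seq_ack
        return None

    def reset_session(self, reset_ack):
        """Model of the session reset; reset_ack=True also clears in_seq_ack."""
        self.in_seq = 0
        if reset_ack:
            self.in_seq_ack = 0


class Sender:
    """Sending side of the pipe (monitor B after its restart). Hypothetical model."""

    def __init__(self):
        self.out_seq = 0
        self.sent = deque()  # messages sent but not yet acked

    def send(self):
        self.out_seq += 1
        self.sent.append(self.out_seq)
        return self.out_seq

    def handle_ack(self, seq):
        # Release every message up to and including the acked seq.
        while self.sent and self.sent[0] <= seq:
            self.sent.popleft()


def simulate(reset_ack, history=3000, new_messages=100):
    """Return how many messages remain in the sender's sent queue."""
    receiver = Receiver()
    # Before the restart: `history` messages were received and acked.
    for seq in range(1, history + 1):
        receiver.receive(seq)
    receiver.reset_session(reset_ack)
    sender = Sender()  # the restarted peer starts again from out_seq 0
    for _ in range(new_messages):
        ack = receiver.receive(sender.send())
        if ack is not None:
            sender.handle_ack(ack)
    return len(sender.sent)


if __name__ == "__main__":
    print("stuck without resetting in_seq_ack:", simulate(reset_ack=False))
    print("stuck when in_seq_ack is reset too:", simulate(reset_ack=True))
```

Under these assumptions, no message is released after the reset until the new seq passes the stale in_seq_ack value, whereas resetting in_seq_ack together with in_seq lets every message be acked and released.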
Although the community will remove the simple messenger in the next release, there are still many clusters running with the simple messenger.
I hope the PR will be merged.
Updated by Kefu Chai over 4 years ago
- Status changed from New to Fix Under Review
- Assignee set to 相洋 于
- Backport set to mimic
- Pull request ID set to 29592
Updated by Yuri Weinstein over 4 years ago
Updated by Nathan Cutler over 4 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Nathan Cutler over 4 years ago
- Copied to Backport #42843: mimic: [msg/simple] in_seq_ack is not reset to zero when pipe session is reset, as a result, messages may not be released in sent queue added
Updated by Nathan Cutler over 3 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".