Bug #41195
Status: Closed
[msg/simple] in_seq_ack is not reset to zero when the pipe session is reset; as a result, messages may never be released from the sent queue
Description
Ceph version: 10.2.10
In our production environment, we have hit a monitor memory leak several times.
We enabled jemalloc's debug mode and collected the memory allocation information.
We found the leak in pipe:write: memory usage grew steadily day and night until the monitor process was eventually killed by the OOM killer.
Take an example with two monitors, A and B, where monitor A is the leader.
Initially, everything works correctly.
At time C, the pipe on monitor A that talks to monitor B has in_seq 3000 and in_seq_ack 3000. Then monitor B is restarted.
Because the pipe is lossless and its state is open, monitor A keeps retrying the connection to monitor B.
At time D, monitor B is back up; monitor A connects to monitor B, and monitor B accepts the request.
Also at time D, monitor B probes monitor A, so monitor B also tries to connect to monitor A (this pipe will be abandoned later).
When monitor B handles the incoming request, it finds that a pipe to monitor A already exists (state=connecting) with an existing connect seq of 0, so monitor B sends a session reset reply to monitor A.
Monitor A then resets the session in was_session_reset(), but only connect_seq and in_seq are reset to zero; in_seq_ack keeps its old value.
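To make the asymmetry concrete, here is a minimal C++ sketch of the counters involved, assuming a simplified model in which a pipe writes an ack only when in_seq has advanced past in_seq_ack. The struct PipeSeqState and the functions was_session_reset_buggy(), was_session_reset_fixed(), on_message_received() and should_send_ack() are hypothetical illustrations, not Ceph's Pipe API, and the "fixed" variant is only an assumption about what also resetting in_seq_ack looks like, not the merged patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of the per-pipe sequence counters in this
// report. Field names follow the report (in_seq_ack); this is NOT Ceph's Pipe.
struct PipeSeqState {
  std::uint64_t connect_seq = 0;
  std::uint64_t in_seq = 0;      // highest seq received from the peer
  std::uint64_t in_seq_ack = 0;  // highest seq already acked to the peer
};

// Behaviour described in the report: only connect_seq and in_seq are reset.
void was_session_reset_buggy(PipeSeqState &s) {
  s.connect_seq = 0;
  s.in_seq = 0;
}

// Assumed shape of a fix: also reset in_seq_ack so that the new session's
// low sequence numbers are acked again.
void was_session_reset_fixed(PipeSeqState &s) {
  s.connect_seq = 0;
  s.in_seq = 0;
  s.in_seq_ack = 0;
}

// Model in-order delivery of message `seq` from the peer.
void on_message_received(PipeSeqState &s, std::uint64_t seq) {
  assert(seq == s.in_seq + 1);  // lossless pipe: messages arrive in order
  s.in_seq = seq;
}

// Assumed ack rule: only ack when something newer than the last ack arrived.
bool should_send_ack(const PipeSeqState &s) {
  return s.in_seq > s.in_seq_ack;
}
```

Starting from in_seq = in_seq_ack = 3000 and receiving seq 1 after the reset, should_send_ack() returns false with the buggy reset and true with the fixed one.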
In the end, monitor A's pipe has in_seq 0 and in_seq_ack 3000, while monitor B's pipe has out_seq 0.
Because monitor A only sends an ack once in_seq exceeds in_seq_ack, messages with seq below 3000 stay in the sent queue of the monitor B process.
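The effect on the peer's sent queue can be simulated in the same hedged way. In this sketch the types Sender and Receiver and the helper unacked_after() are hypothetical; the sender keeps every message until an ack covering its seq arrives, and the receiver acks each message synchronously, which is a simplification of the real writer thread. Following the report's numbers, the sender restarts at out_seq 0 while the receiver keeps a stale in_seq_ack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical sender side (monitor B in the example): each message stays
// queued until the peer acks a seq greater than or equal to its own.
struct Sender {
  std::uint64_t out_seq = 0;
  std::deque<std::uint64_t> sent;  // seqs still waiting for an ack

  std::uint64_t send() {
    sent.push_back(++out_seq);
    return out_seq;
  }

  // Drop every queued message up to and including the acked seq.
  void handle_ack(std::uint64_t seq) {
    while (!sent.empty() && sent.front() <= seq) sent.pop_front();
  }
};

// Hypothetical receiver side (monitor A): acks only when in_seq has moved
// past the last acked value, and delivers the ack to the peer immediately.
struct Receiver {
  std::uint64_t in_seq = 0;
  std::uint64_t in_seq_ack = 0;

  void receive(Sender &peer, std::uint64_t seq) {
    assert(seq == in_seq + 1);  // lossless pipe: in-order delivery
    in_seq = seq;
    if (in_seq > in_seq_ack) {
      in_seq_ack = in_seq;
      peer.handle_ack(in_seq_ack);
    }
  }
};

// Send `n` messages from a freshly reset sender to a receiver whose in_seq
// is 0 but whose in_seq_ack is still `stale_ack`. Returns how many messages
// remain unacked in the sender's queue.
std::size_t unacked_after(std::uint64_t stale_ack, std::uint64_t n) {
  Sender b;
  Receiver a;
  a.in_seq_ack = stale_ack;
  for (std::uint64_t i = 0; i < n; ++i) a.receive(b, b.send());
  return b.sent.size();
}
```

In this model, unacked_after(3000, 3000) leaves all 3000 messages queued, they are only trimmed once a message with seq greater than 3000 is acked, and with in_seq_ack reset to 0 nothing accumulates.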
Although the community plans to drop the simple messenger in the next release, many clusters are still running it.
I hope the PR will be merged.