Bug #41195

closed

[msg/simple] in_seq_ack is not reset to zero when the pipe session is reset; as a result, messages may not be released from the sent queue

Added by 相洋 almost 5 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Category: SimpleMessenger
% Done: 0%
Source: Community (user)
Backport: mimic
Regression: No
Severity: 3 - minor

Description

Ceph version: 10.2.10

In our production environment, we have hit a monitor memory leak several times.
We enabled jemalloc's debug mode and collected memory allocation information.
We found the leak in pipe:write: memory grew day and night until, in the end, the monitor process was killed by the OOM killer.

Let's take an example: there are two monitors, A and B, and monitor A is the leader.

At the beginning, everything works correctly.
At time C, the pipe from monitor A to monitor B has in_seq 3000 and in_seq_ack 3000. Then monitor B is restarted.
Because the pipe is lossless and its state is open, monitor A tries to reconnect to monitor B several times.
At time D, monitor B is up again; monitor A connects to monitor B, and monitor B accepts the request.
Also at time D, monitor B probes monitor A, so monitor B tries to connect to monitor A. (This pipe connection is abandoned later.)
When monitor B accepts A's request and finds that it already has a pipe to A (state=connecting) whose existing connect_seq is 0, monitor B sends a session reset reply to monitor A.
Monitor A then resets the session. In the function was_session_reset, only connect_seq and in_seq are reset to zero; in_seq_ack is left unchanged.
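In the end, on monitor A the pipe has in_seq 0 and in_seq_ack 3000, while on monitor B the out_seq is 0.
Since monitor A only sends an ack when its in_seq exceeds in_seq_ack, it acknowledges none of the messages B sends with seq up to 3000, so those messages stay in the sent queue of the monitor B process.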

Although the community will remove SimpleMessenger in the next release, many clusters are still running it.
I hope the PR will be merged.


Related issues: 1 (0 open, 1 closed)

Copied to Messengers - Backport #42843: mimic: [msg/simple] in_seq_ack is not reset to zero when pipe session is reset, as a result, messages may not be released in sent queue (Rejected)