Bug #41195

[msg/simple] in_seq_ack is not reset to zero when pipe session is reset; as a result, messages may not be released from the sent queue

Added by 相洋 于 over 4 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: SimpleMessenger
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport: mimic
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Ceph version: 10.2.10

In our production environment, we have hit a monitor memory leak several times.
We enabled the jemalloc debug mode and collected the memory allocation information.
We found that the leak is in pipe:write: memory usage kept growing day and night until the monitor process was finally killed by the OOM killer.

Let's take an example with two monitors, A and B, where monitor A is the leader.

At the beginning, everything works correctly.
At time C, the pipe in monitor A that talks to monitor B has in_seq 3000 and in_seq_ack 3000. Then monitor B is restarted.
Monitor A tries to connect to monitor B several times, because the pipe is lossless and its state is open.
At time D, monitor B is up again; monitor A connects to monitor B, and monitor B accepts the request.
Also at time D, monitor B probes monitor A, so monitor B tries to connect to monitor A (this pipe connection will be abandoned later).
When monitor B accepts the request and finds that a pipe already exists (state=connecting) and that the existing connect seq is 0, monitor B sends a session reset reply to monitor A.
Monitor A then resets the session in the function was_session_reset, where only connect_seq and in_seq are reset to zero.
In the end, the pipe in monitor A has in_seq 0 and in_seq_ack 3000, while in monitor B out_seq is 0.
So messages with a seq below 3000 stay in the sent queue of the monitor B process.
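
The sequence above can be illustrated with a small model. This is only a sketch in Python, not the SimpleMessenger C++ code: the class names, the simulate helper, and the reset_ack_on_session_reset switch are hypothetical. It also assumes that the receiver sends an ack only when in_seq is greater than in_seq_ack, and that the sender drops a message from its sent queue only once an ack covering its seq arrives.

```python
from collections import deque


class Receiver:
    """Toy model of the receiving side of the pipe (monitor A in the report)."""

    def __init__(self, reset_ack_on_session_reset):
        self.in_seq = 0
        self.in_seq_ack = 0
        # Hypothetical switch: True models a session reset that also clears in_seq_ack.
        self.reset_ack_on_session_reset = reset_ack_on_session_reset

    def receive(self, seq):
        """Record an incoming message and return the ack to send, or None."""
        self.in_seq = seq
        # Assumed ack rule: only ack when in_seq has moved past in_seq_ack.
        if self.in_seq > self.in_seq_ack:
            self.in_seq_ack = self.in_seq
            return self.in_seq_ack
        return None

    def session_reset(self):
        """Reset per-session counters, as described for was_session_reset."""
        self.in_seq = 0
        if self.reset_ack_on_session_reset:
            self.in_seq_ack = 0


class Sender:
    """Toy model of the sending side of the pipe (monitor B in the report)."""

    def __init__(self):
        self.out_seq = 0
        self.sent = deque()  # messages waiting for an ack

    def send(self):
        self.out_seq += 1
        self.sent.append(self.out_seq)
        return self.out_seq

    def handle_ack(self, ack):
        # Release every message whose seq is covered by the ack.
        while self.sent and self.sent[0] <= ack:
            self.sent.popleft()


def exchange(sender, receiver, count):
    """Send `count` messages and deliver any resulting acks back to the sender."""
    for _ in range(count):
        ack = receiver.receive(sender.send())
        if ack is not None:
            sender.handle_ack(ack)


def simulate(reset_ack_on_session_reset, before=3000, after=100):
    """Return how many messages remain in the restarted sender's sent queue."""
    receiver = Receiver(reset_ack_on_session_reset)
    exchange(Sender(), receiver, before)   # normal operation up to time C
    restarted = Sender()                   # monitor B restarts, out_seq starts at 0
    receiver.session_reset()               # monitor A handles the session reset
    exchange(restarted, receiver, after)   # traffic after time D
    return len(restarted.sent)
```

Under these assumptions, simulate(False) leaves all 100 post-restart messages in the sent queue, because none of their seqs exceeds the stale in_seq_ack of 3000, while simulate(True) leaves none.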

Although the community will drop the simple messenger in the next release, many clusters are still running with it.
I hope the PR will be merged.


Related issues

Copied to Messengers - Backport #42843: mimic: [msg/simple] in_seq_ack is not reset to zero when pipe session is reset; as a result, messages may not be released from the sent queue (Rejected)

History

#2 Updated by Kefu Chai over 4 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to 相洋 于
  • Backport set to mimic
  • Pull request ID set to 29592

#3 Updated by Yuri Weinstein over 4 years ago

相洋 于 wrote:

PR:
https://github.com/ceph/ceph/pull/29592

merged

#4 Updated by Nathan Cutler over 4 years ago

  • Status changed from Fix Under Review to Pending Backport

#5 Updated by Nathan Cutler over 4 years ago

  • Copied to Backport #42843: mimic: [msg/simple] in_seq_ack is not reset to zero when pipe session is reset; as a result, messages may not be released from the sent queue added

#6 Updated by Nathan Cutler about 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
