
Bug #41157

mgr: memory leak causing allocation failures

Added by Patrick Donnelly 3 months ago. Updated 3 months ago.

Status: Resolved
Priority: Immediate
Assignee:
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: fs
Pull request ID:
Crash signature:

Description

Smoking gun job: /ceph/teuthology-archive/pdonnell-2019-08-08_18:11:18-fs-wip-pdonnell-testing-20190807.132723-distro-basic-smithi/4199128

root       10635  0.0  0.0 243252  4636 ?        Ss   18:46   0:00 sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper term ceph-mgr -f --cluster ceph -i x
root       10663  0.0  0.0 151632  6184 ?        S    18:46   0:00 /usr/bin/python /bin/daemon-helper term ceph-mgr -f --cluster ceph -i x
root       10665  142 33.9 13638124 11097244 ?   Ssl  18:46  93:16 ceph-mgr -f --cluster ceph -i x

The mgr is using ~150% CPU and 10.7 GB of RAM (always increasing). Eventually the job fails because system RAM is exhausted, as in:

/ceph/teuthology-archive/pdonnell-2019-08-07_15:57:31-fs-wip-pdonnell-testing-20190807.132723-distro-basic-smithi/4193689/teuthology.log
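One way to confirm that the RSS is steadily growing (a leak) rather than a one-time spike is to sample it from /proc; a minimal sketch, assuming Linux, which would be pointed at the ceph-mgr PID from the ps listing above:

```python
import os

def rss_kib(pid: int) -> int:
    """Return the resident set size of `pid` in KiB, read from /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # value is reported in kB
    raise ValueError(f"no VmRSS line for pid {pid}")

# Demonstration on our own process; for the bug above you would pass the
# ceph-mgr PID (10665 in the ps listing) and sample it periodically --
# a leak shows up as a monotonically growing series.
print(rss_kib(os.getpid()))
```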

The ceph-mgr log is spewing these messages non-stop, which is probably related to the cause:

2019-08-08T19:53:14.436+0000 7fce857fa700 20 mgr[telemetry] Not sending report until user re-opts-in
2019-08-08T19:53:14.436+0000 7fce857fa700 10 module telemetry health checks:

2019-08-08T19:53:14.437+0000 7fce857fa700 20 mgr[telemetry] Not sending report until user re-opts-in
2019-08-08T19:53:14.437+0000 7fce857fa700 10 module telemetry health checks:

2019-08-08T19:53:14.437+0000 7fce857fa700 20 mgr[telemetry] Not sending report until user re-opts-in
2019-08-08T19:53:14.437+0000 7fce857fa700 10 module telemetry health checks:

2019-08-08T19:53:14.437+0000 7fce857fa700 20 mgr[telemetry] Not sending report until user re-opts-in
2019-08-08T19:53:14.437+0000 7fce857fa700 10 module telemetry health checks:
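The spew rate itself can be quantified from the log timestamps. A sketch, standalone for illustration (it uses sample lines copied from the excerpt above instead of reading the real ceph-mgr log file):

```python
from collections import Counter

# Sample lines taken from the log excerpt above; in practice this would be
# read from the actual ceph-mgr log file.
log = """\
2019-08-08T19:53:14.436+0000 7fce857fa700 20 mgr[telemetry] Not sending report until user re-opts-in
2019-08-08T19:53:14.437+0000 7fce857fa700 20 mgr[telemetry] Not sending report until user re-opts-in
2019-08-08T19:53:14.437+0000 7fce857fa700 20 mgr[telemetry] Not sending report until user re-opts-in
"""

per_second = Counter(
    line.split()[0][:19]              # timestamp truncated to whole seconds
    for line in log.splitlines()
    if "Not sending report" in line
)
print(per_second.most_common(1))      # busiest second and its message count
```

On a real multi-gigabyte log this gives a messages-per-second figure, which helps tell a busy-looping module apart from ordinary periodic logging.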

Related issues

Related to RADOS - Bug #41145: osd: bad alloc exception (Duplicate)

History

#1 Updated by Patrick Donnelly 3 months ago

  • Subject changed from osd: abort in to mon|mgr: memory leak causing allocation failures

Core dumps are just symptoms.

#2 Updated by Patrick Donnelly 3 months ago

  • Subject changed from mon|mgr: memory leak causing allocation failures to mgr: memory leak causing allocation failures
  • Description updated (diff)
  • Priority changed from Urgent to Immediate

#3 Updated by Patrick Donnelly 3 months ago

  • Project changed from RADOS to mgr

#4 Updated by Patrick Donnelly 3 months ago

  • Status changed from New to Resolved
  • Assignee set to Kefu Chai
  • Pull request ID set to 29546

#5 Updated by Josh Durgin 2 months ago

  • Related to Bug #41145: osd: bad alloc exception added
