
Bug #41157

mgr: memory leak causing allocation failures

Added by Patrick Donnelly 3 months ago. Updated 3 months ago.

Status: Resolved
Priority: Immediate
Assignee:
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: fs
Pull request ID:
Crash signature:

Description

Smoking gun job: /ceph/teuthology-archive/pdonnell-2019-08-08_18:11:18-fs-wip-pdonnell-testing-20190807.132723-distro-basic-smithi/4199128

root       10635  0.0  0.0 243252  4636 ?        Ss   18:46   0:00 sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper term ceph-mgr -f --cluster ceph -i x
root       10663  0.0  0.0 151632  6184 ?        S    18:46   0:00 /usr/bin/python /bin/daemon-helper term ceph-mgr -f --cluster ceph -i x
root       10665  142 33.9 13638124 11097244 ?   Ssl  18:46  93:16 ceph-mgr -f --cluster ceph -i x

The mgr is using ~150% CPU and 10.7 GB of RAM (always increasing). Eventually the job fails because system RAM is exhausted, as in:

/ceph/teuthology-archive/pdonnell-2019-08-07_15:57:31-fs-wip-pdonnell-testing-20190807.132723-distro-basic-smithi/4193689/teuthology.log
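One way to confirm that the RSS is steadily growing (a leak) rather than a one-time spike is to sample it from /proc; a minimal sketch, assuming Linux, which would be pointed at the ceph-mgr PID from the ps listing above:

```python
import os

def rss_kib(pid: int) -> int:
    """Return the resident set size of `pid` in KiB, read from /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # value is reported in kB
    raise ValueError(f"no VmRSS line for pid {pid}")

# Demonstration on our own process; for the bug above you would pass the
# ceph-mgr PID (10665 in the ps listing) and sample it periodically --
# a leak shows up as a monotonically growing series.
print(rss_kib(os.getpid()))
```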

The ceph-mgr log is spewing these messages non-stop, which is probably related to the cause:

2019-08-08T19:53:14.436+0000 7fce857fa700 20 mgr[telemetry] Not sending report until user re-opts-in
2019-08-08T19:53:14.436+0000 7fce857fa700 10 module telemetry health checks:

2019-08-08T19:53:14.437+0000 7fce857fa700 20 mgr[telemetry] Not sending report until user re-opts-in
2019-08-08T19:53:14.437+0000 7fce857fa700 10 module telemetry health checks:

2019-08-08T19:53:14.437+0000 7fce857fa700 20 mgr[telemetry] Not sending report until user re-opts-in
2019-08-08T19:53:14.437+0000 7fce857fa700 10 module telemetry health checks:

2019-08-08T19:53:14.437+0000 7fce857fa700 20 mgr[telemetry] Not sending report until user re-opts-in
2019-08-08T19:53:14.437+0000 7fce857fa700 10 module telemetry health checks:
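The spew rate itself can be quantified from the log timestamps. A sketch, standalone for illustration (it uses sample lines copied from the excerpt above instead of reading the real ceph-mgr log file):

```python
from collections import Counter

# Sample lines taken from the log excerpt above; in practice this would be
# read from the actual ceph-mgr log file.
log = """\
2019-08-08T19:53:14.436+0000 7fce857fa700 20 mgr[telemetry] Not sending report until user re-opts-in
2019-08-08T19:53:14.437+0000 7fce857fa700 20 mgr[telemetry] Not sending report until user re-opts-in
2019-08-08T19:53:14.437+0000 7fce857fa700 20 mgr[telemetry] Not sending report until user re-opts-in
"""

per_second = Counter(
    line.split()[0][:19]              # timestamp truncated to whole seconds
    for line in log.splitlines()
    if "Not sending report" in line
)
print(per_second.most_common(1))      # busiest second and its message count
```

On a real multi-gigabyte log this gives a messages-per-second figure, which helps tell a busy-looping module apart from ordinary periodic logging.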

Related issues

Related to RADOS - Bug #41145: osd: bad alloc exception (Duplicate)

History

#1 Updated by Patrick Donnelly 3 months ago

  • Subject changed from osd: abort in to mon|mgr: memory leak causing allocation failures

Core dumps are just symptoms.

#2 Updated by Patrick Donnelly 3 months ago

  • Subject changed from mon|mgr: memory leak causing allocation failures to mgr: memory leak causing allocation failures
  • Description updated (diff)
  • Priority changed from Urgent to Immediate

#3 Updated by Patrick Donnelly 3 months ago

  • Project changed from RADOS to mgr

#4 Updated by Patrick Donnelly 3 months ago

  • Status changed from New to Resolved
  • Assignee set to Kefu Chai
  • Pull request ID set to 29546

#5 Updated by Josh Durgin 2 months ago

  • Related to Bug #41145: osd: bad alloc exception added
