Bug #41157
Status: Closed
mgr: memory leak causing allocation failures
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Description
Smoking gun job: /ceph/teuthology-archive/pdonnell-2019-08-08_18:11:18-fs-wip-pdonnell-testing-20190807.132723-distro-basic-smithi/4199128
root 10635 0.0 0.0 243252 4636 ? Ss 18:46 0:00 sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper term ceph-mgr -f --cluster ceph -i x
root 10663 0.0 0.0 151632 6184 ? S 18:46 0:00 /usr/bin/python /bin/daemon-helper term ceph-mgr -f --cluster ceph -i x
root 10665 142 33.9 13638124 11097244 ? Ssl 18:46 93:16 ceph-mgr -f --cluster ceph -i x
The ceph-mgr process is using ~150% CPU and 10.7 GB of RAM, and its memory use is always increasing. Eventually the job fails once system RAM is exhausted, as in:
/ceph/teuthology-archive/pdonnell-2019-08-07_15:57:31-fs-wip-pdonnell-testing-20190807.132723-distro-basic-smithi/4193689/teuthology.log
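The RAM figure can be cross-checked against the RSS column of the ps output above, which is reported in KiB:

```python
# Convert the ceph-mgr RSS from the ps output (KiB) to GiB.
# 11097244 is the RSS column for PID 10665 shown above.
rss_kib = 11097244
rss_gib = rss_kib / (1024 * 1024)
print(f"{rss_gib:.1f} GiB")  # → 10.6 GiB
```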
The ceph-mgr log is spewing these two messages non-stop, which is probably related to the cause:
2019-08-08T19:53:14.436+0000 7fce857fa700 20 mgr[telemetry] Not sending report until user re-opts-in
2019-08-08T19:53:14.436+0000 7fce857fa700 10 module telemetry health checks:
2019-08-08T19:53:14.437+0000 7fce857fa700 20 mgr[telemetry] Not sending report until user re-opts-in
2019-08-08T19:53:14.437+0000 7fce857fa700 10 module telemetry health checks:
2019-08-08T19:53:14.437+0000 7fce857fa700 20 mgr[telemetry] Not sending report until user re-opts-in
2019-08-08T19:53:14.437+0000 7fce857fa700 10 module telemetry health checks:
2019-08-08T19:53:14.437+0000 7fce857fa700 20 mgr[telemetry] Not sending report until user re-opts-in
2019-08-08T19:53:14.437+0000 7fce857fa700 10 module telemetry health checks:
2019-08-08T19:53:14.437+0000 7fce857fa700 20 mgr[telemetry] Not sending report until user re-opts-in
2019-08-08T19:53:14.437+0000 7fce857fa700 10 module telemetry health checks:
2019-08-08T19:53:14.437+0000 7fce857fa700 20 mgr[telemetry] Not sending report until user re-opts-in
2019-08-08T19:53:14.437+0000 7fce857fa700 10 module telemetry health checks:
2019-08-08T19:53:14.437+0000 7fce857fa700 20 mgr[telemetry] Not sending report until user re-opts-in
2019-08-08T19:53:14.437+0000 7fce857fa700 10 module telemetry health checks:
2019-08-08T19:53:14.437+0000 7fce857fa700 20 mgr[telemetry] Not sending report until user re-opts-in
2019-08-08T19:53:14.437+0000 7fce857fa700 10 module telemetry health checks:
2019-08-08T19:53:14.437+0000 7fce857fa700 20 mgr[telemetry] Not sending report until user re-opts-in
2019-08-08T19:53:14.437+0000 7fce857fa700 10 module telemetry health checks:
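For illustration only, here is a minimal sketch (hypothetical names, not the actual telemetry module code, and not necessarily the bug fixed by the linked PR) of the kind of per-iteration accumulation that would match these symptoms: a serve loop that appends state on every pass and never releases it, so memory grows without bound while the log repeats the same messages:

```python
class HypotheticalModule:
    """Illustrative only; not the real mgr telemetry module."""

    def __init__(self):
        # Never trimmed, so it grows without bound -> eventual allocation failure.
        self.health_history = []

    def tick(self):
        # Each pass logs the same messages and appends; nothing is released.
        self.health_history.append({"checks": {}})

    def serve(self, ticks):
        # With no sleep/backoff between ticks, this also burns CPU
        # and floods the log, matching the spew above.
        for _ in range(ticks):
            self.tick()
```

Running many ticks leaves `health_history` with one entry per tick, i.e. memory use proportional to uptime.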
Updated by Patrick Donnelly almost 5 years ago
- Subject changed from osd: abort in to mon|mgr: memory leak causing allocation failures
Core dumps are just symptoms.
Updated by Patrick Donnelly almost 5 years ago
- Subject changed from mon|mgr: memory leak causing allocation failures to mgr: memory leak causing allocation failures
- Description updated (diff)
- Priority changed from Urgent to Immediate
Updated by Patrick Donnelly almost 5 years ago
- Project changed from RADOS to mgr
Updated by Patrick Donnelly almost 5 years ago
- Status changed from New to Resolved
- Assignee set to Kefu Chai
- Pull request ID set to 29546
Updated by Josh Durgin almost 5 years ago
- Related to Bug #41145: osd: bad alloc exception added