Bug #51637

mgr/insights: mgr consumes excessive amounts of memory

Added by Thore K over 2 years ago. Updated over 2 years ago.

Status: Duplicate
Priority: Normal
Assignee:
Category: insights module
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Description of problem

After replacing a failed OSD, and during the ongoing replication of the degraded PGs (roughly 4 TB, ongoing for about 2 days), there have been a couple of OOM kills, and the mgr continues to consume lots of memory (15 GB, increasing rather quickly).

Furthermore, the log file is growing at a similar pace, since it keeps logging all the placement groups along with an error that it can't dump the PG state into mgr/insights/health_history/2021-07-11_19:

mgr set_store mon returned -27: error: entry size limited to 65536 bytes. Use 'mon config key max entry size' to manually adjust

The thing it tries to dump is a 22k-line JSON document (when formatted), which contains a history of all placement groups since some point in time (I couldn't pin that down, but the same message is included multiple times; see the attached JSON).
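
To illustrate why the store fails, here is a minimal sketch that compares the serialized size of a JSON payload against the 65536-byte limit quoted in the error. The helper names are hypothetical and the compact serialization is an assumption; how the insights module actually encodes its health history is not shown in this report.

```python
import json

# Limit quoted in the mon error message above, in bytes.
MAX_ENTRY_SIZE = 65536


def encoded_size(payload) -> int:
    """Size in bytes of the payload serialized as compact UTF-8 JSON.

    Assumption: compact separators; the real module may serialize differently.
    """
    return len(json.dumps(payload, separators=(",", ":")).encode("utf-8"))


def fits_in_entry(payload, limit: int = MAX_ENTRY_SIZE) -> bool:
    """True if the serialized payload would fit in a single entry."""
    return encoded_size(payload) <= limit


# Hypothetical stand-in for a health history that repeats one message per PG;
# the real structure is in the attached ceph_mgrdump.json.gz.
sample = {"checks": [{"detail": "pg 1.%x is degraded" % i} for i in range(5000)]}
print(encoded_size(sample), fits_in_entry(sample))
```

Running the same check against the decompressed attachment would show how far the real payload exceeds the limit.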

Environment

  • ceph version string: ceph version 14.2.22 (ca74598065096e6fcbd8433c8779a2be0c889351) nautilus (stable)
  • Platform (OS/distro/release): CentOS Linux release 8.4.2105
  • Cluster details (nodes, monitors, OSDs): 6 nodes, 5 mons, 42 osds

How reproducible

Seems to be ongoing. Restarting the mgr does not fix it (see the bumps in the graph; these were me restarting the mgr).

ceph-mgr-memgrowth.png - mgr memory growth (30.2 KB) Thore K, 07/12/2021 04:47 PM

ceph_mgrdump.json.gz - JSON the mgr desperately tries to store in the mon (258 KB) Thore K, 07/12/2021 04:51 PM


Related issues

Duplicates mgr - Bug #48269: insights module can generate too much data, fail to put in config-key Resolved

History

#1 Updated by Ernesto Puerta over 2 years ago

  • Subject changed from mgr/dashboard: mgr consumes excessive amounts of memory to mgr/insights: mgr consumes excessive amounts of memory
  • Category changed from ceph-mgr to insights module

#2 Updated by Thore K over 2 years ago

I've been able to reproduce this through the following operations:

systemctl stop ceph-osd@10
ceph osd out 10
ceph osd destroy 10 --yes-i-really-mean-it
ceph-volume lvm zap /dev/sda --destroy
ceph-volume lvm create --osd-id 10 --bluestore --crush-device-class=hdd --dmcrypt --data /dev/sda --block.db /dev/sdh4

And again, while the backfilling takes place the mgr memory consumption grows rapidly.
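
To put numbers on the growth while backfilling runs, the resident memory of the mgr process could be sampled periodically. This is a minimal sketch, assuming a Linux host where `/proc/<pid>/status` is readable and the PID of ceph-mgr is already known; the function names, sample count, and interval are hypothetical and not part of any Ceph tooling.

```python
import re
import time


def parse_vmrss_kb(status_text: str) -> int:
    """Return the VmRSS value in kB from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("VmRSS line not found")
    return int(match.group(1))


def sample_rss_kb(pid: int) -> int:
    """Read the current resident set size of a process in kB."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kb(f.read())


def watch(pid: int, samples: int = 10, interval_s: float = 60.0) -> None:
    """Print a timestamped RSS reading every interval_s seconds."""
    for _ in range(samples):
        print(time.strftime("%H:%M:%S"), sample_rss_kb(pid), "kB")
        time.sleep(interval_s)
```

Calling `watch(pid)` with the mgr PID during a backfill would yield a series comparable to the attached memory graph.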

#3 Updated by Brad Hubbard over 2 years ago

  • Assignee set to Brad Hubbard

#4 Updated by Brad Hubbard over 2 years ago

  • Duplicates Bug #48269: insights module can generate too much data, fail to put in config-key added

#5 Updated by Konstantin Shalygin over 2 years ago

  • Status changed from New to Fix Under Review
  • Affected Versions v14.2.22 added
  • Affected Versions deleted (v14.2.23)

#6 Updated by Konstantin Shalygin over 2 years ago

  • Status changed from Fix Under Review to Duplicate
