Bug #53485


monstore: logm entries are not garbage collected

Added by Daniel Poelzleithner over 2 years ago. Updated almost 2 years ago.

Status:
Fix Under Review
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We had to run a Ceph cluster with a damaged CephFS for a while; that filesystem has since been deleted. We suspect this was the culprit for the heavy growth we experienced in the
ceph-mon database. Our small 6-server cluster already has a 45 GB monstore.

ceph-monstore-tool /var/lib/ceph/mon/ceph-server6/ dump-keys | awk '{print $1}' | uniq -c                                                                        
    147 auth
      2 config
     11 health
1546347 logm
      2 mkfs
      3 mon_sync
      7 monitor
      1 monitor_store
   2459 paxos
      ...                                                                                                    

There seems to be no limit on the number of logm entries stored, nor does there appear to be a garbage collection mechanism. At times the monstore was growing so fast that we feared
the server would run out of storage from logm spam alone.
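
As a side note, the same prefix tally can be produced without awk. The following is a minimal, standalone C++ sketch (illustrative only, not part of Ceph; the program name count_prefixes is made up) that reads the dump-keys output on stdin and counts consecutive runs of each key prefix, like uniq -c does:

  // count_prefixes.cpp -- tally key prefixes from `ceph-monstore-tool ... dump-keys`.
  // Standalone illustration only; pipe the dump-keys output into this program.
  #include <iostream>
  #include <sstream>
  #include <string>

  int main() {
      std::string line, prefix, current;
      unsigned long count = 0;
      while (std::getline(std::cin, line)) {
          std::istringstream iss(line);
          if (!(iss >> prefix))
              continue;                  // skip blank lines
          if (prefix != current) {
              if (count)                 // flush the previous run, uniq -c style
                  std::cout << count << " " << current << "\n";
              current = prefix;
              count = 0;
          }
          ++count;
      }
      if (count)
          std::cout << count << " " << current << "\n";
      return 0;
  }

Usage (hypothetical binary name): ceph-monstore-tool /var/lib/ceph/mon/ceph-server6/ dump-keys | ./count_prefixes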

Actions #1

Updated by Neha Ojha over 2 years ago

  • Project changed from Ceph to RADOS
  • Category deleted (Monitor)
Actions #2

Updated by Loïc Dachary over 2 years ago

  • Target version deleted (v15.2.15)
Actions #3

Updated by Daniel Poelzleithner over 2 years ago

We just grew to a whopping 80 GB of monitor metadata on the server. I'm out of ideas here and don't know how to stop the growth.
Somebody added some logging but no cleanup. A resync of a mon now takes hours, and I fear the worst.

This is our config:

[global]
         auth_client_required = cephx
         auth_cluster_required = cephx
         auth_service_required = cephx
         fsid = d7c5c9c7-a227-4e33-ab43-3f4aa1eb0630
         mon_allow_pool_delete = true
         mon_host = 172.20.77.6 172.20.77.8 172.20.77.9 172.20.77.5 172.20.77.3
         osd_journal_size = 5120
         osd_pool_default_min_size = 2
         osd_pool_default_pg_autoscale_mode = on
         osd_pool_default_size = 3

[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
         beacon_grace = 800
         # debug_default = 10/10
         keyring = /var/lib/ceph/mds/ceph-$id/keyring
         mds_beacon_grace = 800

[mon]
         # debug_default = 10/10
         keyring = /var/lib/ceph/mon/ceph-$id/keyring
         ms_bind_msgr1 = true
         ms_bind_msgr2 = true
         # mon_compact_on_start = true

[osd]
         osd_max_backfills = 16
         osd_recovery_max_active = 4

[mds.server5]
         host = server5
         mds_standby_for_name = pve

[mds.server6]
         host = server6
         mds_standby_for_name = pve

[mds.server8]
         host = server8
         mds_standby_for_name = pve

[mon.server3]
         public_addr = 172.20.77.3

[mon.server6]
         public_addr = 172.20.77.6

[mon.server8]
         public_addr = 172.20.77.8

Actions #4

Updated by Neha Ojha over 2 years ago

  • Assignee set to Prashant D
Actions #5

Updated by Daniel Poelzleithner over 2 years ago

I changed the paxos debug level to 20 and found this in the mon store log:

2021-12-16T18:35:07.814+0100 7fec66e79700 20 mon.server6@0(leader).paxosservice(logm 56666064..29067286) maybe_trim 56666064~29067236
2021-12-16T18:35:07.814+0100 7fec66e79700 10 mon.server6@0(leader).paxosservice(logm 56666064..29067286) maybe_trim trim_to 29067236 < first_committed 56666064

56666064 is the get_first_committed version of the paxos service.
29067236 is calculated in LogMonitor.cc as:

  unsigned max = g_conf()->mon_max_log_epochs;
  version_t version = get_last_committed();
  if (version > max)
    return version - max;

This means that for some reason first_committed is now a lot larger than the last committed version on disk, which is why the old versions are never deleted.

This internal state corruption causes the monstore to grow indefinitely.
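
The sketch below is a simplified, self-contained C++ model of that check (assumed shape, not the actual PaxosService/LogMonitor code). It only demonstrates the point above: once first_committed is ahead of the computed trim_to, maybe_trim bails out without trimming anything, so logm versions accumulate forever.

  // trim_sketch.cpp -- simplified model of the paxos service trim decision
  // (assumed shape for illustration, not the real Ceph implementation).
  #include <cstdint>
  #include <iostream>

  using version_t = std::uint64_t;

  // Mirrors the "keep at most mon_max_log_epochs versions" rule quoted above.
  version_t get_trim_to(version_t last_committed, unsigned max_epochs) {
      if (last_committed > max_epochs)
          return last_committed - max_epochs;
      return 0;
  }

  // Mirrors the maybe_trim guard: skip trimming when trim_to < first_committed.
  void maybe_trim(version_t first_committed, version_t last_committed,
                  unsigned max_epochs) {
      version_t trim_to = get_trim_to(last_committed, max_epochs);
      if (trim_to < first_committed) {
          std::cout << "no trim: trim_to " << trim_to
                    << " < first_committed " << first_committed << "\n";
          return;
      }
      std::cout << "would trim versions " << first_committed
                << ".." << (trim_to - 1) << "\n";
  }

  int main() {
      // Values from the debug log above: first_committed (56666064) is far ahead
      // of last_committed (29067286), so trim_to can never catch up; nothing is trimmed.
      maybe_trim(56666064, 29067286, 500);
      // Healthy case for comparison: first_committed well below last_committed.
      maybe_trim(100, 29067286, 500);
      return 0;
  }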

Actions #6

Updated by Daniel Poelzleithner over 2 years ago

A fix is in progress.

Actions #7

Updated by Peter Razumovsky almost 2 years ago

We have observed this several times at a customer site. Each of the 3 mon store.db instances was growing rapidly, had tons of logm keys, and reached 1.4 TB (!!!). We started restarting the 200 OSDs, and after some OSD restarts the mon suddenly started to trim its store from 1.4 TB down to 100 MB. Please fix this as soon as possible; we have already faced this 4 times.

Actions #8

Updated by Peter Razumovsky almost 2 years ago

Sorry, forgot to add - we faced this issue on v15.2.13 and on v14.2.22 as well.

Actions #10

Updated by Radoslaw Zarzynski almost 2 years ago

  • Pull request ID set to 44511
Actions #11

Updated by Neha Ojha almost 2 years ago

  • Status changed from New to Fix Under Review
  • Assignee deleted (Prashant D)