Bug #53485
open
monstore: logm entries are not garbage collected
Added by Daniel Poelzleithner over 2 years ago.
Updated almost 2 years ago.
Description
We had to run a ceph cluster with a damaged cephfs for a while; that filesystem has since been deleted. We suspect it was the culprit, but either way we experienced heavy growth of the
ceph-mon database. Our small 6-server cluster already has a 45 GB monstore.
ceph-monstore-tool /var/lib/ceph/mon/ceph-server6/ dump-keys | awk '{print $1}' | uniq -c
147 auth
2 config
11 health
1546347 logm
2 mkfs
3 mon_sync
7 monitor
1 monitor_store
2459 paxos
...
There seems to be no limit on the number of logm entries stored, nor does there appear to be a garbage collection mechanism. At times the monstore was growing so fast
that we feared the server would run out of storage from logm spam alone.
- Project changed from Ceph to RADOS
- Category deleted (Monitor)
- Target version deleted (v15.2.15)
The monstore has now grown to a whopping 80 GB. I'm out of ideas here and don't know how to stop the growth.
Somebody added logging but no cleanup. Resyncing a mon server now takes hours and I fear the worst.
This is our config:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
fsid = d7c5c9c7-a227-4e33-ab43-3f4aa1eb0630
mon_allow_pool_delete = true
mon_host = 172.20.77.6 172.20.77.8 172.20.77.9 172.20.77.5 172.20.77.3
osd_journal_size = 5120
osd_pool_default_min_size = 2
osd_pool_default_pg_autoscale_mode = on
osd_pool_default_size = 3
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
[mds]
beacon_grace = 800
# debug_default = 10/10
keyring = /var/lib/ceph/mds/ceph-$id/keyring
mds_beacon_grace = 800
[mon]
# debug_default = 10/10
keyring = /var/lib/ceph/mon/ceph-$id/keyring
ms_bind_msgr1 = true
ms_bind_msgr2 = true
# mon_compact_on_start = true
[osd]
osd_max_backfills = 16
osd_recovery_max_active = 4
[mds.server5]
host = server5
mds_standby_for_name = pve
[mds.server6]
host = server6
mds_standby_for_name = pve
[mds.server8]
host = server8
mds_standby_for_name = pve
[mon.server3]
public_addr = 172.20.77.3
[mon.server6]
public_addr = 172.20.77.6
[mon.server8]
public_addr = 172.20.77.8
- Assignee set to Prashant D
I changed the paxos debug level to 20 and found this in the mon store log:
2021-12-16T18:35:07.814+0100 7fec66e79700 20 mon.server6@0(leader).paxosservice(logm 56666064..29067286) maybe_trim 56666064~29067236
2021-12-16T18:35:07.814+0100 7fec66e79700 10 mon.server6@0(leader).paxosservice(logm 56666064..29067286) maybe_trim trim_to 29067236 < first_committed 56666064
56666064 is the get_first_committed version of the paxos service.
29067236 is calculated in LogMonitor.cc as:
unsigned max = g_conf()->mon_max_log_epochs;
version_t version = get_last_committed();
if (version > max)
return version - max;
This means that, for some reason, first_committed is now a lot larger than the last committed version on disk, so the old versions are never deleted.
This internal state corruption causes the monstore to grow indefinitely.
We have observed this several times at a customer site. Each of the 3 mon store.db instances was growing rapidly, had tons of logm keys, and reached 1.4 TB (!!!). We started restarting 200 OSDs, and at some point a mon suddenly began trimming its store from 1.4 TB down to 100 MB. Please fix this as soon as possible, because we have already faced it 4 times.
Sorry, forgot to add - we faced this issue on v15.2.13 and on v14.2.22 as well.
- Pull request ID set to 44511
- Status changed from New to Fix Under Review
- Assignee deleted (Prashant D)