luminous: mons do not trim maps until restarted
Reported by several users, most recently at https://marc.info/?l=ceph-devel&m=154955388914036&w=2
#1 Updated by Dan van der Ster about 1 month ago
Here's an example on v12.2.8. The mon db is normally trimmed to around 700MB, but after some backfilling it's currently at 6.3GB on all mons.
cephbeesly-mon-2a00f134e5.cern.ch: 6.3G /var/lib/ceph/mon/
p01001532021656.cern.ch: 6.3G /var/lib/ceph/mon/
p05517715d82373.cern.ch: 6.3G /var/lib/ceph/mon/
p05517715y01595.cern.ch: 6.3G /var/lib/ceph/mon/
p05517715y58557.cern.ch: 6.3G /var/lib/ceph/mon/
We followed Sage's procedure to debug:
- enable debug_mon = 20 on all mons (*before* restarting):
  ceph tell mon.* injectargs '--debug-mon 20'
- wait for 10 minutes or so to generate some logs
- add 'debug mon = 20' to ceph.conf (on mons only)
- restart the monitors
- wait for them to start trimming
- remove 'debug mon = 20' from ceph.conf (on mons only)
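For anyone repeating this on another cluster, the procedure above can be written down as an ordered checklist. The following is only a minimal sketch: the `ceph tell` command is the one quoted above, while `build_plan`, `execute` and the 600 second wait are hypothetical names and values chosen for illustration. The ceph.conf edits and the monitor restarts are deliberately left as manual steps, since how they are done depends on the deployment.

```python
import shlex
import subprocess
import time

# Command taken verbatim from the procedure above.
INJECT_DEBUG = ["ceph", "tell", "mon.*", "injectargs", "--debug-mon 20"]


def build_plan(wait_seconds=600):
    """Return the debug procedure as an ordered list of (kind, payload) steps.

    wait_seconds is an assumption standing in for "10 minutes or so".
    """
    return [
        ("run", INJECT_DEBUG),
        ("sleep", wait_seconds),
        ("manual", "add 'debug mon = 20' to ceph.conf (on mons only)"),
        ("manual", "restart the monitors"),
        ("manual", "wait for the mons to start trimming"),
        ("manual", "remove 'debug mon = 20' from ceph.conf (on mons only)"),
    ]


def execute(plan, dry_run=True):
    """Walk the plan and return a log of what was (or would be) done.

    With dry_run=True nothing is executed, so this is safe to try on a
    machine that has no ceph CLI installed.
    """
    log = []
    for kind, payload in plan:
        if kind == "run":
            log.append("$ " + shlex.join(payload))
            if not dry_run:
                subprocess.run(payload, check=True)
        elif kind == "sleep":
            log.append(f"sleep {payload}s")
            if not dry_run:
                time.sleep(payload)
        else:
            # Manual steps are only reported, never automated here.
            log.append("MANUAL: " + payload)
    return log


if __name__ == "__main__":
    for entry in execute(build_plan(), dry_run=True):
        print(entry)
```

Running it with the default `dry_run=True` only prints the steps, which is a cheap way to review the order before touching the mons.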
Following the mon restarts, the dbs shrank to ~500MB:
cephbeesly-mon-2a00f134e5.cern.ch: 532M /var/lib/ceph/mon
p01001532021656.cern.ch: 532M /var/lib/ceph/mon
p05517715d82373.cern.ch: 532M /var/lib/ceph/mon
p05517715y01595.cern.ch: 532M /var/lib/ceph/mon
p05517715y58557.cern.ch: 532M /var/lib/ceph/mon
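To spot this condition without eyeballing the output, per-host size reports in the `host: SIZE path` shape shown above can be checked against a threshold. This is a rough sketch under stated assumptions: `parse_size` and `oversized_mons` are hypothetical helpers, the sizes are assumed to use the binary K/M/G/T suffixes of `du -h`, and the 1G limit is an arbitrary illustrative value, not a Ceph default.

```python
import re

# Binary multipliers, matching the human-readable suffixes used by `du -h`.
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a size such as '6.3G' or '532M' to bytes (no suffix = bytes)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMGT])?", text.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS.get(unit, 1))


def oversized_mons(report, limit="1G"):
    """Return the hosts whose mon store is larger than `limit`.

    `report` is text with one 'host: SIZE path' entry per line.
    The default limit is an illustrative assumption.
    """
    limit_bytes = parse_size(limit)
    flagged = []
    for line in report.splitlines():
        line = line.strip()
        if not line:
            continue
        host, rest = line.split(":", 1)
        size = rest.split()[0]
        if parse_size(size) > limit_bytes:
            flagged.append(host.strip())
    return flagged


if __name__ == "__main__":
    sample = (
        "cephbeesly-mon-2a00f134e5.cern.ch: 6.3G /var/lib/ceph/mon/\n"
        "p01001532021656.cern.ch: 532M /var/lib/ceph/mon\n"
    )
    print(oversized_mons(sample))
```

With the values from this report, a 6.3G store is flagged and a 532M store is not.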
The logs are here: ceph-post-file: 877cc29d-e697-4f76-9d52-70f08511cfca
#2 Updated by Joao Eduardo Luis about 1 month ago
I have a feeling this is actually due to what led me to open this PR: https://github.com/ceph/ceph/pull/19076
The problem was that I was unable to reproduce it in mimic, which was annoying enough that I just let it linger. I'll get a branch with this for luminous, and hopefully someone will be able to test it.