Bug #38322

luminous: mons do not trim maps until restarted

Added by Sage Weil 7 months ago. Updated 26 days ago.

Status: Need Review
Priority: High
Category: Correctness/Safety
Target version: -
Start date: 02/14/2019
Due date:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): Monitor
Pull request ID:

Description

Reported by several users, most recently at https://marc.info/?l=ceph-devel&m=154955388914036&w=2

History

#1 Updated by Dan van der Ster 7 months ago

Here's an example on v12.2.8. The mon DB is normally trimmed to around 700MB, but after some backfilling it's currently at 6.3GB on all mons.

cephbeesly-mon-2a00f134e5.cern.ch:
6.3G    /var/lib/ceph/mon/

p01001532021656.cern.ch:
6.3G    /var/lib/ceph/mon/

p05517715d82373.cern.ch:
6.3G    /var/lib/ceph/mon/

p05517715y01595.cern.ch:
6.3G    /var/lib/ceph/mon/

p05517715y58557.cern.ch:
6.3G    /var/lib/ceph/mon/

We followed Sage's procedure to debug:

- enable debug_mon = 20 on all mons (*before* restarting)
   ceph tell mon.* injectargs '--debug-mon 20'
- wait for 10 minutes or so to generate some logs
- add 'debug mon = 20' to ceph.conf (on mons only)
- restart the monitors
- wait for them to start trimming
- remove 'debug mon = 20' from ceph.conf (on mons only)
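
For anyone scripting the steps above, here is a minimal Python sketch of the two mechanical parts: building the injectargs command and adding/removing the 'debug mon = 20' line in ceph.conf text. The helper names are hypothetical, and placing the line under a [mon] section is an assumption (the procedure only says to add it to ceph.conf on the mons). Nothing is executed against a cluster; restarting the monitors and waiting for trimming are left to the operator.

```python
import shlex

DEBUG_LINE = "debug mon = 20"  # setting quoted in the procedure above


def injectargs_command(level=20):
    """Build the runtime command from the procedure (it is not executed here)."""
    return ["ceph", "tell", "mon.*", "injectargs", "--debug-mon {}".format(level)]


def add_debug_line(conf_text):
    """Return conf_text with DEBUG_LINE present.

    Assumption: the line goes directly under a [mon] section header,
    and a [mon] section is appended if none exists.
    """
    lines = conf_text.splitlines()
    if any(line.strip() == DEBUG_LINE for line in lines):
        return conf_text  # already present, nothing to do
    for i, line in enumerate(lines):
        if line.strip() == "[mon]":
            lines.insert(i + 1, DEBUG_LINE)
            break
    else:
        lines += ["[mon]", DEBUG_LINE]
    return "\n".join(lines) + "\n"


def remove_debug_line(conf_text):
    """Return conf_text without DEBUG_LINE (last step of the procedure)."""
    lines = [line for line in conf_text.splitlines() if line.strip() != DEBUG_LINE]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the command in copy-pasteable, shell-quoted form.
    print(shlex.join(injectargs_command()))
    # Show the config text before the restart (hypothetical ceph.conf content).
    print(add_debug_line("[global]\nfsid = example\n"))
```

The quoting matters when running the command by hand: 'mon.*' should be quoted so the shell does not expand the glob.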

Following the mon restarts, the DBs shrank to ~500MB:


cephbeesly-mon-2a00f134e5.cern.ch:
532M    /var/lib/ceph/mon

p01001532021656.cern.ch:
532M    /var/lib/ceph/mon

p05517715d82373.cern.ch:
532M    /var/lib/ceph/mon

p05517715y01595.cern.ch:
532M    /var/lib/ceph/mon

p05517715y58557.cern.ch:
532M    /var/lib/ceph/mon
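
To put a number on the shrinkage, here is a small Python sketch that parses size listings like the ones above and compares before and after. It assumes the sizes are du -h style values with binary (1024-based) suffixes, which the report does not state explicitly; the function names are hypothetical.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_size(text):
    """Convert a size such as '6.3G' or '532M' to bytes (binary units assumed)."""
    text = text.strip()
    suffix = text[-1].upper()
    if suffix in UNITS:
        return float(text[:-1]) * UNITS[suffix]
    return float(text)  # no suffix: treated as a plain byte count (assumption)


def parse_report(report):
    """Map each 'host:' line to the size on the line that follows it."""
    sizes = {}
    host = None
    for line in report.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            host = line[:-1]
        elif host is not None:
            sizes[host] = parse_size(line.split()[0])
    return sizes


if __name__ == "__main__":
    before = parse_report("p01001532021656.cern.ch:\n6.3G    /var/lib/ceph/mon/\n")
    after = parse_report("p01001532021656.cern.ch:\n532M    /var/lib/ceph/mon\n")
    for host in before:
        factor = before[host] / after[host]
        print("{}: store shrank by a factor of {:.1f}".format(host, factor))
```

With the figures from this report, the store on each mon is roughly 12 times smaller after the restart.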

The logs are here: ceph-post-file: 877cc29d-e697-4f76-9d52-70f08511cfca

#2 Updated by Joao Eduardo Luis 7 months ago

I have a feeling this is actually due to what led me to open this PR: https://github.com/ceph/ceph/pull/19076

The problem was that I was unable to reproduce it in mimic, so that was annoying enough to just let it linger. I'll get a branch with this for luminous, and hopefully someone will be able to test it.

#3 Updated by Joao Eduardo Luis 7 months ago

  • Category set to Correctness/Safety
  • Assignee set to Joao Eduardo Luis
  • Component(RADOS) Monitor added

#4 Updated by Swami Reddy 7 months ago

Seen this issue with 10.2.4.

#5 Updated by Neha Ojha 26 days ago

  • Status changed from Need More Info to Need Review
  • Pull request ID set to 19076
