Bug #38322

luminous: mons do not trim maps until restarted

Added by Sage Weil over 1 year ago. Updated 10 days ago.

Status: Closed
Priority: High
Category: Correctness/Safety
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): Monitor
Pull request ID:
Crash signature:

Description

Reported by several users, most recently at https://marc.info/?l=ceph-devel&m=154955388914036&w=2


Related issues

Related to Ceph - Backport #45403: luminous: mon/OSDMonitor: maps not trimmed if osds are down (Rejected)
Related to Ceph - Bug #45400: mon/OSDMonitor: maps not trimmed if osds are down (Pending Backport)

History

#1 Updated by Dan van der Ster over 1 year ago

Here's an example on v12.2.8. The mon db is normally trimmed to around 700MB, but after some backfilling it's currently at 6.3GB on all mons.

cephbeesly-mon-2a00f134e5.cern.ch:
6.3G    /var/lib/ceph/mon/

p01001532021656.cern.ch:
6.3G    /var/lib/ceph/mon/

p05517715d82373.cern.ch:
6.3G    /var/lib/ceph/mon/

p05517715y01595.cern.ch:
6.3G    /var/lib/ceph/mon/

p05517715y58557.cern.ch:
6.3G    /var/lib/ceph/mon/

We followed Sage's procedure to debug:

- enable debug_mon = 20 on all mons (*before* restarting)
   ceph tell mon.* injectargs '--debug-mon 20'
- wait for 10 minutes or so to generate some logs
- add 'debug mon = 20' to ceph.conf (on mons only)
- restart the monitors
- wait for them to start trimming
- remove 'debug mon = 20' from ceph.conf (on mons only)
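
The steps above can be written down as a small dry-run plan. This is only a sketch: it prints the commands rather than running them, the function name trim_debug_plan is hypothetical, and the restart command assumes the monitors run under systemd (ceph-mon.target), which the procedure itself does not state. The ceph.conf edits are left as manual steps.

```python
import shlex


def trim_debug_plan(debug_level=20, wait_minutes=10):
    """Return the debugging steps as (description, argv-or-None) pairs.

    Steps whose argv is None are manual (waiting, editing ceph.conf).
    """
    # Taken from the procedure: raise the mon debug level at runtime.
    inject = ["ceph", "tell", "mon.*", "injectargs",
              f"--debug-mon {debug_level}"]
    # Assumption: monitors are managed by systemd; adjust for other setups.
    restart = ["systemctl", "restart", "ceph-mon.target"]
    conf_line = f"debug mon = {debug_level}"
    return [
        ("raise mon debug level at runtime, before any restart", inject),
        (f"wait about {wait_minutes} minutes to generate logs", None),
        (f"add '{conf_line}' to ceph.conf on the mon hosts", None),
        ("restart the monitors (run on each mon host)", restart),
        ("wait for the monitors to start trimming", None),
        (f"remove '{conf_line}' from ceph.conf on the mon hosts", None),
    ]


# Print the plan; nothing is executed.
for number, (description, argv) in enumerate(trim_debug_plan(), start=1):
    print(f"{number}. {description}")
    if argv is not None:
        # shlex.join quotes 'mon.*' so the shell does not glob-expand it.
        print(f"   $ {shlex.join(argv)}")
```

Keeping the plan as data rather than executing it directly makes it easy to review the exact commands before running them by hand on each monitor.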

Following the mon restarts, the dbs shrank to ~500MB:


cephbeesly-mon-2a00f134e5.cern.ch:
532M    /var/lib/ceph/mon

p01001532021656.cern.ch:
532M    /var/lib/ceph/mon

p05517715d82373.cern.ch:
532M    /var/lib/ceph/mon

p05517715y01595.cern.ch:
532M    /var/lib/ceph/mon

p05517715y58557.cern.ch:
532M    /var/lib/ceph/mon
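
To compare the two listings across hosts, a short script along these lines may help. It is a sketch, not part of the original procedure: it assumes the listings have the layout shown in this comment (a "host:" line followed by a "size path" line, as printed by du -h), and the helper names are hypothetical. Only two of the five hosts are included to keep it short.

```python
import re

# Binary multipliers for the size suffixes printed by `du -h`.
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_du_size(text):
    """Convert a `du -h` size such as '6.3G' or '532M' into bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMGT]?)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised du size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS.get(unit, 1))


def parse_listing(listing):
    """Map each host to its mon store size in bytes."""
    sizes = {}
    host = None
    for line in listing.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            host = line[:-1]          # 'host:' header line
        elif host is not None:
            sizes[host] = parse_du_size(line.split()[0])
            host = None
    return sizes


def shrink_report(before, after):
    """Return {host: after/before ratio} for hosts present in both."""
    return {h: after[h] / before[h] for h in before if h in after}


before = parse_listing("""
cephbeesly-mon-2a00f134e5.cern.ch:
6.3G    /var/lib/ceph/mon/

p01001532021656.cern.ch:
6.3G    /var/lib/ceph/mon/
""")
after = parse_listing("""
cephbeesly-mon-2a00f134e5.cern.ch:
532M    /var/lib/ceph/mon

p01001532021656.cern.ch:
532M    /var/lib/ceph/mon
""")

for host, ratio in shrink_report(before, after).items():
    print(f"{host}: store is {ratio:.1%} of its pre-restart size")
```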

The logs are here: ceph-post-file: 877cc29d-e697-4f76-9d52-70f08511cfca

#2 Updated by Joao Eduardo Luis over 1 year ago

I have a feeling this is actually due to what led me to open this PR: https://github.com/ceph/ceph/pull/19076

The problem was that I was unable to reproduce it in mimic, so that was annoying enough to just let it linger. I'll get a branch with this for luminous, and hopefully someone will be able to test it.

#3 Updated by Joao Eduardo Luis over 1 year ago

  • Category set to Correctness/Safety
  • Assignee set to Joao Eduardo Luis
  • Component(RADOS) Monitor added

#4 Updated by Swami Reddy over 1 year ago

Seen this issue with 10.2.4.

#5 Updated by Neha Ojha 12 months ago

  • Status changed from Need More Info to Fix Under Review
  • Pull request ID set to 19076

#6 Updated by Joao Eduardo Luis 10 days ago

  • Status changed from Fix Under Review to Resolved

#7 Updated by Joao Eduardo Luis 10 days ago

  • Status changed from Resolved to Closed

#8 Updated by Joao Eduardo Luis 10 days ago

  • Related to Backport #45403: luminous: mon/OSDMonitor: maps not trimmed if osds are down added

#9 Updated by Joao Eduardo Luis 10 days ago

  • Related to Bug #45400: mon/OSDMonitor: maps not trimmed if osds are down added
