Bug #38322

luminous: mons do not trim maps until restarted

Added by Sage Weil about 5 years ago. Updated over 3 years ago.

Status: Closed
Priority: High
Category: Correctness/Safety
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor
Component(RADOS): Monitor

Description

Reported by several users, most recently at https://marc.info/?l=ceph-devel&m=154955388914036&w=2


Related issues: 2 (0 open, 2 closed)

Related to Ceph - Backport #45403: luminous: mon/OSDMonitor: maps not trimmed if osds are down (Rejected, Joao Eduardo Luis)
Related to Ceph - Bug #45400: mon/OSDMonitor: maps not trimmed if osds are down (Resolved, Joao Eduardo Luis)

#1

Updated by Dan van der Ster about 5 years ago

Here's an example on v12.2.8. The mon DB is normally trimmed to around 700 MB, but after some backfilling it's currently at 6.3 GB on all mons.

cephbeesly-mon-2a00f134e5.cern.ch:
6.3G    /var/lib/ceph/mon/

p01001532021656.cern.ch:
6.3G    /var/lib/ceph/mon/

p05517715d82373.cern.ch:
6.3G    /var/lib/ceph/mon/

p05517715y01595.cern.ch:
6.3G    /var/lib/ceph/mon/

p05517715y58557.cern.ch:
6.3G    /var/lib/ceph/mon/
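One way to check whether this growth comes from untrimmed osdmaps is to compare the first and last committed osdmap epochs. The sketch below assumes that the JSON printed by 'ceph report' contains the 'osdmap_first_committed' and 'osdmap_last_committed' keys. It uses 500 as the floor because that is the default of 'mon_min_osdmap_epochs'. The function names are illustrative only. Trimming is generally held back while PGs are not clean (for example during backfill), so a large backlog is only suspicious once the cluster is back to active+clean.

```python
import json

# Assumption: the JSON from 'ceph report' carries these two keys.
FIRST_KEY = "osdmap_first_committed"
LAST_KEY = "osdmap_last_committed"


def retained_osdmap_epochs(report_json: str) -> int:
    """Number of osdmap epochs the monitors still keep (inclusive range)."""
    report = json.loads(report_json)
    return report[LAST_KEY] - report[FIRST_KEY] + 1


def trim_backlog(report_json: str, min_epochs: int = 500) -> int:
    """Epochs kept beyond the minimum (500 mirrors the mon_min_osdmap_epochs
    default); 0 means the store holds no more than that minimum."""
    return max(0, retained_osdmap_epochs(report_json) - min_epochs)
```

If the function is fed the JSON from 'ceph report', a backlog in the tens of thousands of epochs on an otherwise clean cluster would be consistent with the stalled trimming described in this ticket.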

We followed Sage's procedure to debug:

- enable debug_mon = 20 on all mons (*before* restarting)
   ceph tell mon.* injectargs '--debug-mon 20'
- wait for 10 minutes or so to generate some logs
- add 'debug mon = 20' to ceph.conf (on mons only)
- restart the monitors
- wait for them to start trimming
- remove 'debug mon = 20' from ceph.conf (on mons only)
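Steps 3 and 6 add and later remove 'debug mon = 20' in the monitors' ceph.conf, and these two edits can be scripted. The following is a minimal sketch using Python's configparser. It assumes that monitor settings live in a '[mon]' section. The helper name is hypothetical. configparser does not preserve comments, so review the output before replacing the real file. The runtime 'injectargs' step and the restarts stay as listed.

```python
import configparser
import io

# ceph.conf accepts both spellings, so remove either one when cleaning up.
DEBUG_KEYS = ("debug mon", "debug_mon")


def set_mon_debug(conf_text: str, enable: bool, level: int = 20) -> str:
    """Return ceph.conf text with 'debug mon' added to or removed from [mon].

    Hypothetical helper; comments in the input are not preserved.
    """
    # No interpolation: ceph.conf values may contain '$' or '%' literally.
    # strict=False tolerates repeated keys instead of raising an error.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    if enable:
        if not parser.has_section("mon"):
            parser.add_section("mon")
        parser.set("mon", "debug mon", str(level))
    elif parser.has_section("mon"):
        for key in DEBUG_KEYS:
            parser.remove_option("mon", key)  # no-op if the key is absent
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```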

Following the mon restarts, the DBs shrank to ~500 MB:


cephbeesly-mon-2a00f134e5.cern.ch:
532M    /var/lib/ceph/mon

p01001532021656.cern.ch:
532M    /var/lib/ceph/mon

p05517715d82373.cern.ch:
532M    /var/lib/ceph/mon

p05517715y01595.cern.ch:
532M    /var/lib/ceph/mon

p05517715y58557.cern.ch:
532M    /var/lib/ceph/mon
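To quantify the improvement, note that 'du -h' reports sizes in powers of 1024. On that basis, 6.3G is about 6,451 MiB, and the post-restart 532M is roughly 8% of it, a reduction of about 92%. The illustrative helper below performs that conversion. It only understands the K/M/G/T suffixes, and because 'du -h' rounds its output, the result is approximate.

```python
# Multipliers for the suffixes 'du -h' prints (powers of 1024).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_du_size(size: str) -> float:
    """Convert a 'du -h' size such as '6.3G' or '532M' into bytes."""
    size = size.strip()
    suffix = size[-1].upper()
    if suffix in UNITS:
        return float(size[:-1]) * UNITS[suffix]
    return float(size)  # no suffix: already in bytes


def reduction(before: str, after: str) -> float:
    """Fraction of space freed between two 'du -h' readings."""
    return 1 - parse_du_size(after) / parse_du_size(before)


print(f"{reduction('6.3G', '532M'):.1%}")  # 91.8%
```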

The logs are here: ceph-post-file: 877cc29d-e697-4f76-9d52-70f08511cfca

#2

Updated by Joao Eduardo Luis about 5 years ago

I have a feeling this is actually due to what led me to open this PR: https://github.com/ceph/ceph/pull/19076

The problem was that I was unable to reproduce it in Mimic, which was annoying enough that I just let it linger. I'll get a branch with this for Luminous, and hopefully someone will be able to test it.
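The related issue ('maps not trimmed if osds are down') suggests how the monitors' store can become pinned. To illustrate, consider a simplified model in which the trim point is the minimum epoch reported by the OSDs, capped so that a fixed number of recent epochs is always kept. This is a hypothetical sketch, not Ceph's OSDMonitor code. The names, the 'min_keep' parameter, and the notion of a stale per-OSD entry are all assumptions made for illustration. In this model, if an OSD that went down keeps its old epoch in the minimum, the trim point stays at that epoch however far the map advances. Ignoring down OSDs lets the trim point advance again.

```python
from typing import AbstractSet, Dict


def trim_to(reported: Dict[int, int], last_committed: int, min_keep: int,
            down: AbstractSet[int] = frozenset(), ignore_down: bool = False) -> int:
    """Hypothetical model: highest osdmap epoch the mons could trim up to.

    'reported' maps an OSD id to the last epoch that OSD reported (assumed).
    """
    # Always keep at least min_keep recent epochs.
    cap = max(last_committed - min_keep, 0)
    epochs = [e for osd, e in reported.items()
              if not (ignore_down and osd in down)]
    if not epochs:
        return cap
    # The oldest reported epoch bounds trimming: one stale entry pins it.
    return min(min(epochs), cap)


reports = {0: 40000, 1: 40010, 2: 1200}  # hypothetical: osd.2 went down at 1200
print(trim_to(reports, 40100, 500, down={2}))                    # 1200
print(trim_to(reports, 40100, 500, down={2}, ignore_down=True))  # 39600
```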

#3

Updated by Joao Eduardo Luis about 5 years ago

  • Category set to Correctness/Safety
  • Assignee set to Joao Eduardo Luis
  • Component(RADOS) Monitor added

#4

Updated by Swami Reddy about 5 years ago

Seen this issue with 10.2.4.

#5

Updated by Neha Ojha over 4 years ago

  • Status changed from Need More Info to Fix Under Review
  • Pull request ID set to 19076

#6

Updated by Joao Eduardo Luis over 3 years ago

  • Status changed from Fix Under Review to Resolved

#7

Updated by Joao Eduardo Luis over 3 years ago

  • Status changed from Resolved to Closed

#8

Updated by Joao Eduardo Luis over 3 years ago

  • Related to Backport #45403: luminous: mon/OSDMonitor: maps not trimmed if osds are down added

#9

Updated by Joao Eduardo Luis over 3 years ago

  • Related to Bug #45400: mon/OSDMonitor: maps not trimmed if osds are down added