Bug #24160

closed

Monitor down when large store data needs to compact triggered by ceph tell mon.xx compact command

Added by 相洋 于 almost 6 years ago. Updated almost 6 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
Correctness/Safety
Target version:
-
% Done:
0%

Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
upgrade/jewel-x
Component(RADOS):
Monitor
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We hit a problem in our production environment where the monitor store grows too large.

The logical volume for the monitor is 100GB. The store can grow to 95GB, after which the monitor goes down. Sometimes the monitor process cannot start again and we have to rebuild the monitor.

To work around this, we set up a crontab task that compacts the monitor once a week at a fixed time. The command is "ceph tell mon.xx compact". While the monitor is compacting (this can last for 80 seconds), the monitor is marked down because of a lease timeout. When the compaction completes, the monitors hold an election again and the monitor comes back up and active.

In the monitor message dispatch code, a command must hold the dispatch lock while it executes, so in effect all commands execute serially. If one command runs for a long time, all other commands must wait.
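
The blocking behaviour described above can be illustrated with a small stand-alone simulation. This is only a sketch and not Ceph's actual code: the function `simulate`, the `dispatch_lock` object, and the durations are hypothetical stand-ins, and a plain sleep takes the place of the store compaction.

```python
import threading
import time


def simulate(compact_seconds, lease_timeout):
    """Return (waited, timed_out) for a lease message queued behind a compaction.

    Hypothetical model: one lock serializes all message handling, as the
    report describes for the monitor's dispatch path.
    """
    dispatch_lock = threading.Lock()      # stand-in for the dispatch lock
    compact_started = threading.Event()

    def handle_compact():
        # The compact command holds the lock for its whole duration.
        with dispatch_lock:
            compact_started.set()
            time.sleep(compact_seconds)   # stand-in for the store compaction

    worker = threading.Thread(target=handle_compact)
    worker.start()
    compact_started.wait()                # make sure compaction holds the lock

    queued_at = time.monotonic()
    with dispatch_lock:                   # lease handling needs the same lock
        waited = time.monotonic() - queued_at
    worker.join()
    return waited, waited > lease_timeout


if __name__ == "__main__":
    waited, timed_out = simulate(compact_seconds=0.2, lease_timeout=0.05)
    print(f"lease message waited {waited:.2f}s, timed out: {timed_out}")
```

In this model, any compaction that outlasts the lease timeout delays the lease handling past its deadline, which matches the report: an 80-second compaction holds the lock long enough for the monitor to be marked down.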


Related issues 1 (0 open, 1 closed)

Has duplicate RADOS - Bug #24159: Monitor down when large store data needs to compact triggered by ceph tell mon.xx compact command | Duplicate | 05/17/2018 | 05/31/2018

#1

Updated by Nathan Cutler almost 6 years ago

  • Has duplicate Bug #24159: Monitor down when large store data needs to compact triggered by ceph tell mon.xx compact command added
#2

Updated by Joao Eduardo Luis almost 6 years ago

  • Project changed from Ceph to RADOS
  • Category changed from Monitor to Correctness/Safety
  • Component(RADOS) Monitor added
#4

Updated by Josh Durgin almost 6 years ago

  • Status changed from New to Fix Under Review
#5

Updated by Kefu Chai almost 6 years ago

  • Status changed from Fix Under Review to Resolved
  • Target version deleted (v10.2.11)