Bug #5176: leveldb: Compaction makes things time-out yielding spurious elections - Ceph


Bug #5176

closed

leveldb: Compaction makes things time-out yielding spurious elections

Added by Sylvain Munaut almost 11 years ago. Updated almost 11 years ago.

Status:

Resolved

Priority:

Urgent

Assignee:

Sage Weil

Category:

Monitor

Target version:

% Done:

Source:

other

Tags:

Backport:

cuttlefish

Regression:

Severity:

3 - minor

Reviewed:

Affected Versions:

ceph-qa-suite:

Pull request ID:

Crash signature (v1):

Crash signature (v2):

Description

It seems that a leveldb compaction can take several seconds (even on 10k RPM SAS disks), which can cause peons to miss their lease renewal.

The problem is made worse by a logic issue in the mon.

Once a slow compaction has finished, the mon may end up in "propose_queued", which cancels the "lease_renew" timeout. However, this does not trigger an immediate renewal: the lease is only renewed at the end of the update cycle, which itself takes a few seconds, and by then the lease has already expired.
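To make the timing problem concrete, here is a minimal Python model of the behaviour described above. It is not Ceph's code: the `Timeline` class, the function name and all timings are hypothetical, chosen only to illustrate the arithmetic. The lease expires at grant time plus lease duration, while the next renewal only lands after the compaction ends and a full update cycle completes.

```python
from dataclasses import dataclass


@dataclass
class Timeline:
    """Illustrative timings in seconds (assumed values, not Ceph defaults)."""
    lease_granted_at: float    # when the peons last received a lease extension
    lease_duration: float      # how long that lease stays valid
    compaction_ends_at: float  # when the slow leveldb compaction finishes
    update_cycle: float        # duration of one full update (propose/commit) cycle


def lease_expires_before_renewal(t: Timeline) -> bool:
    """Model the reported behaviour: propose_queued cancels the lease_renew
    timeout, so the next renewal only happens once the update cycle that
    starts after the compaction has completed."""
    lease_expiry = t.lease_granted_at + t.lease_duration
    next_renewal = t.compaction_ends_at + t.update_cycle
    return next_renewal > lease_expiry


if __name__ == "__main__":
    # Lease granted at t=0 for 5 s; compaction stalls the leader until t=3 s;
    # the update cycle then takes another 3 s, so renewal lands at t=6 s.
    late = Timeline(lease_granted_at=0.0, lease_duration=5.0,
                    compaction_ends_at=3.0, update_cycle=3.0)
    print(lease_expires_before_renewal(late))  # True
```

With these assumed numbers the renewal arrives one second after the lease has expired, which is exactly the window in which peons call a spurious election.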

Before cancelling the lease_renew timeout, the mon should check whether there is enough time left to complete an update cycle and, if not, trigger a lease renewal immediately. This would give leveldb much more margin to finish its work without breaking quorum and forcing needless elections (which can eject a mon and raise a HEALTH_WARN, triggering whatever monitoring system you might be using).
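The proposed check can be sketched as a small decision function. This is a hypothetical illustration in Python, not the actual fix in the Ceph monitor: the function name, the return strings and the `margin` parameter are assumptions introduced here. It compares the time at which the update cycle would finish (plus a safety margin) with the lease expiry time.

```python
def on_propose_queued(now: float, lease_expiry: float,
                      expected_cycle: float, margin: float = 0.5) -> str:
    """Hypothetical check run before cancelling the lease_renew timeout.

    Returns "defer" if an update cycle (plus a safety margin) can finish
    before the peons' lease expires, so renewing at the end of the cycle
    is safe; otherwise returns "renew_now", meaning the leader should
    extend the lease immediately instead of waiting for the cycle."""
    if now + expected_cycle + margin < lease_expiry:
        return "defer"
    return "renew_now"


if __name__ == "__main__":
    # Illustrative timings: 2 s of lease left but a 3 s update cycle ahead.
    print(on_propose_queued(now=3.0, lease_expiry=5.0, expected_cycle=3.0))  # renew_now
    # Plenty of lease left: safe to let the update cycle renew the lease.
    print(on_propose_queued(now=0.5, lease_expiry=5.0, expected_cycle=3.0))  # defer
```

Estimating `expected_cycle` is the hard part in practice; a conservative guess (or a larger `margin`) errs towards extra lease renewals, which are cheap compared to an unnecessary election.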


Updated by Anonymous almost 11 years ago

Updated by Sage Weil almost 11 years ago

Updated by Sage Weil almost 11 years ago

Updated by Sage Weil almost 11 years ago

Updated by Sage Weil almost 11 years ago

Updated by Sage Weil almost 11 years ago

Updated by Sage Weil almost 11 years ago

Updated by Sylvain Munaut almost 11 years ago

Updated by Sage Weil almost 11 years ago

Updated by Sylvain Munaut almost 11 years ago

Updated by Sage Weil almost 11 years ago