Bug #11470

mon: leaked Messenger, MLog on shutdown

Added by Sage Weil over 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Monitor
Target version: -
Start date: 04/24/2015
Due date:
% Done: 0%
Source: Q/A
Tags:
Backport: firefly, hammer
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:

Description

/a/teuthology-2015-04-23_16:02:01-rgw-hammer-distro-basic-typica/3611


Related issues

Copied to Ceph - Backport #12485: mon: leaked Messenger, MLog on shutdown Resolved 04/24/2015
Copied to Ceph - Backport #12486: mon: leaked Messenger, MLog on shutdown Resolved 04/24/2015

Associated revisions

Revision 1551ebb6 (diff)
Added by Joao Luis about 3 years ago

mon: PaxosService: call post_refresh() instead of post_paxos_update()

Whenever the monitor finishes committing a proposal, we call
Monitor::refresh_from_paxos() to nudge the services to refresh. Once
all services have refreshed, we would then call each service's
post_paxos_update().

However, due to an unfortunate, non-critical bug, some services (mainly
the LogMonitor) could have messages pending in their
'waiting_for_finished_proposal' callback queue [1], and we need to nudge
those callbacks.

This patch adds a new step during the refresh phase: instead of
directly calling the service's post_paxos_update(), we introduce a
PaxosService::post_refresh() which will call the service's
post_paxos_update() function first and then nudge those callbacks when
appropriate.

[1] - Given the monitor will send MLog messages to itself, and given the
service is not readable before its initial state is proposed and
committed, some of the initial MLogs would be stuck waiting for the
proposal to finish. However, by design, we only nudge those messages'
callbacks when an election finishes or, if the leader, when the proposal
finishes. On peons, however, we would only nudge those callbacks if an
election happened to be triggered, hence the need for an alternate path
to retry any message waiting for the initial proposal to finish.

Fixes: #11470

Signed-off-by: Joao Eduardo Luis <>
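
The mechanism described in the commit message above can be illustrated with a minimal, self-contained C++ sketch. This is not the actual Ceph code: ServiceSketch, dispatch(), refresh() and simulate_peon() are hypothetical names invented for illustration, and only post_paxos_update(), post_refresh() and waiting_for_finished_proposal are taken from the commit message. It assumes a peon on which no election is triggered, so nothing else retries the parked messages.

```cpp
#include <cassert>
#include <functional>
#include <list>

// Simplified stand-in for a Paxos-backed monitor service.
// Hypothetical type for illustration only, not the real Ceph class.
struct ServiceSketch {
  bool readable = false;  // true once the initial state has been committed
  int handled = 0;        // number of messages processed so far
  std::list<std::function<void()>> waiting_for_finished_proposal;

  // Handle a message now if the service is readable; otherwise park a
  // retry callback until the proposal finishes.
  void dispatch() {
    if (readable) {
      ++handled;
    } else {
      waiting_for_finished_proposal.push_back([this] { dispatch(); });
    }
  }

  // Called when a committed proposal is applied locally.
  void refresh() { readable = true; }

  // Per-service hook; intentionally empty in this sketch.
  void post_paxos_update() {}

  // The extra step: run the hook first, then retry the parked messages.
  void post_refresh() {
    post_paxos_update();
    std::list<std::function<void()>> pending;
    pending.swap(waiting_for_finished_proposal);  // drain before retrying
    for (auto& retry : pending) {
      retry();
    }
  }
};

// Returns how many messages are still parked on a peon after one commit.
inline int simulate_peon(bool use_post_refresh) {
  ServiceSketch svc;
  svc.dispatch();  // two early messages arrive before the
  svc.dispatch();  // initial state is committed
  svc.refresh();
  assert(svc.readable);
  if (use_post_refresh) {
    svc.post_refresh();
  } else {
    svc.post_paxos_update();  // old path: nobody retries the callbacks
  }
  return static_cast<int>(svc.waiting_for_finished_proposal.size());
}
```

In this sketch, simulate_peon(false) leaves both messages parked, which corresponds to the MLogs reported as leaked at shutdown, while simulate_peon(true) drains the queue.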

Revision 2f35a415 (diff)
Added by Joao Luis about 3 years ago

mon: PaxosService: call post_refresh() instead of post_paxos_update()

Fixes: #11470

Signed-off-by: Joao Eduardo Luis <>
(cherry picked from commit 1551ebb63238073d2fd30201e6b656a8988e958c)

Revision d08db7a0 (diff)
Added by Joao Luis about 3 years ago

mon: PaxosService: call post_refresh() instead of post_paxos_update()

Fixes: #11470

Signed-off-by: Joao Eduardo Luis <>
(cherry picked from commit 1551ebb63238073d2fd30201e6b656a8988e958c)

History

#1 Updated by Sage Weil over 3 years ago

/a/teuthology-2015-04-24_16:02:02-rgw-hammer-distro-basic-typica/4331

#2 Updated by Joao Luis over 3 years ago

  • Regression set to No

Looks like I'll need access to the typica lab. For some reason I assumed those results would end up on the teuthology machine in the old lab.

I'll figure this out next week.

#3 Updated by Joao Luis about 3 years ago

  • Status changed from New to In Progress

I believe I found where these come from while testing the optracker on my local machine \o/

Looks like these happen on peons and are easily reproducible using vstart.sh. There will be some MLogs that get stuck in waiting_for_finished_proposal.

#4 Updated by Joao Luis about 3 years ago

  • Status changed from In Progress to Pending Backport

#5 Updated by Joao Luis about 3 years ago

  • Category set to Monitor

still need to assess how far back this needs to be backported.

#6 Updated by Joao Luis about 3 years ago

  • Backport set to firefly, hammer

1551ebb63238073d2fd30201e6b656a8988e958c needs to be backported to firefly and hammer.

cherry-picking is trivial.

wip-11470.firefly has the patch targeting current firefly (https://github.com/ceph/ceph/pull/5358)
wip-11470.hammer has the patch targeting current hammer (https://github.com/ceph/ceph/pull/5359)

#7 Updated by Nathan Cutler almost 3 years ago

  • Status changed from Pending Backport to Resolved
