Bug #11470
mon: leaked Messenger, MLog on shutdown
Description
/a/teuthology-2015-04-23_16:02:01-rgw-hammer-distro-basic-typica/3611
Related issues
Associated revisions
mon: PaxosService: call post_refresh() instead of post_paxos_update()
Whenever the monitor finishes committing a proposal, we call
Monitor::refresh_from_paxos() to nudge the services to refresh. Once
all services have refreshed, we would then call each service's
post_paxos_update().
However, due to an unfortunate, non-critical bug, some services (mainly
the LogMonitor) could have messages pending in their
'waiting_for_finished_proposal' callback queue [1], and we need to nudge
those callbacks.
This patch adds a new step during the refresh phase: instead of directly
calling the service's post_paxos_update(), we introduce a
PaxosService::post_refresh() which will call the service's
post_paxos_update() function first and then nudge those callbacks when
appropriate.
[1] - Given the monitor will send MLog messages to itself, and given the
service is not readable before its initial state is proposed and
committed, some of the initial MLogs would be stuck waiting for the
proposal to finish. However, by design, we only nudge those messages'
callbacks when an election finishes or, if the leader, when the proposal
finishes. On peons, however, we would only nudge those callbacks if an
election happened to be triggered, hence the need for an alternate path
to retry any message waiting for the initial proposal to finish.
Fixes: #11470
Signed-off-by: Joao Eduardo Luis <joao@suse.de>
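The refresh step described in the commit message above can be sketched roughly as follows. This is a simplified, standalone model and not the code from the Ceph tree: the class name PaxosServiceSketch, the is_peon flag, the pending() accessor and the retried_after_refresh() helper are all hypothetical. "When appropriate" is modelled here as "on a peon with a non-empty queue", which is an assumption based on footnote [1].

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <utility>

// Hypothetical, simplified stand-in for a Paxos-backed monitor service.
// Only the names post_paxos_update(), post_refresh() and
// waiting_for_finished_proposal come from the commit message.
class PaxosServiceSketch {
public:
  using Callback = std::function<void()>;

  explicit PaxosServiceSketch(bool is_peon) : is_peon_(is_peon) {}
  virtual ~PaxosServiceSketch() = default;

  // Queue work (e.g. an early MLog) that must wait for a committed proposal.
  void wait_for_finished_proposal(Callback cb) {
    waiting_for_finished_proposal_.push_back(std::move(cb));
  }

  // Service-specific hook, run once every service has refreshed.
  virtual void post_paxos_update() {}

  // New step: run the hook first, then nudge any callbacks still queued.
  void post_refresh() {
    post_paxos_update();
    // Assumption: only peons need this extra path; the leader already
    // nudges its callbacks when the proposal finishes.
    if (!is_peon_ || waiting_for_finished_proposal_.empty())
      return;
    // Swap the queue out first so a callback that re-queues itself
    // is retried on the next refresh instead of looping here.
    std::list<Callback> pending;
    pending.swap(waiting_for_finished_proposal_);
    for (auto& cb : pending)
      cb();
  }

  std::size_t pending() const { return waiting_for_finished_proposal_.size(); }

private:
  bool is_peon_;
  std::list<Callback> waiting_for_finished_proposal_;
};

// Queue two stuck messages, run one refresh, and report how many were retried.
inline int retried_after_refresh(bool is_peon) {
  PaxosServiceSketch svc(is_peon);
  int retried = 0;
  svc.wait_for_finished_proposal([&retried] { ++retried; });
  svc.wait_for_finished_proposal([&retried] { ++retried; });
  svc.post_refresh();
  assert(svc.pending() == (is_peon ? 0u : 2u));
  return retried;
}
```

In this model, a peon retries both queued callbacks on the first refresh, while a leader leaves its queue alone because it is assumed to be drained by the existing proposal-finished path.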
mon: PaxosService: call post_refresh() instead of post_paxos_update()
Whenever the monitor finishes committing a proposal, we call
Monitor::refresh_from_paxos() to nudge the services to refresh. Once
all services have refreshed, we would then call each service's
post_paxos_update().
However, due to an unfortunate, non-critical bug, some services (mainly
the LogMonitor) could have messages pending in their
'waiting_for_finished_proposal' callback queue [1], and we need to nudge
those callbacks.
This patch adds a new step during the refresh phase: instead of directly
calling the service's post_paxos_update(), we introduce a
PaxosService::post_refresh() which will call the service's
post_paxos_update() function first and then nudge those callbacks when
appropriate.
[1] - Given the monitor will send MLog messages to itself, and given the
service is not readable before its initial state is proposed and
committed, some of the initial MLogs would be stuck waiting for the
proposal to finish. However, by design, we only nudge those messages'
callbacks when an election finishes or, if the leader, when the proposal
finishes. On peons, however, we would only nudge those callbacks if an
election happened to be triggered, hence the need for an alternate path
to retry any message waiting for the initial proposal to finish.
Fixes: #11470
Signed-off-by: Joao Eduardo Luis <joao@suse.de>
(cherry picked from commit 1551ebb63238073d2fd30201e6b656a8988e958c)
History
#1 Updated by Sage Weil almost 9 years ago
/a/teuthology-2015-04-24_16:02:02-rgw-hammer-distro-basic-typica/4331
#2 Updated by Joao Eduardo Luis almost 9 years ago
- Regression set to No
Looks like I'll need access to the typica lab. For some reason I assumed those results would end up on the teuthology machine in the old lab.
I'll figure this out next week.
#3 Updated by Joao Eduardo Luis almost 9 years ago
- Status changed from New to In Progress
I believe I found where these come from while testing the optracker on my local machine \o/
Looks like these happen on peons and are easily reproducible using vstart.sh. There will be some MLogs that get stuck in waiting_for_finished_proposal.
#4 Updated by Joao Eduardo Luis over 8 years ago
- Status changed from In Progress to Pending Backport
#5 Updated by Joao Eduardo Luis over 8 years ago
- Category set to Monitor
Still need to assess how far back this needs to be backported.
#6 Updated by Joao Eduardo Luis over 8 years ago
- Backport set to firefly, hammer
1551ebb63238073d2fd30201e6b656a8988e958c needs to be backported to firefly and hammer.
Cherry-picking is trivial.
wip-11470.firefly has the patch targeting current firefly (https://github.com/ceph/ceph/pull/5358)
wip-11470.hammer has the patch targeting current hammer (https://github.com/ceph/ceph/pull/5359)
#7 Updated by Nathan Cutler over 8 years ago
- Status changed from Pending Backport to Resolved