Project

General

Profile

Actions

Bug #4620

closed

mon: Paxos proposals take too long to finish when transaction is huge

Added by Joao Eduardo Luis about 11 years ago. Updated about 11 years ago.

Status:
Resolved
Priority:
High
Category:
Monitor
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Jim Schutt reported on ceph-devel that a Paxos proposal (on 0.59) was taking about a minute to finish (http://thread.gmane.org/gmane.comp.file-systems.ceph.devel/14109).

When I redo the test with pg_bits = 4 (27696 PGs total) I get this:

2013-04-01 14:25:16.993255 7ffff7fbe780 10 mon.cs31@0(leader).paxosservice(pgmap) propose_pending
2013-04-01 14:25:17.051729 7ffff7fbe780  5 mon.cs31@0(leader).paxos(paxos active c 1..3) queue_proposal bl 9887726 bytes; ctx = 0x11e81f0
2013-04-01 14:25:17.051882 7ffff7fbe780  5 mon.cs31@0(leader).paxos(paxos preparing update c 1..3) propose_queued 4 9887726 bytes
2013-04-01 14:25:17.051888 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos preparing update c 1..3) propose_queued list_proposals 1 in queue:
2013-04-01 14:26:08.185143 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos preparing update c 1..3) begin for 4 9887726 bytes
2013-04-01 14:26:08.296351 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos updating c 1..3) commit 4
2013-04-01 14:26:08.408900 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos active c 1..4) finish_queued_proposal finishing proposal
2013-04-01 14:26:08.408928 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos active c 1..4) finish_queued_proposal finish it (proposal = 0x1260f50)
2013-04-01 14:26:08.408931 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos active c 1..4) finish_queued_proposal proposal took 51.357088 to finish
2013-04-01 14:26:08.408974 7ffff7fbe780 10 mon.cs31@0(leader).paxosservice(pgmap) _active

and at pg_bits = 5 (55392 PGs total) I get this:

2013-04-01 14:36:15.385280 7ffff7fbe780 10 mon.cs31@0(leader).paxosservice(pgmap) propose_pending
2013-04-01 14:36:15.500292 7ffff7fbe780  5 mon.cs31@0(leader).paxos(paxos active c 1..3) queue_proposal bl 19775198 bytes; ctx = 0x11e81f0
2013-04-01 14:36:15.500433 7ffff7fbe780  5 mon.cs31@0(leader).paxos(paxos preparing update c 1..3) propose_queued 4 19775198 bytes
2013-04-01 14:36:15.500438 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos preparing update c 1..3) propose_queued list_proposals 1 in queue:
2013-04-01 14:39:43.645062 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos preparing update c 1..3) begin for 4 19775198 bytes
2013-04-01 14:39:43.875099 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos updating c 1..3) commit 4
2013-04-01 14:39:44.086097 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos active c 1..4) finish_queued_proposal finishing proposal
2013-04-01 14:39:44.086128 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos active c 1..4) finish_queued_proposal finish it (proposal = 0x1260f50)
2013-04-01 14:39:44.086136 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos active c 1..4) finish_queued_proposal proposal took 208.585710 to finish
2013-04-01 14:39:44.086156 7ffff7fbe780 10 mon.cs31@0(leader).paxosservice(pgmap) _active

For this last configuration, after collecting the above
I waited a bit, started all the OSDs, waited a bit longer,
then collected this:

2013-04-01 14:54:37.364686 7ffff328d700 10 mon.cs31@0(leader).paxosservice(pgmap) propose_pending
2013-04-01 14:54:37.433641 7ffff328d700  5 mon.cs31@0(leader).paxos(paxos active c 1..27) queue_proposal bl 10629660 bytes; ctx = 0x11ece50
2013-04-01 14:54:37.433750 7ffff328d700  5 mon.cs31@0(leader).paxos(paxos preparing update c 1..27) propose_queued 28 10629660 bytes
2013-04-01 14:54:37.433755 7ffff328d700 10 mon.cs31@0(leader).paxos(paxos preparing update c 1..27) propose_queued list_proposals 1 in queue:
2013-04-01 14:55:38.684532 7ffff328d700 10 mon.cs31@0(leader).paxos(paxos preparing update c 1..27) begin for 28 10629660 bytes
2013-04-01 14:55:38.814528 7ffff328d700 10 mon.cs31@0(leader).paxos(paxos updating c 1..27) commit 28
2013-04-01 14:55:38.937087 7ffff328d700 10 mon.cs31@0(leader).paxos(paxos active c 1..28) finish_queued_proposal finishing proposal
2013-04-01 14:55:38.937120 7ffff328d700 10 mon.cs31@0(leader).paxos(paxos active c 1..28) finish_queued_proposal finish it (proposal = 0x1c6e3c0)
2013-04-01 14:55:38.937124 7ffff328d700 10 mon.cs31@0(leader).paxos(paxos active c 1..28) finish_queued_proposal proposal took 61.503375 to finish
2013-04-01 14:55:38.937168 7ffff328d700 10 mon.cs31@0(leader).paxosservice(pgmap) _active
Actions

Also available in: Atom PDF