Bug #5424

closed

mon/Paxos.cc: 549: FAILED assert(begin->last_committed == last_committed)

Added by Sage Weil almost 11 years ago. Updated almost 11 years ago.

Status: Resolved
Priority: Urgent
Category: Monitor
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

all peons died with the above assert. the leader did this:

2013-06-21 21:32:57.785783 7f0f63adb700 10 mon.a@0(leader).paxos(paxos active c 40663..40875) finish_proposal state 1 proposals left 1
2013-06-21 21:32:57.786195 7f0f63adb700 10 mon.a@0(leader).paxos(paxos active c 40663..40874) propose_queued 40875 6782 bytes

i.e., it wrote last_committed (40875), and a moment later read back the old value (40874). the leveldb log says

2013/06/21-21:32:07.885283 7f0f61ad1700 Generated table #9749: 34 keys, 2309069 bytes
2013/06/21-21:32:07.885427 7f0f61ad1700 compacted to: files[ 2 5 49 504 361 0 0 ]
2013/06/21-21:32:07.885751 7f0f61ad1700 Compaction error: IO error: /var/lib/ceph/mon/ceph-a/store.db/009750.sst: Too many open files
2013/06/21-21:32:07.885888 7f0f61ad1700 Manual compaction at level-1 from 'pgmap .. 'pgmap; will stop at (end)
2013/06/21-21:32:07.886033 7f0f61ad1700 Manual compaction at level-2 from 'pgmap .. 'pgmap; will stop at (end)
2013/06/21-21:32:10.490773 7f0f61ad1700 Expanding@0 1+5 to 2+5
2013/06/21-21:32:10.490836 7f0f61ad1700 Manual compaction at level-0 from 'paxos .. 'paxos; will stop at 'pgmap
2013/06/21-21:32:10.490853 7f0f61ad1700 Compacting 2@0 + 5@1 files
2013/06/21-21:32:10.490899 7f0f61ad1700 compacted to: files[ 2 5 49 504 361 0 0 ]
2013/06/21-21:32:10.491257 7f0f61ad1700 Compaction error: IO error: /var/lib/ceph/mon/ceph-a/store.db/009752.sst: Too many open files
2013/06/21-21:32:10.491388 7f0f61ad1700 Manual compaction at level-1 from 'paxos .. 'paxos; will stop at (end)
2013/06/21-21:32:10.491444 7f0f61ad1700 Manual compaction at level-2 from 'paxos .. 'paxos; will stop at (end)

and there are a zillion open files for leveldb sst.

increasing max_open_files.

triggered by the big suite.
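
For a rough sense of scale, the per-level counts in the "compacted to: files[ ... ]" lines above can be summed and compared against a file descriptor limit. The sketch below is only an illustration: total_tables, exceeds_limit, and fd_soft_limit are made-up helper names, the only real API used is POSIX getrlimit, and it assumes the worst case where every table file is open at once (LevelDB's table cache may well keep fewer open).

```cpp
#include <sys/resource.h>  // getrlimit, RLIMIT_NOFILE (POSIX)
#include <cstdint>
#include <sstream>
#include <string>

// Sum the per-level table counts in a LevelDB log line such as
// "compacted to: files[ 2 5 49 504 361 0 0 ]".
// Returns 0 if the line has no "files[" section.
std::uint64_t total_tables(const std::string& line) {
    const std::string tag = "files[";
    const std::string::size_type pos = line.find(tag);
    if (pos == std::string::npos) return 0;
    std::istringstream in(line.substr(pos + tag.size()));
    std::uint64_t total = 0, n = 0;
    while (in >> n) total += n;  // extraction stops at the closing ']'
    return total;
}

// Soft RLIMIT_NOFILE of the calling process; 0 if the query fails.
std::uint64_t fd_soft_limit() {
    struct rlimit rl{};
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) return 0;
    return static_cast<std::uint64_t>(rl.rlim_cur);
}

// Hypothetical check: would `tables` open table files plus `reserve`
// descriptors for everything else (sockets, logs, ...) exceed `limit`?
bool exceeds_limit(std::uint64_t tables, std::uint64_t reserve,
                   std::uint64_t limit) {
    return tables + reserve > limit;
}
```

Fed the "compacted to" line from the log, total_tables returns 921. exceeds_limit(921, 200, 1024) is then true; both 200 and 1024 are example numbers here, not values taken from this cluster, and fd_soft_limit() can supply the real limit of the running process.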

Actions #1

Updated by Greg Farnum almost 11 years ago

Shouldn't that cause LevelDB to block or throw an error or something? I'm not quite sure how it leads to us not reading back what we've written.

Actions #2

Updated by Sage Weil almost 11 years ago

Greg Farnum wrote:

Shouldn't that cause LevelDB to block or throw an error or something? I'm not quite sure how it leads to us not reading back what we've written.

i guess the problem is that the error happens in the background compaction thread, isn't handled, and all goes to hell. very disappointing.
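
One general way to keep a background failure like this from passing silently is to record it and have the write path refuse further writes until someone deals with it. A minimal sketch of that pattern follows. It is not LevelDB's or Ceph's actual code; Store, background_failed, put, and get are hypothetical names for a toy in-memory store.

```cpp
#include <map>
#include <mutex>
#include <string>

// Hypothetical toy store: a background task reports a failure through
// background_failed(), and every later write checks that sticky error first.
class Store {
public:
    // Called from the background (e.g. compaction) thread when it fails.
    void background_failed(const std::string& why) {
        std::lock_guard<std::mutex> g(mu_);
        if (bg_error_.empty()) bg_error_ = why;  // keep the first error
    }

    // Returns false and leaves the data untouched once a background
    // error has been recorded, so the caller sees the failure directly.
    bool put(const std::string& key, const std::string& value) {
        std::lock_guard<std::mutex> g(mu_);
        if (!bg_error_.empty()) return false;
        data_[key] = value;
        return true;
    }

    // Copies the stored value into `out`; returns false if the key is absent.
    bool get(const std::string& key, std::string& out) const {
        std::lock_guard<std::mutex> g(mu_);
        const auto it = data_.find(key);
        if (it == data_.end()) return false;
        out = it->second;
        return true;
    }

    // Empty string means no background error has been recorded.
    std::string background_error() const {
        std::lock_guard<std::mutex> g(mu_);
        return bg_error_;
    }

private:
    mutable std::mutex mu_;
    std::map<std::string, std::string> data_;
    std::string bg_error_;
};
```

With this shape, a put() issued after the background thread has reported "Too many open files" returns false, and the caller can stop or retry instead of discovering the problem later when a read returns an older value.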

Actions #3

Updated by Ian Colle almost 11 years ago

  • Assignee set to Joao Eduardo Luis
Actions #4

Updated by Sage Weil almost 11 years ago

  • Status changed from Fix Under Review to 15
Actions #5

Updated by Sage Weil almost 11 years ago

  • Status changed from 15 to Resolved