Bug #5424
Closed
mon/Paxos.cc: 549: FAILED assert(begin->last_committed == last_committed)
% Done:
0%
Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
All peons died with the above assert. The leader logged this:
2013-06-21 21:32:57.785783 7f0f63adb700 10 mon.a@0(leader).paxos(paxos active c 40663..40875) finish_proposal state 1 proposals left 1
2013-06-21 21:32:57.786195 7f0f63adb700 10 mon.a@0(leader).paxos(paxos active c 40663..40874) propose_queued 40875 6782 bytes
In other words, it wrote last_committed (40875) and, a moment later, read back the old value (40874). The leveldb log says:
2013/06/21-21:32:07.885283 7f0f61ad1700 Generated table #9749: 34 keys, 2309069 bytes
2013/06/21-21:32:07.885427 7f0f61ad1700 compacted to: files[ 2 5 49 504 361 0 0 ]
2013/06/21-21:32:07.885751 7f0f61ad1700 Compaction error: IO error: /var/lib/ceph/mon/ceph-a/store.db/009750.sst: Too many open files
2013/06/21-21:32:07.885888 7f0f61ad1700 Manual compaction at level-1 from 'pgmap .. 'pgmap; will stop at (end)
2013/06/21-21:32:07.886033 7f0f61ad1700 Manual compaction at level-2 from 'pgmap .. 'pgmap; will stop at (end)
2013/06/21-21:32:10.490773 7f0f61ad1700 Expanding@0 1+5 to 2+5
2013/06/21-21:32:10.490836 7f0f61ad1700 Manual compaction at level-0 from 'paxos .. 'paxos; will stop at 'pgmap
2013/06/21-21:32:10.490853 7f0f61ad1700 Compacting 2@0 + 5@1 files
2013/06/21-21:32:10.490899 7f0f61ad1700 compacted to: files[ 2 5 49 504 361 0 0 ]
2013/06/21-21:32:10.491257 7f0f61ad1700 Compaction error: IO error: /var/lib/ceph/mon/ceph-a/store.db/009752.sst: Too many open files
2013/06/21-21:32:10.491388 7f0f61ad1700 Manual compaction at level-1 from 'paxos .. 'paxos; will stop at (end)
2013/06/21-21:32:10.491444 7f0f61ad1700 Manual compaction at level-2 from 'paxos .. 'paxos; will stop at (end)
And there are a zillion open leveldb .sst files.
Fix: increasing max_open_files.
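The ticket does not say which max_open_files setting was raised: it could be a process-wide file-descriptor limit, or leveldb's own `Options::max_open_files`, which caps how many table files leveldb keeps open. As a minimal sketch of the process-level side only, the Python below raises the soft `RLIMIT_NOFILE` limit toward a requested value without ever exceeding the hard limit. The helper name `raise_nofile_limit` is hypothetical, not something from Ceph.

```python
import resource


def raise_nofile_limit(wanted: int) -> int:
    """Raise the soft RLIMIT_NOFILE limit to `wanted`, capped at the hard limit.

    Hypothetical helper for illustration. Raising the soft limit up to the hard
    limit needs no special privileges; going past the hard limit would.
    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)  # never ask for more than the hard limit allows
    if soft == resource.RLIM_INFINITY or soft >= wanted:
        return soft  # already high enough; this helper never lowers the limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

A daemon would call something like `raise_nofile_limit(65536)` early in start-up, before its store opens many files. The helper only ever raises the limit, so calling it twice, or with a value below the current limit, is harmless.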
Triggered by the big suite.
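One plausible reading of the two leader log lines, sketched as a toy model rather than Ceph's actual Paxos code: the leader derives its next proposal from whatever last_committed it reads back from its store, and each peon checks that the leader's last_committed matches its own, in the spirit of the failed assert. The `Begin` class and both functions are hypothetical. If the leader commits 40875 but then reads back 40874, it proposes 40875 again, and a peon that has already committed 40875 rejects the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Begin:
    """Hypothetical stand-in for the leader's 'begin' proposal message."""
    version: int         # version being proposed
    last_committed: int  # leader's last_committed as read from its store


def leader_propose(stored_last_committed: int) -> Begin:
    # The leader proposes the version right after whatever its store reports.
    return Begin(version=stored_last_committed + 1,
                 last_committed=stored_last_committed)


def peon_handle_begin(peon_last_committed: int, begin: Begin) -> None:
    # Same idea as the failed check: both sides must agree on last_committed.
    # An explicit raise (not `assert`) so the check survives `python -O`.
    if begin.last_committed != peon_last_committed:
        raise AssertionError(
            f"begin.last_committed={begin.last_committed} "
            f"!= peon last_committed={peon_last_committed}")


# Healthy case: the leader reads back 40875 and proposes 40876.
peon_handle_begin(40875, leader_propose(40875))

# Stale read: the leader reads back 40874, re-proposes 40875, and the peon fails.
try:
    peon_handle_begin(40875, leader_propose(40874))
except AssertionError as err:
    print(err)  # prints: begin.last_committed=40874 != peon last_committed=40875
```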
Updated by Greg Farnum almost 11 years ago
Shouldn't that cause LevelDB to block or throw an error or something? I'm not quite sure how it leads to us not reading back what we've written.
Updated by Sage Weil almost 11 years ago
Greg Farnum wrote:
> Shouldn't that cause LevelDB to block or throw an error or something? I'm not quite sure how it leads to us not reading back what we've written.
I guess the problem is that the error happens in the background compaction thread, isn't handled, and everything goes to hell. Very disappointing.
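The behavior Greg expected can be sketched as a general pattern. This is a hypothetical Python design, not leveldb's actual code: a background job that fails records its error, and every later foreground write fails with that error instead of carrying on against a store that may no longer reflect earlier writes. `StickyErrorStore` and its methods are illustrative names only, and the failing compaction is simulated with an `OSError` carrying `EMFILE` ("Too many open files").

```python
import errno
import threading


class StickyErrorStore:
    """Hypothetical store that refuses writes once background work has failed."""

    def __init__(self):
        self._lock = threading.Lock()
        self._bg_error = None  # first error reported by a background job
        self._data = {}

    def _run_background(self, job):
        try:
            job()
        except OSError as e:
            with self._lock:
                if self._bg_error is None:  # keep only the first failure
                    self._bg_error = e

    def start_background(self, job):
        """Run `job` (e.g. a compaction) on a worker thread; return the thread."""
        t = threading.Thread(target=self._run_background, args=(job,))
        t.start()
        return t

    def put(self, key, value):
        with self._lock:
            if self._bg_error is not None:
                # Surface the background failure to the caller instead of
                # silently continuing.
                raise RuntimeError(
                    f"store unusable after background error: {self._bg_error}"
                ) from self._bg_error
            self._data[key] = value

    def get(self, key):
        with self._lock:
            return self._data.get(key)


def failing_compaction():
    # Simulates the compaction failure seen in the leveldb log.
    raise OSError(errno.EMFILE, "Too many open files")


store = StickyErrorStore()
store.put("last_committed", 40875)
store.start_background(failing_compaction).join()
try:
    store.put("last_committed", 40876)
except RuntimeError as err:
    print(err)  # prints: store unusable after background error: [Errno 24] Too many open files
```

Here the write fails loudly. In this ticket the failure mode was the opposite: the write appeared to succeed, and a later read returned the older value.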
Updated by Sage Weil almost 11 years ago
- Status changed from Fix Under Review to 15