Bug #6003: journal Unable to read past sequence 406 ... - Ceph - Ceph

Actions

Copy link

Bug #6003

closed

journal Unable to read past sequence 406 ...

Added by Sage Weil over 10 years ago. Updated about 9 years ago.

Status:

Resolved

Priority:

Urgent

Assignee:

Samuel Just

Category:

OSD

Target version:

% Done:

Source:

Q/A

Tags:

Backport:

dumpling, firefly, giant

Regression:

Severity:

2 - major

Reviewed:

Affected Versions:

ceph-qa-suite:

Pull request ID:

Crash signature (v1):

Crash signature (v2):

Description

fix : bae1f3eaa09c4747b8bfc6fb5dc673aa6989b695

2013-08-15 12:44:50.877768 7f51d4230780 -1 journal Unable to read past sequence 406 but header indicates the journal has committed up through 416, journal is corrupt
2013-08-15 12:44:50.901410 7f51d4230780 -1 os/FileJournal.cc: In function 'bool FileJournal::read_entry(ceph::bufferlist&, uint64_t&, bool*)' thread 7f51d4230780 time 2013-08-15 12:44:50.877826
os/FileJournal.cc: 1697: FAILED assert(0)

 ceph version 0.67-125-g4422f21 (4422f21a6586467a63ce6841552d0f60aa849cf1)
 1: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&, bool*)+0x7d6) [0x77e966]
 2: (FileJournal::open(unsigned long)+0x733) [0x77bf73]
 3: (JournalingObjectStore::journal_replay(unsigned long)+0x271) [0x7cad61]
 4: (FileStore::mount()+0x34f3) [0x7a1503]
 5: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x66dcba]
 6: (OSD::convertfs(std::string const&, std::string const&)+0x47) [0x66e737]
 7: (main()+0x1c13) [0x5d0e13]
 8: (__libc_start_main()+0xed) [0x7f51d1ccc76d]
 9: ceph-osd() [0x5d479d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

job was

ubuntu@teuthology:/a/teuthology-2013-08-15_01:00:05-rados-next-testing-basic-plana/107661$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 68d04c9a036bea12d76fdd51e926aef771e8e041
machine_type: plana
nuke-on-error: true
os_type: ubuntu
overrides:
  admin_socket:
    branch: next
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
    fs: xfs
    log-whitelist:
    - slow request
    sha1: 4422f21a6586467a63ce6841552d0f60aa849cf1
  ceph-deploy:
    branch:
      dev: next
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
  install:
    ceph:
      sha1: 4422f21a6586467a63ce6841552d0f60aa849cf1
  s3tests:
    branch: next
  workunit:
    sha1: 4422f21a6586467a63ce6841552d0f60aa849cf1
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 2
    chance_pgpnum_fix: 1
    timeout: 1200
- radosbench:
    clients:
    - client.0
    time: 1800
teuthology_branch: next

Files

Download all files

ceph-osd.3.log (27.8 MB) ceph-osd.3.log		David Zafman, 10/30/2013 09:03 PM
Log79set_5.zip (8.73 MB) Log79set_5.zip		Shambhu Rajak, 09/28/2014 11:02 PM

Related issues 5 (0 open — 5 closed)

Actions

Copy link

Updated by Ian Colle over 10 years ago

Assignee set to Samuel Just

Actions

Copy link

Updated by Sage Weil over 10 years ago

Status changed from New to Need More Info

Actions

Copy link

Updated by Ian Colle over 10 years ago

Run this tasks repeatedly with logging

Actions

Copy link

Updated by Sage Weil over 10 years ago

Assignee changed from Samuel Just to Sage Weil

Actions

Copy link

Updated by Sage Weil over 10 years ago

200 runs later and no luck reproducing this with logs.

/var/lib/teuthworker/archive/sage-bug-6003-a
200 passes

Actions

Copy link

Updated by Sage Weil over 10 years ago

Priority changed from Urgent to High

Actions

Copy link

Updated by Sage Weil over 10 years ago

Priority changed from High to Urgent

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-08-22_01:00:13-rados-next-testing-basic-plana/804

Actions

Copy link

Updated by Sage Weil over 10 years ago

teuthology-2013-08-23_01:00:10-rados-master-testing-basic-plana/1275

Actions

Copy link

Updated by Sage Weil over 10 years ago

ubuntu@teuthology:/a/teuthology-2013-09-04_17:24:00-rados-master-testing-basic-plana/21102

Actions

Copy link

#10

Updated by Sage Weil over 10 years ago

ubuntu@teuthology:/a/teuthology-2013-09-04_17:24:00-rados-master-testing-basic-plana/21102

Actions

Copy link

#11

Updated by Samuel Just over 10 years ago

ubuntu@teuthology:/a/teuthology-2013-10-10_22:04:00-rados-master-testing-basic-plana/44493/remote

Actions

Copy link

#12

Updated by Sage Weil over 10 years ago

Assignee deleted (~~Sage Weil~~)

Actions

Copy link

#13

Updated by Sage Weil over 10 years ago

ubuntu@teuthology:/a/teuthology-2013-10-23_19:00:21-rados-dumpling-testing-basic-plana/65408

Actions

Copy link

#14

Updated by Sage Weil over 10 years ago

ubuntu@teuthology:/a/teuthology-2013-10-27_19:00:21-rados-dumpling-testing-basic-plana/71126

Actions

Copy link

#15

Updated by David Zafman over 10 years ago

File ceph-osd.3.log ceph-osd.3.log added

I hit this same problem in recovery testing. Attached is the complete log.

Actions

Copy link

#16

Updated by Greg Farnum over 10 years ago

/a/teuthology-2013-10-28_23:01:45-kcephfs-master-testing-basic-plana/73927

Actions

Copy link

#17

Updated by Samuel Just over 10 years ago

slider:~samuelj/bug_logs/13-10-31-04:05:43/remote

With journal logging! I seem to be able to reproduce this pretty much constantly. ext4. David's was ext4 as well, iirc.

Actions

Copy link

#18

Updated by Samuel Just over 10 years ago

Kernel: 3.8.0-21-generic

Actions

Copy link

#19

Updated by Sage Weil over 10 years ago

Status changed from Need More Info to 12

Actions

Copy link

#20

Updated by David Zafman over 10 years ago

This looks like the same problem seen in a unit test.

2013-10-31T15:23:41.910 INFO:teuthology.orchestra.run.err:[10.214.132.32]: 2013-10-31 15:23:41.909749 7f87dfb7d780 1 journal _open /tmp/ceph_test_filejournal.tmp.558768921 fd 8: 209715200 bytes, block size 4096 bytes, directio = 1, aio = 0
2013-10-31T15:23:41.986 INFO:teuthology.orchestra.run.out:[10.214.132.32]: test/test_filejournal.cc:199: Failure
2013-10-31T15:23:41.986 INFO:teuthology.orchestra.run.out:[10.214.132.32]: Value of: j.read_entry(inbl, seq)
2013-10-31T15:23:41.986 INFO:teuthology.orchestra.run.out:[10.214.132.32]: Actual: false
2013-10-31T15:23:41.986 INFO:teuthology.orchestra.run.out:[10.214.132.32]: Expected: true
2013-10-31T15:23:41.986 INFO:teuthology.orchestra.run.err:[10.214.132.32]: 2013-10-31 15:23:41.985155 7f87dfb7d780 -1 journal Unable to read past sequence 0 but header indicates the journal has committed up through 1, journal is corrupt
2013-10-31T15:23:41.989 INFO:teuthology.orchestra.run.out:[10.214.132.32]: [ FAILED ] TestFileJournal.WriteManyVecs (355 ms)

Actions

Copy link

#21

Updated by Sage Weil over 10 years ago

Assignee set to Samuel Just

Actions

Copy link

#22

Updated by Samuel Just over 10 years ago

Status changed from 12 to Need More Info

af0c5b415ad5a863940add7c2e8f3c1ac8040ef2 should fix the most recent iteration, but it doesn't explain the instances prior to the typo.

Actions

Copy link

#23

Updated by Samuel Just over 10 years ago

Assignee deleted (~~Samuel Just~~)

Actions

Copy link

#24

Updated by Samuel Just over 10 years ago

Status changed from Need More Info to Can't reproduce

Actions

Copy link

#25

Updated by Sage Weil over 10 years ago

Status changed from Can't reproduce to 12

ubuntu@teuthology:/a/teuthology-2013-12-28_23:00:01-rados-next-distro-basic-plana/18074

    -4> 2013-12-29 08:27:34.549948 7f602df89780  2 journal open /var/lib/ceph/osd/ceph-0/journal fsid dad1a3cb-5fce-44d2-ae19-1df8090c97fd fs_op_seq 1285
    -3> 2013-12-29 08:27:34.549978 7f602df89780 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
    -2> 2013-12-29 08:27:34.549998 7f602df89780  1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 20: 104857600 bytes, block size 4096 bytes, directio = 1, aio = 0
    -1> 2013-12-29 08:27:34.571046 7f602df89780 -1 journal Unable to read past sequence 1274 but header indicates the journal has committed up through 1280, journal is corrupt
     0> 2013-12-29 08:27:34.573614 7f602df89780 -1 os/FileJournal.cc: In function 'bool FileJournal::read_entry(ceph::bufferlist&, uint64_t&, bool*)' thread 7f602df89780 time 2013-12-29 08:27:34.571062
os/FileJournal.cc: 1697: FAILED assert(0)

Actions

Copy link

#26

Updated by Sage Weil about 10 years ago

ubuntu@teuthology:/a/sage-2014-02-08_15:50:23-rados:thrash-wip-agent-testing-basic-plana/72854

Actions

Copy link

#27

Updated by Sage Weil about 10 years ago

Status changed from 12 to Need More Info

Actions

Copy link

#28

Updated by David Zafman about 10 years ago

Status changed from Need More Info to 12

/a/dzafman-2014-02-28_12:09:58-rados:thrash-wip-7458-testing-basic-plana/111738

2014-03-01 02:09:40.455986 7f2b0c373780 10 filestore(/var/lib/ceph/osd/ceph-2) open_journal at /var/lib/ceph/osd/ceph-2/journal
2014-03-01 02:09:40.456008 7f2b0c373780  0 filestore(/var/lib/ceph/osd/ceph-2) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2014-03-01 02:09:40.456014 7f2b0c373780 10 filestore(/var/lib/ceph/osd/ceph-2) list_collections
2014-03-01 02:09:40.456462 7f2b0c373780 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2014-03-01 02:09:40.456498 7f2b0c373780  1 journal _open /var/lib/ceph/osd/ceph-2/journal fd 20: 104857600 bytes, block size 4096 bytes, directio = 1, aio = 0
2014-03-01 02:09:40.474869 7f2b0c373780 -1 journal Unable to read past sequence 958 but header indicates the journal has committed up through 967, journal is corrupt
2014-03-01 02:09:40.489832 7f2b0c373780 -1 os/FileJournal.cc: In function 'bool FileJournal::read_entry(ceph::bufferlist&, uint64_t&, bool*)' thread 7f2b0c373780 time 2014-03-01 02:09:40.474886
os/FileJournal.cc: 1673: FAILED assert(0)

 ceph version 0.77-618-g123ff9e (123ff9e18a27c6517a291aab1e0613b27d70d711)
 1: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&, bool*)+0x86f) [0x9244af]
 2: (FileJournal::open(unsigned long)+0x6ad) [0x921a7d]
 3: (JournalingObjectStore::journal_replay(unsigned long)+0x14f) [0x779b6f]
 4: (FileStore::mount()+0x30c2) [0x74e812]
 5: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x6243ba]
 6: (main()+0x1d88) [0x60e7b8]
 7: (__libc_start_main()+0xed) [0x7f2b09fd076d]
 8: ceph-osd() [0x6129b9]

Actions

Copy link

#29

Updated by Sage Weil about 10 years ago

Severity changed from 3 - minor to 2 - major

Actions

Copy link

#30

Updated by Sage Weil about 10 years ago

Status changed from 12 to Need More Info

I think we need a log for this.. hopefully we will hit it now that things are cranked up on the nightlies?

Actions

Copy link

#31

Updated by Samuel Just about 10 years ago

Status changed from Need More Info to 12

Actions

Copy link

#32

Updated by Sage Weil about 10 years ago

Status changed from 12 to Need More Info

Actions

Copy link

#33

Updated by Sage Weil almost 10 years ago

Priority changed from Urgent to High

Actions

Copy link

#34

Updated by Sage Weil almost 10 years ago

ubuntu@teuthology:/a/teuthology-2014-06-15_02:30:07-rados-next-testing-basic-plana/308538 (btrfs)

the problem in this case looks like premature trimming. the filestore start seq is X and the first journal entry is X+20 or something. appears that perhaps the btrfs snap wait method returned before it was really durable?

Actions

Copy link

#35

Updated by Sage Weil over 9 years ago

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-07-27_02:30:01-rados-next-testing-basic-plana/380261
rados/thrash/{clusters/fixed-2.yaml fs/ext4.yaml msgr-failures/osd-delay.yaml thrashers/default.yaml workloads/radosbench.yaml}

this time on ext4, not btrfs.

The header prints out valid info:

     -6> 2014-07-27 15:46:20.583861 7f92aa089780 10 journal read_header
    -5> 2014-07-27 15:46:20.597188 7f92aa089780 10 journal header: block_size 4096 alignment 4096 max_size 104857600
    -4> 2014-07-27 15:46:20.597210 7f92aa089780 10 journal header: start 100716544
    -3> 2014-07-27 15:46:20.597213 7f92aa089780 10 journal  write_pos 4096
    -2> 2014-07-27 15:46:20.597220 7f92aa089780 10 journal open header.fsid = 0d6e2a30-a9e5-4c1d-ba4c-fd203e3cc64f

but a few lines down when we start reading entries we fail. printing the header shows bad info:

$27 = {flags = 41735368, fsid = {uuid = "\000\214{\002\000\000\000\000\300S\231\002\000\000\000"}, block_size = 15938360, alignment = 0, 
  max_size = 140736923900312, start = 140267867036691, committed_up_to = 0, start_seq = 41651200}

including a corrupt stack from

#11 0x00000000009815ef in FileJournal::read_entry (this=0x53b, bl=..., next_seq=<optimized out>, corrupt=0x0) at os/FileJournal.cc:1677
1677    in os/FileJournal.cc

some of FileJournal fields are okay...

(gdb) p write_pos
$21 = 10726185

others come from a bad header

(gdb) p read_pos
$29 = 140736923900312
(gdb) p header.start
$30 = 140267867036691

and other are also corrupt. fn follows header:

(gdb) p fn
$31 = {static npos = <optimized out>, 
  _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, 
    _M_p = 0x27cda80 "\275P\274"}}

and these precede it by a bunch

(gdb) p journaled_seq
$33 = 140267894734912
(gdb) p /x journaled_seq
$34 = 0x7f92aa0a0040
(gdb) p finisher_lock
$35 = {name = 0x0, id = -564455648, recursive = 255, lockdep = 127, backtrace = false, _m = {__data = {__lock = 1315, __count = 0, 
      __owner = 1698063408, __nusers = 808673586, __kind = 1698259245, __spins = 1664363829, __list = {__prev = 0x2d633461622d6431, 
        __next = 0x6333653330326466}}, __size = "#\005\000\000\000\000\000\000\060d6e2a30-a9e5-4c1d-ba4c-fd203e3c", __align = 1315}, 
  nlock = 1714697827, locked_by = 15961913567585126656, cct = 0x27bb230, logger = 0x27cd1d8}

and immediately prior is

 (gdb) p completions_lock
$39 = {name = 0x8 <Address 0x8 out of bounds>, id = -564455576, recursive = 255, lockdep = 127, backtrace = false, _m = {__data = {
      __lock = -1467469952, __count = 32658, __owner = 0, __nusers = 0, __kind = 41812256, __spins = 0, __list = {__prev = 0x27d76f0, 
        __next = 0x7f92a8883200}}, 
    __size = "\200/\210\250\222\177\000\000\000\000\000\000\000\000\000\000 \001~\002\000\000\000\000\360v}\002\000\000\000\000\000\062\210\250\222\177\000", __align = 140267869450112}, nlock = -1467468512, locked_by = 140267869451536, cct = 0x27ef480, logger = 0x27cd4c0}
(gdb) p completions
$40 = {<std::_Deque_base<FileJournal::completion_item, std::allocator<FileJournal::completion_item> >> = {
    _M_impl = {<std::allocator<FileJournal::completion_item>> = {<__gnu_cxx::new_allocator<FileJournal::completion_item>> = {<No data fields>}, <No data fields>}, _M_map = 0x27cd4c0, _M_map_size = 9215206, _M_start = {_M_cur = 0x0, _M_first = 0x27cd420, _M_last = 0x27cd420, 
        _M_node = 0x27cd4c0}, _M_finish = {_M_cur = 0x0, _M_first = 0x0, _M_last = 0x2995408, _M_node = 0x8c7c34}}}, <No data fields>}

so.. looks like something scribbled all over memory... ? :/

Actions

Copy link

#36

Updated by Sage Weil over 9 years ago

Priority changed from High to Urgent

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-08-01_02:32:01-rados-master-testing-basic-plana/392342

description: rados/thrash/{clusters/fixed-2.yaml fs/xfs.yaml msgr-failures/few.yaml thrashers/default.yaml workloads/radosbench.yaml}

this time, the read_entry frame is partly corrupt, too: this is bad, although the other fields look ok:

#11 0x0000000000984eff in FileJournal::read_entry (this=0x59e, bl=..., next_seq=<optimized out>, corrupt=0x0) at os/FileJournal.cc:1678
1678    os/FileJournal.cc: No such file or directory.
(gdb) p this
$13 = (FileJournal * const) 0x59e
(gdb) p pos
$14 = 44053728
(gdb) p next_pos
$15 = 97988608
(gdb) p seq
$16 = 1413
(gdb) up
#12 0x00000000009824cd in FileJournal::open (this=0x7fff89c59f30, fs_op_seq=1444) at os/FileJournal.cc:514
514     in os/FileJournal.cc
(gdb) p header
$17 = {flags = 43775176, fsid = {uuid = "\000\\\232\002\000\000\000\000\300O\246\002\000\000\000"}, block_size = 15946952, alignment = 0, max_size = 140735504818856, 
  start = 140553230486547, committed_up_to = 0, start_seq = 43670528}

*this contents still screwed up, though!

Actions

Copy link

#37

Updated by Shambhu Rajak over 9 years ago

File Log79set_5.zip Log79set_5.zip added

I saw similar issue hitting my 3 node cluster:
node1 - mon.0
        osd.0
        osd.1
        osd.2
node2 - mon.1
        osd.3
        osd.4
        osd.5
node3 - mon.2
        client.0

Platform: Ubuntu 14.04 Trusty

Snippet from teuthology Log:

While reviving osd.4 saw error os/FileJournal.cc: 1693: FAILED assert(0)

Snippet from teuthology log(complete Teuthology attached Log79set_5.zip):

2014-09-15T15:02:46.135 INFO:teuthology.task.thrashosds.thrasher:Reviving osd 4
2014-09-15T15:02:46.135 INFO:teuthology.task.ceph.osd.4:Restarting
2014-09-15T15:02:46.135 DEBUG:teuthology.orchestra.run:Running [10.15.16.60]: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-osd -f -i 4'
2014-09-15T15:02:46.693 INFO:teuthology.task.ceph.osd.4:Started
2014-09-15T15:02:46.694 DEBUG:teuthology.orchestra.run:Running [10.15.16.60]: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.4.asok dump_ops_in_flight'
2014-09-15T15:02:46.758 INFO:teuthology.task.ceph.osd.4.out:[10.15.16.60]: starting osd.4 at :/0 osd_data /var/lib/ceph/osd/ceph-4 /var/lib/ceph/osd/ceph-4/journal
2014-09-15T15:02:46.831 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]: 2014-09-15 09:32:46.701026 7f3be6f96900 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2014-09-15T15:02:46.831 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]: 2014-09-15 09:32:46.701614 7f3be6f96900 -1 journal Unable to read past sequence 1738 but header indicates the journal has committed up through 1774, journal is corrupt
2014-09-15T15:02:46.866 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]: os/FileJournal.cc: In function 'bool FileJournal::read_entry(ceph::bufferlist&, uint64_t&, bool*)' thread 7f3be6f96900 time 2014-09-15 09:32:46.701625
2014-09-15T15:02:46.866 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]: os/FileJournal.cc: 1693: FAILED assert(0)
2014-09-15T15:02:46.874 INFO:teuthology.task.radosbench.radosbench.0.out:[10.15.16.59]:    336      16      1790      1774   21.1158         0         -   2.98315
2014-09-15T15:02:46.883 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  ceph version 0.84-sd-sprint4 (3215c520e1306f50d0094b5646636c02456c9df4)
2014-09-15T15:02:46.884 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xb776db]
2014-09-15T15:02:46.884 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  2: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&, bool*)+0xaa4) [0xa3ec34]
2014-09-15T15:02:46.884 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  3: (FileJournal::open(unsigned long)+0x66d) [0xa4055d]
2014-09-15T15:02:46.884 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  4: (JournalingObjectStore::journal_replay(unsigned long)+0x103) [0x92e9b3]
2014-09-15T15:02:46.885 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  5: (FileStore::mount()+0x3eaa) [0x902e3a]
2014-09-15T15:02:46.885 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  6: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x66b9ba]
2014-09-15T15:02:46.885 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  7: (main()+0x23ee) [0x64379e]
2014-09-15T15:02:46.885 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  8: (__libc_start_main()+0xf5) [0x7f3be3f4dec5]
2014-09-15T15:02:46.885 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  9: ceph-osd() [0x65bc07]
2014-09-15T15:02:46.885 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2014-09-15T15:02:46.886 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]: 2014-09-15 09:32:46.741006 7f3be6f96900 -1 os/FileJournal.cc: In function 'bool FileJournal::read_entry(ceph::bufferlist&, uint64_t&, bool*)' thread 7f3be6f96900 time 2014-09-15 09:32:46.701625
2014-09-15T15:02:46.886 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]: os/FileJournal.cc: 1693: FAILED assert(0)
2014-09-15T15:02:46.886 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]: 
2014-09-15T15:02:46.886 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  ceph version 0.84-sd-sprint4 (3215c520e1306f50d0094b5646636c02456c9df4)
2014-09-15T15:02:46.887 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xb776db]
2014-09-15T15:02:46.887 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  2: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&, bool*)+0xaa4) [0xa3ec34]
2014-09-15T15:02:46.887 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  3: (FileJournal::open(unsigned long)+0x66d) [0xa4055d]
2014-09-15T15:02:46.887 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  4: (JournalingObjectStore::journal_replay(unsigned long)+0x103) [0x92e9b3]
2014-09-15T15:02:46.887 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  5: (FileStore::mount()+0x3eaa) [0x902e3a]
2014-09-15T15:02:46.888 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  6: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x66b9ba]
2014-09-15T15:02:46.888 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  7: (main()+0x23ee) [0x64379e]
2014-09-15T15:02:46.888 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  8: (__libc_start_main()+0xf5) [0x7f3be3f4dec5]
2014-09-15T15:02:46.888 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  9: ceph-osd() [0x65bc07]
2014-09-15T15:02:46.888 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2014-09-15T15:02:46.888 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]: 
2014-09-15T15:02:47.058 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:     -3> 2014-09-15 09:32:46.701026 7f3be6f96900 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2014-09-15T15:02:47.058 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:     -1> 2014-09-15 09:32:46.701614 7f3be6f96900 -1 journal Unable to read past sequence 1738 but header indicates the journal has committed up through 1774, journal is corrupt
2014-09-15T15:02:47.059 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:      0> 2014-09-15 09:32:46.741006 7f3be6f96900 -1 os/FileJournal.cc: In function 'bool FileJournal::read_entry(ceph::bufferlist&, uint64_t&, bool*)' thread 7f3be6f96900 time 2014-09-15 09:32:46.701625
2014-09-15T15:02:47.059 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]: os/FileJournal.cc: 1693: FAILED assert(0)
2014-09-15T15:02:47.059 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]: 
2014-09-15T15:02:47.060 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  ceph version 0.84-sd-sprint4 (3215c520e1306f50d0094b5646636c02456c9df4)
2014-09-15T15:02:47.060 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xb776db]
2014-09-15T15:02:47.060 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  2: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&, bool*)+0xaa4) [0xa3ec34]
2014-09-15T15:02:47.060 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  3: (FileJournal::open(unsigned long)+0x66d) [0xa4055d]
2014-09-15T15:02:47.060 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  4: (JournalingObjectStore::journal_replay(unsigned long)+0x103) [0x92e9b3]
2014-09-15T15:02:47.061 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  5: (FileStore::mount()+0x3eaa) [0x902e3a]
2014-09-15T15:02:47.061 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  6: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x66b9ba]
2014-09-15T15:02:47.061 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  7: (main()+0x23ee) [0x64379e]
2014-09-15T15:02:47.061 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  8: (__libc_start_main()+0xf5) [0x7f3be3f4dec5]
2014-09-15T15:02:47.061 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  9: ceph-osd() [0x65bc07]
2014-09-15T15:02:47.061 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2014-09-15T15:02:47.062 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]: 
2014-09-15T15:02:47.062 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]: terminate called after throwing an instance of 'ceph::FailedAssertion'
2014-09-15T15:02:47.062 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]: *** Caught signal (Aborted) **
2014-09-15T15:02:47.062 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  in thread 7f3be6f96900
2014-09-15T15:02:47.062 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  ceph version 0.84-sd-sprint4 (3215c520e1306f50d0094b5646636c02456c9df4)
2014-09-15T15:02:47.063 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  1: ceph-osd() [0xa8c7ff]
2014-09-15T15:02:47.063 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  2: (()+0x10340) [0x7f3be5c49340]
2014-09-15T15:02:47.063 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  3: (gsignal()+0x39) [0x7f3be3f62bb9]
2014-09-15T15:02:47.063 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  4: (abort()+0x148) [0x7f3be3f65fc8]
2014-09-15T15:02:47.063 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f3be486e6b5]
2014-09-15T15:02:47.063 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  6: (()+0x5e836) [0x7f3be486c836]
2014-09-15T15:02:47.064 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  7: (()+0x5e863) [0x7f3be486c863]
2014-09-15T15:02:47.064 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  8: (()+0x5eaa2) [0x7f3be486caa2]
2014-09-15T15:02:47.064 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0xb778c8]
2014-09-15T15:02:47.064 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  10: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&, bool*)+0xaa4) [0xa3ec34]
2014-09-15T15:02:47.064 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  11: (FileJournal::open(unsigned long)+0x66d) [0xa4055d]
2014-09-15T15:02:47.064 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  12: (JournalingObjectStore::journal_replay(unsigned long)+0x103) [0x92e9b3]
2014-09-15T15:02:47.065 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  13: (FileStore::mount()+0x3eaa) [0x902e3a]
2014-09-15T15:02:47.065 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  14: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x66b9ba]
2014-09-15T15:02:47.065 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  15: (main()+0x23ee) [0x64379e]
2014-09-15T15:02:47.065 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  16: (__libc_start_main()+0xf5) [0x7f3be3f4dec5]
2014-09-15T15:02:47.065 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  17: ceph-osd() [0x65bc07]
2014-09-15T15:02:47.104 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]: 2014-09-15 09:32:46.767054 7f3be6f96900 -1 *** Caught signal (Aborted) **
2014-09-15T15:02:47.104 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  in thread 7f3be6f96900
2014-09-15T15:02:47.104 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]: 
2014-09-15T15:02:47.105 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  ceph version 0.84-sd-sprint4 (3215c520e1306f50d0094b5646636c02456c9df4)
2014-09-15T15:02:47.105 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  1: ceph-osd() [0xa8c7ff]
2014-09-15T15:02:47.105 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  2: (()+0x10340) [0x7f3be5c49340]
2014-09-15T15:02:47.105 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  3: (gsignal()+0x39) [0x7f3be3f62bb9]
2014-09-15T15:02:47.105 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  4: (abort()+0x148) [0x7f3be3f65fc8]
2014-09-15T15:02:47.106 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f3be486e6b5]
2014-09-15T15:02:47.106 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  6: (()+0x5e836) [0x7f3be486c836]
2014-09-15T15:02:47.106 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  7: (()+0x5e863) [0x7f3be486c863]
2014-09-15T15:02:47.106 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  8: (()+0x5eaa2) [0x7f3be486caa2]
2014-09-15T15:02:47.106 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0xb778c8]
2014-09-15T15:02:47.106 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  10: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&, bool*)+0xaa4) [0xa3ec34]
2014-09-15T15:02:47.107 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  11: (FileJournal::open(unsigned long)+0x66d) [0xa4055d]
2014-09-15T15:02:47.107 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  12: (JournalingObjectStore::journal_replay(unsigned long)+0x103) [0x92e9b3]
2014-09-15T15:02:47.107 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  13: (FileStore::mount()+0x3eaa) [0x902e3a]
2014-09-15T15:02:47.107 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  14: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x66b9ba]
2014-09-15T15:02:47.107 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  15: (main()+0x23ee) [0x64379e]
2014-09-15T15:02:47.108 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  16: (__libc_start_main()+0xf5) [0x7f3be3f4dec5]
2014-09-15T15:02:47.108 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  17: ceph-osd() [0x65bc07]
2014-09-15T15:02:47.108 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2014-09-15T15:02:47.108 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]: 
2014-09-15T15:02:47.110 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:      0> 2014-09-15 09:32:46.767054 7f3be6f96900 -1 *** Caught signal (Aborted) **
2014-09-15T15:02:47.110 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  in thread 7f3be6f96900
2014-09-15T15:02:47.110 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]: 
2014-09-15T15:02:47.110 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  ceph version 0.84-sd-sprint4 (3215c520e1306f50d0094b5646636c02456c9df4)
2014-09-15T15:02:47.110 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  1: ceph-osd() [0xa8c7ff]
2014-09-15T15:02:47.111 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  2: (()+0x10340) [0x7f3be5c49340]
2014-09-15T15:02:47.111 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  3: (gsignal()+0x39) [0x7f3be3f62bb9]
2014-09-15T15:02:47.111 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  4: (abort()+0x148) [0x7f3be3f65fc8]
2014-09-15T15:02:47.111 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f3be486e6b5]
2014-09-15T15:02:47.111 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  6: (()+0x5e836) [0x7f3be486c836]
2014-09-15T15:02:47.111 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  7: (()+0x5e863) [0x7f3be486c863]
2014-09-15T15:02:47.112 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  8: (()+0x5eaa2) [0x7f3be486caa2]
2014-09-15T15:02:47.112 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0xb778c8]
2014-09-15T15:02:47.112 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  10: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&, bool*)+0xaa4) [0xa3ec34]
2014-09-15T15:02:47.112 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  11: (FileJournal::open(unsigned long)+0x66d) [0xa4055d]
2014-09-15T15:02:47.112 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  12: (JournalingObjectStore::journal_replay(unsigned long)+0x103) [0x92e9b3]
2014-09-15T15:02:47.112 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  13: (FileStore::mount()+0x3eaa) [0x902e3a]
2014-09-15T15:02:47.113 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  14: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x66b9ba]
2014-09-15T15:02:47.113 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  15: (main()+0x23ee) [0x64379e]
2014-09-15T15:02:47.113 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  16: (__libc_start_main()+0xf5) [0x7f3be3f4dec5]
2014-09-15T15:02:47.113 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  17: ceph-osd() [0x65bc07]
2014-09-15T15:02:47.113 INFO:teuthology.task.ceph.osd.4.err:[10.15.16.60]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Actions

Copy link

#38

Updated by Sage Weil over 9 years ago

    -1> 2014-10-06 16:29:02.286272 7f2368425780 -1 journal Unable to read past sequence 39581 but header indicates the journal has committed up through 39647, journal is corrupt
     0> 2014-10-06 16:29:02.297264 7f2368425780 -1 os/FileJournal.cc: In function 'bool FileJournal::read_entry(ceph::bufferlist&, uint64_t&, bool*)' thread 7f2368425780 time 2014-10-06 16:29:02.286293
os/FileJournal.cc: 1677: FAILED assert(0)

 ceph version 0.80.6-48-g884f7c4 (884f7c40c4a28d519847d3995c8d98e5837ceaf0)
 1: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&, bool*)+0x86f) [0x97918f]
 2: (FileJournal::open(unsigned long)+0x6b5) [0x9766f5]
 3: (JournalingObjectStore::journal_replay(unsigned long)+0x26d) [0x8bd39d]
 4: (FileStore::mount()+0x30c2) [0x890792]
 5: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x61fd8a]
 6: (main()+0x1d88) [0x6079b8]
 7: (__libc_start_main()+0xed) [0x7f236622776d]
 8: ceph-osd() [0x60bca9]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

ubuntu@teuthology:/a/teuthology-2014-10-05_10:00:04-upgrade:firefly-firefly-distro-basic-multi/528734

Actions

Copy link

#39

Updated by Sage Weil over 9 years ago

ubuntu@teuthology:/a/sage-2014-11-20_17:03:30-rados:thrash-wip-watch-notify-distro-basic-multi/611427

Actions

Copy link

#40

Updated by Sage Weil over 9 years ago

ubuntu@teuthology:/a/sage-2014-12-01_11:11:52-rados-next-distro-basic-multi/629379

Actions

Copy link

#41

Updated by Loïc Dachary over 9 years ago

http://qa-proxy.ceph.com/teuthology/sage-2015-01-06_09:44:19-rados-wip-sage-testing-firefly---basic-multi/688065/

   -13> 2015-01-07 16:52:19.063303 7f6a93337780 10 filestore(/var/lib/ceph/osd/ceph-2) list_collections
   -12> 2015-01-07 16:52:19.088037 7f6a93337780 10 journal journal_replay fs op_seq 2835
   -11> 2015-01-07 16:52:19.088052 7f6a93337780  2 journal open /var/lib/ceph/osd/ceph-2/journal fsid 5af9380a-a5a6-4ca2-9310-311ee01ac6cc fs_op_seq 2835
   -10> 2015-01-07 16:52:19.088074 7f6a93337780 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
    -9> 2015-01-07 16:52:19.088082 7f6a93337780 10 journal _open journal is not a block device, NOT checking disk write cache on '/var/lib/ceph/osd/ceph-2/journal'
    -8> 2015-01-07 16:52:19.088084 7f6a93337780  1 journal _open /var/lib/ceph/osd/ceph-2/journal fd 20: 104857600 bytes, block size 4096 bytes, directio = 1, aio = 0
    -7> 2015-01-07 16:52:19.088094 7f6a93337780 10 journal read_header
    -6> 2015-01-07 16:52:19.088099 7f6a8dd70700 20 filestore(/var/lib/ceph/osd/ceph-2) sync_entry waiting for max_interval 5.000000
    -5> 2015-01-07 16:52:19.092091 7f6a93337780 10 journal header: block_size 4096 alignment 4096 max_size 104857600
    -4> 2015-01-07 16:52:19.092105 7f6a93337780 10 journal header: start 101089280
    -3> 2015-01-07 16:52:19.092107 7f6a93337780 10 journal  write_pos 4096
    -2> 2015-01-07 16:52:19.092111 7f6a93337780 10 journal open header.fsid = 5af9380a-a5a6-4ca2-9310-311ee01ac6cc
    -1> 2015-01-07 16:52:19.096659 7f6a93337780 -1 journal Unable to read past sequence 2751 but header indicates the journal has committed up through 2778, journal is corrupt
     0> 2015-01-07 16:52:19.111136 7f6a93337780 -1 os/FileJournal.cc: In function 'bool FileJournal::read_entry(ceph::bufferlist&, uint64_t&, bool*)' thread 7f6a93337780 time 2015-01-07 16:52:19.096681
os/FileJournal.cc: 1693: FAILED assert(0)

 ceph version 0.80.7-185-gf43f147 (f43f14748d5efd382edc7f27104f29898e39b0d5)
 1: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&, bool*)+0x86f) [0x978c8f]
 2: (FileJournal::open(unsigned long)+0x6b5) [0x9761f5]
 3: (JournalingObjectStore::journal_replay(unsigned long)+0x26d) [0x8bca8d]
 4: (FileStore::mount()+0x30c2) [0x88fe82]
 5: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x61f30a]
 6: (main()+0x1d88) [0x606db8]
 7: (__libc_start_main()+0xed) [0x7f6a9113a76d]
 8: ceph-osd() [0x60b0a9]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Actions

Copy link

#42

Updated by Samuel Just about 9 years ago

Status changed from Need More Info to 7
Assignee set to Samuel Just

The journalq is being populated with positions one sequence number off in do_read_entry. If we trim committed_thru to a sequence number within the journalq portion populated during replay, we end up with a header.start one forward of where it should be causing FileJournal::open() to assume that it's at the end of the journal. Subsequent calls into read_entry will be with the wrong offset/sequence number combo causing the assert to fail.

Actions

Copy link

#43