Project

General

Profile

Actions

Bug #4527

closed

journal Unable to read past sequence 0 but header indicates the journal has committed up through 1437, journal is corrupt

Added by Sage Weil about 11 years ago. Updated about 11 years ago.

Status:
Resolved
Priority:
Immediate
Assignee:
Category:
OSD
Target version:
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2013-03-21T23:16:53.913 INFO:teuthology.task.ceph.osd.5.out:starting osd.5 at :/0 osd_data /var/lib/ceph/osd/ceph-5 /var/lib/ceph/osd/ceph-5/journal
2013-03-21T23:16:54.651 INFO:teuthology.task.ceph.osd.5.err:2013-03-21 23:16:52.780108 7f1624bd4780 -1 journal Unable to read past sequence 0 but header indicates the journal has committed up through 1437, journal is corrupt
2013-03-21T23:16:54.665 INFO:teuthology.task.ceph.osd.5.err:os/FileJournal.cc: In function 'bool FileJournal::read_entry(ceph::bufferlist&, uint64_t&, bool*)' thread 7f1624bd4780 time 2013-03-21 23:16:52.780134
2013-03-21T23:16:54.665 INFO:teuthology.task.ceph.osd.5.err:os/FileJournal.cc: 1697: FAILED assert(0)
2013-03-21T23:16:54.682 INFO:teuthology.task.ceph.osd.5.err: ceph version 0.59-384-gd67eee1 (d67eee1d11d313817fae5a672ab835777a6601e2)
2013-03-21T23:16:54.682 INFO:teuthology.task.ceph.osd.5.err: 1: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&, bool*)+0xcbb) [0x70d16b]
2013-03-21T23:16:54.683 INFO:teuthology.task.ceph.osd.5.err: 2: (FileJournal::open(unsigned long)+0x6c3) [0x70a303]
2013-03-21T23:16:54.683 INFO:teuthology.task.ceph.osd.5.err: 3: (JournalingObjectStore::journal_replay(unsigned long)+0x271) [0x753bf1]
2013-03-21T23:16:54.683 INFO:teuthology.task.ceph.osd.5.err: 4: (FileStore::mount()+0x38f4) [0x738aa4]
2013-03-21T23:16:54.683 INFO:teuthology.task.ceph.osd.5.err: 5: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x605a9a]
2013-03-21T23:16:54.683 INFO:teuthology.task.ceph.osd.5.err: 6: (OSD::convertfs(std::string const&, std::string const&)+0x47) [0x606507]
2013-03-21T23:16:54.683 INFO:teuthology.task.ceph.osd.5.err: 7: (main()+0x2142) [0x57b3d2]
2013-03-21T23:16:54.684 INFO:teuthology.task.ceph.osd.5.err: 8: (__libc_start_main()+0xed) [0x7f162267476d]
2013-03-21T23:16:54.684 INFO:teuthology.task.ceph.osd.5.err: 9: ceph-osd() [0x57db3d]

job was
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-03-21_01:00:32-rbd-next-testing-basic/640$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 06fb6a9f87bb1377a6549602fff230d4b352afe9
nuke-on-error: true
overrides:
  ceph:
    conf:
      client:
        rbd cache: true
        rbd cache max dirty: 0
      global:
        ms inject socket failures: 5000
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: d67eee1d11d313817fae5a672ab835777a6601e2
  s3tests:
    branch: next
  workunit:
    sha1: d67eee1d11d313817fae5a672ab835777a6601e2
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    timeout: 1200
- rbd_fsx:
    clients:
    - client.0
    ops: 2000

Related issues 1 (0 open1 closed)

Has duplicate Ceph - Bug #4487: osd: FAILED assert(0 == "FileJournal::read_entry(): corrupt journal")DuplicateSamuel Just03/18/2013

Actions
Actions #1

Updated by Sage Weil about 11 years ago

  • Status changed from New to 12

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-03-28_01:01:45-rados-next-testing-basic/4834

Actions #2

Updated by Sage Weil about 11 years ago

  • Priority changed from Urgent to Immediate

ubuntu@teuthology:/a/teuthology-2013-03-30_01:00:05-rados-next-testing-basic/6231

Actions #3

Updated by Sage Weil about 11 years ago

ubuntu@teuthology:/a/teuthology-2013-03-30_01:00:05-rados-next-testing-basic/6250

Actions #4

Updated by Sage Weil about 11 years ago

ubuntu@teuthology:/a/teuthology-2013-03-30_01:00:30-rbd-next-testing-basic/6357

Actions #5

Updated by Ian Colle about 11 years ago

  • Assignee set to Samuel Just
Actions #6

Updated by Ian Colle about 11 years ago

  • Target version set to v0.61 - Cuttlefish
Actions #7

Updated by Samuel Just about 11 years ago

2013-04-03 13:31:00.938602 7f083ffef700 5 journal submit_entry seq 927 len 4195890 (0x7f082c0126f0)
2013-04-03 13:31:00.938603 7f083ffef700 30 journal XXX throttle take 4195890
2013-04-03 13:31:00.938620 7f083ffef700 5 filestore(/var/lib/ceph/osd/ceph-3) queue_op 0x7f082c051f90 seq 927 osr(0.14 0x7f08140fac00) 4195878 bytes (queue has 1 ops and 4195878 bytes)
2013-04-03 13:31:00.938627 7f083ffef700 10 journal op_submit_finish 927
2013-04-03 13:31:00.938624 7f085effd700 20 journal write_thread_entry woke up
2013-04-03 13:31:00.938630 7f083ffef700 10 osd.3 10 dequeue_op 0x7f07e404e230 finish
2013-04-03 13:31:00.938633 7f085effd700 10 journal room 104853503 max_size 104857600 pos 42311680 header.start 42311680 top 4096
2013-04-03 13:31:00.938637 7f085effd700 10 journal check_for_full at 42311680 : 4202496 < 104853503
2013-04-03 13:31:00.938639 7f0846ffd700 10 journal op_apply_start 927 open_ops 0 > 1
2013-04-03 13:31:00.938639 7f085effd700 15 journal prepare_single_write 1 will write 42311680 : seq 927 len 4195890 -> 4202496 (head 40 pre_pad 2957 ebl 4195890 post_pad 3569 tail 40) (ebl alignment 2997)
2013-04-03 13:31:00.938644 7f0846ffd700 5 filestore(/var/lib/ceph/osd/ceph-3) do_op 0x7f082c051f90 seq 927 osr(0.14 0x7f08140fac00)/0x7f08140fac00 start
2013-04-03 13:31:00.938648 7f0846ffd700 10 filestore(/var/lib/ceph/osd/ceph-3) _do_transaction on 0x7f082c002c20
2013-04-03 13:31:00.938654 7f0846ffd700 15 filestore(/var/lib/ceph/osd/ceph-3) _omap_setkeys meta/5172cfcb/pglog_0.14/0//-1
2013-04-03 13:31:00.938759 7f0846ffd700 10 filestore hoid: 5172cfcb/pglog_0.14/0//-1 not skipping op, *spos 927.0.0
2013-04-03 13:31:00.938762 7f0846ffd700 10 filestore > header.spos 0.0.0
2013-04-03 13:31:00.938820 7f0846ffd700 15 filestore(/var/lib/ceph/osd/ceph-3) _omap_setkeys meta/16ef7597/infos/head//-1
2013-04-03 13:31:00.938883 7f0846ffd700 10 filestore hoid: 16ef7597/infos/head//-1 not skipping op, *spos 927.0.1
2013-04-03 13:31:00.938888 7f0846ffd700 10 filestore > header.spos 0.0.0
2013-04-03 13:31:00.938911 7f0846ffd700 10 filestore(/var/lib/ceph/osd/ceph-3) _do_transaction on 0x7f082c002b68
2013-04-03 13:31:00.938916 7f0846ffd700 15 filestore(/var/lib/ceph/osd/ceph-3) write 0.14_head/fab48e74/benchmark_data_mira084_29639_object629/head//0 0~4194304
2013-04-03 13:31:00.941245 7f0846ffd700 10 filestore(/var/lib/ceph/osd/ceph-3) queue_flusher ep 17 fd 38 0~4194304 qlen 1
2013-04-03 13:31:00.941263 7f0846ffd700 10 filestore(/var/lib/ceph/osd/ceph-3) write 0.14_head/fab48e74/benchmark_data_mira084_29639_object629/head//0 0~4194304 = 4194304
2013-04-03 13:31:00.941290 7f08467fc700 20 filestore(/var/lib/ceph/osd/ceph-3) flusher_entry awoke
2013-04-03 13:31:00.941301 7f08467fc700 10 filestore(/var/lib/ceph/osd/ceph-3) flusher_entry flushing+closing 38 ep 17
2013-04-03 13:31:00.941351 7f0846ffd700 15 filestore(/var/lib/ceph/osd/ceph-3) setattrs 0.14_head/fab48e74/benchmark_data_mira084_29639_object629/head//0
2013-04-03 13:31:00.941412 7f0846ffd700 10 filestore(/var/lib/ceph/osd/ceph-3) setattrs 0.14_head/fab48e74/benchmark_data_mira084_29639_object629/head//0 = -2
2013-04-03 13:31:00.941469 7f0846ffd700 20 filestore(/var/lib/ceph/osd/ceph-3) fgetattrs 44 getting '
'
2013-04-03 13:31:00.941490 7f0846ffd700 15 filestore(/var/lib/ceph/osd/ceph-3) setattrs 0.14_head/fab48e74/benchmark_data_mira084_29639_object629/head//0
2013-04-03 13:31:00.941525 7f0846ffd700 10 filestore(/var/lib/ceph/osd/ceph-3) setattrs 0.14_head/fab48e74/benchmark_data_mira084_29639_object629/head//0 = -2
2013-04-03 13:31:00.941536 7f0846ffd700 10 journal op_apply_finish 927 open_ops 1 -> 0, max_applied_seq 926 -> 927
2013-04-03 13:31:00.941538 7f0846ffd700 10 filestore(/var/lib/ceph/osd/ceph-3) _do_op 0x7f082c051f90 seq 927 r = 0, finisher 0x7f082c00a1e0 0
2013-04-03 13:31:00.941541 7f0846ffd700 10 filestore(/var/lib/ceph/osd/ceph-3) _finish_op 0x7f082c051f90 seq 927 osr(0.14 0x7f08140fac00)/0x7f08140fac00
2013-04-03 13:31:00.941580 7f0845ffb700 10 osd.3 pg_epoch: 10 pg[0.14( v 10'18 (0'0,10'18] local-les=10 n=18 ec=1 les/c 10/10 9/9/9) [2,3] r=1 lpr=9 pi=7-8/1 luod=0'0 lcod 10'16 active] sub_op_modify_applied on 0x7f082c002b20 op osd_sub_op(client.4131.0:630 0.14 fab48e74/benchmark_data_mira084_29639_object629/head//0 [] v 10'18 snapset=0=[]:[] snapc=0=[]) v7
2013-04-03 13:31:00.941608 7f0845ffb700 1 -
10.214.135.116:6801/29315 --> osd.2 10.214.137.130:6807/16637 -- osd_sub_op_reply(client.4131.0:630 0.14 fab48e74/benchmark_data_mira084_29639_object629/head//0 [] ack, result = 0) v1 -- ?+0 0x7f080803bd30
2013-04-03 13:31:00.942950 7f08467fc700 10 filestore(/var/lib/ceph/osd/ceph-3) posix_fadvise performed after local flush
2013-04-03 13:31:00.942965 7f08467fc700 20 filestore(/var/lib/ceph/osd/ceph-3) flusher_entry sleeping
2013-04-03 13:31:00.943139 7f085effd700 20 journal prepare_multi_write queue_pos now 46514176
2013-04-03 13:31:00.943150 7f085effd700 15 journal do_aio_write writing 42311680~4202496 + header
2013-04-03 13:31:00.943154 7f085effd700 20 journal write_aio_bl 0~4096 seq 0
2013-04-03 13:31:00.943157 7f085effd700 20 journal write_aio_bl .. 0~4096 in 1
2013-04-03 13:31:16.414424 7fa183c12780 5 filestore(/var/lib/ceph/osd/ceph-3) basedir /var/lib/ceph/osd/ceph-3 journal /var/lib/ceph/osd/ceph-3/journal
2013-04-03 13:31:16.414460 7fa183c12780 10 filestore(/var/lib/ceph/osd/ceph-3) mount fsid is 8a27e96b-93da-4a69-a251-dc3c1cd838a3
2013-04-03 13:31:16.421928 7fa183c12780 0 filestore(/var/lib/ceph/osd/ceph-3) mount FIEMAP ioctl is supported and appears to work
2013-04-03 13:31:16.421953 7fa183c12780 0 filestore(/var/lib/ceph/osd/ceph-3) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-04-03 13:31:16.422780 7fa183c12780 0 filestore(/var/lib/ceph/osd/ceph-3) mount detected btrfs
2013-04-03 13:31:16.422799 7fa183c12780 20 filestore(/var/lib/ceph/osd/ceph-3) _do_clone_range 0~1 to 0
2013-04-03 13:31:16.422807 7fa183c12780 0 filestore(/var/lib/ceph/osd/ceph-3) mount btrfs CLONE_RANGE ioctl is supported
2013-04-03 13:31:16.450599 7fa183c12780 0 filestore(/var/lib/ceph/osd/ceph-3) mount btrfs SNAP_CREATE is supported
2013-04-03 13:31:16.450950 7fa183c12780 0 filestore(/var/lib/ceph/osd/ceph-3) mount btrfs SNAP_DESTROY is supported
2013-04-03 13:31:16.451221 7fa183c12780 0 filestore(/var/lib/ceph/osd/ceph-3) mount btrfs START_SYNC is supported (transid 53)
2013-04-03 13:31:16.458689 7fa183c12780 0 filestore(/var/lib/ceph/osd/ceph-3) mount btrfs WAIT_SYNC is supported
2013-04-03 13:31:16.468613 7fa183c12780 0 filestore(/var/lib/ceph/osd/ceph-3) mount btrfs SNAP_CREATE_V2 is supported
2013-04-03 13:31:16.479014 7fa183c12780 0 filestore(/var/lib/ceph/osd/ceph-3) mount syncfs(2) syscall fully supported (by glibc and kernel)
2013-04-03 13:31:16.479202 7fa183c12780 0 filestore(/var/lib/ceph/osd/ceph-3) mount found snaps <910,926>
2013-04-03 13:31:16.479291 7fa183c12780 10 filestore(/var/lib/ceph/osd/ceph-3) current/ seq was 926
2013-04-03 13:31:16.479302 7fa183c12780 10 filestore(/var/lib/ceph/osd/ceph-3) most recent snap from <910,926> is 926
2013-04-03 13:31:16.479333 7fa183c12780 10 filestore(/var/lib/ceph/osd/ceph-3) mount rolling back to consistent snap 926
2013-04-03 13:31:17.298097 7fa183c12780 5 filestore(/var/lib/ceph/osd/ceph-3) mount op_seq is 926
2013-04-03 13:31:17.513086 7fa183c12780 20 filestore (init)dbobjectmap: seq is 156
2013-04-03 13:31:17.513103 7fa183c12780 10 filestore(/var/lib/ceph/osd/ceph-3) open_journal at /var/lib/ceph/osd/ceph-3/journal
2013-04-03 13:31:17.513127 7fa183c12780 0 filestore(/var/lib/ceph/osd/ceph-3) mount: enabling PARALLEL journal mode: btrfs, SNAP_CREATE_V2 detected and 'filestore btrfs snap' mode is enabled
2013-04-03 13:31:17.513132 7fa183c12780 10 filestore(/var/lib/ceph/osd/ceph-3) list_collections
2013-04-03 13:31:17.514490 7fa183c12780 10 journal journal_replay fs op_seq 926
2013-04-03 13:31:17.514503 7fa183c12780 2 journal open /var/lib/ceph/osd/ceph-3/journal fsid 8a27e96b-93da-4a69-a251-dc3c1cd838a3 fs_op_seq 926
2013-04-03 13:31:17.514530 7fa183c12780 10 journal _open journal is not a block device, NOT checking disk write cache on '/var/lib/ceph/osd/ceph-3/journal'
2013-04-03 13:31:17.514553 7fa183c12780 1 journal _open /var/lib/ceph/osd/ceph-3/journal fd 19: 104857600 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-04-03 13:31:17.514561 7fa183c12780 10 journal read_header
2013-04-03 13:31:17.514560 7fa17bfff700 20 filestore(/var/lib/ceph/osd/ceph-3) sync_entry waiting for max_interval 5.000000
2013-04-03 13:31:17.514717 7fa183c12780 10 journal header: block_size 4096 alignment 4096 max_size 104857600
2013-04-03 13:31:17.514727 7fa183c12780 10 journal header: start 42311680
2013-04-03 13:31:17.514729 7fa183c12780 10 journal write_pos 4096
2013-04-03 13:31:17.514732 7fa183c12780 10 journal open header.fsid = 8a27e96b-93da-4a69-a251-dc3c1cd838a3
2013-04-03 13:31:17.554708 7fa183c12780 25 journal read_entry 42311680 : bad header magic, end of journal
2013-04-03 13:31:17.555361 7fa183c12780 25 journal read_entry 42315776 : bad header magic, end of journal
2013-04-03 13:31:17.555541 7fa183c12780 25 journal read_entry 42319872 : bad header magic, end of journal
2013-04-03 13:31:17.555563 7fa183c12780 25 journal read_entry 42323968 : bad header magic, end of journal
2013-04-03 13:31:17.555570 7fa183c12780 25 journal read_entry 42328064 : bad header magic, end of journal

Actions #8

Updated by Samuel Just about 11 years ago

  • Status changed from 12 to Resolved

6ef9d87a02872fc7312c1c8f48abf9b48457f6e4

Actions

Also available in: Atom PDF