Bug #2843

filestore: replay failure on xfs

Added by Sage Weil over 11 years ago. Updated about 11 years ago.

Status: Can't reproduce
Priority: High
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Support
Tags:
Backport:
Regression: No
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

congress osd.328 crashed with


   -31> 2012-07-26 04:50:31.096568 7ff6ff903780  2 osd.328 0 mounting /srv/ceph/osd/328 /srv/ceph/devices/osd.328.journal
   -30> 2012-07-26 04:50:31.096599 7ff6ff903780  5 filestore(/srv/ceph/osd/328) basedir /srv/ceph/osd/328 journal /srv/ceph/devices/osd.328.journal
   -29> 2012-07-26 04:50:31.096626 7ff6ff903780 10 filestore(/srv/ceph/osd/328) mount fsid is d46d9806-017d-4172-ae28-cfef8660eef5
   -28> 2012-07-26 04:50:31.099377 7ff6ff903780  0 filestore(/srv/ceph/osd/328) mount FIEMAP ioctl is supported and appears to work
   -27> 2012-07-26 04:50:31.099392 7ff6ff903780  0 filestore(/srv/ceph/osd/328) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
   -26> 2012-07-26 04:50:31.099666 7ff6ff903780  0 filestore(/srv/ceph/osd/328) mount did NOT detect btrfs
   -25> 2012-07-26 04:50:31.102271 7ff6ff903780  0 filestore(/srv/ceph/osd/328) mount syncfs(2) syscall fully supported (by glibc and kernel)
   -24> 2012-07-26 04:50:31.102329 7ff6ff903780  0 filestore(/srv/ceph/osd/328) mount found snaps <>
   -23> 2012-07-26 04:50:31.102343 7ff6ff903780  5 filestore(/srv/ceph/osd/328) mount op_seq is 1085689
   -22> 2012-07-26 04:50:31.103499 7ff6ff903780 20 filestore (init)dbobjectmap: seq is 27809
   -21> 2012-07-26 04:50:31.103514 7ff6ff903780 10 filestore(/srv/ceph/osd/328) open_journal at /srv/ceph/devices/osd.328.journal
   -20> 2012-07-26 04:50:31.103524 7ff6ff903780  0 filestore(/srv/ceph/osd/328) mount: enabling WRITEAHEAD journal mode: btrfs not detected
   -19> 2012-07-26 04:50:31.103527 7ff6ff903780 10 filestore(/srv/ceph/osd/328) list_collections
   -18> 2012-07-26 04:50:31.114766 7ff6ff903780  2 journal open /srv/ceph/devices/osd.328.journal fsid d46d9806-017d-4172-ae28-cfef8660eef5 fs_op_seq 1085689
   -17> 2012-07-26 04:50:31.114778 7ff6f6d75700 20 filestore(/srv/ceph/osd/328) sync_entry waiting for max_interval 5.000000
   -16> 2012-07-26 04:50:31.123102 7ff6ff903780  1 journal _open /srv/ceph/devices/osd.328.journal fd 32: 10736398336 bytes, block size 4096 bytes, directio = 1, aio = 0
   -15> 2012-07-26 04:50:31.884099 7ff6ff903780  2 journal read_entry 10318311424 : seq 1085689 114185663 bytes
   -14> 2012-07-26 04:50:31.884140 7ff6ff903780  2 journal read_entry 10432499712 : bad header magic, end of journal
   -13> 2012-07-26 04:50:31.884154 7ff6ff903780  2 journal read_entry 10432499712 : bad header magic, end of journal
   -12> 2012-07-26 04:50:31.884157 7ff6ff903780  3 journal journal_replay: end of journal, done.
   -11> 2012-07-26 04:50:31.930818 7ff6ff903780  1 journal _open /srv/ceph/devices/osd.328.journal fd 32: 10736398336 bytes, block size 4096 bytes, directio = 1, aio = 0
   -10> 2012-07-26 04:50:31.931001 7ff6f3d6f700 20 filestore(/srv/ceph/osd/328) flusher_entry start
    -9> 2012-07-26 04:50:31.931027 7ff6f3d6f700 20 filestore(/srv/ceph/osd/328) flusher_entry sleeping
    -8> 2012-07-26 04:50:31.931074 7ff6ff903780  2 osd.328 0 boot
    -7> 2012-07-26 04:50:31.931086 7ff6ff903780 15 filestore(/srv/ceph/osd/328) read meta/23c2fcde/osd_superblock/0//-1 0~0
    -6> 2012-07-26 04:50:31.931176 7ff6ff903780 10 filestore(/srv/ceph/osd/328) FileStore::read meta/23c2fcde/osd_superblock/0//-1 0~144/144
    -5> 2012-07-26 04:50:31.931188 7ff6ff903780 10 osd.328 0 read_superblock sb(31b8be2f-ac05-4e56-96b7-e702df166e29 osd.328 d46d9806-017d-4172-ae28-cfef8660eef5 e74853 [69038,74853] lci=[72194,74853])
    -4> 2012-07-26 04:50:31.931215 7ff6ff903780 20 osd.328 0 get_map 74853 - loading and decoding 0x2621700
    -3> 2012-07-26 04:50:31.931222 7ff6ff903780 15 filestore(/srv/ceph/osd/328) read meta/4f711459/osdmap.74853/0//-1 0~0
    -2> 2012-07-26 04:50:31.931254 7ff6ff903780 10 filestore(/srv/ceph/osd/328) FileStore::read meta/4f711459/osdmap.74853/0//-1 0~0/0
    -1> 2012-07-26 04:50:31.931260 7ff6ff903780 10 osd.328 0 add_map_bl 74853 0 bytes
     0> 2012-07-26 04:50:31.932648 7ff6ff903780 -1 *** Caught signal (Aborted) **
 in thread 7ff6ff903780

 ceph version 0.48argonaut-48-g16302ac (commit:16302acefd8def98fc4597366d6ba2845e17fcb6)
 1: ceph-osd() [0x6f4eba]
 2: (()+0xfcb0) [0x7ff6ff2e1cb0]
 3: (gsignal()+0x35) [0x7ff6fd7d2445]
 4: (abort()+0x17b) [0x7ff6fd7d5bab]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7ff6fe12069d]
 6: (()+0xb5846) [0x7ff6fe11e846]
 7: (()+0xb5873) [0x7ff6fe11e873]
 8: (()+0xb596e) [0x7ff6fe11e96e]
 9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x127) [0x7b1fe7]
 10: (OSDMap::decode(ceph::buffer::list::iterator&)+0x3f) [0x76f72f]
 11: (OSDMap::decode(ceph::buffer::list&)+0x3e) [0x77082e]
 12: (OSD::get_map(unsigned int)+0x326) [0x5ced36]
 13: (OSD::init()+0x4ee) [0x5dc90e]
 14: (main()+0x2377) [0x522f37]
 15: (__libc_start_main()+0xed) [0x7ff6fd7bd76d]
 16: ceph-osd() [0x525109]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
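
The last lines before the abort show the OSD loading osdmap epoch 74853 and getting back 0 bytes, after which OSDMap::decode throws. A minimal sketch for spotting that pattern in an OSD log, assuming the `add_map_bl <epoch> <n> bytes` line format seen above (other versions may log it differently). The helper name `find_empty_maps` is hypothetical and not part of Ceph:

```python
import re

# Matches lines such as: "... osd.328 0 add_map_bl 74853 0 bytes"
# (format taken from the log above; an assumption for other versions).
ADD_MAP_BL = re.compile(r"\badd_map_bl (\d+) (\d+) bytes")


def find_empty_maps(log_lines):
    """Return the osdmap epochs that were loaded as zero-length buffers."""
    empty = []
    for line in log_lines:
        m = ADD_MAP_BL.search(line)
        if m and int(m.group(2)) == 0:
            empty.append(int(m.group(1)))
    return empty


sample = [
    "-2> 2012-07-26 04:50:31.931254 7ff6ff903780 10 filestore(/srv/ceph/osd/328) "
    "FileStore::read meta/4f711459/osdmap.74853/0//-1 0~0/0",
    "-1> 2012-07-26 04:50:31.931260 7ff6ff903780 10 osd.328 0 add_map_bl 74853 0 bytes",
]
print(find_empty_maps(sample))
```

Any epoch this reports is a candidate for the same crash on the next start, since the map object for it is empty on disk.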

The file on disk is 0 bytes. The dump-journal output is attached. The journal's one entry writes that object, but it is not replayed because of the current seq value (in the log above, both the mount op_seq and the entry's seq are 1085689).
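
To make the sequence-number point concrete, here is a simplified model of write-ahead journal replay. It assumes replay skips any entry whose seq is not greater than the op_seq the store recorded as committed. The names `JournalEntry` and `replay` are hypothetical, and this is a sketch of the idea, not the FileStore code. It uses the values from the log: store op_seq 1085689 and a single journal entry with seq 1085689.

```python
from dataclasses import dataclass


@dataclass
class JournalEntry:
    seq: int      # journal sequence number of the transaction
    obj: str      # object the transaction writes
    data: bytes   # payload written to that object


def replay(store, committed_seq, entries):
    """Apply entries newer than committed_seq; return the new committed seq.

    Assumption: entries with seq <= committed_seq are treated as already
    applied to the backing filesystem and are skipped.
    """
    for e in sorted(entries, key=lambda e: e.seq):
        if e.seq <= committed_seq:
            continue  # considered already on disk, so not replayed
        store[e.obj] = e.data
        committed_seq = e.seq
    return committed_seq


# The object is empty on disk, and the only journal entry would rewrite it.
store = {"osdmap.74853": b""}
entries = [JournalEntry(1085689, "osdmap.74853", b"<map bytes>")]

replay(store, 1085689, entries)
print(len(store["osdmap.74853"]))
```

In this model the entry is skipped and the object stays empty. Had the store recorded 1085688 as its committed op_seq, the same call would have applied the entry and restored the object.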

fooo (89.2 KB) Sage Weil, 07/25/2012 09:54 PM


Related issues

Related to Ceph - Bug #2830: [argonaut] osd/OSD.cc: 3906: FAILED assert(_get_map_bl(epoch, bl)) (Duplicate, 07/24/2012)

History

#1 Updated by Sage Weil over 11 years ago

  • Status changed from New to Can't reproduce

#2 Updated by Guilhem Lettron about 11 years ago

Hi,

We have exactly the same problem on one of our OSDs (bobtail 0.56.1):
https://gist.github.com/4555135

What can I send to help you?

#3 Updated by Sage Weil about 11 years ago

The post-v0.50 version of this bug was just fixed in commit 66eb93b83648b4561b77ee6aab5b484e6dba4771, which has been backported to the 'bobtail' branch in git. Can you try running that?

Note that this fixes the cause, but it won't help the affected OSD start. There is a workaround branch for that, 'wip-bobtail-load_pgs-workaround'. You can get built packages for both branches from gitbuilder.ceph.com.
