Bug #7738

closed

osd: journal crash on startup on wheezy

Added by Sage Weil about 10 years ago. Updated about 10 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport: emperor, dumpling
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ubuntu@teuthology:/a/sage-wheezy/130813

2014-03-15 06:26:00.227645 7fd0c44b6780  0 filestore(/var/lib/ceph/osd/ceph-8) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2014-03-15 06:26:00.227747 7fd0c44b6780 10 journal journal_replay fs op_seq 2
2014-03-15 06:26:00.227756 7fd0c44b6780  2 journal open /var/lib/ceph/osd/ceph-8/journal fsid b7d17c8b-821c-45ad-92c9-8afe062d24cc fs_op_seq 2
2014-03-15 06:26:00.227786 7fd0c44b6780 10 journal _open_block_device: ignoring osd journal size. We'll use the entire block device (size: 5367661056)
2014-03-15 06:26:00.236141 7fd0c44b6780 -1 journal _check_disk_write_cache: pclose failed: (61) No data available
2014-03-15 06:26:00.236172 7fd0c44b6780  1 journal _open /var/lib/ceph/osd/ceph-8/journal fd 21: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-03-15 06:26:00.236187 7fd0c44b6780 10 journal read_header
2014-03-15 06:26:00.243588 7fd0c44b6780 10 journal header: block_size 4096 alignment 4096 max_size 5367660544
2014-03-15 06:26:00.243599 7fd0c44b6780 10 journal header: start 4096
2014-03-15 06:26:00.243600 7fd0c44b6780 10 journal  write_pos 4096
2014-03-15 06:26:00.243603 7fd0c44b6780 10 journal open header.fsid = b7d17c8b-821c-45ad-92c9-8afe062d24cc
2014-03-15 06:26:00.243695 7fd0c44b6780  2 journal read_entry 8192 : seq 2 505 bytes
2014-03-15 06:26:00.243709 7fd0c44b6780  2 journal No further valid entries found, journal is most likely valid
2014-03-15 06:26:00.243712 7fd0c44b6780 10 journal open reached end of journal.
2014-03-15 06:26:00.243716 7fd0c44b6780  2 journal No further valid entries found, journal is most likely valid
2014-03-15 06:26:00.243717 7fd0c44b6780  3 journal journal_replay: end of journal, done.
2014-03-15 06:26:00.243734 7fd0c44b6780  2 journal FileJournal::_open: unable to open journal: open() failed: (2) No such file or directory
2014-03-15 06:26:00.243783 7fd0c44b6780 10 journal journal_start
2014-03-15 06:26:00.244037 7fd0bfd90700 10 journal commit_start max_applied_seq 2, open_ops 0
2014-03-15 06:26:00.244050 7fd0bfd90700 10 journal commit_start blocked, all open_ops have completed
2014-03-15 06:26:00.244051 7fd0bfd90700 10 journal commit_start nothing to do
2014-03-15 06:26:00.244053 7fd0bfd90700 10 journal commit_start
2014-03-15 06:26:00.244508 7fd0bed8e700 10 journal write_finish_thread_entry enter
2014-03-15 06:26:00.244530 7fd0bed8e700 20 journal write_finish_thread_entry sleeping
2014-03-15 06:26:00.244569 7fd0c44b6780 10 journal journal_stop
2014-03-15 06:26:00.244598 7fd0c44b6780  1 journal close /var/lib/ceph/osd/ceph-8/journal
2014-03-15 06:26:00.244623 7fd0bed8e700 10 journal write_finish_thread_entry exit
2014-03-15 06:26:00.244647 7fd0bf58f700 10 journal write_thread_entry start
2014-03-15 06:26:00.244665 7fd0bf58f700 10 journal write_thread_entry finish
2014-03-15 06:26:00.274736 7fd0c44b6780 -1 os/FileJournal.cc: In function 'virtual void FileJournal::close()' thread 7fd0c44b6780 time 2014-03-15 06:26:00.244697
os/FileJournal.cc: 547: FAILED assert(fd >= 0)

 ceph version 0.77-884-g6f9db6c (6f9db6c70281f0b9041bbf529d78d976c9df9b5d)
 1: (FileJournal::close()+0x185) [0xa344c5]
 2: (JournalingObjectStore::journal_stop()+0x64) [0x897504]
 3: (FileStore::umount()+0xd4) [0x85ac24]
 4: (OSD::do_convertfs(ObjectStore*)+0x1c9) [0x7914a9]
 5: (main()+0x2234) [0x738014]
 6: (__libc_start_main()+0xfd) [0x7fd0c23daead]
 7: /usr/bin/ceph-osd() [0x73bc29]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---

Actions #1

Updated by Sage Weil about 10 years ago

  • Status changed from New to Fix Under Review

As far as I can tell, the problem is that the journal symlink isn't always present when we reopen the journal for write. This is probably because the udev rules are still rearranging the symlinks under /dev/disk at that point. Since ceph-disk activate will be triggered afterward, this is fine as long as ceph-osd exits cleanly. wip-7738 fixes that; we were getting failures because the failed assert left a core file behind.
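To illustrate the "exit cleanly" behaviour described above, here is a minimal sketch. It is not the wip-7738 change and not Ceph's FileJournal; `ToyJournal` and `start_with_journal` are hypothetical names made up for this example. It assumes only that the journal path may be briefly missing (ENOENT), and shows an open failure reported as an error code while close() tolerates a journal that never opened, rather than asserting `fd >= 0`.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <string>
#include <unistd.h>
#include <utility>

// Hypothetical stand-in for a journal handle (not Ceph's FileJournal).
class ToyJournal {
public:
  explicit ToyJournal(std::string path) : path_(std::move(path)) {}
  ~ToyJournal() { close(); }

  // Returns 0 on success, or a negative errno value if the path
  // cannot be opened (e.g. -ENOENT while a symlink is missing).
  int open_for_write() {
    int fd = ::open(path_.c_str(), O_WRONLY);
    if (fd < 0)
      return -errno;
    fd_ = fd;
    return 0;
  }

  // Safe to call even if open_for_write() failed: no assert on fd_.
  void close() {
    if (fd_ >= 0) {
      ::close(fd_);
      fd_ = -1;
    }
  }

  bool is_open() const { return fd_ >= 0; }

private:
  std::string path_;
  int fd_ = -1;
};

// Hypothetical startup path: report the error and return a nonzero
// exit code instead of aborting, so no core file is left behind.
int start_with_journal(const std::string& path) {
  ToyJournal journal(path);
  int r = journal.open_for_write();
  if (r < 0) {
    std::fprintf(stderr, "unable to open journal %s: %s\n",
                 path.c_str(), std::strerror(-r));
    return 1;
  }
  journal.close();
  return 0;
}
```

With this shape, a caller such as an init or activation script sees an ordinary nonzero exit and can retry once the symlink is back in place.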

Actions #2

Updated by Sage Weil about 10 years ago

  • Status changed from Fix Under Review to Resolved
Actions #3

Updated by Sage Weil about 10 years ago

  • Status changed from Resolved to Pending Backport
  • Priority changed from Urgent to High
  • Backport set to emperor, dumpling
Actions #4

Updated by Sage Weil about 10 years ago

  • Assignee deleted (Sage Weil)
Actions #5

Updated by Sage Weil about 10 years ago

  • Status changed from Pending Backport to Resolved