Bug #7738
osd: journal crash on startup on wheezy
Status: Resolved
Priority: High
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport: emperor, dumpling
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
ubuntu@teuthology:/a/sage-wheezy/130813
2014-03-15 06:26:00.227645 7fd0c44b6780 0 filestore(/var/lib/ceph/osd/ceph-8) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2014-03-15 06:26:00.227747 7fd0c44b6780 10 journal journal_replay fs op_seq 2
2014-03-15 06:26:00.227756 7fd0c44b6780 2 journal open /var/lib/ceph/osd/ceph-8/journal fsid b7d17c8b-821c-45ad-92c9-8afe062d24cc fs_op_seq 2
2014-03-15 06:26:00.227786 7fd0c44b6780 10 journal _open_block_device: ignoring osd journal size. We'll use the entire block device (size: 5367661056)
2014-03-15 06:26:00.236141 7fd0c44b6780 -1 journal _check_disk_write_cache: pclose failed: (61) No data available
2014-03-15 06:26:00.236172 7fd0c44b6780 1 journal _open /var/lib/ceph/osd/ceph-8/journal fd 21: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-03-15 06:26:00.236187 7fd0c44b6780 10 journal read_header
2014-03-15 06:26:00.243588 7fd0c44b6780 10 journal header: block_size 4096 alignment 4096 max_size 5367660544
2014-03-15 06:26:00.243599 7fd0c44b6780 10 journal header: start 4096
2014-03-15 06:26:00.243600 7fd0c44b6780 10 journal write_pos 4096
2014-03-15 06:26:00.243603 7fd0c44b6780 10 journal open header.fsid = b7d17c8b-821c-45ad-92c9-8afe062d24cc
2014-03-15 06:26:00.243695 7fd0c44b6780 2 journal read_entry 8192 : seq 2 505 bytes
2014-03-15 06:26:00.243709 7fd0c44b6780 2 journal No further valid entries found, journal is most likely valid
2014-03-15 06:26:00.243712 7fd0c44b6780 10 journal open reached end of journal.
2014-03-15 06:26:00.243716 7fd0c44b6780 2 journal No further valid entries found, journal is most likely valid
2014-03-15 06:26:00.243717 7fd0c44b6780 3 journal journal_replay: end of journal, done.
2014-03-15 06:26:00.243734 7fd0c44b6780 2 journal FileJournal::_open: unable to open journal: open() failed: (2) No such file or directory
2014-03-15 06:26:00.243783 7fd0c44b6780 10 journal journal_start
2014-03-15 06:26:00.244037 7fd0bfd90700 10 journal commit_start max_applied_seq 2, open_ops 0
2014-03-15 06:26:00.244050 7fd0bfd90700 10 journal commit_start blocked, all open_ops have completed
2014-03-15 06:26:00.244051 7fd0bfd90700 10 journal commit_start nothing to do
2014-03-15 06:26:00.244053 7fd0bfd90700 10 journal commit_start
2014-03-15 06:26:00.244508 7fd0bed8e700 10 journal write_finish_thread_entry enter
2014-03-15 06:26:00.244530 7fd0bed8e700 20 journal write_finish_thread_entry sleeping
2014-03-15 06:26:00.244569 7fd0c44b6780 10 journal journal_stop
2014-03-15 06:26:00.244598 7fd0c44b6780 1 journal close /var/lib/ceph/osd/ceph-8/journal
2014-03-15 06:26:00.244623 7fd0bed8e700 10 journal write_finish_thread_entry exit
2014-03-15 06:26:00.244647 7fd0bf58f700 10 journal write_thread_entry start
2014-03-15 06:26:00.244665 7fd0bf58f700 10 journal write_thread_entry finish
2014-03-15 06:26:00.274736 7fd0c44b6780 -1 os/FileJournal.cc: In function 'virtual void FileJournal::close()' thread 7fd0c44b6780 time 2014-03-15 06:26:00.244697
os/FileJournal.cc: 547: FAILED assert(fd >= 0)
ceph version 0.77-884-g6f9db6c (6f9db6c70281f0b9041bbf529d78d976c9df9b5d)
1: (FileJournal::close()+0x185) [0xa344c5]
2: (JournalingObjectStore::journal_stop()+0x64) [0x897504]
3: (FileStore::umount()+0xd4) [0x85ac24]
4: (OSD::do_convertfs(ObjectStore*)+0x1c9) [0x7914a9]
5: (main()+0x2234) [0x738014]
6: (__libc_start_main()+0xfd) [0x7fd0c23daead]
7: /usr/bin/ceph-osd() [0x73bc29]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
Updated by Sage Weil about 10 years ago
- Status changed from New to Fix Under Review
As far as I can tell, the problem is that the journal symlink isn't always present when we reopen the journal for write. This is probably because the udev rules are rearranging the symlinks under /dev/disk at that moment. Since ceph-disk activate will trigger afterward, this is fine as long as ceph-osd exits cleanly. wip-7738 fixes that; we were getting failures because a core file was left behind by the failed assert.
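For illustration only, here is a minimal C++ sketch of the failure mode described above: an open() that can fail with ENOENT while the symlink is missing, followed by a close() that does not assert on an unopened descriptor. The class name SketchJournal and its methods are hypothetical stand-ins, not Ceph's FileJournal, and this is not necessarily how wip-7738 addresses the problem.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <unistd.h>
#include <utility>

// Hypothetical stand-in for a file-backed journal; NOT Ceph's FileJournal.
class SketchJournal {
public:
  explicit SketchJournal(std::string path) : path_(std::move(path)) {}
  SketchJournal(const SketchJournal&) = delete;
  SketchJournal& operator=(const SketchJournal&) = delete;
  ~SketchJournal() { close(); }

  // Returns 0 on success or a negative errno value, e.g. -ENOENT when
  // the path (or a symlink leading to it) is temporarily missing.
  int open() {
    assert(fd_ < 0);  // opening twice would leak the first descriptor
    int fd = ::open(path_.c_str(), O_RDWR);
    if (fd < 0)
      return -errno;
    fd_ = fd;
    return 0;
  }

  // Safe to call whether or not open() succeeded, so a shutdown path
  // that always calls close() does not abort after a failed open.
  void close() {
    if (fd_ < 0)
      return;
    ::close(fd_);
    fd_ = -1;
  }

  bool is_open() const { return fd_ >= 0; }

private:
  std::string path_;
  int fd_ = -1;
};
```

With a close() like this, a caller whose open() returned -ENOENT can still run its normal stop/unmount sequence and exit without dumping core, leaving a later activation to retry once the symlink is back.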
Updated by Sage Weil about 10 years ago
- Status changed from Fix Under Review to Resolved
Updated by Sage Weil about 10 years ago
- Status changed from Resolved to Pending Backport
- Priority changed from Urgent to High
- Backport set to emperor, dumpling
Updated by Sage Weil about 10 years ago
- Status changed from Pending Backport to Resolved