Bug #9073

closed

OSD with device/partition journals down after fresh deploy or upgrade to 0.83

Added by Mark Kirkwood almost 10 years ago. Updated over 9 years ago.

Status: Resolved
Priority: High
Assignee:
Category: OSD
Target version: -
% Done: 100%
Source: Community (user)
Tags:
Backport: firefly
Regression:
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Using a source build (and the packages built from it) on Ubuntu 14.04 x86_64. Ceph version is 0.83-399-gf77449c.

In a fresh install I'm seeing a hang at OSD mkfs with a journal partition, e.g.:

$ ceph-osd --id 0 --mkjournal --mkfs --osd-data /data1/cephdata --osd-journal /dev/sdc1

Logs show:

7fbbc89e4800 -1 journal check: ondisk fsid d613adad-6e35-47d8-9f5d-e95f0170b4cd doesn't match expected 5390fcae-2ba8-497c-8dab-7265180bf82f, invalid (someone else's?) journal
7fbbc89e4800 -1 filestore(/data1/cephdata) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory

However, that last line does not appear to be the issue. The process is stuck after this point, waiting on a futex:

Process 12648 attached
futex(0x7fffaa3fcbac, FUTEX_WAIT_PRIVATE, 1, NULL)
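For anyone comparing logs across several OSDs, the two fsids in the `journal check` line of the log excerpt above can be pulled out with a small script. This is only a sketch: the regular expression is written against the single message quoted in this report, and the helper name `find_fsid_mismatch` is made up for illustration.

```python
import re

# Pattern written against the one "journal check" message quoted in this report.
# Assumption: other Ceph versions may word the message differently.
FSID_MISMATCH = re.compile(
    r"journal check: ondisk fsid (?P<ondisk>[0-9a-f-]{36}) "
    r"doesn't match expected (?P<expected>[0-9a-f-]{36})"
)


def find_fsid_mismatch(line):
    """Return (ondisk, expected) if the line reports a journal fsid mismatch, else None."""
    match = FSID_MISMATCH.search(line)
    if match is None:
        return None
    return match.group("ondisk"), match.group("expected")


# The log line from this report, split only to keep the source readable.
line = (
    "7fbbc89e4800 -1 journal check: ondisk fsid "
    "d613adad-6e35-47d8-9f5d-e95f0170b4cd doesn't match expected "
    "5390fcae-2ba8-497c-8dab-7265180bf82f, invalid (someone else's?) journal"
)
print(find_fsid_mismatch(line))
```

A line without the message, such as the `osd_superblock` one, returns `None`.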

Something strange happening with journals seems to be the cause, and reverting this commit:

commit 4eb18dd487da4cb621dcbecfc475fc0871b356ac
Author: Ma Jianpeng <>
Date: Wed Jul 23 10:10:38 2014 -0700

os/FileJournal: Update the journal header when closing journal
When closing journal, it should check must_write_header and update
journal header if must_write_header alreay set.
It can reduce the nosense journal-replay after restarting osd.

results in an OSD that is up.
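To make the commit message above easier to follow, here is a toy model of the behaviour it describes: on close, a pending header write is flushed if `must_write_header` is set. It is not Ceph's FileJournal code; the class `ToyJournal` and all of its methods are hypothetical, and the sketch does not attempt to explain the hang reported here.

```python
class ToyJournal:
    """Toy stand-in for a journal with an on-disk header (hypothetical, not Ceph code)."""

    def __init__(self):
        self.must_write_header = False  # name taken from the commit message
        self.header_writes = 0          # counts simulated header writes
        self.is_open = True

    def mark_header_dirty(self):
        # Simulates any operation that leaves the header needing a rewrite.
        self.must_write_header = True

    def write_header(self):
        # Simulates persisting the header; a real journal would do I/O here.
        self.header_writes += 1
        self.must_write_header = False

    def close(self):
        # Behaviour described by the commit message: flush a pending header on close.
        if self.must_write_header:
            self.write_header()
        self.is_open = False


journal = ToyJournal()
journal.mark_header_dirty()
journal.close()
print(journal.header_writes, journal.must_write_header)
```

Closing a journal whose header is not marked dirty performs no header write in this model.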

Similarly, upgrading existing OSDs with (whole) device journals results in them going down with:

filestore(/var/lib/ceph/osd/ceph-0) mount failed to open journal /var/lib/ceph/osd/ceph-0/journal: (22) Invalid argument

Reverting the above commit makes them rescuable after a mkjournal.

I've attached a script I am using to do the simple fresh install - I was initially using ceph-deploy but wondered if it was causing the issue and wanted to rule that out.


Files

deploy.sh (2.69 KB), Quick script to deploy 1 OSD, Mark Kirkwood, 08/11/2014 09:15 PM
ceph-osd.0.log (35.9 KB), log of osd mkfs with Intel 520 journal, Mark Kirkwood, 08/13/2014 03:58 PM
hdpard-intel-520.txt (3.07 KB), Mark Kirkwood, 08/13/2014 06:12 PM
journalblk.txt (4 KB), Mark Kirkwood, 08/13/2014 06:51 PM
patch.diff (571 Bytes), jianpeng ma, 08/13/2014 11:49 PM
ceph-osd.0.log (37.7 KB), Mark Kirkwood, 08/14/2014 01:38 AM
journalblk-before.txt (4 KB), Mark Kirkwood, 08/14/2014 01:38 AM
journalblk-after.txt (4 KB), Mark Kirkwood, 08/14/2014 01:38 AM
journalblk-mkjournal.txt (4 KB), Mark Kirkwood, 08/14/2014 02:06 AM
deploy.sh (2.14 KB), Mark Kirkwood, 08/14/2014 03:04 AM
journal.diff (535 Bytes), jianpeng ma, 08/14/2014 07:47 PM
journal.diff (691 Bytes), jianpeng ma, 08/14/2014 10:58 PM
ceph-osd.strace (194 KB), Mark Kirkwood, 08/14/2014 11:25 PM
journal.diff (2.29 KB), jianpeng ma, 08/17/2014 07:27 PM
ceph-osd.0.log (16.5 KB), Mark Kirkwood, 08/17/2014 11:40 PM
ceph-osd.0.log (16.6 KB), Mark Kirkwood, 08/18/2014 01:10 AM
debug-journal-header-3.diff (562 Bytes), Mark Kirkwood, 08/18/2014 02:21 AM

Related issues 3 (0 open, 3 closed)

Related to Ceph - Bug #6003: journal Unable to read past sequence 406 ... (Resolved, Samuel Just, 08/15/2013)
Related to Ceph - Bug #9851: crash on journal/filestore shutdown on firefly (Resolved, Loïc Dachary, 10/21/2014)
Has duplicate Ceph - Bug #9768: ceph-osd mkfs hangs (Duplicate, Loïc Dachary, 10/14/2014)
