Bug #5981

closed

osd: journal didn't preallocate

Added by Zoltan Arnold Nagy over 10 years ago. Updated over 10 years ago.

Status: Resolved
Priority: Urgent
Assignee: Sage Weil
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I had a node deployed using ceph-deploy: 7 disks in total, with the journals on
files on an SSD.

After rebooting the node, only some of them would come back.

-3 2.61 host signina
5 0.45 osd.5 up 1
6 0.45 osd.6 up 1
7 0.27 osd.7 down 0
8 0.27 osd.8 down 0
9 0.27 osd.9 down 0
10 0.45 osd.10 up 1
11 0.45 osd.11 down 0

In dmesg I see the following:

[ 42.072796] init: ceph-osd (ceph/8) main process (4118) terminated with status 134
[ 42.072824] init: ceph-osd (ceph/8) main process ended, respawning
[ 42.982971] init: ceph-osd (ceph/9) main process (4117) killed by ABRT signal
[ 42.983009] init: ceph-osd (ceph/9) main process ended, respawning
[ 42.987865] init: ceph-osd (ceph/7) main process (4124) terminated with status 134
[ 42.987892] init: ceph-osd (ceph/7) main process ended, respawning
[ 43.960877] init: ceph-osd (ceph/8) main process (8146) terminated with status 134
[ 43.960906] init: ceph-osd (ceph/8) main process ended, respawning
[ 44.963873] init: ceph-osd (ceph/9) main process (8355) terminated with status 134
[ 44.963901] init: ceph-osd (ceph/9) main process ended, respawning
[ 44.967048] init: ceph-osd (ceph/7) main process (8360) terminated with status 134
[ 44.967089] init: ceph-osd (ceph/7) main process ended, respawning

and this keeps repeating for osd.7, osd.8, osd.9 and osd.11.

They are, however, mounted:

/dev/sde1 on /var/lib/ceph/osd/ceph-9 type xfs (rw)
/dev/sdd1 on /var/lib/ceph/osd/ceph-8 type xfs (rw)
/dev/sdb1 on /var/lib/ceph/osd/ceph-6 type xfs (rw)
/dev/sdf1 on /var/lib/ceph/osd/ceph-10 type xfs (rw)
/dev/sdc1 on /var/lib/ceph/osd/ceph-7 type xfs (rw)
/dev/sdg1 on /var/lib/ceph/osd/ceph-11 type xfs (rw)
/dev/sda1 on /var/lib/ceph/osd/ceph-5 type xfs (rw)

Attaching the logs for these.

The journals are available at the specified location:

zoltan@signina:~$ ls -las /osdjournal/
total 30787668
      4 drwxr-xr-x  3 root root       4096 Aug 13 14:51 .
      4 drwxr-xr-x 24 root root       4096 Aug 13 14:34 ..
     16 drwx------  2 root root      16384 Aug 13 14:34 lost+found
5242888 -rw-r--r--  1 root root 5368709120 Aug 15 20:56 sda
5242888 -rw-r--r--  1 root root 5368709120 Aug 15 20:56 sdb
3409264 -rw-r--r--  1 root root 5368709120 Aug 15 20:50 sdc
3585292 -rw-r--r--  1 root root 5368709120 Aug 15 20:50 sdd
3413500 -rw-r--r--  1 root root 5368709120 Aug 15 20:50 sde
5242888 -rw-r--r--  1 root root 5368709120 Aug 15 20:56 sdf
4650924 -rw-r--r--  1 root root 5368709120 Aug 15 20:51 sdg
zoltan@signina:~$
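
The first column of that listing is the allocated size in 1K blocks: sda, sdb and sdf are fully allocated (about 5242888 blocks for 5 GB), while sdc, sdd, sde and sdg show noticeably less, i.e. they are sparse; going by the device names, those appear to be the journals for the OSDs (7, 8, 9 and 11) that keep crashing. A minimal sketch for spotting such sparse journal files, assuming only the /osdjournal path from the listing above (the script itself is illustrative, not part of the report):

#!/usr/bin/env python3
# Sketch: flag journal files whose allocated size is smaller than their
# apparent size, i.e. sparse files. /osdjournal is taken from the listing.
import os

JOURNAL_DIR = "/osdjournal"

for name in sorted(os.listdir(JOURNAL_DIR)):
    path = os.path.join(JOURNAL_DIR, name)
    if not os.path.isfile(path):
        continue  # skip lost+found and any subdirectories
    st = os.stat(path)
    allocated = st.st_blocks * 512  # st_blocks is counted in 512-byte units
    if allocated < st.st_size:
        print("%s: sparse, %d of %d bytes allocated" % (name, allocated, st.st_size))
    else:
        print("%s: fully allocated" % name)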


Files

ceph-osd.7.log (323 KB) ceph-osd.7.log Zoltan Arnold Nagy, 08/15/2013 12:22 PM
ceph-osd.11.log (1.37 MB) ceph-osd.11.log Zoltan Arnold Nagy, 08/15/2013 12:22 PM
ceph-osd.9.log (264 KB) ceph-osd.9.log Zoltan Arnold Nagy, 08/15/2013 12:22 PM
ceph-osd.8.log (290 KB) ceph-osd.8.log Zoltan Arnold Nagy, 08/15/2013 12:22 PM

Actions #1

Updated by Zoltan Arnold Nagy over 10 years ago

Oh, I forgot. This is on Ubuntu 13.04, with the following packages:

zoltan@signina:~$ dpkg -l | grep 0.61.
ii ceph 0.61.7-1raring amd64 distributed storage and file system
ii ceph-common 0.61.7-1raring amd64 common utilities to mount and interact with a ceph storage cluster
ii ceph-fs-common 0.61.7-1raring amd64 common utilities to mount and interact with a ceph file system
ii ceph-mds 0.61.7-1raring amd64 metadata server for the ceph distributed file system
ii librados2 0.61.7-1raring amd64 RADOS distributed object store client library
ii librbd1 0.61.7-1raring amd64 RADOS block device client library
zoltan@signina:~$

Actions #2

Updated by Zoltan Arnold Nagy over 10 years ago

OK, the issue was that the journal mountpoint had filled up, since it was undersized.

Would it be possible to pre-fill the journal files instead of using sparse files? Then this error would show up during deployment.
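
As an illustration of what that would buy (a sketch only; the helper name, path and size below are made up, and this is not the ceph-disk code): preallocating the journal file at deploy time makes an undersized journal filesystem fail immediately with ENOSPC instead of much later at runtime.

import errno
import os

def preallocate_journal(path, size_bytes):
    # Hypothetical helper: reserve the journal's blocks up front so a
    # too-small filesystem fails at deploy time rather than at write time.
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        # posix_fallocate raises OSError(ENOSPC) right here if the fs is too small.
        os.posix_fallocate(fd, 0, size_bytes)
    except OSError as e:
        if e.errno == errno.ENOSPC:
            raise SystemExit("not enough space on the journal filesystem for %s" % path)
        raise
    finally:
        os.close(fd)

preallocate_journal("/osdjournal/sdc", 5 * 1024 ** 3)  # 5 GB, as in the listing above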

Actions #3

Updated by Sage Weil over 10 years ago

Strange, it is doing an fallocate on the journal when it creates it, which should ensure there is sufficient disk space. What fs are you using?

Actions #4

Updated by Sage Weil over 10 years ago

  • Status changed from New to Need More Info
  • Priority changed from Normal to High
  • Source changed from other to Community (user)
Actions #5

Updated by Zoltan Arnold Nagy over 10 years ago

ext4, mounted with noatime,nodiratime,discard.

Actions #6

Updated by Sage Weil over 10 years ago

  • Subject changed from ceph-osd doesn't bring osd back after reboot to osd: journal didn't preallocate
Actions #7

Updated by Sage Weil over 10 years ago

  • Assignee set to Sage Weil

The problem is that ceph-disk creates the journal but does not allocate it.
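
An illustrative contrast (again a sketch, not the actual ceph-disk code; the paths are made up): a journal file that is only truncated to its final size stays sparse, so the journal filesystem running out of space is not detected until the OSD later writes to the journal.

import os

SIZE = 5 * 1024 ** 3  # 5 GB, matching the journal files above

# Creating the file and setting its length does not reserve any blocks.
fd = os.open("/tmp/journal-sparse", os.O_CREAT | os.O_WRONLY, 0o644)
os.ftruncate(fd, SIZE)
os.close(fd)

st = os.stat("/tmp/journal-sparse")
print("apparent size:", st.st_size)         # 5368709120
print("allocated    :", st.st_blocks * 512) # close to 0: nothing reserved yet

# Allocating it instead (e.g. with os.posix_fallocate, as in the sketch under
# comment #2 above) reserves the space immediately and surfaces ENOSPC up front.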

Actions #8

Updated by Sage Weil over 10 years ago

  • Priority changed from High to Urgent
Actions #9

Updated by Sage Weil over 10 years ago

  • Status changed from Need More Info to Fix Under Review
Actions #10

Updated by Sage Weil over 10 years ago

  • Status changed from Fix Under Review to Resolved