Bug #5981
osd: journal didn't preallocate (Closed)
Description
I had a node deployed using ceph-deploy: 7 disks in total, with the journals
on files on an SSD.
After rebooting the node, only some of them would come back.
-3   2.61   host signina
 5   0.45       osd.5    up    1
 6   0.45       osd.6    up    1
 7   0.27       osd.7    down  0
 8   0.27       osd.8    down  0
 9   0.27       osd.9    down  0
10   0.45       osd.10   up    1
11   0.45       osd.11   down  0
In dmesg I see the following:
[ 42.072796] init: ceph-osd (ceph/8) main process (4118) terminated with status 134
[ 42.072824] init: ceph-osd (ceph/8) main process ended, respawning
[ 42.982971] init: ceph-osd (ceph/9) main process (4117) killed by ABRT signal
[ 42.983009] init: ceph-osd (ceph/9) main process ended, respawning
[ 42.987865] init: ceph-osd (ceph/7) main process (4124) terminated with status 134
[ 42.987892] init: ceph-osd (ceph/7) main process ended, respawning
[ 43.960877] init: ceph-osd (ceph/8) main process (8146) terminated with status 134
[ 43.960906] init: ceph-osd (ceph/8) main process ended, respawning
[ 44.963873] init: ceph-osd (ceph/9) main process (8355) terminated with status 134
[ 44.963901] init: ceph-osd (ceph/9) main process ended, respawning
[ 44.967048] init: ceph-osd (ceph/7) main process (8360) terminated with status 134
[ 44.967089] init: ceph-osd (ceph/7) main process ended, respawning
and this keeps repeating for osd.7, osd.8, osd.9 and osd.11.
They are, however, mounted:
/dev/sde1 on /var/lib/ceph/osd/ceph-9 type xfs (rw)
/dev/sdd1 on /var/lib/ceph/osd/ceph-8 type xfs (rw)
/dev/sdb1 on /var/lib/ceph/osd/ceph-6 type xfs (rw)
/dev/sdf1 on /var/lib/ceph/osd/ceph-10 type xfs (rw)
/dev/sdc1 on /var/lib/ceph/osd/ceph-7 type xfs (rw)
/dev/sdg1 on /var/lib/ceph/osd/ceph-11 type xfs (rw)
/dev/sda1 on /var/lib/ceph/osd/ceph-5 type xfs (rw)
Attaching the logs for these.
The journals are available at the specified location:
zoltan@signina:~$ ls -las /osdjournal/
total 30787668
      4 drwxr-xr-x  3 root root       4096 Aug 13 14:51 .
      4 drwxr-xr-x 24 root root       4096 Aug 13 14:34 ..
     16 drwx------  2 root root      16384 Aug 13 14:34 lost+found
5242888 -rw-r--r--  1 root root 5368709120 Aug 15 20:56 sda
5242888 -rw-r--r--  1 root root 5368709120 Aug 15 20:56 sdb
3409264 -rw-r--r--  1 root root 5368709120 Aug 15 20:50 sdc
3585292 -rw-r--r--  1 root root 5368709120 Aug 15 20:50 sdd
3413500 -rw-r--r--  1 root root 5368709120 Aug 15 20:50 sde
5242888 -rw-r--r--  1 root root 5368709120 Aug 15 20:56 sdf
4650924 -rw-r--r--  1 root root 5368709120 Aug 15 20:51 sdg
zoltan@signina:~$
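Note that in the `ls -las` output above, the first column is the number of allocated 1 KiB blocks, while the size column shows the apparent size (5368709120 bytes, i.e. 5 GiB). Several journals (sdc, sdd, sde, sdg) have fewer blocks allocated than their apparent size implies, which is the signature of a sparse file. A minimal sketch of that check, comparing `st_blocks` against `st_size` (the demo path is hypothetical, not from this report):

```python
import os

def is_sparse(path):
    """Return True when fewer blocks are allocated than the file's
    apparent size implies, i.e. the file is sparse."""
    st = os.stat(path)
    allocated = st.st_blocks * 512  # st_blocks counts 512-byte units
    return allocated < st.st_size

# Create a 5 GiB file the way a bare truncate would: no blocks are
# allocated, so the file is sparse.
with open("/tmp/journal-demo", "wb") as f:
    f.truncate(5 * 1024**3)
print(is_sparse("/tmp/journal-demo"))
```

A freshly truncated file like this typically reports zero allocated blocks, which is why the journal "fits" at creation time but runs out of space only once it is actually written.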
Files
Updated by Zoltan Arnold Nagy over 10 years ago
- File ceph-osd.7.log ceph-osd.7.log added
- File ceph-osd.11.log ceph-osd.11.log added
- File ceph-osd.9.log ceph-osd.9.log added
- File ceph-osd.8.log ceph-osd.8.log added
Oh, I forgot. This is on Ubuntu 13.04, with the following packages:
zoltan@signina:~$ dpkg -l | grep 0.61.
ii ceph 0.61.7-1raring amd64 distributed storage and file system
ii ceph-common 0.61.7-1raring amd64 common utilities to mount and interact with a ceph storage cluster
ii ceph-fs-common 0.61.7-1raring amd64 common utilities to mount and interact with a ceph file system
ii ceph-mds 0.61.7-1raring amd64 metadata server for the ceph distributed file system
ii librados2 0.61.7-1raring amd64 RADOS distributed object store client library
ii librbd1 0.61.7-1raring amd64 RADOS block device client library
zoltan@signina:~$
Updated by Zoltan Arnold Nagy over 10 years ago
OK, the issue was that the journal mountpoint filled up, since it was undersized.
Would it be possible to preallocate the journal files instead of using sparse files? Then this error would already show up during deploy.
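The preallocation being asked for here can be sketched with `posix_fallocate`, which reserves the blocks up front rather than leaving a hole. This is an illustrative sketch, not the ceph-disk code; the path and size are demo values (the journals in this report were 5 GiB):

```python
import os

# Hypothetical demo path; the journals in this bug lived under /osdjournal/.
path = "/tmp/osd-journal-demo"
size = 64 * 1024**2  # demo size; the real journals here were 5 GiB

fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
try:
    # posix_fallocate reserves the blocks immediately and fails with
    # ENOSPC right away if the filesystem cannot hold the file,
    # instead of leaving a sparse file that only fills up later.
    os.posix_fallocate(fd, 0, size)
finally:
    os.close(fd)

st = os.stat(path)
assert st.st_blocks * 512 >= size  # the blocks really are allocated
```

With this, an undersized journal filesystem would make deployment fail loudly instead of producing OSDs that abort at runtime.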
Updated by Sage Weil over 10 years ago
Strange, it does an fallocate on the journal when it creates it, which should ensure there is sufficient disk space. What fs are you using?
Updated by Sage Weil over 10 years ago
- Status changed from New to Need More Info
- Priority changed from Normal to High
- Source changed from other to Community (user)
Updated by Zoltan Arnold Nagy over 10 years ago
ext4, mounted with noatime,nodiratime,discard.
Updated by Sage Weil over 10 years ago
- Subject changed from ceph-osd doesn't bring osd back after reboot to osd: journal didn't preallocate
Updated by Sage Weil over 10 years ago
- Assignee set to Sage Weil
The problem is that ceph-disk creates the journal file but does not actually allocate it.
Updated by Sage Weil over 10 years ago
- Status changed from Need More Info to Fix Under Review
Updated by Sage Weil over 10 years ago
- Status changed from Fix Under Review to Resolved