Bug #4757

ceph-disk-prepare will not use all available space with >2TB hard drives

Added by Alexandre Marangone about 11 years ago. Updated about 11 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Support
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When sharing the journal with the OSD data, ceph-disk-prepare will not use all the available disk space with disks >2TB.
This has been reproduced on a 3TB HD with VirtualBox (VBoxManage createhd --filename 3TB.vdi --size 3000000 --format VDI --variant Standard).

# ceph-disk-prepare /dev/sde
# sgdisk -p /dev/sde
Disk /dev/sde: 6144000000 sectors, 2.9 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): C9183EF7-05CA-461C-B47D-BF9257E69596
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 6143999966
Partitions will be aligned on 2048-sector boundaries
Total free space is 1838544895 sectors (876.7 GiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1      1849032704      6143999966   2.0 TiB     FFFF  ceph data
   2      1838544896      1849032670   5.0 GiB     FFFF  ceph journal

It seems that the journal creation is the issue. The end sector should be 6143999966 instead of 1849032670.

Looking at the ceph-disk code, the same behavior can be reproduced with:
# sgdisk --new=2:-1024M:0 /dev/sde

Making the journal start at the beginning of the disk works:

diff --git a/src/ceph-disk b/src/ceph-disk
index 28cba37..4abf9c4 100755
--- a/src/ceph-disk
+++ b/src/ceph-disk
@@ -629,7 +629,7 @@ def prepare_journal_dev(
         # journal at end of free space so partitioning tools don't
         # reorder them suddenly
         num = 2
-        journal_part = '{num}:-{size}M:0'.format(
+        journal_part = '{num}:0:{size}M'.format(
             num=num,
             size=journal_size,
             )

But there is a comment above this code warning not to do that, and my knowledge here is very limited.
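For reference, the sgdisk partition arguments the two format strings produce can be sketched as follows. This is a minimal sketch, assuming journal_size = 5120 (MiB), matching the 5.0 GiB journal in the sgdisk output above; per sgdisk(8), a leading '-' on the start value counts back from the end of the largest free block, while '0' as the start means the beginning of that block.

```python
# Sketch of the partition argument built by prepare_journal_dev,
# before and after the patch above (journal_size in MiB is assumed).
num = 2
journal_size = 5120

# Original: start journal_size MiB before the end of the largest free
# block; end at "0", i.e. the end of that block.
original = '{num}:-{size}M:0'.format(num=num, size=journal_size)

# Patched: start at the beginning of the largest free block; end at
# the absolute position journal_size MiB from the start of the disk.
patched = '{num}:0:{size}M'.format(num=num, size=journal_size)

print(original)  # 2:-5120M:0
print(patched)   # 2:0:5120M
```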

When fixed, could this also go to bobtail-dc?

#1

Updated by Sage Weil about 11 years ago

hrm, that comment came from tv, so who knows what he was seeing. can you do some testing with the change and see if you see anything strange with partitions reordering?

#2

Updated by Alexandre Marangone about 11 years ago

I ran ceph-disk-prepare with the patch on a 3TB disk and a 10GB disk, multiple times, with and without --zap-disk.
I didn't see any partition re-ordering. My best guess is that the warning applies when other partitioning tools are used.

I tried to print the partition table with parted and it seems to be consistent with sgdisk.

#3

Updated by Alexandre Marangone about 11 years ago

  • Status changed from New to Resolved
#4

Updated by Dan Mick about 11 years ago

Really confused by that state; journal should have been partition 2 at the end of the drive, so more is wrong than just its size, and those numbers aren't obvious 32-bit flubs. I'm worried that we don't understand what was happening here well enough to know that we've fixed it.

#5

Updated by Greg Farnum about 11 years ago

We should also get more details about what the original problem was before just assuming it's fixed. I bet Mark has TV's email address.

#6

Updated by Dan Mick about 11 years ago

My suspicion is that Tv ran across this bug, and some version of gdisk wanted to reorder partitions based on starting block number (partition management programs are always doing that kind of stuff).

But I now fully understand and agree the bug was a 32-bit rounding problem; notably, in the example above, if one adds 0x1 0000 0000 to the starting block number, one gets 6144000000.
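The wraparound can be checked directly against the sgdisk output in the description: adding 2**32 to the truncated values recovers the intended sectors.

```python
# Values taken verbatim from the sgdisk -p output above.
last_usable = 6143999966   # last usable sector reported by sgdisk
data_start  = 1849032704   # partition 1 (ceph data) start
journal_end = 1849032670   # partition 2 (ceph journal) end

# The data partition was meant to start at sector 6144000000, but the
# value was truncated modulo 2**32.
print(data_start + 2**32)    # 6144000000

# The journal's intended end was exactly the last usable sector.
print(journal_end + 2**32)   # 6143999966
assert journal_end + 2**32 == last_usable
```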
