Bug #4757
closed
ceph-disk-prepare will not use all available space with >2TB hard drives
Description
When sharing the journal with the OSD data, ceph-disk-prepare will not use all the available disk space with disks >2TB.
This has been reproduced on a 3TB HD with VirtualBox (VBoxManage createhd --filename 3TB.vdi --size 3000000 --format VDI --variant Standard).
# ceph-disk-prepare /dev/sde
# sgdisk -p /dev/sde
Disk /dev/sde: 6144000000 sectors, 2.9 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): C9183EF7-05CA-461C-B47D-BF9257E69596
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 6143999966
Partitions will be aligned on 2048-sector boundaries
Total free space is 1838544895 sectors (876.7 GiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1      1849032704      6143999966   2.0 TiB     FFFF  ceph data
   2      1838544896      1849032670   5.0 GiB     FFFF  ceph journal
It seems that the journal creation is the issue. The end sector should be 6143999966 instead of 1849032670.
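As a quick sanity check on the table above, the sector counts can be converted to sizes. This is only a minimal Python sketch: the helper name sectors_to_gib is hypothetical, and all numbers are copied from the sgdisk output in this report.

```python
SECTOR_SIZE = 512  # logical sector size reported by sgdisk above


def sectors_to_gib(sectors):
    """Convert a count of 512-byte sectors to GiB."""
    return sectors * SECTOR_SIZE / 2.0 ** 30


# (start, end) sector pairs copied from the sgdisk table; end is inclusive
data_start, data_end = 1849032704, 6143999966
journal_start, journal_end = 1838544896, 1849032670
free_sectors = 1838544895  # "Total free space" line

data_gib = sectors_to_gib(data_end - data_start + 1)
journal_gib = sectors_to_gib(journal_end - journal_start + 1)
free_gib = sectors_to_gib(free_sectors)

print("data:    %.1f GiB" % data_gib)
print("journal: %.1f GiB" % journal_gib)
print("unused:  %.1f GiB" % free_gib)
```

The unused figure should agree with the 876.7 GiB that sgdisk reports as free, which is the space lost in front of the journal.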
Looking at the ceph-disk code, this is reproducible with:
# sgdisk --new=2:-1024M:0 /dev/sde
Making the journal start at the beginning of the disk works:
diff --git a/src/ceph-disk b/src/ceph-disk
index 28cba37..4abf9c4 100755
--- a/src/ceph-disk
+++ b/src/ceph-disk
@@ -629,7 +629,7 @@ def prepare_journal_dev(
     # journal at end of free space so partitioning tools don't
     # reorder them suddenly
     num = 2
-    journal_part = '{num}:-{size}M:0'.format(
+    journal_part = '{num}:0:{size}M'.format(
         num=num,
         size=journal_size,
         )
But there is a warning above this code to not do that and my knowledge here is very limited.
When fixed, could this also go to bobtail-dc?
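To make the one-line change in the patch easier to see, here is a small Python sketch that renders both format strings as they would be passed to sgdisk --new. The function name journal_part_spec and the at_end flag are hypothetical and not part of ceph-disk; only the two format strings are taken from the diff.

```python
def journal_part_spec(journal_size, num=2, at_end=True):
    """Return a partnum:start:end string for sgdisk --new.

    journal_size is in MiB. at_end=True mirrors the original
    ceph-disk format string, at_end=False mirrors the patched one.
    """
    if at_end:
        # original string: negative start, i.e. measured back from
        # the end of the free space; end of 0 means sgdisk's default
        template = '{num}:-{size}M:0'
    else:
        # patched string: start of 0 means sgdisk's default start
        template = '{num}:0:{size}M'
    return template.format(num=num, size=journal_size)


print('--new=' + journal_part_spec(1024, at_end=True))
print('--new=' + journal_part_spec(1024, at_end=False))
```

With a 1024 MiB journal, the first call produces the same argument as the sgdisk command used to reproduce the problem.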
Updated by Sage Weil about 11 years ago
hrm, that comment came from tv, so who knows what he was seeing. can you do some testing with the change and see if you see anything strange with partitions reordering?
Updated by Alexandre Marangone about 11 years ago
I ran ceph-disk-prepare with the patch on a 3TB disk and a 10GB disk, multiple times, with and without --zap-disk.
I didn't see any partition re-ordering. My best guess is that the warning applies when other partitioning tools are used.
I tried to print the partition table with parted and it seems to be consistent with sgdisk.
Updated by Alexandre Marangone about 11 years ago
- Status changed from New to Resolved
Updated by Dan Mick about 11 years ago
Really confused by that state; journal should have been partition 2 at the end of the drive, so more is wrong than just its size, and those numbers aren't obvious 32-bit flubs. I'm worried that we don't understand what was happening here well enough to know that we've fixed it.
Updated by Greg Farnum about 11 years ago
We should also get more details about what the original problem was before just assuming it's fixed. I bet Mark has TV's email address.
Updated by Dan Mick about 11 years ago
My suspicion is that Tv ran across this bug, and some version of gdisk wanted to reorder partitions based on starting block number (partition management programs are always doing that kind of stuff).
But I now fully understand and agree the bug was a 32-bit rounding problem; notably, in the example above, if one adds 0x1 0000 0000 (2^32) to the starting block number of partition 1 (1849032704), one gets 6144000000, the total number of sectors on the disk.
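The arithmetic in that last observation can be checked with a few lines of Python. This sketch only verifies the numbers quoted in this report; it does not show where in the partitioning tool the truncation happened, and the constant names are made up for illustration.

```python
TOTAL_SECTORS = 6144000000  # from "Disk /dev/sde: 6144000000 sectors"
DATA_START = 1849032704     # start sector of partition 1 in the report
WRAP = 2 ** 32              # 0x1 0000 0000

# Adding 2**32 to the reported start sector gives the total sector
# count, so the reported value equals the total truncated to 32 bits.
print(DATA_START + WRAP == TOTAL_SECTORS)
print(TOTAL_SECTORS % WRAP == DATA_START)
print(TOTAL_SECTORS & 0xFFFFFFFF)
```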