Bug #14842

ceph-disk: /sys/block/<device>/queue/physical_block_size is not obeyed

Added by Loic Dachary almost 5 years ago. Updated over 3 years ago.

Status: Need More Info
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

ceph-disk creates the data partition starting at sector 1, ignoring the start sector that sgdisk recommends (256 on my disk). The partition start should be aligned to the physical sector size reported in /sys/block/<device>/queue/physical_block_size. In my case that is 16K physical and 4K logical, so a start at sector 256, which sgdisk/fdisk choose internally, is perfectly fine.
Partitioning this way from ceph-disk will severely impact disk performance.
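The alignment condition described above can be sketched in a few lines of Python. This is only an illustration, not ceph-disk code: the helper names read_block_sizes and is_start_aligned and the sysfs_root parameter are hypothetical, and the start sector is assumed to be counted in logical sectors.

```python
from pathlib import Path


def read_block_sizes(device, sysfs_root="/sys/block"):
    """Return (logical, physical) block sizes in bytes for a device such as 'sda'.

    Hypothetical helper: reads the two queue attributes exposed by the kernel.
    """
    queue = Path(sysfs_root) / device / "queue"
    logical = int((queue / "logical_block_size").read_text().strip())
    physical = int((queue / "physical_block_size").read_text().strip())
    return logical, physical


def is_start_aligned(start_sector, logical_size, physical_size):
    """True if the partition's byte offset is a multiple of the physical block size.

    Assumes start_sector is counted in logical sectors.
    """
    return (start_sector * logical_size) % physical_size == 0


# Values from the report: 4K logical sectors, 16K physical blocks.
print(is_start_aligned(1, 4096, 16384))    # False: byte offset 4096
print(is_start_aligned(256, 4096, 16384))  # True: byte offset 1048576
```

With these numbers, sector 1 sits 4096 bytes into the disk, which is not a multiple of 16384, while sector 256 sits at exactly 1 MiB and is aligned.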

History

#1 Updated by Loic Dachary almost 5 years ago

IIRC there are cases where sgdisk is not able to create a partition aligned as recommended (but parted could). I don't have an actual example, just an imprecise recollection.

#2 Updated by Sage Weil over 4 years ago

Can we just switch to parted? That would solve the partition length issue, too.

#3 Updated by Samuel Just over 4 years ago

  • Priority changed from Urgent to Normal

Loic: is this actively being worked on?

#4 Updated by Loic Dachary over 4 years ago

It has not made progress recently.

#5 Updated by Loic Dachary over 4 years ago

  • Assignee deleted (Loic Dachary)

#6 Updated by Josh Durgin over 3 years ago

Loic: is this still an issue?

#7 Updated by Loic Dachary over 3 years ago

Yes

#8 Updated by Hans Boot over 3 years ago

Please people, I do not understand why this is classified as "minor".
On many physical disks, the alignment must be respected, otherwise performance will suffer greatly.
This situation means that one CANNOT use ceph-disk or "ceph-deploy disk prepare" if one seeks performance, which should be the case for the majority of users.

On top of that, having a partition that is not aligned also breaks "ceph-deploy disk activate" due to parsing issues.

So people with certain disks simply cannot use this tool. A bug that disqualifies an entire tool is worth at least "major" to me.

#9 Updated by Loic Dachary over 3 years ago

  • Severity changed from 3 - minor to 2 - major

#10 Updated by Hans Boot over 3 years ago

To continue on this: most disks now have 4096 alignment and sgdisk uses 2048, so a quick and dirty solution would be to add --set-alignment=4096 before the --largest-new=... via some minor adaptations in ceph-disk.
I personally solved my case like this, but I am not saying this is the definitive solution. I mention it just in case someone else stumbles on this and needs a quick fix.
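As a rough illustration of that workaround, the sketch below builds an sgdisk argument list with --set-alignment placed before --largest-new. It is not the actual ceph-disk code: the function name sgdisk_new_partition_args and the /dev/sdX path are hypothetical, and the command is only constructed, not executed. Note that sgdisk interprets the --set-alignment value as a number of sectors.

```python
def sgdisk_new_partition_args(device, partnum, alignment_sectors=4096):
    """Build an sgdisk command line that sets the alignment before creating
    the largest possible new partition.

    Hypothetical helper for illustration; it does not run sgdisk.
    """
    if alignment_sectors <= 0:
        raise ValueError("alignment_sectors must be positive")
    return [
        "sgdisk",
        # Alignment multiple, in sectors, applied to new partition starts.
        "--set-alignment={}".format(alignment_sectors),
        # Create partition number `partnum` in the largest free block.
        "--largest-new={}".format(partnum),
        device,
    ]


print(" ".join(sgdisk_new_partition_args("/dev/sdX", 1)))
```

Keeping the alignment as a parameter would let a caller derive it from the device's physical block size instead of hard-coding 4096.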

#11 Updated by Kefu Chai over 3 years ago

  • Assignee set to Kefu Chai

#12 Updated by Kefu Chai over 3 years ago

  • Status changed from 12 to Need More Info

sgdisk always moves the start sector to a multiple of the sector alignment. See https://sourceforge.net/p/gptfdisk/code/ci/master/tree/gpt.cc#l1862

Typical output of ceph-disk looks like:

# ceph-disk prepare --osd-uuid "$osd_uuid" \
     --fs-type xfs --cluster ceph -- \
     /dev/sdc3 /dev/sda
WARNING:ceph-disk:OSD will not be hot-swappable if ...
Information: Moved requested sector from 34 to 2048 in
order to align on 2048-sector boundaries.
The operation has completed successfully.
meta-data=/dev/sdc3              isize=2048   agcount=4, agsize=61083136 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=244332544, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=119303, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

In the above example, sgdisk moves the start sector to 2048 so that it is a multiple of the sector alignment. Could you post the output of

sgdisk --version
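The rounding behaviour reported in the sgdisk message above ("Moved requested sector from 34 to 2048") amounts to rounding the requested sector up to the next multiple of the alignment. A minimal sketch of that arithmetic follows; align_up is a hypothetical name and this is not gptfdisk's implementation.

```python
def align_up(sector, alignment):
    """Round `sector` up to the nearest multiple of `alignment` (both in sectors)."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    remainder = sector % alignment
    if remainder == 0:
        return sector  # already on a boundary
    return sector + (alignment - remainder)


print(align_up(34, 2048))    # 2048, matching the sgdisk message
print(align_up(2048, 2048))  # 2048, unchanged
print(align_up(2049, 2048))  # 4096
```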

#13 Updated by Hans Boot over 3 years ago

Terribly sorry, but I am no longer involved with ceph at this moment, and I no longer have an operational ceph cluster at my disposal. From a backup I got the following, but that is without access to the disks that were used as storage, so I do not know if this is useful.

$ sgdisk --version
GPT fdisk (sgdisk) version 1.0.1

The disks I used at the time were relatively old but fairly standard disks: WD4000FYYZ

#14 Updated by Kefu Chai over 3 years ago

Strangely enough, I checked commit 846a9e30cda88f75369d175f2f549cad3ea15db2 of gptfdisk: https://sourceforge.net/p/gptfdisk/code/ci/846a9e30cda88f75369d175f2f549cad3ea15db2/tree/gpt.cc#l1790

It also aligns the start sector when creating a new partition.
