Bug #8622
Closed
erasure-code: rados command does not enforce alignment constraints
90%
Description
Original title, for the record: "EC pool fails for certain (k,m) combinations for >4MB objs"
Steps to reproduce the error:
git clone https://github.com/ceph/ceph.git
cd ceph
./do_autogen.sh
make
cd src
OSD=5 ./vstart.sh -l -n -X
./ceph osd erasure-code-profile set ecprofile ruleset-failure-domain=osd k=3 m=2 plugin=jerasure
./ceph osd crush rule create-erasure ecruleset ecprofile
./ceph osd pool create ecpool 1 1 erasure ecprofile ecruleset
Once you have the development cluster working you can try:
dd if=/dev/urandom of=./test.dat bs=1MB count=5
./rados -p ecpool put test ./test.dat
And you get the following error:
error putting ecpool/test: (95) Operation not supported
However, the following case works perfectly:
dd if=/dev/urandom of=./test.dat bs=1MB count=4
./rados -p ecpool put test ./test.dat
If instead of (k=3, m=2, OSD=5) you try (k=2, m=2, OSD=4) it works for both 4MB and 5MB objects.
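The k-dependence is consistent with a stripe-alignment problem. A minimal arithmetic check, assuming the default 4 KiB erasure-code chunk size (so stripe_width = k * 4096) and the client's 4 MiB (1<<22) write buffer:

```python
# Hypothetical check: is the client's 4 MiB buffer a whole number of
# stripes? stripe_width = k * 4096 is an assumption (default chunk size).
CLIENT_CHUNK = 1 << 22  # 4194304 bytes

for k in (2, 3):
    stripe_width = k * 4096
    aligned = CLIENT_CHUNK % stripe_width == 0
    print(f"k={k}: stripe_width={stripe_width}, aligned={aligned}")
```

With k=2 the 4 MiB buffer is exactly 512 stripes, so every write offset stays aligned; with k=3 it is not a whole number of stripes, matching the failure above.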
I observed this bug in an Ubuntu Precise machine and in an up-to-date Arch Linux machine.
This same error was first observed by Michael Nelson: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-March/028311.html
Updated by Lluis PJ almost 10 years ago
After some debugging, it seems to me that the problem is that the rados client reads and sends data in chunks of (1<<22) bytes, which is slightly less than 4.2 MB. However, this size may not be aligned to 'stripe_width', which changes with "k". For the first chunk that rados sends, the OSD zero-pads the request to make it a multiple of 'stripe_width'. The problem comes when the offset of the second chunk (which is always 1<<22) is not 'stripe_width'-aligned: the OSD requires all writes to have an offset aligned to 'stripe_width'.
I see two possible solutions:
- Make the OSD read the last stripe of the first chunk, and replace the zero pad with the new data,
- or make rados client access the pool 'stripe_width' value and align all chunk sizes properly.
The second solution is IMHO the faster and easier one.
Any other solutions?
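The second option amounts to rounding the client's buffer size down to a whole number of stripes. A minimal sketch (not the actual Ceph code; the function name and numbers are illustrative, assuming stripe_width = 12288 for k=3 with 4 KiB chunks):

```python
def aligned_chunk_size(default_chunk: int, stripe_width: int) -> int:
    """Largest multiple of stripe_width not exceeding default_chunk."""
    if stripe_width <= 0 or default_chunk < stripe_width:
        return default_chunk  # nothing sensible to align to
    return default_chunk - (default_chunk % stripe_width)

# With k=3 and 4 KiB chunks, stripe_width = 12288:
print(aligned_chunk_size(1 << 22, 12288))  # 4190208 = 341 full stripes
```

Every chunk the client sends then starts at a stripe_width-aligned offset, so only the final partial stripe of the object ever needs zero padding.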
Updated by Lluis PJ almost 10 years ago
I created a pull request for the second option.
Updated by Loïc Dachary almost 10 years ago
- Status changed from New to In Progress
- % Done changed from 0 to 80
Updated by Loïc Dachary almost 10 years ago
- Subject changed from EC pool fails for certain (k,m) combinations for >4MB objs to erasure-code: rados command does not enforce alignment constraints
- Description updated (diff)
- Category changed from OSD to 26
Updated by Lluis PJ almost 10 years ago
Loic,
I had some problems squashing the commits, so I created a new pull request:
https://github.com/ceph/ceph/pull/1984/
Updated by Loïc Dachary almost 10 years ago
- Status changed from In Progress to Resolved
- % Done changed from 80 to 100
This is fine, great work :-)
Updated by Loïc Dachary almost 10 years ago
Would you have time to review / try this : https://github.com/ceph/ceph/pull/1987 ?
Updated by Ian Colle almost 10 years ago
- Status changed from Resolved to Pending Backport
- Backport set to Firefly
Loic - this needs to be backported to Firefly.
Updated by Loïc Dachary almost 10 years ago
- % Done changed from 100 to 90
Needs to be backported along with https://github.com/ceph/ceph/pull/2020 which fixes a bug introduced by the fix :-/
Updated by Sage Weil over 9 years ago
- Status changed from Pending Backport to Resolved
commit:7a58da53ebfcaaf385c21403b654d1d2f1508e1a
Updated by Greg Farnum almost 7 years ago
- Project changed from Ceph to CephFS
- Category deleted (26)
- Target version deleted (0.82)