Bug #9408
erasure-code: misalignment
80%
Description
fdeeeb1b6be7a95f473ad33d3344a499a61106a1
[20:56:48] sage: I'm not even sure where the buffers for erasure-code/jerasure are allocated [21:05:44] ErasureCodeJerasure seems to handle alignment [21:52:50] I don't get why it's not aligned. blocksize = 1536; chunk.substr_of(out, i * blocksize, blocksize); cerr << i << ": " << static_cast<const void*>(chunk.c_str()) << '\n'; produces '2: 0x10e3008' [21:58:44] the buffer combining in ErasureCode::encode_prepare might be responsible
Related issues
Associated revisions
erasure-code: add ErasureCode::encode unit test
Re-create and describe the situation that is fixed by
91a7e18f60bbc9acab3045baaa1b6505474ec4a9 which reworks the buffer
preparation function provided by ErasureCode::encode.
http://tracker.ceph.com/issues/9408 Refs: #9408
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
History
#1 Updated by Loïc Dachary about 9 years ago
Local hack that fixes the problem:
diff --git a/src/common/buffer.cc b/src/common/buffer.cc index 7617c2f..35fc300 100644 --- a/src/common/buffer.cc +++ b/src/common/buffer.cc @@ -29,6 +29,7 @@ #include <sstream> #include <sys/uio.h> #include <limits.h> +#include <stdlib.h> namespace ceph { @@ -186,7 +187,8 @@ static simple_spinlock_t buffer_debug_lock = SIMPLE_SPINLOCK_INITIALIZER; public: raw_malloc(unsigned l) : raw(l) { if (len) { - data = (char *)malloc(len); + //data = (char *)malloc(len); + posix_memalign((void **) &data, 32, len); if (!data) throw bad_alloc(); } else { @@ -1084,7 +1086,7 @@ static simple_spinlock_t buffer_debug_lock = SIMPLE_SPINLOCK_INITIALIZER; if ((_len & ~CEPH_PAGE_MASK) == 0) nb = buffer::create_page_aligned(_len); else - nb = buffer::create(_len); + nb = buffer::create_malloc(_len); rebuild(nb); }
#2 Updated by Loïc Dachary about 9 years ago
- Priority changed from Normal to Urgent
#3 Updated by jianpeng ma about 9 years ago
Loic, although i'm not sure erasure whether need align memory.
int ErasureCodeJerasure::encode_chunks(const set<int> &want_to_encode,
map<int, bufferlist> *encoded)
{
char *chunks[k + m];
for (int i = 0; i < k + m; i++)
chunks[i] = (*encoded)[i].c_str();
Suppose it need, i think we only make c_str() to return align memory. Is is right?
Loic, i think we should modify the c_str(), Like c_str(bool align)
if (align)
posix_memalign(&data, CEPH_PGAG_SIZE, len)
else
new char[len]
How about this?
#4 Updated by Janne Grunau about 9 years ago
./src/ceph_erasure_code_benchmark --plugin jerasure --workload encode --iterations 1048576 --size 4096 --erasures 0 --parameter k=3 --parameter m=2 --parameter directory=src/.libs --parameter packetsize=256 --parameter w=8 --parameter jerasure-per-chunk-alignment=true --parameter technique=reed_sol_van
reproduces it reliable for me. Adding padding in ErasureCode::encode_prepare() is required to hit the unaligned buffer. The substr_of for the padded buffer needs to create a new buffer which combines the rest of the input and the padding. That code uses a ordinary buffer::create().
The alignment of the resulting buffer is the alignment of new char[]. I haven't checked whether tcmalloc guarantees 16 byte alignment, I hit the unaligned case with tcmalloc disabled.
#5 Updated by jianpeng ma about 9 years ago
Can you tell met the result for this situation? I run with your command but it looks good.
#6 Updated by Janne Grunau about 9 years ago
jianpeng ma wrote:
Can you tell met the result for this situation? I run with your command but it looks good.
It does a buffer realloction for the padded input for the substr_of() in ErasureCode::encode(). The alignment of that buffer depends only on the behaviour of new char[].
adding an align parameter to c_str() is a bad idea. Ideally the data would be aligned from the start. Ensuring proper alignment after the initial allocation is costly since it'll require a memcpy for no good reason.
For the simplest case (k=2,m=1) the buffer management overhead is already larger than the computation for 4k blocks.
#7 Updated by Loïc Dachary about 9 years ago
With the following applied on dcc608d5d3f701315eaf0edee6f0a4796a4d97e1
diff --git a/ceph-object-corpus b/ceph-object-corpus --- a/ceph-object-corpus +++ b/ceph-object-corpus @@ -1 +1 @@ -Subproject commit bb3cee6b85b93210af5fb2c65a33f3000e341a11 +Subproject commit bb3cee6b85b93210af5fb2c65a33f3000e341a11-dirty diff --git a/src/erasure-code/ErasureCode.cc b/src/erasure-code/ErasureCode.cc index 5953f49..5b4f811 100644 --- a/src/erasure-code/ErasureCode.cc +++ b/src/erasure-code/ErasureCode.cc @@ -88,6 +88,7 @@ int ErasureCode::encode(const set<int> &want_to_encode, int chunk_index = chunk_mapping.size() > 0 ? chunk_mapping[i] : i; bufferlist &chunk = (*encoded)[chunk_index]; chunk.substr_of(out, i * blocksize, blocksize); + cerr << __func__ << ": " << i << " " << (void*)chunk.c_str() << endl; } encode_chunks(want_to_encode, encoded); for (unsigned int i = 0; i < k + m; i++) { diff --git a/src/erasure-code/jerasure/ErasureCodeJerasure.cc b/src/erasure-code/jerasure/ErasureCodeJerasure.cc index 4bd6aeb..a90ed3e 100644 --- a/src/erasure-code/jerasure/ErasureCodeJerasure.cc +++ b/src/erasure-code/jerasure/ErasureCodeJerasure.cc @@ -103,7 +103,9 @@ unsigned int ErasureCodeJerasure::get_chunk_size(unsigned int object_size) const unsigned tail = object_size % alignment; unsigned padded_length = object_size + ( tail ? ( alignment - tail ) : 0 ); assert(padded_length % k == 0); - return padded_length / k; + unsigned int chunk_size = padded_length / k; + assert(chunk_size % LARGEST_VECTOR_WORDSIZE == 0); + return chunk_size; } }
running
./ceph_erasure_code_benchmark --debug-osd=20 --plugin jerasure --workload encode --iterations 1 --size 4096 --erasures 0 --parameter k=3 --parameter m=2 --parameter directory=.libs --parameter packetsize=256 --parameter w=8 --parameter technique=reed_sol_van
produces
load: jerasure 2014-09-15 13:10:32.439651 7f7a23631800 10 ErasureCodePluginSelectJerasure: load: jerasure_sse4 2014-09-15 13:10:32.439708 7f7a23631800 10 ErasureCodePluginSelectJerasure: sse4 plugin 2014-09-15 13:10:32.439721 7f7a23631800 10 ErasureCodeJerasure: technique=reed_sol_van encode: 0 0x4bdc000 encode: 1 0x4bdc560 encode: 2 0x4be0000 encode: 3 0x4bdd020 encode: 4 0x4bdd580 0.000177 4
#8 Updated by jianpeng ma about 9 years ago
Hi Loic, I think Janne Grunau is right. For memory align, it depend on the bufferlist::c_str.
Using this patch:
diff --git a/src/erasure-code/isa/ErasureCodeIsa.cc b/src/erasure-code/isa/ErasureCodeIsa.cc
index 962ab81..502c8f3 100644
--- a/src/erasure-code/isa/ErasureCodeIsa.cc
+++ b/src/erasure-code/isa/ErasureCodeIsa.cc@ -96,8 +96,12
@ int ErasureCodeIsa::encode_chunks(const set<int> &want_to_encode,
map<int, bufferlist> *encoded)
{
char *chunks[k + m];
- for (int i = 0; i < k + m; i++)
+ dout(10) << "chunk size " << get_chunk_size(4096) << dendl;
+ for (int i = 0; i < k + m; i++) {
+ std::list<bufferptr> buffers = (*encoded)[i].buffers();
+ dout(10) << "i " << i << " has ptr " << buffers.size() << " lenght " << (*encoded)[i].length() << dendl;
chunks[i] = (*encoded)[i].c_str();
+ }
isa_encode(&chunks0, &chunks[k], (*encoded)[0].length());
return 0;
}
./ceph_erasure_code_benchmark -P k=3 -P m=2 --plugin=isa -P technique=reed_sol_van -s 4096 -i 1 --log-to-strerr --debug-osd 20
0.000094 4
2014-09-15 20:49:42.342478 7f8c64e7b7c0 10 ErasureCodeIsa: technique=default
2014-09-15 20:49:42.342512 7f8c64e7b7c0 10 ErasureCodeIsa: [ cache tables ] creating coeff for k=3 m=2
2014-09-15 20:49:42.342523 7f8c64e7b7c0 10 ErasureCodeIsa: [ cache tables ] creating tables for k=3 m=2
2014-09-15 20:49:42.342528 7f8c64e7b7c0 10 ErasureCodeIsa: [ cache memory ] = 1207680 bytes [ matrix ] = Vandermonde
2014-09-15 20:49:42.342696 7f8c64e7b7c0 10 ErasureCodeIsa: chunk size 1376
2014-09-15 20:49:42.342710 7f8c64e7b7c0 10 ErasureCodeIsa: i 0 has ptr 1lenght 1376
2014-09-15 20:49:42.342715 7f8c64e7b7c0 10 ErasureCodeIsa: i 1 has ptr 1lenght 1376
2014-09-15 20:49:42.342717 7f8c64e7b7c0 10 ErasureCodeIsa: i 2 has ptr 2lenght 1376
2014-09-15 20:49:42.342722 7f8c64e7b7c0 10 ErasureCodeIsa: i 3 has ptr 1lenght 1376
2014-09-15 20:49:42.342729 7f8c64e7b7c0 10 ErasureCodeIsa: i 4 has ptr 1lenght 1376
i 2 has ptr 2lenght 1376. So for c_str(), it use buffer::create(_len) because length(1376) % 4096 != 0.
buffer::raw* buffer::create(unsigned len) {
return new raw_char(len);
}
class buffer::raw_char : public buffer::raw {
public:
raw_char(unsigned l) : raw(l) {
if (len)
data = new char[len];
else
data = 0;
It class buffer::raw_char : public buffer::raw {
public:
raw_char(unsigned l) : raw(l) {
if (len)
data = new char[len];
else
data = 0;
So the c_str finially depend on new char[len] as Janne Grunau said.
I don't know the implementation of 'new char[len]' depend on compile or libc or other.
#9 Updated by Loïc Dachary about 9 years ago
- Status changed from New to In Progress
Now I see it, thanks for your patience.
#10 Updated by Loïc Dachary about 9 years ago
- Status changed from In Progress to 7
Running under the branch wip-9408-buffer-alignment in http://ceph.com/gitbuilder.cgi
#11 Updated by Loïc Dachary about 9 years ago
- Status changed from 7 to Fix Under Review
- % Done changed from 0 to 80
Corresponding pull request https://github.com/ceph/ceph/pull/2558
#12 Updated by Loïc Dachary about 9 years ago
gitbuilder is all green
#13 Updated by Loïc Dachary about 9 years ago
New pull request https://github.com/ceph/ceph/pull/2595
#14 Updated by Loïc Dachary about 9 years ago
Running under the branch wip-9408-buffer-alignment in http://ceph.com/gitbuilder.cgi
#15 Updated by Loïc Dachary almost 9 years ago
- Status changed from Fix Under Review to In Progress
#16 Updated by Loïc Dachary almost 9 years ago
- Status changed from In Progress to Fix Under Review
#17 Updated by Sage Weil almost 9 years ago
- Priority changed from Urgent to Immediate
#18 Updated by Samuel Just almost 9 years ago
- Status changed from Fix Under Review to 7
#19 Updated by Loïc Dachary almost 9 years ago
- Status changed from 7 to Pending Backport
- Priority changed from Immediate to Urgent
- Backport set to firefly
It can't be easily cherry picked because the code has changed. That can happen on firefly too. Backporting would make sense.
#20 Updated by Loïc Dachary over 8 years ago
- Description updated (diff)
#21 Updated by Loïc Dachary over 8 years ago
- Backport changed from firefly to firefly,giant
#22 Updated by Loïc Dachary over 8 years ago
- Backport changed from firefly,giant to firefly
giant is end of life
#23 Updated by Loïc Dachary over 8 years ago
- Status changed from Pending Backport to Resolved
- Regression set to No