Bug #9408: erasure-code: misalignment - Ceph - Ceph

Actions

Copy link

Bug #9408

closed

erasure-code: misalignment

Added by Loïc Dachary over 9 years ago. Updated almost 9 years ago.

Status:

Resolved

Priority:

Urgent

Assignee:

Loïc Dachary

Category:

OSD

Target version:

% Done:

80%

Source:

other

Tags:

Backport:

firefly

Regression:

Severity:

3 - minor

Reviewed:

Affected Versions:

ceph-qa-suite:

Pull request ID:

Crash signature (v1):

Crash signature (v2):

Description

fdeeeb1b6be7a95f473ad33d3344a499a61106a1

[20:56:48] sage: I'm not even sure where the buffers for erasure-code/jerasure are allocated
[21:05:44] ErasureCodeJerasure seems to handle alignment
[21:52:50] I don't get why it's not aligned. blocksize = 1536; chunk.substr_of(out, i * blocksize, blocksize); cerr << i << ": " << static_cast<const void*>(chunk.c_str()) << '\n'; produces '2: 0x10e3008'
[21:58:44] the buffer combining in ErasureCode::encode_prepare might be responsible

Related issues 2 (0 open — 2 closed)

Actions

Copy link

Updated by Loïc Dachary over 9 years ago

Local hack that fixes the problem:

diff --git a/src/common/buffer.cc b/src/common/buffer.cc
index 7617c2f..35fc300 100644
--- a/src/common/buffer.cc
+++ b/src/common/buffer.cc
@@ -29,6 +29,7 @@
 #include <sstream>
 #include <sys/uio.h>
 #include <limits.h>
+#include <stdlib.h>

 namespace ceph {

@@ -186,7 +187,8 @@ static simple_spinlock_t buffer_debug_lock = SIMPLE_SPINLOCK_INITIALIZER;
   public:
     raw_malloc(unsigned l) : raw(l) {
       if (len) {
-       data = (char *)malloc(len);
+       //data = (char *)malloc(len);
+       posix_memalign((void **) &data, 32, len);
         if (!data)
           throw bad_alloc();
       } else {
@@ -1084,7 +1086,7 @@ static simple_spinlock_t buffer_debug_lock = SIMPLE_SPINLOCK_INITIALIZER;
     if ((_len & ~CEPH_PAGE_MASK) == 0)
       nb = buffer::create_page_aligned(_len);
     else
-      nb = buffer::create(_len);
+      nb = buffer::create_malloc(_len);
     rebuild(nb);
   }

Actions

Copy link

Updated by Loïc Dachary over 9 years ago

Priority changed from Normal to Urgent

Actions

Copy link

Updated by jianpeng ma over 9 years ago

Loic, although i'm not sure erasure whether need align memory.

int ErasureCodeJerasure::encode_chunks(const set<int> &want_to_encode,
map<int, bufferlist> *encoded) {
char *chunks[k + m];
for (int i = 0; i < k + m; i++)
chunks[i] = (*encoded)[i].c_str();

Suppose it need, i think we only make c_str() to return align memory. Is is right?

Loic, i think we should modify the c_str(), Like c_str(bool align)
if (align)
posix_memalign(&data, CEPH_PGAG_SIZE, len)
else
new char[len]

How about this?

Actions

Copy link

Updated by Janne Grunau over 9 years ago

./src/ceph_erasure_code_benchmark --plugin jerasure --workload encode --iterations 1048576 --size 4096 --erasures 0 --parameter k=3 --parameter m=2 --parameter directory=src/.libs --parameter packetsize=256 --parameter w=8 --parameter jerasure-per-chunk-alignment=true --parameter technique=reed_sol_van

reproduces it reliable for me. Adding padding in ErasureCode::encode_prepare() is required to hit the unaligned buffer. The substr_of for the padded buffer needs to create a new buffer which combines the rest of the input and the padding. That code uses a ordinary buffer::create().

The alignment of the resulting buffer is the alignment of new char[]. I haven't checked whether tcmalloc guarantees 16 byte alignment, I hit the unaligned case with tcmalloc disabled.

Actions

Copy link

Updated by jianpeng ma over 9 years ago

Can you tell met the result for this situation? I run with your command but it looks good.

Actions

Copy link

Updated by Janne Grunau over 9 years ago

jianpeng ma wrote:

Can you tell met the result for this situation? I run with your command but it looks good.

It does a buffer realloction for the padded input for the substr_of() in ErasureCode::encode(). The alignment of that buffer depends only on the behaviour of new char[].

adding an align parameter to c_str() is a bad idea. Ideally the data would be aligned from the start. Ensuring proper alignment after the initial allocation is costly since it'll require a memcpy for no good reason.

For the simplest case (k=2,m=1) the buffer management overhead is already larger than the computation for 4k blocks.

Actions

Copy link

Updated by Loïc Dachary over 9 years ago

With the following applied on dcc608d5d3f701315eaf0edee6f0a4796a4d97e1

diff --git a/ceph-object-corpus b/ceph-object-corpus
--- a/ceph-object-corpus
+++ b/ceph-object-corpus
@@ -1 +1 @@
-Subproject commit bb3cee6b85b93210af5fb2c65a33f3000e341a11
+Subproject commit bb3cee6b85b93210af5fb2c65a33f3000e341a11-dirty
diff --git a/src/erasure-code/ErasureCode.cc b/src/erasure-code/ErasureCode.cc
index 5953f49..5b4f811 100644
--- a/src/erasure-code/ErasureCode.cc
+++ b/src/erasure-code/ErasureCode.cc
@@ -88,6 +88,7 @@ int ErasureCode::encode(const set<int> &want_to_encode,
     int chunk_index = chunk_mapping.size() > 0 ? chunk_mapping[i] : i;
     bufferlist &chunk = (*encoded)[chunk_index];
     chunk.substr_of(out, i * blocksize, blocksize);
+    cerr << __func__ << ": " << i << " " << (void*)chunk.c_str() << endl;
   }
   encode_chunks(want_to_encode, encoded);
   for (unsigned int i = 0; i < k + m; i++) {
diff --git a/src/erasure-code/jerasure/ErasureCodeJerasure.cc b/src/erasure-code/jerasure/ErasureCodeJerasure.cc
index 4bd6aeb..a90ed3e 100644
--- a/src/erasure-code/jerasure/ErasureCodeJerasure.cc
+++ b/src/erasure-code/jerasure/ErasureCodeJerasure.cc
@@ -103,7 +103,9 @@ unsigned int ErasureCodeJerasure::get_chunk_size(unsigned int object_size) const
     unsigned tail = object_size % alignment;
     unsigned padded_length = object_size + ( tail ?  ( alignment - tail ) : 0 );
     assert(padded_length % k == 0);
-    return padded_length / k;
+    unsigned int chunk_size = padded_length / k;
+    assert(chunk_size % LARGEST_VECTOR_WORDSIZE == 0);
+    return chunk_size;
   }
 }

running

./ceph_erasure_code_benchmark --debug-osd=20 --plugin jerasure --workload encode --iterations 1 --size 4096 --erasures 0 --parameter k=3 --parameter m=2 --parameter directory=.libs --parameter packetsize=256 --parameter w=8 --parameter technique=reed_sol_van

produces

load: jerasure 2014-09-15 13:10:32.439651 7f7a23631800 10 ErasureCodePluginSelectJerasure: load: jerasure_sse4 
2014-09-15 13:10:32.439708 7f7a23631800 10 ErasureCodePluginSelectJerasure: sse4 plugin
2014-09-15 13:10:32.439721 7f7a23631800 10 ErasureCodeJerasure: technique=reed_sol_van
encode: 0 0x4bdc000
encode: 1 0x4bdc560
encode: 2 0x4be0000
encode: 3 0x4bdd020
encode: 4 0x4bdd580
0.000177    4

Actions

Copy link

Updated by jianpeng ma over 9 years ago

Hi Loic, I think Janne Grunau is right. For memory align, it depend on the bufferlist::c_str.

Using this patch:

diff --git a/src/erasure-code/isa/ErasureCodeIsa.cc b/src/erasure-code/isa/ErasureCodeIsa.cc
index 962ab81..502c8f3 100644
--- a/src/erasure-code/isa/ErasureCodeIsa.cc
+++ b/src/erasure-code/isa/ErasureCodeIsa.cc
@ -96,8 +96,12 @ int ErasureCodeIsa::encode_chunks(const set<int> &want_to_encode,
map<int, bufferlist> *encoded) {
char *chunks[k + m];
- for (int i = 0; i < k + m; i++)
+ dout(10) << "chunk size " << get_chunk_size(4096) << dendl;
+ for (int i = 0; i < k + m; i++) {
+ std::list<bufferptr> buffers = (*encoded)[i].buffers();
+ dout(10) << "i " << i << " has ptr " << buffers.size() << " lenght " << (*encoded)[i].length() << dendl;
chunks[i] = (*encoded)[i].c_str();
+ }
isa_encode(&chunks⁰, &chunks[k], (*encoded)[0].length());
return 0;
}

./ceph_erasure_code_benchmark  -P k=3 -P m=2 --plugin=isa -P technique=reed_sol_van -s 4096 -i 1 --log-to-strerr --debug-osd 20

0.000094 4

2014-09-15 20:49:42.342478 7f8c64e7b7c0 10 ErasureCodeIsa: technique=default
2014-09-15 20:49:42.342512 7f8c64e7b7c0 10 ErasureCodeIsa: [ cache tables ] creating coeff for k=3 m=2
2014-09-15 20:49:42.342523 7f8c64e7b7c0 10 ErasureCodeIsa: [ cache tables ] creating tables for k=3 m=2
2014-09-15 20:49:42.342528 7f8c64e7b7c0 10 ErasureCodeIsa: [ cache memory ] = 1207680 bytes [ matrix ] = Vandermonde
2014-09-15 20:49:42.342696 7f8c64e7b7c0 10 ErasureCodeIsa: chunk size 1376
2014-09-15 20:49:42.342710 7f8c64e7b7c0 10 ErasureCodeIsa: i 0 has ptr 1lenght 1376
2014-09-15 20:49:42.342715 7f8c64e7b7c0 10 ErasureCodeIsa: i 1 has ptr 1lenght 1376
2014-09-15 20:49:42.342717 7f8c64e7b7c0 10 ErasureCodeIsa: i 2 has ptr 2lenght 1376
2014-09-15 20:49:42.342722 7f8c64e7b7c0 10 ErasureCodeIsa: i 3 has ptr 1lenght 1376
2014-09-15 20:49:42.342729 7f8c64e7b7c0 10 ErasureCodeIsa: i 4 has ptr 1lenght 1376

i 2 has ptr 2lenght 1376. So for c_str(), it use buffer::create(_len) because length(1376) % 4096 != 0.

buffer::raw* buffer::create(unsigned len) {
return new raw_char(len);
}

class buffer::raw_char : public buffer::raw {
public:
raw_char(unsigned l) : raw(l) {
if (len)
data = new char[len];
else
data = 0;

It class buffer::raw_char : public buffer::raw {
public:
raw_char(unsigned l) : raw(l) {
if (len)
data = new char[len];
else
data = 0;

So the c_str finially depend on new char[len] as Janne Grunau said.
I don't know the implementation of 'new char[len]' depend on compile or libc or other.

Actions

Copy link