Project

General

Profile

Bug #9592

librados: Not able to create Large Files with Librados

Added by pushpesh sharma almost 5 years ago. Updated almost 5 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
Start date:
09/25/2014
Due date:
% Done:

100%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

I find this issue while i was trying to run a 1GB Write Cosbench Workload using librados.(My 1MB write & read run was successful)
The Run hang permanently and nothing written to cluster, however I observed client is sending the data (n/w bandwidth usage was 5-10Gbits).
AFAIK, COSbench uses java binding for Librados. I debugged further with python binding.
After installing ceph-python , I executed following program:-


########################
import rados
try:
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
except TypeError as e:
print 'Argument validation error: ', e
raise e
print "Created cluster handle." 
try:
cluster.connect()
except Exception as e:
print "connection error: ", e
raise e
finally:
print "Connected to the cluster." 
print "\n\nI/O Context and Object Operations" 
print "=================================" 
print "\nCreating a context for the 'data' pool" 
if not cluster.pool_exists('data'):
raise RuntimeError('No data pool exists')
ioctx = cluster.open_ioctx('data')
print "\nWriting object 'bm' with contents 'Bonjour tout le monde!' to pool 'data'." 
ioctx.write("bm", "Bonjour tout le monde!")
print "Writing XATTR 'lang' with value 'fr_FR' to object 'bm'" 
ioctx.set_xattr("bm", "lang", "fr_FR")
print "\nWriting object 'bm1' with contents 'Bonjour tout le monde!' to pool 'data'." 
ioctx.write("bm1", "Bonjour tout le monde!"*10000)
print "Writing XATTR 'lang' with value 'fr_FR' to object 'bm1'" 
ioctx.set_xattr("bm1", "lang", "fr_FR")
print "\nWriting object 'hw' with contents 'Hello World!' to pool 'data'." 
ioctx.write("hw", "Hello World!\n world is not enough"*1000000000)
print "Writing XATTR 'lang' with value 'en_US' to object 'hw'" 
ioctx.set_xattr("hw", "lang", "en_US")
'''
print "\nContents of object 'hw'\n------------------------" 
print ioctx.read("hw")
print "\n\nGetting XATTR 'lang' from object 'hw'" 
print ioctx.get_xattr("hw", "lang")
print "\nContents of object 'bm'\n------------------------" 
print ioctx.read("bm")
print "Getting XATTR 'lang' from object 'bm'" 
print ioctx.get_xattr("bm", "lang")
'''
print "Removing object 'bm'" 
ioctx.remove_object("bm")
print "Removing object 'bm1'" 
ioctx.remove_object("bm1")
print "\nRemoving object 'hw'" 
ioctx.remove_object("hw")

I got the following output:-
ems@rack6-ramp-1:~$ python ceph-client.py
Created cluster handle.
Connected to the cluster.
I/O Context and Object Operations
=================================
Creating a context for the 'data' pool
Writing object 'bm' with contents 'Bonjour tout le monde!' to pool 'data'.
Writing XATTR 'lang' with value 'fr_FR' to object 'bm'
Writing object 'bm1' with contents 'Bonjour tout le monde!' to pool 'data'.
Writing XATTR 'lang' with value 'fr_FR' to object 'bm1'
Writing object 'hw' with contents 'Hello World!' to pool 'data'.
common/Throttle.cc: In function 'bool Throttle::get_or_fail(int64_t)' thread 7f768a7d0700 time 2014-09-25 18:37:08.003401
common/Throttle.cc: 194: FAILED assert(c >= 0)
ceph version 0.84 (3215c520e1306f50d0094b5646636c02456c9df4)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f768d73d71b]
2: (Throttle::get_or_fail(long)+0x5ae) [0x7f768d733d0e]
3: (Objecter::_throttle_op(Objecter::Op*, int)+0x68) [0x7f768d6ae588]
4: (Objecter::_op_submit_with_budget(Objecter::Op*, RWLock::Context&)+0x195) [0x7f768d6bf785]
5: (Objecter::op_submit(Objecter::Op*)+0x96) [0x7f768d6bf8c6]
6: (librados::IoCtxImpl::operate(object_t const&, ObjectOperation*, long*, int)+0x33d) [0x7f768d698a0d]
7: (librados::IoCtxImpl::write(object_t const&, ceph::buffer::list&, unsigned long, unsigned long)+0x18d) [0x7f768d69ea2d]
8: (rados_write()+0xc1) [0x7f768d66cee1]
9: (ffi_call_unix64()+0x4c) [0x7f76903ebadc]
10: (ffi_call()+0x1fc) [0x7f76903eb40c]
11: (_ctypes_callproc()+0x48e) [0x7f76906025fe]
12: (()+0x15f9e) [0x7f7690603f9e]
13: (PyEval_EvalFrameEx()+0x1f36) [0x52e1e6]
14: (PyEval_EvalFrameEx()+0xc82) [0x52cf32]
15: (PyEval_EvalFrameEx()+0xc82) [0x52cf32]
16: python() [0x56d0aa]
17: python() [0x4d9854]
18: (PyEval_CallObjectWithKeywords()+0x6b) [0x4da20b]
19: python() [0x5872b2]
20: (()+0x8182) [0x7f769168c182]
21: (clone()+0x6d) [0x7f76913b930d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
Aborted (core dumped)
#######################

_usr_bin_python2.7.1000.crash (42.9 MB) pushpesh sharma, 09/25/2014 11:59 PM

Associated revisions

Revision 33501d24 (diff)
Added by Loic Dachary almost 5 years ago

librados: cap the rados*{write,append} buffer length

When the caller submits a payload that will end up being rejected with

rados.Error: Ioctx.write(rbd): failed to write hw: errno EMSGSIZE

it is stored in a bufferlist whose length is an unsigned int. If the
value of the len parameter is greater than UINT_MAX/2, rados_write,
rados_write_full and rados_append will fail with E2BIG.

Multiple calls to rados_write or rados_append can fill objects larger
than UINT_MAX/2.

http://tracker.ceph.com/issues/9592 Fixes: #9592

Signed-off-by: Loic Dachary <>

Revision 32195f94 (diff)
Added by Loic Dachary almost 5 years ago

librados: cap the rados_aio_*{write,append} buffer length

If the value of the len parameter is greater than UINT_MAX/2,
rados_aio_write, rados_aio_write_full and rados_aio_append will fail
with E2BIG.

For more information see 33501d242661a545211df43bf645398b492398ae

http://tracker.ceph.com/issues/9592 Fixes: #9592

Signed-off-by: Loic Dachary <>

Revision 5d1d9dba (diff)
Added by Loic Dachary almost 5 years ago

librados: cap the IoCtxImpl::{aio_}*{write,append} buffer length

If the value of the len parameter is greater than UINT_MAX/2,
IoCtxImpl::aio_write, IoCtxImpl::aio_write_full, IoCtxImpl::aio_append,
IoCtxImpl::write, IoCtxImpl::append will fail with E2BIG.

IoCtxImpl::write_full is the exception because it does not have a
length argument to check.

For more information see 33501d242661a545211df43bf645398b492398ae

http://tracker.ceph.com/issues/9592 Fixes: #9592

Signed-off-by: Loic Dachary <>

History

#1 Updated by Loic Dachary almost 5 years ago

  • Description updated (diff)

#2 Updated by Loic Dachary almost 5 years ago

  • Subject changed from Librados: Not able to create Large Files with Librados to librados: Not able to create Large Files with Librados
  • Status changed from New to Can't reproduce
  • Assignee set to Loic Dachary
  • Priority changed from High to Normal
"Hello World!\n world is not enough"*1000000000

this is a ~34GB string, is this intended ? Running a smaller string
"H"*1000000000

I got
$ PYTHONPATH=pybind python /tmp/a.py
Created cluster handle.
Connected to the cluster.

I/O Context and Object Operations
=================================

Creating a context for the 'rbd' pool

Writing object 'bm' with contents 'Bonjour tout le monde!' to pool 'rbd'.
Writing XATTR 'lang' with value 'fr_FR' to object 'bm'

Writing object 'bm1' with contents 'Bonjour tout le monde!' to pool 'rbd'.
Writing XATTR 'lang' with value 'fr_FR' to object 'bm1'

Writing object 'hw' with contents 'Hello World!' to pool 'rbd'.
Traceback (most recent call last):
  File "/tmp/a.py", line 31, in <module>
    ioctx.write("hw", "H"*1000000000)
  File "/home/loic/software/ceph/ceph/src/pybind/rados.py", line 1223, in write
    (self.name, key))
rados.Error: Ioctx.write(rbd): failed to write hw: errno EMSGSIZE

using
$ ceph --version
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
ceph version 0.85-920-gba02a5e (ba02a5e638ddae3d58e1f5d8bde741da9a6e342d)

with the following
import rados
try:
    cluster = rados.Rados(conffile='ceph.conf')
except TypeError as e:
    print 'Argument validation error: ', e
    raise e
print "Created cluster handle." 
try:
    cluster.connect()
except Exception as e:
    print "connection error: ", e
    raise e
finally:
    print "Connected to the cluster." 
    print "\n\nI/O Context and Object Operations" 
    print "=================================" 
    print "\nCreating a context for the 'rbd' pool" 

if not cluster.pool_exists('rbd'):
    raise RuntimeError('No rbd pool exists')
ioctx = cluster.open_ioctx('rbd')
print "\nWriting object 'bm' with contents 'Bonjour tout le monde!' to pool 'rbd'." 
ioctx.write("bm", "Bonjour tout le monde!")
print "Writing XATTR 'lang' with value 'fr_FR' to object 'bm'" 
ioctx.set_xattr("bm", "lang", "fr_FR")
print "\nWriting object 'bm1' with contents 'Bonjour tout le monde!' to pool 'rbd'." 
ioctx.write("bm1", "Bonjour tout le monde!"*10000)
print "Writing XATTR 'lang' with value 'fr_FR' to object 'bm1'" 
ioctx.set_xattr("bm1", "lang", "fr_FR")
print "\nWriting object 'hw' with contents 'Hello World!' to pool 'rbd'." 
ioctx.write("hw", "H"*1000000000)
print "Writing XATTR 'lang' with value 'en_US' to object 'hw'" 
ioctx.set_xattr("hw", "lang", "en_US")
'''
print "\nContents of object 'hw'\n------------------------" 
print ioctx.read("hw")
print "\n\nGetting XATTR 'lang' from object 'hw'" 
print ioctx.get_xattr("hw", "lang")
print "\nContents of object 'bm'\n------------------------" 
print ioctx.read("bm")
print "Getting XATTR 'lang' from object 'bm'" 
print ioctx.get_xattr("bm", "lang")
'''
print "Removing object 'bm'" 
ioctx.remove_object("bm")
print "Removing object 'bm1'" 
ioctx.remove_object("bm1")
print "\nRemoving object 'hw'" 
ioctx.remove_object("hw")

#3 Updated by Greg Farnum almost 5 years ago

If I were to guess, something in the stack is converting the size value down to an int32 and then back up to int64, so it looks negative. That's not great, but you also shouldn't be sending operations that large to the OSDs (in fact, by default they'll reject the write) or creating any objects of that size.

#4 Updated by pushpesh sharma almost 5 years ago

A modified script to debug this issue:- ####################################
import rados
import sys
try:
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
except TypeError as e:
print 'Argument validation error: ', e
raise e
print "Created cluster handle."
try:
cluster.connect()
except Exception as e:
print "connection error: ", e
raise e
finally:
print "Connected to the cluster."
print "\n\nI/O Context and Object Operations"
print "================================="
print "\nCreating a context for the 'data' pool"
if not cluster.pool_exists('data'):
raise RuntimeError('No data pool exists')
ioctx = cluster.open_ioctx('data')
s9mb='a'*10000000
print '\n\nsize in mb of str is: '
print sys.getsizeof(s9mb)/1024/1024;
try:
print "Writing object s1 to pool 'data'."
ioctx.write("s9mb", s9mb)
ioctx.set_xattr("s9mb", "lang", "en_US")
print 'done'
ioctx.remove_object("s9mb")
except Exception as e :
print 'failed'+str(e)
s90mb='ababababa'*10000000
print '\n\nsize in mb of str is: '
print sys.getsizeof(s90mb)/1024/1024;
try:
print "Writing object s90mb to pool 'data'."
ioctx.write("s90mb", s90mb)
ioctx.set_xattr("s90mb", "lang", "en_US")
print 'done'
ioctx.remove_object("s90mb")
except Exception as e :
print 'failed'+str(e)
s100mb='abababababa'*10000000
print '\n\nsize in mb of str is: '
print sys.getsizeof(s100mb)/1024/1024;
try:
print "Writing object s100mb to pool 'data'."
ioctx.write("s100mb", s100mb)
ioctx.set_xattr("s100mb", "lang", "en_US")
print 'done'
ioctx.remove_object("s100mb")
except Exception as e :
print 'failed'+str(e)
s1G='abababababa'*100000000
print '\n\nsize in mb of str is: '
print sys.getsizeof(s1G)/1024/1024;
try:
print "Writing object 1G to pool 'data'."
ioctx.write("s1G", s1G)
ioctx.set_xattr("s1G", "lang", "en_US")
print 'done'
ioctx.remove_object("s1G")
except Exception as e :
print 'failed'+str(e)
s10G='ababaabababa'*1000000000
print '\n\nsize in mb of str is: '
print sys.getsizeof(s10G)/1024/1024;
try:
print "Writing object s10G to pool 'data'."
ioctx.write("s10G", s10G)
ioctx.set_xattr("s10G", "lang", "en_US")
print 'done'
ioctx.remove_object("s10G")
except Exception as e :
print 'failed'+str(e)
s20G='ababaabababaababababababab'*1000000000
print '\n\nsize in mb of str is: '
print sys.getsizeof(s20G)/1024/1024;
try:
print "Writing object s20G to pool 'data'."
ioctx.write("s20G", s20G)
ioctx.set_xattr("s20G", "lang", "en_US")
print 'done'
ioctx.remove_object("s20G")
except Exception as e :
print 'failed'+str(e) ########################
I got following output: #######################
ems@rack6-client-2:~$ python ceph-client.py
Created cluster handle.
Connected to the cluster.
I/O Context and Object Operations =================================
Creating a context for the 'data' pool
size in mb of str is:
9
Writing object s1 to pool 'data'.
done
size in mb of str is:
85
Writing object s90mb to pool 'data'.
done
size in mb of str is:
104
Writing object s100mb to pool 'data'.
failedIoctx.write(data): failed to write s100mb: errno EMSGSIZE
size in mb of str is:
1049
Writing object 1G to pool 'data'.
failedIoctx.write(data): failed to write s1G: errno EMSGSIZE
size in mb of str is:
11444
Writing object s10G to pool 'data'.
common/Throttle.cc: In function 'bool Throttle::get_or_fail(int64_t)' thread 7fb047788700 time 2014-09-26 17:44:49.495822
common/Throttle.cc: 194: FAILED assert(c >= 0)
ceph version 0.84-sd-sprint4 (3215c520e1306f50d0094b5646636c02456c9df4)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7fb04a6f571b]
2: (Throttle::get_or_fail(long)+0x5ae) [0x7fb04a6ebd0e]
3: (Objecter::_throttle_op(Objecter::Op*, int)+0x68) [0x7fb04a666588]
4: (Objecter::_op_submit_with_budget(Objecter::Op*, RWLock::Context&)+0x195) [0x7fb04a677785]
5: (Objecter::op_submit(Objecter::Op*)+0x96) [0x7fb04a6778c6]
6: (librados::IoCtxImpl::operate(object_t const&, ObjectOperation*, long*, int)+0x33d) [0x7fb04a650a0d]
7: (librados::IoCtxImpl::write(object_t const&, ceph::buffer::list&, unsigned long, unsigned long)+0x18d) [0x7fb04a656a2d]
8: (rados_write()+0xc1) [0x7fb04a624ee1]
9: (ffi_call_unix64()+0x4c) [0x7fb04d3a3adc]
10: (ffi_call()+0x1fc) [0x7fb04d3a340c]
11: (_ctypes_callproc()+0x48e) [0x7fb04d5ba5fe]
12: (()+0x15f9e) [0x7fb04d5bbf9e]
13: (PyEval_EvalFrameEx()+0x1f36) [0x52e1e6]
14: (PyEval_EvalFrameEx()+0xc82) [0x52cf32]
15: (PyEval_EvalFrameEx()+0xc82) [0x52cf32]
16: python() [0x56d0aa]
17: python() [0x4d9854]
18: (PyEval_CallObjectWithKeywords()+0x6b) [0x4da20b]
19: python() [0x5872b2]
20: (()+0x8182) [0x7fb04e644182]
21: (clone()+0x6d) [0x7fb04e37130d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
Aborted (core dumped) ######################
Core dumps are attached.

#5 Updated by Pavan Rallabhandi almost 5 years ago


int Objecter::calc_op_budget(Op *op)
{
  int op_budget = 0;
  for (vector<OSDOp>::iterator i = op->ops.begin();
       i != op->ops.end();
       ++i) {
    if (i->op.op & CEPH_OSD_OP_MODE_WR) {
      op_budget += i->indata.length();
    } else if (ceph_osd_op_mode_read(i->op.op)) {
      if (ceph_osd_op_type_data(i->op.op)) {
        if ((int64_t)i->op.extent.length > 0)
          op_budget += (int64_t)i->op.extent.length;                <----- could be the culprit?
      } else if (ceph_osd_op_type_attr(i->op.op)) {
        op_budget += i->op.xattr.name_len + i->op.xattr.value_len;
      }
    }
  }
  return op_budget;
}

#6 Updated by Loic Dachary almost 5 years ago

  • Status changed from Can't reproduce to In Progress

#7 Updated by Loic Dachary almost 5 years ago

The minimal script

import sys
import rados
cluster = rados.Rados(conffile=sys.argv[1])
cluster.connect()
ioctx = cluster.open_ioctx('rbd')
ioctx.write("hw", "A"*int(sys.argv[2]))

with debug objecter 20 shows the problem with
PYTHONPATH=pybind LD_LIBRARY_PATH=.libs python /tmp/a.py ceph.conf $((10 * 1024 * 1024 * 1024))" 
2014-09-26 19:08:46.304819 7f0dd7fff700 10 client.?.objecter ms_handle_connect 0x7f0dd80108c0
2014-09-26 19:08:46.304839 7f0dd7fff700 10 client.?.objecter resend_mon_ops
2014-09-26 19:08:46.305207 7f0dded96700 10 client.4110.objecter _maybe_request_map subscribing (onetime) to next osd map
2014-09-26 19:08:46.305410 7f0dd7fff700 10 client.4110.objecter ms_dispatch 0x7f0dd8000bf0 osd_map(11..11 src has 1..11) v3
2014-09-26 19:08:46.305420 7f0dd7fff700  3 client.4110.objecter handle_osd_map got epochs [11,11] > 0
2014-09-26 19:08:46.305425 7f0dd7fff700  3 client.4110.objecter handle_osd_map decoding full epoch 11
2014-09-26 19:08:46.305537 7f0dd7fff700 20 client.4110.objecter dump_active .. 0 homeless
2014-09-26 19:08:46.305564 7f0dd7fff700 10 client.4110.objecter ms_dispatch 0x7f0dd8000bf0 osd_map(11..11 src has 1..11) v3
2014-09-26 19:08:46.305567 7f0dd7fff700  3 client.4110.objecter handle_osd_map ignoring epochs [11,11] <= 11
2014-09-26 19:08:46.305569 7f0dd7fff700 20 client.4110.objecter dump_active .. 0 homeless
2014-09-26 19:08:51.305292 7f0ddcc52700 10 client.4110.objecter tick
2014-09-26 19:08:53.195734 7f0dded96700 -1 client.4110.objecter op_submit[write 0~2147483648]
2014-09-26 19:08:53.219751 7f0dded96700 -1 client.4110.objecter calc_op_budget[write 0~2147483648]
2014-09-26 19:08:53.219770 7f0dded96700 -1 client.4110.objecter calc_op_budget0x7f0b34000bb0
2014-09-26 19:08:53.219776 7f0dded96700 -1 client.4110.objecter calc_op_budget op_budget -2147483648
osdc/Objecter.cc: In function 'int Objecter::calc_op_budget(Objecter::Op*)' thread 7f0dded96700 time 2014-09-26 19:08:53.219782
osdc/Objecter.cc: 2421: FAILED assert(op_budget >= 0)
 ceph version 0.85-997-g2bf4526 (2bf4526fafde11b6cf95afab5089f85126d0fc65)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f0de079547b]
 2: (Objecter::calc_op_budget(Objecter::Op*)+0x46b) [0x7f0de0702ecb]
 3: (Objecter::_op_submit_with_budget(Objecter::Op*, RWLock::Context&)+0x10c) [0x7f0de07139ac]
 4: (Objecter::op_submit(Objecter::Op*)+0x74) [0x7f0de0713b54]
 5: (librados::IoCtxImpl::operate(object_t const&, ObjectOperation*, long*, int)+0x344) [0x7f0de06ec9f4]
 6: (librados::IoCtxImpl::write(object_t const&, ceph::buffer::list&, unsigned long, unsigned long)+0x18d) [0x7f0de06f28cd]
 7: (rados_write()+0x59) [0x7f0de06c7fa9]
 8: (ffi_call_unix64()+0x4c) [0x7f0de34e0adc]
 9: (ffi_call()+0x1fc) [0x7f0de34e040c]
 10: (_ctypes_callproc()+0x48e) [0x7f0de36f75fe]
 11: (()+0x15f9e) [0x7f0de36f8f9e]
 12: (PyEval_EvalFrameEx()+0x1f36) [0x52e1e6]
 13: (PyEval_EvalFrameEx()+0xc82) [0x52cf32]
 14: (PyEval_EvalFrameEx()+0xc82) [0x52cf32]
 15: python() [0x56d0aa]
 16: python() [0x4d9854]
 17: (PyEval_CallObjectWithKeywords()+0x6b) [0x4da20b]
 18: python() [0x5872b2]
 19: (()+0x8182) [0x7f0de4b65182]
 20: (clone()+0x6d) [0x7f0de489230d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
bash: line 1:  2590 Aborted                 (core dumped) PYTHONPATH=pybind LD_LIBRARY_PATH=.libs python /tmp/a.py ceph.conf 10737418240

#8 Updated by Loic Dachary almost 5 years ago

Nice catch Pavan Rallabhandi ;-) I had trouble reproducing the problem because I forgot the "LD_LIBRARY_PATH=.libs" and was using the librados from 0.80 instead of 0.84+ ...

#9 Updated by Loic Dachary almost 5 years ago

  • Status changed from In Progress to Need Review
  • % Done changed from 0 to 80

https://github.com/ceph/ceph/pull/2584 should be enough. Unless there is a good reason to write an object with chunks larger than 1GB ? That is : if there is a need to create a 20GB object, is there a reason to create it with rados_write of buffers larger than 1GB ? Or is it reasonable to use smaller buffers (100MB at a time or less) ?

#10 Updated by Loic Dachary almost 5 years ago

Looking at librados.hpp

#11 Updated by Loic Dachary almost 5 years ago

Extend the checks to librados.hpp and aio_* https://github.com/ceph/ceph/pull/2590

#12 Updated by Loic Dachary almost 5 years ago

  • Status changed from Need Review to Resolved
  • % Done changed from 80 to 100

Also available in: Atom PDF