Bug #9592
librados: Not able to create Large Files with Librados
100%
Description
I find this issue while i was trying to run a 1GB Write Cosbench Workload using librados.(My 1MB write & read run was successful)
The Run hang permanently and nothing written to cluster, however I observed client is sending the data (n/w bandwidth usage was 5-10Gbits).
AFAIK, COSbench uses java binding for Librados. I debugged further with python binding.
After installing ceph-python , I executed following program:-
######################## import rados try: cluster = rados.Rados(conffile='/etc/ceph/ceph.conf') except TypeError as e: print 'Argument validation error: ', e raise e print "Created cluster handle." try: cluster.connect() except Exception as e: print "connection error: ", e raise e finally: print "Connected to the cluster." print "\n\nI/O Context and Object Operations" print "=================================" print "\nCreating a context for the 'data' pool" if not cluster.pool_exists('data'): raise RuntimeError('No data pool exists') ioctx = cluster.open_ioctx('data') print "\nWriting object 'bm' with contents 'Bonjour tout le monde!' to pool 'data'." ioctx.write("bm", "Bonjour tout le monde!") print "Writing XATTR 'lang' with value 'fr_FR' to object 'bm'" ioctx.set_xattr("bm", "lang", "fr_FR") print "\nWriting object 'bm1' with contents 'Bonjour tout le monde!' to pool 'data'." ioctx.write("bm1", "Bonjour tout le monde!"*10000) print "Writing XATTR 'lang' with value 'fr_FR' to object 'bm1'" ioctx.set_xattr("bm1", "lang", "fr_FR") print "\nWriting object 'hw' with contents 'Hello World!' to pool 'data'." ioctx.write("hw", "Hello World!\n world is not enough"*1000000000) print "Writing XATTR 'lang' with value 'en_US' to object 'hw'" ioctx.set_xattr("hw", "lang", "en_US") ''' print "\nContents of object 'hw'\n------------------------" print ioctx.read("hw") print "\n\nGetting XATTR 'lang' from object 'hw'" print ioctx.get_xattr("hw", "lang") print "\nContents of object 'bm'\n------------------------" print ioctx.read("bm") print "Getting XATTR 'lang' from object 'bm'" print ioctx.get_xattr("bm", "lang") ''' print "Removing object 'bm'" ioctx.remove_object("bm") print "Removing object 'bm1'" ioctx.remove_object("bm1") print "\nRemoving object 'hw'" ioctx.remove_object("hw")
I got the following output:-
ems@rack6-ramp-1:~$ python ceph-client.py Created cluster handle. Connected to the cluster. I/O Context and Object Operations ================================= Creating a context for the 'data' pool Writing object 'bm' with contents 'Bonjour tout le monde!' to pool 'data'. Writing XATTR 'lang' with value 'fr_FR' to object 'bm' Writing object 'bm1' with contents 'Bonjour tout le monde!' to pool 'data'. Writing XATTR 'lang' with value 'fr_FR' to object 'bm1' Writing object 'hw' with contents 'Hello World!' to pool 'data'. common/Throttle.cc: In function 'bool Throttle::get_or_fail(int64_t)' thread 7f768a7d0700 time 2014-09-25 18:37:08.003401 common/Throttle.cc: 194: FAILED assert(c >= 0) ceph version 0.84 (3215c520e1306f50d0094b5646636c02456c9df4) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f768d73d71b] 2: (Throttle::get_or_fail(long)+0x5ae) [0x7f768d733d0e] 3: (Objecter::_throttle_op(Objecter::Op*, int)+0x68) [0x7f768d6ae588] 4: (Objecter::_op_submit_with_budget(Objecter::Op*, RWLock::Context&)+0x195) [0x7f768d6bf785] 5: (Objecter::op_submit(Objecter::Op*)+0x96) [0x7f768d6bf8c6] 6: (librados::IoCtxImpl::operate(object_t const&, ObjectOperation*, long*, int)+0x33d) [0x7f768d698a0d] 7: (librados::IoCtxImpl::write(object_t const&, ceph::buffer::list&, unsigned long, unsigned long)+0x18d) [0x7f768d69ea2d] 8: (rados_write()+0xc1) [0x7f768d66cee1] 9: (ffi_call_unix64()+0x4c) [0x7f76903ebadc] 10: (ffi_call()+0x1fc) [0x7f76903eb40c] 11: (_ctypes_callproc()+0x48e) [0x7f76906025fe] 12: (()+0x15f9e) [0x7f7690603f9e] 13: (PyEval_EvalFrameEx()+0x1f36) [0x52e1e6] 14: (PyEval_EvalFrameEx()+0xc82) [0x52cf32] 15: (PyEval_EvalFrameEx()+0xc82) [0x52cf32] 16: python() [0x56d0aa] 17: python() [0x4d9854] 18: (PyEval_CallObjectWithKeywords()+0x6b) [0x4da20b] 19: python() [0x5872b2] 20: (()+0x8182) [0x7f769168c182] 21: (clone()+0x6d) [0x7f76913b930d] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. terminate called after throwing an instance of 'ceph::FailedAssertion' Aborted (core dumped) #######################
Associated revisions
librados: cap the rados*{write,append} buffer length
When the caller submits a payload that will end up being rejected with
rados.Error: Ioctx.write(rbd): failed to write hw: errno EMSGSIZE
it is stored in a bufferlist whose length is an unsigned int. If the
value of the len parameter is greater than UINT_MAX/2, rados_write,
rados_write_full and rados_append will fail with E2BIG.
Multiple calls to rados_write or rados_append can fill objects larger
than UINT_MAX/2.
http://tracker.ceph.com/issues/9592 Fixes: #9592
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
librados: cap the rados_aio_*{write,append} buffer length
If the value of the len parameter is greater than UINT_MAX/2,
rados_aio_write, rados_aio_write_full and rados_aio_append will fail
with E2BIG.
For more information see 33501d242661a545211df43bf645398b492398ae
http://tracker.ceph.com/issues/9592 Fixes: #9592
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
librados: cap the IoCtxImpl::{aio_}*{write,append} buffer length
If the value of the len parameter is greater than UINT_MAX/2,
IoCtxImpl::aio_write, IoCtxImpl::aio_write_full, IoCtxImpl::aio_append,
IoCtxImpl::write, IoCtxImpl::append will fail with E2BIG.
IoCtxImpl::write_full is the exception because it does not have a
length argument to check.
For more information see 33501d242661a545211df43bf645398b492398ae
http://tracker.ceph.com/issues/9592 Fixes: #9592
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
History
#1 Updated by Loïc Dachary about 9 years ago
- Description updated (diff)
#2 Updated by Loïc Dachary about 9 years ago
- Subject changed from Librados: Not able to create Large Files with Librados to librados: Not able to create Large Files with Librados
- Status changed from New to Can't reproduce
- Assignee set to Loïc Dachary
- Priority changed from High to Normal
"Hello World!\n world is not enough"*1000000000
this is a ~34GB string, is this intended ? Running a smaller string
"H"*1000000000
I got
$ PYTHONPATH=pybind python /tmp/a.py Created cluster handle. Connected to the cluster. I/O Context and Object Operations ================================= Creating a context for the 'rbd' pool Writing object 'bm' with contents 'Bonjour tout le monde!' to pool 'rbd'. Writing XATTR 'lang' with value 'fr_FR' to object 'bm' Writing object 'bm1' with contents 'Bonjour tout le monde!' to pool 'rbd'. Writing XATTR 'lang' with value 'fr_FR' to object 'bm1' Writing object 'hw' with contents 'Hello World!' to pool 'rbd'. Traceback (most recent call last): File "/tmp/a.py", line 31, in <module> ioctx.write("hw", "H"*1000000000) File "/home/loic/software/ceph/ceph/src/pybind/rados.py", line 1223, in write (self.name, key)) rados.Error: Ioctx.write(rbd): failed to write hw: errno EMSGSIZE
using
$ ceph --version *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH *** ceph version 0.85-920-gba02a5e (ba02a5e638ddae3d58e1f5d8bde741da9a6e342d)
with the following
import rados try: cluster = rados.Rados(conffile='ceph.conf') except TypeError as e: print 'Argument validation error: ', e raise e print "Created cluster handle." try: cluster.connect() except Exception as e: print "connection error: ", e raise e finally: print "Connected to the cluster." print "\n\nI/O Context and Object Operations" print "=================================" print "\nCreating a context for the 'rbd' pool" if not cluster.pool_exists('rbd'): raise RuntimeError('No rbd pool exists') ioctx = cluster.open_ioctx('rbd') print "\nWriting object 'bm' with contents 'Bonjour tout le monde!' to pool 'rbd'." ioctx.write("bm", "Bonjour tout le monde!") print "Writing XATTR 'lang' with value 'fr_FR' to object 'bm'" ioctx.set_xattr("bm", "lang", "fr_FR") print "\nWriting object 'bm1' with contents 'Bonjour tout le monde!' to pool 'rbd'." ioctx.write("bm1", "Bonjour tout le monde!"*10000) print "Writing XATTR 'lang' with value 'fr_FR' to object 'bm1'" ioctx.set_xattr("bm1", "lang", "fr_FR") print "\nWriting object 'hw' with contents 'Hello World!' to pool 'rbd'." ioctx.write("hw", "H"*1000000000) print "Writing XATTR 'lang' with value 'en_US' to object 'hw'" ioctx.set_xattr("hw", "lang", "en_US") ''' print "\nContents of object 'hw'\n------------------------" print ioctx.read("hw") print "\n\nGetting XATTR 'lang' from object 'hw'" print ioctx.get_xattr("hw", "lang") print "\nContents of object 'bm'\n------------------------" print ioctx.read("bm") print "Getting XATTR 'lang' from object 'bm'" print ioctx.get_xattr("bm", "lang") ''' print "Removing object 'bm'" ioctx.remove_object("bm") print "Removing object 'bm1'" ioctx.remove_object("bm1") print "\nRemoving object 'hw'" ioctx.remove_object("hw")
#3 Updated by Greg Farnum about 9 years ago
If I were to guess, something in the stack is converting the size value down to an int32 and then back up to int64, so it looks negative. That's not great, but you also shouldn't be sending operations that large to the OSDs (in fact, by default they'll reject the write) or creating any objects of that size.
#4 Updated by pushpesh sharma about 9 years ago
- File _usr_bin_python2.7.1000.crash added
A modified script to debug this issue:-
####################################
import rados
import sys
try:
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
except TypeError as e:
print 'Argument validation error: ', e
raise e
print "Created cluster handle."
try:
cluster.connect()
except Exception as e:
print "connection error: ", e
raise e
finally:
print "Connected to the cluster."
print "\n\nI/O Context and Object Operations"
print "================================="
print "\nCreating a context for the 'data' pool"
if not cluster.pool_exists('data'):
raise RuntimeError('No data pool exists')
ioctx = cluster.open_ioctx('data')
s9mb='a'*10000000
print '\n\nsize in mb of str is: '
print sys.getsizeof(s9mb)/1024/1024;
try:
print "Writing object s1 to pool 'data'."
ioctx.write("s9mb", s9mb)
ioctx.set_xattr("s9mb", "lang", "en_US")
print 'done'
ioctx.remove_object("s9mb")
except Exception as e :
print 'failed'+str(e)
s90mb='ababababa'*10000000
print '\n\nsize in mb of str is: '
print sys.getsizeof(s90mb)/1024/1024;
try:
print "Writing object s90mb to pool 'data'."
ioctx.write("s90mb", s90mb)
ioctx.set_xattr("s90mb", "lang", "en_US")
print 'done'
ioctx.remove_object("s90mb")
except Exception as e :
print 'failed'+str(e)
s100mb='abababababa'*10000000
print '\n\nsize in mb of str is: '
print sys.getsizeof(s100mb)/1024/1024;
try:
print "Writing object s100mb to pool 'data'."
ioctx.write("s100mb", s100mb)
ioctx.set_xattr("s100mb", "lang", "en_US")
print 'done'
ioctx.remove_object("s100mb")
except Exception as e :
print 'failed'+str(e)
s1G='abababababa'*100000000
print '\n\nsize in mb of str is: '
print sys.getsizeof(s1G)/1024/1024;
try:
print "Writing object 1G to pool 'data'."
ioctx.write("s1G", s1G)
ioctx.set_xattr("s1G", "lang", "en_US")
print 'done'
ioctx.remove_object("s1G")
except Exception as e :
print 'failed'+str(e)
s10G='ababaabababa'*1000000000
print '\n\nsize in mb of str is: '
print sys.getsizeof(s10G)/1024/1024;
try:
print "Writing object s10G to pool 'data'."
ioctx.write("s10G", s10G)
ioctx.set_xattr("s10G", "lang", "en_US")
print 'done'
ioctx.remove_object("s10G")
except Exception as e :
print 'failed'+str(e)
s20G='ababaabababaababababababab'*1000000000
print '\n\nsize in mb of str is: '
print sys.getsizeof(s20G)/1024/1024;
try:
print "Writing object s20G to pool 'data'."
ioctx.write("s20G", s20G)
ioctx.set_xattr("s20G", "lang", "en_US")
print 'done'
ioctx.remove_object("s20G")
except Exception as e :
print 'failed'+str(e)
########################
I got following output:
#######################
ems@rack6-client-2:~$ python ceph-client.py
Created cluster handle.
Connected to the cluster.
I/O Context and Object Operations
=================================
Creating a context for the 'data' pool
size in mb of str is:
9
Writing object s1 to pool 'data'.
done
size in mb of str is:
85
Writing object s90mb to pool 'data'.
done
size in mb of str is:
104
Writing object s100mb to pool 'data'.
failedIoctx.write(data): failed to write s100mb: errno EMSGSIZE
size in mb of str is:
1049
Writing object 1G to pool 'data'.
failedIoctx.write(data): failed to write s1G: errno EMSGSIZE
size in mb of str is:
11444
Writing object s10G to pool 'data'.
common/Throttle.cc: In function 'bool Throttle::get_or_fail(int64_t)' thread 7fb047788700 time 2014-09-26 17:44:49.495822
common/Throttle.cc: 194: FAILED assert(c >= 0)
ceph version 0.84-sd-sprint4 (3215c520e1306f50d0094b5646636c02456c9df4)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7fb04a6f571b]
2: (Throttle::get_or_fail(long)+0x5ae) [0x7fb04a6ebd0e]
3: (Objecter::_throttle_op(Objecter::Op*, int)+0x68) [0x7fb04a666588]
4: (Objecter::_op_submit_with_budget(Objecter::Op*, RWLock::Context&)+0x195) [0x7fb04a677785]
5: (Objecter::op_submit(Objecter::Op*)+0x96) [0x7fb04a6778c6]
6: (librados::IoCtxImpl::operate(object_t const&, ObjectOperation*, long*, int)+0x33d) [0x7fb04a650a0d]
7: (librados::IoCtxImpl::write(object_t const&, ceph::buffer::list&, unsigned long, unsigned long)+0x18d) [0x7fb04a656a2d]
8: (rados_write()+0xc1) [0x7fb04a624ee1]
9: (ffi_call_unix64()+0x4c) [0x7fb04d3a3adc]
10: (ffi_call()+0x1fc) [0x7fb04d3a340c]
11: (_ctypes_callproc()+0x48e) [0x7fb04d5ba5fe]
12: (()+0x15f9e) [0x7fb04d5bbf9e]
13: (PyEval_EvalFrameEx()+0x1f36) [0x52e1e6]
14: (PyEval_EvalFrameEx()+0xc82) [0x52cf32]
15: (PyEval_EvalFrameEx()+0xc82) [0x52cf32]
16: python() [0x56d0aa]
17: python() [0x4d9854]
18: (PyEval_CallObjectWithKeywords()+0x6b) [0x4da20b]
19: python() [0x5872b2]
20: (()+0x8182) [0x7fb04e644182]
21: (clone()+0x6d) [0x7fb04e37130d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
Aborted (core dumped)
######################
Core dumps are attached.
#5 Updated by Pavan Rallabhandi about 9 years ago
int Objecter::calc_op_budget(Op *op) { int op_budget = 0; for (vector<OSDOp>::iterator i = op->ops.begin(); i != op->ops.end(); ++i) { if (i->op.op & CEPH_OSD_OP_MODE_WR) { op_budget += i->indata.length(); } else if (ceph_osd_op_mode_read(i->op.op)) { if (ceph_osd_op_type_data(i->op.op)) { if ((int64_t)i->op.extent.length > 0) op_budget += (int64_t)i->op.extent.length; <----- could be the culprit? } else if (ceph_osd_op_type_attr(i->op.op)) { op_budget += i->op.xattr.name_len + i->op.xattr.value_len; } } } return op_budget; }
#6 Updated by Loïc Dachary about 9 years ago
- Status changed from Can't reproduce to In Progress
#7 Updated by Loïc Dachary about 9 years ago
The minimal script
import sys import rados cluster = rados.Rados(conffile=sys.argv[1]) cluster.connect() ioctx = cluster.open_ioctx('rbd') ioctx.write("hw", "A"*int(sys.argv[2]))
with debug objecter 20 shows the problem with
PYTHONPATH=pybind LD_LIBRARY_PATH=.libs python /tmp/a.py ceph.conf $((10 * 1024 * 1024 * 1024))" 2014-09-26 19:08:46.304819 7f0dd7fff700 10 client.?.objecter ms_handle_connect 0x7f0dd80108c0 2014-09-26 19:08:46.304839 7f0dd7fff700 10 client.?.objecter resend_mon_ops 2014-09-26 19:08:46.305207 7f0dded96700 10 client.4110.objecter _maybe_request_map subscribing (onetime) to next osd map 2014-09-26 19:08:46.305410 7f0dd7fff700 10 client.4110.objecter ms_dispatch 0x7f0dd8000bf0 osd_map(11..11 src has 1..11) v3 2014-09-26 19:08:46.305420 7f0dd7fff700 3 client.4110.objecter handle_osd_map got epochs [11,11] > 0 2014-09-26 19:08:46.305425 7f0dd7fff700 3 client.4110.objecter handle_osd_map decoding full epoch 11 2014-09-26 19:08:46.305537 7f0dd7fff700 20 client.4110.objecter dump_active .. 0 homeless 2014-09-26 19:08:46.305564 7f0dd7fff700 10 client.4110.objecter ms_dispatch 0x7f0dd8000bf0 osd_map(11..11 src has 1..11) v3 2014-09-26 19:08:46.305567 7f0dd7fff700 3 client.4110.objecter handle_osd_map ignoring epochs [11,11] <= 11 2014-09-26 19:08:46.305569 7f0dd7fff700 20 client.4110.objecter dump_active .. 0 homeless 2014-09-26 19:08:51.305292 7f0ddcc52700 10 client.4110.objecter tick 2014-09-26 19:08:53.195734 7f0dded96700 -1 client.4110.objecter op_submit[write 0~2147483648] 2014-09-26 19:08:53.219751 7f0dded96700 -1 client.4110.objecter calc_op_budget[write 0~2147483648] 2014-09-26 19:08:53.219770 7f0dded96700 -1 client.4110.objecter calc_op_budget0x7f0b34000bb0 2014-09-26 19:08:53.219776 7f0dded96700 -1 client.4110.objecter calc_op_budget op_budget -2147483648 osdc/Objecter.cc: In function 'int Objecter::calc_op_budget(Objecter::Op*)' thread 7f0dded96700 time 2014-09-26 19:08:53.219782 osdc/Objecter.cc: 2421: FAILED assert(op_budget >= 0) ceph version 0.85-997-g2bf4526 (2bf4526fafde11b6cf95afab5089f85126d0fc65) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f0de079547b] 2: (Objecter::calc_op_budget(Objecter::Op*)+0x46b) [0x7f0de0702ecb] 3: (Objecter::_op_submit_with_budget(Objecter::Op*, RWLock::Context&)+0x10c) [0x7f0de07139ac] 4: (Objecter::op_submit(Objecter::Op*)+0x74) [0x7f0de0713b54] 5: (librados::IoCtxImpl::operate(object_t const&, ObjectOperation*, long*, int)+0x344) [0x7f0de06ec9f4] 6: (librados::IoCtxImpl::write(object_t const&, ceph::buffer::list&, unsigned long, unsigned long)+0x18d) [0x7f0de06f28cd] 7: (rados_write()+0x59) [0x7f0de06c7fa9] 8: (ffi_call_unix64()+0x4c) [0x7f0de34e0adc] 9: (ffi_call()+0x1fc) [0x7f0de34e040c] 10: (_ctypes_callproc()+0x48e) [0x7f0de36f75fe] 11: (()+0x15f9e) [0x7f0de36f8f9e] 12: (PyEval_EvalFrameEx()+0x1f36) [0x52e1e6] 13: (PyEval_EvalFrameEx()+0xc82) [0x52cf32] 14: (PyEval_EvalFrameEx()+0xc82) [0x52cf32] 15: python() [0x56d0aa] 16: python() [0x4d9854] 17: (PyEval_CallObjectWithKeywords()+0x6b) [0x4da20b] 18: python() [0x5872b2] 19: (()+0x8182) [0x7f0de4b65182] 20: (clone()+0x6d) [0x7f0de489230d] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. terminate called after throwing an instance of 'ceph::FailedAssertion' bash: line 1: 2590 Aborted (core dumped) PYTHONPATH=pybind LD_LIBRARY_PATH=.libs python /tmp/a.py ceph.conf 10737418240
#8 Updated by Loïc Dachary about 9 years ago
Nice catch Pavan Rallabhandi ;-) I had trouble reproducing the problem because I forgot the "LD_LIBRARY_PATH=.libs" and was using the librados from 0.80 instead of 0.84+ ...
#9 Updated by Loïc Dachary about 9 years ago
- Status changed from In Progress to Fix Under Review
- % Done changed from 0 to 80
https://github.com/ceph/ceph/pull/2584 should be enough. Unless there is a good reason to write an object with chunks larger than 1GB ? That is : if there is a need to create a 20GB object, is there a reason to create it with rados_write of buffers larger than 1GB ? Or is it reasonable to use smaller buffers (100MB at a time or less) ?
#10 Updated by Loïc Dachary about 9 years ago
Looking at librados.hpp
#11 Updated by Loïc Dachary almost 9 years ago
Extend the checks to librados.hpp and aio_* https://github.com/ceph/ceph/pull/2590
#12 Updated by Loïc Dachary almost 9 years ago
- Status changed from Fix Under Review to Resolved
- % Done changed from 80 to 100