Project

General

Profile

Actions

Bug #9592

closed

librados: Not able to create Large Files with Librados

Added by pushpesh sharma over 9 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

100%

Source:
other
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I find this issue while i was trying to run a 1GB Write Cosbench Workload using librados.(My 1MB write & read run was successful)
The Run hang permanently and nothing written to cluster, however I observed client is sending the data (n/w bandwidth usage was 5-10Gbits).
AFAIK, COSbench uses java binding for Librados. I debugged further with python binding.
After installing ceph-python , I executed following program:-


########################
import rados
try:
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
except TypeError as e:
print 'Argument validation error: ', e
raise e
print "Created cluster handle." 
try:
cluster.connect()
except Exception as e:
print "connection error: ", e
raise e
finally:
print "Connected to the cluster." 
print "\n\nI/O Context and Object Operations" 
print "=================================" 
print "\nCreating a context for the 'data' pool" 
if not cluster.pool_exists('data'):
raise RuntimeError('No data pool exists')
ioctx = cluster.open_ioctx('data')
print "\nWriting object 'bm' with contents 'Bonjour tout le monde!' to pool 'data'." 
ioctx.write("bm", "Bonjour tout le monde!")
print "Writing XATTR 'lang' with value 'fr_FR' to object 'bm'" 
ioctx.set_xattr("bm", "lang", "fr_FR")
print "\nWriting object 'bm1' with contents 'Bonjour tout le monde!' to pool 'data'." 
ioctx.write("bm1", "Bonjour tout le monde!"*10000)
print "Writing XATTR 'lang' with value 'fr_FR' to object 'bm1'" 
ioctx.set_xattr("bm1", "lang", "fr_FR")
print "\nWriting object 'hw' with contents 'Hello World!' to pool 'data'." 
ioctx.write("hw", "Hello World!\n world is not enough"*1000000000)
print "Writing XATTR 'lang' with value 'en_US' to object 'hw'" 
ioctx.set_xattr("hw", "lang", "en_US")
'''
print "\nContents of object 'hw'\n------------------------" 
print ioctx.read("hw")
print "\n\nGetting XATTR 'lang' from object 'hw'" 
print ioctx.get_xattr("hw", "lang")
print "\nContents of object 'bm'\n------------------------" 
print ioctx.read("bm")
print "Getting XATTR 'lang' from object 'bm'" 
print ioctx.get_xattr("bm", "lang")
'''
print "Removing object 'bm'" 
ioctx.remove_object("bm")
print "Removing object 'bm1'" 
ioctx.remove_object("bm1")
print "\nRemoving object 'hw'" 
ioctx.remove_object("hw")

I got the following output:-
ems@rack6-ramp-1:~$ python ceph-client.py
Created cluster handle.
Connected to the cluster.
I/O Context and Object Operations
=================================
Creating a context for the 'data' pool
Writing object 'bm' with contents 'Bonjour tout le monde!' to pool 'data'.
Writing XATTR 'lang' with value 'fr_FR' to object 'bm'
Writing object 'bm1' with contents 'Bonjour tout le monde!' to pool 'data'.
Writing XATTR 'lang' with value 'fr_FR' to object 'bm1'
Writing object 'hw' with contents 'Hello World!' to pool 'data'.
common/Throttle.cc: In function 'bool Throttle::get_or_fail(int64_t)' thread 7f768a7d0700 time 2014-09-25 18:37:08.003401
common/Throttle.cc: 194: FAILED assert(c >= 0)
ceph version 0.84 (3215c520e1306f50d0094b5646636c02456c9df4)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f768d73d71b]
2: (Throttle::get_or_fail(long)+0x5ae) [0x7f768d733d0e]
3: (Objecter::_throttle_op(Objecter::Op*, int)+0x68) [0x7f768d6ae588]
4: (Objecter::_op_submit_with_budget(Objecter::Op*, RWLock::Context&)+0x195) [0x7f768d6bf785]
5: (Objecter::op_submit(Objecter::Op*)+0x96) [0x7f768d6bf8c6]
6: (librados::IoCtxImpl::operate(object_t const&, ObjectOperation*, long*, int)+0x33d) [0x7f768d698a0d]
7: (librados::IoCtxImpl::write(object_t const&, ceph::buffer::list&, unsigned long, unsigned long)+0x18d) [0x7f768d69ea2d]
8: (rados_write()+0xc1) [0x7f768d66cee1]
9: (ffi_call_unix64()+0x4c) [0x7f76903ebadc]
10: (ffi_call()+0x1fc) [0x7f76903eb40c]
11: (_ctypes_callproc()+0x48e) [0x7f76906025fe]
12: (()+0x15f9e) [0x7f7690603f9e]
13: (PyEval_EvalFrameEx()+0x1f36) [0x52e1e6]
14: (PyEval_EvalFrameEx()+0xc82) [0x52cf32]
15: (PyEval_EvalFrameEx()+0xc82) [0x52cf32]
16: python() [0x56d0aa]
17: python() [0x4d9854]
18: (PyEval_CallObjectWithKeywords()+0x6b) [0x4da20b]
19: python() [0x5872b2]
20: (()+0x8182) [0x7f769168c182]
21: (clone()+0x6d) [0x7f76913b930d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
Aborted (core dumped)
#######################


Files

_usr_bin_python2.7.1000.crash (42.9 MB) _usr_bin_python2.7.1000.crash pushpesh sharma, 09/25/2014 11:59 PM
Actions #1

Updated by Loïc Dachary over 9 years ago

  • Description updated (diff)
Actions #2

Updated by Loïc Dachary over 9 years ago

  • Subject changed from Librados: Not able to create Large Files with Librados to librados: Not able to create Large Files with Librados
  • Status changed from New to Can't reproduce
  • Assignee set to Loïc Dachary
  • Priority changed from High to Normal
"Hello World!\n world is not enough"*1000000000

this is a ~34GB string, is this intended ? Running a smaller string
"H"*1000000000

I got
$ PYTHONPATH=pybind python /tmp/a.py
Created cluster handle.
Connected to the cluster.

I/O Context and Object Operations
=================================

Creating a context for the 'rbd' pool

Writing object 'bm' with contents 'Bonjour tout le monde!' to pool 'rbd'.
Writing XATTR 'lang' with value 'fr_FR' to object 'bm'

Writing object 'bm1' with contents 'Bonjour tout le monde!' to pool 'rbd'.
Writing XATTR 'lang' with value 'fr_FR' to object 'bm1'

Writing object 'hw' with contents 'Hello World!' to pool 'rbd'.
Traceback (most recent call last):
  File "/tmp/a.py", line 31, in <module>
    ioctx.write("hw", "H"*1000000000)
  File "/home/loic/software/ceph/ceph/src/pybind/rados.py", line 1223, in write
    (self.name, key))
rados.Error: Ioctx.write(rbd): failed to write hw: errno EMSGSIZE

using
$ ceph --version
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
ceph version 0.85-920-gba02a5e (ba02a5e638ddae3d58e1f5d8bde741da9a6e342d)

with the following
import rados
try:
    cluster = rados.Rados(conffile='ceph.conf')
except TypeError as e:
    print 'Argument validation error: ', e
    raise e
print "Created cluster handle." 
try:
    cluster.connect()
except Exception as e:
    print "connection error: ", e
    raise e
finally:
    print "Connected to the cluster." 
    print "\n\nI/O Context and Object Operations" 
    print "=================================" 
    print "\nCreating a context for the 'rbd' pool" 

if not cluster.pool_exists('rbd'):
    raise RuntimeError('No rbd pool exists')
ioctx = cluster.open_ioctx('rbd')
print "\nWriting object 'bm' with contents 'Bonjour tout le monde!' to pool 'rbd'." 
ioctx.write("bm", "Bonjour tout le monde!")
print "Writing XATTR 'lang' with value 'fr_FR' to object 'bm'" 
ioctx.set_xattr("bm", "lang", "fr_FR")
print "\nWriting object 'bm1' with contents 'Bonjour tout le monde!' to pool 'rbd'." 
ioctx.write("bm1", "Bonjour tout le monde!"*10000)
print "Writing XATTR 'lang' with value 'fr_FR' to object 'bm1'" 
ioctx.set_xattr("bm1", "lang", "fr_FR")
print "\nWriting object 'hw' with contents 'Hello World!' to pool 'rbd'." 
ioctx.write("hw", "H"*1000000000)
print "Writing XATTR 'lang' with value 'en_US' to object 'hw'" 
ioctx.set_xattr("hw", "lang", "en_US")
'''
print "\nContents of object 'hw'\n------------------------" 
print ioctx.read("hw")
print "\n\nGetting XATTR 'lang' from object 'hw'" 
print ioctx.get_xattr("hw", "lang")
print "\nContents of object 'bm'\n------------------------" 
print ioctx.read("bm")
print "Getting XATTR 'lang' from object 'bm'" 
print ioctx.get_xattr("bm", "lang")
'''
print "Removing object 'bm'" 
ioctx.remove_object("bm")
print "Removing object 'bm1'" 
ioctx.remove_object("bm1")
print "\nRemoving object 'hw'" 
ioctx.remove_object("hw")
Actions #3

Updated by Greg Farnum over 9 years ago

If I were to guess, something in the stack is converting the size value down to an int32 and then back up to int64, so it looks negative. That's not great, but you also shouldn't be sending operations that large to the OSDs (in fact, by default they'll reject the write) or creating any objects of that size.

Actions #4

Updated by pushpesh sharma over 9 years ago

A modified script to debug this issue:- ####################################
import rados
import sys
try:
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
except TypeError as e:
print 'Argument validation error: ', e
raise e
print "Created cluster handle."
try:
cluster.connect()
except Exception as e:
print "connection error: ", e
raise e
finally:
print "Connected to the cluster."
print "\n\nI/O Context and Object Operations"
print "================================="
print "\nCreating a context for the 'data' pool"
if not cluster.pool_exists('data'):
raise RuntimeError('No data pool exists')
ioctx = cluster.open_ioctx('data')
s9mb='a'*10000000
print '\n\nsize in mb of str is: '
print sys.getsizeof(s9mb)/1024/1024;
try:
print "Writing object s1 to pool 'data'."
ioctx.write("s9mb", s9mb)
ioctx.set_xattr("s9mb", "lang", "en_US")
print 'done'
ioctx.remove_object("s9mb")
except Exception as e :
print 'failed'+str(e)
s90mb='ababababa'*10000000
print '\n\nsize in mb of str is: '
print sys.getsizeof(s90mb)/1024/1024;
try:
print "Writing object s90mb to pool 'data'."
ioctx.write("s90mb", s90mb)
ioctx.set_xattr("s90mb", "lang", "en_US")
print 'done'
ioctx.remove_object("s90mb")
except Exception as e :
print 'failed'+str(e)
s100mb='abababababa'*10000000
print '\n\nsize in mb of str is: '
print sys.getsizeof(s100mb)/1024/1024;
try:
print "Writing object s100mb to pool 'data'."
ioctx.write("s100mb", s100mb)
ioctx.set_xattr("s100mb", "lang", "en_US")
print 'done'
ioctx.remove_object("s100mb")
except Exception as e :
print 'failed'+str(e)
s1G='abababababa'*100000000
print '\n\nsize in mb of str is: '
print sys.getsizeof(s1G)/1024/1024;
try:
print "Writing object 1G to pool 'data'."
ioctx.write("s1G", s1G)
ioctx.set_xattr("s1G", "lang", "en_US")
print 'done'
ioctx.remove_object("s1G")
except Exception as e :
print 'failed'+str(e)
s10G='ababaabababa'*1000000000
print '\n\nsize in mb of str is: '
print sys.getsizeof(s10G)/1024/1024;
try:
print "Writing object s10G to pool 'data'."
ioctx.write("s10G", s10G)
ioctx.set_xattr("s10G", "lang", "en_US")
print 'done'
ioctx.remove_object("s10G")
except Exception as e :
print 'failed'+str(e)
s20G='ababaabababaababababababab'*1000000000
print '\n\nsize in mb of str is: '
print sys.getsizeof(s20G)/1024/1024;
try:
print "Writing object s20G to pool 'data'."
ioctx.write("s20G", s20G)
ioctx.set_xattr("s20G", "lang", "en_US")
print 'done'
ioctx.remove_object("s20G")
except Exception as e :
print 'failed'+str(e) ########################
I got following output: #######################
ems@rack6-client-2:~$ python ceph-client.py
Created cluster handle.
Connected to the cluster.
I/O Context and Object Operations =================================
Creating a context for the 'data' pool
size in mb of str is:
9
Writing object s1 to pool 'data'.
done
size in mb of str is:
85
Writing object s90mb to pool 'data'.
done
size in mb of str is:
104
Writing object s100mb to pool 'data'.
failedIoctx.write(data): failed to write s100mb: errno EMSGSIZE
size in mb of str is:
1049
Writing object 1G to pool 'data'.
failedIoctx.write(data): failed to write s1G: errno EMSGSIZE
size in mb of str is:
11444
Writing object s10G to pool 'data'.
common/Throttle.cc: In function 'bool Throttle::get_or_fail(int64_t)' thread 7fb047788700 time 2014-09-26 17:44:49.495822
common/Throttle.cc: 194: FAILED assert(c >= 0)
ceph version 0.84-sd-sprint4 (3215c520e1306f50d0094b5646636c02456c9df4)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7fb04a6f571b]
2: (Throttle::get_or_fail(long)+0x5ae) [0x7fb04a6ebd0e]
3: (Objecter::_throttle_op(Objecter::Op*, int)+0x68) [0x7fb04a666588]
4: (Objecter::_op_submit_with_budget(Objecter::Op*, RWLock::Context&)+0x195) [0x7fb04a677785]
5: (Objecter::op_submit(Objecter::Op*)+0x96) [0x7fb04a6778c6]
6: (librados::IoCtxImpl::operate(object_t const&, ObjectOperation*, long*, int)+0x33d) [0x7fb04a650a0d]
7: (librados::IoCtxImpl::write(object_t const&, ceph::buffer::list&, unsigned long, unsigned long)+0x18d) [0x7fb04a656a2d]
8: (rados_write()+0xc1) [0x7fb04a624ee1]
9: (ffi_call_unix64()+0x4c) [0x7fb04d3a3adc]
10: (ffi_call()+0x1fc) [0x7fb04d3a340c]
11: (_ctypes_callproc()+0x48e) [0x7fb04d5ba5fe]
12: (()+0x15f9e) [0x7fb04d5bbf9e]
13: (PyEval_EvalFrameEx()+0x1f36) [0x52e1e6]
14: (PyEval_EvalFrameEx()+0xc82) [0x52cf32]
15: (PyEval_EvalFrameEx()+0xc82) [0x52cf32]
16: python() [0x56d0aa]
17: python() [0x4d9854]
18: (PyEval_CallObjectWithKeywords()+0x6b) [0x4da20b]
19: python() [0x5872b2]
20: (()+0x8182) [0x7fb04e644182]
21: (clone()+0x6d) [0x7fb04e37130d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
Aborted (core dumped) ######################
Core dumps are attached.

Actions #5

Updated by Pavan Rallabhandi over 9 years ago


int Objecter::calc_op_budget(Op *op)
{
  int op_budget = 0;
  for (vector<OSDOp>::iterator i = op->ops.begin();
       i != op->ops.end();
       ++i) {
    if (i->op.op & CEPH_OSD_OP_MODE_WR) {
      op_budget += i->indata.length();
    } else if (ceph_osd_op_mode_read(i->op.op)) {
      if (ceph_osd_op_type_data(i->op.op)) {
        if ((int64_t)i->op.extent.length > 0)
          op_budget += (int64_t)i->op.extent.length;                <----- could be the culprit?
      } else if (ceph_osd_op_type_attr(i->op.op)) {
        op_budget += i->op.xattr.name_len + i->op.xattr.value_len;
      }
    }
  }
  return op_budget;
}

Actions #6

Updated by Loïc Dachary over 9 years ago

  • Status changed from Can't reproduce to In Progress
Actions #7

Updated by Loïc Dachary over 9 years ago

The minimal script

import sys
import rados
cluster = rados.Rados(conffile=sys.argv[1])
cluster.connect()
ioctx = cluster.open_ioctx('rbd')
ioctx.write("hw", "A"*int(sys.argv[2]))

with debug objecter 20 shows the problem with
PYTHONPATH=pybind LD_LIBRARY_PATH=.libs python /tmp/a.py ceph.conf $((10 * 1024 * 1024 * 1024))" 
2014-09-26 19:08:46.304819 7f0dd7fff700 10 client.?.objecter ms_handle_connect 0x7f0dd80108c0
2014-09-26 19:08:46.304839 7f0dd7fff700 10 client.?.objecter resend_mon_ops
2014-09-26 19:08:46.305207 7f0dded96700 10 client.4110.objecter _maybe_request_map subscribing (onetime) to next osd map
2014-09-26 19:08:46.305410 7f0dd7fff700 10 client.4110.objecter ms_dispatch 0x7f0dd8000bf0 osd_map(11..11 src has 1..11) v3
2014-09-26 19:08:46.305420 7f0dd7fff700  3 client.4110.objecter handle_osd_map got epochs [11,11] > 0
2014-09-26 19:08:46.305425 7f0dd7fff700  3 client.4110.objecter handle_osd_map decoding full epoch 11
2014-09-26 19:08:46.305537 7f0dd7fff700 20 client.4110.objecter dump_active .. 0 homeless
2014-09-26 19:08:46.305564 7f0dd7fff700 10 client.4110.objecter ms_dispatch 0x7f0dd8000bf0 osd_map(11..11 src has 1..11) v3
2014-09-26 19:08:46.305567 7f0dd7fff700  3 client.4110.objecter handle_osd_map ignoring epochs [11,11] <= 11
2014-09-26 19:08:46.305569 7f0dd7fff700 20 client.4110.objecter dump_active .. 0 homeless
2014-09-26 19:08:51.305292 7f0ddcc52700 10 client.4110.objecter tick
2014-09-26 19:08:53.195734 7f0dded96700 -1 client.4110.objecter op_submit[write 0~2147483648]
2014-09-26 19:08:53.219751 7f0dded96700 -1 client.4110.objecter calc_op_budget[write 0~2147483648]
2014-09-26 19:08:53.219770 7f0dded96700 -1 client.4110.objecter calc_op_budget0x7f0b34000bb0
2014-09-26 19:08:53.219776 7f0dded96700 -1 client.4110.objecter calc_op_budget op_budget -2147483648
osdc/Objecter.cc: In function 'int Objecter::calc_op_budget(Objecter::Op*)' thread 7f0dded96700 time 2014-09-26 19:08:53.219782
osdc/Objecter.cc: 2421: FAILED assert(op_budget >= 0)
 ceph version 0.85-997-g2bf4526 (2bf4526fafde11b6cf95afab5089f85126d0fc65)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f0de079547b]
 2: (Objecter::calc_op_budget(Objecter::Op*)+0x46b) [0x7f0de0702ecb]
 3: (Objecter::_op_submit_with_budget(Objecter::Op*, RWLock::Context&)+0x10c) [0x7f0de07139ac]
 4: (Objecter::op_submit(Objecter::Op*)+0x74) [0x7f0de0713b54]
 5: (librados::IoCtxImpl::operate(object_t const&, ObjectOperation*, long*, int)+0x344) [0x7f0de06ec9f4]
 6: (librados::IoCtxImpl::write(object_t const&, ceph::buffer::list&, unsigned long, unsigned long)+0x18d) [0x7f0de06f28cd]
 7: (rados_write()+0x59) [0x7f0de06c7fa9]
 8: (ffi_call_unix64()+0x4c) [0x7f0de34e0adc]
 9: (ffi_call()+0x1fc) [0x7f0de34e040c]
 10: (_ctypes_callproc()+0x48e) [0x7f0de36f75fe]
 11: (()+0x15f9e) [0x7f0de36f8f9e]
 12: (PyEval_EvalFrameEx()+0x1f36) [0x52e1e6]
 13: (PyEval_EvalFrameEx()+0xc82) [0x52cf32]
 14: (PyEval_EvalFrameEx()+0xc82) [0x52cf32]
 15: python() [0x56d0aa]
 16: python() [0x4d9854]
 17: (PyEval_CallObjectWithKeywords()+0x6b) [0x4da20b]
 18: python() [0x5872b2]
 19: (()+0x8182) [0x7f0de4b65182]
 20: (clone()+0x6d) [0x7f0de489230d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
bash: line 1:  2590 Aborted                 (core dumped) PYTHONPATH=pybind LD_LIBRARY_PATH=.libs python /tmp/a.py ceph.conf 10737418240

Actions #8

Updated by Loïc Dachary over 9 years ago

Nice catch Pavan Rallabhandi ;-) I had trouble reproducing the problem because I forgot the "LD_LIBRARY_PATH=.libs" and was using the librados from 0.80 instead of 0.84+ ...

Actions #9

Updated by Loïc Dachary over 9 years ago

  • Status changed from In Progress to Fix Under Review
  • % Done changed from 0 to 80

https://github.com/ceph/ceph/pull/2584 should be enough. Unless there is a good reason to write an object with chunks larger than 1GB ? That is : if there is a need to create a 20GB object, is there a reason to create it with rados_write of buffers larger than 1GB ? Or is it reasonable to use smaller buffers (100MB at a time or less) ?

Actions #10

Updated by Loïc Dachary over 9 years ago

Looking at librados.hpp

Actions #11

Updated by Loïc Dachary over 9 years ago

Extend the checks to librados.hpp and aio_* https://github.com/ceph/ceph/pull/2590

Actions #12

Updated by Loïc Dachary over 9 years ago

  • Status changed from Fix Under Review to Resolved
  • % Done changed from 80 to 100
Actions

Also available in: Atom PDF