Bug #14089


ceph rbd api is not thread safe

Added by ceph zte over 8 years ago. Updated over 8 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
rbd
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

My ceph version is 0.94. When I use multiple threads to run the rbd API as shown below, librbd core dumps.

Why is the librbd API not thread safe?

import rados
import rbd
import hashlib
import datetime
import threading

RADOS_NAME = 'client.admin'
RBDTIMEOUT = 20
num = 0

def md5(raw):
    hasher = hashlib.md5()
    hasher.update(raw)
    return hasher.hexdigest()

mu = threading.Lock()
mu1 = threading.Lock()

class RBDHandle():
    def __init__(self, clustername="ceph"):
        self._clustername = clustername

    def getmd5info(self, ivalue):
        hasher = hashlib.md5()
        hasher.update(str(ivalue))
        return hasher.hexdigest()

    def testconnect(self):
        try:
            cluster_handle = rados.Rados(name=RADOS_NAME, clustername=self._clustername, conffile='')
            cluster_handle.connect(timeout=RBDTIMEOUT)
        except:
            print "it is error"
        finally:
            cluster_handle.shutdown()

            if mu1.acquire():
                global num
                num = num + 1
                mu1.release()
                print num

def test_fuc():
    crb = RBDHandle()
    threads = []
    for i in range(1, 10):
        print "it is begin"
        t = threading.Thread(target=crb.testconnect)
        threads.append(t)

    for t in threads:
        t.setDaemon(True)
        t.start()
    for t in threads:
        t.join()

if __name__ == "__main__":
    while True:
        num = 0
        test_fuc()
        if num != 9:
            break
    print "it is over"
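The fan-out pattern in the reproducer can be exercised without a cluster by stubbing out the rados calls. Below is a minimal, self-contained sketch; `fake_connect` is a hypothetical stand-in for the connect/shutdown sequence, and every worker is joined before the lock-protected counter is checked:

```python
import threading

num = 0
lock = threading.Lock()

def fake_connect():
    # Stand-in for rados.Rados(...).connect()/shutdown()
    global num
    with lock:
        num += 1

def run_threads(count=9):
    threads = [threading.Thread(target=fake_connect) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:  # join *every* thread, not just the last one
        t.join()
    return num

print(run_threads())  # -> 9
```

Joining each thread ensures no worker is still running when the interpreter begins shutting down.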
Actions #1

Updated by Nathan Cutler over 8 years ago

  • Project changed from Ceph to rbd
  • Subject changed from ceph rbd api is not thread sate. to ceph rbd api is not thread safe
Actions #2

Updated by Jason Dillaman over 8 years ago

  • Status changed from New to Need More Info
  • Priority changed from Urgent to Normal

What is the exact issue you are encountering?

Actions #3

Updated by Josh Durgin over 8 years ago

  • Description updated (diff)
Actions #4

Updated by ceph zte over 8 years ago

I run the Python script below with many threads, e.g. 100. It often fails with errors like "Exception in thread Thread-95 (most likely raised during interpreter shutdown)".

def testconnect(self):
    try:
        cluster_handle = rados.Rados(name=RADOS_NAME, clustername=self._clustername, conffile='')
        cluster_handle.connect(timeout=RBDTIMEOUT)
    finally:
        cluster_handle.shutdown()
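One hazard in the snippet above, separate from any librbd issue: if `rados.Rados(...)` itself raises, the `finally` clause calls `shutdown()` on a variable that was never bound, so the real failure is masked by an `UnboundLocalError`. A defensive sketch of the same control flow (`FakeCluster` is a hypothetical stand-in, since no cluster is needed to illustrate it):

```python
class FakeCluster(object):
    """Stand-in for rados.Rados; records whether shutdown() ran."""
    def __init__(self):
        self.shut_down = False
    def connect(self, timeout=None):
        pass
    def shutdown(self):
        self.shut_down = True

def testconnect(factory=FakeCluster):
    cluster_handle = None
    try:
        cluster_handle = factory()
        cluster_handle.connect(timeout=20)
    finally:
        if cluster_handle is not None:  # only shut down what was actually created
            cluster_handle.shutdown()
    return cluster_handle

handle = testconnect()
print(handle.shut_down)  # -> True
```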
Actions #5

Updated by ceph zte over 8 years ago

The core dump print is as below

pure virtual method called
terminate called without an active exception
Aborted (core dumped)

Actions #6

Updated by Josh Durgin over 8 years ago

Could you open the core file that was dumped with gdb and get a backtrace?

i.e. run 'gdb python /path/to/core/file' and then the gdb command 'bt', and paste the full output here?

This does seem to be a good test case: on master, running it in a loop exposed a different bug (#14115).

Actions #7

Updated by ceph zte over 8 years ago

(gdb) bt
#0  0x00007f39abea35f9 in raise () from /lib64/libc.so.6
#1  0x00007f39abea5068 in abort () from /lib64/libc.so.6
#2  0x00007f399d8ef9d5 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#3  0x00007f399d8ed946 in ?? () from /lib64/libstdc++.so.6
#4  0x00007f399d8ed973 in std::terminate() () from /lib64/libstdc++.so.6
#5  0x00007f399d8ee4df in __cxa_pure_virtual () from /lib64/libstdc++.so.6
#6  0x00007f3969a84dd4 in ThreadPool::WorkQueueVal<std::pair<Context*, int>, std::pair<Context*, int> >::_void_dequeue (this=0x7f36f8494c40) at ./common/WorkQueue.h:177
#7  0x00007f3969b75534 in ThreadPool::worker (this=0x7f36f822d650, wt=0x7f36f87b6d30) at common/WorkQueue.cc:120
#8  0x00007f3969b76ab0 in ThreadPool::WorkThread::entry (this=<optimized out>) at common/WorkQueue.h:318
#9  0x00007f39ac93fdf3 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f39abf6454d in clone () from /lib64/libc.so.6

Actions #8

Updated by Jason Dillaman over 8 years ago

I believe this is a duplicate of #13636 (#13758 is the backport ticket for Hammer). It should be included in the forthcoming v0.94.6 release.

Actions #9

Updated by Jason Dillaman over 8 years ago

  • Status changed from Need More Info to Duplicate