Bug #504: hang when using radostool
Status: Closed
Description
I was adding some objects using radostool, when I got an unexplained hang. It looked like this:
gdb -p 19724
(gdb) bt
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x00007f00037e0583 in ?? () from /home/cmccabe/src/ceph/src/.libs/librados.so.1
#2 0x00007f00037f3742 in ?? () from /home/cmccabe/src/ceph/src/.libs/librados.so.1
#3 0x00007f00037c910b in RadosClient::shutdown (this=0x1a18420) at librados.cc:394
#4 0x00007f00037c9264 in librados::Rados::shutdown (this=0x7fff1aac0590) at librados.cc:1288
#5 0x0000000000414b9e in main (argc=6, argv=0x7fff1aac07a8) at rados.cc:467
I'm not sure whether this is a race inside RadosClient/librados itself or a server failing to respond.
I then ran another instance of radostool and got a different hang.
gdb -p 20614
(gdb) bt
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x00007f78761ce583 in ?? () from /home/cmccabe/src/ceph/src/.libs/librados.so.1
#2 0x00007f78761b3c35 in RadosClient::write_full (this=0xb18420, pool=@0x7f786c0008b0, oid=@0x7fffb9ce0c40, bl=@0x7fffb9ce1140) at librados.cc:897
#3 0x00007f78761b3e1d in librados::Rados::write_full (this=0x7fffb9ce1310, pool=0x7f786c0008b0, o=@0x7fffb9ce1270, bl=@0x7fffb9ce1140) at librados.cc:1409
#4 0x00000000004134fe in main (argc=6, argv=0x7fffb9ce1528) at rados.cc:285
So it appears to have been waiting for a reply from the server.
Then I modified ceph.conf to increase debugging. Specifically, I set:
debug ms = 20
debug objecter = 20
debug monc = 20
When I re-ran radostool with these settings, everything worked fine. Subsequent attempts to reproduce the first two hangs failed.
Configuration: vstart.sh with two OSDs. Standard ceph.conf.
Updated by Colin McCabe over 13 years ago
Perhaps 197928c26cec52e0f3f91e930988b1e5767e355b will resolve the radostool shutdown race condition.
The second backtrace seems to be an unrelated problem.
Updated by Sage Weil over 13 years ago
- Status changed from New to Resolved
The second issue looks like a transient osd issue.
Closing this for now, but we should keep an eye out for it happening again.