Bug #504
hang when using radostool
Status: closed, 0% done
Description
I was adding some objects using radostool when it hung for no apparent reason. The backtrace of the hung process looked like this:
gdb -p 19724
(gdb) bt
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x00007f00037e0583 in ?? () from /home/cmccabe/src/ceph/src/.libs/librados.so.1
#2 0x00007f00037f3742 in ?? () from /home/cmccabe/src/ceph/src/.libs/librados.so.1
#3 0x00007f00037c910b in RadosClient::shutdown (this=0x1a18420) at librados.cc:394
#4 0x00007f00037c9264 in librados::Rados::shutdown (this=0x7fff1aac0590) at librados.cc:1288
#5 0x0000000000414b9e in main (argc=6, argv=0x7fff1aac07a8) at rados.cc:467
The process is blocked in RadosClient::shutdown, but I'm not sure whether this is a race inside RadosClient/librados itself or a server failing to respond.
I then ran another instance of radostool and got a different hang:
gdb -p 20614
(gdb) bt
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x00007f78761ce583 in ?? () from /home/cmccabe/src/ceph/src/.libs/librados.so.1
#2 0x00007f78761b3c35 in RadosClient::write_full (this=0xb18420, pool=@0x7f786c0008b0, oid=@0x7fffb9ce0c40, bl=@0x7fffb9ce1140) at librados.cc:897
#3 0x00007f78761b3e1d in librados::Rados::write_full (this=0x7fffb9ce1310, pool=0x7f786c0008b0, o=@0x7fffb9ce1270, bl=@0x7fffb9ce1140) at librados.cc:1409
#4 0x00000000004134fe in main (argc=6, argv=0x7fffb9ce1528) at rados.cc:285
So this instance appears to have been blocked in RadosClient::write_full, waiting for a reply from the server.
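For readers unfamiliar with this kind of hang, here is a minimal sketch in Python of the general pattern the second backtrace suggests: a synchronous call that parks on a condition variable until a reply handler signals it. This is not the librados implementation. The PendingOp class and its complete/wait methods are hypothetical names made up for illustration. With no timeout the waiter blocks forever if the reply never arrives (or if the wakeup is lost), which would look like the pthread_cond_wait frame above. A timed wait at least turns the hang into a visible error.

```python
import threading


class PendingOp:
    """Hypothetical stand-in for one in-flight request awaiting a reply."""

    def __init__(self):
        self._cond = threading.Condition()
        self._done = False
        self._result = None

    def complete(self, result):
        # Would be called by a (hypothetical) reply-handling thread.
        with self._cond:
            self._done = True
            self._result = result
            self._cond.notify_all()

    def wait(self, timeout=None):
        # timeout=None blocks indefinitely, as an untimed condition wait does.
        with self._cond:
            finished = self._cond.wait_for(lambda: self._done, timeout=timeout)
            if not finished:
                raise TimeoutError("no reply within %.1f s" % timeout)
            return self._result


if __name__ == "__main__":
    # A reply that arrives: the waiter wakes up and returns the result.
    op = PendingOp()
    threading.Timer(0.05, op.complete, args=("ack",)).start()
    print(op.wait(timeout=1.0))

    # A reply that never arrives: the timed wait reports it instead of hanging.
    lost = PendingOp()
    try:
        lost.wait(timeout=0.2)
    except TimeoutError as exc:
        print("timed out:", exc)
```

Checking the predicate under the lock (here via wait_for) matters: a waiter that tests the flag outside the lock can miss a notification delivered between the test and the wait, which is one way a timing-dependent hang can arise.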
Then I modified ceph.conf to increase debugging. Specifically, I set:
debug ms = 20
debug objecter = 20
debug monc = 20
When I re-ran radostool with these settings, everything worked fine. Subsequent attempts to reproduce the first two hangs failed.
Configuration: vstart.sh with two OSDs. Standard ceph.conf.