Feature #6507

librados shouldn't block indefinitely when cluster doesn't respond

Added by Wido den Hollander almost 6 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: librados
Target version: -
Start date: 10/10/2013
Due date:
% Done: 0%
Spent time:
Source: other
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

Currently, calls like rados_connect(), rados_ioctx_create(), and rados_read() block forever if the Ceph cluster isn't responding.

A parameter that configures a timeout would be useful in many situations, so that applications can fail when the Ceph cluster is not responding rather than simply hang forever.

My use case here is libvirt: when it calls storagePoolRefresh, it tries to query for RBD images, but if the Ceph cluster isn't responding, libvirt will eventually lock up.

radostimeout.c (1.31 KB) Wido den Hollander, 02/03/2014 02:41 AM

Associated revisions

Revision 3e1f7bbb (diff)
Added by Josh Durgin over 5 years ago

Objecter: implement mon and osd operation timeouts

This captures almost all operations from librados other than mon_commands().

Get the values for the timeouts from the Objecter constructor, so only
librados uses them.

Add C_Cancel_*_Op, finish_*_op(), and *_op_cancel() for each type of
operation, to mirror those for Op. Create a callback and schedule it
in the existing timer thread if the timeouts are specified.

Fixes: #6507
Signed-off-by: Josh Durgin <>

Revision 30dafacd (diff)
Added by Josh Durgin over 5 years ago

Objecter: implement mon and osd operation timeouts

This captures almost all operations from librados other than mon_commands().

Get the values for the timeouts from the Objecter constructor, so only
librados uses them.

Add C_Cancel_*_Op, finish_*_op(), and *_op_cancel() for each type of
operation, to mirror those for Op. Create a callback and schedule it
in the existing timer thread if the timeouts are specified.

Fixes: #6507
Signed-off-by: Josh Durgin <>
(cherry picked from commit 3e1f7bbb4217d322f4e0ece16e676cd30ee42a20)

Conflicts:
src/osd/OSD.cc
src/osd/ReplicatedPG.cc
src/osdc/Objecter.cc
src/osdc/Objecter.h

Revision 69b1e5e5 (diff)
Added by Josh Durgin over 5 years ago

Objecter: implement mon and osd operation timeouts

This captures almost all operations from librados other than mon_commands().

Get the values for the timeouts from the Objecter constructor, so only
librados uses them.

Add C_Cancel_*_Op, finish_*_op(), and *_op_cancel() for each type of
operation, to mirror those for Op. Create a callback and schedule it
in the existing timer thread if the timeouts are specified.

Fixes: #6507
Signed-off-by: Josh Durgin <>
(cherry picked from commit 3e1f7bbb4217d322f4e0ece16e676cd30ee42a20)

Conflicts:
src/osd/ReplicatedPG.cc

History

#1 Updated by Dan Mick almost 6 years ago

1) I solved this in the Python bindings with a separate timer thread, which is always possible in C/C++ as well, although I agree it'd be nice if librados itself managed this.

2) We were just looking at this today, and my experience is that MonClient::authenticate() eventually times out (governed by the unfortunately named client_mount_timeout configuration variable, which defaults to 300s). But Alfredo was not seeing the timeout work, and we didn't get to the bottom of it. Perhaps you could experiment with --mon_client_timeout=15, for example, Wido, and add clarity to this issue?
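
The "separate timer thread" workaround from point 1 could look roughly like the sketch below. This is not the code from the Python bindings. It only illustrates the general idea with POSIX threads: the blocking call runs in a detached worker thread and the caller waits on a condition variable with a deadline. The names call_with_timeout, blocking_fn, and demo_blocking_call are hypothetical, and demo_blocking_call merely sleeps as a stand-in for a librados call that does not return.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical shape of a blocking call, e.g. a thin wrapper around a librados call. */
typedef int (*blocking_fn)(void *arg);

struct timed_call {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    blocking_fn     fn;
    void           *arg;
    int             result;
    int             done;       /* set by the worker once fn has returned */
    int             abandoned;  /* set by the caller once it has given up */
};

static void timed_call_free(struct timed_call *tc)
{
    pthread_cond_destroy(&tc->cond);
    pthread_mutex_destroy(&tc->lock);
    free(tc);
}

static void *timed_call_worker(void *p)
{
    struct timed_call *tc = p;
    int r = tc->fn(tc->arg);    /* may block for a long time */
    int abandoned;

    pthread_mutex_lock(&tc->lock);
    tc->result = r;
    tc->done = 1;
    abandoned = tc->abandoned;
    pthread_cond_signal(&tc->cond);
    pthread_mutex_unlock(&tc->lock);

    if (abandoned)              /* the caller left, so the worker cleans up */
        timed_call_free(tc);
    return NULL;
}

/* Returns fn's result, or -ETIMEDOUT if fn did not finish within timeout_ms.
 * After a timeout the worker keeps running, so arg must stay valid. */
int call_with_timeout(blocking_fn fn, void *arg, int timeout_ms)
{
    struct timed_call *tc;
    struct timespec deadline;
    pthread_t tid;
    int err, rc = 0, finished, result;

    assert(fn != NULL);
    tc = calloc(1, sizeof(*tc));
    if (tc == NULL)
        return -ENOMEM;
    pthread_mutex_init(&tc->lock, NULL);
    pthread_cond_init(&tc->cond, NULL);
    tc->fn = fn;
    tc->arg = arg;

    err = pthread_create(&tid, NULL, timed_call_worker, tc);
    if (err != 0) {
        timed_call_free(tc);
        return -err;
    }
    pthread_detach(tid);

    /* Absolute deadline on CLOCK_REALTIME, the default clock for condition variables. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&tc->lock);
    while (!tc->done && rc == 0)
        rc = pthread_cond_timedwait(&tc->cond, &tc->lock, &deadline);
    finished = tc->done;
    if (finished) {
        result = tc->result;
    } else {
        tc->abandoned = 1;
        result = -ETIMEDOUT;
    }
    pthread_mutex_unlock(&tc->lock);

    if (finished)
        timed_call_free(tc);
    return result;
}

/* Demonstration stand-in: sleeps for *(int *)arg milliseconds, then returns 0. */
static int demo_blocking_call(void *arg)
{
    int ms = *(int *)arg;
    struct timespec ts = { ms / 1000, (long)(ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
    return 0;
}
```

The cost of this approach is that a timed-out call is only abandoned, not cancelled: the worker thread and whatever it holds stay alive until the underlying call returns, which is one reason handling the timeout inside librados is preferable.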

#2 Updated by Dietmar Maurer over 5 years ago

I am trying to write Perl bindings for librados, so it would be great to have this managed inside librados.

#3 Updated by Wido den Hollander over 5 years ago

So sorry for my late response on this one, but with #7282 this became relevant again.

So I did a quick test with librados.h and set 'client_mount_timeout' to 5 seconds instead of the default 300.

rados_connect indeed returns after 5 seconds with -ETIMEDOUT (-110), but rados_ioctx_create will block forever.

Attached is the code I used for the test on my local laptop, so don't worry about the secret key ;)

I've been looking at other timeout values in config_opts.h, but none of them seem relevant.

So in libvirt's case, it will hang forever waiting on the Ceph cluster when trying to "refresh" its internal storage pools, eventually causing libvirt itself to hang.

Shouldn't there be a timeout for rados_ioctx_create, rados_read, rados_write, etc?

#4 Updated by Ian Colle over 5 years ago

  • Assignee set to Josh Durgin
  • Priority changed from Normal to Urgent

#5 Updated by Josh Durgin over 5 years ago

  • Assignee deleted (Josh Durgin)
  • Priority changed from Urgent to Normal

It makes sense to add as an option for librados users like the libvirt storage pool. The default is blocking for things like cephfs and rbd that typically run underneath applications and filesystems that don't handle errors well, but librados users could handle timeout errors if they wanted. Sage suggested adding a separate timeout for monitors and osds, so I'll add those and make them apply to all librados calls.

#6 Updated by Josh Durgin over 5 years ago

  • Assignee set to Josh Durgin
  • Priority changed from Normal to Urgent

Didn't mean to change these.

#7 Updated by Wido den Hollander over 5 years ago

That would be great. A default behavior of blocking forever is just fine, but inside libvirt I want to be able to set a timeout of 30 seconds or so.

I assume it would also apply to librbd since that simply wraps over librados.

#8 Updated by Josh Durgin over 5 years ago

  • Status changed from New to In Progress

Yeah, it'll apply to anything using librados.

I pushed an implementation in the wip-librados-timeout branch. It adds two configuration options, rados_osd_op_timeout and rados_mon_op_timeout, to control the behavior; both default to 0 (meaning no timeout). The libvirt pool code should set them via rados_conf_set() before calling rados_connect(). It can ignore the return code from rados_conf_set() to stay backwards compatible with librados versions that lack these options.

I still need to add tests, and double check the coverage, but some basic testing showed it working (you'll get -ETIMEDOUT if an operation hits the timeout).
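
A minimal sketch of the "set both options and ignore the return code" pattern described above. To keep it compilable without librados, the setter is passed in as a function pointer whose shape mirrors rados_conf_set(cluster, option, value), assuming rados_t is a plain void * as in librados.h. The helper apply_op_timeouts and the stand-in demo_conf_set are hypothetical names, and the -ENOENT returned by the stand-in for an unknown option is an assumption about how an older librados would respond.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Same shape as rados_conf_set(), assuming rados_t is void *. */
typedef int (*conf_set_fn)(void *cluster, const char *option, const char *value);

/* Tries to set both timeout options to the given number of seconds.
 * Failures are deliberately ignored so that callers keep working against
 * a librados that does not know these options.
 * Returns how many of the two options were accepted (0, 1, or 2). */
int apply_op_timeouts(void *cluster, conf_set_fn set, unsigned seconds)
{
    static const char *const options[] = {
        "rados_osd_op_timeout",
        "rados_mon_op_timeout",
    };
    char value[16];
    int accepted = 0;
    size_t i;

    assert(set != NULL);
    snprintf(value, sizeof(value), "%u", seconds);

    for (i = 0; i < sizeof(options) / sizeof(options[0]); i++) {
        if (set(cluster, options[i], value) == 0)
            accepted++;
        /* else: unknown option on an older library, carry on without it */
    }
    return accepted;
}

/* Demonstration stand-in for a real setter. The cluster pointer is abused
 * here as a flag: *(int *)cluster != 0 means "this library knows the
 * rados_*_op_timeout options". Rejecting unknown options with -ENOENT is
 * an assumption made for the demo. */
static int demo_conf_set(void *cluster, const char *option, const char *value)
{
    int knows_timeouts = *(int *)cluster;

    (void)value;
    if (!knows_timeouts && strncmp(option, "rados_", 6) == 0)
        return -ENOENT;
    return 0;
}
```

With the real library, the call would presumably be apply_op_timeouts(cluster, rados_conf_set, 30) placed between rados_create() and rados_connect(). The returned count is informational only; a caller that just wants best-effort timeouts can discard it.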

#9 Updated by Wido den Hollander over 5 years ago

So I also did a basic test and it worked for me:

I set both 'rados_osd_op_timeout' and 'rados_mon_op_timeout' to 5 seconds and did a test:

wido@wido-laptop:~/Desktop$ time ./radostimeout 
Created the RADOS cluster
Set the key option to: AQB7Sg1R2DlXIxAAkbOnif9m3v/HD4QW19kNHA==
Set the mon_host option to: localhost
Set the client_mount_timeout option to: 5
Set the rados_mon_op_timeout option to: 5
Set the rados_osd_op_timeout option to: 5
Connected to the RADOS cluster
Opened to IoCTX
Failed to write to: myobject: -110
Shut down the RADOS cluster

real    0m5.018s
user    0m0.004s
sys    0m0.012s
wido@wido-laptop:~/Desktop$

That's exactly how I wanted it. I'll start working on a patch for libvirt to implement these timeout options. Getting a patch accepted at libvirt takes weeks, so I'd better get started on that.
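
The -110 in the output above is -ETIMEDOUT on Linux. A caller such as the libvirt storage backend would want to tell that case apart from other failures. The following is only a sketch of such a check; rc_is_timeout and describe_rc are hypothetical helper names, not part of librados or libvirt.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* librados calls report errors as negative errno values. */
int rc_is_timeout(int rc)
{
    return rc == -ETIMEDOUT;
}

/* Formats a human-readable description of a librados-style return code
 * into buf and returns buf. */
const char *describe_rc(int rc, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    if (rc >= 0)
        snprintf(buf, len, "ok (%d)", rc);
    else if (rc_is_timeout(rc))
        snprintf(buf, len, "timed out (%d): the cluster did not respond in time", rc);
    else
        snprintf(buf, len, "%s (%d)", strerror(-rc), rc);
    return buf;
}
```

Comparing against the ETIMEDOUT macro rather than the literal -110 keeps the check portable, since the numeric value differs between platforms.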

#10 Updated by Josh Durgin over 5 years ago

  • Status changed from In Progress to Need Review

#11 Updated by Josh Durgin over 5 years ago

  • Status changed from Need Review to Resolved

#12 Updated by Haomai Wang over 3 years ago

I grepped for "client_mount_timeout" in the source tree:

ack client_mount_timeout src/

src/client/Client.cc
4899: int r = monclient->authenticate(cct->_conf->client_mount_timeout);
5274: if (req->op_stamp + cct->_conf->client_mount_timeout < now) {

src/common/config_opts.h
350:OPTION

src/librados/RadosClient.cc
286: err = monclient.authenticate(conf->client_mount_timeout);

src/mon/MonClient.cc
236: int ret = pinger->wait_for_reply(cct->_conf->client_mount_timeout);

src/mon/MonClient.h
74: until += (timeout > 0 ? timeout : cct->_conf->client_mount_timeout);
324: * expired (default: conf->client_mount_timeout).

src/test/librados/misc.cc
53: ASSERT_EQ(0, rados_conf_set(cluster, "client_mount_timeout", "0.000001"));

client_mount_timeout is still an expected config value, and it may get mixed up with rados_mon_op_timeout. I guess we need to clean them up?
