Feature #6507: librados shouldn't block indefinitely when cluster doesn't respond

Added by Wido den Hollander over 10 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Urgent
Assignee: Josh Durgin
Category: librados
Spent time: 1:00 h
Source: other

Description

Currently, calls like rados_connect(), rados_ioctx_create() and rados_read() block forever if the Ceph cluster isn't responding.

A parameter that configures a timeout would be useful in a lot of situations, so applications can fail when the Ceph cluster is not responding rather than simply hang forever.

My use case here is libvirt: when it calls storagePoolRefresh it queries for RBD images, and if the Ceph cluster isn't responding, libvirt eventually locks up.

Files

radostimeout.c (1.31 KB), Wido den Hollander, 02/03/2014 02:41 AM

Updated by Dan Mick over 10 years ago

1) I solved this in the Python bindings with a separate timer thread, which is always possible in C/C++ as well, although I agree it would be nice if librados itself managed this.

2) We were just looking at this today, and my experience is that MonClient::authenticate() eventually times out (controlled by the unfortunately named client_mount_timeout configuration variable, which defaults to 300s). But Alfredo was not seeing the timeout work, and we didn't get to the bottom of it. Perhaps you could experiment with --client_mount_timeout=15, for example, Wido, and add clarity to this issue?
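
The timer-thread workaround from point 1 can be sketched roughly as follows. This is not the code in the Python bindings. It is a minimal illustration that assumes the blocking call can be left running in a daemon thread, and the helper name call_with_timeout is made up for this example.

```python
import threading


def call_with_timeout(func, timeout, *args, **kwargs):
    """Run func in a daemon thread and give up after `timeout` seconds.

    Hypothetical helper for illustration only. The blocked call is not
    cancelled; it simply keeps running in the background thread.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = func(*args, **kwargs)
        except BaseException as exc:  # hand any failure back to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)  # wait at most `timeout` seconds

    if thread.is_alive():
        raise TimeoutError("call did not finish within %s seconds" % timeout)
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

A caller would wrap the potentially blocking operation, for example call_with_timeout(cluster.connect, 15), and treat TimeoutError as "cluster not responding". The obvious drawback is the leaked thread, which is one reason a timeout inside librados itself is preferable.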

Updated by Dietmar Maurer over 10 years ago

I am trying to write Perl bindings for librados, so it would be great to have this managed inside librados.

Updated by Wido den Hollander about 10 years ago

File radostimeout.c added

So sorry for my late response on this one, but with #7282 this became relevant again.

So I did a quick test with librados.h and set 'client_mount_timeout' to 5 seconds instead of the default 300.

rados_connect() indeed returns after 5 seconds with -ETIMEDOUT (-110), but rados_ioctx_create() blocks forever.

Attached is the code I used for the test on my local laptop, so don't worry about the secret key ;)

I've been looking at other timeout values in config_opts.h, but none of them seem relevant.

So in libvirt's case, it will wait forever on the Ceph cluster when trying to "refresh" its internal storage pools, which eventually causes libvirt to hang.

Shouldn't there be a timeout for rados_ioctx_create, rados_read, rados_write, etc.?

Updated by Ian Colle about 10 years ago

Assignee set to Josh Durgin
Priority changed from Normal to Urgent

Updated by Josh Durgin about 10 years ago

Assignee deleted (Josh Durgin)
Priority changed from Urgent to Normal

It makes sense to add this as an option for librados users like the libvirt storage pool. The default is blocking, for things like cephfs and rbd that typically run underneath applications and filesystems that don't handle errors well, but librados users could handle timeout errors if they wanted. Sage suggested adding separate timeouts for monitors and OSDs, so I'll add those and make them apply to all librados calls.

Updated by Josh Durgin about 10 years ago

Assignee set to Josh Durgin
Priority changed from Normal to Urgent

didn't mean to change these

Updated by Wido den Hollander about 10 years ago

That would be great. A default behavior of blocking forever is just fine, but inside libvirt I want to be able to set a timeout of 30 seconds or so.

I assume it would also apply to librbd, since that simply wraps librados.

Updated by Josh Durgin about 10 years ago

Status changed from New to In Progress

Yeah, it'll apply to anything using librados, including librbd.

I pushed an implementation in the wip-librados-timeout branch. It adds two configuration options, rados_osd_op_timeout and rados_mon_op_timeout, to control the behavior. Both default to 0 (meaning no timeout). The libvirt pool code should set them via rados_conf_set() before calling rados_connect(). It can ignore the return code from rados_conf_set() to stay backwards compatible with librados versions that lack these options.

I still need to add tests, and double check the coverage, but some basic testing showed it working (you'll get -ETIMEDOUT if an operation hits the timeout).
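
The "set before connect, ignore failures" pattern described above might look roughly like this from Python. This is only a sketch. It assumes a cluster handle that exposes conf_set(option, value) and raises an exception when an option is unknown, as the Python rados bindings do. The helper name apply_op_timeouts is made up for the example.

```python
def apply_op_timeouts(cluster, seconds):
    """Best-effort: set both operation timeouts before connecting.

    `cluster` is assumed to be an unconnected handle with a
    conf_set(option, value) method. Failures are swallowed so the same
    code keeps working against a librados that predates these options.
    Returns the list of options that could not be set.
    """
    rejected = []
    for option in ("rados_mon_op_timeout", "rados_osd_op_timeout"):
        try:
            cluster.conf_set(option, str(seconds))
        except Exception:
            # Older librados: option unknown, so fall back to blocking.
            rejected.append(option)
    return rejected


# Intended call order (assumes the python rados module is installed):
#   cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
#   apply_op_timeouts(cluster, 30)
#   cluster.connect()
```

Returning the rejected options instead of raising lets the caller log that timeouts are unavailable while still proceeding with the default blocking behavior.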

Updated by Wido den Hollander about 10 years ago

I also ran a basic test, with both 'rados_osd_op_timeout' and 'rados_mon_op_timeout' set to 5 seconds, and it worked for me:

wido@wido-laptop:~/Desktop$ time ./radostimeout 
Created the RADOS cluster
Set the key option to: AQB7Sg1R2DlXIxAAkbOnif9m3v/HD4QW19kNHA==
Set the mon_host option to: localhost
Set the client_mount_timeout option to: 5
Set the rados_mon_op_timeout option to: 5
Set the rados_osd_op_timeout option to: 5
Connected to the RADOS cluster
Opened to IoCTX
Failed to write to: myobject: -110
Shut down the RADOS cluster

real    0m5.018s
user    0m0.004s
sys    0m0.012s
wido@wido-laptop:~/Desktop$

That's exactly how I wanted it. I'll start working on a patch for libvirt to implement these timeout options. Getting a patch accepted at libvirt takes weeks, so I'd better get started on that.
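
The -110 in the test output is a negated errno value; on Linux, 110 is ETIMEDOUT. A caller that wants to tell a timeout apart from other failures could decode return codes along these lines. This is a small sketch, and the helper names are made up for the example.

```python
import errno
import os


def is_timeout(ret):
    """True if a librados-style return code means the operation timed out."""
    return ret == -errno.ETIMEDOUT


def describe_return_code(ret):
    """Turn a negative-errno return code into a readable message."""
    if ret >= 0:
        return "success"
    code = -ret
    name = errno.errorcode.get(code, "UNKNOWN")
    return "%s (%d): %s" % (name, ret, os.strerror(code))
```

Comparing against errno.ETIMEDOUT instead of a literal 110 keeps the check correct on platforms where the numeric value differs.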

Updated by Josh Durgin about 10 years ago

Status changed from In Progress to Fix Under Review

https://github.com/ceph/ceph/pull/1192

Updated by Josh Durgin about 10 years ago

Status changed from Fix Under Review to Resolved

Merged to master in 32aa9fdf666063e4c5539b5e850f04af37e30b2e, and backported to dumpling around 30dafacd0b54bb98b01284851e0d5abf76324e95.

Updated by Haomai Wang over 8 years ago

I grepped for "client_mount_timeout" in the source tree:

ack client_mount_timeout src/

src/client/Client.cc
4899: int r = monclient->authenticate(cct->_conf->client_mount_timeout);
5274: if (req->op_stamp + cct->_conf->client_mount_timeout < now) {

src/common/config_opts.h
350:OPTION

src/librados/RadosClient.cc
286: err = monclient.authenticate(conf->client_mount_timeout);

src/mon/MonClient.cc
236: int ret = pinger->wait_for_reply(cct->_conf->client_mount_timeout);

src/mon/MonClient.h
74: until += (timeout > 0 ? timeout : cct->_conf->client_mount_timeout);
324: * expired (default: conf->client_mount_timeout).

src/test/librados/misc.cc
53: ASSERT_EQ(0, rados_conf_set(cluster, "client_mount_timeout", "0.000001"));

client_mount_timeout is an expected config value that may be confused with rados_mon_op_timeout. I guess we need to clean them up?
