Bug #3691

closed

Lock issue in librados resulting in application hang

Added by Xiaopong Tran over 11 years ago. Updated over 11 years ago.

Status: Rejected
Priority: High
Assignee:
Category: librados
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We ran into a nasty lock issue in librados: it tries to write some data and hangs there for many seconds until it gets out of it.

Whenever this happens, the whole application locks up for a few seconds, then librados gets out of it and everything continues again. After a few minutes, it gets into the same situation again.

Sometimes it gets into this situation several times in a row, and there's nothing we can do but wait for it to get out of it.

This makes our application totally unusable.

Attached is the trace we pulled from strace.

Ceph 0.55.1 on Debian Wheezy.


Files

librados-lock-issue.txt (5 KB) - Xiaopong Tran, 12/28/2012 04:53 AM
Actions #1

Updated by Ian Colle over 11 years ago

  • Assignee set to Josh Durgin
Actions #2

Updated by Josh Durgin over 11 years ago

  • Status changed from New to Rejected

You're calling the synchronous version of write, and the spot where it's 'hung' is just waiting for the response from the osds. This is not a bug in librados. Writes will block until they are complete on all replicas.

There are many possible causes of blocked writes. If your cluster has pgs that are not active, writes to those pgs will block until they become active. If the journal is much faster than the data disks, you could be hitting latency spikes when the journal fills up and has to be written out to disk.

Your application may be better off using aio_write and only waiting on the writes to complete (via rados_aio_wait_for_safe/rados_aio_wait_for_complete).
If the hangs are because your cluster is unhealthy and the writes are hitting inactive pgs, there may be other issues to resolve.
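
For reference, a minimal sketch (not the reporter's actual code; the pool name, object name, and config path are placeholders) of the pattern described above: issue the write with rados_aio_write() and only block on the completion when the result is actually needed, instead of calling the synchronous rados_write().

#include <stdio.h>
#include <string.h>
#include <rados/librados.h>

int main(void)
{
    rados_t cluster;
    rados_ioctx_t io;
    rados_completion_t comp;
    const char *buf = "payload";

    if (rados_create(&cluster, NULL) < 0) return 1;
    rados_conf_read_file(cluster, "/etc/ceph/ceph.conf");
    if (rados_connect(cluster) < 0) return 1;
    if (rados_ioctx_create(cluster, "data", &io) < 0) return 1;   /* placeholder pool */

    /* No callbacks; the completion is only used to wait later. */
    rados_aio_create_completion(NULL, NULL, NULL, &comp);
    rados_aio_write(io, "myobject", comp, buf, strlen(buf), 0);

    /* ... do other work here instead of blocking ... */

    /* Block only when the result is actually needed. */
    rados_aio_wait_for_complete(comp);   /* ack: write is in memory on all replicas */
    /* rados_aio_wait_for_safe(comp); */ /* or: write is committed to disk on all replicas */
    printf("write returned %d\n", rados_aio_get_return_value(comp));

    rados_aio_release(comp);
    rados_ioctx_destroy(io);
    rados_shutdown(cluster);
    return 0;
}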

Actions #3

Updated by Xiaopong Tran over 11 years ago

Well, this is an even worse issue. We are adding new osds (just 8 for now), and the cluster has stayed "unhealthy" for almost 24 hours; we are still not sure when it will become healthy again. And this is just a small cluster that started with 30 osds. We don't have a lot of pgs to start with either.

If the cluster becomes unhealthy for a very long time every time we add new osds, that is scary.

Actions #4

Updated by Josh Durgin over 11 years ago

This affects small clusters more because a single osd is a larger proportion of the whole cluster. In bobtail, there will be limits to peering/recovery so that client I/O is not blocked like this. For argonaut, especially with small clusters, you can limit the data migration by adding osds at weight 0 and slowly ramping up their weights, as mentioned in the 'Argonaut best practices' here: http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#adding-an-osd-manual
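
For anyone following along, a hypothetical sequence for the weight-0 approach might look like the following; the osd id, hostname, and weight steps are placeholders, and the exact crush command syntax should be checked against the add-or-rm-osds document linked above for your release.

# add the new osd to the crush map with an initial weight of 0 (no data moves to it yet)
ceph osd crush add osd.30 0 host=node4
# raise the weight in small steps, waiting for the cluster to settle
# (watch 'ceph -s' / 'ceph health') between steps
ceph osd crush reweight osd.30 0.2
ceph osd crush reweight osd.30 0.5
ceph osd crush reweight osd.30 1.0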
