Bug #11179

closed

"rbd ls" hang forever when cluster is damaged

Added by science luo about 9 years ago. Updated about 9 years ago.

Status: Won't Fix
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi there,
Something was wrong with my OSD, but I did not realize it. When I ran "rbd ls", the command hung, seemingly forever.
I enabled the debug option to find out why.
According to the log from the hung run, the client keeps running its tick process endlessly until the OSDs return to normal.
But what if my OSDs stay down? Will "rbd ls" then hang forever?
Is there no timeout to end this, for example by returning an error code?
Thanks for any hints.


Files

hang rbd ls.txt (2.67 KB) science luo, 03/20/2015 09:22 AM
normal rbd ls.txt (5.83 KB) science luo, 03/20/2015 09:22 AM
Actions #1

Updated by John Spray about 9 years ago

  • Project changed from CephFS to Ceph
  • Target version deleted (v0.87)
  • Affected Versions deleted (v0.87)
Actions #2

Updated by Greg Farnum about 9 years ago

  • Subject changed from "rbd ls" hang forever to "rbd ls" hang forever when cluster is damaged

In general, all of the commands which try to access data from RADOS will block until the data is accessible, on the theory that any outage is temporary.

I think we've added extra timeouts in a few places, but I'm not sure what the right criteria for that are.

Actions #3

Updated by science luo about 9 years ago

Greg Farnum wrote:

In general, all of the commands which try to access data from RADOS will block until the data is accessible, on the theory that any outage is temporary.

I think we've added extra timeouts in a few places, but I'm not sure what the right criteria for that are.

Does that mean this behavior won't change in later versions?

Actions #4

Updated by Greg Farnum about 9 years ago

  • Project changed from Ceph to rbd

I'm not sure. Like I said, we usually don't want to time out. In some user-facing tools we've added specific timeouts because it seemed like a good tradeoff; I can't really speak to the RBD tool here.

Actions #5

Updated by Josh Durgin about 9 years ago

  • Status changed from New to Won't Fix

The best way to script around things like this is to use the timeout command, e.g. "timeout 30 rbd ls". There are "rados mon op timeout" and "rados osd op timeout" options which can also work for this case, but they're not recommended for use with anything that writes to rbd.
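
For scripts written in Python rather than shell, the same idea as "timeout 30 rbd ls" can be sketched with the standard library's subprocess module. This is only an illustrative sketch, not part of the rbd tool: the helper names run_with_timeout and list_rbd_images are hypothetical, and it assumes the rbd binary is on the PATH and that 30 seconds is an acceptable deadline.

```python
import subprocess


def run_with_timeout(cmd, seconds):
    """Run cmd with a deadline.

    Returns (exit_code, stdout) if the command finishes in time,
    or None if it had to be killed because the deadline passed.
    """
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child process before raising.
        return None
    return done.returncode, done.stdout


def list_rbd_images(seconds=30):
    """Hypothetical wrapper: run 'rbd ls', but give up after `seconds`."""
    result = run_with_timeout(["rbd", "ls"], seconds)
    if result is None:
        raise TimeoutError(f"'rbd ls' did not finish within {seconds} s")
    code, out = result
    if code != 0:
        raise RuntimeError(f"'rbd ls' exited with status {code}")
    return out.splitlines()
```

A caller can catch TimeoutError and treat it as "cluster possibly unhealthy" instead of blocking indefinitely. Because this only bounds a read-only listing from the outside, it avoids changing the client op timeouts that are not recommended for anything that writes to rbd.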
