Bug #37524

open

[cinder] enabling rados connect timeout can result in EROFS errors on maintenance operations

Added by Jason Dillaman over 5 years ago. Updated over 5 years ago.

Status: In Progress
Priority: Normal
Assignee: Jason Dillaman
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous, mimic
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

If the Cinder "rados_connect_timeout" setting is configured, maintenance operations can time out when sending ops to busy OSDs. The timeout is then incorrectly bubbled up to Cinder as a confusing EROFS error [1]. Instead, if an ETIMEDOUT error occurs while trying to acquire the exclusive lock, the lock request should be retried.

[1] https://github.com/ceph/ceph/blob/4fd9ff8caec42b9ba0992560571596ef280c2548/src/librbd/Operations.cc#L227
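
The retry behavior proposed in the description could look roughly like the sketch below. It is only an illustration in Python, not the librbd change itself: `acquire_with_retry`, `flaky_acquire`, the attempt limit, and the delay are all hypothetical names and values chosen for the example.

```python
import errno
import time


def acquire_with_retry(acquire_lock, max_attempts=3, delay_seconds=0.0):
    """Call acquire_lock(), retrying only when it fails with ETIMEDOUT."""
    for attempt in range(1, max_attempts + 1):
        try:
            return acquire_lock()
        except OSError as exc:
            # Retry timeouts only; any other error, or a timeout on the
            # final attempt, propagates to the caller unchanged.
            if exc.errno != errno.ETIMEDOUT or attempt == max_attempts:
                raise
            time.sleep(delay_seconds)


# Hypothetical stand-in for a lock request sent to a busy OSD:
# it times out twice and succeeds on the third call.
calls = {"count": 0}


def flaky_acquire():
    calls["count"] += 1
    if calls["count"] < 3:
        raise OSError(errno.ETIMEDOUT, "lock request timed out")
    return "lock acquired"


result = acquire_with_retry(flaky_acquire)
```

The attempt limit is an assumption of the sketch; it keeps a permanently unreachable OSD from blocking the caller forever, and the last timeout is still reported as ETIMEDOUT rather than being converted to another error.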

#1

Updated by Kefu Chai over 5 years ago

Jason, how about returning EAGAIN or ETIMEDOUT in that case? For instance, if rbd_snap_create() times out when acquiring an exclusive lock, we could return ETIMEDOUT.
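
To make the difference between the reported behavior and this suggestion concrete, here is a simplified model in Python. Both functions are hypothetical and only mimic negative errno return codes; they do not reproduce the actual librbd code path, and only the timeout case described in this report is modeled.

```python
import errno


def reported_result(lock_rc):
    """Model of the reported behavior: a lock timeout surfaces as -EROFS."""
    if lock_rc == -errno.ETIMEDOUT:
        return -errno.EROFS
    return lock_rc


def suggested_result(lock_rc):
    """Model of the suggestion: pass the timeout through to the caller."""
    return lock_rc


# A lock request that timed out (hypothetical input value).
lock_rc = -errno.ETIMEDOUT
reported = reported_result(lock_rc)
suggested = suggested_result(lock_rc)
```

With the timeout code preserved, a caller such as Cinder could tell a busy cluster apart from a genuinely read-only image and decide whether to retry.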
