Bug #18049

closed

timeout during ceph-disk trigger due to /var/lock/ceph-disk flock contention

Added by David Disseldorp over 7 years ago. Updated about 7 years ago.

Status: Resolved
Priority: Urgent
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: ceph-disk
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ceph-disk@.service handles udev device events by calling "ceph-disk trigger", which in turn handles service restart for the corresponding device.

"ceph-disk trigger" invocations are mutually exclusive: each call first takes an flock on /var/lock/ceph-disk. The flock behaviour was added with f0a47578c7c4521d7cf50e9419620ddb629736f5 to address http://tracker.ceph.com/issues/13160.

The 120 second timeout was later added with bed1a5cc05a9880b91fc9ac8d8a959efe3b3d512 to address http://tracker.ceph.com/issues/16580.
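
To illustrate the serialization described above, here is a minimal Python sketch of running an action under an exclusive flock on a single shared lock file. The helper names (run_exclusively, is_locked) and the temporary lock path are hypothetical and chosen for illustration; this is not the ceph-disk or systemd unit code, only the general pattern.

```python
import fcntl
import os
import tempfile


def run_exclusively(lock_path, action):
    """Run action() while holding an exclusive flock on lock_path."""
    # Append mode creates the lock file if needed without truncating it.
    with open(lock_path, "a") as lock_file:
        # Blocks until no other open file description holds the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            return action()
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def is_locked(lock_path):
    """Return True if some other open file description holds the flock."""
    with open(lock_path, "a") as probe:
        try:
            # Non-blocking attempt: fails immediately if the lock is held.
            fcntl.flock(probe, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return True
        fcntl.flock(probe, fcntl.LOCK_UN)
        return False


if __name__ == "__main__":
    # Hypothetical stand-in for /var/lock/ceph-disk.
    path = os.path.join(tempfile.mkdtemp(), "ceph-disk")
    print(run_exclusively(path, lambda: is_locked(path)))
    print(is_locked(path))
```

With one shared lock file, every caller queues behind whichever invocation currently holds the lock, regardless of which device it is handling.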

On systems with many OSDs, the "ceph-disk trigger" calls issued during startup contend heavily for the /var/lock/ceph-disk flock, which can cause some services to trip the 120 second timeout.
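
A rough back-of-the-envelope model shows how this can happen. The sketch below assumes that all triggers start at the same moment, that they run strictly one after another, and that the timeout covers both waiting for the lock and running. The 5 second hold time and the device counts are made-up illustrative numbers, not measurements from this report.

```python
def timed_out_invocations(n_devices, hold_seconds, timeout_seconds=120):
    """Count invocations that finish after the timeout when fully serialized.

    The k-th invocation (1-based) finishes at k * hold_seconds, because it
    must wait for the k - 1 invocations ahead of it in the queue.
    """
    finish_times = [(k + 1) * hold_seconds for k in range(n_devices)]
    return sum(1 for t in finish_times if t > timeout_seconds)


if __name__ == "__main__":
    # Hypothetical: 30 devices, each trigger holding the lock for 5 seconds.
    print(timed_out_invocations(30, 5))
```

Under these assumptions the total time grows linearly with the number of devices, so a large enough host eventually pushes the last invocations past the timeout.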

Given that the intention of the flock was to restrict concurrent invocations for a single device, it should be sufficient to use the device path for the flock. This will allow "ceph-disk trigger" events for different devices to run concurrently, greatly reducing the likelihood of service timeout.
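
The per-device locking proposed above could look roughly like the following Python sketch. The lock file naming scheme (device_lock_path) and the try_lock helper are assumptions made for illustration and are not taken from the actual fix; the point is only that locks on different files do not block each other, while a second lock attempt for the same device still does.

```python
import fcntl
import os
import tempfile


def device_lock_path(lock_dir, device):
    """Map a device path to its own lock file (hypothetical naming scheme)."""
    name = device.strip("/").replace("/", "-")
    return os.path.join(lock_dir, "ceph-disk-" + name)


def try_lock(path):
    """Try to take an exclusive flock without blocking.

    Returns the open file object holding the lock, or None if the lock is
    already held elsewhere. Closing the returned file releases the lock.
    """
    lock_file = open(path, "a")
    try:
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        return None
    return lock_file


if __name__ == "__main__":
    lock_dir = tempfile.mkdtemp()  # stand-in for /var/lock
    first = try_lock(device_lock_path(lock_dir, "/dev/sdb1"))
    other = try_lock(device_lock_path(lock_dir, "/dev/sdc1"))
    again = try_lock(device_lock_path(lock_dir, "/dev/sdb1"))
    print(first is not None, other is not None, again is None)
```

Mutual exclusion is still enforced per device, but events for unrelated devices no longer wait on one another.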


Related issues 1 (0 open, 1 closed)

Copied to Ceph - Backport #18060: jewel: timeout during ceph-disk trigger due to /var/lock/ceph-disk flock contention (Resolved, David Disseldorp)
Actions #1

Updated by Loïc Dachary over 7 years ago

  • Status changed from New to 7
  • Priority changed from Normal to Urgent
  • Backport set to jewel
Actions #2

Updated by Loïc Dachary over 7 years ago

  • Status changed from 7 to Pending Backport
Actions #3

Updated by Loïc Dachary over 7 years ago

  • Copied to Backport #18060: jewel: timeout during ceph-disk trigger due to /var/lock/ceph-disk flock contention added
Actions #5

Updated by Nathan Cutler about 7 years ago

  • Status changed from Pending Backport to Resolved