Bug #18049

timeout during ceph-disk trigger due to /var/lock/ceph-disk flock contention

Added by David Disseldorp about 1 year ago. Updated 11 months ago.

Status: Resolved
Priority: Urgent
Category: -
Target version: -
Start date: 11/28/2016
Due date:
% Done: 0%
Source:
Tags:
Backport: jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: ceph-disk
Release: jewel
Needs Doc: No

Description

ceph-disk@.service handles udev device events by calling "ceph-disk trigger", which in turn handles service restart for the corresponding device.

"ceph-disk trigger" invocations are mutually exclusive: each call first takes an flock on /var/lock/ceph-disk. The flock behaviour was added with f0a47578c7c4521d7cf50e9419620ddb629736f5 to address http://tracker.ceph.com/issues/13160.

A 120-second timeout was later added with bed1a5cc05a9880b91fc9ac8d8a959efe3b3d512 to address http://tracker.ceph.com/issues/16580.
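The combination of an exclusive flock and a timeout can be sketched in Python as follows. This is an illustration of the locking pattern only, not the code from either commit; the names acquire_flock and LockTimeout, and the polling approach, are assumptions made for the example.

```python
import fcntl
import os
import time


class LockTimeout(Exception):
    """Raised when the lock could not be acquired before the deadline."""


def acquire_flock(lock_path, timeout, poll_interval=0.05):
    """Take an exclusive flock on lock_path, giving up after `timeout` seconds.

    Hypothetical helper: returns the open file descriptor that holds the
    lock. Closing that descriptor releases the lock.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    deadline = time.monotonic() + timeout
    while True:
        try:
            # LOCK_NB makes flock fail immediately instead of blocking,
            # so the loop can enforce its own deadline.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd
        except BlockingIOError:
            if time.monotonic() >= deadline:
                os.close(fd)
                raise LockTimeout(lock_path)
            time.sleep(poll_interval)
```

With a single lock file, every caller waits behind every other caller, so the time spent waiting grows with the number of callers queued on the lock.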

On systems with many OSDs, the "ceph-disk trigger" calls issued during startup contend heavily for the /var/lock/ceph-disk flock, and some services can trip the 120-second timeout.

Given that the intention of the flock was to restrict concurrent invocations for a single device, it should be sufficient to scope the flock to the device path instead of a single global lock file. This will allow "ceph-disk trigger" events for different devices to run concurrently, greatly reducing the likelihood of a service timeout.
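A minimal Python sketch of the per-device idea, under stated assumptions: the "ceph-disk-<basename>" lock file naming and the helpers device_lock_path and try_lock are hypothetical and chosen for illustration, not taken from the actual fix.

```python
import fcntl
import os


def device_lock_path(dev, lock_dir="/var/lock"):
    """Map a device path such as /dev/sdb1 to a per-device lock file.

    The naming scheme is an assumption made for this example.
    """
    name = os.path.basename(os.path.normpath(dev))
    return os.path.join(lock_dir, "ceph-disk-" + name)


def try_lock(path):
    """Return an fd holding an exclusive flock on path, or None if it is held."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd
```

Under this scheme, try_lock(device_lock_path("/dev/sdb1")) and try_lock(device_lock_path("/dev/sdc1")) can both succeed at the same time, while a second attempt on /dev/sdb1 is refused until the first descriptor is closed. A basename-only scheme would map devices that share a basename in different directories to the same lock file, which is harmless for correctness but serializes those two devices.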


Related issues

Copied to Ceph - Backport #18060: jewel: timeout during ceph-disk trigger due to /var/lock/ceph-disk flock contention Resolved

History

#1 Updated by Loic Dachary about 1 year ago

  • Status changed from New to Testing
  • Priority changed from Normal to Urgent
  • Backport set to jewel

#2 Updated by Loic Dachary about 1 year ago

  • Status changed from Testing to Pending Backport

#3 Updated by Loic Dachary about 1 year ago

  • Copied to Backport #18060: jewel: timeout during ceph-disk trigger due to /var/lock/ceph-disk flock contention added

#5 Updated by Nathan Cutler 11 months ago

  • Status changed from Pending Backport to Resolved
