Bug #9257

closed

paddles: race condition in lock server

Added by Zack Cerza over 9 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):
Actions #1

Updated by Zack Cerza over 9 years ago

  • Status changed from In Progress to 7
Actions #2

Updated by Sage Weil over 9 years ago

  • Priority changed from Immediate to Urgent
Actions #3

Updated by Zack Cerza over 9 years ago

My fix wasn't quite good enough. I've moved lots of error and sanity checking from the controllers to the model itself, and I intend to write some multi-threaded unit tests to reproduce the problem, so that I can prove a fix.
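
A multi-threaded test of the kind described above might look roughly like the following. This is a minimal sketch, not the paddles test suite: `Node`, `RaceError`, and `contend` are hypothetical names, and an in-memory mutex stands in for whatever atomic check the real model performs against its database. The point is only to show several threads released at once against one node, with the check living in the model rather than in the caller.

```python
import threading


class RaceError(Exception):
    """Raised when a node is already locked by someone else (hypothetical)."""


class Node:
    # Hypothetical stand-in for a node model; not the real paddles class.
    def __init__(self, name):
        self.name = name
        self.locked_by = None
        self._mutex = threading.Lock()

    def lock(self, owner):
        # The check and the update happen under one mutex, so two callers
        # cannot both observe that locked_by is None.
        with self._mutex:
            if self.locked_by is not None:
                raise RaceError(f"{self.name} already locked by {self.locked_by}")
            self.locked_by = owner


def contend(node, n_threads=8):
    """Have n_threads race to lock one node; return the owners that succeeded."""
    winners = []
    winners_mutex = threading.Lock()
    barrier = threading.Barrier(n_threads)

    def worker(i):
        owner = f"worker-{i}"
        barrier.wait()  # release all threads together to maximize contention
        try:
            node.lock(owner)
        except RaceError:
            return
        with winners_mutex:
            winners.append(owner)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners
```

With the sanity check inside `Node.lock`, exactly one worker should win. Removing the mutex from `lock` is one way to see the test fail intermittently, which is the behavior such a test is meant to catch.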

Actions #4

Updated by Zack Cerza over 9 years ago

  • Status changed from 7 to Fix Under Review
Actions #5

Updated by Zack Cerza over 9 years ago

The PR is merged:
https://github.com/ceph/paddles/commit/14eb6198e8928f91091d93bd50a87fb3d9cd78a2

But now it's Friday so I might wait until Monday to deploy the fix.

Actions #6

Updated by Zack Cerza over 9 years ago

  • Status changed from Fix Under Review to Resolved

The last PR mentioned wasn't quite good enough. I deployed this one today:
https://github.com/ceph/paddles/commit/4d4923a001a76f814f630eb422f3723d4f059b17

It's looking resolved:

2014-09-09 14:45:13,399 INFO  [paddles.controllers.nodes] Locking 2 plana nodes for scheduled_teuthology@teuthology
2014-09-09 14:45:13,416 WARNI [paddles.controllers.nodes] lock_many() detected race condition
2014-09-09 14:45:13,416 INFO  [paddles.controllers.nodes] retrying after race avoidance (1 tries left)
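
The log lines above suggest a detect-and-retry pattern. A minimal sketch of that pattern follows, assuming a bounded retry count; it is not taken from the linked commit. `RaceConditionError`, `lock_with_retry`, and the `attempt_lock` callable are hypothetical names introduced for illustration only.

```python
import logging

log = logging.getLogger("lock_retry_sketch")


class RaceConditionError(Exception):
    """Signals that another client changed the nodes mid-operation (hypothetical)."""


def lock_with_retry(attempt_lock, tries=2):
    """Call attempt_lock() until it succeeds or the tries are used up.

    attempt_lock is any zero-argument callable that either returns the
    locked nodes or raises RaceConditionError.
    """
    while True:
        try:
            return attempt_lock()
        except RaceConditionError:
            tries -= 1
            if tries <= 0:
                # Out of retries: let the caller see the failure.
                raise
            log.info("retrying after race avoidance (%s tries left)", tries)
```

Bounding the retries keeps a persistently contended request from looping forever; the caller still gets the error once the budget is exhausted.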
