Bug #19635
Deadlock on two ceph-fuse clients accessing the same file
Status: Resolved
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version:
% Done: 0%
Source:
Tags:
Backport: jewel, kraken
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
See Dan's reproducer script, and thread "[ceph-users] fsping, why you no work no mo?"
https://raw.githubusercontent.com/dvanders/fsping/
When I started a vstart cluster, mounted two fuse clients, and ran the script, I got two blocked requests like this:

(virtualenv) jspray@senta04:~/ceph/build$ bin/ceph daemon mds.a ops
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
{
    "ops": [
        {
            "description": "client_request(client.4110:27 lookup #1/senta04.ack 2017-04-16 17:39:09.476736 caller_uid=1121, caller_gid=1121{})",
            "initiated_at": "2017-04-16 17:39:09.476974",
            "age": 486.457417,
            "duration": 486.457469,
            "type_data": [
                "failed to rdlock, waiting",
                "client.4110:27",
                "client_request",
                {
                    "client": "client.4110",
                    "tid": 27
                },
                [
                    {
                        "time": "2017-04-16 17:39:09.476974",
                        "event": "initiated"
                    },
                    {
                        "time": "2017-04-16 17:39:09.486978",
                        "event": "failed to rdlock, waiting"
                    }
                ]
            ]
        },
        {
            "description": "client_request(client.4111:10 getattr pAsLsXsFs #100000003e9 2017-04-16 17:39:09.488176 caller_uid=1121, caller_gid=1121{})",
            "initiated_at": "2017-04-16 17:39:09.488318",
            "age": 486.446072,
            "duration": 486.446188,
            "type_data": [
                "failed to rdlock, waiting",
                "client.4111:10",
                "client_request",
                {
                    "client": "client.4111",
                    "tid": 10
                },
                [
                    {
                        "time": "2017-04-16 17:39:09.488318",
                        "event": "initiated"
                    },
                    {
                        "time": "2017-04-16 17:39:09.489099",
                        "event": "failed to rdlock, waiting"
                    }
                ]
            ]
        }
    ],
    "num_ops": 2
}
This is apparently something that worked in 10.2.5 and is now failing on more recent versions.
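For anyone re-running the reproducer, a small helper can make stuck requests easier to spot in the `ops` output. This is only a sketch: the function names `parse_ops_dump` and `blocked_ops` and the 30-second default threshold are illustrative assumptions, and the position of the state string and request id inside `type_data` is taken from the dump in this report, so it may differ on other Ceph versions.

```python
import json


def parse_ops_dump(text):
    """Parse captured `ceph daemon mds.<id> ops` output.

    Anything before the first '{' (such as the vstart DEVELOPER MODE
    banner seen in this report) is skipped.
    """
    start = text.index("{")
    return json.loads(text[start:])


def blocked_ops(dump, min_age=30.0):
    """Return (request id, current state, age in seconds) for old ops.

    Assumption: type_data is a list whose first element is the current
    state and whose second element is the request id, as in the dump
    shown in this report.
    """
    rows = []
    for op in dump.get("ops", []):
        age = op.get("age", 0.0)
        if age < min_age:
            continue
        type_data = op.get("type_data", [])
        state = type_data[0] if len(type_data) > 0 else "unknown"
        reqid = type_data[1] if len(type_data) > 1 else "unknown"
        rows.append((reqid, state, age))
    return rows
```

Applied to the output above, this would report both `client.4110:27` and `client.4111:10` in the state "failed to rdlock, waiting", each roughly 486 seconds old.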