Bug #19635

closed

Deadlock on two ceph-fuse clients accessing the same file

Added by John Spray about 7 years ago. Updated almost 7 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Correctness/Safety
Target version:
% Done:
0%
Source:
Tags:
Backport:
jewel, kraken
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

See Dan's reproducer script, and thread "[ceph-users] fsping, why you no work no mo?"
https://raw.githubusercontent.com/dvanders/fsping/

After starting a vstart cluster, mounting two ceph-fuse clients, and running the script, I got two blocked requests, both stuck at "failed to rdlock, waiting":

(virtualenv) jspray@senta04:~/ceph/build$ bin/ceph daemon mds.a ops
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
{
    "ops": [
        {
            "description": "client_request(client.4110:27 lookup #1/senta04.ack 2017-04-16 17:39:09.476736 caller_uid=1121, caller_gid=1121{})",
            "initiated_at": "2017-04-16 17:39:09.476974",
            "age": 486.457417,
            "duration": 486.457469,
            "type_data": [
                "failed to rdlock, waiting",
                "client.4110:27",
                "client_request",
                {
                    "client": "client.4110",
                    "tid": 27
                },
                [
                    {
                        "time": "2017-04-16 17:39:09.476974",
                        "event": "initiated" 
                    },
                    {
                        "time": "2017-04-16 17:39:09.486978",
                        "event": "failed to rdlock, waiting" 
                    }
                ]
            ]
        },
        {
            "description": "client_request(client.4111:10 getattr pAsLsXsFs #100000003e9 2017-04-16 17:39:09.488176 caller_uid=1121, caller_gid=1121{})",
            "initiated_at": "2017-04-16 17:39:09.488318",
            "age": 486.446072,
            "duration": 486.446188,
            "type_data": [
                "failed to rdlock, waiting",
                "client.4111:10",
                "client_request",
                {
                    "client": "client.4111",
                    "tid": 10
                },
                [
                    {
                        "time": "2017-04-16 17:39:09.488318",
                        "event": "initiated" 
                    },
                    {
                        "time": "2017-04-16 17:39:09.489099",
                        "event": "failed to rdlock, waiting" 
                    }
                ]
            ]
        }
    ],
    "num_ops": 2
}

This apparently worked in 10.2.5 and fails on more recent versions.
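
For anyone triaging similar hangs, here is a rough sketch of how the "ops" output above could be filtered for long-blocked requests. The function name `blocked_ops` and the `min_age` threshold are hypothetical, and the sketch assumes the `type_data` layout shown in this report (a list whose first element is the current state string); other Ceph versions may lay it out differently.

```python
import json


def blocked_ops(raw_output, min_age=30.0):
    """Return (description, age, state) for ops at least min_age seconds old.

    raw_output is the text printed by "ceph daemon mds.<id> ops".
    Assumption: type_data is a list whose first element is the current
    state string, as in the output pasted in this report.
    """
    # vstart prints a "*** DEVELOPER MODE ..." banner before the JSON,
    # so skip ahead to the first opening brace.
    start = raw_output.index("{")
    dump = json.loads(raw_output[start:])

    stuck = []
    for op in dump.get("ops", []):
        if op.get("age", 0.0) < min_age:
            continue
        type_data = op.get("type_data", [])
        if isinstance(type_data, list) and type_data and isinstance(type_data[0], str):
            state = type_data[0]
        else:
            state = "unknown"
        stuck.append((op.get("description", ""), op["age"], state))
    return stuck
```

Fed the output above, this would report both requests (the lookup from client.4110 and the getattr from client.4111) with the state "failed to rdlock, waiting".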


Related issues 2 (0 open, 2 closed)

Copied to CephFS - Backport #20027: jewel: Deadlock on two ceph-fuse clients accessing the same file (Resolved, Wei-Chung Cheng)
Copied to CephFS - Backport #20028: kraken: Deadlock on two ceph-fuse clients accessing the same file (Resolved, Nathan Cutler)