Bug #61717 (closed)

CephFS flock blocked on itself

Added by Niklas Hambuechen 11 months ago. Updated 9 months ago.

Status:
Can't reproduce
Priority:
Normal
Category:
Correctness/Safety
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS, kceph
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I had a situation where on the same machine, 30 processes were started almost simultaneously that took the same exclusive `flock()` on a CephFS file.

This resulted in all of the processes being stuck in the `flock()` syscall, as shown by strace:

flock(4</path/to/cephfs/mount/.my-lock>, LOCK_EX

None of the processes advanced.
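For context, the contention pattern can be sketched locally (a hypothetical reproduction, not the original workload: the lock path and worker count are stand-ins, and on a local filesystem the lock is granted normally rather than hanging as it did on CephFS):

```python
# Hypothetical reproduction sketch: N processes contend for the same
# exclusive flock() on one file. In the report, LOCK_PATH was a file on
# a kernel-mounted CephFS; /tmp is used here only as a stand-in.
import fcntl
import multiprocessing

LOCK_PATH = "/tmp/.my-lock"  # stand-in for the CephFS path in the report

def take_lock(i):
    with open(LOCK_PATH, "w") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until the lock is granted
        # critical section would go here
        fcntl.flock(f, fcntl.LOCK_UN)
    return i

if __name__ == "__main__":
    # 30 near-simultaneous lockers, as in the report.
    with multiprocessing.Pool(30) as pool:
        results = pool.map(take_lock, range(30))
    # On a local filesystem all 30 serialize and complete; on the
    # affected CephFS setup they all wedged in the syscall instead.
    print(sorted(results))
```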

After killing all but one of them, the remaining process was still stuck in the syscall.
Running

ceph daemon "mds.$(hostname)" dump_blocked_ops

showed a corresponding entry:


        {
            "description": "client_request(client.2464309:84251 setfilelock rule 2, type 2, owner 16217184910939483155, pid 1509340, start 0, length 0, wait 1 #0x1001d5669f5 2023-06-16T17:04:49.997379+0000 caller_uid=1000, caller_gid=65534{})",
            "initiated_at": "2023-06-16T17:04:49.999396+0000",
            "age": 768.636335207,
            "duration": 768.63642045799998,
            "type_data": {
                "flag_point": "failed to add lock, waiting",
                "reqid": "client.2464309:84251",
                "op_type": "client_request",
                "client_info": {
                    "client": "client.2464309",
                    "tid": 84251
                },
                "events": [
                    {
                        "time": "2023-06-16T17:04:49.999396+0000",
                        "event": "initiated" 
                    },
                    {
                        "time": "2023-06-16T17:04:49.999397+0000",
                        "event": "throttled" 
                    },
                    {
                        "time": "2023-06-16T17:04:49.999396+0000",
                        "event": "header_read" 
                    },
                    {
                        "time": "2023-06-16T17:04:49.999400+0000",
                        "event": "all_read" 
                    },
                    {
                        "time": "2023-06-16T17:04:49.999409+0000",
                        "event": "dispatched" 
                    },
                    {
                        "time": "2023-06-16T17:04:49.999417+0000",
                        "event": "acquired locks" 
                    },
                    {
                        "time": "2023-06-16T17:04:49.999423+0000",
                        "event": "failed to add lock, waiting" 
                    }
                ]
            }
        }
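When checking for this across daemons, entries like the one above can be filtered programmatically. A small sketch (the field names are taken from the entry above; the top-level `"ops"` wrapper is an assumption about the shape of the `dump_blocked_ops` output):

```python
import json

# Trimmed sample modeled on the dump above; in practice this JSON would
# come from `ceph daemon "mds.$(hostname)" dump_blocked_ops`.
sample = """
{"ops": [{"description": "client_request(client.2464309:84251 setfilelock rule 2, type 2)",
          "age": 768.636335207,
          "type_data": {"flag_point": "failed to add lock, waiting",
                        "reqid": "client.2464309:84251"}}]}
"""

def stuck_lock_ops(dump_json, min_age=60.0):
    """Return (reqid, age, flag_point) for long-blocked setfilelock requests."""
    dump = json.loads(dump_json)
    return [(op["type_data"]["reqid"], op["age"], op["type_data"]["flag_point"])
            for op in dump.get("ops", [])
            if op["age"] > min_age and "setfilelock" in op["description"]]

print(stuck_lock_ops(sample))
```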

I could work around the situation by also killing the last remaining process that was stuck in `flock()`.

So it looks like the `flock()` is somehow blocked on itself, which seems wrong.
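One possible mitigation (my sketch, not part of the original report) is to replace the indefinitely blocking `flock()` with a polling loop using `LOCK_NB`, so a lock the MDS never grants surfaces as a timeout error instead of a permanently stuck process:

```python
import errno
import fcntl
import time

def flock_with_timeout(f, timeout=30.0, interval=0.5):
    """Try to take an exclusive flock, polling with LOCK_NB.

    Raises TimeoutError instead of blocking forever if the lock never
    becomes available (e.g. if the MDS wedges as described above).
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return  # lock acquired
        except OSError as e:
            # EAGAIN/EACCES mean "lock held elsewhere"; anything else is real.
            if e.errno not in (errno.EAGAIN, errno.EACCES):
                raise
        if time.monotonic() >= deadline:
            raise TimeoutError("flock not acquired within %.1fs" % timeout)
        time.sleep(interval)
```

This trades the uninterruptible wait inside the syscall for userspace retries, which at least lets the application fail loudly and get restarted.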

Ceph 16.2.7, Linux 5.15.109 (kernel CephFS mount)
