Bug #61717
CephFS flock blocked on itself (closed)
Status:
Can't reproduce
Priority:
Normal
Assignee:
Category:
Correctness/Safety
Target version:
-
% Done:
0%
ceph-qa-suite:
Component(FS):
MDS, kceph
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
I had a situation where, on the same machine, 30 processes were started almost simultaneously, each trying to take the same exclusive `flock()` on a CephFS file.
This resulted in all the processes being stuck in the `flock` syscall, as shown in strace:
flock(4</path/to/cephfs/mount/.my-lock>, LOCK_EX
None of the processes advanced.
After I killed all but one of them, the remaining process was still stuck in the syscall.
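A minimal sketch of how this contention pattern could be exercised, assuming Python 3 on Linux. The names `take_lock` and `contend`, the hold time, and the timeout are hypothetical and not part of the original report. It starts N processes that each open the lock file and request `LOCK_EX`, then counts how many finished. This issue was closed as "Can't reproduce", so the script is not known to trigger the hang.

```python
import fcntl
import multiprocessing as mp
import os
import time


def take_lock(path, hold_seconds):
    """Open the lock file, take an exclusive flock, hold it briefly, release."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the lock is granted
        time.sleep(hold_seconds)
        fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def contend(path, n_procs=30, hold_seconds=0.01, timeout=60.0):
    """Start n_procs lockers at once; return how many exited cleanly in time."""
    ctx = mp.get_context("fork")  # assumption: Linux, fork start method
    procs = [ctx.Process(target=take_lock, args=(path, hold_seconds))
             for _ in range(n_procs)]
    for p in procs:
        p.start()
    deadline = time.monotonic() + timeout
    for p in procs:
        p.join(max(0.0, deadline - time.monotonic()))
    finished = sum(1 for p in procs if p.exitcode == 0)
    for p in procs:
        if p.is_alive():  # still waiting in flock() after the deadline
            p.kill()
            p.join()
    return finished
```

Calling `contend("/path/to/cephfs/mount/.my-lock")` on a healthy mount should return 30. A smaller number would mean some lockers never got past `flock()` within the timeout.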
Running
ceph daemon "mds.$(hostname)" dump_blocked_ops
showed a corresponding entry:
{
  "description": "client_request(client.2464309:84251 setfilelock rule 2, type 2, owner 16217184910939483155, pid 1509340, start 0, length 0, wait 1 #0x1001d5669f5 2023-06-16T17:04:49.997379+0000 caller_uid=1000, caller_gid=65534{})",
  "initiated_at": "2023-06-16T17:04:49.999396+0000",
  "age": 768.636335207,
  "duration": 768.63642045799998,
  "type_data": {
    "flag_point": "failed to add lock, waiting",
    "reqid": "client.2464309:84251",
    "op_type": "client_request",
    "client_info": {
      "client": "client.2464309",
      "tid": 84251
    },
    "events": [
      { "time": "2023-06-16T17:04:49.999396+0000", "event": "initiated" },
      { "time": "2023-06-16T17:04:49.999397+0000", "event": "throttled" },
      { "time": "2023-06-16T17:04:49.999396+0000", "event": "header_read" },
      { "time": "2023-06-16T17:04:49.999400+0000", "event": "all_read" },
      { "time": "2023-06-16T17:04:49.999409+0000", "event": "dispatched" },
      { "time": "2023-06-16T17:04:49.999417+0000", "event": "acquired locks" },
      { "time": "2023-06-16T17:04:49.999423+0000", "event": "failed to add lock, waiting" }
    ]
  }
}
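To match a blocked op like this against the stuck processes, the `description` string can be picked apart. This is a rough sketch: the regular expression is derived only from the single entry above, so the field layout is an assumption and may differ in other Ceph versions. `summarize_lock_op` is a hypothetical helper, not a Ceph API.

```python
import re

# Pattern inferred from the one entry in this report (an assumption).
SETFILELOCK_RE = re.compile(
    r"client_request\((?P<reqid>client\.\d+:\d+) setfilelock "
    r"rule (?P<rule>\d+), type (?P<type>\d+), owner (?P<owner>\d+), "
    r"pid (?P<pid>\d+), start (?P<start>\d+), length (?P<length>\d+), "
    r"wait (?P<wait>\d+)"
)


def summarize_lock_op(op):
    """Return the setfilelock fields of one blocked-op dict, or None."""
    m = SETFILELOCK_RE.search(op.get("description", ""))
    if m is None:
        return None  # not a setfilelock request
    info = {k: (v if k == "reqid" else int(v))
            for k, v in m.groupdict().items()}
    info["age"] = op.get("age")
    info["flag_point"] = op.get("type_data", {}).get("flag_point")
    return info


# Trimmed copy of the entry above, used as sample input.
sample_op = {
    "description": "client_request(client.2464309:84251 setfilelock rule 2, "
                   "type 2, owner 16217184910939483155, pid 1509340, start 0, "
                   "length 0, wait 1 #0x1001d5669f5 "
                   "2023-06-16T17:04:49.997379+0000 caller_uid=1000, "
                   "caller_gid=65534{})",
    "age": 768.636335207,
    "type_data": {"flag_point": "failed to add lock, waiting"},
}
summary = summarize_lock_op(sample_op)
```

The `pid` field in the result can then be compared with the PIDs of the processes seen hanging in strace, to check whether the waiting request belongs to the one process that is still alive.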
I could work around the situation by also manually killing the last remaining process that was stuck in `flock()`.
So this looks like the `flock()` is somehow blocked on itself.
That seems wrong.
Ceph 16.2.7, Linux 5.15.109 kernel mount