Bug #54044

closed

intermittent hangs waiting for caps

Added by Jeff Layton over 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

I've hit a problem in my own testing that only crops up intermittently. Occasionally, I see a process get stuck waiting for the caps it needs to do an operation (a read, in this stack):

[root@centos8 ceph]# cat /proc/3685/stack
[<0>] wait_woken+0x2c/0x60
[<0>] ceph_get_caps+0x3f2/0x610 [ceph]
[<0>] ceph_read_iter+0x140/0xbe0 [ceph]
[<0>] new_sync_read+0x10f/0x150
[<0>] vfs_read+0x91/0x140
[<0>] ksys_pread64+0x61/0xa0
[<0>] do_syscall_64+0x5b/0x1a0
[<0>] entry_SYSCALL_64_after_hwframe+0x65/0xca

When I look at the caps list, I can see that the process is waiting for Fr caps on inode 0x10000035a22, but that inode appears to already have them (its issued caps, pAsxLsXsxFsxcrwb, include Fr):

[root@centos8 31eed9b8-785f-11ec-bfe7-52540031ba78.client144465]# cat caps
total        20
avail        15
used        5
reserved    0
min        1024

ino              mds  issued           implemented
--------------------------------------------------
0x10000035a1e      0  pAsLsXs          pAsLsXs          
0x10000035a1c      0  pAsLsXsFs        pAsLsXsFs        
0x100000357aa      0  pAsLsXsFs        pAsLsXsFs        
0x10000035a21      0  pAsLsXs          pAsLsXs          
0x10000035a22      2  pAsxLsXsxFsxcrwb pAsxLsXsxFsxcrwb 

Waiters:
--------
tgid         ino                need             want
-----------------------------------------------------
3685         0x10000035a22      Fr               Fc               
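
The mismatch in the listing above (a waiter that needs caps the inode already shows as issued) can be checked mechanically. The following is a minimal sketch, not part of the original report. It assumes the caps file keeps the column layout shown above, and that in a cap string each uppercase letter opens a group whose lowercase letters are the individual bits (so "Fr" is the "r" bit of the "F" group). The names parse_caps, stuck_waiters and SAMPLE are hypothetical helpers introduced for illustration.

```python
def parse_caps(cap_string):
    """Split a cap string such as 'pAsxFr' into a set of (group, bit) pairs.

    Assumption: an uppercase letter starts a group, and each lowercase
    letter after it is one bit in that group. A leading lowercase letter
    (the 'p' pin bit) is stored with an empty group.
    """
    bits = set()
    group = ""
    for ch in cap_string:
        if ch.isupper():
            group = ch
        else:
            bits.add((group, ch))
    return bits


def stuck_waiters(text):
    """Return (tgid, ino, need) for waiters whose needed caps are already issued."""
    issued = {}      # ino -> set of (group, bit) pairs, merged across MDS rows
    waiters = []
    section = "caps"
    for line in text.splitlines():
        fields = line.split()
        if fields[:1] == ["Waiters:"]:
            section = "waiters"
            continue
        if section == "caps" and len(fields) == 4 and fields[0].startswith("0x"):
            ino, _mds, issued_str, _implemented = fields
            issued.setdefault(ino, set()).update(parse_caps(issued_str))
        elif section == "waiters" and len(fields) == 4 and fields[0].isdigit():
            tgid, ino, need, _want = fields
            waiters.append((int(tgid), ino, need))
    return [
        (tgid, ino, need)
        for tgid, ino, need in waiters
        if parse_caps(need) <= issued.get(ino, set())
    ]


# Excerpt of the caps file from this report.
SAMPLE = """\
ino              mds  issued           implemented
--------------------------------------------------
0x10000035a21      0  pAsLsXs          pAsLsXs
0x10000035a22      2  pAsxLsXsxFsxcrwb pAsxLsXsxFsxcrwb

Waiters:
--------
tgid         ino                need             want
-----------------------------------------------------
3685         0x10000035a22      Fr               Fc
"""

if __name__ == "__main__":
    for tgid, ino, need in stuck_waiters(SAMPLE):
        print(f"tgid {tgid} waits for {need} on {ino}, which is already issued")
```

To use it on a live client, pass the contents of that client's caps file (typically found under /sys/kernel/debug/ceph/) to stuck_waiters. A waiter that shows up only briefly may just be a task that has not run again yet, so only entries that persist across repeated reads would point at a hang like this one.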

This is either a race in how we're handling this wait, or we're missing some wakeups.
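
To illustrate the kind of missed wakeup suspected here, the following is a deterministic toy model of the generic check-then-sleep race. It is purely illustrative and does not reflect the actual code in ceph_get_caps. The Inode class, the two waiter functions and the grant_between callback are all hypothetical and exist only to force the problematic ordering.

```python
class Inode:
    """Toy stand-in for an inode: a set of issued caps plus a wait queue."""

    def __init__(self):
        self.issued = set()
        self.waitq = []  # names of tasks that a grant would wake

    def grant(self, cap):
        """Issue a cap and wake whoever is queued at this moment."""
        self.issued.add(cap)
        woken, self.waitq = self.waitq, []
        return woken


def racy_waiter(inode, need, grant_between):
    """Check first, enqueue second: a grant in between goes unnoticed."""
    if need in inode.issued:
        return "running"
    grant_between()                 # the grant and its wakeup land here
    inode.waitq.append("reader")    # queued too late, the wakeup already fired
    return "asleep"


def safe_waiter(inode, need, grant_between):
    """Enqueue first, then re-check the condition before sleeping."""
    inode.waitq.append("reader")
    grant_between()
    if need in inode.issued:
        if "reader" in inode.waitq:
            inode.waitq.remove("reader")
        return "running"
    return "asleep"


if __name__ == "__main__":
    a = Inode()
    print("racy:", racy_waiter(a, "Fr", lambda: a.grant("Fr")), "issued:", a.issued)
    b = Inode()
    print("safe:", safe_waiter(b, "Fr", lambda: b.grant("Fr")), "issued:", b.issued)
```

In the racy ordering the task ends up asleep while the cap it needs is already issued, which matches the state seen in the caps file above. Whether the real bug follows this pattern is not established by this report.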
