Bug #54044 (closed): intermittent hangs waiting for caps

Added by Jeff Layton about 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

I've seen a problem in my own testing that only crops up intermittently. Occasionally, I see processes get stuck waiting for caps to perform an operation (a read, in this stack):

[root@centos8 ceph]# cat /proc/3685/stack
[<0>] wait_woken+0x2c/0x60
[<0>] ceph_get_caps+0x3f2/0x610 [ceph]
[<0>] ceph_read_iter+0x140/0xbe0 [ceph]
[<0>] new_sync_read+0x10f/0x150
[<0>] vfs_read+0x91/0x140
[<0>] ksys_pread64+0x61/0xa0
[<0>] do_syscall_64+0x5b/0x1a0
[<0>] entry_SYSCALL_64_after_hwframe+0x65/0xca

When I go to look at the caps list, I can see that it's waiting for Fr caps, but it appears to already have them!

[root@centos8 31eed9b8-785f-11ec-bfe7-52540031ba78.client144465]# cat caps
total        20
avail        15
used        5
reserved    0
min        1024

ino              mds  issued           implemented
--------------------------------------------------
0x10000035a1e      0  pAsLsXs          pAsLsXs          
0x10000035a1c      0  pAsLsXsFs        pAsLsXsFs        
0x100000357aa      0  pAsLsXsFs        pAsLsXsFs        
0x10000035a21      0  pAsLsXs          pAsLsXs          
0x10000035a22      2  pAsxLsXsxFsxcrwb pAsxLsXsxFsxcrwb 

Waiters:
--------
tgid         ino                need             want
-----------------------------------------------------
3685         0x10000035a22      Fr               Fc               

This is either a race in how we're handling this wait, or we're missing some wakeups.

Actions #1

Updated by Jeff Layton about 2 years ago

I think it may be missing wakeups. In particular, it looks like if the caps we wanted came across in an IMPORT instead of a GRANT, then we may miss waking up the waiter.
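
To make the suspected failure mode concrete, here is a minimal userspace sketch of that scenario (plain pthreads, not the actual kernel code; the names handle_grant(), handle_import_buggy() and the cap bits are made up for illustration). The waiter blocks until the caps it needs are issued. The grant path wakes it, but an import-like path that updates the issued caps without waking the queue leaves the task asleep even though it already holds what it asked for, which is exactly the state the debugfs output in the description shows.

/*
 * Userspace analogue of the suspected bug -- not the kernel code.
 * Build with: cc -pthread lost_wakeup.c
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define CAP_FILE_RD    0x1   /* stands in for Fr */
#define CAP_FILE_CACHE 0x2   /* stands in for Fc */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t caps_wq = PTHREAD_COND_INITIALIZER;
static unsigned issued;      /* caps currently held */

/* Waiter: block until at least "need" is issued (like ceph_get_caps()). */
static void wait_for_caps(unsigned need)
{
	pthread_mutex_lock(&lock);
	while ((issued & need) != need)
		pthread_cond_wait(&caps_wq, &lock);  /* sleeps until signalled */
	pthread_mutex_unlock(&lock);
}

/* GRANT path: update the caps and wake any waiters. */
static void handle_grant(unsigned newcaps)
{
	pthread_mutex_lock(&lock);
	issued |= newcaps;
	pthread_cond_broadcast(&caps_wq);            /* waiters re-check and proceed */
	pthread_mutex_unlock(&lock);
}

/* IMPORT-like path with the suspected bug: caps change, nobody is woken. */
static void handle_import_buggy(unsigned newcaps)
{
	pthread_mutex_lock(&lock);
	issued |= newcaps;
	/* missing pthread_cond_broadcast(&caps_wq); */
	pthread_mutex_unlock(&lock);
}

static void *reader(void *arg)
{
	(void)arg;
	wait_for_caps(CAP_FILE_RD);  /* hangs if the caps arrived via the buggy path */
	printf("got Fr, read can proceed\n");
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, reader, NULL);
	sleep(1);
	handle_import_buggy(CAP_FILE_RD | CAP_FILE_CACHE);
	/* The reader now *has* Fr in "issued" but is still asleep; it only
	 * resumes once some later event happens to wake the queue. */
	sleep(1);
	handle_grant(0);             /* an unrelated wakeup finally unsticks it */
	pthread_join(t, NULL);
	return 0;
}

In the kernel client the waiter sits on a bare wait queue (the wait_woken() frame in the stack above), so every path that changes the issued caps has to issue the wakeup explicitly; a cap that arrives via IMPORT rather than GRANT without such a wakeup would leave the waiter stranded until some unrelated event wakes it.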

Actions #2

Updated by Benjamin Resch almost 2 years ago

When I write several files from one client machine to a ceph filesystem and read them from another client machine a few seconds later, I see such intermittent hangs as well. strace then shows a blocking read() syscall. Just as described above, I get a kernel (5.17.0) stack trace with

[<0>] wait_woken+0x50/0x70
[<0>] ceph_get_caps+0x3e2/0x680 [ceph]

at the top during these hangs.

I can reproduce the issue by cd-ing to the same empty directory of a ceph filesystem on two different client machines and running

for i in {0..30}; do dd if=/dev/zero of=$i.bin bs=1M count=6; done

on the first and

find -type f -exec dd if={} of=/dev/null ';'

on the second. The second machine will then stall on some reads:
12288+0 records in
12288+0 records out
6291456 bytes (6.3 MB, 6.0 MiB) copied, 3.15579 s, 2.0 MB/s
12288+0 records in
12288+0 records out
6291456 bytes (6.3 MB, 6.0 MiB) copied, 2.23673 s, 2.8 MB/s
12288+0 records in
12288+0 records out
6291456 bytes (6.3 MB, 6.0 MiB) copied, 5.00335 s, 1.3 MB/s
12288+0 records in
12288+0 records out
6291456 bytes (6.3 MB, 6.0 MiB) copied, 4.99509 s, 1.3 MB/s
12288+0 records in
12288+0 records out
6291456 bytes (6.3 MB, 6.0 MiB) copied, 4.99936 s, 1.3 MB/s
12288+0 records in
12288+0 records out
6291456 bytes (6.3 MB, 6.0 MiB) copied, 5.00855 s, 1.3 MB/s
12288+0 records in
12288+0 records out
6291456 bytes (6.3 MB, 6.0 MiB) copied, 2.42172 s, 2.6 MB/s
12288+0 records in
12288+0 records out
6291456 bytes (6.3 MB, 6.0 MiB) copied, 0.0255695 s, 246 MB/s
12288+0 records in
12288+0 records out
6291456 bytes (6.3 MB, 6.0 MiB) copied, 2.53211 s, 2.5 MB/s
12288+0 records in
12288+0 records out
6291456 bytes (6.3 MB, 6.0 MiB) copied, 5.00524 s, 1.3 MB/s
12288+0 records in
12288+0 records out
6291456 bytes (6.3 MB, 6.0 MiB) copied, 0.0239585 s, 263 MB/s
12288+0 records in
12288+0 records out
6291456 bytes (6.3 MB, 6.0 MiB) copied, 0.0209594 s, 300 MB/s
...

While the unimpeded reads complete in about 0.023 seconds, the stalling reads tend to fall onto a 5-second grid (note reads 3, 4, 5, 6 and 10, and reads 7-9, which add up to about 5 seconds taken together).

It looks like waiting a few seconds after the write and calling

find -exec stat {} ';'

on the second machine before actually reading the files reduces the risk of stalling during the reads. However, this is just an impression from about 10 test runs and is not statistically sound.

Depending on the number of files written and read, the stalls can add up to over a minute of waiting time, which is quite severe for my application. I hope these observations help to track down the issue.

Actions #3

Updated by Jeff Layton almost 2 years ago

  • Assignee changed from Jeff Layton to Xiubo Li
Actions #4

Updated by Xiubo Li over 1 year ago

Hi Benjamin,

BTW, what is your max_mds value? It sounds like the caps are being deferred while waiting for the mdlog to be flushed on the MDS side; the MDS flushes the mdlog every 5 seconds, which would explain the 5-second grid you observed.

Actions #5

Updated by Xiubo Li over 1 year ago

  • Status changed from New to In Progress

Benjamin,

Which ceph version are you using? We have seen almost the same issue in ceph before:

https://github.com/ceph/ceph/pull/45243

Actions #7

Updated by Xiubo Li over 1 year ago

  • Status changed from In Progress to Fix Under Review
Actions #8

Updated by Xiubo Li over 1 year ago

Xiubo Li wrote:

The patchwork link: https://patchwork.kernel.org/project/ceph-devel/list/?series=665521

More detail about this:

For example, while the MDS is revoking caps for CInode::xattrlock, the CInode::filelock state may have changed in the meantime, so new caps can be issued while that revoke is still in flight.

Actions #9

Updated by Benjamin Resch over 1 year ago

Hi Xiubo,

here are the answers to the open questions:

  • My max_mds value is 1
  • My ceph version is 17.2.2

In the meantime I have switched from the kernel ceph client to ceph-fuse on my clients so that I can use a more recent ceph version more easily.
With ceph-fuse 17.2.2 on the clients the problem still occurred. I'll update clients and servers to 17.2.4 and see what happens then.

Actions #10

Updated by Xiubo Li over 1 year ago

  • Status changed from Fix Under Review to Resolved

Applied to the mainline and closing this tracker.
