Bug #62863
Slowness or deadlock in ceph-fuse causes teuthology job to hang and fail
Description
It seems that, due to a deadlock in ceph-fuse, read/write on CephFS (which was mounted using ceph-fuse) got stuck. Because of this the job hung for almost 3 hours, after which it terminated itself.
From teuthology.log:
2023-09-13T06:19:38.633 INFO:tasks.workunit.client.0.smithi159.stdout: CC kernel/irq/migration.o
2023-09-13T09:10:31.862 DEBUG:teuthology.orchestra.run:got remote process result: 124
2023-09-13T09:11:31.897 DEBUG:teuthology.orchestra.run:timed out waiting; will kill: <Greenlet at 0x7f213277b480: copy_file_to(<paramiko.ChannelFile from <paramiko.Channel 104 (, <Logger tasks.workunit.client.0.smithi159.stderr (, None, False)>
2023-09-13T09:12:31.899 DEBUG:teuthology.orchestra.run:timed out waiting; will kill: <Greenlet at 0x7f213277b6a0: copy_file_to(<paramiko.ChannelFile from <paramiko.Channel 104 (, <Logger tasks.workunit.client.0.smithi159.stdout (, None, False)>
2023-09-13T09:12:31.899 INFO:tasks.workunit:Stopping ['kernel_untar_build.sh'] on client.0...
2023-09-13T09:12:31.900 DEBUG:teuthology.orchestra.run.smithi159:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
2023-09-13T09:12:32.787 ERROR:teuthology.run_tasks:Saw exception from tasks.
Note the time difference between the following two entries:
2023-09-13T06:19:38.633 INFO:tasks.workunit.client.0.smithi159.stdout: CC kernel/irq/migration.o
2023-09-13T09:10:31.862 DEBUG:teuthology.orchestra.run:got remote process result: 124
From /rishabh-2023-09-12_12:12:15-fs-wip-rishabh-2023sep12-b2-testing-default-smithi/7394785/console_logs/smithi159.log:
[ 2214.285332] Showing all locks held in the system:
[ 2214.291512] 1 lock held by rcu_tasks_kthre/12:
[ 2214.295954] #0: ffffffff8255dd50 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x30/0x420
[ 2214.305517] 1 lock held by rcu_tasks_rude_/13:
[ 2214.309960] #0: ffffffff8255dad0 (rcu_tasks_rude.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x30/0x420
[ 2214.319953] 1 lock held by khungtaskd/56:
[ 2214.323965] #0: ffffffff8255e7c0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x12/0x1d0
[ 2214.332942] 1 lock held by ceph-fuse/128555:
[ 2214.337208] #0: ffff888101ebdc68 (&pipe->mutex/1){+.+.}-{3:3}, at: splice_file_to_pipe+0x2a/0x80
[ 2214.346091] 1 lock held by ceph-fuse/128557:
[ 2214.350357] #0: ffff888107c4ca68 (&pipe->mutex/1){+.+.}-{3:3}, at: splice_file_to_pipe+0x2a/0x80
[ 2214.359242] 1 lock held by ceph-fuse/128560:
[ 2214.363530] #0: ffff88810d542268 (&pipe->mutex/1){+.+.}-{3:3}, at: splice_file_to_pipe+0x2a/0x80
[ 2214.372396] 1 lock held by ceph-fuse/128571:
[ 2214.376667] #0: ffff888160d11a68 (&pipe->mutex/1){+.+.}-{3:3}, at: splice_file_to_pipe+0x2a/0x80
[ 2214.385535] 1 lock held by ceph-fuse/128577:
[ 2214.389807] #0: ffff888101ebe268 (&pipe->mutex/1){+.+.}-{3:3}, at: splice_file_to_pipe+0x2a/0x80
[ 2214.398674] 1 lock held by ceph-fuse/128581:
[ 2214.402956] #0: ffff888160d11468 (&pipe->mutex/1){+.+.}-{3:3}, at: splice_file_to_pipe+0x2a/0x80
[ 2214.411820] 1 lock held by ceph-fuse/128593:
[ 2214.416084] #0: ffff888101ebea68 (&pipe->mutex/1){+.+.}-{3:3}, at: splice_file_to_pipe+0x2a/0x80
[ 2214.424970] 1 lock held by ceph-fuse/128596:
[ 2214.429241] #0: ffff88810d540868 (&pipe->mutex/1){+.+.}-{3:3}, at: splice_file_to_pipe+0x2a/0x80
[ 2214.438111] 2 locks held by cc1/128215:
[ 2214.441947] #0: ffff8883324771d0 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: path_openat+0x5ca/0x9d0
[ 2214.451159] #1: ffff888332477628 (&fi->mutex){+.+.}-{3:3}, at: fuse_lock_inode+0x31/0x40 [fuse]
[ 2214.459965] 2 locks held by cc1/128296:
[ 2214.463796] #0: ffff888336080150 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: path_openat+0x5ca/0x9d0
[ 2214.473009] #1: ffff8883360805a8 (&fi->mutex){+.+.}-{3:3}, at: fuse_lock_inode+0x31/0x40 [fuse]
[ 2214.481788] 2 locks held by cc1/128348:
[ 2214.485619] #0: ffff888336080150 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: path_openat+0x5ca/0x9d0
[ 2214.494833] #1: ffff8883360805a8 (&fi->mutex){+.+.}-{3:3}, at: fuse_lock_inode+0x31/0x40 [fuse]
[ 2214.503611] 1 lock held by cc1/128369:
[ 2214.507356] #0: ffff888336080150 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: path_openat+0x5ca/0x9d0
[ 2214.516569] 2 locks held by cc1/128419:
[ 2214.520400] #0: ffff888265700790 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: path_openat+0x5ca/0x9d0
[ 2214.529613] #1: ffff888265700be8 (&fi->mutex){+.+.}-{3:3}, at: fuse_lock_inode+0x31/0x40 [fuse]
[ 2214.538412] 2 locks held by cc1/128466:
[ 2214.542248] #0: ffff888265700790 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: path_openat+0x5ca/0x9d0
[ 2214.551462] #1: ffff888265700be8 (&fi->mutex){+.+.}-{3:3}, at: fuse_lock_inode+0x31/0x40 [fuse]
[ 2214.560251] 2 locks held by cc1/128595:
[ 2214.564088] #0: ffff888346f89410 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: walk_component+0x74/0x160
[ 2214.573487] #1: ffff888346f89868 (&fi->mutex){+.+.}-{3:3}, at: fuse_lock_inode+0x31/0x40 [fuse]
History
#1 Updated by Rishabh Dave 3 months ago
- Subject changed from ceph-fuse stuck causes teuthology job to fail to deadlock in ceph-fuse causes teuthology job to hang and fail
#2 Updated by Venky Shankar 3 months ago
- Category set to Correctness/Safety
- Assignee set to Leonid Usov
- Target version set to v19.0.0
- Component(FS) Client, ceph-fuse added
#3 Updated by Venky Shankar 3 months ago
- Status changed from New to Triaged
#4 Updated by Patrick Donnelly 3 months ago
- Related to Bug #62682: mon: no mdsmap broadcast after "fs set joinable" is set to true added
#5 Updated by Venky Shankar 3 months ago
I don't think this is related to #62682.
#6 Updated by Leonid Usov 3 months ago
I agree with Venky, this doesn't seem to be related to the linked issue. @Patrick, would you mind clarifying why you think these are related?
From what I've learned so far, it does seem like a deadlock. The system hung while executing the qa/workunits/kernel_untar_build.sh work unit, during the build step:
2023-09-13T06:10:41.659 INFO:tasks.workunit.client.0.smithi159.stderr:+ mkdir t
2023-09-13T06:10:41.665 INFO:tasks.workunit.client.0.smithi159.stderr:+ cd t
2023-09-13T06:10:41.667 INFO:tasks.workunit.client.0.smithi159.stderr:+ tar xzf ../linux.tar.gz
2023-09-13T06:15:51.129 INFO:tasks.workunit.client.0.smithi159.stderr:+ cd linux-5.4
2023-09-13T06:15:51.133 INFO:tasks.workunit.client.0.smithi159.stderr:+ make defconfig
...
2023-09-13T06:16:11.936 INFO:tasks.workunit.client.0.smithi159.stderr:++ grep -c processor /proc/cpuinfo
2023-09-13T06:16:11.939 INFO:tasks.workunit.client.0.smithi159.stderr:+ make -j8
...
2023-09-13T06:19:36.099 INFO:tasks.workunit.client.0.smithi159.stdout: CC arch/x86/ia32/sys_ia32.o
2023-09-13T06:19:38.633 INFO:tasks.workunit.client.0.smithi159.stdout: CC kernel/irq/migration.o
2023-09-13T09:10:31.862 DEBUG:teuthology.orchestra.run:got remote process result: 124
2023-09-13T09:11:31.897 DEBUG:teuthology.orchestra.run:timed out waiting; will kill: <Greenlet at 0x7f213277b480: copy_file_to(<paramiko.ChannelFile from <paramiko.Channel 104 (, <Logger tasks.workunit.client.0.smithi159.stderr (, None, False)>
2023-09-13T09:12:31.899 DEBUG:teuthology.orchestra.run:timed out waiting; will kill: <Greenlet at 0x7f213277b6a0: copy_file_to(<paramiko.ChannelFile from <paramiko.Channel 104 (, <Logger tasks.workunit.client.0.smithi159.stdout (, None, False)>
2023-09-13T09:12:31.899 INFO:tasks.workunit:Stopping ['kernel_untar_build.sh'] on client.0...
2023-09-13T09:12:31.900 DEBUG:teuthology.orchestra.run.smithi159:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
2023-09-13T09:12:32.787 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
...
File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/teuthology/orchestra/run.py", line 181, in _raise_for_status
raise CommandFailedError(
teuthology.exceptions.CommandFailedError: Command failed (workunit test kernel_untar_build.sh) on smithi159 with status 124:
Note the timestamps.
#7 Updated by Leonid Usov 3 months ago
There are 3 compiler processes that seem to be in a deadlock:
[ 1845.449431] INFO: task cc1:128296 blocked for more than 122 seconds.
[ 1845.455854] Tainted: G S 6.5.0-rc7-gfef64f93da2b #1
[ 1845.462646] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1845.470467] task:cc1 state:D stack:0 pid:128296 ppid:128293 flags:0x00004000
[ 1845.478899] Call Trace:
[ 1845.481352] <TASK>
[ 1845.483458] __schedule+0x37d/0xe80
[ 1845.486950] ? find_held_lock+0x2d/0x90
[ 1845.490788] ? __mutex_lock+0x7fc/0xc80
[ 1845.494629] schedule+0x54/0xc0
[ 1845.497774] schedule_preempt_disabled+0x15/0x20
[ 1845.502393] __mutex_lock+0x801/0xc80
[ 1845.506061] ? fuse_lock_inode+0x31/0x40 [fuse]
[ 1845.510612] ? fuse_lock_inode+0x31/0x40 [fuse]
[ 1845.515144] fuse_lock_inode+0x31/0x40 [fuse]
[ 1845.519519] fuse_lookup+0x6e/0x140 [fuse]
[ 1845.523619] ? lock_release+0xc5/0x260
[ 1845.527373] ? d_alloc_parallel+0x350/0x970
[ 1845.531560] fuse_atomic_open+0xcc/0x130 [fuse]
[ 1845.536100] lookup_open+0x37b/0x630
[ 1845.539681] ? verify_cpu+0x30/0x100
[ 1845.543261] ? verify_cpu+0x30/0x100
[ 1845.546838] path_openat+0x25a/0x9d0
[ 1845.550422] do_filp_open+0xb8/0x120
[ 1845.630231] INFO: task cc1:128369 blocked for more than 123 seconds.
[ 1845.636593] Tainted: G S 6.5.0-rc7-gfef64f93da2b #1
[ 1845.643381] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1845.651220] task:cc1 state:D stack:0 pid:128369 ppid:128368 flags:0x00004000
[ 1845.659651] Call Trace:
[ 1845.662096] <TASK>
[ 1845.664196] __schedule+0x37d/0xe80
[ 1845.667687] ? d_alloc_parallel+0x645/0x970
[ 1845.671875] schedule+0x54/0xc0
[ 1845.675019] d_alloc_parallel+0x64a/0x970
[ 1845.679032] ? wake_up_state+0x10/0x10
[ 1845.682786] lookup_open+0x189/0x630
[ 1845.686367] ? verify_cpu+0x30/0x100
[ 1845.689962] ? verify_cpu+0x30/0x100
[ 1845.693561] path_openat+0x25a/0x9d0
[ 1845.697156] do_filp_open+0xb8/0x120
[ 1845.776969] INFO: task cc1:128419 blocked for more than 123 seconds.
[ 1845.783336] Tainted: G S 6.5.0-rc7-gfef64f93da2b #1
[ 1845.790123] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1845.797940] task:cc1 state:D stack:0 pid:128419 ppid:128416 flags:0x00004000
[ 1845.806391] Call Trace:
[ 1845.808836] <TASK>
[ 1845.810941] __schedule+0x37d/0xe80
[ 1845.814433] ? find_held_lock+0x2d/0x90
[ 1845.818273] ? __mutex_lock+0x7fc/0xc80
[ 1845.822116] schedule+0x54/0xc0
[ 1845.825279] schedule_preempt_disabled+0x15/0x20
[ 1845.829913] __mutex_lock+0x801/0xc80
[ 1845.833603] ? fuse_lock_inode+0x31/0x40 [fuse]
[ 1845.838142] ? fuse_lock_inode+0x31/0x40 [fuse]
[ 1845.842679] fuse_lock_inode+0x31/0x40 [fuse]
[ 1845.847039] fuse_lookup+0x6e/0x140 [fuse]
[ 1845.851139] ? lock_release+0xc5/0x260
[ 1845.854893] ? d_alloc_parallel+0x350/0x970
[ 1845.859078] fuse_atomic_open+0xcc/0x130 [fuse]
[ 1845.863623] lookup_open+0x37b/0x630
[ 1845.867218] ? verify_cpu+0x30/0x100
[ 1845.870797] ? verify_cpu+0x30/0x100
[ 1845.874376] path_openat+0x25a/0x9d0
[ 1845.877958] do_filp_open+0xb8/0x120
Then the same call stack reports repeat:
[ 1968.331724] INFO: task cc1:128296 blocked for more than 245 seconds.
[ 1968.512704] INFO: task cc1:128369 blocked for more than 245 seconds.
[ 1968.659391] INFO: task cc1:128419 blocked for more than 246 seconds.
[ 2091.213996] INFO: task cc1:128296 blocked for more than 368 seconds.
[ 2091.394662] INFO: task cc1:128369 blocked for more than 368 seconds.
[ 2091.541412] INFO: task cc1:128419 blocked for more than 368 seconds.
[ 2214.096338] INFO: task cc1:128296 blocked for more than 491 seconds.
#8 Updated by Patrick Donnelly 3 months ago
I don't think it's related either. I was probably trying to link a different ticket but I don't recall which.
#9 Updated by Patrick Donnelly 3 months ago
- Related to deleted (Bug #62682: mon: no mdsmap broadcast after "fs set joinable" is set to true)
#10 Updated by Leonid Usov 2 months ago
http://pulpito.front.sepia.ceph.com/leonidus-2023-10-03_17:11:08-fs-wip-lusov-wip-rishabh-2023sep12-b2-distro-default-smithi/ had the same test pass 50 times. The command used to schedule it was:
$ teuthology-suite -s fs -c wip-lusov-wip-rishabh-2023sep12-b2 --filter-all "bluestore-bitmap,centos_8,pacific,kernel_untar_build" --seed 2531 -N 50 -p 150
#11 Updated by Leonid Usov 2 months ago
A modified version of the test, which runs make with -j20 instead of -j8, has passed 50 times (49 in the link below, plus 1 with the same config when I validated that my change had applied).
#12 Updated by Leonid Usov 2 months ago
- Status changed from Triaged to Can't reproduce
- Priority changed from Normal to Low
Following a talk with Venky, we decided that, given the unsuccessful reproduction attempts (see the links above) and the absence of logs from the fuse client, we will mark the issue as "Can't Reproduce". We'll keep looking for other instances of fuse client deadlocks, but for now there's nothing else that can be done about this issue.
leonidus@teuthology:/a/rishabh-2023-09-12_12:12:15-fs-wip-rishabh-2023sep12-b2-testing-default-smithi/7394785$ find . -name "*.log*"
./ansible.log
./console_logs/smithi141.log
./console_logs/smithi159.log
./console_logs/smithi141_reimage.log
./console_logs/smithi159_reimage.log
./teuthology.log
./supervisor.7394785.log
As a final point, analyzing the traces of the deadlocked processes once again, we see that they are all stuck in kernel code and not on the fuse client side. So it could be a bug in the fuse kernel module rather than in the ceph-fuse client daemon.
We can see that the problematic flow is the `lookup_open` kernel function, which was entered by three different processes, presumably for the same directory (?):
[ 2214.459965] 2 locks held by cc1/128296:
[ 2214.463796] #0: ffff888336080150 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: path_openat+0x5ca/0x9d0
[ 2214.473009] #1: ffff8883360805a8 (&fi->mutex){+.+.}-{3:3}, at: fuse_lock_inode+0x31/0x40 [fuse]
[ 1845.515144] fuse_lock_inode+0x31/0x40 [fuse]
[ 1845.519519] fuse_lookup+0x6e/0x140 [fuse]
[ 1845.523619] ? lock_release+0xc5/0x260
[ 1845.527373] ? d_alloc_parallel+0x350/0x970
[ 1845.531560] fuse_atomic_open+0xcc/0x130 [fuse]
[ 1845.536100] lookup_open+0x37b/0x630
--------------------------------------
[ 2214.503611] 1 lock held by cc1/128369:
[ 2214.507356] #0: ffff888336080150 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: path_openat+0x5ca/0x9d0
[ 1845.671875] schedule+0x54/0xc0
[ 1845.675019] d_alloc_parallel+0x64a/0x970
[ 1845.679032] ? wake_up_state+0x10/0x10
[ 1845.682786] lookup_open+0x189/0x630
--------------------------------------
[ 2214.516569] 2 locks held by cc1/128419:
[ 2214.520400] #0: ffff888265700790 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: path_openat+0x5ca/0x9d0
[ 2214.529613] #1: ffff888265700be8 (&fi->mutex){+.+.}-{3:3}, at: fuse_lock_inode+0x31/0x40 [fuse]
[ 1845.515144] fuse_lock_inode+0x31/0x40 [fuse]
[ 1845.519519] fuse_lookup+0x6e/0x140 [fuse]
[ 1845.523619] ? lock_release+0xc5/0x260
[ 1845.527373] ? d_alloc_parallel+0x350/0x970
[ 1845.531560] fuse_atomic_open+0xcc/0x130 [fuse]
[ 1845.536100] lookup_open+0x37b/0x630
An internet search for similar deadlocks didn't turn up any matches.
#13 Updated by Leonid Usov 2 months ago
Thanks to challenges from Shachar Sharon, I continued to look into it. I focused on the lock list because it's rather confusing.
First of all, here is the relevant code that prints the locks held: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-8/-/blob/c8s/kernel/locking/lockdep.c#L749
We can tell from this function that the locks we see listed (a simplified sketch of this per-task bookkeeping follows the list)
- are per kernel task, i.e. per thread, despite the fact that the id we see printed is that of the process this task belongs to;
- are locks that are held, i.e. have been acquired by said task;
- are listed in the historical order of acquisition.
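To make these points concrete, here is a minimal userspace sketch of that per-task bookkeeping. It is illustrative only and assumes nothing from the real kernel; task_locks, held_lock_entry and print_held_locks are invented names, not kernel APIs.

/* Illustrative sketch: each task records its held locks in acquisition
 * order, and the dump simply walks that list. */
#include <stdio.h>

#define MAX_HELD 48

struct held_lock_entry {
    unsigned long long instance; /* lock address, e.g. 0xffff8883360805a8 */
    const char *class_name;      /* e.g. "&fi->mutex" */
    const char *acquired_at;     /* e.g. "fuse_lock_inode+0x31/0x40 [fuse]" */
};

struct task_locks {
    int pid;    /* thread id; the dump prints the owning process id */
    int depth;  /* how many locks are currently recorded as held */
    struct held_lock_entry held[MAX_HELD]; /* in order of acquisition */
};

/* Prints one task's locks in the shape of the "Showing all locks held" dump. */
static void print_held_locks(const struct task_locks *t, const char *comm)
{
    printf("%d lock%s held by %s/%d:\n", t->depth,
           t->depth == 1 ? "" : "s", comm, t->pid);
    for (int i = 0; i < t->depth; i++)
        printf(" #%d: %#llx (%s), at: %s\n", i, t->held[i].instance,
               t->held[i].class_name, t->held[i].acquired_at);
}

int main(void)
{
    struct task_locks cc1 = {
        .pid = 128296,
        .depth = 2,
        .held = {
            { 0xffff888336080150ULL, "&type->i_mutex_dir_key#7",
              "path_openat+0x5ca/0x9d0" },
            { 0xffff8883360805a8ULL, "&fi->mutex",
              "fuse_lock_inode+0x31/0x40 [fuse]" },
        },
    };
    print_held_locks(&cc1, "cc1");
    return 0;
}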
We can immediately spot some peculiar things related to two of the three hanging processes:
Hanging process 128296 is holding the same locks as another process, 128348:
[ 1846.145932] 2 locks held by cc1/128296:
[ 1846.149764] #0: ffff888336080150 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: path_openat+0x5ca/0x9d0
[ 1846.158992] #1: ffff8883360805a8 (&fi->mutex){+.+.}-{3:3}, at: fuse_lock_inode+0x31/0x40 [fuse]
[ 1846.167772] 2 locks held by cc1/128348:
[ 1846.171603] #0: ffff888336080150 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: path_openat+0x5ca/0x9d0
[ 1846.180844] #1: ffff8883360805a8 (&fi->mutex){+.+.}-{3:3}, at: fuse_lock_inode+0x31/0x40 [fuse]
Hanging process 128419 is holding the same two locks as another process, 128466:
[ 1846.202613] 2 locks held by cc1/128419:
[ 1846.206444] #0: ffff888265700790 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: path_openat+0x5ca/0x9d0
[ 1846.215659] #1: ffff888265700be8 (&fi->mutex){+.+.}-{3:3}, at: fuse_lock_inode+0x31/0x40 [fuse]
[ 1846.224447] 2 locks held by cc1/128466:
[ 1846.228284] #0: ffff888265700790 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: path_openat+0x5ca/0x9d0
[ 1846.237499] #1: ffff888265700be8 (&fi->mutex){+.+.}-{3:3}, at: fuse_lock_inode+0x31/0x40 [fuse]
How is it possible that the same mutex is held by different tasks? Let's look at the lock types. `&type->i_mutex_dir_key#7` is actually an rwsem, a read-write semaphore, which explains how multiple processes can hold the same instance - apparently, for read:
https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-8/-/blob/c8s/fs/inode.c#L965
lockdep_set_class(&inode->i_rwsem, &type->i_mutex_dir_key);
It seems that the flow is inlined into the `path_openat` function via its call into `do_last`: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-8/-/blob/c8s/fs/namei.c?ref_type=heads#L3718
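As a minimal userspace illustration of this point (assuming only standard POSIX threads; this is not Ceph or kernel code), a read-write lock can legitimately be held for read by several threads at the same time, which is exactly why several tasks can all show the same `i_mutex_dir_key` instance as held:

/* Two readers hold the same rwlock concurrently; a plain mutex never allows
 * this. Build with: cc demo.c -lpthread (illustrative sketch only). */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t dir_lock = PTHREAD_RWLOCK_INITIALIZER;

static void *reader(void *arg)
{
    long id = (long)arg;
    pthread_rwlock_rdlock(&dir_lock);   /* shared (read) acquisition */
    printf("reader %ld holds the rwlock for read\n", id);
    sleep(1);                           /* both readers hold it at once here */
    pthread_rwlock_unlock(&dir_lock);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, reader, (void *)1L);
    pthread_create(&t2, NULL, reader, (void *)2L);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}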
However, the `&fi->mutex` lock type is really a mutex (https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-8/-/blob/c8s/fs/fuse/inode.c?ref_type=heads#L87):
mutex_init(&fi->mutex);
This poses a big question: how come the same instance of this type is held by several threads? While that on its own isn't a deadlock, it may cause one, because clients assume this can't happen. That is, unless I am missing something...
For the curious, there is another piece of info in the logs that is, however, not helpful in our case, as far as I can tell. In the curly braces we see encoded lock usage history bits, namely (&type->i_mutex_dir_key#7){++++} and (&fi->mutex){+.+.}.
See this flow here: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-8/-/blob/c8s/kernel/locking/lockdep.c#L708. There's also some documentation available about these flags: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-8/-/blob/c8s/Documentation/locking/lockdep-design.rst?ref_type=heads
In short, these bits show whether the lock was ever taken for write/read in the hard/soft IRQ environment and whether IRQs were disabled or enabled at that point. The four chars mean (hardirq write, hardirq read, softirq write, softirq read). The default mode is '.', which is more evidence that (&fi->mutex) is never locked for reading, only exclusively, as opposed to (&type->i_mutex_dir_key#7).
What we see is that these locks were only ever taken outside of an IRQ context, whether soft or hard, and while IRQs were enabled, meaning that the locks are considered "IRQ safe" by the lock validator logic. Not interesting for us because it doesn't tell us what the problem is, and the validator never spotted any issues with how these locks were used.
#14 Updated by Leonid Usov 2 months ago
Good news, I stand corrected!
The lockdep depth is increased in the function `__lock_acquire`, which is called before the lock is actually acquired. From the docs:
__acquire
    |
   lock _____
    |        \
    |    __contended
    |         |
    |       <wait>
    | _______/
    |/
    |
__acquired
    |
    .
  <hold>
    .
    |
__release
    |
  unlock

lock, unlock - the regular lock functions
__*          - the hooks
<>           - states
So, the last lock in the list may be the one that's currently being waited for.
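A minimal userspace sketch of that ordering, assuming only POSIX threads (tracked_lock and tracked_unlock are hypothetical helpers, not kernel or Ceph APIs): the bookkeeping runs before the blocking call, so a thread that is still waiting for the mutex already shows it in its held list.

/* Illustrative only; mirrors the __lock_acquire-before-acquisition ordering. */
#include <pthread.h>
#include <stdio.h>

#define MAX_DEPTH 8

static __thread const char *held[MAX_DEPTH]; /* per-thread held-lock names */
static __thread int depth;                   /* analogue of lockdep_depth   */

static void tracked_lock(pthread_mutex_t *m, const char *name)
{
    held[depth++] = name;   /* like __lock_acquire: record first...        */
    pthread_mutex_lock(m);  /* ...then possibly block on the real lock     */
}

static void tracked_unlock(pthread_mutex_t *m)
{
    pthread_mutex_unlock(m);
    depth--;                /* like lock_release: drop the record          */
}

static pthread_mutex_t fi_mutex = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    tracked_lock(&fi_mutex, "&fi->mutex");
    /* Both threads list "&fi->mutex" as held here, even though one of them
     * spent part of that time merely waiting for the other to unlock. */
    printf("depth=%d last=%s\n", depth, held[depth - 1]);
    tracked_unlock(&fi_mutex);
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}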
#15 Updated by Leonid Usov 2 months ago
OK, so the task
[ 1968.512704] INFO: task cc1:128369 blocked for more than 245 seconds.
[ 1968.519057] Tainted: G S 6.5.0-rc7-gfef64f93da2b #1
[ 1968.525860] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1968.533697] task:cc1 state:D stack:0 pid:128369 ppid:128368 flags:0x00004000
[ 1968.542145] Call Trace:
[ 1968.544598] <TASK>
[ 1968.546704] __schedule+0x37d/0xe80
[ 1968.550213] ? d_alloc_parallel+0x645/0x970
[ 1968.554402] schedule+0x54/0xc0
[ 1968.557545] d_alloc_parallel+0x64a/0x970
[ 1968.561561] ? wake_up_state+0x10/0x10
is in `schedule`, which it entered while waiting for other lookups on the same directory to finish:
/*
 * somebody is likely to be still doing lookup for it;
 * wait for them to finish
 */
spin_lock(&dentry->d_lock);
d_wait_lookup(dentry);
static void d_wait_lookup(struct dentry *dentry)
{
    if (d_in_lookup(dentry)) {
        DECLARE_WAITQUEUE(wait, current);
        add_wait_queue(dentry->d_wait, &wait);
        do {
            set_current_state(TASK_UNINTERRUPTIBLE);
            spin_unlock(&dentry->d_lock);
            schedule();
            spin_lock(&dentry->d_lock);
        } while (d_in_lookup(dentry));
    }
}
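For readers less familiar with kernel waitqueues, here is a userspace analogue of the same wait loop, sketched with a POSIX condition variable (illustrative only; dentry_stub and the helper names are invented): drop the lock, sleep, re-take the lock, and re-check the "in lookup" predicate until it clears.

/* Userspace analogue of the d_wait_lookup() pattern. Build: cc demo.c -lpthread */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

struct dentry_stub {
    pthread_mutex_t d_lock;
    pthread_cond_t  d_wait;
    bool            in_lookup;
};

static void wait_for_lookup(struct dentry_stub *d)
{
    pthread_mutex_lock(&d->d_lock);
    while (d->in_lookup)                           /* d_in_lookup()       */
        pthread_cond_wait(&d->d_wait, &d->d_lock); /* unlock+sleep+relock */
    pthread_mutex_unlock(&d->d_lock);
}

static void *finish_lookup(void *arg)
{
    struct dentry_stub *d = arg;
    sleep(1);                                      /* the "slow" lookup   */
    pthread_mutex_lock(&d->d_lock);
    d->in_lookup = false;
    pthread_cond_broadcast(&d->d_wait);            /* wake the waiters    */
    pthread_mutex_unlock(&d->d_lock);
    return NULL;
}

int main(void)
{
    struct dentry_stub d = {
        .d_lock = PTHREAD_MUTEX_INITIALIZER,
        .d_wait = PTHREAD_COND_INITIALIZER,
        .in_lookup = true,
    };
    pthread_t t;
    pthread_create(&t, NULL, finish_lookup, &d);
    wait_for_lookup(&d);                           /* blocks until woken  */
    printf("lookup finished, waiter released\n");
    pthread_join(t, NULL);
    return 0;
}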
So, this task is a victim of the others that keep performing lookups on the directory. One suspect fewer.
#16 Updated by Leonid Usov 2 months ago
The two other hung tasks are both apparently sleeping on their corresponding mutexes, trying to acquire them:
[ 1845.478899] Call Trace:
[ 1845.481352] <TASK>
[ 1845.483458] __schedule+0x37d/0xe80
[ 1845.486950] ? find_held_lock+0x2d/0x90
[ 1845.490788] ? __mutex_lock+0x7fc/0xc80
[ 1845.494629] schedule+0x54/0xc0
[ 1845.497774] schedule_preempt_disabled+0x15/0x20
[ 1845.502393] __mutex_lock+0x801/0xc80
[ 1845.506061] ? fuse_lock_inode+0x31/0x40 [fuse]
[ 1845.510612] ? fuse_lock_inode+0x31/0x40 [fuse]
[ 1845.515144] fuse_lock_inode+0x31/0x40 [fuse]
which means they can't be the ones to blame. That makes their peer processes, which own said mutexes, responsible for not releasing them for that long:
[ 1846.167772] 2 locks held by cc1/128348:
[ 1846.171603] #0: ffff888336080150 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: path_openat+0x5ca/0x9d0
[ 1846.180844] #1: ffff8883360805a8 (&fi->mutex){+.+.}-{3:3}, at: fuse_lock_inode+0x31/0x40 [fuse]
[ 1846.224447] 2 locks held by cc1/128466:
[ 1846.228284] #0: ffff888265700790 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: path_openat+0x5ca/0x9d0
[ 1846.237499] #1: ffff888265700be8 (&fi->mutex){+.+.}-{3:3}, at: fuse_lock_inode+0x31/0x40 [fuse]
These processes, however, aren't reported as hung, meaning that they must be waking up regularly but not finishing their work in a timely fashion. At this point, without fuse client logs, it's hard to say what those processes might be up to, but Shachar's interpretation of this issue seems the most relevant: we are observing some slowness in the fuse client which results in this stack trace. It's not clear whether the lookup flow is to blame, as we don't have the stack traces for the pids in question. They are holding a generic inode lock, so it could be anything.
#17 Updated by Leonid Usov 2 months ago
- Subject changed from deadlock in ceph-fuse causes teuthology job to hang and fail to Slowness or deadlock in ceph-fuse causes teuthology job to hang and fail