Bug #62863

closed

Slowness or deadlock in ceph-fuse causes teuthology job to hang and fail

Added by Rishabh Dave 8 months ago. Updated 8 months ago.

Status:
Can't reproduce
Priority:
Low
Assignee:
Category:
Correctness/Safety
Target version:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Client, ceph-fuse
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

https://pulpito.ceph.com/rishabh-2023-09-12_12:12:15-fs-wip-rishabh-2023sep12-b2-testing-default-smithi/7394785/

It seems that, due to a deadlock in ceph-fuse, reads and writes on CephFS (which was mounted using ceph-fuse) were stuck. As a result the job hung for almost 3 hours, after which it terminated itself.

From teuthology.log -

2023-09-13T06:19:38.633 INFO:tasks.workunit.client.0.smithi159.stdout:  CC      kernel/irq/migration.o
2023-09-13T09:10:31.862 DEBUG:teuthology.orchestra.run:got remote process result: 124
2023-09-13T09:11:31.897 DEBUG:teuthology.orchestra.run:timed out waiting; will kill: <Greenlet at 0x7f213277b480: copy_file_to(<paramiko.ChannelFile from <paramiko.Channel 104 (, <Logger tasks.workunit.client.0.smithi159.stderr (, None, False)>
2023-09-13T09:12:31.899 DEBUG:teuthology.orchestra.run:timed out waiting; will kill: <Greenlet at 0x7f213277b6a0: copy_file_to(<paramiko.ChannelFile from <paramiko.Channel 104 (, <Logger tasks.workunit.client.0.smithi159.stdout (, None, False)>
2023-09-13T09:12:31.899 INFO:tasks.workunit:Stopping ['kernel_untar_build.sh'] on client.0...
2023-09-13T09:12:31.900 DEBUG:teuthology.orchestra.run.smithi159:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
2023-09-13T09:12:32.787 ERROR:teuthology.run_tasks:Saw exception from tasks.

Note the time difference between the following 2 entries -

2023-09-13T06:19:38.633 INFO:tasks.workunit.client.0.smithi159.stdout:  CC      kernel/irq/migration.o
2023-09-13T09:10:31.862 DEBUG:teuthology.orchestra.run:got remote process result: 124
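Exit status 124 here is the conventional GNU coreutils `timeout` status for a command that was killed after exceeding its time limit, which is consistent with the workunit being wrapped in a timeout rather than failing on its own. A minimal sketch of that behavior (the 1-second limit and `sleep 5` are just placeholders):

```shell
# timeout(1) kills the child once the limit expires and itself exits with 124.
timeout 1 sleep 5
echo "exit status: $?"   # prints: exit status: 124
```

So the ~3 hour gap before status 124 matches a workunit that made no progress until teuthology's timeout fired.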

From /rishabh-2023-09-12_12:12:15-fs-wip-rishabh-2023sep12-b2-testing-default-smithi/7394785/console_logs/smithi159.log -

[ 2214.285332] Showing all locks held in the system:
[ 2214.291512] 1 lock held by rcu_tasks_kthre/12:
[ 2214.295954]  #0: ffffffff8255dd50 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x30/0x420
[ 2214.305517] 1 lock held by rcu_tasks_rude_/13:
[ 2214.309960]  #0: ffffffff8255dad0 (rcu_tasks_rude.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x30/0x420
[ 2214.319953] 1 lock held by khungtaskd/56:
[ 2214.323965]  #0: ffffffff8255e7c0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x12/0x1d0
[ 2214.332942] 1 lock held by ceph-fuse/128555:
[ 2214.337208]  #0: ffff888101ebdc68 (&pipe->mutex/1){+.+.}-{3:3}, at: splice_file_to_pipe+0x2a/0x80
[ 2214.346091] 1 lock held by ceph-fuse/128557:
[ 2214.350357]  #0: ffff888107c4ca68 (&pipe->mutex/1){+.+.}-{3:3}, at: splice_file_to_pipe+0x2a/0x80
[ 2214.359242] 1 lock held by ceph-fuse/128560:
[ 2214.363530]  #0: ffff88810d542268 (&pipe->mutex/1){+.+.}-{3:3}, at: splice_file_to_pipe+0x2a/0x80
[ 2214.372396] 1 lock held by ceph-fuse/128571:
[ 2214.376667]  #0: ffff888160d11a68 (&pipe->mutex/1){+.+.}-{3:3}, at: splice_file_to_pipe+0x2a/0x80
[ 2214.385535] 1 lock held by ceph-fuse/128577:
[ 2214.389807]  #0: ffff888101ebe268 (&pipe->mutex/1){+.+.}-{3:3}, at: splice_file_to_pipe+0x2a/0x80
[ 2214.398674] 1 lock held by ceph-fuse/128581:
[ 2214.402956]  #0: ffff888160d11468 (&pipe->mutex/1){+.+.}-{3:3}, at: splice_file_to_pipe+0x2a/0x80
[ 2214.411820] 1 lock held by ceph-fuse/128593:
[ 2214.416084]  #0: ffff888101ebea68 (&pipe->mutex/1){+.+.}-{3:3}, at: splice_file_to_pipe+0x2a/0x80
[ 2214.424970] 1 lock held by ceph-fuse/128596:
[ 2214.429241]  #0: ffff88810d540868 (&pipe->mutex/1){+.+.}-{3:3}, at: splice_file_to_pipe+0x2a/0x80
[ 2214.438111] 2 locks held by cc1/128215:
[ 2214.441947]  #0: ffff8883324771d0 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: path_openat+0x5ca/0x9d0
[ 2214.451159]  #1: ffff888332477628 (&fi->mutex){+.+.}-{3:3}, at: fuse_lock_inode+0x31/0x40 [fuse]
[ 2214.459965] 2 locks held by cc1/128296:
[ 2214.463796]  #0: ffff888336080150 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: path_openat+0x5ca/0x9d0
[ 2214.473009]  #1: ffff8883360805a8 (&fi->mutex){+.+.}-{3:3}, at: fuse_lock_inode+0x31/0x40 [fuse]
[ 2214.481788] 2 locks held by cc1/128348:
[ 2214.485619]  #0: ffff888336080150 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: path_openat+0x5ca/0x9d0
[ 2214.494833]  #1: ffff8883360805a8 (&fi->mutex){+.+.}-{3:3}, at: fuse_lock_inode+0x31/0x40 [fuse]
[ 2214.503611] 1 lock held by cc1/128369:
[ 2214.507356]  #0: ffff888336080150 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: path_openat+0x5ca/0x9d0
[ 2214.516569] 2 locks held by cc1/128419:
[ 2214.520400]  #0: ffff888265700790 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: path_openat+0x5ca/0x9d0
[ 2214.529613]  #1: ffff888265700be8 (&fi->mutex){+.+.}-{3:3}, at: fuse_lock_inode+0x31/0x40 [fuse]
[ 2214.538412] 2 locks held by cc1/128466:
[ 2214.542248]  #0: ffff888265700790 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: path_openat+0x5ca/0x9d0
[ 2214.551462]  #1: ffff888265700be8 (&fi->mutex){+.+.}-{3:3}, at: fuse_lock_inode+0x31/0x40 [fuse]
[ 2214.560251] 2 locks held by cc1/128595:
[ 2214.564088]  #0: ffff888346f89410 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: walk_component+0x74/0x160
[ 2214.573487]  #1: ffff888346f89868 (&fi->mutex){+.+.}-{3:3}, at: fuse_lock_inode+0x31/0x40 [fuse]
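For dumps like this, a quick way to see which commands are blocked holding locks is to tally the "N lock(s) held by comm/pid" lines. A sketch (the log path is illustrative; substitute the actual console log file):

```shell
# Count lock-holding entries per command name in a
# "Showing all locks held in the system" dump.
grep -E 'locks? held by' smithi159.log \
  | sed -E 's/.*locks? held by ([^/]+)\/[0-9]+.*/\1/' \
  | sort | uniq -c | sort -rn
```

For the dump above this gives 8 ceph-fuse entries (all parked on &pipe->mutex in splice_file_to_pipe) and 7 cc1 entries (compiler processes waiting on the fuse inode mutex), which is what points at ceph-fuse as the likely source of the stall.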
