Bug #62863
Closed: Slowness or deadlock in ceph-fuse causes teuthology job to hang and fail
Status:
Can't reproduce
Priority:
Low
Assignee:
Category:
Correctness/Safety
Target version:
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Client, ceph-fuse
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
It seems that, due to a deadlock in ceph-fuse, reads and writes on CephFS (which was mounted using ceph-fuse) got stuck. As a result, the job hung for almost 3 hours, after which it terminated itself.
From teuthology.log -
2023-09-13T06:19:38.633 INFO:tasks.workunit.client.0.smithi159.stdout: CC kernel/irq/migration.o
2023-09-13T09:10:31.862 DEBUG:teuthology.orchestra.run:got remote process result: 124
2023-09-13T09:11:31.897 DEBUG:teuthology.orchestra.run:timed out waiting; will kill: <Greenlet at 0x7f213277b480: copy_file_to(<paramiko.ChannelFile from <paramiko.Channel 104 (, <Logger tasks.workunit.client.0.smithi159.stderr (, None, False)>
2023-09-13T09:12:31.899 DEBUG:teuthology.orchestra.run:timed out waiting; will kill: <Greenlet at 0x7f213277b6a0: copy_file_to(<paramiko.ChannelFile from <paramiko.Channel 104 (, <Logger tasks.workunit.client.0.smithi159.stdout (, None, False)>
2023-09-13T09:12:31.899 INFO:tasks.workunit:Stopping ['kernel_untar_build.sh'] on client.0...
2023-09-13T09:12:31.900 DEBUG:teuthology.orchestra.run.smithi159:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
2023-09-13T09:12:32.787 ERROR:teuthology.run_tasks:Saw exception from tasks.
Note the time difference between the following two entries -
2023-09-13T06:19:38.633 INFO:tasks.workunit.client.0.smithi159.stdout: CC kernel/irq/migration.o
2023-09-13T09:10:31.862 DEBUG:teuthology.orchestra.run:got remote process result: 124
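The remote process result 124 above is the standard exit status of GNU coreutils `timeout` when it kills a command for exceeding its time limit; teuthology's workunit task wraps each script in such a timeout, which is consistent with the job sitting idle for close to 3 hours before being killed. A minimal sketch confirming what that exit code means (illustrative only, not the actual workunit command):

```shell
# GNU `timeout` exits with status 124 when it has to kill the wrapped
# command; here `sleep 2` cannot finish within the 1-second limit.
timeout 1 sleep 2
echo "exit status: $?"   # prints "exit status: 124"
```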
From /rishabh-2023-09-12_12:12:15-fs-wip-rishabh-2023sep12-b2-testing-default-smithi/7394785/console_logs/smithi159.log -
[ 2214.285332] Showing all locks held in the system:
[ 2214.291512] 1 lock held by rcu_tasks_kthre/12:
[ 2214.295954] #0: ffffffff8255dd50 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x30/0x420
[ 2214.305517] 1 lock held by rcu_tasks_rude_/13:
[ 2214.309960] #0: ffffffff8255dad0 (rcu_tasks_rude.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x30/0x420
[ 2214.319953] 1 lock held by khungtaskd/56:
[ 2214.323965] #0: ffffffff8255e7c0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x12/0x1d0
[ 2214.332942] 1 lock held by ceph-fuse/128555:
[ 2214.337208] #0: ffff888101ebdc68 (&pipe->mutex/1){+.+.}-{3:3}, at: splice_file_to_pipe+0x2a/0x80
[ 2214.346091] 1 lock held by ceph-fuse/128557:
[ 2214.350357] #0: ffff888107c4ca68 (&pipe->mutex/1){+.+.}-{3:3}, at: splice_file_to_pipe+0x2a/0x80
[ 2214.359242] 1 lock held by ceph-fuse/128560:
[ 2214.363530] #0: ffff88810d542268 (&pipe->mutex/1){+.+.}-{3:3}, at: splice_file_to_pipe+0x2a/0x80
[ 2214.372396] 1 lock held by ceph-fuse/128571:
[ 2214.376667] #0: ffff888160d11a68 (&pipe->mutex/1){+.+.}-{3:3}, at: splice_file_to_pipe+0x2a/0x80
[ 2214.385535] 1 lock held by ceph-fuse/128577:
[ 2214.389807] #0: ffff888101ebe268 (&pipe->mutex/1){+.+.}-{3:3}, at: splice_file_to_pipe+0x2a/0x80
[ 2214.398674] 1 lock held by ceph-fuse/128581:
[ 2214.402956] #0: ffff888160d11468 (&pipe->mutex/1){+.+.}-{3:3}, at: splice_file_to_pipe+0x2a/0x80
[ 2214.411820] 1 lock held by ceph-fuse/128593:
[ 2214.416084] #0: ffff888101ebea68 (&pipe->mutex/1){+.+.}-{3:3}, at: splice_file_to_pipe+0x2a/0x80
[ 2214.424970] 1 lock held by ceph-fuse/128596:
[ 2214.429241] #0: ffff88810d540868 (&pipe->mutex/1){+.+.}-{3:3}, at: splice_file_to_pipe+0x2a/0x80
[ 2214.438111] 2 locks held by cc1/128215:
[ 2214.441947] #0: ffff8883324771d0 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: path_openat+0x5ca/0x9d0
[ 2214.451159] #1: ffff888332477628 (&fi->mutex){+.+.}-{3:3}, at: fuse_lock_inode+0x31/0x40 [fuse]
[ 2214.459965] 2 locks held by cc1/128296:
[ 2214.463796] #0: ffff888336080150 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: path_openat+0x5ca/0x9d0
[ 2214.473009] #1: ffff8883360805a8 (&fi->mutex){+.+.}-{3:3}, at: fuse_lock_inode+0x31/0x40 [fuse]
[ 2214.481788] 2 locks held by cc1/128348:
[ 2214.485619] #0: ffff888336080150 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: path_openat+0x5ca/0x9d0
[ 2214.494833] #1: ffff8883360805a8 (&fi->mutex){+.+.}-{3:3}, at: fuse_lock_inode+0x31/0x40 [fuse]
[ 2214.503611] 1 lock held by cc1/128369:
[ 2214.507356] #0: ffff888336080150 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: path_openat+0x5ca/0x9d0
[ 2214.516569] 2 locks held by cc1/128419:
[ 2214.520400] #0: ffff888265700790 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: path_openat+0x5ca/0x9d0
[ 2214.529613] #1: ffff888265700be8 (&fi->mutex){+.+.}-{3:3}, at: fuse_lock_inode+0x31/0x40 [fuse]
[ 2214.538412] 2 locks held by cc1/128466:
[ 2214.542248] #0: ffff888265700790 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: path_openat+0x5ca/0x9d0
[ 2214.551462] #1: ffff888265700be8 (&fi->mutex){+.+.}-{3:3}, at: fuse_lock_inode+0x31/0x40 [fuse]
[ 2214.560251] 2 locks held by cc1/128595:
[ 2214.564088] #0: ffff888346f89410 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: walk_component+0x74/0x160
[ 2214.573487] #1: ffff888346f89868 (&fi->mutex){+.+.}-{3:3}, at: fuse_lock_inode+0x31/0x40 [fuse]
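The khungtaskd thread seen holding rcu_read_lock above is the kernel's hung-task watchdog, which emits this "Showing all locks held in the system" dump when tasks sit in uninterruptible sleep past its timeout. Its behaviour is controlled by sysctls; the values below are an illustrative sketch, not the settings actually in effect on the test node:

```
# Illustrative hung-task watchdog sysctls (assumed values, not the node's config):
kernel.hung_task_timeout_secs = 120   # warn when a task is blocked in D state this long
kernel.hung_task_panic = 0            # set to 1 to panic (and reboot/dump) instead of only logging
```

Lowering the timeout on teuthology nodes would surface a stuck ceph-fuse splice sooner in the console log, well before the 3-hour workunit timeout fires.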