Bug #58564
workunit suites/dbench.sh fails with error code 1
Description
This dbench test job fails with error code 1 and the error message "write failed on handle 11133 (Resource temporarily unavailable)".
Link to the job: https://pulpito.ceph.com/rishabh-2023-01-23_18:53:32-fs-wip-rishabh-testing-2022Nov24-11Jan2023-distro-default-smithi/7134091/
Error messages from teuthology.log:
2023-01-24T05:11:01.804 INFO:tasks.workunit.client.0.smithi067.stdout:[941246] write failed on handle 10813 (Resource temporarily unavailable)
2023-01-24T05:11:01.805 INFO:tasks.workunit.client.0.smithi067.stdout:Child failed with status 1
2023-01-24T05:11:01.806 DEBUG:teuthology.orchestra.run:got remote process result: 1
2023-01-24T05:11:01.807 INFO:tasks.workunit:Stopping ['suites/dbench.sh'] on client.0...
Kernel call trace from the console log:
2023-01-24T04:52:25.251303+00:00 smithi067 kernel: <TASK>
2023-01-24T04:52:25.251326+00:00 smithi067 kernel: dump_stack_lvl+0x59/0x71
2023-01-24T04:52:25.251350+00:00 smithi067 kernel: check_noncircular+0xfe/0x110
2023-01-24T04:52:25.251376+00:00 smithi067 kernel: ? __lock_acquire+0xf12/0x1e40
2023-01-24T04:52:25.251402+00:00 smithi067 kernel: __lock_acquire+0x108b/0x1e40
2023-01-24T04:52:25.251430+00:00 smithi067 kernel: lock_acquire+0xd8/0x300
2023-01-24T04:52:25.251455+00:00 smithi067 kernel: ? static_key_slow_inc+0x12/0x30
2023-01-24T04:52:25.251479+00:00 smithi067 kernel: ? freezer_write+0x1d6/0x560
2023-01-24T04:52:25.251503+00:00 smithi067 kernel: cpus_read_lock+0x40/0xd0
2023-01-24T04:52:25.251529+00:00 smithi067 kernel: ? static_key_slow_inc+0x12/0x30
2023-01-24T04:52:25.251553+00:00 smithi067 kernel: static_key_slow_inc+0x12/0x30
2023-01-24T04:52:25.251576+00:00 smithi067 kernel: freezer_apply_state+0x98/0xb0
2023-01-24T04:52:25.251600+00:00 smithi067 kernel: freezer_write+0x327/0x560
2023-01-24T04:52:25.251627+00:00 smithi067 kernel: cgroup_file_write+0x95/0x230
2023-01-24T04:52:25.251652+00:00 smithi067 kernel: kernfs_fop_write_iter+0x13b/0x1d0
2023-01-24T04:52:25.251675+00:00 smithi067 kernel: vfs_write+0x348/0x4d0
2023-01-24T04:52:25.251699+00:00 smithi067 kernel: ksys_write+0x60/0xe0
2023-01-24T04:52:25.251723+00:00 smithi067 kernel: do_syscall_64+0x38/0x80
2023-01-24T04:52:25.251746+00:00 smithi067 kernel: entry_SYSCALL_64_after_hwframe+0x63/0xcd
2023-01-24T04:52:25.251769+00:00 smithi067 kernel: RIP: 0033:0x565371afab9b
The same failure was also seen here - http://pulpito.front.sepia.ceph.com/rishabh-2022-11-28_08:04:11-fs-wip-rishabh-testing-2022Nov24-1818-testing-default-smithi/7093328
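For context on the error message: "Resource temporarily unavailable" is the strerror text for EAGAIN (errno 11), which a write can return when the target cannot accept data right now; dbench treats the failed write as fatal and the child exits with status 1. A minimal sketch (unrelated to the dbench code itself) that provokes the same errno by filling a non-blocking pipe:

```python
import errno
import os

# Create a pipe and make its write end non-blocking, so a write into a
# full pipe buffer fails with EAGAIN instead of blocking.
r, w = os.pipe()
os.set_blocking(w, False)

err = None
try:
    while True:  # keep writing until the pipe buffer is full
        os.write(w, b"x" * 65536)
except BlockingIOError as e:  # raised with errno == EAGAIN
    err = e.errno
finally:
    os.close(r)
    os.close(w)

print(err == errno.EAGAIN)      # True
print(os.strerror(err))         # e.g. "Resource temporarily unavailable"
```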
History
#1 Updated by Rishabh Dave about 2 months ago
- Description updated (diff)
#2 Updated by Xiubo Li about 2 months ago
Rishabh,
BTW, is there any other call trace in the dmesg? I currently can't access the Sepia nodes, so I couldn't read the logs.
Thanks
#3 Updated by Rishabh Dave about 1 month ago
Hi, Xiubo. I didn't check dmesg, so I don't know if it had a call trace in it.
#4 Updated by Xiubo Li about 1 month ago
It's a kernel cgroup core deadlock bug:
======================================================
WARNING: possible circular locking dependency detected
6.1.0-rc5-ceph-gc90f64b588ff #1 Tainted: G S
------------------------------------------------------
runc/90769 is trying to acquire lock:
ffffffff82664cb0 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_slow_inc+0xe/0x20

but task is already holding lock:
ffffffff8276e468 (freezer_mutex){+.+.}-{3:3}, at: freezer_write+0x89/0x530

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (freezer_mutex){+.+.}-{3:3}:
       __mutex_lock+0x9c/0xf20
       freezer_attach+0x2c/0xf0
       cgroup_migrate_execute+0x3f3/0x4c0
       cgroup_attach_task+0x22e/0x3e0
       __cgroup1_procs_write.constprop.12+0xfb/0x140
       cgroup_file_write+0x91/0x230
       kernfs_fop_write_iter+0x137/0x1d0
       vfs_write+0x344/0x4d0
       ksys_write+0x5c/0xd0
       do_syscall_64+0x34/0x80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd

-> #1 (cgroup_threadgroup_rwsem){++++}-{0:0}:
       percpu_down_write+0x45/0x2c0
       cgroup_procs_write_start+0x84/0x270
       __cgroup1_procs_write.constprop.12+0x57/0x140
       cgroup_file_write+0x91/0x230
       kernfs_fop_write_iter+0x137/0x1d0
       vfs_write+0x344/0x4d0
       ksys_write+0x5c/0xd0
       do_syscall_64+0x34/0x80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd

-> #0 (cpu_hotplug_lock){++++}-{0:0}:
       __lock_acquire+0x103f/0x1de0
       lock_acquire+0xd4/0x2f0
       cpus_read_lock+0x3c/0xd0
       static_key_slow_inc+0xe/0x20
       freezer_apply_state+0x98/0xb0
       freezer_write+0x307/0x530
       cgroup_file_write+0x91/0x230
       kernfs_fop_write_iter+0x137/0x1d0
       vfs_write+0x344/0x4d0
       ksys_write+0x5c/0xd0
       do_syscall_64+0x34/0x80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd

other info that might help us debug this:

Chain exists of:
  cpu_hotplug_lock --> cgroup_threadgroup_rwsem --> freezer_mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(freezer_mutex);
                               lock(cgroup_threadgroup_rwsem);
                               lock(freezer_mutex);
  lock(cpu_hotplug_lock);

 *** DEADLOCK ***

5 locks held by runc/90769:
 #0: ffff88810bd064e8 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x44/0x50
 #1: ffff88817f828448 (sb_writers#6){.+.+}-{0:0}, at: ksys_write+0x5c/0xd0
 #2: ffff888104cf5688 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x104/0x1d0
 #3: ffff888107f885f0 (kn->active#170){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x10d/0x1d0
 #4: ffffffff8276e468 (freezer_mutex){+.+.}-{3:3}, at: freezer_write+0x89/0x530

stack backtrace:
CPU: 3 PID: 90769 Comm: runc Tainted: G S 6.1.0-rc5-ceph-gc90f64b588ff #1
Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c 09/07/2015
Call Trace:
 <TASK>
 dump_stack_lvl+0x55/0x6d
 check_noncircular+0xfe/0x110
 ? __lock_acquire+0xec6/0x1de0
 __lock_acquire+0x103f/0x1de0
 lock_acquire+0xd4/0x2f0
 ? static_key_slow_inc+0xe/0x20
 ? freezer_write+0x1c6/0x530
 cpus_read_lock+0x3c/0xd0
 ? static_key_slow_inc+0xe/0x20
 static_key_slow_inc+0xe/0x20
 freezer_apply_state+0x98/0xb0
 freezer_write+0x307/0x530
 cgroup_file_write+0x91/0x230
 kernfs_fop_write_iter+0x137/0x1d0
 vfs_write+0x344/0x4d0
 ksys_write+0x5c/0xd0
 do_syscall_64+0x34/0x80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x558e89efab9b
Code: e8 8a 2e fb ff eb 88 cc cc cc cc cc cc cc cc e8 7b 74 fb ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
RSP: 002b:000000c00019c010 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000c000031000 RCX: 0000558e89efab9b
RDX: 0000000000000006 RSI: 000000c00019c1c8 RDI: 000000000000000d
RBP: 000000c00019c060 R08: 0000000000000001 R09: 000000c00028ed20
R10: 0000000000000006 R11: 0000000000000202 R12: 000000c00019c0f0
R13: 0000000000000000 R14: 000000c0000001a0 R15: 00007f38d2e1476c
 </TASK>
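In plainer terms, lockdep records "lock B was taken while lock A was held" edges and warns when a new acquisition would close a cycle: here cpu_hotplug_lock -> cgroup_threadgroup_rwsem -> freezer_mutex already exists, and freezer_write() then takes cpu_hotplug_lock while holding freezer_mutex. A toy lockdep-style checker (a sketch in Python, not kernel code; the class and method names are invented for illustration) that detects the same cycle:

```python
from collections import defaultdict

class LockOrderTracker:
    """Records 'acquired while held' edges and flags circular dependencies."""

    def __init__(self):
        # lock name -> set of locks acquired while it was held
        self.edges = defaultdict(set)

    def acquire(self, held, new):
        """Record taking `new` while `held` is held; return True if this
        creates a cycle (i.e. a potential deadlock)."""
        self.edges[held].add(new)
        return self._reachable(new, held)

    def _reachable(self, start, target):
        # Iterative DFS over recorded edges.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges[node])
        return False

tracker = LockOrderTracker()
# Dependency chain already recorded by earlier code paths (from the report):
tracker.acquire("cpu_hotplug_lock", "cgroup_threadgroup_rwsem")
tracker.acquire("cgroup_threadgroup_rwsem", "freezer_mutex")
# freezer_write() path: cpu_hotplug_lock wanted while freezer_mutex is held.
detected = tracker.acquire("freezer_mutex", "cpu_hotplug_lock")
print("circular dependency detected:", detected)  # True
```

The real lockdep check is per lock *class* and far more involved, but the cycle it reports in this traceback reduces to exactly this three-node graph.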
#5 Updated by Xiubo Li about 1 month ago
Reported this to the Linux kernel mailing list: https://lore.kernel.org/lkml/768be93b-a401-deab-600c-f946e0bd27fa@redhat.com/T/.
#6 Updated by Xiubo Li about 1 month ago
- Status changed from New to In Progress