Bug #17153

closed

kernel hung task warnings on teuthology.front kernel

Added by Jeff Layton over 7 years ago. Updated over 6 years ago.

Status: Can't reproduce
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

teuthology.front.sepia.ceph.com was recently upgraded to the Ubuntu distro kernel (4.4.0-34-generic). Since then, I've been seeing hung task warnings in the kernel ring buffer:

[998531.323457] INFO: task nginx:2786 blocked for more than 120 seconds.
[998531.323489]       Not tainted 4.4.0-34-generic #53-Ubuntu
[998531.323506] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[998531.323529] nginx           D ffff880819a4b968     0  2786   2774 0x00000000
[998531.323533]  ffff880819a4b968 ffff88081ec00d80 ffff88081bb6e040 ffff880813a26040
[998531.323536]  ffff880819a4c000 ffff88081f3d6d00 7fffffffffffffff ffff880819a4bae0
[998531.323538]  ffffffff8182a610 ffff880819a4b980 ffffffff81829e15 0000000000000000
[998531.323540] Call Trace:
[998531.323564]  [<ffffffff8182a610>] ? bit_wait+0x60/0x60
[998531.323567]  [<ffffffff81829e15>] schedule+0x35/0x80
[998531.323569]  [<ffffffff8182cf35>] schedule_timeout+0x1b5/0x270
[998531.323573]  [<ffffffff811eb0a9>] ? get_partial_node.isra.61+0x1c9/0x200
[998531.323577]  [<ffffffff8106426e>] ? kvm_clock_get_cycles+0x1e/0x20
[998531.323579]  [<ffffffff8182a610>] ? bit_wait+0x60/0x60
[998531.323581]  [<ffffffff81829344>] io_schedule_timeout+0xa4/0x110
[998531.323583]  [<ffffffff8182a62b>] bit_wait_io+0x1b/0x70
[998531.323585]  [<ffffffff8182a3ae>] __wait_on_bit_lock+0x4e/0xb0
[998531.323590]  [<ffffffff8118d43b>] __lock_page+0xbb/0xe0
[998531.323594]  [<ffffffff810c3ce0>] ? autoremove_wake_function+0x40/0x40
[998531.323599]  [<ffffffff8123f981>] __generic_file_splice_read+0x4b1/0x5c0
[998531.323601]  [<ffffffff8123e2f0>] ? page_cache_pipe_buf_release+0x20/0x20
[998531.323605]  [<ffffffff8179dd93>] ? inet_sendpage+0x73/0xd0
[998531.323607]  [<ffffffff8123fe72>] generic_file_splice_read+0x42/0x80
[998531.323609]  [<ffffffff8123e779>] do_splice_to+0x69/0x80
[998531.323611]  [<ffffffff8123e84a>] splice_direct_to_actor+0xba/0x210
[998531.323613]  [<ffffffff8123e200>] ? do_splice_from+0x30/0x30
[998531.323615]  [<ffffffff8123ea38>] do_splice_direct+0x98/0xd0
[998531.323618]  [<ffffffff8120dd2f>] do_sendfile+0x1bf/0x3a0
[998531.323620]  [<ffffffff8120e96e>] SyS_sendfile64+0x5e/0xb0
[998531.323622]  [<ffffffff8182def2>] entry_SYSCALL_64_fastpath+0x16/0x71
[998531.323672] INFO: task teuthology:26992 blocked for more than 120 seconds.
[998531.323693]       Not tainted 4.4.0-34-generic #53-Ubuntu
[998531.323710] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[998531.323732] teuthology      D ffff8802df4379a8     0 26992  25730 0x00000000
[998531.323734]  ffff8802df4379a8 ffff88081549e068 ffff880812fc0000 ffff8805e743d280
[998531.323736]  ffff8802df438000 ffff88081f596d00 7fffffffffffffff ffff8802df437b20
[998531.323738]  ffffffff8182a610 ffff8802df4379c0 ffffffff81829e15 0000000000000000
[998531.323740] Call Trace:
[998531.323743]  [<ffffffff8182a610>] ? bit_wait+0x60/0x60
[998531.323745]  [<ffffffff81829e15>] schedule+0x35/0x80
[998531.323746]  [<ffffffff8182cf35>] schedule_timeout+0x1b5/0x270
[998531.323750]  [<ffffffff81405dd5>] ? find_next_bit+0x15/0x20
[998531.323754]  [<ffffffff813f0fbf>] ? cpumask_next_and+0x2f/0x40
[998531.323756]  [<ffffffff810bcb65>] ? update_sd_lb_stats+0x115/0x520
[998531.323758]  [<ffffffff8106426e>] ? kvm_clock_get_cycles+0x1e/0x20
[998531.323760]  [<ffffffff8182a610>] ? bit_wait+0x60/0x60
[998531.323762]  [<ffffffff81829344>] io_schedule_timeout+0xa4/0x110
[998531.323764]  [<ffffffff8182a62b>] bit_wait_io+0x1b/0x70
[998531.323766]  [<ffffffff8182a3ae>] __wait_on_bit_lock+0x4e/0xb0
[998531.323768]  [<ffffffff8118d43b>] __lock_page+0xbb/0xe0
[998531.323771]  [<ffffffff810c3ce0>] ? autoremove_wake_function+0x40/0x40
[998531.323773]  [<ffffffff8118e79d>] pagecache_get_page+0x17d/0x1c0
[998531.323788]  [<ffffffffc039233a>] ? ceph_pool_perm_check+0x5a/0x700 [ceph]
[998531.323791]  [<ffffffff8118e806>] grab_cache_page_write_begin+0x26/0x40
[998531.323797]  [<ffffffffc0391638>] ceph_write_begin+0x48/0xe0 [ceph]
[998531.323799]  [<ffffffff8118db4e>] generic_perform_write+0xce/0x1c0
[998531.323803]  [<ffffffff812282a9>] ? file_update_time+0xc9/0x110
[998531.323809]  [<ffffffffc038c16a>] ceph_write_iter+0xf8a/0x1050 [ceph]
[998531.323812]  [<ffffffff8122dbc4>] ? mntput+0x24/0x40
[998531.323814]  [<ffffffff8121777d>] ? terminate_walk+0xbd/0xd0
[998531.323817]  [<ffffffff8121ce11>] ? filename_lookup+0xf1/0x180
[998531.323819]  [<ffffffff811ebac7>] ? kmem_cache_alloc+0x187/0x1f0
[998531.323821]  [<ffffffff8121c9d6>] ? getname_flags+0x56/0x1f0
[998531.323823]  [<ffffffff8120c97b>] new_sync_write+0x9b/0xe0
[998531.323825]  [<ffffffff8120c9e6>] __vfs_write+0x26/0x40
[998531.323827]  [<ffffffff8120d369>] vfs_write+0xa9/0x1a0
[998531.323828]  [<ffffffff8120e025>] SyS_write+0x55/0xc0
[998531.323830]  [<ffffffff8182def2>] entry_SYSCALL_64_fastpath+0x16/0x71
[998531.323833] INFO: task teuthology:2191 blocked for more than 120 seconds.
[998531.323853]       Not tainted 4.4.0-34-generic #53-Ubuntu
[998531.323869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[998531.323892] teuthology      D ffff880005c7b9a8     0  2191   6398 0x00000000
[998531.323894]  ffff880005c7b9a8 ffff88081549e068 ffff880620cce040 ffff880035c9d280
[998531.323896]  ffff880005c7c000 ffff88081f5d6d00 7fffffffffffffff ffff880005c7bb20
[998531.323898]  ffffffff8182a610 ffff880005c7b9c0 ffffffff81829e15 0000000000000000
[998531.323900] Call Trace:
[998531.323902]  [<ffffffff8182a610>] ? bit_wait+0x60/0x60
[998531.323904]  [<ffffffff81829e15>] schedule+0x35/0x80
[998531.323906]  [<ffffffff8182cf35>] schedule_timeout+0x1b5/0x270
[998531.323908]  [<ffffffff81749bf4>] ? sch_direct_xmit+0x74/0x220
[998531.323910]  [<ffffffff810ca961>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[998531.323912]  [<ffffffff8106426e>] ? kvm_clock_get_cycles+0x1e/0x20
[998531.323914]  [<ffffffff8182a610>] ? bit_wait+0x60/0x60
[998531.323916]  [<ffffffff81829344>] io_schedule_timeout+0xa4/0x110
[998531.323918]  [<ffffffff8182a62b>] bit_wait_io+0x1b/0x70
[998531.323920]  [<ffffffff8182a3ae>] __wait_on_bit_lock+0x4e/0xb0
[998531.323922]  [<ffffffff8118d43b>] __lock_page+0xbb/0xe0
[998531.323924]  [<ffffffff810c3ce0>] ? autoremove_wake_function+0x40/0x40
[998531.323926]  [<ffffffff8118e79d>] pagecache_get_page+0x17d/0x1c0
[998531.323933]  [<ffffffffc039233a>] ? ceph_pool_perm_check+0x5a/0x700 [ceph]
[998531.323935]  [<ffffffff8118e806>] grab_cache_page_write_begin+0x26/0x40
[998531.323940]  [<ffffffffc0391638>] ceph_write_begin+0x48/0xe0 [ceph]
[998531.323943]  [<ffffffff8118db4e>] generic_perform_write+0xce/0x1c0
[998531.323945]  [<ffffffff812282a9>] ? file_update_time+0xc9/0x110
[998531.323950]  [<ffffffffc038c16a>] ceph_write_iter+0xf8a/0x1050 [ceph]
[998531.323952]  [<ffffffff8122dbc4>] ? mntput+0x24/0x40
[998531.323955]  [<ffffffff8121777d>] ? terminate_walk+0xbd/0xd0
[998531.323957]  [<ffffffff8121ce11>] ? filename_lookup+0xf1/0x180
[998531.323959]  [<ffffffff811ebac7>] ? kmem_cache_alloc+0x187/0x1f0
[998531.323962]  [<ffffffff8121c9d6>] ? getname_flags+0x56/0x1f0
[998531.323964]  [<ffffffff8120c97b>] new_sync_write+0x9b/0xe0
[998531.323966]  [<ffffffff8120c9e6>] __vfs_write+0x26/0x40
[998531.323968]  [<ffffffff8120d369>] vfs_write+0xa9/0x1a0
[998531.323970]  [<ffffffff8120e025>] SyS_write+0x55/0xc0
[998531.323972]  [<ffffffff8182def2>] entry_SYSCALL_64_fastpath+0x16/0x71
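
For anyone triaging a longer capture, the task names and PIDs in the warnings above can be pulled out of a saved log with a short script. This is only a sketch: the regular expression is written against the header lines as they appear in this paste, and `hung_tasks` is a hypothetical helper name, not part of any existing tool.

```python
import re

# Matches the hung-task header line in the form seen in this report, e.g.
# "INFO: task nginx:2786 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def hung_tasks(log_text):
    """Return a (comm, pid, seconds) tuple for each hung-task warning."""
    return [
        (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_RE.finditer(log_text)
    ]


# Header lines copied from the log in this report.
sample = """\
[998531.323457] INFO: task nginx:2786 blocked for more than 120 seconds.
[998531.323672] INFO: task teuthology:26992 blocked for more than 120 seconds.
[998531.323833] INFO: task teuthology:2191 blocked for more than 120 seconds.
"""

for comm, pid, secs in hung_tasks(sample):
    print(comm, pid, secs)
```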

I don't have sudo permissions on this box, so I can't verify whether the tasks are still hung, but I see several processes with matching PIDs that have been running for a very long time.
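
One check that usually doesn't need root is the state field in `/proc/<pid>/stat`: a task still stuck in uninterruptible sleep shows `D` there. The sketch below assumes `/proc` is mounted without `hidepid` restrictions. `task_state` and `read_state` are hypothetical helper names, and the PIDs are the ones from the warnings above.

```python
def task_state(stat_line):
    """Return the one-letter state from the text of /proc/<pid>/stat.

    The second field (comm) is wrapped in parentheses and may itself
    contain spaces or ')', so split on the last ')' before reading the
    state field that follows it.
    """
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def read_state(pid):
    """Read and parse the state of a live process."""
    with open(f"/proc/{pid}/stat") as f:
        return task_state(f.read())


if __name__ == "__main__":
    # PIDs taken from the hung-task warnings in this report.
    for pid in (2786, 26992, 2191):
        try:
            print(pid, read_state(pid))
        except OSError:
            # Process has exited, or /proc is not readable here.
            print(pid, "not available")
```

A `D` only means the task was in uninterruptible sleep at the moment of sampling, so it is worth reading the state a few times before concluding that the task is really stuck.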
