Project

General

Profile

Actions

Bug #15891

closed

[rbd] i/o to rbd block device stopped constantly

Added by Shinobu Kinjo almost 8 years ago. Updated about 5 years ago.

Status:
Resolved
Priority:
Low
Assignee:
Category:
libceph
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
rbd
Crash signature (v1):
Crash signature (v2):

Description

I/O to rbd block device stopped working with the following message.
RBD is utilized by xfs and used for dovecot!

Checking tcpdump on cluster node mon, osd, there was no packet from clients.
``do_fsync`` seems to not able to do fsync.

  1. RBD Client
  1. message
    May 15 20:31:28 dvct01b kernel: INFO: task kswapd0:76 blocked for more than 120 seconds.
    May 15 20:31:28 dvct01b kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    May 15 20:31:28 dvct01b kernel: kswapd0 D ffff880425fe2280 0 76 2 0x00000000
    May 15 20:31:28 dvct01b kernel: ffff880425817910 0000000000000046 ffff880425fe2280 ffff880425817fd8
    May 15 20:31:28 dvct01b kernel: ffff880425817fd8 ffff880425817fd8 ffff880425fe2280 00000110000a41c0
    May 15 20:31:28 dvct01b kernel: ffff88042745b528 0000000000000000 ffff880424db1c18 ffff880425fe2280
    May 15 20:31:28 dvct01b kernel: Call Trace:
    May 15 20:31:28 dvct01b kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
    May 15 20:31:28 dvct01b kernel: [<ffffffffa0229ba4>] _xfs_log_force_lsn+0x184/0x2f0 [xfs]
    May 15 20:31:28 dvct01b kernel: [<ffffffff810b8c00>] ? wake_up_state+0x20/0x20
    May 15 20:31:28 dvct01b kernel: [<ffffffffa0229d3e>] xfs_log_force_lsn+0x2e/0x90 [xfs]
    May 15 20:31:28 dvct01b kernel: [<ffffffffa021bfd9>] ? xfs_iunpin_wait+0x19/0x20 [xfs]
    May 15 20:31:28 dvct01b kernel: [<ffffffffa02184c7>] __xfs_iunpin_wait+0xa7/0x150 [xfs]
    May 15 20:31:28 dvct01b kernel: [<ffffffff810a6b40>] ? wake_atomic_t_function+0x40/0x40
    May 15 20:31:28 dvct01b kernel: [<ffffffffa021bfd9>] xfs_iunpin_wait+0x19/0x20 [xfs]
    May 15 20:31:28 dvct01b kernel: [<ffffffffa021085c>] xfs_reclaim_inode+0x8c/0x350 [xfs]
    May 15 20:31:28 dvct01b kernel: [<ffffffffa0210d87>] xfs_reclaim_inodes_ag+0x267/0x390 [xfs]
    May 15 20:31:28 dvct01b kernel: [<ffffffffa0211933>] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
    May 15 20:31:28 dvct01b kernel: [<ffffffffa02208a5>] xfs_fs_free_cached_objects+0x15/0x20 [xfs]
    May 15 20:31:28 dvct01b kernel: [<ffffffff811e0c48>] prune_super+0xe8/0x170
    May 15 20:31:28 dvct01b kernel: [<ffffffff8117c555>] shrink_slab+0x165/0x300
    May 15 20:31:28 dvct01b kernel: [<ffffffff811d5eb1>] ? vmpressure+0x61/0x90
    May 15 20:31:28 dvct01b kernel: [<ffffffff811801c1>] balance_pgdat+0x4b1/0x5e0
    May 15 20:31:28 dvct01b kernel: [<ffffffff81180463>] kswapd+0x173/0x450
    May 15 20:31:28 dvct01b kernel: [<ffffffff810a6ac0>] ? wake_up_atomic_t+0x30/0x30
    May 15 20:31:28 dvct01b kernel: [<ffffffff811802f0>] ? balance_pgdat+0x5e0/0x5e0
    May 15 20:31:28 dvct01b kernel: [<ffffffff810a5acf>] kthread+0xcf/0xe0
    May 15 20:31:28 dvct01b kernel: [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
    May 15 20:31:28 dvct01b kernel: [<ffffffff81645dd8>] ret_from_fork+0x58/0x90
    May 15 20:31:28 dvct01b kernel: [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
    May 15 20:31:28 dvct01b kernel: INFO: task xfsaild/rbd0:12866 blocked for more than 120 seconds.
    May 15 20:31:28 dvct01b kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    May 15 20:31:28 dvct01b kernel: xfsaild/rbd0 D ffff880424e43b90 0 12866 2 0x00000080
    May 15 20:31:28 dvct01b kernel: ffff880425d43910 0000000000000046 ffff880425b8ae00 ffff880425d43fd8
    May 15 20:31:28 dvct01b kernel: ffff880425d43fd8 ffff880425d43fd8 ffff880425b8ae00 ffff88043fdd4780
    May 15 20:31:28 dvct01b kernel: 0000000000000000 7fffffffffffffff ffffe8ffffdc1880 ffff880424e43b90
    May 15 20:31:28 dvct01b kernel: Call Trace:
    May 15 20:31:28 dvct01b kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
    May 15 20:31:28 dvct01b kernel: [<ffffffff81638b19>] schedule_timeout+0x209/0x2d0
    May 15 20:31:28 dvct01b kernel: [<ffffffff812cbd09>] ? blk_flush_plug_list+0xc9/0x230
    May 15 20:31:28 dvct01b kernel: [<ffffffff8101c829>] ? read_tsc+0x9/0x10
    May 15 20:31:28 dvct01b kernel: [<ffffffff8163a45e>] io_schedule_timeout+0xae/0x130
    May 15 20:31:28 dvct01b kernel: [<ffffffff8163a4f8>] io_schedule+0x18/0x20
    May 15 20:31:28 dvct01b kernel: [<ffffffff812d7245>] bt_get+0x135/0x1c0
    May 15 20:31:28 dvct01b kernel: [<ffffffff810a6ac0>] ? wake_up_atomic_t+0x30/0x30
    May 15 20:31:28 dvct01b kernel: [<ffffffff812d766f>] blk_mq_get_tag+0xbf/0xf0
    May 15 20:31:28 dvct01b kernel: [<ffffffff812d2c8b>] __blk_mq_alloc_request+0x1b/0x200
    May 15 20:31:28 dvct01b kernel: [<ffffffff812d4bb1>] blk_mq_map_request+0x191/0x1f0
    May 15 20:31:28 dvct01b kernel: [<ffffffff812d5c40>] blk_sq_make_request+0x80/0x380
    May 15 20:31:28 dvct01b kernel: [<ffffffff812c70ff>] ? generic_make_request_checks+0x24f/0x380
    May 15 20:31:28 dvct01b kernel: [<ffffffff8116bfd9>] ? mempool_alloc+0x69/0x170
    May 15 20:31:28 dvct01b kernel: [<ffffffff812c7312>] generic_make_request+0xe2/0x130
    May 15 20:31:28 dvct01b kernel: [<ffffffff812c73d1>] submit_bio+0x71/0x150
    May 15 20:31:28 dvct01b kernel: [<ffffffffa0207183>] _xfs_buf_ioapply+0x2f3/0x460 [xfs]
    May 15 20:31:28 dvct01b kernel: [<ffffffffa0208a38>] ? __xfs_buf_delwri_submit+0x198/0x250 [xfs]
    May 15 20:31:28 dvct01b kernel: [<ffffffffa0208744>] xfs_buf_submit+0x64/0x1c0 [xfs]
    May 15 20:31:28 dvct01b kernel: [<ffffffffa0208a38>] __xfs_buf_delwri_submit+0x198/0x250 [xfs]
    May 15 20:31:28 dvct01b kernel: [<ffffffffa020968f>] ? xfs_buf_delwri_submit_nowait+0x2f/0x50 [xfs]
    May 15 20:31:28 dvct01b kernel: [<ffffffffa0234480>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
    May 15 20:31:28 dvct01b kernel: [<ffffffffa020968f>] xfs_buf_delwri_submit_nowait+0x2f/0x50 [xfs]
    May 15 20:31:28 dvct01b kernel: [<ffffffffa02346c0>] xfsaild+0x240/0x5e0 [xfs]
    May 15 20:31:28 dvct01b kernel: [<ffffffffa0234480>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
    May 15 20:31:28 dvct01b kernel: [<ffffffff810a5acf>] kthread+0xcf/0xe0
    May 15 20:31:28 dvct01b kernel: [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
    May 15 20:31:28 dvct01b kernel: [<ffffffff81645dd8>] ret_from_fork+0x58/0x90
    May 15 20:31:28 dvct01b kernel: [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
    May 15 20:31:28 dvct01b kernel: INFO: task lmtp:76967 blocked for more than 120 seconds.
    May 15 20:31:28 dvct01b kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    May 15 20:31:28 dvct01b kernel: lmtp D ffff880425f28b80 0 76967 14464 0x00000080
    May 15 20:31:28 dvct01b kernel: ffff88012c323e40 0000000000000086 ffff880425f28b80 ffff88012c323fd8
    May 15 20:31:28 dvct01b kernel: ffff88012c323fd8 ffff88012c323fd8 ffff880425f28b80 ffff880425f28b80
    May 15 20:31:28 dvct01b kernel: ffff88042745b528 ffff88012c323ef4 0000000000000001 ffff880425f28b80
    May 15 20:31:28 dvct01b kernel: Call Trace:
    May 15 20:31:28 dvct01b kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
    May 15 20:31:28 dvct01b kernel: [<ffffffffa0229cc0>] _xfs_log_force_lsn+0x2a0/0x2f0 [xfs]
    May 15 20:31:28 dvct01b kernel: [<ffffffff810b8c00>] ? wake_up_state+0x20/0x20
    May 15 20:31:28 dvct01b kernel: [<ffffffffa020c4e0>] xfs_file_fsync+0x1b0/0x200 [xfs]
    May 15 20:31:28 dvct01b kernel: [<ffffffff8120f9c5>] do_fsync+0x65/0xa0
    May 15 20:31:28 dvct01b kernel: [<ffffffff8120fcb3>] SyS_fdatasync+0x13/0x20
    May 15 20:31:28 dvct01b kernel: [<ffffffff81645e89>] system_call_fastpath+0x16/0x1b
    May 15 20:31:28 dvct01b kernel: INFO: task lmtp:77114 blocked for more than 120 seconds.
    May 15 20:31:28 dvct01b kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    May 15 20:31:28 dvct01b kernel: lmtp D ffff88034a10fd80 0 77114 14464 0x00000080
    May 15 20:31:28 dvct01b kernel: ffff88034a10fc20 0000000000000086 ffff8800bb911700 ffff88034a10ffd8
    May 15 20:31:28 dvct01b kernel: ffff88034a10ffd8 ffff88034a10ffd8 ffff8800bb911700 ffff88043fd14780
    May 15 20:31:28 dvct01b kernel: 0000000000000000 7fffffffffffffff ffffffff81168960 ffff88034a10fd80
    May 15 20:31:28 dvct01b kernel: Call Trace:
    May 15 20:31:28 dvct01b kernel: [<ffffffff81168960>] ? wait_on_page_read+0x60/0x60
    May 15 20:31:28 dvct01b kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
    May 15 20:31:28 dvct01b kernel: [<ffffffff81638b19>] schedule_timeout+0x209/0x2d0
    May 15 20:31:28 dvct01b kernel: [<ffffffff812d44e5>] ? blk_mq_run_hw_queue+0xa5/0xd0
    May 15 20:31:28 dvct01b kernel: [<ffffffff8101c829>] ? read_tsc+0x9/0x10
    May 15 20:31:28 dvct01b kernel: [<ffffffff81168960>] ? wait_on_page_read+0x60/0x60
    May 15 20:31:28 dvct01b kernel: [<ffffffff8163a45e>] io_schedule_timeout+0xae/0x130
    May 15 20:31:28 dvct01b kernel: [<ffffffff8163a4f8>] io_schedule+0x18/0x20
    May 15 20:31:28 dvct01b kernel: [<ffffffff8116896e>] sleep_on_page+0xe/0x20
    May 15 20:31:28 dvct01b kernel: [<ffffffff81638ca0>] __wait_on_bit+0x60/0x90
    May 15 20:31:28 dvct01b kernel: [<ffffffff811686f6>] wait_on_page_bit+0x86/0xb0
    May 15 20:31:28 dvct01b kernel: [<ffffffff810a6b40>] ? wake_atomic_t_function+0x40/0x40
    May 15 20:31:28 dvct01b kernel: [<ffffffff81168831>] filemap_fdatawait_range+0x111/0x1b0
    May 15 20:31:28 dvct01b kernel: [<ffffffff8116a85f>] filemap_write_and_wait_range+0x3f/0x70
    May 15 20:31:28 dvct01b kernel: [<ffffffffa020c396>] xfs_file_fsync+0x66/0x200 [xfs]
    May 15 20:31:28 dvct01b kernel: [<ffffffff8120f9c5>] do_fsync+0x65/0xa0
    May 15 20:31:28 dvct01b kernel: [<ffffffff8120fcb3>] SyS_fdatasync+0x13/0x20
    May 15 20:31:28 dvct01b kernel: [<ffffffff81645e89>] system_call_fastpath+0x16/0x1b
    May 15 20:31:28 dvct01b kernel: INFO: task lmtp:77116 blocked for more than 120 seconds.
    May 15 20:31:28 dvct01b kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    May 15 20:31:28 dvct01b kernel: lmtp D ffff880429724500 0 77116 14464 0x00000080
    May 15 20:31:28 dvct01b kernel: ffff880110ce3e40 0000000000000086 ffff880429724500 ffff880110ce3fd8
    May 15 20:31:28 dvct01b kernel: ffff880110ce3fd8 ffff880110ce3fd8 ffff880429724500 00000110000a41c0
    May 15 20:31:28 dvct01b kernel: ffff88042745b528 ffff880110ce3ef4 ffff880424db1c18 ffff880429724500
    May 15 20:31:28 dvct01b kernel: Call Trace:
    May 15 20:31:28 dvct01b kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
    May 15 20:31:28 dvct01b kernel: [<ffffffffa0229ba4>] _xfs_log_force_lsn+0x184/0x2f0 [xfs]
    May 15 20:31:28 dvct01b kernel: [<ffffffff810b8c00>] ? wake_up_state+0x20/0x20
    May 15 20:31:28 dvct01b kernel: [<ffffffffa020c4e0>] xfs_file_fsync+0x1b0/0x200 [xfs]
    May 15 20:31:28 dvct01b kernel: [<ffffffff8120f9c5>] do_fsync+0x65/0xa0
    May 15 20:31:28 dvct01b kernel: [<ffffffff8120fcb3>] SyS_fdatasync+0x13/0x20
    May 15 20:31:28 dvct01b kernel: [<ffffffff81645e89>] system_call_fastpath+0x16/0x1b
    May 15 20:31:28 dvct01b kernel: INFO: task lmtp:77117 blocked for more than 120 seconds.
    May 15 20:31:28 dvct01b kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    May 15 20:31:28 dvct01b kernel: lmtp D ffff88042952b980 0 77117 14464 0x00000080
    May 15 20:31:28 dvct01b kernel: ffff8801033b3e40 0000000000000086 ffff88042952b980 ffff8801033b3fd8
    May 15 20:31:28 dvct01b kernel: ffff8801033b3fd8 ffff8801033b3fd8 ffff88042952b980 00000110000a4200
    May 15 20:31:28 dvct01b kernel: ffff88042745b528 ffff8801033b3ef4 ffff880424c25618 ffff88042952b980
    May 15 20:31:28 dvct01b kernel: Call Trace:
    May 15 20:31:28 dvct01b kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
    May 15 20:31:28 dvct01b kernel: [<ffffffffa0229ba4>] _xfs_log_force_lsn+0x184/0x2f0 [xfs]
    May 15 20:31:28 dvct01b kernel: [<ffffffff810b8c00>] ? wake_up_state+0x20/0x20
    May 15 20:31:28 dvct01b kernel: [<ffffffffa020c4e0>] xfs_file_fsync+0x1b0/0x200 [xfs]
    May 15 20:31:28 dvct01b kernel: [<ffffffff8120f9c5>] do_fsync+0x65/0xa0
    May 15 20:31:28 dvct01b kernel: [<ffffffff8120fcb3>] SyS_fdatasync+0x13/0x20
    May 15 20:31:28 dvct01b kernel: [<ffffffff81645e89>] system_call_fastpath+0x16/0x1b
    May 15 20:31:28 dvct01b kernel: INFO: task lmtp:77118 blocked for more than 120 seconds.
    May 15 20:31:28 dvct01b kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    May 15 20:31:28 dvct01b kernel: lmtp D ffff88042952a280 0 77118 14464 0x00000080
    May 15 20:31:28 dvct01b kernel: ffff88012f84fe40 0000000000000086 ffff88042952a280 ffff88012f84ffd8
    May 15 20:31:28 dvct01b kernel: ffff88012f84ffd8 ffff88012f84ffd8 ffff88042952a280 ffff88042952a280
    May 15 20:31:28 dvct01b kernel: ffff88042745b528 ffff88012f84fef4 0000000000000001 ffff88042952a280
    May 15 20:31:28 dvct01b kernel: Call Trace:
    May 15 20:31:28 dvct01b kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
    May 15 20:31:28 dvct01b kernel: [<ffffffffa0229cc0>] _xfs_log_force_lsn+0x2a0/0x2f0 [xfs]
    May 15 20:31:28 dvct01b kernel: [<ffffffff810b8c00>] ? wake_up_state+0x20/0x20
    May 15 20:31:28 dvct01b kernel: [<ffffffffa020c4e0>] xfs_file_fsync+0x1b0/0x200 [xfs]
    May 15 20:31:28 dvct01b kernel: [<ffffffff8120f9c5>] do_fsync+0x65/0xa0
    May 15 20:31:28 dvct01b kernel: [<ffffffff8120fcb3>] SyS_fdatasync+0x13/0x20
    May 15 20:31:28 dvct01b kernel: [<ffffffff81645e89>] system_call_fastpath+0x16/0x1b
    May 15 20:31:28 dvct01b kernel: INFO: task lmtp:77119 blocked for more than 120 seconds.
    May 15 20:31:28 dvct01b kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    May 15 20:31:28 dvct01b kernel: lmtp D ffff88017bbc3d80 0 77119 14464 0x00000080
    May 15 20:31:28 dvct01b kernel: ffff88017bbc3c20 0000000000000086 ffff880425a83980 ffff88017bbc3fd8
    May 15 20:31:28 dvct01b kernel: ffff88017bbc3fd8 ffff88017bbc3fd8 ffff880425a83980 ffff88043fcd4780
    May 15 20:31:28 dvct01b kernel: 0000000000000000 7fffffffffffffff ffffffff81168960 ffff88017bbc3d80
    May 15 20:31:28 dvct01b kernel: Call Trace:
    May 15 20:31:28 dvct01b kernel: [<ffffffff81168960>] ? wait_on_page_read+0x60/0x60
    May 15 20:31:28 dvct01b kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
    May 15 20:31:28 dvct01b kernel: [<ffffffff81638b19>] schedule_timeout+0x209/0x2d0
    May 15 20:31:28 dvct01b kernel: [<ffffffff812d44e5>] ? blk_mq_run_hw_queue+0xa5/0xd0
    May 15 20:31:28 dvct01b kernel: [<ffffffff8101c829>] ? read_tsc+0x9/0x10
    May 15 20:31:28 dvct01b kernel: [<ffffffff81168960>] ? wait_on_page_read+0x60/0x60
    May 15 20:31:28 dvct01b kernel: [<ffffffff8163a45e>] io_schedule_timeout+0xae/0x130
    May 15 20:31:28 dvct01b kernel: [<ffffffff8163a4f8>] io_schedule+0x18/0x20
    May 15 20:31:28 dvct01b kernel: [<ffffffff8116896e>] sleep_on_page+0xe/0x20
    May 15 20:31:28 dvct01b kernel: [<ffffffff81638ca0>] __wait_on_bit+0x60/0x90
    May 15 20:31:28 dvct01b kernel: [<ffffffff811686f6>] wait_on_page_bit+0x86/0xb0
    May 15 20:31:28 dvct01b kernel: [<ffffffff810a6b40>] ? wake_atomic_t_function+0x40/0x40
    May 15 20:31:28 dvct01b kernel: [<ffffffff81168831>] filemap_fdatawait_range+0x111/0x1b0
    May 15 20:31:28 dvct01b kernel: [<ffffffff8116a85f>] filemap_write_and_wait_range+0x3f/0x70
    May 15 20:31:28 dvct01b kernel: [<ffffffffa020c396>] xfs_file_fsync+0x66/0x200 [xfs]
    May 15 20:31:28 dvct01b kernel: [<ffffffff8120f9c5>] do_fsync+0x65/0xa0
    May 15 20:31:28 dvct01b kernel: [<ffffffff8120fcb3>] SyS_fdatasync+0x13/0x20
    May 15 20:31:28 dvct01b kernel: [<ffffffff81645e89>] system_call_fastpath+0x16/0x1b
    May 15 20:31:28 dvct01b kernel: INFO: task lmtp:77121 blocked for more than 120 seconds.
    May 15 20:31:28 dvct01b kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    May 15 20:31:28 dvct01b kernel: lmtp D ffff8801c11cbd80 0 77121 14464 0x00000080
    May 15 20:31:28 dvct01b kernel: ffff8801c11cbc20 0000000000000086 ffff880425a80b80 ffff8801c11cbfd8
    May 15 20:31:28 dvct01b kernel: ffff8801c11cbfd8 ffff8801c11cbfd8 ffff880425a80b80 ffff88043fc94780
    May 15 20:31:28 dvct01b kernel: 0000000000000000 7fffffffffffffff ffffffff81168960 ffff8801c11cbd80
    May 15 20:31:28 dvct01b kernel: Call Trace:
    May 15 20:31:28 dvct01b kernel: [<ffffffff81168960>] ? wait_on_page_read+0x60/0x60
    May 15 20:31:28 dvct01b kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
    May 15 20:31:28 dvct01b kernel: [<ffffffff81638b19>] schedule_timeout+0x209/0x2d0
    May 15 20:31:28 dvct01b kernel: [<ffffffff812d44e5>] ? blk_mq_run_hw_queue+0xa5/0xd0
    May 15 20:31:28 dvct01b kernel: [<ffffffff8101c829>] ? read_tsc+0x9/0x10
    May 15 20:31:28 dvct01b kernel: [<ffffffff81168960>] ? wait_on_page_read+0x60/0x60
    May 15 20:31:28 dvct01b kernel: [<ffffffff8163a45e>] io_schedule_timeout+0xae/0x130
    May 15 20:31:28 dvct01b kernel: [<ffffffff8163a4f8>] io_schedule+0x18/0x20
    May 15 20:31:28 dvct01b kernel: [<ffffffff8116896e>] sleep_on_page+0xe/0x20
    May 15 20:31:28 dvct01b kernel: [<ffffffff81638ca0>] __wait_on_bit+0x60/0x90
    May 15 20:31:28 dvct01b kernel: [<ffffffff811686f6>] wait_on_page_bit+0x86/0xb0
    May 15 20:31:28 dvct01b kernel: [<ffffffff810a6b40>] ? wake_atomic_t_function+0x40/0x40
    May 15 20:31:28 dvct01b kernel: [<ffffffff81168831>] filemap_fdatawait_range+0x111/0x1b0
    May 15 20:31:28 dvct01b kernel: [<ffffffff8116a85f>] filemap_write_and_wait_range+0x3f/0x70
    May 15 20:31:28 dvct01b kernel: [<ffffffffa020c396>] xfs_file_fsync+0x66/0x200 [xfs]
    May 15 20:31:28 dvct01b kernel: [<ffffffff8120f9c5>] do_fsync+0x65/0xa0
    May 15 20:31:28 dvct01b kernel: [<ffffffff8120fcb3>] SyS_fdatasync+0x13/0x20
    May 15 20:31:28 dvct01b kernel: [<ffffffff81645e89>] system_call_fastpath+0x16/0x1b
    May 15 20:31:28 dvct01b kernel: INFO: task lmtp:77122 blocked for more than 120 seconds.
    May 15 20:31:28 dvct01b kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    May 15 20:31:28 dvct01b kernel: lmtp D ffff880179d4bd80 0 77122 14464 0x00000080
    May 15 20:31:28 dvct01b kernel: ffff880179d4bc20 0000000000000082 ffff880425528b80 ffff880179d4bfd8
    May 15 20:31:28 dvct01b kernel: ffff880179d4bfd8 ffff880179d4bfd8 ffff880425528b80 ffff88043fdd4780
    May 15 20:31:28 dvct01b kernel: 0000000000000000 7fffffffffffffff ffffffff81168960 ffff880179d4bd80
    May 15 20:31:28 dvct01b kernel: Call Trace:
    May 15 20:31:28 dvct01b kernel: [<ffffffff81168960>] ? wait_on_page_read+0x60/0x60
    May 15 20:31:28 dvct01b kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
    May 15 20:31:28 dvct01b kernel: [<ffffffff81638b19>] schedule_timeout+0x209/0x2d0
    May 15 20:31:28 dvct01b kernel: [<ffffffff812d44e5>] ? blk_mq_run_hw_queue+0xa5/0xd0
    May 15 20:31:28 dvct01b kernel: [<ffffffff8101c829>] ? read_tsc+0x9/0x10
    May 15 20:31:28 dvct01b kernel: [<ffffffff81168960>] ? wait_on_page_read+0x60/0x60
    May 15 20:31:28 dvct01b kernel: [<ffffffff8163a45e>] io_schedule_timeout+0xae/0x130
    May 15 20:31:28 dvct01b kernel: [<ffffffff8163a4f8>] io_schedule+0x18/0x20
    May 15 20:31:28 dvct01b kernel: [<ffffffff8116896e>] sleep_on_page+0xe/0x20
    May 15 20:31:28 dvct01b kernel: [<ffffffff81638ca0>] __wait_on_bit+0x60/0x90
    May 15 20:31:28 dvct01b kernel: [<ffffffff811686f6>] wait_on_page_bit+0x86/0xb0
    May 15 20:31:28 dvct01b kernel: [<ffffffff810a6b40>] ? wake_atomic_t_function+0x40/0x40
    May 15 20:31:28 dvct01b kernel: [<ffffffff81168831>] filemap_fdatawait_range+0x111/0x1b0
    May 15 20:31:28 dvct01b kernel: [<ffffffff8116a85f>] filemap_write_and_wait_range+0x3f/0x70
    May 15 20:31:28 dvct01b kernel: [<ffffffffa020c396>] xfs_file_fsync+0x66/0x200 [xfs]
    May 15 20:31:28 dvct01b kernel: [<ffffffff8120f9c5>] do_fsync+0x65/0xa0
    May 15 20:31:28 dvct01b kernel: [<ffffffff8120fcb3>] SyS_fdatasync+0x13/0x20
    May 15 20:31:28 dvct01b kernel: [<ffffffff81645e89>] system_call_fastpath+0x16/0x1b

Related issues 1 (0 open1 closed)

Related to Linux kernel client - Bug #14022: map_sem for read + request_mutex are held indefinitelyResolvedIlya Dryomov12/08/2015

Actions
Actions #1

Updated by Shinobu Kinjo almost 8 years ago

ceph.conf
[snip]
debug_rbd_replay = 0/5
rbd_op_threads = 1
rbd_op_thread_timeout = 60
rbd_non_blocking_aio = true
rbd_cache = true
rbd_cache_writethrough_until_flush = true
rbd_cache_size = 33554432
rbd_cache_max_dirty = 25165824
rbd_cache_target_dirty = 16777216
rbd_cache_max_dirty_age = 1
rbd_cache_max_dirty_object = 0
rbd_cache_block_writes_upfront = false
rbd_concurrent_management_ops = 10
rbd_balance_snap_reads = false
rbd_localize_snap_reads = false
rbd_balance_parent_reads = false
rbd_localize_parent_reads = true
rbd_readahead_trigger_requests = 10
rbd_readahead_max_bytes = 524288
rbd_readahead_disable_after_bytes = 52428800
rbd_clone_copy_on_read = false
rbd_blacklist_on_break_lock = true
rbd_blacklist_expire_seconds = 0
rbd_request_timed_out_seconds = 30
rbd_validate_pool = true
rbd_tracing = false
rbd_default_format = 1
rbd_default_order = 22
rbd_default_stripe_count = 0
rbd_default_stripe_unit = 0
rbd_default_features = 3
[snip]

Actions #2

Updated by Shinobu Kinjo almost 8 years ago

There were 50k per hour random write to rbd utilized with xfs on the client.
Each write size was 80k bytes.

kernel
3.10.0-327.el7.x86_64

  1. During Stalled
    ceph -s
    cluster b9b3c4d2-bfbb-4b0d-b3e5-da9d63729c9e
    health HEALTH_WARN
    too few PGs per OSD (21 < min 30)
    monmap e1: 3 mons at {mon01=10.0.0.4:6789/0,mon02=10.0.0.5:6789/0,mon03=10.0.0.6:6789/0}
    election epoch 4, quorum 0,1,2 mon01a,mon01b,mon01c
    osdmap e65: 9 osds: 9 up, 9 in
    pgmap v355460: 64 pgs, 1 pools, 1464 GB data, 416 kobjects
    4229 GB used, 12525 GB / 16754 GB avail
    64 active+clean
    client io 7431 kB/s wr, 1909 op/s
Actions #3

Updated by Shinobu Kinjo almost 8 years ago

There are 2 clients mapping same rbd block device on top of rbd pool.

Actions #4

Updated by Shinobu Kinjo almost 8 years ago

Here is 2nd hung up.

May 15 20:31:28 dvct02 kernel: INFO: task kswapd0:76 blocked for more than 120 seconds.
May 15 20:31:28 d02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 15 20:31:28 d02 kernel: kswapd0 D ffff880425fe2280 0 76 2 0x00000000
May 15 20:31:28 d02 kernel: ffff880425817910 0000000000000046 ffff880425fe2280 ffff880425817fd8
May 15 20:31:28 d02 kernel: ffff880425817fd8 ffff880425817fd8 ffff880425fe2280 00000110000a41c0
May 15 20:31:28 d02 kernel: ffff88042745b528 0000000000000000 ffff880424db1c18 ffff880425fe2280
May 15 20:31:28 d02 kernel: Call Trace:
May 15 20:31:28 d02 kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
May 15 20:31:28 d02 kernel: [<ffffffffa0229ba4>] _xfs_log_force_lsn+0x184/0x2f0 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff810b8c00>] ? wake_up_state+0x20/0x20
May 15 20:31:28 d02 kernel: [<ffffffffa0229d3e>] xfs_log_force_lsn+0x2e/0x90 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffffa021bfd9>] ? xfs_iunpin_wait+0x19/0x20 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffffa02184c7>] __xfs_iunpin_wait+0xa7/0x150 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff810a6b40>] ? wake_atomic_t_function+0x40/0x40
May 15 20:31:28 d02 kernel: [<ffffffffa021bfd9>] xfs_iunpin_wait+0x19/0x20 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffffa021085c>] xfs_reclaim_inode+0x8c/0x350 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffffa0210d87>] xfs_reclaim_inodes_ag+0x267/0x390 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffffa0211933>] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffffa02208a5>] xfs_fs_free_cached_objects+0x15/0x20 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff811e0c48>] prune_super+0xe8/0x170
May 15 20:31:28 d02 kernel: [<ffffffff8117c555>] shrink_slab+0x165/0x300
May 15 20:31:28 d02 kernel: [<ffffffff811d5eb1>] ? vmpressure+0x61/0x90
May 15 20:31:28 d02 kernel: [<ffffffff811801c1>] balance_pgdat+0x4b1/0x5e0
May 15 20:31:28 d02 kernel: [<ffffffff81180463>] kswapd+0x173/0x450
May 15 20:31:28 d02 kernel: [<ffffffff810a6ac0>] ? wake_up_atomic_t+0x30/0x30
May 15 20:31:28 d02 kernel: [<ffffffff811802f0>] ? balance_pgdat+0x5e0/0x5e0
May 15 20:31:28 d02 kernel: [<ffffffff810a5acf>] kthread+0xcf/0xe0
May 15 20:31:28 d02 kernel: [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
May 15 20:31:28 d02 kernel: [<ffffffff81645dd8>] ret_from_fork+0x58/0x90
May 15 20:31:28 d02 kernel: [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
May 15 20:31:28 d02 kernel: INFO: task xfsaild/rbd0:12866 blocked for more than 120 seconds.
May 15 20:31:28 d02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 15 20:31:28 d02 kernel: xfsaild/rbd0 D ffff880424e43b90 0 12866 2 0x00000080
May 15 20:31:28 d02 kernel: ffff880425d43910 0000000000000046 ffff880425b8ae00 ffff880425d43fd8
May 15 20:31:28 d02 kernel: ffff880425d43fd8 ffff880425d43fd8 ffff880425b8ae00 ffff88043fdd4780
May 15 20:31:28 d02 kernel: 0000000000000000 7fffffffffffffff ffffe8ffffdc1880 ffff880424e43b90
May 15 20:31:28 d02 kernel: Call Trace:
May 15 20:31:28 d02 kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
May 15 20:31:28 d02 kernel: [<ffffffff81638b19>] schedule_timeout+0x209/0x2d0
May 15 20:31:28 d02 kernel: [<ffffffff812cbd09>] ? blk_flush_plug_list+0xc9/0x230
May 15 20:31:28 d02 kernel: [<ffffffff8101c829>] ? read_tsc+0x9/0x10
May 15 20:31:28 d02 kernel: [<ffffffff8163a45e>] io_schedule_timeout+0xae/0x130
May 15 20:31:28 d02 kernel: [<ffffffff8163a4f8>] io_schedule+0x18/0x20
May 15 20:31:28 d02 kernel: [<ffffffff812d7245>] bt_get+0x135/0x1c0
May 15 20:31:28 d02 kernel: [<ffffffff810a6ac0>] ? wake_up_atomic_t+0x30/0x30
May 15 20:31:28 d02 kernel: [<ffffffff812d766f>] blk_mq_get_tag+0xbf/0xf0
May 15 20:31:28 d02 kernel: [<ffffffff812d2c8b>] __blk_mq_alloc_request+0x1b/0x200
May 15 20:31:28 d02 kernel: [<ffffffff812d4bb1>] blk_mq_map_request+0x191/0x1f0
May 15 20:31:28 d02 kernel: [<ffffffff812d5c40>] blk_sq_make_request+0x80/0x380
May 15 20:31:28 d02 kernel: [<ffffffff812c70ff>] ? generic_make_request_checks+0x24f/0x380
May 15 20:31:28 d02 kernel: [<ffffffff8116bfd9>] ? mempool_alloc+0x69/0x170
May 15 20:31:28 d02 kernel: [<ffffffff812c7312>] generic_make_request+0xe2/0x130
May 15 20:31:28 d02 kernel: [<ffffffff812c73d1>] submit_bio+0x71/0x150
May 15 20:31:28 d02 kernel: [<ffffffffa0207183>] _xfs_buf_ioapply+0x2f3/0x460 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffffa0208a38>] ? __xfs_buf_delwri_submit+0x198/0x250 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffffa0208744>] xfs_buf_submit+0x64/0x1c0 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffffa0208a38>] __xfs_buf_delwri_submit+0x198/0x250 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffffa020968f>] ? xfs_buf_delwri_submit_nowait+0x2f/0x50 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffffa0234480>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffffa020968f>] xfs_buf_delwri_submit_nowait+0x2f/0x50 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffffa02346c0>] xfsaild+0x240/0x5e0 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffffa0234480>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff810a5acf>] kthread+0xcf/0xe0
May 15 20:31:28 d02 kernel: [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
May 15 20:31:28 d02 kernel: [<ffffffff81645dd8>] ret_from_fork+0x58/0x90
May 15 20:31:28 d02 kernel: [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
May 15 20:31:28 d02 kernel: INFO: task lmtp:76967 blocked for more than 120 seconds.
May 15 20:31:28 d02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 15 20:31:28 d02 kernel: lmtp D ffff880425f28b80 0 76967 14464 0x00000080
May 15 20:31:28 d02 kernel: ffff88012c323e40 0000000000000086 ffff880425f28b80 ffff88012c323fd8
May 15 20:31:28 d02 kernel: ffff88012c323fd8 ffff88012c323fd8 ffff880425f28b80 ffff880425f28b80
May 15 20:31:28 d02 kernel: ffff88042745b528 ffff88012c323ef4 0000000000000001 ffff880425f28b80
May 15 20:31:28 d02 kernel: Call Trace:
May 15 20:31:28 d02 kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
May 15 20:31:28 d02 kernel: [<ffffffffa0229cc0>] _xfs_log_force_lsn+0x2a0/0x2f0 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff810b8c00>] ? wake_up_state+0x20/0x20
May 15 20:31:28 d02 kernel: [<ffffffffa020c4e0>] xfs_file_fsync+0x1b0/0x200 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff8120f9c5>] do_fsync+0x65/0xa0
May 15 20:31:28 d02 kernel: [<ffffffff8120fcb3>] SyS_fdatasync+0x13/0x20
May 15 20:31:28 d02 kernel: [<ffffffff81645e89>] system_call_fastpath+0x16/0x1b
May 15 20:31:28 d02 kernel: INFO: task lmtp:77114 blocked for more than 120 seconds.
May 15 20:31:28 d02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 15 20:31:28 d02 kernel: lmtp D ffff88034a10fd80 0 77114 14464 0x00000080
May 15 20:31:28 d02 kernel: ffff88034a10fc20 0000000000000086 ffff8800bb911700 ffff88034a10ffd8
May 15 20:31:28 d02 kernel: ffff88034a10ffd8 ffff88034a10ffd8 ffff8800bb911700 ffff88043fd14780
May 15 20:31:28 d02 kernel: 0000000000000000 7fffffffffffffff ffffffff81168960 ffff88034a10fd80
May 15 20:31:28 d02 kernel: Call Trace:
May 15 20:31:28 d02 kernel: [<ffffffff81168960>] ? wait_on_page_read+0x60/0x60
May 15 20:31:28 d02 kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
May 15 20:31:28 d02 kernel: [<ffffffff81638b19>] schedule_timeout+0x209/0x2d0
May 15 20:31:28 d02 kernel: [<ffffffff812d44e5>] ? blk_mq_run_hw_queue+0xa5/0xd0
May 15 20:31:28 d02 kernel: [<ffffffff8101c829>] ? read_tsc+0x9/0x10
May 15 20:31:28 d02 kernel: [<ffffffff81168960>] ? wait_on_page_read+0x60/0x60
May 15 20:31:28 d02 kernel: [<ffffffff8163a45e>] io_schedule_timeout+0xae/0x130
May 15 20:31:28 d02 kernel: [<ffffffff8163a4f8>] io_schedule+0x18/0x20
May 15 20:31:28 d02 kernel: [<ffffffff8116896e>] sleep_on_page+0xe/0x20
May 15 20:31:28 d02 kernel: [<ffffffff81638ca0>] __wait_on_bit+0x60/0x90
May 15 20:31:28 d02 kernel: [<ffffffff811686f6>] wait_on_page_bit+0x86/0xb0
May 15 20:31:28 d02 kernel: [<ffffffff810a6b40>] ? wake_atomic_t_function+0x40/0x40
May 15 20:31:28 d02 kernel: [<ffffffff81168831>] filemap_fdatawait_range+0x111/0x1b0
May 15 20:31:28 d02 kernel: [<ffffffff8116a85f>] filemap_write_and_wait_range+0x3f/0x70
May 15 20:31:28 d02 kernel: [<ffffffffa020c396>] xfs_file_fsync+0x66/0x200 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff8120f9c5>] do_fsync+0x65/0xa0
May 15 20:31:28 d02 kernel: [<ffffffff8120fcb3>] SyS_fdatasync+0x13/0x20
May 15 20:31:28 d02 kernel: [<ffffffff81645e89>] system_call_fastpath+0x16/0x1b
May 15 20:31:28 d02 kernel: INFO: task lmtp:77116 blocked for more than 120 seconds.
May 15 20:31:28 d02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 15 20:31:28 d02 kernel: lmtp D ffff880429724500 0 77116 14464 0x00000080
May 15 20:31:28 d02 kernel: ffff880110ce3e40 0000000000000086 ffff880429724500 ffff880110ce3fd8
May 15 20:31:28 d02 kernel: ffff880110ce3fd8 ffff880110ce3fd8 ffff880429724500 00000110000a41c0
May 15 20:31:28 d02 kernel: ffff88042745b528 ffff880110ce3ef4 ffff880424db1c18 ffff880429724500
May 15 20:31:28 d02 kernel: Call Trace:
May 15 20:31:28 d02 kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
May 15 20:31:28 d02 kernel: [<ffffffffa0229ba4>] _xfs_log_force_lsn+0x184/0x2f0 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff810b8c00>] ? wake_up_state+0x20/0x20
May 15 20:31:28 d02 kernel: [<ffffffffa020c4e0>] xfs_file_fsync+0x1b0/0x200 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff8120f9c5>] do_fsync+0x65/0xa0
May 15 20:31:28 d02 kernel: [<ffffffff8120fcb3>] SyS_fdatasync+0x13/0x20
May 15 20:31:28 d02 kernel: [<ffffffff81645e89>] system_call_fastpath+0x16/0x1b
May 15 20:31:28 d02 kernel: INFO: task lmtp:77117 blocked for more than 120 seconds.
May 15 20:31:28 d02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 15 20:31:28 d02 kernel: lmtp D ffff88042952b980 0 77117 14464 0x00000080
May 15 20:31:28 d02 kernel: ffff8801033b3e40 0000000000000086 ffff88042952b980 ffff8801033b3fd8
May 15 20:31:28 d02 kernel: ffff8801033b3fd8 ffff8801033b3fd8 ffff88042952b980 00000110000a4200
May 15 20:31:28 d02 kernel: ffff88042745b528 ffff8801033b3ef4 ffff880424c25618 ffff88042952b980
May 15 20:31:28 d02 kernel: Call Trace:
May 15 20:31:28 d02 kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
May 15 20:31:28 d02 kernel: [<ffffffffa0229ba4>] _xfs_log_force_lsn+0x184/0x2f0 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff810b8c00>] ? wake_up_state+0x20/0x20
May 15 20:31:28 d02 kernel: [<ffffffffa020c4e0>] xfs_file_fsync+0x1b0/0x200 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff8120f9c5>] do_fsync+0x65/0xa0
May 15 20:31:28 d02 kernel: [<ffffffff8120fcb3>] SyS_fdatasync+0x13/0x20
May 15 20:31:28 d02 kernel: [<ffffffff81645e89>] system_call_fastpath+0x16/0x1b
May 15 20:31:28 d02 kernel: INFO: task lmtp:77118 blocked for more than 120 seconds.
May 15 20:31:28 d02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 15 20:31:28 d02 kernel: lmtp D ffff88042952a280 0 77118 14464 0x00000080
May 15 20:31:28 d02 kernel: ffff88012f84fe40 0000000000000086 ffff88042952a280 ffff88012f84ffd8
May 15 20:31:28 d02 kernel: ffff88012f84ffd8 ffff88012f84ffd8 ffff88042952a280 ffff88042952a280
May 15 20:31:28 d02 kernel: ffff88042745b528 ffff88012f84fef4 0000000000000001 ffff88042952a280
May 15 20:31:28 d02 kernel: Call Trace:
May 15 20:31:28 d02 kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
May 15 20:31:28 d02 kernel: [<ffffffffa0229cc0>] _xfs_log_force_lsn+0x2a0/0x2f0 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff810b8c00>] ? wake_up_state+0x20/0x20
May 15 20:31:28 d02 kernel: [<ffffffffa020c4e0>] xfs_file_fsync+0x1b0/0x200 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff8120f9c5>] do_fsync+0x65/0xa0
May 15 20:31:28 d02 kernel: [<ffffffff8120fcb3>] SyS_fdatasync+0x13/0x20
May 15 20:31:28 d02 kernel: [<ffffffff81645e89>] system_call_fastpath+0x16/0x1b
May 15 20:31:28 d02 kernel: INFO: task lmtp:77119 blocked for more than 120 seconds.
May 15 20:31:28 d02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 15 20:31:28 d02 kernel: lmtp D ffff88017bbc3d80 0 77119 14464 0x00000080
May 15 20:31:28 d02 kernel: ffff88017bbc3c20 0000000000000086 ffff880425a83980 ffff88017bbc3fd8
May 15 20:31:28 d02 kernel: ffff88017bbc3fd8 ffff88017bbc3fd8 ffff880425a83980 ffff88043fcd4780
May 15 20:31:28 d02 kernel: 0000000000000000 7fffffffffffffff ffffffff81168960 ffff88017bbc3d80
May 15 20:31:28 d02 kernel: Call Trace:
May 15 20:31:28 d02 kernel: [<ffffffff81168960>] ? wait_on_page_read+0x60/0x60
May 15 20:31:28 d02 kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
May 15 20:31:28 d02 kernel: [<ffffffff81638b19>] schedule_timeout+0x209/0x2d0
May 15 20:31:28 d02 kernel: [<ffffffff812d44e5>] ? blk_mq_run_hw_queue+0xa5/0xd0
May 15 20:31:28 d02 kernel: [<ffffffff8101c829>] ? read_tsc+0x9/0x10
May 15 20:31:28 d02 kernel: [<ffffffff81168960>] ? wait_on_page_read+0x60/0x60
May 15 20:31:28 d02 kernel: [<ffffffff8163a45e>] io_schedule_timeout+0xae/0x130
May 15 20:31:28 d02 kernel: [<ffffffff8163a4f8>] io_schedule+0x18/0x20
May 15 20:31:28 d02 kernel: [<ffffffff8116896e>] sleep_on_page+0xe/0x20
May 15 20:31:28 d02 kernel: [<ffffffff81638ca0>] __wait_on_bit+0x60/0x90
May 15 20:31:28 d02 kernel: [<ffffffff811686f6>] wait_on_page_bit+0x86/0xb0
May 15 20:31:28 d02 kernel: [<ffffffff810a6b40>] ? wake_atomic_t_function+0x40/0x40
May 15 20:31:28 d02 kernel: [<ffffffff81168831>] filemap_fdatawait_range+0x111/0x1b0
May 15 20:31:28 d02 kernel: [<ffffffff8116a85f>] filemap_write_and_wait_range+0x3f/0x70
May 15 20:31:28 d02 kernel: [<ffffffffa020c396>] xfs_file_fsync+0x66/0x200 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff8120f9c5>] do_fsync+0x65/0xa0
May 15 20:31:28 d02 kernel: [<ffffffff8120fcb3>] SyS_fdatasync+0x13/0x20
May 15 20:31:28 d02 kernel: [<ffffffff81645e89>] system_call_fastpath+0x16/0x1b
May 15 20:31:28 d02 kernel: INFO: task lmtp:77121 blocked for more than 120 seconds.
May 15 20:31:28 d02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 15 20:31:28 d02 kernel: lmtp D ffff8801c11cbd80 0 77121 14464 0x00000080
May 15 20:31:28 d02 kernel: ffff8801c11cbc20 0000000000000086 ffff880425a80b80 ffff8801c11cbfd8
May 15 20:31:28 d02 kernel: ffff8801c11cbfd8 ffff8801c11cbfd8 ffff880425a80b80 ffff88043fc94780
May 15 20:31:28 d02 kernel: 0000000000000000 7fffffffffffffff ffffffff81168960 ffff8801c11cbd80
May 15 20:31:28 d02 kernel: Call Trace:
May 15 20:31:28 d02 kernel: [<ffffffff81168960>] ? wait_on_page_read+0x60/0x60
May 15 20:31:28 d02 kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
May 15 20:31:28 d02 kernel: [<ffffffff81638b19>] schedule_timeout+0x209/0x2d0
May 15 20:31:28 d02 kernel: [<ffffffff812d44e5>] ? blk_mq_run_hw_queue+0xa5/0xd0
May 15 20:31:28 d02 kernel: [<ffffffff8101c829>] ? read_tsc+0x9/0x10
May 15 20:31:28 d02 kernel: [<ffffffff81168960>] ? wait_on_page_read+0x60/0x60
May 15 20:31:28 d02 kernel: [<ffffffff8163a45e>] io_schedule_timeout+0xae/0x130
May 15 20:31:28 d02 kernel: [<ffffffff8163a4f8>] io_schedule+0x18/0x20
May 15 20:31:28 d02 kernel: [<ffffffff8116896e>] sleep_on_page+0xe/0x20
May 15 20:31:28 d02 kernel: [<ffffffff81638ca0>] __wait_on_bit+0x60/0x90
May 15 20:31:28 d02 kernel: [<ffffffff811686f6>] wait_on_page_bit+0x86/0xb0
May 15 20:31:28 d02 kernel: [<ffffffff810a6b40>] ? wake_atomic_t_function+0x40/0x40
May 15 20:31:28 d02 kernel: [<ffffffff81168831>] filemap_fdatawait_range+0x111/0x1b0
May 15 20:31:28 d02 kernel: [<ffffffff8116a85f>] filemap_write_and_wait_range+0x3f/0x70
May 15 20:31:28 d02 kernel: [<ffffffffa020c396>] xfs_file_fsync+0x66/0x200 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff8120f9c5>] do_fsync+0x65/0xa0
May 15 20:31:28 d02 kernel: [<ffffffff8120fcb3>] SyS_fdatasync+0x13/0x20
May 15 20:31:28 d02 kernel: [<ffffffff81645e89>] system_call_fastpath+0x16/0x1b
May 15 20:31:28 d02 kernel: INFO: task lmtp:77122 blocked for more than 120 seconds.
May 15 20:31:28 d02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 15 20:31:28 d02 kernel: lmtp D ffff880179d4bd80 0 77122 14464 0x00000080
May 15 20:31:28 d02 kernel: ffff880179d4bc20 0000000000000082 ffff880425528b80 ffff880179d4bfd8
May 15 20:31:28 d02 kernel: ffff880179d4bfd8 ffff880179d4bfd8 ffff880425528b80 ffff88043fdd4780
May 15 20:31:28 d02 kernel: 0000000000000000 7fffffffffffffff ffffffff81168960 ffff880179d4bd80
May 15 20:31:28 d02 kernel: Call Trace:
May 15 20:31:28 d02 kernel: [<ffffffff81168960>] ? wait_on_page_read+0x60/0x60
May 15 20:31:28 d02 kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
May 15 20:31:28 d02 kernel: [<ffffffff81638b19>] schedule_timeout+0x209/0x2d0
May 15 20:31:28 d02 kernel: [<ffffffff812d44e5>] ? blk_mq_run_hw_queue+0xa5/0xd0
May 15 20:31:28 d02 kernel: [<ffffffff8101c829>] ? read_tsc+0x9/0x10
May 15 20:31:28 d02 kernel: [<ffffffff81168960>] ? wait_on_page_read+0x60/0x60
May 15 20:31:28 d02 kernel: [<ffffffff8163a45e>] io_schedule_timeout+0xae/0x130
May 15 20:31:28 d02 kernel: [<ffffffff8163a4f8>] io_schedule+0x18/0x20
May 15 20:31:28 d02 kernel: [<ffffffff8116896e>] sleep_on_page+0xe/0x20
May 15 20:31:28 d02 kernel: [<ffffffff81638ca0>] __wait_on_bit+0x60/0x90
May 15 20:31:28 d02 kernel: [<ffffffff811686f6>] wait_on_page_bit+0x86/0xb0
May 15 20:31:28 d02 kernel: [<ffffffff810a6b40>] ? wake_atomic_t_function+0x40/0x40
May 15 20:31:28 d02 kernel: [<ffffffff81168831>] filemap_fdatawait_range+0x111/0x1b0
May 15 20:31:28 d02 kernel: [<ffffffff8116a85f>] filemap_write_and_wait_range+0x3f/0x70
May 15 20:31:28 d02 kernel: [<ffffffffa020c396>] xfs_file_fsync+0x66/0x200 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff8120f9c5>] do_fsync+0x65/0xa0
May 15 20:31:28 d02 kernel: [<ffffffff8120fcb3>] SyS_fdatasync+0x13/0x20
May 15 20:31:28 d02 kernel: [<ffffffff81645e89>] system_call_fastpath+0x16/0x1b

Actions #5

Updated by Shinobu Kinjo almost 8 years ago

Here is 2nd hung up.

May 15 20:31:28 d02 kernel: INFO: task kswapd0:76 blocked for more than 120 seconds.
May 15 20:31:28 d02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 15 20:31:28 d02 kernel: kswapd0 D ffff880425fe2280 0 76 2 0x00000000
May 15 20:31:28 d02 kernel: ffff880425817910 0000000000000046 ffff880425fe2280 ffff880425817fd8
May 15 20:31:28 d02 kernel: ffff880425817fd8 ffff880425817fd8 ffff880425fe2280 00000110000a41c0
May 15 20:31:28 d02 kernel: ffff88042745b528 0000000000000000 ffff880424db1c18 ffff880425fe2280
May 15 20:31:28 d02 kernel: Call Trace:
May 15 20:31:28 d02 kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
May 15 20:31:28 d02 kernel: [<ffffffffa0229ba4>] _xfs_log_force_lsn+0x184/0x2f0 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff810b8c00>] ? wake_up_state+0x20/0x20
May 15 20:31:28 d02 kernel: [<ffffffffa0229d3e>] xfs_log_force_lsn+0x2e/0x90 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffffa021bfd9>] ? xfs_iunpin_wait+0x19/0x20 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffffa02184c7>] __xfs_iunpin_wait+0xa7/0x150 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff810a6b40>] ? wake_atomic_t_function+0x40/0x40
May 15 20:31:28 d02 kernel: [<ffffffffa021bfd9>] xfs_iunpin_wait+0x19/0x20 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffffa021085c>] xfs_reclaim_inode+0x8c/0x350 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffffa0210d87>] xfs_reclaim_inodes_ag+0x267/0x390 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffffa0211933>] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffffa02208a5>] xfs_fs_free_cached_objects+0x15/0x20 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff811e0c48>] prune_super+0xe8/0x170
May 15 20:31:28 d02 kernel: [<ffffffff8117c555>] shrink_slab+0x165/0x300
May 15 20:31:28 d02 kernel: [<ffffffff811d5eb1>] ? vmpressure+0x61/0x90
May 15 20:31:28 d02 kernel: [<ffffffff811801c1>] balance_pgdat+0x4b1/0x5e0
May 15 20:31:28 d02 kernel: [<ffffffff81180463>] kswapd+0x173/0x450
May 15 20:31:28 d02 kernel: [<ffffffff810a6ac0>] ? wake_up_atomic_t+0x30/0x30
May 15 20:31:28 d02 kernel: [<ffffffff811802f0>] ? balance_pgdat+0x5e0/0x5e0
May 15 20:31:28 d02 kernel: [<ffffffff810a5acf>] kthread+0xcf/0xe0
May 15 20:31:28 d02 kernel: [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
May 15 20:31:28 d02 kernel: [<ffffffff81645dd8>] ret_from_fork+0x58/0x90
May 15 20:31:28 d02 kernel: [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
May 15 20:31:28 d02 kernel: INFO: task xfsaild/rbd0:12866 blocked for more than 120 seconds.
May 15 20:31:28 d02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 15 20:31:28 d02 kernel: xfsaild/rbd0 D ffff880424e43b90 0 12866 2 0x00000080
May 15 20:31:28 d02 kernel: ffff880425d43910 0000000000000046 ffff880425b8ae00 ffff880425d43fd8
May 15 20:31:28 d02 kernel: ffff880425d43fd8 ffff880425d43fd8 ffff880425b8ae00 ffff88043fdd4780
May 15 20:31:28 d02 kernel: 0000000000000000 7fffffffffffffff ffffe8ffffdc1880 ffff880424e43b90
May 15 20:31:28 d02 kernel: Call Trace:
May 15 20:31:28 d02 kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
May 15 20:31:28 d02 kernel: [<ffffffff81638b19>] schedule_timeout+0x209/0x2d0
May 15 20:31:28 d02 kernel: [<ffffffff812cbd09>] ? blk_flush_plug_list+0xc9/0x230
May 15 20:31:28 d02 kernel: [<ffffffff8101c829>] ? read_tsc+0x9/0x10
May 15 20:31:28 d02 kernel: [<ffffffff8163a45e>] io_schedule_timeout+0xae/0x130
May 15 20:31:28 d02 kernel: [<ffffffff8163a4f8>] io_schedule+0x18/0x20
May 15 20:31:28 d02 kernel: [<ffffffff812d7245>] bt_get+0x135/0x1c0
May 15 20:31:28 d02 kernel: [<ffffffff810a6ac0>] ? wake_up_atomic_t+0x30/0x30
May 15 20:31:28 d02 kernel: [<ffffffff812d766f>] blk_mq_get_tag+0xbf/0xf0
May 15 20:31:28 d02 kernel: [<ffffffff812d2c8b>] __blk_mq_alloc_request+0x1b/0x200
May 15 20:31:28 d02 kernel: [<ffffffff812d4bb1>] blk_mq_map_request+0x191/0x1f0
May 15 20:31:28 d02 kernel: [<ffffffff812d5c40>] blk_sq_make_request+0x80/0x380
May 15 20:31:28 d02 kernel: [<ffffffff812c70ff>] ? generic_make_request_checks+0x24f/0x380
May 15 20:31:28 d02 kernel: [<ffffffff8116bfd9>] ? mempool_alloc+0x69/0x170
May 15 20:31:28 d02 kernel: [<ffffffff812c7312>] generic_make_request+0xe2/0x130
May 15 20:31:28 d02 kernel: [<ffffffff812c73d1>] submit_bio+0x71/0x150
May 15 20:31:28 d02 kernel: [<ffffffffa0207183>] _xfs_buf_ioapply+0x2f3/0x460 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffffa0208a38>] ? __xfs_buf_delwri_submit+0x198/0x250 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffffa0208744>] xfs_buf_submit+0x64/0x1c0 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffffa0208a38>] __xfs_buf_delwri_submit+0x198/0x250 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffffa020968f>] ? xfs_buf_delwri_submit_nowait+0x2f/0x50 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffffa0234480>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffffa020968f>] xfs_buf_delwri_submit_nowait+0x2f/0x50 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffffa02346c0>] xfsaild+0x240/0x5e0 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffffa0234480>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff810a5acf>] kthread+0xcf/0xe0
May 15 20:31:28 d02 kernel: [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
May 15 20:31:28 d02 kernel: [<ffffffff81645dd8>] ret_from_fork+0x58/0x90
May 15 20:31:28 d02 kernel: [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
May 15 20:31:28 d02 kernel: INFO: task lmtp:76967 blocked for more than 120 seconds.
May 15 20:31:28 d02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 15 20:31:28 d02 kernel: lmtp D ffff880425f28b80 0 76967 14464 0x00000080
May 15 20:31:28 d02 kernel: ffff88012c323e40 0000000000000086 ffff880425f28b80 ffff88012c323fd8
May 15 20:31:28 d02 kernel: ffff88012c323fd8 ffff88012c323fd8 ffff880425f28b80 ffff880425f28b80
May 15 20:31:28 d02 kernel: ffff88042745b528 ffff88012c323ef4 0000000000000001 ffff880425f28b80
May 15 20:31:28 d02 kernel: Call Trace:
May 15 20:31:28 d02 kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
May 15 20:31:28 d02 kernel: [<ffffffffa0229cc0>] _xfs_log_force_lsn+0x2a0/0x2f0 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff810b8c00>] ? wake_up_state+0x20/0x20
May 15 20:31:28 d02 kernel: [<ffffffffa020c4e0>] xfs_file_fsync+0x1b0/0x200 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff8120f9c5>] do_fsync+0x65/0xa0
May 15 20:31:28 d02 kernel: [<ffffffff8120fcb3>] SyS_fdatasync+0x13/0x20
May 15 20:31:28 d02 kernel: [<ffffffff81645e89>] system_call_fastpath+0x16/0x1b
May 15 20:31:28 d02 kernel: INFO: task lmtp:77114 blocked for more than 120 seconds.
May 15 20:31:28 d02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 15 20:31:28 d02 kernel: lmtp D ffff88034a10fd80 0 77114 14464 0x00000080
May 15 20:31:28 d02 kernel: ffff88034a10fc20 0000000000000086 ffff8800bb911700 ffff88034a10ffd8
May 15 20:31:28 d02 kernel: ffff88034a10ffd8 ffff88034a10ffd8 ffff8800bb911700 ffff88043fd14780
May 15 20:31:28 d02 kernel: 0000000000000000 7fffffffffffffff ffffffff81168960 ffff88034a10fd80
May 15 20:31:28 d02 kernel: Call Trace:
May 15 20:31:28 d02 kernel: [<ffffffff81168960>] ? wait_on_page_read+0x60/0x60
May 15 20:31:28 d02 kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
May 15 20:31:28 d02 kernel: [<ffffffff81638b19>] schedule_timeout+0x209/0x2d0
May 15 20:31:28 d02 kernel: [<ffffffff812d44e5>] ? blk_mq_run_hw_queue+0xa5/0xd0
May 15 20:31:28 d02 kernel: [<ffffffff8101c829>] ? read_tsc+0x9/0x10
May 15 20:31:28 d02 kernel: [<ffffffff81168960>] ? wait_on_page_read+0x60/0x60
May 15 20:31:28 d02 kernel: [<ffffffff8163a45e>] io_schedule_timeout+0xae/0x130
May 15 20:31:28 d02 kernel: [<ffffffff8163a4f8>] io_schedule+0x18/0x20
May 15 20:31:28 d02 kernel: [<ffffffff8116896e>] sleep_on_page+0xe/0x20
May 15 20:31:28 d02 kernel: [<ffffffff81638ca0>] __wait_on_bit+0x60/0x90
May 15 20:31:28 d02 kernel: [<ffffffff811686f6>] wait_on_page_bit+0x86/0xb0
May 15 20:31:28 d02 kernel: [<ffffffff810a6b40>] ? wake_atomic_t_function+0x40/0x40
May 15 20:31:28 d02 kernel: [<ffffffff81168831>] filemap_fdatawait_range+0x111/0x1b0
May 15 20:31:28 d02 kernel: [<ffffffff8116a85f>] filemap_write_and_wait_range+0x3f/0x70
May 15 20:31:28 d02 kernel: [<ffffffffa020c396>] xfs_file_fsync+0x66/0x200 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff8120f9c5>] do_fsync+0x65/0xa0
May 15 20:31:28 d02 kernel: [<ffffffff8120fcb3>] SyS_fdatasync+0x13/0x20
May 15 20:31:28 d02 kernel: [<ffffffff81645e89>] system_call_fastpath+0x16/0x1b
May 15 20:31:28 d02 kernel: INFO: task lmtp:77116 blocked for more than 120 seconds.
May 15 20:31:28 d02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 15 20:31:28 d02 kernel: lmtp D ffff880429724500 0 77116 14464 0x00000080
May 15 20:31:28 d02 kernel: ffff880110ce3e40 0000000000000086 ffff880429724500 ffff880110ce3fd8
May 15 20:31:28 d02 kernel: ffff880110ce3fd8 ffff880110ce3fd8 ffff880429724500 00000110000a41c0
May 15 20:31:28 d02 kernel: ffff88042745b528 ffff880110ce3ef4 ffff880424db1c18 ffff880429724500
May 15 20:31:28 d02 kernel: Call Trace:
May 15 20:31:28 d02 kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
May 15 20:31:28 d02 kernel: [<ffffffffa0229ba4>] _xfs_log_force_lsn+0x184/0x2f0 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff810b8c00>] ? wake_up_state+0x20/0x20
May 15 20:31:28 d02 kernel: [<ffffffffa020c4e0>] xfs_file_fsync+0x1b0/0x200 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff8120f9c5>] do_fsync+0x65/0xa0
May 15 20:31:28 d02 kernel: [<ffffffff8120fcb3>] SyS_fdatasync+0x13/0x20
May 15 20:31:28 d02 kernel: [<ffffffff81645e89>] system_call_fastpath+0x16/0x1b
May 15 20:31:28 d02 kernel: INFO: task lmtp:77117 blocked for more than 120 seconds.
May 15 20:31:28 d02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 15 20:31:28 d02 kernel: lmtp D ffff88042952b980 0 77117 14464 0x00000080
May 15 20:31:28 d02 kernel: ffff8801033b3e40 0000000000000086 ffff88042952b980 ffff8801033b3fd8
May 15 20:31:28 d02 kernel: ffff8801033b3fd8 ffff8801033b3fd8 ffff88042952b980 00000110000a4200
May 15 20:31:28 d02 kernel: ffff88042745b528 ffff8801033b3ef4 ffff880424c25618 ffff88042952b980
May 15 20:31:28 d02 kernel: Call Trace:
May 15 20:31:28 d02 kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
May 15 20:31:28 d02 kernel: [<ffffffffa0229ba4>] _xfs_log_force_lsn+0x184/0x2f0 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff810b8c00>] ? wake_up_state+0x20/0x20
May 15 20:31:28 d02 kernel: [<ffffffffa020c4e0>] xfs_file_fsync+0x1b0/0x200 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff8120f9c5>] do_fsync+0x65/0xa0
May 15 20:31:28 d02 kernel: [<ffffffff8120fcb3>] SyS_fdatasync+0x13/0x20
May 15 20:31:28 d02 kernel: [<ffffffff81645e89>] system_call_fastpath+0x16/0x1b
May 15 20:31:28 d02 kernel: INFO: task lmtp:77118 blocked for more than 120 seconds.
May 15 20:31:28 d02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 15 20:31:28 d02 kernel: lmtp D ffff88042952a280 0 77118 14464 0x00000080
May 15 20:31:28 d02 kernel: ffff88012f84fe40 0000000000000086 ffff88042952a280 ffff88012f84ffd8
May 15 20:31:28 d02 kernel: ffff88012f84ffd8 ffff88012f84ffd8 ffff88042952a280 ffff88042952a280
May 15 20:31:28 d02 kernel: ffff88042745b528 ffff88012f84fef4 0000000000000001 ffff88042952a280
May 15 20:31:28 d02 kernel: Call Trace:
May 15 20:31:28 d02 kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
May 15 20:31:28 d02 kernel: [<ffffffffa0229cc0>] _xfs_log_force_lsn+0x2a0/0x2f0 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff810b8c00>] ? wake_up_state+0x20/0x20
May 15 20:31:28 d02 kernel: [<ffffffffa020c4e0>] xfs_file_fsync+0x1b0/0x200 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff8120f9c5>] do_fsync+0x65/0xa0
May 15 20:31:28 d02 kernel: [<ffffffff8120fcb3>] SyS_fdatasync+0x13/0x20
May 15 20:31:28 d02 kernel: [<ffffffff81645e89>] system_call_fastpath+0x16/0x1b
May 15 20:31:28 d02 kernel: INFO: task lmtp:77119 blocked for more than 120 seconds.
May 15 20:31:28 d02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 15 20:31:28 d02 kernel: lmtp D ffff88017bbc3d80 0 77119 14464 0x00000080
May 15 20:31:28 d02 kernel: ffff88017bbc3c20 0000000000000086 ffff880425a83980 ffff88017bbc3fd8
May 15 20:31:28 d02 kernel: ffff88017bbc3fd8 ffff88017bbc3fd8 ffff880425a83980 ffff88043fcd4780
May 15 20:31:28 d02 kernel: 0000000000000000 7fffffffffffffff ffffffff81168960 ffff88017bbc3d80
May 15 20:31:28 d02 kernel: Call Trace:
May 15 20:31:28 d02 kernel: [<ffffffff81168960>] ? wait_on_page_read+0x60/0x60
May 15 20:31:28 d02 kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
May 15 20:31:28 d02 kernel: [<ffffffff81638b19>] schedule_timeout+0x209/0x2d0
May 15 20:31:28 d02 kernel: [<ffffffff812d44e5>] ? blk_mq_run_hw_queue+0xa5/0xd0
May 15 20:31:28 d02 kernel: [<ffffffff8101c829>] ? read_tsc+0x9/0x10
May 15 20:31:28 d02 kernel: [<ffffffff81168960>] ? wait_on_page_read+0x60/0x60
May 15 20:31:28 d02 kernel: [<ffffffff8163a45e>] io_schedule_timeout+0xae/0x130
May 15 20:31:28 d02 kernel: [<ffffffff8163a4f8>] io_schedule+0x18/0x20
May 15 20:31:28 d02 kernel: [<ffffffff8116896e>] sleep_on_page+0xe/0x20
May 15 20:31:28 d02 kernel: [<ffffffff81638ca0>] __wait_on_bit+0x60/0x90
May 15 20:31:28 d02 kernel: [<ffffffff811686f6>] wait_on_page_bit+0x86/0xb0
May 15 20:31:28 d02 kernel: [<ffffffff810a6b40>] ? wake_atomic_t_function+0x40/0x40
May 15 20:31:28 d02 kernel: [<ffffffff81168831>] filemap_fdatawait_range+0x111/0x1b0
May 15 20:31:28 d02 kernel: [<ffffffff8116a85f>] filemap_write_and_wait_range+0x3f/0x70
May 15 20:31:28 d02 kernel: [<ffffffffa020c396>] xfs_file_fsync+0x66/0x200 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff8120f9c5>] do_fsync+0x65/0xa0
May 15 20:31:28 d02 kernel: [<ffffffff8120fcb3>] SyS_fdatasync+0x13/0x20
May 15 20:31:28 d02 kernel: [<ffffffff81645e89>] system_call_fastpath+0x16/0x1b
May 15 20:31:28 d02 kernel: INFO: task lmtp:77121 blocked for more than 120 seconds.
May 15 20:31:28 d02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 15 20:31:28 d02 kernel: lmtp D ffff8801c11cbd80 0 77121 14464 0x00000080
May 15 20:31:28 d02 kernel: ffff8801c11cbc20 0000000000000086 ffff880425a80b80 ffff8801c11cbfd8
May 15 20:31:28 d02 kernel: ffff8801c11cbfd8 ffff8801c11cbfd8 ffff880425a80b80 ffff88043fc94780
May 15 20:31:28 d02 kernel: 0000000000000000 7fffffffffffffff ffffffff81168960 ffff8801c11cbd80
May 15 20:31:28 d02 kernel: Call Trace:
May 15 20:31:28 d02 kernel: [<ffffffff81168960>] ? wait_on_page_read+0x60/0x60
May 15 20:31:28 d02 kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
May 15 20:31:28 d02 kernel: [<ffffffff81638b19>] schedule_timeout+0x209/0x2d0
May 15 20:31:28 d02 kernel: [<ffffffff812d44e5>] ? blk_mq_run_hw_queue+0xa5/0xd0
May 15 20:31:28 d02 kernel: [<ffffffff8101c829>] ? read_tsc+0x9/0x10
May 15 20:31:28 d02 kernel: [<ffffffff81168960>] ? wait_on_page_read+0x60/0x60
May 15 20:31:28 d02 kernel: [<ffffffff8163a45e>] io_schedule_timeout+0xae/0x130
May 15 20:31:28 d02 kernel: [<ffffffff8163a4f8>] io_schedule+0x18/0x20
May 15 20:31:28 d02 kernel: [<ffffffff8116896e>] sleep_on_page+0xe/0x20
May 15 20:31:28 d02 kernel: [<ffffffff81638ca0>] __wait_on_bit+0x60/0x90
May 15 20:31:28 d02 kernel: [<ffffffff811686f6>] wait_on_page_bit+0x86/0xb0
May 15 20:31:28 d02 kernel: [<ffffffff810a6b40>] ? wake_atomic_t_function+0x40/0x40
May 15 20:31:28 d02 kernel: [<ffffffff81168831>] filemap_fdatawait_range+0x111/0x1b0
May 15 20:31:28 d02 kernel: [<ffffffff8116a85f>] filemap_write_and_wait_range+0x3f/0x70
May 15 20:31:28 d02 kernel: [<ffffffffa020c396>] xfs_file_fsync+0x66/0x200 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff8120f9c5>] do_fsync+0x65/0xa0
May 15 20:31:28 d02 kernel: [<ffffffff8120fcb3>] SyS_fdatasync+0x13/0x20
May 15 20:31:28 d02 kernel: [<ffffffff81645e89>] system_call_fastpath+0x16/0x1b
May 15 20:31:28 d02 kernel: INFO: task lmtp:77122 blocked for more than 120 seconds.
May 15 20:31:28 d02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 15 20:31:28 d02 kernel: lmtp D ffff880179d4bd80 0 77122 14464 0x00000080
May 15 20:31:28 d02 kernel: ffff880179d4bc20 0000000000000082 ffff880425528b80 ffff880179d4bfd8
May 15 20:31:28 d02 kernel: ffff880179d4bfd8 ffff880179d4bfd8 ffff880425528b80 ffff88043fdd4780
May 15 20:31:28 d02 kernel: 0000000000000000 7fffffffffffffff ffffffff81168960 ffff880179d4bd80
May 15 20:31:28 d02 kernel: Call Trace:
May 15 20:31:28 d02 kernel: [<ffffffff81168960>] ? wait_on_page_read+0x60/0x60
May 15 20:31:28 d02 kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
May 15 20:31:28 d02 kernel: [<ffffffff81638b19>] schedule_timeout+0x209/0x2d0
May 15 20:31:28 d02 kernel: [<ffffffff812d44e5>] ? blk_mq_run_hw_queue+0xa5/0xd0
May 15 20:31:28 d02 kernel: [<ffffffff8101c829>] ? read_tsc+0x9/0x10
May 15 20:31:28 d02 kernel: [<ffffffff81168960>] ? wait_on_page_read+0x60/0x60
May 15 20:31:28 d02 kernel: [<ffffffff8163a45e>] io_schedule_timeout+0xae/0x130
May 15 20:31:28 d02 kernel: [<ffffffff8163a4f8>] io_schedule+0x18/0x20
May 15 20:31:28 d02 kernel: [<ffffffff8116896e>] sleep_on_page+0xe/0x20
May 15 20:31:28 d02 kernel: [<ffffffff81638ca0>] __wait_on_bit+0x60/0x90
May 15 20:31:28 d02 kernel: [<ffffffff811686f6>] wait_on_page_bit+0x86/0xb0
May 15 20:31:28 d02 kernel: [<ffffffff810a6b40>] ? wake_atomic_t_function+0x40/0x40
May 15 20:31:28 d02 kernel: [<ffffffff81168831>] filemap_fdatawait_range+0x111/0x1b0
May 15 20:31:28 d02 kernel: [<ffffffff8116a85f>] filemap_write_and_wait_range+0x3f/0x70
May 15 20:31:28 d02 kernel: [<ffffffffa020c396>] xfs_file_fsync+0x66/0x200 [xfs]
May 15 20:31:28 d02 kernel: [<ffffffff8120f9c5>] do_fsync+0x65/0xa0
May 15 20:31:28 d02 kernel: [<ffffffff8120fcb3>] SyS_fdatasync+0x13/0x20
May 15 20:31:28 d02 kernel: [<ffffffff81645e89>] system_call_fastpath+0x16/0x1b

Actions #6

Updated by Shinobu Kinjo almost 8 years ago

Sorry, please ignore #4, #5. They are same of trace log in 1st report -;
And that trace log is 2nd hung up.

Here is 1st hung up.
It seems to be caused by deadlock and is not same of 2nd one.

May 12 10:36:14 d01 kernel: INFO: task kthreadd:2 blocked for more than 120 seconds.
May 12 10:36:14 d01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 12 10:36:14 d01 kernel: kthreadd D ffff8804295a3900 0 2 0 0x00000000
May 12 10:36:14 d01 kernel: ffff8804295a36c0 0000000000000046 ffff880429528b80 ffff8804295a3fd8
May 12 10:36:14 d01 kernel: ffff8804295a3fd8 ffff8804295a3fd8 ffff880429528b80 ffff88043fd94780
May 12 10:36:14 d01 kernel: 0000000000000000 7fffffffffffffff ffffffffa0227fd9 ffff8804295a3900
May 12 10:36:14 d01 kernel: Call Trace:
May 12 10:36:14 d01 kernel: [<ffffffffa0227fd9>] ? xfs_iunpin_wait+0x19/0x20 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
May 12 10:36:14 d01 kernel: [<ffffffff81638b19>] schedule_timeout+0x209/0x2d0
May 12 10:36:14 d01 kernel: [<ffffffff8101c829>] ? read_tsc+0x9/0x10
May 12 10:36:14 d01 kernel: [<ffffffff810d812c>] ? ktime_get_ts64+0x4c/0xf0
May 12 10:36:14 d01 kernel: [<ffffffff8101c829>] ? read_tsc+0x9/0x10
May 12 10:36:14 d01 kernel: [<ffffffffa0227fd9>] ? xfs_iunpin_wait+0x19/0x20 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffff8163a45e>] io_schedule_timeout+0xae/0x130
May 12 10:36:14 d01 kernel: [<ffffffff810a67e6>] ? prepare_to_wait+0x56/0x90
May 12 10:36:14 d01 kernel: [<ffffffff8163a4f8>] io_schedule+0x18/0x20
May 12 10:36:14 d01 kernel: [<ffffffffa02244f9>] __xfs_iunpin_wait+0xd9/0x150 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffff810a6b40>] ? wake_atomic_t_function+0x40/0x40
May 12 10:36:14 d01 kernel: [<ffffffffa0227fd9>] xfs_iunpin_wait+0x19/0x20 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffffa021c85c>] xfs_reclaim_inode+0x8c/0x350 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffffa021cd87>] xfs_reclaim_inodes_ag+0x267/0x390 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffffa021d933>] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffffa022c8a5>] xfs_fs_free_cached_objects+0x15/0x20 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffff811e0c48>] prune_super+0xe8/0x170
May 12 10:36:14 d01 kernel: [<ffffffff8117c555>] shrink_slab+0x165/0x300
May 12 10:36:14 d01 kernel: [<ffffffff811d5e71>] ? vmpressure+0x21/0x90
May 12 10:36:14 d01 kernel: [<ffffffff8117f6d2>] do_try_to_free_pages+0x3c2/0x4e0
May 12 10:36:14 d01 kernel: [<ffffffff8117f8ec>] try_to_free_pages+0xfc/0x180
May 12 10:36:14 d01 kernel: [<ffffffff811735ed>] __alloc_pages_nodemask+0x7fd/0xb90
May 12 10:36:14 d01 kernel: [<ffffffff81078d53>] copy_process.part.25+0x163/0x1610
May 12 10:36:14 d01 kernel: [<ffffffff810c218e>] ? dequeue_task_fair+0x40e/0x620
May 12 10:36:14 d01 kernel: [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
May 12 10:36:14 d01 kernel: [<ffffffff8107a3e1>] do_fork+0xe1/0x320
May 12 10:36:14 d01 kernel: [<ffffffff8107a646>] kernel_thread+0x26/0x30
May 12 10:36:14 d01 kernel: [<ffffffff810a65d2>] kthreadd+0x2b2/0x2f0
May 12 10:36:14 d01 kernel: [<ffffffff810a6320>] ? kthread_create_on_cpu+0x60/0x60
May 12 10:36:14 d01 kernel: [<ffffffff81645dd8>] ret_from_fork+0x58/0x90
May 12 10:36:14 d01 kernel: [<ffffffff810a6320>] ? kthread_create_on_cpu+0x60/0x60
May 12 10:36:14 d01 kernel: INFO: task kswapd0:76 blocked for more than 120 seconds.
May 12 10:36:14 d01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 12 10:36:14 d01 kernel: kswapd0 D 0000000000000000 0 76 2 0x00000000
May 12 10:36:14 d01 kernel: ffff8804258438f0 0000000000000046 ffff88042917e780 ffff880425843fd8
May 12 10:36:14 d01 kernel: ffff880425843fd8 ffff880425843fd8 ffff88042917e780 ffff880035b90fd8
May 12 10:36:14 d01 kernel: ffff880035b90fe0 7fffffffffffffff ffff88042917e780 0000000000000000
May 12 10:36:14 d01 kernel: Call Trace:
May 12 10:36:14 d01 kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
May 12 10:36:14 d01 kernel: [<ffffffff81638b19>] schedule_timeout+0x209/0x2d0
May 12 10:36:14 d01 kernel: [<ffffffff812cc224>] ? blk_finish_plug+0x14/0x40
May 12 10:36:14 d01 kernel: [<ffffffffa02131c4>] ? _xfs_buf_ioapply+0x334/0x460 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffff8163b1f6>] wait_for_completion+0x116/0x170
May 12 10:36:14 d01 kernel: [<ffffffff810b8c00>] ? wake_up_state+0x20/0x20
May 12 10:36:14 d01 kernel: [<ffffffffa0214fc4>] ? xfs_bwrite+0x24/0x60 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffffa0214b74>] xfs_buf_submit_wait+0x84/0x1d0 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffffa0214fc4>] xfs_bwrite+0x24/0x60 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffffa021cafb>] xfs_reclaim_inode+0x32b/0x350 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffffa021cd87>] xfs_reclaim_inodes_ag+0x267/0x390 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffffa021d933>] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffffa022c8a5>] xfs_fs_free_cached_objects+0x15/0x20 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffff811e0c48>] prune_super+0xe8/0x170
May 12 10:36:14 d01 kernel: [<ffffffff8117c555>] shrink_slab+0x165/0x300
May 12 10:36:14 d01 kernel: [<ffffffff811d5ed7>] ? vmpressure+0x87/0x90
May 12 10:36:14 d01 kernel: [<ffffffff811801c1>] balance_pgdat+0x4b1/0x5e0
May 12 10:36:14 d01 kernel: [<ffffffff81180463>] kswapd+0x173/0x450
May 12 10:36:14 d01 kernel: [<ffffffff810a6ac0>] ? wake_up_atomic_t+0x30/0x30
May 12 10:36:14 d01 kernel: [<ffffffff811802f0>] ? balance_pgdat+0x5e0/0x5e0
May 12 10:36:14 d01 kernel: [<ffffffff810a5acf>] kthread+0xcf/0xe0
May 12 10:36:14 d01 kernel: [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
May 12 10:36:14 d01 kernel: [<ffffffff81645dd8>] ret_from_fork+0x58/0x90
May 12 10:36:14 d01 kernel: [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
May 12 10:36:14 d01 kernel: INFO: task rbd:10270 blocked for more than 120 seconds.
May 12 10:36:14 d01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 12 10:36:14 d01 kernel: rbd D ffff8804234367c8 0 10270 2 0x00000080
May 12 10:36:14 d01 kernel: Workqueue: rbd rbd_queue_workfn [rbd]
May 12 10:36:14 d01 kernel: ffff8804287fbc50 0000000000000046 ffff880424db8000 ffff8804287fbfd8
May 12 10:36:14 d01 kernel: ffff8804287fbfd8 ffff8804287fbfd8 ffff880424db8000 ffff8804234367c0
May 12 10:36:14 d01 kernel: ffff8804234367c4 ffff880424db8000 00000000ffffffff ffff8804234367c8
May 12 10:36:14 d01 kernel: Call Trace:
May 12 10:36:14 d01 kernel: [<ffffffff8163bf09>] schedule_preempt_disabled+0x29/0x70
May 12 10:36:14 d01 kernel: [<ffffffff81639c05>] __mutex_lock_slowpath+0xc5/0x1c0
May 12 10:36:14 d01 kernel: [<ffffffff8163906f>] mutex_lock+0x1f/0x2f
May 12 10:36:14 d01 kernel: [<ffffffffa03ba663>] ceph_osdc_start_request+0x33/0x70 [libceph]
May 12 10:36:14 d01 kernel: [<ffffffffa0391539>] rbd_obj_request_submit+0x29/0x50 [rbd]
May 12 10:36:14 d01 kernel: [<ffffffffa03954d5>] rbd_img_obj_request_submit+0x205/0x4d0 [rbd]
May 12 10:36:14 d01 kernel: [<ffffffffa03957f4>] rbd_img_request_submit+0x54/0x90 [rbd]
May 12 10:36:14 d01 kernel: [<ffffffffa039686b>] rbd_queue_workfn+0x2db/0x420 [rbd]
May 12 10:36:14 d01 kernel: [<ffffffff8109d5db>] process_one_work+0x17b/0x470
May 12 10:36:14 d01 kernel: [<ffffffff8109e0a4>] rescuer_thread+0x214/0x400
May 12 10:36:14 d01 kernel: [<ffffffff8109de90>] ? manage_workers.isra.24+0x2d0/0x2d0
May 12 10:36:14 d01 kernel: [<ffffffff810a5acf>] kthread+0xcf/0xe0
May 12 10:36:14 d01 kernel: [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
May 12 10:36:14 d01 kernel: [<ffffffff81645dd8>] ret_from_fork+0x58/0x90
May 12 10:36:14 d01 kernel: [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
May 12 10:36:14 d01 kernel: INFO: task xfsaild/rbd0:21086 blocked for more than 120 seconds.
May 12 10:36:14 d01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 12 10:36:14 d01 kernel: xfsaild/rbd0 D ffff8804256c0710 0 21086 2 0x00000080
May 12 10:36:14 d01 kernel: ffff880424c83910 0000000000000046 ffff880035e58b80 ffff880424c83fd8
May 12 10:36:14 d01 kernel: ffff880424c83fd8 ffff880424c83fd8 ffff880035e58b80 ffff88043fc14780
May 12 10:36:14 d01 kernel: 0000000000000000 7fffffffffffffff ffffe8ffffc01880 ffff8804256c0710
May 12 10:36:14 d01 kernel: Call Trace:
May 12 10:36:14 d01 kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
May 12 10:36:14 d01 kernel: [<ffffffff81638b19>] schedule_timeout+0x209/0x2d0
May 12 10:36:14 d01 kernel: [<ffffffff812cbd09>] ? blk_flush_plug_list+0xc9/0x230
May 12 10:36:14 d01 kernel: [<ffffffff8101c829>] ? read_tsc+0x9/0x10
May 12 10:36:14 d01 kernel: [<ffffffff8163a45e>] io_schedule_timeout+0xae/0x130
May 12 10:36:14 d01 kernel: [<ffffffff8163a4f8>] io_schedule+0x18/0x20
May 12 10:36:14 d01 kernel: [<ffffffff812d7245>] bt_get+0x135/0x1c0
May 12 10:36:14 d01 kernel: [<ffffffff810a6ac0>] ? wake_up_atomic_t+0x30/0x30
May 12 10:36:14 d01 kernel: [<ffffffff812d766f>] blk_mq_get_tag+0xbf/0xf0
May 12 10:36:14 d01 kernel: [<ffffffff812d2c8b>] __blk_mq_alloc_request+0x1b/0x200
May 12 10:36:14 d01 kernel: [<ffffffff812d4bb1>] blk_mq_map_request+0x191/0x1f0
May 12 10:36:14 d01 kernel: [<ffffffff812d5c40>] blk_sq_make_request+0x80/0x380
May 12 10:36:14 d01 kernel: [<ffffffff812c70ff>] ? generic_make_request_checks+0x24f/0x380
May 12 10:36:14 d01 kernel: [<ffffffff8116bfd9>] ? mempool_alloc+0x69/0x170
May 12 10:36:14 d01 kernel: [<ffffffff812c7312>] generic_make_request+0xe2/0x130
May 12 10:36:14 d01 kernel: [<ffffffff812c73d1>] submit_bio+0x71/0x150
May 12 10:36:14 d01 kernel: [<ffffffffa0213183>] _xfs_buf_ioapply+0x2f3/0x460 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffffa0214a38>] ? __xfs_buf_delwri_submit+0x198/0x250 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffffa0214744>] xfs_buf_submit+0x64/0x1c0 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffffa0214a38>] __xfs_buf_delwri_submit+0x198/0x250 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffffa021568f>] ? xfs_buf_delwri_submit_nowait+0x2f/0x50 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffffa0240480>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffffa021568f>] xfs_buf_delwri_submit_nowait+0x2f/0x50 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffffa02406c0>] xfsaild+0x240/0x5e0 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffffa0240480>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffff810a5acf>] kthread+0xcf/0xe0
May 12 10:36:14 d01 kernel: [<ffffffff810b41b3>] ? finish_task_switch+0x53/0x170
May 12 10:36:14 d01 kernel: [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
May 12 10:36:14 d01 kernel: [<ffffffff81645dd8>] ret_from_fork+0x58/0x90
May 12 10:36:14 d01 kernel: [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
May 12 10:36:14 d01 kernel: INFO: task kworker/u16:0:67676 blocked for more than 120 seconds.
May 12 10:36:14 d01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 12 10:36:14 d01 kernel: kworker/u16:0 D ffff8804256c0710 0 67676 2 0x00000080
May 12 10:36:14 d01 kernel: Workqueue: writeback bdi_writeback_workfn (flush-252:0)
May 12 10:36:14 d01 kernel: ffff88001a8b3590 0000000000000046 ffff880426818000 ffff88001a8b3fd8
May 12 10:36:14 d01 kernel: ffff88001a8b3fd8 ffff88001a8b3fd8 ffff880426818000 ffff88043fc94780
May 12 10:36:14 d01 kernel: 0000000000000000 7fffffffffffffff ffffe8ffffc81880 ffff8804256c0710
May 12 10:36:14 d01 kernel: Call Trace:
May 12 10:36:14 d01 kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
May 12 10:36:14 d01 kernel: [<ffffffff81638b19>] schedule_timeout+0x209/0x2d0
May 12 10:36:14 d01 kernel: [<ffffffff8101c829>] ? read_tsc+0x9/0x10
May 12 10:36:14 d01 kernel: [<ffffffff810d812c>] ? ktime_get_ts64+0x4c/0xf0
May 12 10:36:14 d01 kernel: [<ffffffff8101c829>] ? read_tsc+0x9/0x10
May 12 10:36:14 d01 kernel: [<ffffffff8163a45e>] io_schedule_timeout+0xae/0x130
May 12 10:36:14 d01 kernel: [<ffffffff8163a4f8>] io_schedule+0x18/0x20
May 12 10:36:14 d01 kernel: [<ffffffff812d7245>] bt_get+0x135/0x1c0
May 12 10:36:14 d01 kernel: [<ffffffff810a6ac0>] ? wake_up_atomic_t+0x30/0x30
May 12 10:36:14 d01 kernel: [<ffffffff812d766f>] blk_mq_get_tag+0xbf/0xf0
May 12 10:36:14 d01 kernel: [<ffffffff812d2c8b>] __blk_mq_alloc_request+0x1b/0x200
May 12 10:36:14 d01 kernel: [<ffffffff812d4bb1>] blk_mq_map_request+0x191/0x1f0
May 12 10:36:14 d01 kernel: [<ffffffff812d5c40>] blk_sq_make_request+0x80/0x380
May 12 10:36:14 d01 kernel: [<ffffffff812c70ff>] ? generic_make_request_checks+0x24f/0x380
May 12 10:36:14 d01 kernel: [<ffffffff8116bfd9>] ? mempool_alloc+0x69/0x170
May 12 10:36:14 d01 kernel: [<ffffffff812c7312>] generic_make_request+0xe2/0x130
May 12 10:36:14 d01 kernel: [<ffffffff812c73d1>] submit_bio+0x71/0x150
May 12 10:36:14 d01 kernel: [<ffffffffa020d343>] xfs_submit_ioend_bio.isra.12+0x33/0x40 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffffa020d41a>] xfs_submit_ioend+0xca/0x130 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffffa020e112>] xfs_vm_writepage+0x2a2/0x5d0 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffff81173af3>] __writepage+0x13/0x50
May 12 10:36:14 d01 kernel: [<ffffffff81174611>] write_cache_pages+0x251/0x4d0
May 12 10:36:14 d01 kernel: [<ffffffff81173ae0>] ? global_dirtyable_memory+0x70/0x70
May 12 10:36:14 d01 kernel: [<ffffffff812cbd09>] ? blk_flush_plug_list+0xc9/0x230
May 12 10:36:14 d01 kernel: [<ffffffff811748dd>] generic_writepages+0x4d/0x80
May 12 10:36:14 d01 kernel: [<ffffffffa020d9b3>] xfs_vm_writepages+0x43/0x50 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffff8117598e>] do_writepages+0x1e/0x40
May 12 10:36:14 d01 kernel: [<ffffffff81208460>] __writeback_single_inode+0x40/0x220
May 12 10:36:14 d01 kernel: [<ffffffff81208ece>] writeback_sb_inodes+0x25e/0x420
May 12 10:36:14 d01 kernel: [<ffffffff8120912f>] __writeback_inodes_wb+0x9f/0xd0
May 12 10:36:14 d01 kernel: [<ffffffff81209973>] wb_writeback+0x263/0x2f0
May 12 10:36:14 d01 kernel: [<ffffffff811f878c>] ? get_nr_inodes+0x4c/0x70
May 12 10:36:14 d01 kernel: [<ffffffff8120bbfb>] bdi_writeback_workfn+0x2cb/0x460
May 12 10:36:14 d01 kernel: [<ffffffff8109d5db>] process_one_work+0x17b/0x470
May 12 10:36:14 d01 kernel: [<ffffffff8109e3ab>] worker_thread+0x11b/0x400
May 12 10:36:14 d01 kernel: [<ffffffff8109e290>] ? rescuer_thread+0x400/0x400
May 12 10:36:14 d01 kernel: [<ffffffff810a5acf>] kthread+0xcf/0xe0
May 12 10:36:14 d01 kernel: [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
May 12 10:36:14 d01 kernel: [<ffffffff81645dd8>] ret_from_fork+0x58/0x90
May 12 10:36:14 d01 kernel: [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
May 12 10:36:14 d01 kernel: INFO: task lmtp:68253 blocked for more than 120 seconds.
May 12 10:36:14 d01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 12 10:36:14 d01 kernel: lmtp D ffff88042978dc00 0 68253 24733 0x00000080
May 12 10:36:14 d01 kernel: ffff88012d713e40 0000000000000086 ffff88042978dc00 ffff88012d713fd8
May 12 10:36:14 d01 kernel: ffff88012d713fd8 ffff88012d713fd8 ffff88042978dc00 00000083000e87c0
May 12 10:36:14 d01 kernel: ffff880035bfb128 ffff88012d713ef4 ffff880035928318 ffff88042978dc00
May 12 10:36:14 d01 kernel: Call Trace:
May 12 10:36:14 d01 kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
May 12 10:36:14 d01 kernel: [<ffffffffa0235ba4>] _xfs_log_force_lsn+0x184/0x2f0 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffff810b8c00>] ? wake_up_state+0x20/0x20
May 12 10:36:14 d01 kernel: [<ffffffffa02184e0>] xfs_file_fsync+0x1b0/0x200 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffff8120f9c5>] do_fsync+0x65/0xa0
May 12 10:36:14 d01 kernel: [<ffffffff8120fcb3>] SyS_fdatasync+0x13/0x20
May 12 10:36:14 d01 kernel: [<ffffffff81645e89>] system_call_fastpath+0x16/0x1b
May 12 10:36:14 d01 kernel: INFO: task lmtp:68347 blocked for more than 120 seconds.
May 12 10:36:14 d01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 12 10:36:14 d01 kernel: lmtp D ffff880083ef7d80 0 68347 24733 0x00000080
May 12 10:36:14 d01 kernel: ffff880083ef7c20 0000000000000086 ffff880004fa9700 ffff880083ef7fd8
May 12 10:36:14 d01 kernel: ffff880083ef7fd8 ffff880083ef7fd8 ffff880004fa9700 ffff88043fd54780
May 12 10:36:14 d01 kernel: 0000000000000000 7fffffffffffffff ffffffff81168960 ffff880083ef7d80
May 12 10:36:14 d01 kernel: Call Trace:
May 12 10:36:14 d01 kernel: [<ffffffff81168960>] ? wait_on_page_read+0x60/0x60
May 12 10:36:14 d01 kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
May 12 10:36:14 d01 kernel: [<ffffffff81638b19>] schedule_timeout+0x209/0x2d0
May 12 10:36:14 d01 kernel: [<ffffffff812d44e5>] ? blk_mq_run_hw_queue+0xa5/0xd0
May 12 10:36:14 d01 kernel: [<ffffffff8101c829>] ? read_tsc+0x9/0x10
May 12 10:36:14 d01 kernel: [<ffffffff81168960>] ? wait_on_page_read+0x60/0x60
May 12 10:36:14 d01 kernel: [<ffffffff8163a45e>] io_schedule_timeout+0xae/0x130
May 12 10:36:14 d01 kernel: [<ffffffff8163a4f8>] io_schedule+0x18/0x20
May 12 10:36:14 d01 kernel: [<ffffffff8116896e>] sleep_on_page+0xe/0x20
May 12 10:36:14 d01 kernel: [<ffffffff81638ca0>] __wait_on_bit+0x60/0x90
May 12 10:36:14 d01 kernel: [<ffffffff811686f6>] wait_on_page_bit+0x86/0xb0
May 12 10:36:14 d01 kernel: [<ffffffff810a6b40>] ? wake_atomic_t_function+0x40/0x40
May 12 10:36:14 d01 kernel: [<ffffffff81168831>] filemap_fdatawait_range+0x111/0x1b0
May 12 10:36:14 d01 kernel: [<ffffffff8116a85f>] filemap_write_and_wait_range+0x3f/0x70
May 12 10:36:14 d01 kernel: [<ffffffffa0218396>] xfs_file_fsync+0x66/0x200 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffff8120f9c5>] do_fsync+0x65/0xa0
May 12 10:36:14 d01 kernel: [<ffffffff8120fcb3>] SyS_fdatasync+0x13/0x20
May 12 10:36:14 d01 kernel: [<ffffffff81645e89>] system_call_fastpath+0x16/0x1b
May 12 10:36:14 d01 kernel: INFO: task lmtp:68674 blocked for more than 120 seconds.
May 12 10:36:14 d01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 12 10:36:14 d01 kernel: lmtp D ffff88011cbb3d80 0 68674 24733 0x00000080
May 12 10:36:14 d01 kernel: ffff88011cbb3c20 0000000000000082 ffff880425ba2280 ffff88011cbb3fd8
May 12 10:36:14 d01 kernel: ffff88011cbb3fd8 ffff88011cbb3fd8 ffff880425ba2280 ffff88043fd14780
May 12 10:36:14 d01 kernel: 0000000000000000 7fffffffffffffff ffffffff81168960 ffff88011cbb3d80
May 12 10:36:14 d01 kernel: Call Trace:
May 12 10:36:14 d01 kernel: [<ffffffff81168960>] ? wait_on_page_read+0x60/0x60
May 12 10:36:14 d01 kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
May 12 10:36:14 d01 kernel: [<ffffffff81638b19>] schedule_timeout+0x209/0x2d0
May 12 10:36:14 d01 kernel: [<ffffffff812d44e5>] ? blk_mq_run_hw_queue+0xa5/0xd0
May 12 10:36:14 d01 kernel: [<ffffffff8101c829>] ? read_tsc+0x9/0x10
May 12 10:36:14 d01 kernel: [<ffffffff81168960>] ? wait_on_page_read+0x60/0x60
May 12 10:36:14 d01 kernel: [<ffffffff8163a45e>] io_schedule_timeout+0xae/0x130
May 12 10:36:14 d01 kernel: [<ffffffff8163a4f8>] io_schedule+0x18/0x20
May 12 10:36:14 d01 kernel: [<ffffffff8116896e>] sleep_on_page+0xe/0x20
May 12 10:36:14 d01 kernel: [<ffffffff81638ca0>] __wait_on_bit+0x60/0x90
May 12 10:36:14 d01 kernel: [<ffffffff811686f6>] wait_on_page_bit+0x86/0xb0
May 12 10:36:14 d01 kernel: [<ffffffff810a6b40>] ? wake_atomic_t_function+0x40/0x40
May 12 10:36:14 d01 kernel: [<ffffffff81168831>] filemap_fdatawait_range+0x111/0x1b0
May 12 10:36:14 d01 kernel: [<ffffffff8116a85f>] filemap_write_and_wait_range+0x3f/0x70
May 12 10:36:14 d01 kernel: [<ffffffffa0218396>] xfs_file_fsync+0x66/0x200 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffff8120f9c5>] do_fsync+0x65/0xa0
May 12 10:36:14 d01 kernel: [<ffffffff8120fcb3>] SyS_fdatasync+0x13/0x20
May 12 10:36:14 d01 kernel: [<ffffffff81645e89>] system_call_fastpath+0x16/0x1b
May 12 10:36:14 d01 kernel: INFO: task lmtp:68675 blocked for more than 120 seconds.
May 12 10:36:14 d01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 12 10:36:14 d01 kernel: lmtp D ffff880427e28b80 0 68675 24733 0x00000080
May 12 10:36:14 d01 kernel: ffff8803a4cc7e40 0000000000000086 ffff880427e28b80 ffff8803a4cc7fd8
May 12 10:36:14 d01 kernel: ffff8803a4cc7fd8 ffff8803a4cc7fd8 ffff880427e28b80 ffff880427e28b80
May 12 10:36:14 d01 kernel: ffff880035bfb128 ffff8803a4cc7ef4 0000000000000001 ffff880427e28b80
May 12 10:36:14 d01 kernel: Call Trace:
May 12 10:36:14 d01 kernel: [<ffffffff8163ae29>] schedule+0x29/0x70
May 12 10:36:14 d01 kernel: [<ffffffffa0235cc0>] _xfs_log_force_lsn+0x2a0/0x2f0 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffff810b8c00>] ? wake_up_state+0x20/0x20
May 12 10:36:14 d01 kernel: [<ffffffffa02184e0>] xfs_file_fsync+0x1b0/0x200 [xfs]
May 12 10:36:14 d01 kernel: [<ffffffff8120f9c5>] do_fsync+0x65/0xa0
May 12 10:36:14 d01 kernel: [<ffffffff8120fcb3>] SyS_fdatasync+0x13/0x20
May 12 10:36:14 d01 kernel: [<ffffffff81645e89>] system_call_fastpath+0x16/0x1b
May 12 10:36:14 d01 kernel: INFO: task kworker/2:1:68851 blocked for more than 120 seconds.
May 12 10:36:14 d01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 12 10:36:14 d01 kernel: kworker/2:1 D ffff8804234367c8 0 68851 2 0x00000080
May 12 10:36:14 d01 kernel: Workqueue: ceph-msgr con_work [libceph]
May 12 10:36:14 d01 kernel: ffff880108bebb90 0000000000000046 ffff880424dba280 ffff880108bebfd8
May 12 10:36:14 d01 kernel: ffff880108bebfd8 ffff880108bebfd8 ffff880424dba280 ffff8804234367c0
May 12 10:36:14 d01 kernel: ffff8804234367c4 ffff880424dba280 00000000ffffffff ffff8804234367c8
May 12 10:36:14 d01 kernel: Call Trace:
May 12 10:36:14 d01 kernel: [<ffffffff8163bf09>] schedule_preempt_disabled+0x29/0x70
May 12 10:36:14 d01 kernel: [<ffffffff81639c05>] __mutex_lock_slowpath+0xc5/0x1c0
May 12 10:36:14 d01 kernel: [<ffffffff8163906f>] mutex_lock+0x1f/0x2f
May 12 10:36:14 d01 kernel: [<ffffffffa03b7b67>] alloc_msg+0xb7/0x2b0 [libceph]
May 12 10:36:14 d01 kernel: [<ffffffffa03b2204>] read_partial_message+0x2a4/0x940 [libceph]
May 12 10:36:14 d01 kernel: [<ffffffff8151162a>] ? kernel_recvmsg+0x3a/0x50
May 12 10:36:14 d01 kernel: [<ffffffffa03b2bf8>] try_read+0x358/0x10c0 [libceph]
May 12 10:36:14 d01 kernel: [<ffffffff8101cd15>] ? native_sched_clock+0x35/0x80
May 12 10:36:14 d01 kernel: [<ffffffff8101cd69>] ? sched_clock+0x9/0x10
May 12 10:36:14 d01 kernel: [<ffffffff810bb675>] ? sched_clock_cpu+0x85/0xc0
May 12 10:36:14 d01 kernel: [<ffffffffa03b3a19>] con_work+0xb9/0x640 [libceph]
May 12 10:36:14 d01 kernel: [<ffffffff8109d5db>] process_one_work+0x17b/0x470
May 12 10:36:14 d01 kernel: [<ffffffff8109e3ab>] worker_thread+0x11b/0x400
May 12 10:36:14 d01 kernel: [<ffffffff8109e290>] ? rescuer_thread+0x400/0x400
May 12 10:36:14 d01 kernel: [<ffffffff810a5acf>] kthread+0xcf/0xe0
May 12 10:36:14 d01 kernel: [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
May 12 10:36:14 d01 kernel: [<ffffffff81645dd8>] ret_from_fork+0x58/0x90
May 12 10:36:14 d01 kernel: [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140

Actions #7

Updated by Jason Dillaman almost 8 years ago

  • Project changed from Ceph to Linux kernel client
Actions #8

Updated by Ilya Dryomov almost 8 years ago

If you still have it in that state or if it reproduces again, could you please do as described in http://tracker.ceph.com/issues/14022#note-3?
I think the underlying issue is the same, but I can't tell who is holding lock from the above.

Actions #9

Updated by Ilya Dryomov almost 8 years ago

  • Related to Bug #14022: map_sem for read + request_mutex are held indefinitely added
Actions #10

Updated by Ilya Dryomov almost 8 years ago

  • Category set to rbd
  • Status changed from New to Need More Info
  • Assignee set to Ilya Dryomov
Actions #11

Updated by Shinobu Kinjo almost 8 years ago

Just to track
10k write / hour => No hung up.

Actions #12

Updated by Ilya Dryomov almost 8 years ago

  • Category changed from rbd to libceph
  • Priority changed from High to Low

OSD client has been rewritten in 4.7, but I'd be happy to look into this if the necessary info is gathered.

Actions #13

Updated by Shinobu Kinjo almost 8 years ago

In 4.7, same issue should not happen but 4.6 or older?
If you could tell me why you changed to libceph, it would be helpful to get more better understanding.

Actions #14

Updated by Ilya Dryomov almost 8 years ago

The high-level summary is in #9779. In particular, request_mutex was killed in favor of with per-OSD mutexes.

We haven't root caused this issue, so it could still be there in 4.7, but if you are going to try to reproduce, do it on the same kernel it occurred originally.

Actions #15

Updated by Ilya Dryomov about 5 years ago

  • Status changed from Need More Info to Resolved

No new occurrences, request_mutex is gone, closing.

Actions

Also available in: Atom PDF