Project

General

Profile

Actions

Support #12648

open

[old topic] OSD will stuck if running kernel rbd or rbd-fuse on the same node with OSD daemons

Added by Zhi Zhang over 8 years ago. Updated over 8 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Tags:
Reviewed:
Affected Versions:

Description

I know this is an old topic and not recommended all the time, but we are limited to the hardware quantity and this will be a great cost-saving for us, so want to check with you whether there is any possibility to achieve such usage.

Kernel version: 3.10.83 with more rbd fixes backported from higher kernel version
Ceph version: 0.94.1

Test: run multiple docker containers on OSD nodes and each container has one rbd mapped as storage; run fio test in each container

What we have tried:
1. use kernel rbd or rbd-fuse
2. set vm.min_free_kbytes=1024000 and vm.vfs_cache_pressure=200, because we observed free memory was very low and cached memory was very high when OSD daemon stuck

Above 2 trials are failed. The dmesg is as follow when OSD daemon stuck.

[ 1825.505283] INFO: task kswapd0:1160 blocked for more than 120 seconds.
[ 1825.505287] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1825.505288] kswapd0         D ffff88085fc11e00     0  1160      2 0x00000000
[ 1825.505293]  ffff880852fcb938 0000000000000002 ffff880852fcbfd8 0000000000011e00
[ 1825.505298]  ffff880852fcbfd8 0000000000011e00 ffffffff81f84440 ffff8810529c4b30
[ 1825.505300]  ffff88085fc126c0 ffff88087ffb06e8 0000000000000002 ffffffff81101230
[ 1825.505303] Call Trace:
[ 1825.505315]  [<ffffffff81101230>] ? wait_on_page_read+0x60/0x60
[ 1825.505322]  [<ffffffff81adf43d>] io_schedule+0x9d/0x130
[ 1825.505324]  [<ffffffff8110123e>] sleep_on_page+0xe/0x20
[ 1825.505327]  [<ffffffff81add120>] __wait_on_bit+0x60/0x90
[ 1825.505330]  [<ffffffff81100fff>] wait_on_page_bit+0x7f/0x90
[ 1825.505335]  [<ffffffff81065fe0>] ? autoremove_wake_function+0x40/0x40
[ 1825.505341]  [<ffffffff81112036>] shrink_page_list+0x1a6/0x850
[ 1825.505344]  [<ffffffff81112ba7>] shrink_inactive_list+0x147/0x400
[ 1825.505346]  [<ffffffff81113391>] shrink_lruvec+0x201/0x4a0
[ 1825.505350]  [<ffffffff811142d6>] mem_cgroup_shrink_node_zone+0x96/0x120
[ 1825.505354]  [<ffffffff8115c80f>] mem_cgroup_soft_limit_reclaim+0x1bf/0x4d0
[ 1825.505357]  [<ffffffff81114692>] balance_pgdat+0x332/0x5a0
[ 1825.505360]  [<ffffffff81114a56>] kswapd+0x156/0x3e0
[ 1825.505362]  [<ffffffff81065fa0>] ? wake_up_bit+0x30/0x30
[ 1825.505365]  [<ffffffff81114900>] ? balance_pgdat+0x5a0/0x5a0
[ 1825.505367]  [<ffffffff81065140>] kthread+0xc0/0xd0
[ 1825.505369]  [<ffffffff81065080>] ? insert_kthread_work+0x40/0x40
[ 1825.505373]  [<ffffffff81ae8938>] ret_from_fork+0x58/0x90
[ 1825.505375]  [<ffffffff81065080>] ? insert_kthread_work+0x40/0x40
[ 1825.505377] INFO: task kswapd1:1161 blocked for more than 120 seconds.
[ 1825.505377] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1825.505378] kswapd1         D ffff88107fcf1e00     0  1161      2 0x00000000
[ 1825.505380]  ffff880853a3f938 0000000000000002 ffff880853a3ffd8 0000000000011e00
[ 1825.505383]  ffff880853a3ffd8 0000000000011e00 ffff881052bc8000 ffff8810529c6440
[ 1825.505386]  ffff88107fcf26c0 ffff88107ffc3ae8 0000000000000002 ffffffff81101230
[ 1825.505389] Call Trace:
[ 1825.505391]  [<ffffffff81101230>] ? wait_on_page_read+0x60/0x60
[ 1825.505394]  [<ffffffff81adf43d>] io_schedule+0x9d/0x130
[ 1825.505396]  [<ffffffff8110123e>] sleep_on_page+0xe/0x20
[ 1825.505399]  [<ffffffff81add120>] __wait_on_bit+0x60/0x90
[ 1825.505401]  [<ffffffff81100fff>] wait_on_page_bit+0x7f/0x90
[ 1825.505403]  [<ffffffff81065fe0>] ? autoremove_wake_function+0x40/0x40
[ 1825.505406]  [<ffffffff81112036>] shrink_page_list+0x1a6/0x850
[ 1825.505409]  [<ffffffff81112ba7>] shrink_inactive_list+0x147/0x400
[ 1825.505412]  [<ffffffff81113391>] shrink_lruvec+0x201/0x4a0
[ 1825.505415]  [<ffffffff811142d6>] mem_cgroup_shrink_node_zone+0x96/0x120
[ 1825.505417]  [<ffffffff8115c80f>] mem_cgroup_soft_limit_reclaim+0x1bf/0x4d0
[ 1825.505420]  [<ffffffff81114692>] balance_pgdat+0x332/0x5a0
[ 1825.505423]  [<ffffffff81114a56>] kswapd+0x156/0x3e0
[ 1825.505425]  [<ffffffff81065fa0>] ? wake_up_bit+0x30/0x30
[ 1825.505427]  [<ffffffff81114900>] ? balance_pgdat+0x5a0/0x5a0
[ 1825.505429]  [<ffffffff81065140>] kthread+0xc0/0xd0
[ 1825.505431]  [<ffffffff81065080>] ? insert_kthread_work+0x40/0x40
[ 1825.505433]  [<ffffffff81ae8938>] ret_from_fork+0x58/0x90
[ 1825.505435]  [<ffffffff81065080>] ? insert_kthread_work+0x40/0x40
[ 1825.505470] INFO: task sap1002:9096 blocked for more than 120 seconds.
[ 1825.505471] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1825.505472] sap1002         D ffff88107fc71e00     0  9096      1 0x00000004
[ 1825.505474]  ffff881052137db8 0000000000000002 ffff881052137fd8 0000000000011e00
[ 1825.505476]  ffff881052137fd8 0000000000011e00 ffff88085467b220 ffff88104d434b30
[ 1825.505479]  ffff88104d434b30 ffff88104d0abfb0 ffffffffffffffff ffff88104d0abfb8
[ 1825.505482] Call Trace:
[ 1825.505485]  [<ffffffff81adf169>] schedule+0x29/0x70
[ 1825.505487]  [<ffffffff81ae009d>] rwsem_down_read_failed+0xdd/0x120
[ 1825.505492]  [<ffffffff81459704>] call_rwsem_down_read_failed+0x14/0x30
[ 1825.505495]  [<ffffffff8116d3c2>] ? final_putname+0x22/0x50
[ 1825.505498]  [<ffffffff81addd37>] ? down_read+0x17/0x20
[ 1825.505502]  [<ffffffff8103d4cb>] get_mm_exe_file+0x1b/0x40
[ 1825.505505]  [<ffffffff811c5c03>] proc_exe_link+0x53/0xa0
[ 1825.505507]  [<ffffffff811c4ecf>] proc_pid_readlink+0x3f/0xe0
[ 1825.505510]  [<ffffffff8116760c>] SyS_readlinkat+0x8c/0x100
[ 1825.505512]  [<ffffffff8116769b>] SyS_readlink+0x1b/0x20
[ 1825.505514]  [<ffffffff81ae8bdb>] tracesys+0xdd/0xe2
[ 1825.505521] INFO: task base:10081 blocked for more than 120 seconds.
[ 1825.505522] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1825.505523] base            D ffff88107fcb1e00     0 10081      1 0x00000000
[ 1825.505525]  ffff880ff0e13cf8 0000000000000002 ffff880ff0e13fd8 0000000000011e00
[ 1825.505528]  ffff880ff0e13fd8 0000000000011e00 ffff88085467e440 ffff8810520fb220
[ 1825.505531]  ffff8810520fb220 ffff88104d0abfb0 ffffffffffffffff ffff88104d0abfb8
[ 1825.505533] Call Trace:
[ 1825.505536]  [<ffffffff81adf169>] schedule+0x29/0x70
[ 1825.505538]  [<ffffffff81ae009d>] rwsem_down_read_failed+0xdd/0x120
[ 1825.505541]  [<ffffffff81459704>] call_rwsem_down_read_failed+0x14/0x30
[ 1825.505544]  [<ffffffff81addd37>] ? down_read+0x17/0x20
[ 1825.505547]  [<ffffffff81129182>] __access_remote_vm+0x42/0x1c0
[ 1825.505549]  [<ffffffff81129e60>] access_process_vm+0x50/0x70
[ 1825.505552]  [<ffffffff811c3fda>] proc_pid_cmdline+0x8a/0x120
[ 1825.505554]  [<ffffffff811c52af>] proc_info_read+0x8f/0xe0
[ 1825.505558]  [<ffffffff811621bc>] vfs_read+0x9c/0x170
[ 1825.505560]  [<ffffffff81162c79>] SyS_read+0x49/0xa0
[ 1825.505562]  [<ffffffff81ae8bdb>] tracesys+0xdd/0xe2
[ 1825.505605] INFO: task ceph-osd:7911 blocked for more than 120 seconds.
[ 1825.505606] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1825.505607] ceph-osd        D ffff88107fc71e00     0  7911   7898 0x00000000
[ 1825.505609]  ffff881017613498 0000000000000002 ffff881017613fd8 0000000000011e00
[ 1825.505611]  ffff881017613fd8 0000000000011e00 ffff88085467b220 ffff8810020a3220
[ 1825.505614]  ffff88107fc726c0 ffff88107ffcc4e8 0000000000000002 ffffffff81101230
[ 1825.505617] Call Trace:
[ 1825.505619]  [<ffffffff81101230>] ? wait_on_page_read+0x60/0x60
[ 1825.505622]  [<ffffffff81adf43d>] io_schedule+0x9d/0x130
[ 1825.505624]  [<ffffffff8110123e>] sleep_on_page+0xe/0x20
[ 1825.505627]  [<ffffffff81add120>] __wait_on_bit+0x60/0x90
[ 1825.505629]  [<ffffffff81100fff>] wait_on_page_bit+0x7f/0x90
[ 1825.505631]  [<ffffffff81065fe0>] ? autoremove_wake_function+0x40/0x40
[ 1825.505634]  [<ffffffff81112036>] shrink_page_list+0x1a6/0x850
[ 1825.505637]  [<ffffffff81112ba7>] shrink_inactive_list+0x147/0x400
[ 1825.505640]  [<ffffffff81113391>] shrink_lruvec+0x201/0x4a0
[ 1825.505643]  [<ffffffff811142d6>] mem_cgroup_shrink_node_zone+0x96/0x120
[ 1825.505645]  [<ffffffff8115c80f>] mem_cgroup_soft_limit_reclaim+0x1bf/0x4d0
[ 1825.505648]  [<ffffffff81113caf>] do_try_to_free_pages+0x21f/0x520
[ 1825.505651]  [<ffffffff81114081>] try_to_free_pages+0xd1/0x160
[ 1825.505654]  [<ffffffff8110a66c>] __alloc_pages_nodemask+0x67c/0xa30
[ 1825.505658]  [<ffffffff81141e69>] alloc_pages_current+0xa9/0x170
[ 1825.505661]  [<ffffffff81101a47>] __page_cache_alloc+0x87/0xb0
[ 1825.505663]  [<ffffffff81101eef>] grab_cache_page_write_begin+0x6f/0xd0
[ 1825.505667]  [<ffffffff81236ac8>] ext4_da_write_begin+0x98/0x310
[ 1825.505669]  [<ffffffff81101663>] generic_file_buffered_write+0x103/0x270
[ 1825.505672]  [<ffffffff811032e1>] __generic_file_aio_write+0x1c1/0x3d0
[ 1825.505675]  [<ffffffff8110354d>] generic_file_aio_write+0x5d/0xc0
[ 1825.505679]  [<ffffffff8122cf42>] ext4_file_write+0xa2/0x3e0
[ 1825.505685]  [<ffffffff81074c18>] ? try_to_wake_up+0x1b8/0x290
[ 1825.505689]  [<ffffffff81093720>] ? get_futex_key+0x140/0x290
[ 1825.505691]  [<ffffffff81161c90>] do_sync_write+0x80/0xb0
[ 1825.505694]  [<ffffffff8116234d>] vfs_write+0xbd/0x1e0
[ 1825.505697]  [<ffffffff81162d19>] SyS_write+0x49/0xa0
[ 1825.505699]  [<ffffffff81ae8bdb>] tracesys+0xdd/0xe2
[ 1825.505727] INFO: task ceph-osd:8402 blocked for more than 120 seconds.
[ 1825.505728] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1825.505729] ceph-osd        D ffff88085fd71e00     0  8402   8389 0x00000000
[ 1825.505732]  ffff880ff0eff498 0000000000000002 ffff880ff0efffd8 0000000000011e00
[ 1825.505734]  ffff880ff0efffd8 0000000000011e00 ffff8808546a0000 ffff88105032e440
[ 1825.505737]  ffff88085fd726c0 ffff88087ff8b1e8 0000000000000002 ffffffff81101230
[ 1825.505740] Call Trace:
[ 1825.505742]  [<ffffffff81101230>] ? wait_on_page_read+0x60/0x60
[ 1825.505745]  [<ffffffff81adf43d>] io_schedule+0x9d/0x130
[ 1825.505747]  [<ffffffff8110123e>] sleep_on_page+0xe/0x20
[ 1825.505750]  [<ffffffff81add120>] __wait_on_bit+0x60/0x90
[ 1825.505752]  [<ffffffff81100fff>] wait_on_page_bit+0x7f/0x90
[ 1825.505754]  [<ffffffff81065fe0>] ? autoremove_wake_function+0x40/0x40
[ 1825.505757]  [<ffffffff81112036>] shrink_page_list+0x1a6/0x850
[ 1825.505760]  [<ffffffff81112ba7>] shrink_inactive_list+0x147/0x400
[ 1825.505763]  [<ffffffff81113391>] shrink_lruvec+0x201/0x4a0
[ 1825.505765]  [<ffffffff811142d6>] mem_cgroup_shrink_node_zone+0x96/0x120
[ 1825.505768]  [<ffffffff8115c80f>] mem_cgroup_soft_limit_reclaim+0x1bf/0x4d0
[ 1825.505771]  [<ffffffff81113caf>] do_try_to_free_pages+0x21f/0x520
[ 1825.505773]  [<ffffffff81114081>] try_to_free_pages+0xd1/0x160
[ 1825.505775]  [<ffffffff8110a66c>] __alloc_pages_nodemask+0x67c/0xa30
[ 1825.505778]  [<ffffffff81141e69>] alloc_pages_current+0xa9/0x170
[ 1825.505781]  [<ffffffff81101a47>] __page_cache_alloc+0x87/0xb0
[ 1825.505783]  [<ffffffff81101eef>] grab_cache_page_write_begin+0x6f/0xd0
[ 1825.505786]  [<ffffffff81236ac8>] ext4_da_write_begin+0x98/0x310
[ 1825.505788]  [<ffffffff81101663>] generic_file_buffered_write+0x103/0x270
[ 1825.505791]  [<ffffffff811032e1>] __generic_file_aio_write+0x1c1/0x3d0
[ 1825.505794]  [<ffffffff8110354d>] generic_file_aio_write+0x5d/0xc0
[ 1825.505796]  [<ffffffff8122cf42>] ext4_file_write+0xa2/0x3e0
[ 1825.505799]  [<ffffffff81074c18>] ? try_to_wake_up+0x1b8/0x290
[ 1825.505801]  [<ffffffff81093720>] ? get_futex_key+0x140/0x290
[ 1825.505803]  [<ffffffff81161c90>] do_sync_write+0x80/0xb0
[ 1825.505806]  [<ffffffff8116234d>] vfs_write+0xbd/0x1e0
[ 1825.505809]  [<ffffffff81162d19>] SyS_write+0x49/0xa0
[ 1825.505810]  [<ffffffff81ae8bdb>] tracesys+0xdd/0xe2
[ 1825.505820] INFO: task ceph-osd:9305 blocked for more than 120 seconds.
[ 1825.505821] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1825.505822] ceph-osd        D ffff88107fcb1e00     0  9305   8389 0x00000000
[ 1825.505824]  ffff880843755e68 0000000000000002 ffff880843755fd8 0000000000011e00
[ 1825.505827]  ffff880843755fd8 0000000000011e00 ffff88085467e440 ffff88080be0cb30
[ 1825.505829]  ffff88080be0cb30 ffff88104d0abfb0 ffff88104d0abfb8 ffffffff00000000
[ 1825.505832] Call Trace:
[ 1825.505835]  [<ffffffff81adf169>] schedule+0x29/0x70
[ 1825.505837]  [<ffffffff81ae01d5>] rwsem_down_write_failed+0xf5/0x18f
[ 1825.505840]  [<ffffffff81459733>] call_rwsem_down_write_failed+0x13/0x20
[ 1825.505843]  [<ffffffff81addd64>] ? down_write+0x24/0x26
[ 1825.505845]  [<ffffffff8112e665>] SyS_brk+0x35/0x170
[ 1825.505847]  [<ffffffff81ae8bdb>] tracesys+0xdd/0xe2
[ 1825.505852] INFO: task ceph-osd:9327 blocked for more than 120 seconds.
[ 1825.505852] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1825.505853] ceph-osd        D ffff88085fcd1e00     0  9327   8389 0x00000000
[ 1825.505855]  ffff8807d987bd58 0000000000000002 ffff8807d987bfd8 0000000000011e00
[ 1825.505858]  ffff8807d987bfd8 0000000000011e00 ffff880854688000 ffff88084e991910
[ 1825.505860]  ffff88084e991910 ffff88104d0abfb0 ffffffffffffffff ffff88104d0abfb8
[ 1825.505863] Call Trace:
[ 1825.505866]  [<ffffffff81adf169>] schedule+0x29/0x70
[ 1825.505868]  [<ffffffff81ae009d>] rwsem_down_read_failed+0xdd/0x120
[ 1825.505870]  [<ffffffff81074d40>] ? wake_up_state+0x10/0x20
[ 1825.505873]  [<ffffffff81459704>] call_rwsem_down_read_failed+0x14/0x30
[ 1825.505875]  [<ffffffff81addd37>] ? down_read+0x17/0x20
[ 1825.505879]  [<ffffffff81ae3fb1>] __do_page_fault+0x1a1/0x530
[ 1825.505883]  [<ffffffff810c7eac>] ? acct_account_cputime+0x1c/0x20
[ 1825.505885]  [<ffffffff8107758b>] ? account_user_time+0x8b/0xa0
[ 1825.505887]  [<ffffffff81077a45>] ? vtime_account_user+0x55/0x70
[ 1825.505890]  [<ffffffff81ae436c>] do_page_fault+0x2c/0x50
[ 1825.505892]  [<ffffffff81ae0b22>] page_fault+0x22/0x30
[ 1825.505897] INFO: task ceph-osd:9376 blocked for more than 120 seconds.
[ 1825.505897] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1825.505898] ceph-osd        D ffff88107fc31e00     0  9376   8389 0x00000000
[ 1825.505900]  ffff88104e8c1938 0000000000000002 ffff88104e8c1fd8 0000000000011e00
[ 1825.505903]  ffff88104e8c1fd8 0000000000011e00 ffff880854678000 ffff881002cd0000
[ 1825.505905]  ffff881002cd0000 ffff88104d0abfb0 ffffffffffffffff ffff88104d0abfb8
[ 1825.505908] Call Trace:
[ 1825.505911]  [<ffffffff81adf169>] schedule+0x29/0x70
[ 1825.505913]  [<ffffffff81ae009d>] rwsem_down_read_failed+0xdd/0x120
[ 1825.505915]  [<ffffffff81459704>] call_rwsem_down_read_failed+0x14/0x30
[ 1825.505918]  [<ffffffff81457fd9>] ? copy_user_enhanced_fast_string+0x9/0x20
[ 1825.505920]  [<ffffffff81457fd9>] ? copy_user_enhanced_fast_string+0x9/0x20
[ 1825.505923]  [<ffffffff81addd37>] ? down_read+0x17/0x20
[ 1825.505925]  [<ffffffff81ae3fb1>] __do_page_fault+0x1a1/0x530
[ 1825.505929]  [<ffffffff81068606>] ? __remove_hrtimer+0x46/0xa0
[ 1825.505932]  [<ffffffff81068a08>] ? hrtimer_try_to_cancel+0x48/0xd0
[ 1825.505934]  [<ffffffff81addc4b>] ? schedule_hrtimeout_range_clock+0xbb/0x150
[ 1825.505937]  [<ffffffff81ae436c>] do_page_fault+0x2c/0x50
[ 1825.505939]  [<ffffffff81ae0b22>] page_fault+0x22/0x30
[ 1825.505941]  [<ffffffff81457fd9>] ? copy_user_enhanced_fast_string+0x9/0x20
[ 1825.505944]  [<ffffffff8145d71a>] ? memcpy_toiovec+0x4a/0x90
[ 1825.505947]  [<ffffffff819b963d>] skb_copy_datagram_iovec+0x5d/0x270
[ 1825.505950]  [<ffffffff81ae04b6>] ? _raw_spin_unlock_bh+0x16/0x20
[ 1825.505954]  [<ffffffff81a0df37>] tcp_recvmsg+0x1c7/0xa80
[ 1825.505958]  [<ffffffff81a3553c>] inet_recvmsg+0x6c/0x80
[ 1825.505962]  [<ffffffff819aad7e>] sock_recvmsg+0x7e/0xa0
[ 1825.505966]  [<ffffffff81024b27>] ? native_smp_send_reschedule+0x47/0x60
[ 1825.505969]  [<ffffffff81072965>] ? check_preempt_curr+0x75/0xa0
[ 1825.505971]  [<ffffffff810729a9>] ? ttwu_do_wakeup+0x19/0xd0
[ 1825.505974]  [<ffffffff81072b3d>] ? ttwu_do_activate.constprop.81+0x5d/0x70
[ 1825.505976]  [<ffffffff819abc78>] SYSC_recvfrom+0xe8/0x160
[ 1825.505979]  [<ffffffff810c7eac>] ? acct_account_cputime+0x1c/0x20
[ 1825.505981]  [<ffffffff8107758b>] ? account_user_time+0x8b/0xa0
[ 1825.505983]  [<ffffffff81077a45>] ? vtime_account_user+0x55/0x70
[ 1825.505988]  [<ffffffff8100e4f8>] ? syscall_trace_enter+0x18/0x210
[ 1825.505991]  [<ffffffff819ad3ae>] SyS_recvfrom+0xe/0x10
[ 1825.505993]  [<ffffffff81ae8bdb>] tracesys+0xdd/0xe2
[ 1825.505995] INFO: task ceph-osd:9377 blocked for more than 120 seconds.
[ 1825.505996] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1825.505996] ceph-osd        D ffff88107fd11e00     0  9377   8389 0x00000000
[ 1825.505998]  ffff881050019938 0000000000000002 ffff881050019fd8 0000000000011e00
[ 1825.506001]  ffff881050019fd8 0000000000011e00 ffff8808546a4b30 ffff880ffe336440
[ 1825.506003]  ffff880ffe336440 ffff88104d0abfb0 ffffffffffffffff ffff88104d0abfb8
[ 1825.506006] Call Trace:
[ 1825.506009]  [<ffffffff81adf169>] schedule+0x29/0x70
[ 1825.506011]  [<ffffffff81ae009d>] rwsem_down_read_failed+0xdd/0x120
[ 1825.506014]  [<ffffffff81459704>] call_rwsem_down_read_failed+0x14/0x30
[ 1825.506016]  [<ffffffff81457fd9>] ? copy_user_enhanced_fast_string+0x9/0x20
[ 1825.506018]  [<ffffffff81457fd9>] ? copy_user_enhanced_fast_string+0x9/0x20
[ 1825.506021]  [<ffffffff81addd37>] ? down_read+0x17/0x20
[ 1825.506023]  [<ffffffff81ae3fb1>] __do_page_fault+0x1a1/0x530
[ 1825.506025]  [<ffffffff81068606>] ? __remove_hrtimer+0x46/0xa0
[ 1825.506028]  [<ffffffff81068a08>] ? hrtimer_try_to_cancel+0x48/0xd0
[ 1825.506031]  [<ffffffff81addc4b>] ? schedule_hrtimeout_range_clock+0xbb/0x150
[ 1825.506033]  [<ffffffff81ae436c>] do_page_fault+0x2c/0x50
[ 1825.506035]  [<ffffffff81ae0b22>] page_fault+0x22/0x30
[ 1825.506038]  [<ffffffff81457fd9>] ? copy_user_enhanced_fast_string+0x9/0x20
[ 1825.506040]  [<ffffffff8145d71a>] ? memcpy_toiovec+0x4a/0x90
[ 1825.506042]  [<ffffffff819b963d>] skb_copy_datagram_iovec+0x5d/0x270
[ 1825.506044]  [<ffffffff81ae04b6>] ? _raw_spin_unlock_bh+0x16/0x20
[ 1825.506046]  [<ffffffff81a0df37>] tcp_recvmsg+0x1c7/0xa80
[ 1825.506049]  [<ffffffff81a3553c>] inet_recvmsg+0x6c/0x80
[ 1825.506051]  [<ffffffff819aad7e>] sock_recvmsg+0x7e/0xa0
[ 1825.506053]  [<ffffffff81024b27>] ? native_smp_send_reschedule+0x47/0x60
[ 1825.506055]  [<ffffffff81072965>] ? check_preempt_curr+0x75/0xa0
[ 1825.506057]  [<ffffffff810729a9>] ? ttwu_do_wakeup+0x19/0xd0
[ 1825.506060]  [<ffffffff81072b3d>] ? ttwu_do_activate.constprop.81+0x5d/0x70
[ 1825.506062]  [<ffffffff819abc78>] SYSC_recvfrom+0xe8/0x160
[ 1825.506065]  [<ffffffff810c7eac>] ? acct_account_cputime+0x1c/0x20
[ 1825.506067]  [<ffffffff8107758b>] ? account_user_time+0x8b/0xa0
[ 1825.506069]  [<ffffffff81077a45>] ? vtime_account_user+0x55/0x70
[ 1825.506071]  [<ffffffff8100e4f8>] ? syscall_trace_enter+0x18/0x210
[ 1825.506074]  [<ffffffff819ad3ae>] SyS_recvfrom+0xe/0x10
[ 1825.506076]  [<ffffffff81ae8bdb>] tracesys+0xdd/0xe2

Any comment will be appreciated.


Files

dmesg_before_hang (505 KB) dmesg_before_hang Zhi Zhang, 08/12/2015 10:46 AM
dmesg_just_hang (503 KB) dmesg_just_hang Zhi Zhang, 08/12/2015 10:47 AM
Actions #1

Updated by Ilya Dryomov over 8 years ago

This looks like a bunch of processes stuck trying to acquire mmap_sem, while another bunch is waiting for page writeback.

rbd-fuse is not meant for any serious use, so let's concentrate on the kernel client here. It's not immediately obvious what's to blame here. It would be nice to see who is holding the mmap_sem, along with what the rest of the tasks are up to. Try reproducing it, and when it gets stuck do

echo w >/proc/sysrq-trigger
echo d >/proc/sysrq-trigger # needs lockdep
echo t >/proc/sysrq-trigger

then wait for a while and repeat. A few samples should be enough. Make sure lockdep is enabled in your kernel.

Actions #2

Updated by Zhi Zhang over 8 years ago

Hi Ilya,

It's really hard to catch the timing to collect dmesg logs under our test scenario because of low memory at that point. Usually the system will hang when we found something got stuck and tried to enable /proc/sysrq-trigger. So I collect the dmesg (as attached dmesg_before_hang) when the memory is very low but system hasn't hung. After it is hung, echo anything to /proc/sysrq-trigger will also hang, so I only collect the dmesg again (as attached dmesg_just_hang).

In the dmesg_before_hang, there seems to be a deadlock:

[ 7196.038325] irq event stamp: 11794189
[ 7196.102178] hardirqs last  enabled at (11794189): [<ffffffff81b2bdac>] _raw_spin_unlock_irq+0x2c/0x30
[ 7196.264555] hardirqs last disabled at (11794188): [<ffffffff81b2bbc9>] _raw_spin_lock_irq+0x19/0x80
[ 7196.411608] softirqs last  enabled at (11794186): [<ffffffff810493f9>] __do_softirq+0x189/0x250
[ 7196.560717] softirqs last disabled at (11794131): [<ffffffff81b35f8c>] call_softirq+0x1c/0x30
[ 7196.704001] 
other info that might help us debug this:
[ 7196.804881]  Possible unsafe locking scenario:

[ 7196.874936]        CPU0
[ 7196.902291]        ----
[ 7196.943092]   lock(&(&ip->i_lock)->mr_lock);
[ 7197.020493]   <Interrupt>
[ 7197.067031]     lock(&(&ip->i_lock)->mr_lock);
[ 7197.148235] 
 *** DEADLOCK ***

[ 7197.251345] 2 locks held by kswapd1/1164:
[ 7197.321014]  #0:  (shrinker_rwsem){++++..}, at: [<ffffffff811247ff>] shrink_slab+0x3f/0x300
[ 7197.450879]  #1:  (&type->s_umount_key#35){++++++}, at: [<ffffffff8117ac44>] grab_super_passive+0x44/0x90

After it is hung, restarting OSD will also fail.

2015-08-12 16:44:55.292043 7fd9283ea880  0 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff), process ceph-osd, pid 15389
2015-08-12 16:44:55.299677 7fd9283ea880 10 filestore(/var/lib/ceph/osd/ceph-16) dump_stop
2015-08-12 16:44:55.305418 7fd9283ea880 10 ErasureCodePluginSelectJerasure: load: jerasure_sse4 
2015-08-12 16:44:55.309540 7fd9283ea880 10 load: jerasure load: lrc load: isa 
2015-08-12 16:44:55.309823 7fd9283ea880  5 filestore(/var/lib/ceph/osd/ceph-16) test_mount basedir /var/lib/ceph/osd/ceph-16 journal /var/lib/ceph/osd/ceph-16/journal
2015-08-12 16:44:55.309853 7fd9283ea880  0 filestore(/var/lib/ceph/osd/ceph-16) lock_fsid failed to lock /var/lib/ceph/osd/ceph-16/fsid, is another ceph-osd still running? (11) Resource temporarily unavailable
2015-08-12 16:44:55.309888 7fd9283ea880 -1 osd.16 0 OSD::pre_init: object store '/var/lib/ceph/osd/ceph-16' is currently in use. (Is ceph-osd already running?)
2015-08-12 16:44:55.309897 7fd9283ea880 -1  ** ERROR: osd pre_init failed: (16) Device or resource busy
Actions

Also available in: Atom PDF