Actions
Bug #45609
closedkclient: kclient node get stuck when the async create/unlink enabled
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
fs
Crash signature (v1):
Crash signature (v2):
Description
This issue was initially from Zheng Yan, and locally I have reproduced it by running for 5 hours:
Steps:
1, enable async create/unlink ops by specifying the `-o nowsync` when mounting
2, in one bash terminal run:
# for i in {0..1000}; do pkill -9 ceph-mds; sleep 30; ./bin/ceph-mds -i a -c /data/ceph/build/ceph.conf; sleep 20; done
3, in another bash terminal run:
# for i in {0..1000}; do for i in {0..10}; do mkdir /mnt/cephfs/dir_$i; touch /mnt/cephfs/dir_$i/$i; touch /mnt/cephfs/dir_$i/$i_1; echo "create /mnt/cephfs/dir_$i/$i"; done; sleep 2; for i in {0..10}; do rm -rf /mnt/cephfs/dir_$i; echo "unlinking /mnt/cephfs/dir_$i"; done; done
<4>[15912.549018] libceph: mds0 (1)10.72.36.245:6827 socket closed (con state OPEN)
<4>[15912.623341] libceph: mds0 (1)10.72.36.245:6827 socket error on write
<4>[15913.327196] libceph: mds0 (1)10.72.36.245:6827 socket error on write
<4>[15914.351224] libceph: mds0 (1)10.72.36.245:6827 socket error on write
<4>[15916.335132] libceph: mds0 (1)10.72.36.245:6827 socket error on write
<4>[15920.366979] libceph: mds0 (1)10.72.36.245:6827 socket error on write
<4>[15928.494697] libceph: mds0 (1)10.72.36.245:6827 socket error on write
<6>[15949.636088] ceph: mds0 reconnect start
<6>[15949.636099] ceph: session 00000000f871e47b state reconnecting
<6>[15949.641863] ceph: mds0 reconnect success
<6>[15952.723364] ceph: mds0 recovery completed
<6>[15981.740583] libceph: mon2 (1)10.72.36.245:40083 session lost, hunting for new mon
<3>[16097.960293] INFO: task kworker/18:1:32111 blocked for more than 122 seconds.
<3>[16097.960860] Tainted: G E 5.7.0-rc5+ #80
<3>[16097.961332] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[16097.961868] kworker/18:1 D 0 32111 2 0x80004080
<6>[16097.962151] Workqueue: ceph-msgr ceph_con_workfn [libceph]
<4>[16097.962188] Call Trace:
<4>[16097.962353] ? __schedule+0x276/0x6e0
<4>[16097.962359] ? schedule+0x40/0xb0
<4>[16097.962364] ? schedule_preempt_disabled+0xa/0x10
<4>[16097.962368] ? __mutex_lock.isra.8+0x2b5/0x4a0
<4>[16097.962460] ? kick_requests+0x21/0x100 [ceph]
<4>[16097.962485] ? ceph_mdsc_handle_mdsmap+0x19c/0x5f0 [ceph]
<4>[16097.962503] ? extra_mon_dispatch+0x34/0x40 [ceph]
<4>[16097.962523] ? extra_mon_dispatch+0x34/0x40 [ceph]
<4>[16097.962580] ? dispatch+0x77/0x930 [libceph]
<4>[16097.962602] ? try_read+0x78b/0x11e0 [libceph]
<4>[16097.962619] ? __switch_to_asm+0x40/0x70
<4>[16097.962623] ? __switch_to_asm+0x34/0x70
<4>[16097.962627] ? __switch_to_asm+0x40/0x70
<4>[16097.962631] ? __switch_to_asm+0x34/0x70
<4>[16097.962635] ? __switch_to_asm+0x40/0x70
<4>[16097.962654] ? ceph_con_workfn+0x130/0x5e0 [libceph]
<4>[16097.962713] ? process_one_work+0x1ad/0x370
<4>[16097.962717] ? worker_thread+0x30/0x390
<4>[16097.962722] ? create_worker+0x1a0/0x1a0
<4>[16097.962737] ? kthread+0x112/0x130
<4>[16097.962742] ? kthread_park+0x80/0x80
<4>[16097.962747] ? ret_from_fork+0x35/0x40
<3>[16097.962758] INFO: task kworker/25:1:1747 blocked for more than 122 seconds.
<3>[16097.963233] Tainted: G E 5.7.0-rc5+ #80
<3>[16097.963792] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[16097.964298] kworker/25:1 D 0 1747 2 0x80004080
<6>[16097.964325] Workqueue: ceph-msgr ceph_con_workfn [libceph]
<4>[16097.964331] Call Trace:
<4>[16097.964340] ? __schedule+0x276/0x6e0
<4>[16097.964344] ? schedule+0x40/0xb0
<4>[16097.964347] ? schedule_preempt_disabled+0xa/0x10
<4>[16097.964351] ? __mutex_lock.isra.8+0x2b5/0x4a0
<4>[16097.964376] ? handle_reply+0x33f/0x6f0 [ceph]
<4>[16097.964407] ? dispatch+0xa6/0xbc0 [ceph]
<4>[16097.964429] ? read_partial_message+0x214/0x770 [libceph]
<4>[16097.964449] ? try_read+0x78b/0x11e0 [libceph]
<4>[16097.964454] ? __switch_to_asm+0x40/0x70
<4>[16097.964458] ? __switch_to_asm+0x34/0x70
<4>[16097.964461] ? __switch_to_asm+0x40/0x70
<4>[16097.964465] ? __switch_to_asm+0x34/0x70
<4>[16097.964470] ? __switch_to_asm+0x40/0x70
<4>[16097.964489] ? ceph_con_workfn+0x130/0x5e0 [libceph]
<4>[16097.964494] ? process_one_work+0x1ad/0x370
<4>[16097.964498] ? worker_thread+0x30/0x390
<4>[16097.964501] ? create_worker+0x1a0/0x1a0
<4>[16097.964506] ? kthread+0x112/0x130
<4>[16097.964511] ? kthread_park+0x80/0x80
<4>[16097.964516] ? ret_from_fork+0x35/0x40
<3>[16097.964542] INFO: task kworker/27:2:25505 blocked for more than 122 seconds.
<3>[16097.965033] Tainted: G E 5.7.0-rc5+ #80
<3>[16097.965521] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[16097.966053] kworker/27:2 D 0 25505 2 0x80004080
<6>[16097.966090] Workqueue: events delayed_work [ceph]
<4>[16097.966093] Call Trace:
<4>[16097.966100] ? __schedule+0x276/0x6e0
<4>[16097.966104] ? __switch_to_asm+0x40/0x70
<4>[16097.966107] ? schedule+0x40/0xb0
<4>[16097.966137] ? schedule_preempt_disabled+0xa/0x10
<4>[16097.966140] ? __mutex_lock.isra.8+0x2b5/0x4a0
<4>[16097.966144] ? __switch_to_asm+0x40/0x70
<4>[16097.966148] ? __switch_to_asm+0x40/0x70
<4>[16097.966152] ? __switch_to_asm+0x34/0x70
<4>[16097.966176] ? delayed_work+0x2e/0x290 [ceph]
<4>[16097.966181] ? process_one_work+0x1ad/0x370
<4>[16097.966186] ? worker_thread+0x30/0x390
<4>[16097.966189] ? create_worker+0x1a0/0x1a0
<4>[16097.966194] ? kthread+0x112/0x130
<4>[16097.966198] ? kthread_park+0x80/0x80
<4>[16097.966203] ? ret_from_fork+0x35/0x40
<3>[16220.835757] INFO: task kworker/18:1:32111 blocked for more than 245 seconds.
<3>[16220.836212] Tainted: G E 5.7.0-rc5+ #80
<3>[16220.836682] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[16220.837213] kworker/18:1 D 0 32111 2 0x80004080
<6>[16220.837253] Workqueue: ceph-msgr ceph_con_workfn [libceph]
<4>[16220.837256] Call Trace:
<4>[16220.837270] ? __schedule+0x276/0x6e0
<4>[16220.837274] ? schedule+0x40/0xb0
<4>[16220.837277] ? schedule_preempt_disabled+0xa/0x10
<4>[16220.837281] ? __mutex_lock.isra.8+0x2b5/0x4a0
<4>[16220.837310] ? kick_requests+0x21/0x100 [ceph]
<4>[16220.837335] ? ceph_mdsc_handle_mdsmap+0x19c/0x5f0 [ceph]
<4>[16220.837372] ? extra_mon_dispatch+0x34/0x40 [ceph]
<4>[16220.837386] ? extra_mon_dispatch+0x34/0x40 [ceph]
<4>[16220.837409] ? dispatch+0x77/0x930 [libceph]
<4>[16220.837429] ? try_read+0x78b/0x11e0 [libceph]
<4>[16220.837437] ? __switch_to_asm+0x40/0x70
<4>[16220.837441] ? __switch_to_asm+0x34/0x70
<4>[16220.837444] ? __switch_to_asm+0x40/0x70
<4>[16220.837448] ? __switch_to_asm+0x34/0x70
<4>[16220.837452] ? __switch_to_asm+0x40/0x70
<4>[16220.837472] ? ceph_con_workfn+0x130/0x5e0 [libceph]
<4>[16220.837478] ? process_one_work+0x1ad/0x370
<4>[16220.837482] ? worker_thread+0x30/0x390
<4>[16220.837486] ? create_worker+0x1a0/0x1a0
<4>[16220.837491] ? kthread+0x112/0x130
<4>[16220.837496] ? kthread_park+0x80/0x80
<4>[16220.837501] ? ret_from_fork+0x35/0x40
<3>[16220.837516] INFO: task kworker/25:1:1747 blocked for more than 245 seconds.
<3>[16220.838157] Tainted: G E 5.7.0-rc5+ #80
<3>[16220.838702] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[16220.839220] kworker/25:1 D 0 1747 2 0x80004080
<6>[16220.839248] Workqueue: ceph-msgr ceph_con_workfn [libceph]
<4>[16220.839251] Call Trace:
<4>[16220.839258] ? __schedule+0x276/0x6e0
<4>[16220.839262] ? schedule+0x40/0xb0
<4>[16220.839266] ? schedule_preempt_disabled+0xa/0x10
<4>[16220.839270] ? __mutex_lock.isra.8+0x2b5/0x4a0
<4>[16220.839296] ? handle_reply+0x33f/0x6f0 [ceph]
<4>[16220.839318] ? dispatch+0xa6/0xbc0 [ceph]
<4>[16220.839337] ? read_partial_message+0x214/0x770 [libceph]
<4>[16220.839357] ? try_read+0x78b/0x11e0 [libceph]
<4>[16220.839363] ? __switch_to_asm+0x40/0x70
<4>[16220.839367] ? __switch_to_asm+0x34/0x70
<4>[16220.839371] ? __switch_to_asm+0x40/0x70
<4>[16220.839375] ? __switch_to_asm+0x34/0x70
<4>[16220.839378] ? __switch_to_asm+0x40/0x70
<4>[16220.839397] ? ceph_con_workfn+0x130/0x5e0 [libceph]
<4>[16220.839402] ? process_one_work+0x1ad/0x370
<4>[16220.839406] ? worker_thread+0x30/0x390
<4>[16220.839409] ? create_worker+0x1a0/0x1a0
<4>[16220.839414] ? kthread+0x112/0x130
<4>[16220.839418] ? kthread_park+0x80/0x80
<4>[16220.839423] ? ret_from_fork+0x35/0x40
<3>[16220.839436] INFO: task kworker/27:2:25505 blocked for more than 245 seconds.
<3>[16220.839982] Tainted: G E 5.7.0-rc5+ #80
<3>[16220.840488] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[16220.841036] kworker/27:2 D 0 25505 2 0x80004080
<6>[16220.841066] Workqueue: events delayed_work [ceph]
<4>[16220.841075] Call Trace:
<4>[16220.841081] ? __schedule+0x276/0x6e0
<4>[16220.841085] ? __switch_to_asm+0x40/0x70
<4>[16220.841089] ? schedule+0x40/0xb0
<4>[16220.841104] ? schedule_preempt_disabled+0xa/0x10
<4>[16220.841114] ? __mutex_lock.isra.8+0x2b5/0x4a0
<4>[16220.841123] ? __switch_to_asm+0x40/0x70
<4>[16220.841130] ? __switch_to_asm+0x40/0x70
<4>[16220.841138] ? __switch_to_asm+0x34/0x70
<4>[16220.841161] ? delayed_work+0x2e/0x290 [ceph]
<4>[16220.841170] ? process_one_work+0x1ad/0x370
<4>[16220.841174] ? worker_thread+0x30/0x390
<4>[16220.841181] ? create_worker+0x1a0/0x1a0
<4>[16220.841186] ? kthread+0x112/0x130
<4>[16220.841194] ? kthread_park+0x80/0x80
<4>[16220.841199] ? ret_from_fork+0x35/0x40
<3>[16343.711131] INFO: task kworker/18:1:32111 blocked for more than 368 seconds.
<3>[16343.711610] Tainted: G E 5.7.0-rc5+ #80
<3>[16343.712057] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[16343.712586] kworker/18:1 D 0 32111 2 0x80004080
<6>[16343.712642] Workqueue: ceph-msgr ceph_con_workfn [libceph]
<4>[16343.712647] Call Trace:
<4>[16343.712713] ? __schedule+0x276/0x6e0
<4>[16343.712718] ? schedule+0x40/0xb0
<4>[16343.712721] ? schedule_preempt_disabled+0xa/0x10
<4>[16343.712725] ? __mutex_lock.isra.8+0x2b5/0x4a0
<4>[16343.712758] ? kick_requests+0x21/0x100 [ceph]
<4>[16343.712783] ? ceph_mdsc_handle_mdsmap+0x19c/0x5f0 [ceph]
<4>[16343.712799] ? extra_mon_dispatch+0x34/0x40 [ceph]
<4>[16343.712813] ? extra_mon_dispatch+0x34/0x40 [ceph]
<4>[16343.712835] ? dispatch+0x77/0x930 [libceph]
<4>[16343.712855] ? try_read+0x78b/0x11e0 [libceph]
<4>[16343.712865] ? __switch_to_asm+0x40/0x70
<4>[16343.712869] ? __switch_to_asm+0x34/0x70
<4>[16343.712873] ? __switch_to_asm+0x40/0x70
<4>[16343.712876] ? __switch_to_asm+0x34/0x70
<4>[16343.712880] ? __switch_to_asm+0x40/0x70
<4>[16343.712899] ? ceph_con_workfn+0x130/0x5e0 [libceph]
<4>[16343.712906] ? process_one_work+0x1ad/0x370
<4>[16343.712909] ? worker_thread+0x30/0x390
<4>[16343.712918] ? create_worker+0x1a0/0x1a0
<4>[16343.712949] ? kthread+0x112/0x130
<4>[16343.712954] ? kthread_park+0x80/0x80
<4>[16343.712959] ? ret_from_fork+0x35/0x40
<3>[16343.712970] INFO: task kworker/25:1:1747 blocked for more than 368 seconds.
<3>[16343.713489] Tainted: G E 5.7.0-rc5+ #80
<3>[16343.714003] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[16343.714498] kworker/25:1 D 0 1747 2 0x80004080
<6>[16343.714536] Workqueue: ceph-msgr ceph_con_workfn [libceph]
<4>[16343.714538] Call Trace:
<4>[16343.714543] ? __schedule+0x276/0x6e0
<4>[16343.714547] ? schedule+0x40/0xb0
<4>[16343.714550] ? schedule_preempt_disabled+0xa/0x10
<4>[16343.714554] ? __mutex_lock.isra.8+0x2b5/0x4a0
<4>[16343.714580] ? handle_reply+0x33f/0x6f0 [ceph]
<4>[16343.714604] ? dispatch+0xa6/0xbc0 [ceph]
<4>[16343.714626] ? read_partial_message+0x214/0x770 [libceph]
<4>[16343.714646] ? try_read+0x78b/0x11e0 [libceph]
<4>[16343.714651] ? __switch_to_asm+0x40/0x70
<4>[16343.714654] ? __switch_to_asm+0x34/0x70
<4>[16343.714658] ? __switch_to_asm+0x40/0x70
<4>[16343.714662] ? __switch_to_asm+0x34/0x70
<4>[16343.714665] ? __switch_to_asm+0x40/0x70
<4>[16343.714685] ? ceph_con_workfn+0x130/0x5e0 [libceph]
<4>[16343.714689] ? process_one_work+0x1ad/0x370
<4>[16343.714693] ? worker_thread+0x30/0x390
<4>[16343.714697] ? create_worker+0x1a0/0x1a0
<4>[16343.714701] ? kthread+0x112/0x130
<4>[16343.714706] ? kthread_park+0x80/0x80
<4>[16343.714710] ? ret_from_fork+0x35/0x40
<3>[16343.714724] INFO: task kworker/27:2:25505 blocked for more than 368 seconds.
<3>[16343.715235] Tainted: G E 5.7.0-rc5+ #80
<3>[16343.715724] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[16343.716274] kworker/27:2 D 0 25505 2 0x80004080
<6>[16343.716304] Workqueue: events delayed_work [ceph]
<4>[16343.716311] Call Trace:
<4>[16343.716319] ? __schedule+0x276/0x6e0
<4>[16343.716327] ? __switch_to_asm+0x40/0x70
<4>[16343.716331] ? schedule+0x40/0xb0
<4>[16343.716338] ? schedule_preempt_disabled+0xa/0x10
<4>[16343.716342] ? __mutex_lock.isra.8+0x2b5/0x4a0
<4>[16343.716349] ? __switch_to_asm+0x40/0x70
<4>[16343.716353] ? __switch_to_asm+0x40/0x70
<4>[16343.716361] ? __switch_to_asm+0x34/0x70
<4>[16343.716397] ? delayed_work+0x2e/0x290 [ceph]
<4>[16343.716405] ? process_one_work+0x1ad/0x370
<4>[16343.716409] ? worker_thread+0x30/0x390
<4>[16343.716413] ? create_worker+0x1a0/0x1a0
<4>[16343.716417] ? kthread+0x112/0x130
<4>[16343.716422] ? kthread_park+0x80/0x80
<4>[16343.716431] ? ret_from_fork+0x35/0x40
<3>[16466.586567] INFO: task kworker/18:1:32111 blocked for more than 491 seconds.
<3>[16466.586820] Tainted: G E 5.7.0-rc5+ #80
<3>[16466.587063] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[16466.587275] kworker/18:1 D 0 32111 2 0x80004080
<6>[16466.587311] Workqueue: ceph-msgr ceph_con_workfn [libceph]
<4>[16466.587318] Call Trace:
<4>[16466.587374] ? __schedule+0x276/0x6e0
<4>[16466.587376] ? schedule+0x40/0xb0
<4>[16466.587378] ? schedule_preempt_disabled+0xa/0x10
<4>[16466.587380] ? __mutex_lock.isra.8+0x2b5/0x4a0
<4>[16466.587407] ? kick_requests+0x21/0x100 [ceph]
<4>[16466.587424] ? ceph_mdsc_handle_mdsmap+0x19c/0x5f0 [ceph]
<4>[16466.587448] ? extra_mon_dispatch+0x34/0x40 [ceph]
<4>[16466.587461] ? extra_mon_dispatch+0x34/0x40 [ceph]
<4>[16466.587472] ? dispatch+0x77/0x930 [libceph]
<4>[16466.587489] ? try_read+0x78b/0x11e0 [libceph]
<4>[16466.587499] ? __switch_to_asm+0x40/0x70
<4>[16466.587537] ? __switch_to_asm+0x34/0x70
<4>[16466.587545] ? __switch_to_asm+0x40/0x70
<4>[16466.587553] ? __switch_to_asm+0x34/0x70
<4>[16466.587561] ? __switch_to_asm+0x40/0x70
<4>[16466.587576] ? ceph_con_workfn+0x130/0x5e0 [libceph]
<4>[16466.587581] ? process_one_work+0x1ad/0x370
<4>[16466.587583] ? worker_thread+0x30/0x390
<4>[16466.587585] ? create_worker+0x1a0/0x1a0
<4>[16466.587589] ? kthread+0x112/0x130
<4>[16466.587591] ? kthread_park+0x80/0x80
<4>[16466.587594] ? ret_from_fork+0x35/0x40
Actions