Bug #50826
closed
kceph: stock RHEL kernel hangs on snaptests with mon|osd thrashers
% Done:
0%
Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
kceph
Labels (FS):
qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
/ceph/teuthology-archive/pdonnell-2021-05-14_21:45:42-fs-master-distro-basic-smithi/6115757/teuthology.log
and
/ceph/teuthology-archive/pdonnell-2021-05-14_21:45:42-fs-master-distro-basic-smithi/6115769/teuthology.log
I see:
[ 2459.191432] INFO: task git:214273 blocked for more than 120 seconds.
[ 2459.198266] Not tainted 4.18.0-240.1.1.el8_3.x86_64 #1
[ 2459.204471] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2459.212884] git D 0 214273 136000 0x00000080
[ 2459.218901] Call Trace:
[ 2459.221802] __schedule+0x2a6/0x700
[ 2459.225931] ? __dentry_kill+0x121/0x170
[ 2459.230363] schedule+0x38/0xa0
[ 2459.233922] io_schedule+0x12/0x40
[ 2459.237741] __lock_page+0x141/0x240
[ 2459.241724] ? file_check_and_advance_wb_err+0xd0/0xd0
[ 2459.247309] pagecache_get_page+0x19a/0x2d0
[ 2459.252075] grab_cache_page_write_begin+0x1f/0x40
[ 2459.257309] ceph_write_begin+0x40/0x130 [ceph]
[ 2459.262265] generic_perform_write+0xf4/0x1b0
[ 2459.267044] ? file_update_time+0xed/0x130
[ 2459.271559] ceph_write_iter+0xa75/0xc90 [ceph]
[ 2459.276495] ? atime_needs_update+0x77/0xe0
[ 2459.281076] ? touch_atime+0x33/0xe0
[ 2459.285318] ? _copy_to_user+0x26/0x30
[ 2459.289542] ? cp_new_stat+0x150/0x180
[ 2459.293712] ? new_sync_write+0x124/0x170
[ 2459.298131] ? ceph_fallocate+0x5f0/0x5f0 [ceph]
[ 2459.303149] new_sync_write+0x124/0x170
[ 2459.307393] vfs_write+0xa5/0x1a0
[ 2459.311111] ksys_write+0x4f/0xb0
[ 2459.314826] do_syscall_64+0x5b/0x1a0
[ 2459.318895] entry_SYSCALL_64_after_hwframe+0x65/0xca
in one of the dmesg logs and
[ 4169.826043] libceph: reset on mds4
[ 4169.826044] ceph: mds4 closed our session
[ 4169.826045] ceph: mds4 reconnect start
[ 4169.826062] libceph: mds1 (1)172.21.15.74:6836 connection reset
[ 4169.826064] libceph: reset on mds1
[ 4169.826064] ceph: mds1 closed our session
[ 4169.826065] ceph: mds1 reconnect start
[ 4169.827221] ceph: mds1 reconnect denied
[ 4169.827231] ceph: mds4 reconnect denied
[ 4169.831925] libceph: reset on mds2
[ 4169.838044] libceph: reset on mds3
[ 4169.844154] ceph: mds2 closed our session
[ 4169.844154] ceph: mds2 reconnect start
[ 4169.861696] ceph: mds2 reconnect denied
[ 4169.863550] ceph: mds3 closed our session
[ 4169.911453] ceph: mds3 reconnect start
[ 4169.916937] ceph: mds3 reconnect denied
[ 4221.027310] libceph: mon0 (1)172.21.15.16:6789 socket closed (con state OPEN)
[ 4221.034962] libceph: mon0 (1)172.21.15.16:6789 session lost, hunting for new mon
... ad infinitum
in the other
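For triaging similar runs, the two signatures quoted above (the hung-task warning and the MDS reconnect denials) can be pulled out of a saved dmesg capture with a short script. This is a minimal sketch, not part of the teuthology tooling: the function name scan_dmesg and both regular expressions are assumptions derived only from the log lines in this report, and other kernels may word these messages differently.

```python
import re

# Hung-task detector line, e.g.
# "[ 2459.191432] INFO: task git:214273 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)

# kceph reconnect rejection, e.g. "[ 4169.827221] ceph: mds1 reconnect denied"
DENIED_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+ceph: (?P<mds>mds\d+) reconnect denied"
)


def scan_dmesg(text):
    """Return (hung_tasks, denied_reconnects) found in dmesg text.

    hung_tasks: list of (timestamp, command, pid, seconds_blocked)
    denied_reconnects: list of (timestamp, mds_name)
    Works whether the log has one entry per line or is flattened,
    because matching is done over the whole string.
    """
    hung = [
        (float(m["ts"]), m["comm"], int(m["pid"]), int(m["secs"]))
        for m in HUNG_RE.finditer(text)
    ]
    denied = [(float(m["ts"]), m["mds"]) for m in DENIED_RE.finditer(text)]
    return hung, denied
```

Calling scan_dmesg on the contents of a dmesg file returns two lists, which makes it easy to see at a glance whether a run hit the page-lock hang, the reconnect denials, or both.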