Bug #50826

closed

kceph: stock RHEL kernel hangs on snaptests with mon|osd thrashers

Added by Patrick Donnelly almost 3 years ago. Updated 6 months ago.

Status: Closed
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): kceph
Labels (FS): qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/ceph/teuthology-archive/pdonnell-2021-05-14_21:45:42-fs-master-distro-basic-smithi/6115757/teuthology.log

and

/ceph/teuthology-archive/pdonnell-2021-05-14_21:45:42-fs-master-distro-basic-smithi/6115769/teuthology.log

I see:

[ 2459.191432] INFO: task git:214273 blocked for more than 120 seconds.
[ 2459.198266]       Not tainted 4.18.0-240.1.1.el8_3.x86_64 #1
[ 2459.204471] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2459.212884] git             D    0 214273 136000 0x00000080
[ 2459.218901] Call Trace:
[ 2459.221802]  __schedule+0x2a6/0x700
[ 2459.225931]  ? __dentry_kill+0x121/0x170
[ 2459.230363]  schedule+0x38/0xa0
[ 2459.233922]  io_schedule+0x12/0x40
[ 2459.237741]  __lock_page+0x141/0x240
[ 2459.241724]  ? file_check_and_advance_wb_err+0xd0/0xd0
[ 2459.247309]  pagecache_get_page+0x19a/0x2d0
[ 2459.252075]  grab_cache_page_write_begin+0x1f/0x40
[ 2459.257309]  ceph_write_begin+0x40/0x130 [ceph]
[ 2459.262265]  generic_perform_write+0xf4/0x1b0
[ 2459.267044]  ? file_update_time+0xed/0x130
[ 2459.271559]  ceph_write_iter+0xa75/0xc90 [ceph]
[ 2459.276495]  ? atime_needs_update+0x77/0xe0
[ 2459.281076]  ? touch_atime+0x33/0xe0
[ 2459.285318]  ? _copy_to_user+0x26/0x30
[ 2459.289542]  ? cp_new_stat+0x150/0x180
[ 2459.293712]  ? new_sync_write+0x124/0x170
[ 2459.298131]  ? ceph_fallocate+0x5f0/0x5f0 [ceph]
[ 2459.303149]  new_sync_write+0x124/0x170
[ 2459.307393]  vfs_write+0xa5/0x1a0
[ 2459.311111]  ksys_write+0x4f/0xb0
[ 2459.314826]  do_syscall_64+0x5b/0x1a0
[ 2459.318895]  entry_SYSCALL_64_after_hwframe+0x65/0xca

in one of the dmesg logs and

[ 4169.826043] libceph: reset on mds4
[ 4169.826044] ceph: mds4 closed our session
[ 4169.826045] ceph: mds4 reconnect start
[ 4169.826062] libceph: mds1 (1)172.21.15.74:6836 connection reset
[ 4169.826064] libceph: reset on mds1
[ 4169.826064] ceph: mds1 closed our session
[ 4169.826065] ceph: mds1 reconnect start
[ 4169.827221] ceph: mds1 reconnect denied
[ 4169.827231] ceph: mds4 reconnect denied
[ 4169.831925] libceph: reset on mds2
[ 4169.838044] libceph: reset on mds3
[ 4169.844154] ceph: mds2 closed our session
[ 4169.844154] ceph: mds2 reconnect start
[ 4169.861696] ceph: mds2 reconnect denied
[ 4169.863550] ceph: mds3 closed our session
[ 4169.911453] ceph: mds3 reconnect start
[ 4169.916937] ceph: mds3 reconnect denied
[ 4221.027310] libceph: mon0 (1)172.21.15.16:6789 socket closed (con state OPEN)
[ 4221.034962] libceph: mon0 (1)172.21.15.16:6789 session lost, hunting for new mon
.. ad infinitum

in the other
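
For triaging similar runs, the two symptoms quoted above (a hung-task report and repeated "reconnect denied" / mon hunting messages) can be pulled out of a dmesg log with a short script. This is only a sketch: the regular expressions are derived from the log lines in this report, and the function name `summarize_dmesg` and the summary keys are hypothetical, not part of teuthology or any Ceph tooling.

```python
import re

# Patterns derived from the dmesg lines quoted in this report.
HUNG_TASK = re.compile(
    r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds"
)
RECONNECT_DENIED = re.compile(r"ceph: (mds\d+) reconnect denied")
MON_SESSION_LOST = re.compile(
    r"libceph: (mon\d+) .* session lost, hunting for new mon"
)


def summarize_dmesg(lines):
    """Collect hung tasks, MDS reconnect denials and mon session losses.

    `lines` is any iterable of dmesg lines (hypothetical helper, not a
    teuthology API).
    """
    summary = {"hung_tasks": [], "reconnect_denied": [], "mon_session_lost": []}
    for line in lines:
        m = HUNG_TASK.search(line)
        if m:
            # (command name, pid, timeout in seconds)
            summary["hung_tasks"].append(
                (m.group(1), int(m.group(2)), int(m.group(3)))
            )
            continue
        m = RECONNECT_DENIED.search(line)
        if m:
            summary["reconnect_denied"].append(m.group(1))
            continue
        m = MON_SESSION_LOST.search(line)
        if m:
            summary["mon_session_lost"].append(m.group(1))
    return summary


if __name__ == "__main__":
    sample = [
        "[ 2459.191432] INFO: task git:214273 blocked for more than 120 seconds.",
        "[ 4169.827221] ceph: mds1 reconnect denied",
        "[ 4221.034962] libceph: mon0 (1)172.21.15.16:6789 session lost, hunting for new mon",
    ]
    print(summarize_dmesg(sample))
```

A long `mon_session_lost` list with no later recovery would correspond to the "ad infinitum" pattern in the second log.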


Related issues 1 (0 open, 1 closed)

Related to CephFS - Bug #50281: qa: untar_snap_rm timeout (Resolved, Jeff Layton)
