Bug #55236

open

qa: fs/snaps test fails with "hit max job timeout"

Added by Venky Shankar about 2 years ago. Updated 7 months ago.

Status:
Triaged
Priority:
Normal
Category:
Correctness/Safety
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
quincy, pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
qa, task(medium)
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Seen here: https://pulpito.ceph.com/vshankar-2022-04-07_15:19:12-fs-wip-vshankar-testing-55110-20220407-173953-testing-default-smithi/6780931/

yaml matrix

Description: fs/thrash/workloads/{begin/{0-install 1-ceph 2-logrotate} clusters/1a5s-mds-1c-client conf/{client mds mon osd} distro/{rhel_8} mount/fuse msgr-failures/osd-mds-delay objectstore-ec/bluestore-comp-ec-root overrides/{frag prefetch_dirfrags/no prefetch_entire_dirfrags/no races session_timeout thrashosds-health whitelist_health whitelist_wrongly_marked_down} ranks/3 tasks/{1-thrash/osd 2-workunit/fs/snaps}}

The fs/snaps workunit runs `untar_snap_rm.sh` while a task thrashes OSDs. The test eventually reaches a point where the following error shows up:

2022-04-07T18:55:55.943 INFO:tasks.workunit.client.0.smithi131.stdout:'.snap/k/linux-2.6.33/arch/ia64/scripts/check-segrel.lds' -> './k/linux-2.6.33/arch/ia64/scripts/check-segrel.lds'
2022-04-07T18:55:55.944 INFO:tasks.workunit.client.0.smithi131.stdout:'.snap/k/linux-2.6.33/arch/ia64/scripts/check-serialize.S' -> './k/linux-2.6.33/arch/ia64/scripts/check-serialize.S'
2022-04-07T18:55:55.944 INFO:tasks.workunit.client.0.smithi131.stdout:'.snap/k/linux-2.6.33/arch/ia64/scripts/check-text-align.S' -> './k/linux-2.6.33/arch/ia64/scripts/check-text-align.S'
2022-04-07T18:55:55.944 INFO:tasks.workunit.client.0.smithi131.stdout:'.snap/k/linux-2.6.33/arch/ia64/scripts/pvcheck.sed' -> './k/linux-2.6.33/arch/ia64/scripts/pvcheck.sed'
2022-04-07T18:55:55.944 INFO:tasks.workunit.client.0.smithi131.stdout:'.snap/k/linux-2.6.33/arch/ia64/scripts/toolchain-flags' -> './k/linux-2.6.33/arch/ia64/scripts/toolchain-flags'
2022-04-07T18:55:55.945 INFO:tasks.workunit.client.0.smithi131.stdout:'.snap/k/linux-2.6.33/arch/ia64/scripts/unwcheck.py' -> './k/linux-2.6.33/arch/ia64/scripts/unwcheck.py'
2022-04-07T18:55:55.945 INFO:tasks.workunit.client.0.smithi131.stdout:'.snap/k/linux-2.6.33/arch/ia64/sn' -> './k/linux-2.6.33/arch/ia64/sn'
2022-04-07T18:55:55.945 INFO:tasks.workunit.client.0.smithi131.stdout:'.snap/k/linux-2.6.33/arch/ia64/sn/Makefile' -> './k/linux-2.6.33/arch/ia64/sn/Makefile'
2022-04-07T18:55:55.945 INFO:tasks.workunit.client.0.smithi131.stderr:cp: error reading '.snap/k/linux-2.6.33/arch/ia64/sn/Makefile': Connection timed out
2022-04-07T18:55:56.034 INFO:tasks.ceph.osd.1.smithi131.stderr:2022-04-07T18:55:56.032+0000 7f1b106c1700 -1 received  signal: Hangup from /usr/bin/python3 /bin/daemon-helper kill ceph-osd -f --cluster ceph -i 1  (PID: 299246) UID: 0
2022-04-07T18:55:56.042 INFO:teuthology.orchestra.run.smithi131.stderr:2022-04-07T18:55:56.040+0000 7f38f759e700  1  Processor -- start
2022-04-07T18:55:56.042 INFO:teuthology.orchestra.run.smithi131.stderr:2022-04-07T18:55:56.040+0000 7f38f759e700  1 --  start start
2022-04-07T18:55:56.045 INFO:teuthology.orchestra.run.smithi131.stderr:2022-04-07T18:55:56.043+0000 7f38f759e700  1 --2-  >> v2:172.21.15.173:3300/0 conn(0x7f38f004f3e0 0x7f38f00dee40 unknown :-1 s=NONE pgs=0 cs=0 l=0 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0).connect
2022-04-07T18:55:56.046 INFO:teuthology.orchestra.run.smithi131.stderr:2022-04-07T18:55:56.044+0000 7f38f759e700  1 --2-  >> [v2:172.21.15.173:3301/0,v1:172.21.15.173:6790/0] conn(0x7f38f00dcd40 0x7f38f00dc210 unknown :-1 s=NONE pgs=0 cs=0 l=0 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0).connect

Reading a file from a snapshot failed with "Connection timed out". This might not be related to CephFS as such; the timeout could be coming from the OSDs, but that still needs to be checked. If that is the case, this tracker could be moved to the RADOS component.
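One way to start checking whether the timeout lines up with OSD activity is to correlate the workunit error with OSD log lines in teuthology.log by timestamp. The following is a minimal sketch, assuming the line layout seen in the excerpt above (`<timestamp> INFO:<source>:<message>`). The function names and the 5-second window are hypothetical choices for illustration, not part of teuthology.

```python
import re
from datetime import datetime, timedelta

# Assumed line layout, taken from the log excerpt above:
#   2022-04-07T18:55:55.945 INFO:<source>:<message>
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}) INFO:([^:]+):(.*)$"
)

def parse_line(line):
    """Return (timestamp, source, message), or None if the line does not match."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")
    return ts, m.group(2), m.group(3)

def osd_events_near_errors(lines, window_seconds=5.0):
    """Pair each workunit stderr line with OSD log lines within the window."""
    parsed = [p for p in map(parse_line, lines) if p is not None]
    # Workunit stderr lines are treated as candidate errors (an assumption).
    errors = [
        p for p in parsed
        if p[1].startswith("tasks.workunit.") and p[1].endswith(".stderr")
    ]
    osd = [p for p in parsed if p[1].startswith("tasks.ceph.osd.")]
    window = timedelta(seconds=window_seconds)
    return [
        (err, [o for o in osd if abs(o[0] - err[0]) <= window])
        for err in errors
    ]
```

An OSD event close to the error is only a hint. The OSD and client logs for that interval would still have to be read to confirm where the timeout originates.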

Actions #1

Updated by Venky Shankar about 2 years ago

Another instance: https://pulpito.ceph.com/vshankar-2022-04-09_12:55:41-fs-wip-vshankar-testing-55110-20220408-203242-testing-default-smithi/6783880/

In this case, the job encountered an IO error:

2022-04-09T16:08:12.061 INFO:tasks.workunit.client.0.smithi050.stdout:removed 'k/linux-2.6.33/sound/soc/codecs/wm8903.h'
2022-04-09T16:08:12.062 INFO:tasks.workunit.client.0.smithi050.stdout:removed 'k/linux-2.6.33/sound/soc/codecs/wm8993.h'
2022-04-09T16:08:12.062 INFO:tasks.workunit.client.0.smithi050.stdout:removed 'k/linux-2.6.33/sound/soc/codecs/wm8523.c'
2022-04-09T16:08:12.062 INFO:tasks.workunit.client.0.smithi050.stdout:removed 'k/linux-2.6.33/sound/soc/codecs/wm8580.c'
2022-04-09T16:08:12.062 INFO:tasks.workunit.client.0.smithi050.stdout:removed 'k/linux-2.6.33/sound/soc/codecs/ad1938.h'
2022-04-09T16:08:12.062 INFO:tasks.workunit.client.0.smithi050.stdout:removed 'k/linux-2.6.33/sound/soc/codecs/ac97.h'
2022-04-09T16:08:12.063 INFO:tasks.workunit.client.0.smithi050.stdout:removed 'k/linux-2.6.33/sound/soc/codecs/ad1836.h'
2022-04-09T16:08:12.063 INFO:tasks.workunit.client.0.smithi050.stdout:removed 'k/linux-2.6.33/sound/soc/codecs/wm8960.c'
2022-04-09T16:08:12.064 INFO:tasks.workunit.client.0.smithi050.stderr:rm: cannot remove 'k/linux-2.6.33/sound/soc/codecs/ads117x.h': Input/output error
2022-04-09T16:08:12.064 INFO:tasks.workunit.client.0.smithi050.stderr:rm: cannot remove 'k/linux-2.6.33/sound/soc/codecs/max9877.h': Input/output error
2022-04-09T16:08:12.064 INFO:tasks.workunit.client.0.smithi050.stderr:rm: cannot remove 'k/linux-2.6.33/sound/soc/codecs/wm9713.c': Input/output error
2022-04-09T16:08:12.064 INFO:tasks.workunit.client.0.smithi050.stderr:rm: cannot remove 'k/linux-2.6.33/sound/soc/codecs/stac9766.h': Input/output error
2022-04-09T16:08:12.065 INFO:tasks.workunit.client.0.smithi050.stderr:rm: cannot remove 'k/linux-2.6.33/sound/soc/codecs/wm8728.c': Input/output error
2022-04-09T16:08:12.065 INFO:tasks.workunit.client.0.smithi050.stderr:rm: cannot remove 'k/linux-2.6.33/sound/soc/codecs/ad1938.c': Input/output error
2022-04-09T16:08:12.065 INFO:tasks.workunit.client.0.smithi050.stderr:rm: cannot remove 'k/linux-2.6.33/sound/soc/codecs/wm8903.c': Input/output error
2022-04-09T16:08:12.066 INFO:tasks.workunit.client.0.smithi050.stderr:rm: cannot remove 'k/linux-2.6.33/sound/soc/codecs/wm9081.c': Input/output error
2022-04-09T16:08:12.066 INFO:tasks.workunit.client.0.smithi050.stderr:rm: cannot remove 'k/linux-2.6.33/sound/soc/codecs/wm8510.c': Input/output error
2022-04-09T16:08:12.066 INFO:tasks.workunit.client.0.smithi050.stderr:rm: cannot remove 'k/linux-2.6.33/sound/soc/codecs/wm8350.c': Input/output error

The EIO here might be coming from the MDS rather than from the OSDs, which were the suspect in the earlier failed job from the tracker description.
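To compare the two failure modes across jobs, the workunit stderr lines can be tallied by their trailing error string. This is a minimal sketch, assuming the `cp:` and `rm:` message layout seen in the excerpts (`<tool>: <action> '<path>': <error>`). `tally_errors` is a hypothetical helper name, not an existing teuthology function.

```python
import re
from collections import Counter

# Assumed message layout from the excerpts:
#   INFO:tasks.workunit.<client>.stderr:<tool>: <action> '<path>': <error>
ERR_RE = re.compile(
    r"INFO:tasks\.workunit\.[^:]+\.stderr:(\w+): .*'[^']*': (.+)$"
)

def tally_errors(lines):
    """Count (tool, error string) pairs across workunit stderr lines."""
    counts = Counter()
    for line in lines:
        m = ERR_RE.search(line.rstrip("\n"))
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts
```

Running this over the logs of both jobs would separate `("cp", "Connection timed out")` from `("rm", "Input/output error")`, which makes it easier to see whether a given run matches the first or the second failure.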

Actions #2

Updated by Venky Shankar about 2 years ago

  • Status changed from New to Triaged
  • Assignee set to Kotresh Hiremath Ravishankar
Actions #3

Updated by Patrick Donnelly 7 months ago

  • Target version deleted (v18.0.0)