Bug #50821

qa: untar_snap_rm failure during mds thrashing

Added by Patrick Donnelly almost 3 years ago. Updated 4 days ago.

Status: New
Priority: High
Assignee: Xiubo Li
Category: Correctness/Safety
Target version: v20.0.0
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS, kceph
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2021-05-14T22:51:46.078 INFO:tasks.workunit.client.0.smithi094.stderr:tar: linux-2.6.33/arch/microblaze: Cannot stat: Permission denied
2021-05-14T22:51:46.078 INFO:tasks.workunit.client.0.smithi094.stderr:tar: linux-2.6.33/arch: Cannot stat: Permission denied
2021-05-14T22:51:46.078 INFO:tasks.workunit.client.0.smithi094.stderr:tar: linux-2.6.33: Cannot stat: Permission denied
2021-05-14T22:51:46.078 INFO:tasks.workunit.client.0.smithi094.stderr:tar: Error is not recoverable: exiting now
2021-05-14T22:51:46.079 DEBUG:teuthology.orchestra.run:got remote process result: 2
2021-05-14T22:51:46.080 INFO:tasks.workunit:Stopping ['fs/snaps'] on client.0...
2021-05-14T22:51:46.080 DEBUG:teuthology.orchestra.run.smithi094:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
2021-05-14T22:51:46.264 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_19220a3bd6e252c6e8260827019668a766d85490/teuthology/run_tasks.py", line 91, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_19220a3bd6e252c6e8260827019668a766d85490/teuthology/run_tasks.py", line 70, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/github.com_batrick_ceph_e78e41c7f45263bfc3d22dafa953b7e485aac84d/qa/tasks/workunit.py", line 147, in task
    cleanup=cleanup)
  File "/home/teuthworker/src/github.com_batrick_ceph_e78e41c7f45263bfc3d22dafa953b7e485aac84d/qa/tasks/workunit.py", line 297, in _spawn_on_all_clients
    timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_19220a3bd6e252c6e8260827019668a766d85490/teuthology/parallel.py", line 84, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_19220a3bd6e252c6e8260827019668a766d85490/teuthology/parallel.py", line 98, in __next__
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_19220a3bd6e252c6e8260827019668a766d85490/teuthology/parallel.py", line 30, in resurrect_traceback
    raise exc.exc_info[1]
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_19220a3bd6e252c6e8260827019668a766d85490/teuthology/parallel.py", line 23, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/github.com_batrick_ceph_e78e41c7f45263bfc3d22dafa953b7e485aac84d/qa/tasks/workunit.py", line 425, in _run_tests
    label="workunit test {workunit}".format(workunit=workunit)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_19220a3bd6e252c6e8260827019668a766d85490/teuthology/orchestra/remote.py", line 509, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_19220a3bd6e252c6e8260827019668a766d85490/teuthology/orchestra/run.py", line 455, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_19220a3bd6e252c6e8260827019668a766d85490/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_19220a3bd6e252c6e8260827019668a766d85490/teuthology/orchestra/run.py", line 183, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed (workunit test fs/snaps/untar_snap_rm.sh) on smithi094 with status 2: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=e78e41c7f45263bfc3d22dafa953b7e485aac84d TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/fs/snaps/untar_snap_rm.sh'

From: /ceph/teuthology-archive/pdonnell-2021-05-14_21:45:42-fs-master-distro-basic-smithi/6115751/teuthology.log

With the stock RHEL kernel. Might be related to some other issues I've suddenly been seeing with the stock RHEL kernel.
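For context, the failing workunit (qa/workunits/fs/snaps/untar_snap_rm.sh) exercises an untar/snapshot/remove cycle on the CephFS mount while the mds thrasher restarts MDS daemons underneath it. A minimal sketch of that kind of sequence, assuming a CephFS mount at /mnt/cephfs and a locally available linux-2.6.33.tar.bz2 (the real script differs in detail):

# rough approximation of the pattern the workunit exercises
cd /mnt/cephfs
tar xjf /tmp/linux-2.6.33.tar.bz2            # populate a large directory tree
mkdir .snap/k                                # CephFS snapshots are taken by mkdir under .snap
rm -rf linux-2.6.33                          # remove the live tree
cp -a .snap/k/linux-2.6.33 ./linux-2.6.33    # read the tree back out of the snapshot
rmdir .snap/k                                # drop the snapshot

The "Cannot stat: Permission denied" errors in the description, and the ESTALE/EIO errors in the later reproductions, all surface somewhere inside this kind of traversal of the tree or its snapshot.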


Related issues 4 (3 open, 1 closed)

Related to CephFS - Bug #50823: qa: RuntimeError: timeout waiting for cluster to stabilize (New)
Related to CephFS - Bug #50824: qa: snaptest-git-ceph bus error (Won't Fix) - Xiubo Li
Related to CephFS - Bug #51278: mds: "FAILED ceph_assert(!segments.empty())" (Triaged) - Venky Shankar
Related to CephFS - Bug #64707: suites/fsstress.sh hangs on one client - test times out (New) - Xiubo Li
Actions #1

Updated by Patrick Donnelly almost 3 years ago

I don't think this is related to #50281, but it may be.

Actions #2

Updated by Patrick Donnelly almost 3 years ago

  • Related to Bug #50823: qa: RuntimeError: timeout waiting for cluster to stabilize added
Actions #3

Updated by Patrick Donnelly almost 3 years ago

  • Related to Bug #50824: qa: snaptest-git-ceph bus error added
Actions #4

Updated by Patrick Donnelly almost 3 years ago

  • Related to Bug #51278: mds: "FAILED ceph_assert(!segments.empty())" added
Actions #5

Updated by Venky Shankar about 2 years ago

Similar failure here: https://pulpito.ceph.com/vshankar-2022-04-11_12:24:06-fs-wip-vshankar-testing1-20220411-144044-testing-default-smithi/6786336/

In this instance, though, we see ESTALE/EIO:

2022-04-11T15:56:23.599 INFO:teuthology.orchestra.run.smithi141.stderr:2022-04-11T15:56:23.590+0000 7f3cba9ff700  1 -- 172.21.15.141:0/3624046670 --> [v2:172.21.15.153:6808/205989,v1:172.21.15.153:6809/205989] -- command(tid 11: {"prefix": "get_command_descriptions"}) v1 -- 0x7f3c90018dc0 con 0x7f3c90011730
2022-04-11T15:56:23.599 INFO:teuthology.orchestra.run.smithi141.stderr:2022-04-11T15:56:23.590+0000 7f3cb37fe700  1 --2- 172.21.15.141:0/3624046670 >> [v2:172.21.15.153:6808/205989,v1:172.21.15.153:6809/205989] conn(0x7f3c90011730 0x7f3c90011b60 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=1 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0)._handle_peer_banner_payload supported=3 required=0
2022-04-11T15:56:23.628 INFO:tasks.ceph.osd.7.smithi153.stderr:2022-04-11T15:56:23.619+0000 7f22a0340700 -1 received  signal: Hangup from /usr/bin/python3 /bin/daemon-helper kill ceph-osd -f --cluster ceph -i 7  (PID: 27672) UID: 0
2022-04-11T15:56:23.644 INFO:tasks.workunit.client.0.smithi141.stdout:'.snap/k' -> './k'
2022-04-11T15:56:23.644 INFO:tasks.workunit.client.0.smithi141.stdout:'.snap/k/linux-2.6.33.tar.bz2' -> './k/linux-2.6.33.tar.bz2'
2022-04-11T15:56:23.645 INFO:tasks.workunit.client.0.smithi141.stderr:cp: error writing './k/linux-2.6.33.tar.bz2': Stale file handle
2022-04-11T15:56:23.645 INFO:teuthology.orchestra.run.smithi141.stderr:umount: /home/ubuntu/cephtest/mnt.0: target is busy.
2022-04-11T15:56:23.646 INFO:tasks.workunit.client.0.smithi141.stderr:cp: cannot stat '.snap/k/linux-2.6.33': Input/output error
2022-04-11T15:56:23.646 INFO:tasks.workunit.client.0.smithi141.stderr:cp: preserving times for './k': Input/output error
2022-04-11T15:56:23.647 INFO:teuthology.orchestra.run.smithi141.stderr:2022-04-11T15:56:23.639+0000 7f3cb37fe700  1 --2- 172.21.15.141:0/3624046670 >> [v2:172.21.15.153:6808/205989,v1:172.21.15.153:6809/205989] conn(0x7f3c90011730 0x7f3c90011b60 crc :-1 s=READY pgs=222 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).ready entity=osd.5 client_cookie=0 server_cookie=0 in_seq=0 out_seq=0
2022-04-11T15:56:23.647 DEBUG:teuthology.orchestra.run:got remote process result: 1
2022-04-11T15:56:23.648 INFO:tasks.workunit:Stopping ['fs/snaps'] on client.0...
2022-04-11T15:56:23.648 DEBUG:teuthology.orchestra.run.smithi141:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
2022-04-11T15:56:23.658 DEBUG:teuthology.orchestra.run:got remote process result: 32
Actions #6

Updated by Patrick Donnelly almost 2 years ago

  • Target version deleted (v17.0.0)
Actions #7

Updated by Venky Shankar 9 months ago

This popped up again with CentOS 9.stream, but I don't think it has anything to do with the distro. Ref: /a/yuriw-2023-07-26_14:28:57-fs-wip-yuri-testing-2023-07-25-0833-reef-distro-default-smithi/7353025

The failures are the usual -EIO errors:

2023-07-26T21:51:59.533 INFO:tasks.workunit.client.0.smithi043.stdout:'.snap/k/linux-2.6.33/drivers/isdn/mISDN/dsp_dtmf.c' -> './k/linux-2.6.33/drivers/isdn/mISDN/dsp_dtmf.c'
2023-07-26T21:51:59.534 INFO:tasks.workunit.client.0.smithi043.stdout:'.snap/k/linux-2.6.33/drivers/isdn/mISDN/dsp_ecdis.h' -> './k/linux-2.6.33/drivers/isdn/mISDN/dsp_ecdis.h'
2023-07-26T21:51:59.534 INFO:tasks.workunit.client.0.smithi043.stdout:'.snap/k/linux-2.6.33/drivers/isdn/mISDN/dsp_hwec.c' -> './k/linux-2.6.33/drivers/isdn/mISDN/dsp_hwec.c'
2023-07-26T21:51:59.534 DEBUG:teuthology.orchestra.run:got remote process result: 1
2023-07-26T21:51:59.535 INFO:tasks.workunit.client.0.smithi043.stderr:cp: cannot stat '.snap/k/linux-2.6.33/drivers/isdn/mISDN/dsp_hwec.h': Input/output error
2023-07-26T21:51:59.535 INFO:tasks.workunit.client.0.smithi043.stderr:cp: cannot stat '.snap/k/linux-2.6.33/drivers/isdn/mISDN/dsp_pipeline.c': Input/output error
2023-07-26T21:51:59.535 INFO:tasks.workunit.client.0.smithi043.stderr:cp: cannot stat '.snap/k/linux-2.6.33/drivers/isdn/mISDN/dsp_tones.c': Input/output error
2023-07-26T21:51:59.535 INFO:tasks.workunit.client.0.smithi043.stderr:cp: cannot stat '.snap/k/linux-2.6.33/drivers/isdn/mISDN/fsm.c': Input/output error
2023-07-26T21:51:59.535 INFO:tasks.workunit.client.0.smithi043.stderr:cp: cannot stat '.snap/k/linux-2.6.33/drivers/isdn/mISDN/fsm.h': Input/output error

No MDS core dumps, and nothing in the kernel ring buffer.
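For reference, the sort of quick checks that statement corresponds to on the test nodes (a sketch; exact locations depend on how teuthology archived the run):

# kernel ring buffer on the client node, filtered for ceph messages
dmesg -T | grep -iE 'ceph|libceph'
# any MDS core dumps caught by systemd-coredump on the MDS nodes
coredumpctl list ceph-mds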

Actions #8

Updated by Venky Shankar about 1 month ago

  • Category set to Correctness/Safety
  • Assignee set to Xiubo Li
  • Target version set to v20.0.0
Actions #9

Updated by Venky Shankar about 1 month ago

  • Related to Bug #64707: suites/fsstress.sh hangs on one client - test times out added
Actions #10

Updated by Patrick Donnelly 4 days ago

Apr 21 02:55:25 smithi043 kernel: ceph:  dropping unsafe request 381041
Apr 21 02:55:25 smithi043 kernel: ceph:  dropping unsafe request 381043
Apr 21 02:55:25 smithi043 kernel: ceph:  dropping unsafe request 381045
Apr 21 02:55:25 smithi043 kernel: ceph:  dropping unsafe request 381047
Apr 21 02:55:25 smithi043 kernel: ceph:  dropping unsafe request 381049
Apr 21 02:55:25 smithi043 kernel: ceph:  dropping unsafe request 381051
Apr 21 02:55:25 smithi043 kernel: ceph:  dropping unsafe request 381053
Apr 21 02:55:25 smithi043 kernel: ceph:  dropping unsafe request 381055
Apr 21 02:55:25 smithi043 kernel: ceph:  dropping unsafe request 381057
Apr 21 02:55:25 smithi043 kernel: ceph: ceph_do_invalidate_pages: inode 1000000b5c4.fffffffffffffffe is shut down
Apr 21 02:55:25 smithi043 kernel: ceph: ceph_do_invalidate_pages: inode 1000000bc24.fffffffffffffffe is shut down
Apr 21 02:55:25 smithi043 kernel: ceph: ceph_do_invalidate_pages: inode 1000000bd57.fffffffffffffffe is shut down
Apr 21 02:55:25 smithi043 kernel: ceph: ceph_do_invalidate_pages: inode 1000000c4dd.fffffffffffffffe is shut down
Apr 21 02:55:25 smithi043 kernel: ceph: ceph_do_invalidate_pages: inode 1000000c4e7.fffffffffffffffe is shut down
Apr 21 02:55:25 smithi043 kernel: ceph: ceph_do_invalidate_pages: inode 1000000c4e6.fffffffffffffffe is shut down
Apr 21 02:55:26 smithi043 sudo[71057]:   ubuntu : PWD=/home/ubuntu ; USER=root ; COMMAND=/bin/rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
Apr 21 02:55:26 smithi043 sudo[71057]: pam_unix(sudo:session): session opened for user root(uid=0) by ubuntu(uid=1000)
Apr 21 02:55:26 smithi043 sudo[71089]:   ubuntu : PWD=/home/ubuntu ; USER=root ; ENV=PATH=/usr/sbin:/home/ubuntu/.local/bin:/home/ubuntu/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin ; COMMAND=/bin/lsof
Apr 21 02:55:26 smithi043 sudo[71089]: pam_unix(sudo:session): session opened for user root(uid=0) by ubuntu(uid=1000)
Apr 21 02:55:26 smithi043 sudo[71057]: pam_unix(sudo:session): session closed for user root
Apr 21 02:55:27 smithi043 kernel: libceph: mds0 (1)172.21.15.73:6839 socket closed (con state V1_BANNER)

From: /teuthology/pdonnell-2024-04-20_23:33:17-fs-wip-pdonnell-testing-20240420.180737-debug-distro-default-smithi/7665863/remote/smithi043/syslog/journalctl-b0.gz
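The quoted lines can be pulled out of that archived journal with something like the following (a sketch; it assumes the journalctl-b0.gz files under remote/<host>/syslog/ are plain gzipped text dumps, which is how they read above):

# extract the kernel ceph client messages of interest from the archived journal
zcat journalctl-b0.gz | grep -E 'dropping unsafe request|ceph_do_invalidate_pages|libceph'

Those messages come from the kernel CephFS client and look consistent with it dropping its in-flight unsafe (not-yet-journaled) requests and invalidating page cache on inodes after its session/mount state was torn down during the thrashing.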
