Bug #56634

qa: workunit snaptest-intodir.sh fails with MDS crash

Added by Rishabh Dave 7 months ago. Updated 5 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
qa, qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://pulpito.front.sepia.ceph.com/rishabh-2022-07-08_23:53:34-fs-wip-rishabh-testing-2022Jul08-1820-testing-default-smithi/6920984

1a/rpm/el8/BUILD/ceph-17.0.0-13318-g10d6351a/src/mds/CInode.cc: 2560: FAILED ceph_assert(!"unmatched fragstat" == g_conf()->mds_verify_scatter)
2022-07-09T05:37:05.089 INFO:tasks.ceph.mds.a.smithi032.stderr:
2022-07-09T05:37:05.090 INFO:tasks.ceph.mds.a.smithi032.stderr: ceph version 17.0.0-13318-g10d6351a (10d6351a921d0691675d827b5bf030ef8a89b733) quincy (dev)
2022-07-09T05:37:05.090 INFO:tasks.ceph.mds.a.smithi032.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7fd3ecf26f44]
2022-07-09T05:37:05.090 INFO:tasks.ceph.mds.a.smithi032.stderr: 2: /usr/lib64/ceph/libceph-common.so.2(+0x2c2165) [0x7fd3ecf27165]
2022-07-09T05:37:05.091 INFO:tasks.ceph.mds.a.smithi032.stderr: 3: (CInode::finish_scatter_gather_update(int, boost::intrusive_ptr<MutationImpl>&)+0x2d12) [0x55d5482f7182]
2022-07-09T05:37:05.091 INFO:tasks.ceph.mds.a.smithi032.stderr: 4: (Locker::scatter_writebehind(ScatterLock*)+0x26e) [0x55d548228e7e]
2022-07-09T05:37:05.092 INFO:tasks.ceph.mds.a.smithi032.stderr: 5: (Locker::simple_sync(SimpleLock*, bool*)+0x42a) [0x55d548229a7a]
2022-07-09T05:37:05.092 INFO:tasks.ceph.mds.a.smithi032.stderr: 6: (Locker::scatter_nudge(ScatterLock*, MDSContext*, bool)+0x53d) [0x55d54822ea2d]
2022-07-09T05:37:05.092 INFO:tasks.ceph.mds.a.smithi032.stderr: 7: (Locker::scatter_tick()+0x204) [0x55d54822f464]
2022-07-09T05:37:05.093 INFO:tasks.ceph.mds.a.smithi032.stderr: 8: (Locker::tick()+0xd) [0x55d54824a2fd]
2022-07-09T05:37:05.093 INFO:tasks.ceph.mds.a.smithi032.stderr: 9: (MDSRankDispatcher::tick()+0x227) [0x55d54805ee77]
2022-07-09T05:37:05.094 INFO:tasks.ceph.mds.a.smithi032.stderr: 10: (Context::complete(int)+0xd) [0x55d54803b57d]
2022-07-09T05:37:05.095 INFO:tasks.ceph.mds.a.smithi032.stderr: 11: (CommonSafeTimer<ceph::fair_mutex>::timer_thread()+0x18b) [0x7fd3ed03e5ab]
2022-07-09T05:37:05.095 INFO:tasks.ceph.mds.a.smithi032.stderr: 12: (CommonSafeTimerThread<ceph::fair_mutex>::entry()+0x11) [0x7fd3ed0405b1]
2022-07-09T05:37:05.096 INFO:tasks.ceph.mds.a.smithi032.stderr: 13: /lib64/libpthread.so.0(+0x81cf) [0x7fd3ebebf1cf]
2022-07-09T05:37:05.097 INFO:tasks.ceph.mds.a.smithi032.stderr: 14: clone()
2022-07-09T05:37:05.097 INFO:tasks.ceph.mds.a.smithi032.stderr:
2022-07-09T05:35:50.911 INFO:teuthology.orchestra.run.smithi032.stderr:dumped fsmap epoch 60
2022-07-09T05:35:50.918 INFO:tasks.mds_thrash.fs.[cephfs]:kill mds.a (rank=0)
2022-07-09T05:35:50.919 DEBUG:tasks.ceph.mds.a:waiting for process to exit
2022-07-09T05:35:50.919 INFO:teuthology.orchestra.run:waiting for 300
2022-07-09T05:35:50.920 DEBUG:teuthology.orchestra.run:got remote process result: 1
2022-07-09T05:35:50.920 ERROR:teuthology.orchestra.daemon.state:Error while waiting for process to exit
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/orchestra/daemon/state.py", line 139, in stop
    run.wait([self.proc], timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/orchestra/run.py", line 479, in wait
    proc.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/orchestra/run.py", line 183, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed on smithi032 with status 1: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mds -f --cluster ceph -i a'
2022-07-09T05:35:50.920 INFO:tasks.ceph.mds.a:Stopped
2022-07-09T05:36:24.261 INFO:teuthology.orchestra.run.smithi032.stderr:lsof: WARNING: can't stat() ceph file system /home/ubuntu/cephtest/mnt.0
2022-07-09T05:36:24.262 INFO:teuthology.orchestra.run.smithi032.stderr:      Output information may be incomplete.
2022-07-09T05:36:24.419 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/run_tasks.py", line 103, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/run_tasks.py", line 82, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_10d6351a921d0691675d827b5bf030ef8a89b733/qa/tasks/workunit.py", line 148, in task
    cleanup=cleanup)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_10d6351a921d0691675d827b5bf030ef8a89b733/qa/tasks/workunit.py", line 298, in _spawn_on_all_clients
    timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/parallel.py", line 84, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/parallel.py", line 98, in __next__
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/parallel.py", line 30, in resurrect_traceback
    raise exc.exc_info[1]
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/parallel.py", line 23, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_10d6351a921d0691675d827b5bf030ef8a89b733/qa/tasks/workunit.py", line 427, in _run_tests
    label="workunit test {workunit}".format(workunit=workunit)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/orchestra/remote.py", line 510, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/orchestra/run.py", line 455, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/orchestra/run.py", line 183, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed (workunit test fs/snaps/snaptest-intodir.sh) on smithi032 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=10d6351a921d0691675d827b5bf030ef8a89b733 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/fs/snaps/snaptest-intodir.sh'
2022-07-09T05:36:24.568 ERROR:teuthology.run_tasks: Sentry event: https://sentry.ceph.com/organizations/ceph/?query=30646434097a46ea805ec27c6494edb1
2022-07-09T05:39:06.639 INFO:tasks.ceph.mds.a:Stopped
2022-07-09T05:39:06.639 DEBUG:tasks.ceph.mds.c:waiting for process to exit
2022-07-09T05:39:06.640 INFO:teuthology.orchestra.run:waiting for 300
2022-07-09T05:39:06.641 DEBUG:teuthology.orchestra.run:got remote process result: 1
2022-07-09T05:39:06.641 ERROR:teuthology.orchestra.daemon.state:Error while waiting for process to exit
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_ceph-c_10d6351a921d0691675d827b5bf030ef8a89b733/qa/tasks/ceph.py", line 1441, in run_daemon
    yield
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/contextutil.py", line 33, in nested
    yield vars
  File "/home/teuthworker/src/git.ceph.com_ceph-c_10d6351a921d0691675d827b5bf030ef8a89b733/qa/tasks/ceph.py", line 1908, in task
    osd_scrub_pgs(ctx, config)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_10d6351a921d0691675d827b5bf030ef8a89b733/qa/tasks/ceph.py", line 1236, in osd_scrub_pgs
    stats = manager.get_pg_stats()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_10d6351a921d0691675d827b5bf030ef8a89b733/qa/tasks/ceph_manager.py", line 2346, in get_pg_stats
    out = self.raw_cluster_cmd('pg', 'dump', '--format=json')
  File "/home/teuthworker/src/git.ceph.com_ceph-c_10d6351a921d0691675d827b5bf030ef8a89b733/qa/tasks/ceph_manager.py", line 1600, in raw_cluster_cmd
    return self.run_cluster_cmd(**kwargs).stdout.getvalue()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_10d6351a921d0691675d827b5bf030ef8a89b733/qa/tasks/ceph_manager.py", line 1591, in run_cluster_cmd
    return self.controller.run(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/orchestra/remote.py", line 510, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/orchestra/run.py", line 455, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/orchestra/run.py", line 183, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed on smithi032 with status 124: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph pg dump --format=json'

History

#1 Updated by Rishabh Dave 7 months ago

  • Description updated (diff)

#2 Updated by Rishabh Dave 7 months ago

  • Description updated (diff)

#3 Updated by Venky Shankar 5 months ago

Leaving this unassigned for now. Please bring this up if we hit this again.
