Bug #56634
openqa: workunit snaptest-intodir.sh fails with MDS crash
Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
qa, qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
1a/rpm/el8/BUILD/ceph-17.0.0-13318-g10d6351a/src/mds/CInode.cc: 2560: FAILED ceph_assert(!"unmatched fragstat" == g_conf()->mds_verify_scatter)
2022-07-09T05:37:05.089 INFO:tasks.ceph.mds.a.smithi032.stderr:
2022-07-09T05:37:05.090 INFO:tasks.ceph.mds.a.smithi032.stderr: ceph version 17.0.0-13318-g10d6351a (10d6351a921d0691675d827b5bf030ef8a89b733) quincy (dev)
2022-07-09T05:37:05.090 INFO:tasks.ceph.mds.a.smithi032.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7fd3ecf26f44]
2022-07-09T05:37:05.090 INFO:tasks.ceph.mds.a.smithi032.stderr: 2: /usr/lib64/ceph/libceph-common.so.2(+0x2c2165) [0x7fd3ecf27165]
2022-07-09T05:37:05.091 INFO:tasks.ceph.mds.a.smithi032.stderr: 3: (CInode::finish_scatter_gather_update(int, boost::intrusive_ptr<MutationImpl>&)+0x2d12) [0x55d5482f7182]
2022-07-09T05:37:05.091 INFO:tasks.ceph.mds.a.smithi032.stderr: 4: (Locker::scatter_writebehind(ScatterLock*)+0x26e) [0x55d548228e7e]
2022-07-09T05:37:05.092 INFO:tasks.ceph.mds.a.smithi032.stderr: 5: (Locker::simple_sync(SimpleLock*, bool*)+0x42a) [0x55d548229a7a]
2022-07-09T05:37:05.092 INFO:tasks.ceph.mds.a.smithi032.stderr: 6: (Locker::scatter_nudge(ScatterLock*, MDSContext*, bool)+0x53d) [0x55d54822ea2d]
2022-07-09T05:37:05.092 INFO:tasks.ceph.mds.a.smithi032.stderr: 7: (Locker::scatter_tick()+0x204) [0x55d54822f464]
2022-07-09T05:37:05.093 INFO:tasks.ceph.mds.a.smithi032.stderr: 8: (Locker::tick()+0xd) [0x55d54824a2fd]
2022-07-09T05:37:05.093 INFO:tasks.ceph.mds.a.smithi032.stderr: 9: (MDSRankDispatcher::tick()+0x227) [0x55d54805ee77]
2022-07-09T05:37:05.094 INFO:tasks.ceph.mds.a.smithi032.stderr: 10: (Context::complete(int)+0xd) [0x55d54803b57d]
2022-07-09T05:37:05.095 INFO:tasks.ceph.mds.a.smithi032.stderr: 11: (CommonSafeTimer<ceph::fair_mutex>::timer_thread()+0x18b) [0x7fd3ed03e5ab]
2022-07-09T05:37:05.095 INFO:tasks.ceph.mds.a.smithi032.stderr: 12: (CommonSafeTimerThread<ceph::fair_mutex>::entry()+0x11) [0x7fd3ed0405b1]
2022-07-09T05:37:05.096 INFO:tasks.ceph.mds.a.smithi032.stderr: 13: /lib64/libpthread.so.0(+0x81cf) [0x7fd3ebebf1cf]
2022-07-09T05:37:05.097 INFO:tasks.ceph.mds.a.smithi032.stderr: 14: clone()
2022-07-09T05:37:05.097 INFO:tasks.ceph.mds.a.smithi032.stderr:
2022-07-09T05:35:50.911 INFO:teuthology.orchestra.run.smithi032.stderr:dumped fsmap epoch 60
2022-07-09T05:35:50.918 INFO:tasks.mds_thrash.fs.[cephfs]:kill mds.a (rank=0)
2022-07-09T05:35:50.919 DEBUG:tasks.ceph.mds.a:waiting for process to exit
2022-07-09T05:35:50.919 INFO:teuthology.orchestra.run:waiting for 300
2022-07-09T05:35:50.920 DEBUG:teuthology.orchestra.run:got remote process result: 1
2022-07-09T05:35:50.920 ERROR:teuthology.orchestra.daemon.state:Error while waiting for process to exit
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/orchestra/daemon/state.py", line 139, in stop
    run.wait([self.proc], timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/orchestra/run.py", line 479, in wait
    proc.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/orchestra/run.py", line 183, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed on smithi032 with status 1: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mds -f --cluster ceph -i a'
2022-07-09T05:35:50.920 INFO:tasks.ceph.mds.a:Stopped
2022-07-09T05:36:24.261 INFO:teuthology.orchestra.run.smithi032.stderr:lsof: WARNING: can't stat() ceph file system /home/ubuntu/cephtest/mnt.0
2022-07-09T05:36:24.262 INFO:teuthology.orchestra.run.smithi032.stderr: Output information may be incomplete.
2022-07-09T05:36:24.419 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/run_tasks.py", line 103, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/run_tasks.py", line 82, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_10d6351a921d0691675d827b5bf030ef8a89b733/qa/tasks/workunit.py", line 148, in task
    cleanup=cleanup)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_10d6351a921d0691675d827b5bf030ef8a89b733/qa/tasks/workunit.py", line 298, in _spawn_on_all_clients
    timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/parallel.py", line 84, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/parallel.py", line 98, in __next__
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/parallel.py", line 30, in resurrect_traceback
    raise exc.exc_info[1]
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/parallel.py", line 23, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_10d6351a921d0691675d827b5bf030ef8a89b733/qa/tasks/workunit.py", line 427, in _run_tests
    label="workunit test {workunit}".format(workunit=workunit)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/orchestra/remote.py", line 510, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/orchestra/run.py", line 455, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/orchestra/run.py", line 183, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed (workunit test fs/snaps/snaptest-intodir.sh) on smithi032 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=10d6351a921d0691675d827b5bf030ef8a89b733 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/fs/snaps/snaptest-intodir.sh'
2022-07-09T05:36:24.568 ERROR:teuthology.run_tasks: Sentry event: https://sentry.ceph.com/organizations/ceph/?query=30646434097a46ea805ec27c6494edb1
2022-07-09T05:39:06.639 INFO:tasks.ceph.mds.a:Stopped
2022-07-09T05:39:06.639 DEBUG:tasks.ceph.mds.c:waiting for process to exit
2022-07-09T05:39:06.640 INFO:teuthology.orchestra.run:waiting for 300
2022-07-09T05:39:06.641 DEBUG:teuthology.orchestra.run:got remote process result: 1
2022-07-09T05:39:06.641 ERROR:teuthology.orchestra.daemon.state:Error while waiting for process to exit
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_ceph-c_10d6351a921d0691675d827b5bf030ef8a89b733/qa/tasks/ceph.py", line 1441, in run_daemon
    yield
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/contextutil.py", line 33, in nested
    yield vars
  File "/home/teuthworker/src/git.ceph.com_ceph-c_10d6351a921d0691675d827b5bf030ef8a89b733/qa/tasks/ceph.py", line 1908, in task
    osd_scrub_pgs(ctx, config)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_10d6351a921d0691675d827b5bf030ef8a89b733/qa/tasks/ceph.py", line 1236, in osd_scrub_pgs
    stats = manager.get_pg_stats()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_10d6351a921d0691675d827b5bf030ef8a89b733/qa/tasks/ceph_manager.py", line 2346, in get_pg_stats
    out = self.raw_cluster_cmd('pg', 'dump', '--format=json')
  File "/home/teuthworker/src/git.ceph.com_ceph-c_10d6351a921d0691675d827b5bf030ef8a89b733/qa/tasks/ceph_manager.py", line 1600, in raw_cluster_cmd
    return self.run_cluster_cmd(**kwargs).stdout.getvalue()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_10d6351a921d0691675d827b5bf030ef8a89b733/qa/tasks/ceph_manager.py", line 1591, in run_cluster_cmd
    return self.controller.run(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/orchestra/remote.py", line 510, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/orchestra/run.py", line 455, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_10062088f503b43eff3624326bda825b23438f9b/teuthology/orchestra/run.py", line 183, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed on smithi032 with status 124: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph pg dump --format=json'
Updated by Venky Shankar over 1 year ago
Leaving this unassigned for now. Please bring this up again if we hit this failure in another run.