Bug #45829
closedfs: ceph_test_libcephfs abort in TestUtime
0%
Description
2020-06-02T00:39:35.510 INFO:tasks.workunit.client.0.smithi154.stdout:[ RUN ] LibCephFS.TestUtime 2020-06-02T00:39:46.460 INFO:teuthology.orchestra.run.smithi068:> sudo logrotate /etc/logrotate.d/ceph-test.conf 2020-06-02T00:39:46.483 INFO:teuthology.orchestra.run.smithi154:> sudo logrotate /etc/logrotate.d/ceph-test.conf 2020-06-02T00:39:58.368 INFO:tasks.workunit.client.0.smithi154.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/libcephfs/test.sh: line 3: 31546 Aborted (core dumped) ceph_test_libcephfs 2020-06-02T00:39:58.369 DEBUG:teuthology.orchestra.run:got remote process result: 134
From: /ceph/teuthology-archive/pdonnell-2020-06-01_22:23:46-fs-wip-pdonnell-testing-20200601.182437-distro-basic-smithi/5110525/teuthology.log
Core dump is:
pdonnell@teuthology /a/pdonnell-2020-06-01_22:23:46-fs-wip-pdonnell-testing-20200601.182437-distro-basic-smithi/5110525$ file ./remote/smithi154/coredump/1591058375.31546.core ./remote/smithi154/coredump/1591058375.31546.core: ELF 64-bit LSB core file x86-64, version 1 (SYSV), too many program headers (4887) pdonnell@teuthology /a/pdonnell-2020-06-01_22:23:46-fs-wip-pdonnell-testing-20200601.182437-distro-basic-smithi/5110525$ realpath !$ realpath ./remote/smithi154/coredump/1591058375.31546.core /ceph/teuthology-archive/pdonnell-2020-06-01_22:23:46-fs-wip-pdonnell-testing-20200601.182437-distro-basic-smithi/5110525/remote/smithi154/coredump/1591058375.31546.core
I haven't been able to load the coredump in gdb. I get this error:
pdonnell@senta03 ~$ sudo ceph/src/script/ceph-debug-docker.sh wip-pdonnell-testing-20200601.182437 centos:8 branch: wip-pdonnell-testing-20200601.182437 sha1: latest env: centos:8 /tmp/tmp.hj1iEtHLnL /home/pdonnell docker build --tag pdonnell:ceph-ci-wip-pdonnell-testing-20200601.182437-latest-centos-8 . Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg. STEP 1: FROM centos:8 STEP 2: WORKDIR /root --> Using cache f6af401813e10a57b4b790eab59a5fd5d2164ba54ec6975640cc9f17fc0ce88e STEP 3: RUN yum update -y && yum install -y epel-release wget psmisc ca-certificates gdb --> Using cache ef44b584002a001343bf93c50f4a830645a4ff14d00699c7901a0eeb3696bd9c STEP 4: RUN wget -O /etc/yum.repos.d/ceph-dev.repo https://shaman.ceph.com/api/repos/ceph/wip-pdonnell-testing-20200601.182437/latest/centos/8/repo && yum clean all && yum upgrade -y --> Using cache 9d6822b69a298960d5b05460549e0ed25e7201905a19cc9db9315d51f00319bc STEP 5: COMMIT pdonnell:ceph-ci-wip-pdonnell-testing-20200601.182437-latest-centos-8 9d6822b69a298960d5b05460549e0ed25e7201905a19cc9db9315d51f00319bc real 0m2.825s user 0m0.236s sys 0m0.082s /home/pdonnell built image pdonnell:ceph-ci-wip-pdonnell-testing-20200601.182437-latest-centos-8 sudo docker run --privileged -ti -v /ceph:/ceph:ro pdonnell:ceph-ci-wip-pdonnell-testing-20200601.182437-latest-centos-8 Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg. [root@264a2f98e22b ~]# gdb -q /ceph/teuthology-archive/pdonnell-2020-06-01_22:23:46-fs-wip-pdonnell-testing-20200601.182437-distro-basic-smithi/5110525/remote/smithi154/coredump/1591058375.31546.core "/ceph/teuthology-archive/pdonnell-2020-06-01_22:23:46-fs-wip-pdonnell-testing-20200601.182437-distro-basic-smithi/5110525/remote/smithi154/coredump/1591058375.31546.core": not in executable format: Operation not permitted (gdb)
Couldn't find any helpful information on why that would be. Maybe it's because it has too many elf program headers.
I don't think it's related to https://github.com/ceph/ceph/pull/26004
Updated by Patrick Donnelly almost 4 years ago
Another in a different master test branch:
/ceph/teuthology-archive/pdonnell-2020-06-02_19:51:19-fs-wip-pdonnell-testing-20200602.153347-distro-basic-smithi/5114800/teuthology.log
without #26004. It looks very reproducible: http://pulpito.ceph.com/pdonnell-2020-06-02_19:51:19-fs-wip-pdonnell-testing-20200602.153347-distro-basic-smithi/
Updated by Patrick Donnelly almost 4 years ago
- Status changed from New to Triaged
- Assignee set to Xiubo Li
- Priority changed from Normal to High
- Component(FS) tools added
Updated by Xiubo Li almost 4 years ago
- Status changed from Triaged to In Progress
I can access to the lab now, will continue work on this.
Updated by Xiubo Li almost 4 years ago
From one of my test case's teuthology.log, we can see that, it is aborted when doing the mount(), in create_file_event() and adding the event:
2020-06-15T05:50:42.948 INFO:tasks.workunit.client.0.smithi200.stdout:[ RUN ] LibCephFS.TestUtime
2020-06-15T05:50:42.960 INFO:tasks.workunit.client.0.smithi200.stderr:/build/ceph-16.0.0-2443-g3ecd7ea/src/msg/async/Event.cc: In function 'int EventCenter::create_file_event(int, int, EventCallbackRef)' thread 7f8ce57fa700 time 2020-06-15T05:50:42.957161+0000
2020-06-15T05:50:42.961 INFO:tasks.workunit.client.0.smithi200.stderr:/build/ceph-16.0.0-2443-g3ecd7ea/src/msg/async/Event.cc: 239: ceph_abort_msg("BUG!")
2020-06-15T05:50:42.962 INFO:tasks.workunit.client.0.smithi200.stderr: ceph version 16.0.0-2443-g3ecd7ea (3ecd7ea32af62d2ccc433d77c2e4f44ea29c6d99) pacific (dev)
2020-06-15T05:50:42.962 INFO:tasks.workunit.client.0.smithi200.stderr: 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xe1) [0x7f8cf3c85c08]
2020-06-15T05:50:42.962 INFO:tasks.workunit.client.0.smithi200.stderr: 2: (EventCenter::create_file_event(int, int, EventCallback*)+0xbaf) [0x7f8cf3f719cf]
2020-06-15T05:50:42.962 INFO:tasks.workunit.client.0.smithi200.stderr: 3: (EventCenter::set_owner()+0x3c8) [0x7f8cf3f71e08]
2020-06-15T05:50:42.963 INFO:tasks.workunit.client.0.smithi200.stderr: 4: (()+0x564f55) [0x7f8cf3f75f55]
2020-06-15T05:50:42.963 INFO:tasks.workunit.client.0.smithi200.stderr: 5: (()+0xbd6df) [0x7f8cf2fc26df]
2020-06-15T05:50:42.963 INFO:tasks.workunit.client.0.smithi200.stderr: 6: (()+0x76db) [0x7f8cf349d6db]
2020-06-15T05:50:42.963 INFO:tasks.workunit.client.0.smithi200.stderr: 7: (clone()+0x3f) [0x7f8cf2a1d88f]
2020-06-15T05:50:59.241 INFO:teuthology.orchestra.run.smithi092:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-06-15T05:50:59.244 INFO:teuthology.orchestra.run.smithi200:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-06-15T05:51:26.489 INFO:tasks.workunit.client.0.smithi200.stderr:Aborted (core dumped)
2020-06-15T05:51:26.491 DEBUG:teuthology.orchestra.run:got remote process result: 134
2020-06-15T05:51:26.492 INFO:tasks.workunit:Stopping ['libcephfs/test.sh'] on client.0...
Updated by Xiubo Li almost 4 years ago
For this test case it crashed in ShutDownrace instead:
2020-06-15T05:55:19.089 INFO:tasks.workunit.client.0.smithi099.stdout:[ RUN ] LibCephFS.ShutdownRace
2020-06-15T05:55:28.584 INFO:teuthology.orchestra.run.smithi052:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-06-15T05:55:28.588 INFO:teuthology.orchestra.run.smithi099:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-06-15T05:55:29.893 DEBUG:teuthology.orchestra.run:got remote process result: 141
2020-06-15T05:55:29.893 INFO:tasks.workunit:Stopping ['libcephfs/test.sh'] on client.0...
2020-06-15T05:55:29.894 INFO:teuthology.orchestra.run.smithi099:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
2020-06-15T05:55:30.072 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 91, in run_tasks
manager = run_one_task(taskname, ctx=ctx, config=config)
File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 70, in run_one_task
return task(**kwargs)
File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-xiubli-fs-testing-2020-06-12-1613/qa/tasks/workunit.py", line 127, in task
coverage_and_limits=not config.get('no_coverage_and_limits', None))
File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 87, in __exit__
for result in self:
File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 101, in __next__
resurrect_traceback(result)
File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 37, in resurrect_traceback
reraise(*exc_info)
File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/lib/python3.6/site-packages/six.py", line 703, in reraise
raise value
File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 24, in capture_traceback
return func(*args, **kwargs)
File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-xiubli-fs-testing-2020-06-12-1613/qa/tasks/workunit.py", line 415, in _run_tests
label="workunit test {workunit}".format(workunit=workunit)
File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 206, in run
r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 475, in run
r.wait()
File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 162, in wait
self._raise_for_status()
File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 184, in _raise_for_status
node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed (workunit test libcephfs/test.sh) on smithi099 with status 141: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=3ecd7ea32af62d2ccc433d77c2e4f44ea29c6d99 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/libcephfs/test.sh'
Updated by Xiubo Li almost 4 years ago
The fd is used before initialized:
2020-06-16T15:07:35.708 INFO:tasks.workunit.client.0.smithi205.stdout:[ RUN ] LibCephFS.TestUtime
2020-06-16T15:07:35.712 INFO:tasks.workunit.client.0.smithi205.stderr:lxb ---------- 1
2020-06-16T15:07:35.715 INFO:tasks.workunit.client.0.smithi205.stderr:lxb ---------- 2
2020-06-16T15:07:35.718 INFO:tasks.workunit.client.0.smithi205.stderr:lxb ---------- 3
2020-06-16T15:07:35.718 INFO:tasks.workunit.client.0.smithi205.stderr:lxb ---------- 4
2020-06-16T15:07:35.727 INFO:tasks.workunit.client.0.smithi205.stderr:create_file_eventlxb add event failed, ret=-9 fd=-1 mask=1 original mask is 7536755
2020-06-16T15:07:41.603 INFO:teuthology.orchestra.run.smithi089:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-06-16T15:07:41.608 INFO:teuthology.orchestra.run.smithi205:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-06-16T15:08:13.859 INFO:teuthology.orchestra.run.smithi089:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-06-16T15:08:13.875 INFO:teuthology.orchestra.run.smithi205:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-06-16T15:08:22.886 INFO:tasks.workunit.client.0.smithi205.stderr:Segmentation fault (core dumped)
2020-06-16T15:08:22.887 DEBUG:teuthology.orchestra.run:got remote process result: 139
2020-06-16T15:08:22.888 INFO:tasks.workunit:Stopping ['libcephfs/test.sh'] on client.0...
Updated by Xiubo Li almost 4 years ago
It is too many open files and reach the limitation:
2020-06-17T04:09:29.434 INFO:tasks.workunit.client.0.smithi038.stdout:[ RUN ] LibCephFS.TestUtime
2020-06-17T04:09:29.435 INFO:tasks.workunit.client.0.smithi038.stderr:lxb ---------- 1
2020-06-17T04:09:29.435 INFO:tasks.workunit.client.0.smithi038.stderr:lxb ---------- 2
2020-06-17T04:09:29.436 INFO:tasks.workunit.client.0.smithi038.stderr:lxb ---------- 3
2020-06-17T04:09:29.436 INFO:tasks.workunit.client.0.smithi038.stderr:lxb ---------- 4
2020-06-17T04:09:29.657 INFO:tasks.workunit.client.0.smithi038.stderr:init can't create notify pipe: (24) Too many open files
2020-06-17T04:09:29.658 INFO:tasks.workunit.client.0.smithi038.stderr:NetworkStack lxb w->center.init, ret=-24
2020-06-17T04:09:29.658 INFO:tasks.workunit.client.0.smithi038.stderr:NetworkStack lxb w->center.init, ret=-24
2020-06-17T04:09:29.658 INFO:tasks.workunit.client.0.smithi038.stderr:NetworkStack lxb w->center.init, ret=-24
2020-06-17T04:09:29.658 INFO:tasks.workunit.client.0.smithi038.stderr:NetworkStack lxb w->center.init, ret=-24
Updated by Xiubo Li almost 4 years ago
- Status changed from In Progress to Fix Under Review
Fixed in teuthology: https://github.com/ceph/teuthology/pull/1508
Updated by Kefu Chai almost 4 years ago
- Status changed from Fix Under Review to Resolved