Project

General

Profile

Bug #36389

untar encounters unexpected EPERM on kclient/multimds cluster with thrashing

Added by Patrick Donnelly about 2 months ago. Updated about 2 months ago.

Status:
New
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
Start date:
10/10/2018
Due date:
% Done:

0%

Source:
Q/A
Tags:
Backport:
mimic,luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS, kceph
Labels (FS):
Pull request ID:

Description

2018-10-10T11:22:58.275 INFO:tasks.mds_thrash.fs.[cephfs]:mds.e-s has gained rank=0, replacing gid=4215
2018-10-10T11:22:58.275 INFO:tasks.mds_thrash.fs.[cephfs]:waiting for 7 secs before reviving mds.b
2018-10-10T11:22:59.800 INFO:tasks.workunit.client.0.smithi177.stderr:tar: ltp-full-20091231/testcases/open_hpi_testsuite/scripts/test: Cannot mkdir: Permission denied
2018-10-10T11:22:59.804 INFO:tasks.workunit.client.0.smithi177.stderr:tar: ltp-full-20091231/testcases/open_hpi_testsuite/scripts/test/generate_index.pl: Cannot open: Permission denied
2018-10-10T11:23:00.811 INFO:tasks.workunit.client.0.smithi177.stderr:tar: ltp-full-20091231/testcases/open_hpi_testsuite/scripts/test/coverage_report.pl: Cannot open: Permission denied
2018-10-10T11:23:00.811 INFO:tasks.workunit.client.0.smithi177.stderr:tar: ltp-full-20091231/testcases/open_hpi_testsuite/scripts/test/testconformance.pl: Cannot open: Permission denied
2018-10-10T11:23:00.812 INFO:tasks.workunit.client.0.smithi177.stderr:tar: ltp-full-20091231/testcases/open_hpi_testsuite/scripts/test/Makefile.in: Cannot open: Permission denied
2018-10-10T11:23:00.812 INFO:tasks.workunit.client.0.smithi177.stderr:tar: ltp-full-20091231/testcases/open_hpi_testsuite/scripts/test/testcoverage.pl: Cannot open: Permission denied
2018-10-10T11:23:00.812 INFO:tasks.workunit.client.0.smithi177.stderr:tar: ltp-full-20091231/testcases/open_hpi_testsuite/scripts/test/gsum2html.pl: Cannot open: Permission denied
2018-10-10T11:23:00.812 INFO:tasks.workunit.client.0.smithi177.stderr:tar: ltp-full-20091231/testcases/open_hpi_testsuite/scripts/test/conformance_report.pl: Cannot open: Permission denied
2018-10-10T11:23:00.812 INFO:tasks.workunit.client.0.smithi177.stderr:tar: ltp-full-20091231/testcases/open_hpi_testsuite/scripts/test/Makefile.am: Cannot open: Permission denied
2018-10-10T11:23:00.813 INFO:tasks.workunit.client.0.smithi177.stderr:tar: ltp-full-20091231/testcases/open_hpi_testsuite/scripts/test/gcov2html.pl: Cannot open: Permission denied
2018-10-10T11:23:00.813 INFO:tasks.workunit.client.0.smithi177.stderr:tar: ltp-full-20091231/testcases/open_hpi_testsuite/scripts/Makefile.am: Cannot open: Permission denied
2018-10-10T11:23:00.813 INFO:tasks.workunit.client.0.smithi177.stderr:tar: ltp-full-20091231/testcases/open_hpi_testsuite/scripts: Cannot utime: Permission denied
2018-10-10T11:23:00.813 INFO:tasks.workunit.client.0.smithi177.stderr:tar: ltp-full-20091231/testcases/open_hpi_testsuite/plugins: Cannot mkdir: Permission denied
2018-10-10T11:23:00.813 INFO:tasks.workunit.client.0.smithi177.stderr:tar: ltp-full-20091231/testcases/open_hpi_testsuite/plugins/ipmi: Cannot mkdir: Permission denied
2018-10-10T11:23:00.813 INFO:tasks.workunit.client.0.smithi177.stderr:tar: ltp-full-20091231/testcases/open_hpi_testsuite/plugins/ipmi/ipmi_entity_event.c: Cannot open: Permission denied
2018-10-10T11:23:00.813 INFO:tasks.workunit.client.0.smithi177.stderr:tar: ltp-full-20091231/testcases/open_hpi_testsuite/plugins/ipmi/atca_vshm_rdrs.c: Cannot open: Permission denied
2018-10-10T11:23:00.813 INFO:tasks.workunit.client.0.smithi177.stderr:tar: ltp-full-20091231/testcases/open_hpi_testsuite/plugins/ipmi/ipmi_sensor_event.c: Cannot open: Permission denied
2018-10-10T11:23:00.813 INFO:tasks.workunit.client.0.smithi177.stderr:tar: ltp-full-20091231/testcases/open_hpi_testsuite/plugins/ipmi/ipmi_close.c: Cannot open: Permission denied
2018-10-10T11:23:00.814 INFO:tasks.workunit.client.0.smithi177.stderr:tar: ltp-full-20091231/testcases/open_hpi_testsuite/plugins/ipmi/hotswap.c: Cannot open: Permission denied
2018-10-10T11:23:00.814 INFO:tasks.workunit.client.0.smithi177.stderr:tar: ltp-full-20091231/testcases/open_hpi_testsuite/plugins/ipmi/ipmi_control_event.c: Cannot open: Permission denied
2018-10-10T11:23:00.814 INFO:tasks.workunit.client.0.smithi177.stderr:tar: ltp-full-20091231/testcases/open_hpi_testsuite/plugins/ipmi/ipmi_connection.c: Cannot open: Permission denied
2018-10-10T11:23:00.814 INFO:tasks.workunit.client.0.smithi177.stderr:tar: ltp-full-20091231/testcases/open_hpi_testsuite/plugins/ipmi/ekeyfru.h: Cannot open: Permission denied
2018-10-10T11:23:00.814 INFO:tasks.workunit.client.0.smithi177.stderr:tar: ltp-full-20091231/testcases/open_hpi_testsuite/plugins/ipmi/ipmi_controls.c: Cannot open: Permission denied
2018-10-10T11:23:00.814 INFO:tasks.workunit.client.0.smithi177.stderr:
2018-10-10T11:23:00.814 INFO:tasks.workunit.client.0.smithi177.stderr:gzip: stdin: Permission denied
2018-10-10T11:23:00.815 INFO:tasks.workunit.client.0.smithi177.stderr:tar: Unexpected EOF in archive
2018-10-10T11:23:00.815 INFO:tasks.workunit.client.0.smithi177.stderr:tar: ltp-full-20091231/testcases/open_hpi_testsuite: Cannot utime: Permission denied
2018-10-10T11:23:00.815 INFO:tasks.workunit.client.0.smithi177.stderr:tar: ltp-full-20091231/testcases: Cannot utime: Permission denied
2018-10-10T11:23:00.815 INFO:tasks.workunit.client.0.smithi177.stderr:tar: ltp-full-20091231: Cannot utime: Permission denied
2018-10-10T11:23:00.815 INFO:tasks.workunit.client.0.smithi177.stderr:tar: Error is not recoverable: exiting now
2018-10-10T11:23:00.815 DEBUG:teuthology.orchestra.run:got remote process result: 2
2018-10-10T11:23:00.815 INFO:tasks.workunit:Stopping ['suites/fsstress.sh'] on client.0...
2018-10-10T11:23:00.816 INFO:teuthology.orchestra.run.smithi177:Running: 'sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0'
2018-10-10T11:23:00.999 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/run_tasks.py", line 86, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/run_tasks.py", line 65, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20181008.224656/qa/tasks/workunit.py", line 136, in task
    cleanup=cleanup)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20181008.224656/qa/tasks/workunit.py", line 286, in _spawn_on_all_clients
    timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/parallel.py", line 85, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/parallel.py", line 99, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/parallel.py", line 22, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20181008.224656/qa/tasks/workunit.py", line 409, in _run_tests
    label="workunit test {workunit}".format(workunit=workunit)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/orchestra/remote.py", line 193, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/orchestra/run.py", line 429, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-pdonnell-testing/teuthology/orchestra/run.py", line 183, in _raise_for_status
    node=self.hostname, label=self.label
CommandFailedError: Command failed (workunit test suites/fsstress.sh) on smithi177 with status 2: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=wip-pdonnell-testing-20181008.224656 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/suites/fsstress.sh'

From: /ceph/teuthology-archive/pdonnell-2018-10-09_01:07:48-multimds-wip-pdonnell-testing-20181008.224656-distro-basic-smithi/3119273/teuthology.log

Branch: https://github.com/ceph/ceph-ci/tree/wip-pdonnell-testing-20181008.224656


Related issues

Related to Ceph - Bug #36540: msg: messages are queued but not sent New
Duplicated by fs - Bug #36387: "workunit test fs/snaps/snaptest-git-ceph.sh) on smithi031 with status 128" Duplicate

History

#1 Updated by Patrick Donnelly about 2 months ago

  • Description updated (diff)

#2 Updated by Zheng Yan about 2 months ago

The 'Permission denied' error was because client was blacklisted.

client did notice that mds.0 entered reconnect state

2018-10-10T11:19:03.816779+00:00 smithi177 kernel: libceph: mds0 172.21.15.143:6814 socket closed (con state CONNECTING)
2018-10-10T11:19:10.906802+00:00 smithi177 kernel: ceph: mds0 reconnect start
2018-10-10T11:19:10.926818+00:00 smithi177 kernel: libceph: mds0 172.21.15.143:6813 socket closed (con state OPEN)

2018-10-10 11:19:10.900 7f5c5e20c700  1 mds.0.14 handle_mds_map i am now mds.0.14
2018-10-10 11:19:10.900 7f5c5e20c700  1 mds.0.14 handle_mds_map state change up:resolve --> up:reconnect
2018-10-10 11:19:10.900 7f5c5e20c700  1 mds.0.14 reconnect_start
2018-10-10 11:19:10.900 7f5c5e20c700 10 mds.0.server apply_blacklist: killed 0
2018-10-10 11:19:10.900 7f5c5e20c700  4 mds.0.14 reconnect_start: killed 0 blacklisted sessions (1 blacklist entries, 1)
...
2018-10-10 11:20:00.182 7f5c5ba07700 10 mds.0.server reconnect timed out
2018-10-10 11:20:00.182 7f5c5ba07700  1 mds.0.server reconnect gives up on client.4502 172.21.15.177:0/550996098
2018-10-10 11:20:00.182 7f5c5ba07700  0 log_channel(cluster) log [WRN] : evicting unresponsive client smithi177: (4502), after waiting 45 seconds during MDS startup
2018-10-10 11:20:00.182 7f5c5ba07700  4 mds.0.14 Preparing blacklist command... (wait=0)
2018-10-10 11:20:00.183 7f5c5ba07700  4 mds.0.14 Sending mon blacklist command: {"prefix":"osd blacklist", "blacklistop":"add","addr":"172.21.15.177:0/550996098"}

It's likely that there is bug in messenger, which caused message loss. Another example is http://tracker.ceph.com/issues/36349

#3 Updated by Patrick Donnelly about 2 months ago

  • Duplicated by Bug #36387: "workunit test fs/snaps/snaptest-git-ceph.sh) on smithi031 with status 128" added

#4 Updated by Patrick Donnelly about 2 months ago

  • Related to Bug #36540: msg: messages are queued but not sent added

Also available in: Atom PDF