Bug #23787

luminous: "osd-scrub-repair.sh" failures in rados

Added by Yuri Weinstein about 6 years ago. Updated almost 6 years ago.

Status: Rejected
Priority: Urgent
Assignee: -
Category: -
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: rados
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This failure turned up during v12.2.5 QE validation.

Run: http://pulpito.ceph.com/yuriw-2018-04-17_20:36:16-rados-luminous-distro-basic-smithi/
(see the verification rerun: http://pulpito.ceph.com/yuriw-2018-04-18_15:20:52-rados-luminous-distro-basic-smithi/)
Jobs: 2408616
Logs: http://qa-proxy.ceph.com/teuthology/yuriw-2018-04-17_20:36:16-rados-luminous-distro-basic-smithi/2408616/teuthology.log

2018-04-18T02:14:41.275 INFO:tasks.workunit.client.0.smithi008.stdout:2018-04-18 02:13:59.163750 7f16d516f700 10 osd.1 pg_epoch: 116 pg[3.0s0( v 106'10 (0'0,106'10] local-lis/les=114/116 n=7 ec=26/26 lis/c 114/114 les/c/f 116/116/0 114/114/100) [1,2,0]p1(0) r=0 lpr=114 crt=106'10 lcod 106'9 mlcod 0'0 active+clean+inconsistent] _handle_message: osd_op(client.4354.0:1 3.0s0 3.0 (undecoded) ondisk+read+pgop+ignore_overlay+known_if_redirected e116) v8
2018-04-18T02:14:41.275 INFO:tasks.workunit.client.0.smithi008.stdout:2018-04-18 02:13:59.163768 7f16d516f700 20 osd.1 pg_epoch: 116 pg[3.0s0( v 106'10 (0'0,106'10] local-lis/les=114/116 n=7 ec=26/26 lis/c 114/114 les/c/f 116/116/0 114/114/100) [1,2,0]p1(0) r=0 lpr=114 crt=106'10 lcod 106'9 mlcod 0'0 active+clean+inconsistent] do_op: op osd_op(client.4354.0:1 3.0s0 3:00000000::::head [scrubls] snapc 0=[] ondisk+read+pgop+ignore_overlay+known_if_redirected e116) v8
2018-04-18T02:14:41.275 INFO:tasks.workunit.client.0.smithi008.stdout:2018-04-18 02:13:59.163783 7f16d516f700 20 osd.1 pg_epoch: 116 pg[3.0s0( v 106'10 (0'0,106'10] local-lis/les=114/116 n=7 ec=26/26 lis/c 114/114 les/c/f 116/116/0 114/114/100) [1,2,0]p1(0) r=0 lpr=114 crt=106'10 lcod 106'9 mlcod 0'0 active+clean+inconsistent] op_has_sufficient_caps session=0x55c73b9a0780 pool=3 (ecpool ) owner=0 need_read_cap=1 need_write_cap=0 classes=[] -> yes
2018-04-18T02:14:41.275 INFO:tasks.workunit.client.0.smithi008.stdout:2018-04-18 02:13:59.163794 7f16d516f700 10 osd.1 pg_epoch: 116 pg[3.0s0( v 106'10 (0'0,106'10] local-lis/les=114/116 n=7 ec=26/26 lis/c 114/114 les/c/f 116/116/0 114/114/100) [1,2,0]p1(0) r=0 lpr=114 crt=106'10 lcod 106'9 mlcod 0'0 active+clean+inconsistent] do_pg_op osd_op(client.4354.0:1 3.0s0 3:00000000::::head [scrubls] snapc 0=[] ondisk+read+pgop+ignore_overlay+known_if_redirected e116) v8
2018-04-18T02:14:41.275 INFO:tasks.workunit.client.0.smithi008.stdout:2018-04-18 02:13:59.164169 7f16d516f700 10 osd.1 116 dequeue_op 0x55c73b5d7dc0 finish
2018-04-18T02:14:41.276 INFO:tasks.workunit.client.0.smithi008.stdout:2018-04-18 02:13:59.164184 7f16d516f700 20 osd.1 op_wq(0) _process empty q, waiting
2018-04-18T02:14:41.276 INFO:tasks.workunit.client.0.smithi008.stdout:2018-04-18 02:13:59.164902 7f16e09ff700  2 osd.1 116 ms_handle_reset con 0x55c73b6e6800 session 0x55c73b9a0780
2018-04-18T02:14:41.276 INFO:tasks.workunit.client.0.smithi008.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1706: display_logs:  read file
2018-04-18T02:14:41.276 INFO:tasks.workunit.client.0.smithi008.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:184: teardown:  rm -fr td/osd-scrub-repair
2018-04-18T02:14:41.276 INFO:tasks.workunit.client.0.smithi008.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:185: teardown:  get_asok_dir
2018-04-18T02:14:41.276 INFO:tasks.workunit.client.0.smithi008.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:107: get_asok_dir:  '[' -n '' ']'
2018-04-18T02:14:41.276 INFO:tasks.workunit.client.0.smithi008.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:110: get_asok_dir:  echo /tmp/ceph-asok.37066
2018-04-18T02:14:41.276 INFO:tasks.workunit.client.0.smithi008.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:185: teardown:  rm -rf /tmp/ceph-asok.37066
2018-04-18T02:14:41.276 INFO:tasks.workunit.client.0.smithi008.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:186: teardown:  '[' no = yes ']'
2018-04-18T02:14:41.277 INFO:tasks.workunit.client.0.smithi008.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:193: teardown:  return 0
2018-04-18T02:14:41.277 INFO:tasks.workunit.client.0.smithi008.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1894: main:  return 1
2018-04-18T02:14:41.277 INFO:tasks.workunit:Stopping ['scrub'] on client.0...
2018-04-18T02:14:41.277 INFO:teuthology.orchestra.run.smithi008:Running: 'rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0'
2018-04-18T02:14:41.421 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 86, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 65, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph_luminous/qa/tasks/workunit.py", line 201, in task
    config.get('subdir'), timeout=timeout)
  File "/home/teuthworker/src/github.com_ceph_ceph_luminous/qa/tasks/workunit.py", line 351, in _spawn_on_all_clients
    timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 85, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 99, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 22, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph_luminous/qa/tasks/workunit.py", line 473, in _run_tests
    label="workunit test {workunit}".format(workunit=workunit)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 193, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 423, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 155, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 177, in _raise_for_status
    node=self.hostname, label=self.label
CommandFailedError: Command failed (workunit test scrub/osd-scrub-repair.sh) on smithi008 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=luminous TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh'
2018-04-18T02:14:41.443 ERROR:teuthology.run_tasks: Sentry event: http://sentry.ceph.com/sepia/teuthology/?q=0f2273c58ed646c2bf299f04fed66cdd
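
For reference, the failing workunit can also be rerun outside teuthology against a local build. A minimal sketch, assuming a built luminous checkout and the qa/run-standalone.sh helper in the ceph tree (the build steps and paths are illustrative, not taken from this run):

# clone and build ceph on the branch the workunit used
git clone https://github.com/ceph/ceph.git && cd ceph
git checkout luminous
./do_cmake.sh && cd build && make -j"$(nproc)" vstart
# run the single standalone test that the workunit wraps; the helper
# sets up CEPH_ROOT and PATH much like the teuthology command above
../qa/run-standalone.sh osd-scrub-repair.sh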
Actions #1

Updated by Patrick Donnelly about 6 years ago

  • Project changed from Ceph to RADOS
  • Target version set to v13.0.0
  • Backport set to luminous
Actions #2

Updated by Sage Weil almost 6 years ago

  • Subject changed from "osd-scrub-repair.sh" failures in rados to luminous: "osd-scrub-repair.sh" failures in rados
Actions #3

Updated by David Zafman almost 6 years ago

  • Status changed from New to Rejected

This is an incompatibility between the OSD version 64ffa817000d59d91379f7335439845930f58530 (luminous) and the version of qa/standalone/scrub/osd-scrub-repair.sh that the run is using. The teuthology.log references script line 3318, which doesn't exist in the 64ffa817000d59d91379f7335439845930f58530 version of the script.

2018-04-18T02:13:59.225 INFO:tasks.workunit.client.0.smithi008.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:3318: corrupt_scrub_erasure: diff -y td/osd-scrub-repair/checkcsjson td/osd-scrub-repair/csjson

[~/ceph/build] ((64ffa81...))
dzafman$ wc -l ../qa/standalone/scrub/osd-scrub-repair.sh
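
The check above can be reproduced against any local ceph clone. A minimal sketch (the commands are illustrative; the wc output in the comment is truncated, so the expected count is inferred from the line number the log references):

# count the lines of the script as it existed at the OSD's commit
git show 64ffa817000d59d91379f7335439845930f58530:qa/standalone/scrub/osd-scrub-repair.sh | wc -l
# a count below 3318 confirms that the xtrace line
# "osd-scrub-repair.sh:3318: corrupt_scrub_erasure" came from a newer
# version of the script than the one shipped with this OSD build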

