Bug #23007

jewel integration testing: ceph pg scrub 1.0 fails in create_verify_lfn_objects

Added by Nathan Cutler over 1 year ago. Updated about 1 year ago.

Status: Duplicate
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 02/15/2018
Due date:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

The following tests fail:

  • rados/singleton-nomsgr/{all/lfn-upgrade-infernalis.yaml rados.yaml}
  • rados/singleton-nomsgr/{all/lfn-upgrade-hammer.yaml rados.yaml}

In both cases we see:

2018-02-07T09:41:42.911 INFO:teuthology.orchestra.run.smithi060:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --cluster ceph pg scrub 1.0'
2018-02-07T09:41:43.048 INFO:teuthology.orchestra.run.smithi060.stderr:Error EAGAIN: pg 1.0 primary osd.1 not up
2018-02-07T09:41:43.057 INFO:tasks.create_verify_lfn_objects:ceph_verify_lfn_objects verifying...

Then the ceph_verify_lfn_objects task runs for a while, until it fails with:

2018-02-07T09:35:56.690 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 86, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 65, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/sequential.py", line 46, in task
    mgr = run_tasks.run_one_task(taskname, ctx=ctx, config=confg)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 65, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-jewel-backports/qa/tasks/ceph_manager.py", line 2057, in task
    fn(*args, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-jewel-backports/qa/tasks/ceph_manager.py", line 1485, in do_pg_scrub
    self.raw_cluster_cmd('pg', stype, self.get_pgid(pool, pgnum))
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-jewel-backports/qa/tasks/ceph_manager.py", line 881, in raw_cluster_cmd
    stdout=StringIO(),
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 193, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 423, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 155, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 177, in _raise_for_status
    node=self.hostname, label=self.label
CommandFailedError: Command failed on smithi016 with status 11: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --cluster ceph pg scrub 1.0'
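
For illustration only: exit status 11 in the failure above corresponds to the EAGAIN reported by "ceph pg scrub" while the primary OSD is not yet up. One way a caller could tolerate that window is to retry the command on EAGAIN. The sketch below is not the fix applied in the related issues and is not teuthology code; CommandFailed, retry_on_eagain, and their parameters are hypothetical names, and the retry count and delay are arbitrary assumptions.

```python
import time

# Exit status seen in the log above ("failed ... with status 11"), i.e. EAGAIN on Linux.
EAGAIN_STATUS = 11


class CommandFailed(Exception):
    """Hypothetical stand-in for teuthology's CommandFailedError."""

    def __init__(self, status):
        super().__init__("command failed with status %d" % status)
        self.status = status


def retry_on_eagain(run, attempts=5, delay=1.0, sleep=time.sleep):
    """Call run() until it succeeds, retrying only on EAGAIN.

    run      -- zero-argument callable that raises CommandFailed on failure
    attempts -- maximum number of calls to run() (assumed value)
    delay    -- seconds to wait between attempts (assumed value)
    sleep    -- injectable sleep function, so tests need not wait
    """
    for attempt in range(1, attempts + 1):
        try:
            return run()
        except CommandFailed as exc:
            # Re-raise anything that is not EAGAIN, and EAGAIN on the last attempt.
            if exc.status != EAGAIN_STATUS or attempt == attempts:
                raise
            sleep(delay)
```

A caller would wrap whatever issues the scrub command in a zero-argument callable and pass it as run. Any other failure status is re-raised immediately, so real errors are not masked by the retry loop.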

Logs:

After the first run, further attempts to reproduce fail in the same place with a different backtrace:

Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 86, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 65, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/sequential.py", line 48, in task
    mgr.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/install/__init__.py", line 437, in upgrade
    upgrade_common(ctx, config, upgrade_old_style)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/install/__init__.py", line 382, in upgrade_common
    deploy_style(ctx, node, remote, pkgs, system_type)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/install/__init__.py", line 265, in upgrade_old_style
    deb._upgrade_packages(ctx, node, remote, pkgs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/install/deb.py", line 183, in _upgrade_packages
    builder.install_repo()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/packaging.py", line 778, in install_repo
    self._install_deb_repo()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/packaging.py", line 976, in _install_deb_repo
    repo = self._get_repo()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/packaging.py", line 964, in _get_repo
    resp.raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/requests/models.py", line 893, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
HTTPError: 502 Server Error: Bad Gateway for url: https://4.chacra.ceph.com/repos/ceph/wip-jewel-backports/a95e2c4958b256d75ea1c732c2f2cce45f024081/ubuntu/trusty/flavors/default/repo

Logs:


Related issues

Related to RADOS - Bug #19737: EAGAIN encountered during pg scrub (jewel) Resolved 04/21/2017
Related to ceph-qa-suite - Backport #23066: jewel: ceph.restart + ceph_manager.wait_for_clean is racy Resolved

History

#1 Updated by Nathan Cutler over 1 year ago

@David And this one too? (Thanks!)

#2 Updated by Nathan Cutler over 1 year ago

  • Description updated

#3 Updated by Josh Durgin about 1 year ago

  • Related to Bug #19737: EAGAIN encountered during pg scrub (jewel) added

#4 Updated by Josh Durgin about 1 year ago

  • Status changed from New to Duplicate

#5 Updated by Nathan Cutler about 1 year ago

  • Related to Backport #23066: jewel: ceph.restart + ceph_manager.wait_for_clean is racy added
