Bug #22489

ceph_ansible installations may not work with rgw tasks.

Added by Anonymous over 6 years ago. Updated over 6 years ago.

Status:
New
Priority:
Normal
Assignee:
Category:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Teuthology works fine when running the following yaml file:

branch: luminous
kernel:
  kdb: true
  sha1: distro
meta:
- desc: 3-node cluster
- desc: Build the ceph cluster using ceph-ansible
- desc: without dmcrypt
nuke-on-error: true
openstack:
- volumes:
    count: 3
    size: 10
os_type: ubuntu
os_version: '16.04'
overrides:
  admin_socket:
    branch: luminous
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 25
    log-whitelist:
    - slow request
    sha1: bf5f5ec7cf0e06125515866acedcc04c393f90b9
  ceph-deploy:
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        osd default pool size: 2
  ceph_ansible:
    vars:
      ceph_conf_overrides:
        global:
          mon pg warn min per osd: 2
          osd default pool size: 2
      ceph_origin: repository
      ceph_repository: dev
      ceph_stable_release: luminous
      ceph_test: true
      dmcrypt: false
      journal_size: 1024
      osd_auto_discovery: false
      osd_scenario: collocated
      cephfs_pools:
        - name: "cephfs_data" 
          pgs: "64" 
        - name: "cephfs_metadata" 
          pgs: "64" 
      osd pool default pg num: 64
      osd pool default pgp num: 64
      pg per osd: 1024
  install:
    ceph:
      sha1: bf5f5ec7cf0e06125515866acedcc04c393f90b9
priority: 100
repo: https://github.com/ceph/ceph.git
roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
  - osd.2
  - mgr.w
- - mon.b
  - mgr.x
  - osd.3
  - osd.4
  - osd.5
- - mon.c
  - mgr.y
  - osd.6
  - osd.7
  - osd.8
  - client.0
sha1: bf5f5ec7cf0e06125515866acedcc04c393f90b9
suite: ceph-ansible
suite_path: /home/teuthworker/src/github.com_ceph_ceph_master/qa
suite_relpath: qa
suite_repo: https://github.com/ceph/ceph.git
suite_sha1: 25e60f042bd380afda62b494e47655a9830965e6
tasks:
- print : 'Test 1 -- ceph_ansible'
- ssh-keys: null
- ceph_ansible: null
- install.ship_utilities: null
- interactive:
- workunit:
    clients:
      all:
        - true.sh
teuthology_branch: wip-foryuri-wusui
verbose: true
targets:
  vpm019.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDXVFBL+3jLXC21Y3CZQrzzSt82ayLwWu5tLzoN6/uUisJwcjLutoadczuwYfAdEDMza3pnk26MV2cr7gM3CMv6uxbk2kmrAZmgJwbYxvyGpDPsGvhlSfxtJvvBimxde+3Irqm9SjCcsiTH+naEBQVvO8brfyR8BGGAF72jeBpWtELqGtmL5NfIISb2cdt7hHbZh1gErFB6ihknnVvJhd6I7Ti1oGBP44z7mQCdDG3jRmFcqDdsr/zErvgsmkP9B9UUl8bXQBzrhn8wwowJ+7H/moH/nasVKNhSV7DUEwdRbZ5F6PKsEcb/9bBGGwIjlmUXOSTFlQ5Sq5a1rRVQtVel
  vpm083.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDcOV2mJJO0QNS7TXxqEuN5SGcDhTNFzXwG4R8aY9sibs2xn2mzutLPam58hfGgX/HAOuOpwhLrTM9Ua8CaoPJOS00cTWTxkpkcOcy6vCd5Eh15Koy3rTWvvPiuWGzXUz2lKmdKUk1ySXhrTDeztL6wad6b/o0nAwu0ECLpO4r0KEo2dfWOo5QPTUDRYgNF59A7A467TfBx6KWHDTtiLuP+IiqW1hoF24wH7GnKHQa9VVjn7BeIS497r8nP6yWzvuS66EEXzw95vm6skaGZic3gZVArzVt1ILCbJjIu5fgi2nFVOWczwVNiujFKfaM+AGdxDuvxQdDaPf2cR5hBc+cP
  vpm091.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDMxlB7NLl309UnTbbTHxw05/t/cUsVyaYqkztte8HgkiF5ogRWniQVjMzoKUz4w/i0HS2d8ClbYOL5xANKQPxHNx4DW6mRwkkEkl9sN/F+mIcXw0xQqzpDb0bE6dD9aWDzFx5pSheL0RYvo8kqhyahlabBuD1NXmReZFWV+Fw858pWNqigHbQy2mthgU35rDnDxEqKD1nSQ+aNG/hEf9ujRwbJVBeEtXy39qP687xBtcWIA2Zc/pya5K39ZxJPJdK7/YI8Wvb5wAx/CwWj9CJbIbmvTV1qcx+fRtPfYz/DXFZk6TJk+eLohmIPsKRRnwhZaWM2pNPhY/Rb3NEsaONf

When teuthology dropped into interactive mode, I ssh'ed to each node, ran sudo ceph -s, and verified that ceph was running. After ^D, the teuthology tasks finished successfully.

I then changed the "- interactive:" task to "- rgw: [client.0]". If all went well, this should complete successfully without doing much. However, this teuthology run failed with:

2017-12-19 19:30:34,147.147 DEBUG:tasks.rgw:In rgw.configure_regions_and_zones() and regions is None. Bailing
2017-12-19 19:30:34,147.147 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/wusui/teuthology/teuthology/contextutil.py", line 30, in nested
    vars.append(enter())
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/wusui/ceph-qa-suite/tasks/rgw.py", line 855, in configure_regions_and_zones
    configure_compression_in_default_zone(ctx, config)
  File "/home/wusui/ceph-qa-suite/tasks/rgw.py", line 825, in configure_compression_in_default_zone
    ceph_config = ctx.ceph['ceph'].conf.get('global', {})
  File "/home/wusui/teuthology/teuthology/config.py", line 241, in __getattr__
    raise AttributeError(name)
AttributeError: ceph
2017-12-19 19:30:34,147.147 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/wusui/teuthology/teuthology/run_tasks.py", line 89, in run_tasks
    manager.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/wusui/ceph-qa-suite/tasks/rgw.py", line 1402, in task
    with contextutil.nested(*subtasks):
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/wusui/teuthology/teuthology/contextutil.py", line 30, in nested
    vars.append(enter())
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/wusui/ceph-qa-suite/tasks/rgw.py", line 855, in configure_regions_and_zones
    configure_compression_in_default_zone(ctx, config)
  File "/home/wusui/ceph-qa-suite/tasks/rgw.py", line 825, in configure_compression_in_default_zone
    ceph_config = ctx.ceph['ceph'].conf.get('global', {})
  File "/home/wusui/teuthology/teuthology/config.py", line 241, in __getattr__
    raise AttributeError(name)
AttributeError: ceph
2017-12-19 19:30:34,149.149 DEBUG:teuthology.run_tasks:Unwinding manager rgw

The AttributeError on ceph_config = ctx.ceph['ceph'].conf.get('global', {}) occurs because ctx.ceph did not exist (at least according to some logging that I added).
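For illustration, a minimal sketch of the failure mode; FakeCtx below is my simplified stand-in for the teuthology context object, whose __getattr__ (teuthology/config.py, per the traceback above) raises AttributeError for attributes that were never set:

# Minimal sketch, not teuthology code: FakeCtx stands in for ctx, whose
# __getattr__ raises AttributeError(name) when an attribute such as 'ceph'
# was never assigned (which is the case when ceph_ansible ran instead of ceph).
class FakeCtx(object):
    def __getattr__(self, name):
        raise AttributeError(name)

ctx = FakeCtx()
try:
    # This is the access rgw.py performs in configure_compression_in_default_zone().
    ceph_config = ctx.ceph['ceph'].conf.get('global', {})
except AttributeError as err:
    print('AttributeError: %s' % err)   # prints: AttributeError: ceph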

The ceph.py module has the following lines in cluster() (around line 490).

    if not hasattr(ctx, 'ceph'):
        ctx.ceph = {}
    ctx.ceph[cluster_name] = argparse.Namespace()
    ctx.ceph[cluster_name].conf = conf

This puts the ceph attribute into ctx, so it works when the ceph task is run. When I try adding a ceph: task in front of the rgw task, I get:

2017-12-19 19:58:43,240.240 INFO:teuthology.orchestra.run.vpm019.stderr:2017-12-19 19:58:43.236067 7f245e726e00 -1 journal do_read_entry(4096): bad header magic
2017-12-19 19:58:43,240.240 INFO:teuthology.orchestra.run.vpm019.stderr:2017-12-19 19:58:43.236081 7f245e726e00 -1 journal do_read_entry(4096): bad header magic
2017-12-19 19:58:43,241.241 INFO:teuthology.orchestra.run.vpm019.stderr:2017-12-19 19:58:43.236373 7f245e726e00 -1 read_settings error reading settings: (2) No such file or directory
2017-12-19 19:58:43,260.260 INFO:teuthology.orchestra.run.vpm019.stderr:2017-12-19 19:58:43.255979 7f245e726e00 -1 key
2017-12-19 19:58:43,272.272 INFO:teuthology.orchestra.run.vpm019.stderr:2017-12-19 19:58:43.266898 7f245e726e00 -1 created object store /var/lib/ceph/osd/ceph-0 for osd.0 fsid 98b4ec97-98ad-4c57-afa3-f7ca753a9685
2017-12-19 19:58:43,275.275 INFO:teuthology.orchestra.run.vpm019:Running: 'sudo MALLOC_CHECK_=3 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph-osd --cluster ceph --mkfs --mkkey -i 1 --monmap /home/ubuntu/cephtest/ceph.monmap'
2017-12-19 19:58:43,311.311 INFO:teuthology.orchestra.run.vpm019.stderr:2017-12-19 19:58:43.306615 7f33dd498e00 -1 asok(0x5616eb9fd2c0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-osd.1.asok': (17) File exists
2017-12-19 19:58:43,311.311 INFO:teuthology.orchestra.run.vpm019.stderr:2017-12-19 19:58:43.306836 7f33dd498e00 -1 already have key in keyring /var/lib/ceph/osd/ceph-1/keyring
2017-12-19 19:58:43,314.314 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/wusui/teuthology/teuthology/contextutil.py", line 30, in nested
    vars.append(enter())
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/wusui/ceph-qa-suite/tasks/ceph.py", line 802, in cluster
    '--monmap', monmap_path,
  File "/home/wusui/teuthology/teuthology/orchestra/remote.py", line 193, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/wusui/teuthology/teuthology/orchestra/run.py", line 423, in run
    r.wait()
  File "/home/wusui/teuthology/teuthology/orchestra/run.py", line 155, in wait
    self._raise_for_status()
  File "/home/wusui/teuthology/teuthology/orchestra/run.py", line 177, in _raise_for_status
    node=self.hostname, label=self.label
CommandFailedError: Command failed on vpm019 with status 1: 'sudo MALLOC_CHECK_=3 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph-osd --cluster ceph --mkfs --mkkey -i 1 --monmap /home/ubuntu/cephtest/ceph.monmap'
2017-12-19 19:58:43,315.315 INFO:tasks.ceph:Checking for errors in any valgrind logs...
2017-12-19 19:58:43,315.315 INFO:teuthology.orchestra.run.vpm091:Running: "sudo zgrep '<kind>' /var/log/ceph/valgrind/* /dev/null | sort | uniq" 
2017-12-19 19:58:43,319.319 INFO:teuthology.orchestra.run.vpm019:Running: "sudo zgrep '<kind>' /var/log/ceph/valgrind/* /dev/null | sort | uniq" 
2017-12-19 19:58:43,335.335 INFO:teuthology.orchestra.run.vpm091.stderr:gzip: /var/log/ceph/valgrind/*.gz: No such file or directory
2017-12-19 19:58:43,388.388 INFO:teuthology.orchestra.run.vpm083:Running: "sudo zgrep '<kind>' /var/log/ceph/valgrind/* /dev/null | sort | uniq" 
2017-12-19 19:58:43,402.402 INFO:teuthology.orchestra.run.vpm019.stderr:gzip: /var/log/ceph/valgrind/*.gz: No such file or directory
2017-12-19 19:58:43,409.409 INFO:teuthology.orchestra.run.vpm083.stderr:gzip: /var/log/ceph/valgrind/*.gz: No such file or directory
2017-12-19 19:58:43,415.415 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/wusui/teuthology/teuthology/run_tasks.py", line 89, in run_tasks
    manager.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/wusui/ceph-qa-suite/tasks/ceph.py", line 1581, in task
    with contextutil.nested(*subtasks):
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/wusui/teuthology/teuthology/contextutil.py", line 30, in nested
    vars.append(enter())
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/wusui/ceph-qa-suite/tasks/ceph.py", line 802, in cluster
    '--monmap', monmap_path,
  File "/home/wusui/teuthology/teuthology/orchestra/remote.py", line 193, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/wusui/teuthology/teuthology/orchestra/run.py", line 423, in run
    r.wait()
  File "/home/wusui/teuthology/teuthology/orchestra/run.py", line 155, in wait
    self._raise_for_status()
  File "/home/wusui/teuthology/teuthology/orchestra/run.py", line 177, in _raise_for_status
    node=self.hostname, label=self.label
CommandFailedError: Command failed on vpm019 with status 1: 'sudo MALLOC_CHECK_=3 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph-osd --cluster ceph --mkfs --mkkey -i 1 --monmap /home/ubuntu/cephtest/ceph.monmap'
2017-12-19 19:58:43,417.417 DEBUG:teuthology.run_tasks:Unwinding manager ceph

The "File exists" and "already have key in keyring" errors make sense because those were set up by ceph_ansible. So I think that if I install with ceph_ansible, I should not run the ceph: task (or I need some magic parameters that I have not figured out yet).

So right now, I am thinking that the ctx.ceph-setting code from ceph.py that I displayed previously should also be run by ceph_ansible. That would set ctx.ceph so that the AttributeError in the first log does not happen.

The conf value in this code is also set in ceph.py, so the fix is actually more complicated than what I described. If we go this way, the code should probably be placed into a function (probably in misc) and called from both ceph.py and ceph_ansible.py. The call in ceph_ansible.py should probably go at the end of wait_for_ceph_health.
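As a rough sketch of what that shared helper might look like (the name set_ctx_ceph_conf and its placement in misc are my assumptions, not existing teuthology code):

import argparse

def set_ctx_ceph_conf(ctx, cluster_name, conf):
    # Record the cluster's conf on ctx so later tasks (e.g. rgw) can read
    # ctx.ceph[cluster_name].conf instead of hitting AttributeError: ceph.
    if not hasattr(ctx, 'ceph'):
        ctx.ceph = {}
    if cluster_name not in ctx.ceph:
        ctx.ceph[cluster_name] = argparse.Namespace()
    ctx.ceph[cluster_name].conf = conf

ceph.py's cluster() could call set_ctx_ceph_conf(ctx, cluster_name, conf) in place of the lines quoted above, and ceph_ansible.py could call it at the end of wait_for_ceph_health with whatever conf dict it can build from the ceph.conf that ceph-ansible generated (how to obtain that dict is the open part of the fix).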

History

#1 Updated by Vasu Kulkarni over 6 years ago

The rgw task doesn't work with the ceph-ansible task because it sets up daemons later in the code, and it also relies on all the other bookkeeping from ceph.py, as you noticed. With ceph-ansible you don't have to set up any daemons the way the rgw task currently does.

https://github.com/ceph/ceph/blob/master/qa/tasks/rgw.py#L109

You will have to make quite a few changes to the rgw task itself to use the systemd feature.
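A very rough sketch of what using systemd could look like in the rgw task (the start_rgw_via_systemd name and the ceph-radosgw@rgw.<host> unit name pattern are assumptions based on typical ceph-ansible deployments, not the actual change that would be needed):

def start_rgw_via_systemd(ctx, client):
    # Instead of spawning radosgw itself, the rgw task could (re)start the
    # systemd unit that ceph-ansible already installed on the client's node.
    (remote,) = ctx.cluster.only(client).remotes.keys()
    remote.run(args=[
        'sudo', 'systemctl', 'restart',
        'ceph-radosgw@rgw.{host}'.format(host=remote.shortname),
    ])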

Also, here is a different rgw task using ceph-ansible: https://github.com/ceph/ceph/blob/master/qa/suites/rgw/hadoop-s3a/s3a-hadoop.yaml
