Bug #22489
ceph_ansible installations may not work with rgw tasks.
Description
Teuthology works fine when running the following yaml file:
branch: luminous
kernel:
  kdb: true
  sha1: distro
meta:
- desc: 3-node cluster
- desc: Build the ceph cluster using ceph-ansible
- desc: without dmcrypt
nuke-on-error: true
openstack:
- volumes:
    count: 3
    size: 10
os_type: ubuntu
os_version: '16.04'
overrides:
  admin_socket:
    branch: luminous
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 25
    log-whitelist:
    - slow request
    sha1: bf5f5ec7cf0e06125515866acedcc04c393f90b9
  ceph-deploy:
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        osd default pool size: 2
  ceph_ansible:
    vars:
      ceph_conf_overrides:
        global:
          mon pg warn min per osd: 2
          osd default pool size: 2
      ceph_origin: repository
      ceph_repository: dev
      ceph_stable_release: luminous
      ceph_test: true
      dmcrypt: false
      journal_size: 1024
      osd_auto_discovery: false
      osd_scenario: collocated
      cephfs_pools:
      - name: "cephfs_data"
        pgs: "64"
      - name: "cephfs_metadata"
        pgs: "64"
      osd pool default pg num: 64
      osd pool default pgp num: 64
      pg per osd: 1024
  install:
    ceph:
      sha1: bf5f5ec7cf0e06125515866acedcc04c393f90b9
priority: 100
repo: https://github.com/ceph/ceph.git
roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
  - osd.2
  - mgr.w
- - mon.b
  - mgr.x
  - osd.3
  - osd.4
  - osd.5
- - mon.c
  - mgr.y
  - osd.6
  - osd.7
  - osd.8
  - client.0
sha1: bf5f5ec7cf0e06125515866acedcc04c393f90b9
suite: ceph-ansible
suite_path: /home/teuthworker/src/github.com_ceph_ceph_master/qa
suite_relpath: qa
suite_repo: https://github.com/ceph/ceph.git
suite_sha1: 25e60f042bd380afda62b494e47655a9830965e6
tasks:
- print: 'Test 1 -- ceph_ansible'
- ssh-keys: null
- ceph_ansible: null
- install.ship_utilities: null
- interactive:
- workunit:
    clients:
      all:
      - true.sh
teuthology_branch: wip-foryuri-wusui
verbose: true
targets:
  vpm019.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDXVFBL+3jLXC21Y3CZQrzzSt82ayLwWu5tLzoN6/uUisJwcjLutoadczuwYfAdEDMza3pnk26MV2cr7gM3CMv6uxbk2kmrAZmgJwbYxvyGpDPsGvhlSfxtJvvBimxde+3Irqm9SjCcsiTH+naEBQVvO8brfyR8BGGAF72jeBpWtELqGtmL5NfIISb2cdt7hHbZh1gErFB6ihknnVvJhd6I7Ti1oGBP44z7mQCdDG3jRmFcqDdsr/zErvgsmkP9B9UUl8bXQBzrhn8wwowJ+7H/moH/nasVKNhSV7DUEwdRbZ5F6PKsEcb/9bBGGwIjlmUXOSTFlQ5Sq5a1rRVQtVel
  vpm083.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDcOV2mJJO0QNS7TXxqEuN5SGcDhTNFzXwG4R8aY9sibs2xn2mzutLPam58hfGgX/HAOuOpwhLrTM9Ua8CaoPJOS00cTWTxkpkcOcy6vCd5Eh15Koy3rTWvvPiuWGzXUz2lKmdKUk1ySXhrTDeztL6wad6b/o0nAwu0ECLpO4r0KEo2dfWOo5QPTUDRYgNF59A7A467TfBx6KWHDTtiLuP+IiqW1hoF24wH7GnKHQa9VVjn7BeIS497r8nP6yWzvuS66EEXzw95vm6skaGZic3gZVArzVt1ILCbJjIu5fgi2nFVOWczwVNiujFKfaM+AGdxDuvxQdDaPf2cR5hBc+cP
  vpm091.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDMxlB7NLl309UnTbbTHxw05/t/cUsVyaYqkztte8HgkiF5ogRWniQVjMzoKUz4w/i0HS2d8ClbYOL5xANKQPxHNx4DW6mRwkkEkl9sN/F+mIcXw0xQqzpDb0bE6dD9aWDzFx5pSheL0RYvo8kqhyahlabBuD1NXmReZFWV+Fw858pWNqigHbQy2mthgU35rDnDxEqKD1nSQ+aNG/hEf9ujRwbJVBeEtXy39qP687xBtcWIA2Zc/pya5K39ZxJPJdK7/YI8Wvb5wAx/CwWj9CJbIbmvTV1qcx+fRtPfYz/DXFZk6TJk+eLohmIPsKRRnwhZaWM2pNPhY/Rb3NEsaONf
When teuthology dropped into interactive mode, I ssh'ed to each node, ran sudo ceph -s, and verified that ceph was running. After ^D, the teuthology task finished successfully.
I then changed the "- interactive:" task to "- rgw: [client.0]". If all went well, this should have completed successfully without doing very much. However, this teuthology run failed with:
2017-12-19 19:30:34,147.147 DEBUG:tasks.rgw:In rgw.configure_regions_and_zones() and regions is None. Bailing
2017-12-19 19:30:34,147.147 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/wusui/teuthology/teuthology/contextutil.py", line 30, in nested
    vars.append(enter())
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/wusui/ceph-qa-suite/tasks/rgw.py", line 855, in configure_regions_and_zones
    configure_compression_in_default_zone(ctx, config)
  File "/home/wusui/ceph-qa-suite/tasks/rgw.py", line 825, in configure_compression_in_default_zone
    ceph_config = ctx.ceph['ceph'].conf.get('global', {})
  File "/home/wusui/teuthology/teuthology/config.py", line 241, in __getattr__
    raise AttributeError(name)
AttributeError: ceph
2017-12-19 19:30:34,147.147 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/wusui/teuthology/teuthology/run_tasks.py", line 89, in run_tasks
    manager.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/wusui/ceph-qa-suite/tasks/rgw.py", line 1402, in task
    with contextutil.nested(*subtasks):
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/wusui/teuthology/teuthology/contextutil.py", line 30, in nested
    vars.append(enter())
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/wusui/ceph-qa-suite/tasks/rgw.py", line 855, in configure_regions_and_zones
    configure_compression_in_default_zone(ctx, config)
  File "/home/wusui/ceph-qa-suite/tasks/rgw.py", line 825, in configure_compression_in_default_zone
    ceph_config = ctx.ceph['ceph'].conf.get('global', {})
  File "/home/wusui/teuthology/teuthology/config.py", line 241, in __getattr__
    raise AttributeError(name)
AttributeError: ceph
2017-12-19 19:30:34,149.149 DEBUG:teuthology.run_tasks:Unwinding manager rgw
The attribute error on: ceph_config = ctx.ceph['ceph'].conf.get('global', {}) is because ctx.ceph did not exist (at least according to some logs that I added).
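The failure mode is easy to reproduce in isolation. The sketch below is hypothetical (FakeContext is a stand-in for teuthology's ctx; per the traceback, teuthology/config.py raises AttributeError(name) for attributes that were never set), but it shows exactly why the rgw.py line blows up when ctx.ceph does not exist:

```python
# Minimal, hypothetical reproduction of the failure mode. FakeContext stands
# in for teuthology's ctx object; per the traceback, teuthology/config.py's
# __getattr__ raises AttributeError(name) for any attribute never assigned.
class FakeContext(object):
    def __getattr__(self, name):
        raise AttributeError(name)

ctx = FakeContext()
try:
    # The line from rgw.py's configure_compression_in_default_zone():
    ceph_config = ctx.ceph['ceph'].conf.get('global', {})
except AttributeError as e:
    print('AttributeError: %s' % e)  # prints "AttributeError: ceph"
```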
The ceph.py module has the following lines in cluster() (around line 490):
if not hasattr(ctx, 'ceph'):
    ctx.ceph = {}
ctx.ceph[cluster_name] = argparse.Namespace()
ctx.ceph[cluster_name].conf = conf
This puts the ceph attribute into ctx, so things work when running the ceph task. When I tried adding a ceph: task in front of the rgw task, I got:
2017-12-19 19:58:43,240.240 INFO:teuthology.orchestra.run.vpm019.stderr:2017-12-19 19:58:43.236067 7f245e726e00 -1 journal do_read_entry(4096): bad header magic
2017-12-19 19:58:43,240.240 INFO:teuthology.orchestra.run.vpm019.stderr:2017-12-19 19:58:43.236081 7f245e726e00 -1 journal do_read_entry(4096): bad header magic
2017-12-19 19:58:43,241.241 INFO:teuthology.orchestra.run.vpm019.stderr:2017-12-19 19:58:43.236373 7f245e726e00 -1 read_settings error reading settings: (2) No such file or directory
2017-12-19 19:58:43,260.260 INFO:teuthology.orchestra.run.vpm019.stderr:2017-12-19 19:58:43.255979 7f245e726e00 -1 key
2017-12-19 19:58:43,272.272 INFO:teuthology.orchestra.run.vpm019.stderr:2017-12-19 19:58:43.266898 7f245e726e00 -1 created object store /var/lib/ceph/osd/ceph-0 for osd.0 fsid 98b4ec97-98ad-4c57-afa3-f7ca753a9685
2017-12-19 19:58:43,275.275 INFO:teuthology.orchestra.run.vpm019:Running: 'sudo MALLOC_CHECK_=3 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph-osd --cluster ceph --mkfs --mkkey -i 1 --monmap /home/ubuntu/cephtest/ceph.monmap'
2017-12-19 19:58:43,311.311 INFO:teuthology.orchestra.run.vpm019.stderr:2017-12-19 19:58:43.306615 7f33dd498e00 -1 asok(0x5616eb9fd2c0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-osd.1.asok': (17) File exists
2017-12-19 19:58:43,311.311 INFO:teuthology.orchestra.run.vpm019.stderr:2017-12-19 19:58:43.306836 7f33dd498e00 -1 already have key in keyring /var/lib/ceph/osd/ceph-1/keyring
2017-12-19 19:58:43,314.314 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/wusui/teuthology/teuthology/contextutil.py", line 30, in nested
    vars.append(enter())
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/wusui/ceph-qa-suite/tasks/ceph.py", line 802, in cluster
    '--monmap', monmap_path,
  File "/home/wusui/teuthology/teuthology/orchestra/remote.py", line 193, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/wusui/teuthology/teuthology/orchestra/run.py", line 423, in run
    r.wait()
  File "/home/wusui/teuthology/teuthology/orchestra/run.py", line 155, in wait
    self._raise_for_status()
  File "/home/wusui/teuthology/teuthology/orchestra/run.py", line 177, in _raise_for_status
    node=self.hostname, label=self.label
CommandFailedError: Command failed on vpm019 with status 1: 'sudo MALLOC_CHECK_=3 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph-osd --cluster ceph --mkfs --mkkey -i 1 --monmap /home/ubuntu/cephtest/ceph.monmap'
2017-12-19 19:58:43,315.315 INFO:tasks.ceph:Checking for errors in any valgrind logs...
2017-12-19 19:58:43,315.315 INFO:teuthology.orchestra.run.vpm091:Running: "sudo zgrep '<kind>' /var/log/ceph/valgrind/* /dev/null | sort | uniq"
2017-12-19 19:58:43,319.319 INFO:teuthology.orchestra.run.vpm019:Running: "sudo zgrep '<kind>' /var/log/ceph/valgrind/* /dev/null | sort | uniq"
2017-12-19 19:58:43,335.335 INFO:teuthology.orchestra.run.vpm091.stderr:gzip: /var/log/ceph/valgrind/*.gz: No such file or directory
2017-12-19 19:58:43,388.388 INFO:teuthology.orchestra.run.vpm083:Running: "sudo zgrep '<kind>' /var/log/ceph/valgrind/* /dev/null | sort | uniq"
2017-12-19 19:58:43,402.402 INFO:teuthology.orchestra.run.vpm019.stderr:gzip: /var/log/ceph/valgrind/*.gz: No such file or directory
2017-12-19 19:58:43,409.409 INFO:teuthology.orchestra.run.vpm083.stderr:gzip: /var/log/ceph/valgrind/*.gz: No such file or directory
2017-12-19 19:58:43,415.415 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/wusui/teuthology/teuthology/run_tasks.py", line 89, in run_tasks
    manager.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/wusui/ceph-qa-suite/tasks/ceph.py", line 1581, in task
    with contextutil.nested(*subtasks):
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/wusui/teuthology/teuthology/contextutil.py", line 30, in nested
    vars.append(enter())
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/wusui/ceph-qa-suite/tasks/ceph.py", line 802, in cluster
    '--monmap', monmap_path,
  File "/home/wusui/teuthology/teuthology/orchestra/remote.py", line 193, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/wusui/teuthology/teuthology/orchestra/run.py", line 423, in run
    r.wait()
  File "/home/wusui/teuthology/teuthology/orchestra/run.py", line 155, in wait
    self._raise_for_status()
  File "/home/wusui/teuthology/teuthology/orchestra/run.py", line 177, in _raise_for_status
    node=self.hostname, label=self.label
CommandFailedError: Command failed on vpm019 with status 1: 'sudo MALLOC_CHECK_=3 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph-osd --cluster ceph --mkfs --mkkey -i 1 --monmap /home/ubuntu/cephtest/ceph.monmap'
2017-12-19 19:58:43,417.417 DEBUG:teuthology.run_tasks:Unwinding manager ceph
The "File exists" and "already have key in keyring" errors make sense because those files were already set up by ceph_ansible. So I think that if I install with ceph_ansible, I should not run the ceph: task (or I need some magic parameters that I have not figured out yet).
So right now, I am thinking that the ctx.ceph setup code from ceph.py that I displayed above should also be run by ceph_ansible. This would set ctx.ceph so that the AttributeError in the first log does not happen.
The conf value in this code is also set in ceph.py, so the fix is actually more complicated than what I described. If we go this way, the code should probably be placed in a function (probably in misc) and called from both ceph.py and ceph_ansible.py. The call in ceph_ansible.py should probably go at the end of wait_for_ceph_health.
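As a rough sketch of that proposed refactor (the helper name init_ceph_ctx and its placement are my assumptions, not existing teuthology API), the shared function could look like:

```python
import argparse

def init_ceph_ctx(ctx, cluster_name='ceph', conf=None):
    """Ensure ctx.ceph[cluster_name] exists with a conf attribute.

    Hypothetical helper (name and its home in misc.py are assumptions). It
    mirrors the snippet from ceph.py's cluster() so ceph_ansible.py could
    call it too, e.g. at the end of wait_for_ceph_health().
    """
    if not hasattr(ctx, 'ceph'):
        ctx.ceph = {}
    if cluster_name not in ctx.ceph:
        ctx.ceph[cluster_name] = argparse.Namespace()
    # Default to an empty conf dict so later .get() lookups don't blow up.
    ctx.ceph[cluster_name].conf = conf if conf is not None else {}
    return ctx.ceph[cluster_name]

# Demo with a plain Namespace standing in for ctx:
ctx = argparse.Namespace()
init_ceph_ctx(ctx)
```

With something like this in place, rgw.py's ctx.ceph['ceph'].conf.get('global', {}) would return an empty dict instead of raising AttributeError, though a real fix would also need to populate conf with the values ceph-ansible actually deployed.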
History
#1 Updated by Vasu Kulkarni over 6 years ago
The rgw task doesn't work with the ceph-ansible task because it sets up daemons later in the code, and it also relies on all the other bookkeeping from ceph.py, as you noticed. With ceph-ansible you don't have to set up any daemons the way the rgw task currently does.
https://github.com/ceph/ceph/blob/master/qa/tasks/rgw.py#L109
You will have to make quite a few changes to the rgw task itself to use the systemd feature.
Also, here is a different rgw task using ceph-ansible: https://github.com/ceph/ceph/blob/master/qa/suites/rgw/hadoop-s3a/s3a-hadoop.yaml