Bug #8311 (closed)

No pool name error in ubuntu-2014-05-06_21:02:54-upgrade:dumpling-dumpling-testing-basic-vps

Added by Yuri Weinstein almost 10 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Urgent
Target version: -
% Done: 0%
Source: Q/A
Backport: firefly
Severity: 2 - major

Description

Logs are in http://qa-proxy.ceph.com/teuthology/ubuntu-2014-05-06_21:02:54-upgrade:dumpling-dumpling-testing-basic-vps/240939/

2014-05-07T14:09:48.822 DEBUG:teuthology.orchestra.run:Running [10.214.138.100]: '/home/ubuntu/cephtest/adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get data pg_num'
2014-05-07T14:09:49.040 DEBUG:teuthology.orchestra.run:Running [10.214.138.100]: '/home/ubuntu/cephtest/adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get metadata pg_num'
2014-05-07T14:09:49.205 DEBUG:teuthology.orchestra.run:Running [10.214.138.100]: '/home/ubuntu/cephtest/adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get rbd pg_num'
2014-05-07T14:09:49.405 DEBUG:teuthology.orchestra.run:Running [10.214.138.100]: "/home/ubuntu/cephtest/adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get '' pg_num" 
2014-05-07T14:09:49.579 INFO:teuthology.task.rgw.client.0.out:[10.214.138.108]: 2014-05-07 21:09:49.580386 7fe09f6dc780 -1 shutting down
2014-05-07T14:09:49.595 INFO:teuthology.orchestra.run.err:[10.214.138.100]: Invalid command:  saw 0 of pool(<poolname>), expected 1
2014-05-07T14:09:49.595 INFO:teuthology.orchestra.run.err:[10.214.138.100]: Error EINVAL: invalid command
2014-05-07T14:09:49.615 ERROR:teuthology.run_tasks:Saw exception from tasks
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-dumpling/teuthology/run_tasks.py", line 27, in run_tasks
    manager.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/teuthworker/teuthology-dumpling/teuthology/task/thrashosds.py", line 155, in task
    logger=log.getChild('ceph_manager'),
  File "/home/teuthworker/teuthology-dumpling/teuthology/task/ceph_manager.py", line 322, in __init__
    self.pools[pool] = self.get_pool_property(pool, 'pg_num')
  File "/home/teuthworker/teuthology-dumpling/teuthology/task/ceph_manager.py", line 594, in get_pool_property
    prop)
  File "/home/teuthworker/teuthology-dumpling/teuthology/task/ceph_manager.py", line 335, in raw_cluster_cmd
    stdout=StringIO(),
  File "/home/teuthworker/teuthology-dumpling/teuthology/orchestra/remote.py", line 47, in run
    r = self._runner(client=self.ssh, **kwargs)
  File "/home/teuthworker/teuthology-dumpling/teuthology/orchestra/run.py", line 267, in run
    r.exitstatus = _check_status(r.exitstatus)
  File "/home/teuthworker/teuthology-dumpling/teuthology/orchestra/run.py", line 263, in _check_status
    raise CommandFailedError(command=r.command, exitstatus=status, node=host)
CommandFailedError: Command failed on 10.214.138.100 with status 22: "/home/ubuntu/cephtest/adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get '' pg_num" 
archive_path: /var/lib/teuthworker/archive/ubuntu-2014-05-06_21:02:54-upgrade:dumpling-dumpling-testing-basic-vps/240939
branch: dumpling
description: upgrade/dumpling/rgw/{0-cluster/start.yaml 1-dumpling-install/v0.67.1.yaml
  2-workload/testrgw.yaml 3-upgrade-sequence/upgrade-mon-osd-mds.yaml 4-final/osdthrash.yaml}
email: null
job_id: '240939'
kernel: &id001
  kdb: true
  sha1: f74d66a3ec1b62a663451083091ccb8341d721ec
last_in_suite: false
machine_type: vps
name: ubuntu-2014-05-06_21:02:54-upgrade:dumpling-dumpling-testing-basic-vps
nuke-on-error: true
os_type: ubuntu
overrides:
  admin_socket:
    branch: dumpling
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    fs: xfs
    log-whitelist:
    - slow request
    - scrub
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    sha1: 14650b282ecd344025714a4e743b255ae01b3ce0
  ceph-deploy:
    branch:
      dev: dumpling
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 14650b282ecd344025714a4e743b255ae01b3ce0
  s3tests:
    branch: dumpling
  workunit:
    sha1: 14650b282ecd344025714a4e743b255ae01b3ce0
owner: scheduled_ubuntu@yw
roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mon.c
  - osd.3
  - osd.4
  - osd.5
  - client.0
suite: upgrade:dumpling
targets:
  ubuntu@vpm036.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDAQxsNepwAjUm//hG2UtfuaV3xnVAEIbUYQtF5avojA73oJFnznq+ombYQ1hCpZ0fOt3Dng4+Ef6uUsNs0k9k7Wx4S0UU9LZ/4fH1NHdfkM25Jtw8RLSi+rrR/tAs/fyAa9gQIWVMNFxvxPD+cXkWBzgR+jLyBxlRCgMTodyLDnhl2vj246gKnyBi1Vp1gtHl5DLxcuF1LW8f+tGU+Wj1rvXJCOtnQyiQy9O4L5UrIlaZhkffRGGKEQU2tOZlNTp4g4g3r9JzXcz94y9ZdHau/AiXRdaAZNipdjp3yx1y1Jg81z0MB3J4p9lb0k27rDM3yuPoQpeo5kAokZhrP/QwV
  ubuntu@vpm037.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDuC+HJgGl/7kj50Zzbtx6khTHDY76vW0VvqwO+QE8B75uBiyBLR1UFeIO9H/p/wupeNDE7aGQfQEqLQ3suCytNVr2YKUgxoSz8hMcpSoGO9HW4qNxnC2AvWQGtoZ6aDHz9UuPbppSYp67D3nBhTO1r839DRPL/k8ea0QbWGWHLcO4idHHw7gTHLkobePbsSddnw5TZF6tj1Z+dNSMBlKYUYjwIg1Zdq6zX95kL5BqvZj2OlIs2G9PvVFrgsaVrMhW7L2ohd+XiSZzVa92zTShh6citT+/mm+V4jje/EAOdiuhPC07yuCw/zFME1SPmU22c2/eGFja6P8basy6IMCB/
tasks:
- internal.lock_machines:
  - 2
  - vps
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    tag: v0.67.1
- ceph: null
- parallel:
  - workload
  - upgrade-sequence
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- swift:
    client.0:
      rgw_server: client.0
teuthology_branch: dumpling
upgrade-sequence:
  sequential:
  - install.upgrade:
      all:
        branch: dumpling
  - ceph.restart:
    - mon.a
  - sleep:
      duration: 60
  - ceph.restart:
    - mon.b
  - sleep:
      duration: 60
  - ceph.restart:
    - mon.c
  - sleep:
      duration: 60
  - ceph.restart:
    - mds.a
  - sleep:
      duration: 60
  - ceph.restart:
    - osd.0
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.1
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.2
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.3
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.4
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.5
  - sleep:
      duration: 30
  - ceph.restart:
    - rgw.client.0
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.vps.19359
workload:
  sequential:
  - rgw:
    - client.0
  - s3tests:
      client.0:
        rgw_server: client.0

description: upgrade/dumpling/rgw/{0-cluster/start.yaml 1-dumpling-install/v0.67.1.yaml
2-workload/testrgw.yaml 3-upgrade-sequence/upgrade-mon-osd-mds.yaml 4-final/osdthrash.yaml}
duration: 973.9422459602356
failure_reason: 'Command failed on 10.214.138.100 with status 22: "/home/ubuntu/cephtest/adjust-ulimits
ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get '''' pg_num"'
flavor: basic
mon.a-kernel-sha1: f74d66a3ec1b62a663451083091ccb8341d721ec
mon.b-kernel-sha1: f74d66a3ec1b62a663451083091ccb8341d721ec
owner: scheduled_ubuntu@yw
success: false
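
For context on where the empty pool name bites: the thrashosds setup path in ceph_manager enumerates every pool in the cluster and queries its pg_num, so a pool whose name is the empty string turns into "ceph osd pool get '' pg_num" and fails with EINVAL. A minimal Python sketch of that pattern (hypothetical helper names, not the actual teuthology code):

import json
import subprocess

def raw_cluster_cmd(*args):
    # Run a ceph CLI command and return its stdout; raises on non-zero exit.
    return subprocess.check_output(("ceph",) + args).decode()

def list_pools():
    # Every pool name in the osdmap, including a bogus '' entry if one exists.
    osd_dump = json.loads(raw_cluster_cmd("osd", "dump", "--format=json"))
    return [p["pool_name"] for p in osd_dump["pools"]]

def get_pool_property(pool, prop):
    # With pool == '' this becomes "ceph osd pool get '' pg_num", which the
    # mon rejects with EINVAL -- the failure shown in the traceback above.
    out = raw_cluster_cmd("osd", "pool", "get", pool, prop)
    return int(out.split(":")[1])

pools = {pool: get_pool_property(pool, "pg_num") for pool in list_pools()}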


Related issues 3 (0 open, 3 closed)

Related to rgw - Bug #8213: RGW is creating empty pool names (Duplicate, 04/25/2014)
Related to rgw - Bug #9899: Error "coverage ceph osd pool get '' pg_num" in upgrade:dumpling-dumpling-distro-basic-multi run (Resolved, Sage Weil, 10/26/2014)
Has duplicate rgw - Bug #8517: Can't initiate multipart upload in firefly branch (Duplicate, 06/03/2014)

Actions #1

Updated by Sage Weil almost 10 years ago

  • Priority changed from Normal to Urgent
  • Source changed from other to Q/A
Actions #3

Updated by Yuri Weinstein almost 10 years ago

  • Severity changed from 3 - minor to 2 - major
Actions #4

Updated by Tamilarasi muthamizhan almost 10 years ago

  • Assignee set to Tamilarasi muthamizhan
Actions #6

Updated by Samuel Just almost 10 years ago

(05:04:35 PM) sjusthm: yehudasa: ping
(05:04:45 PM) sjusthm: in the mon log
(05:04:46 PM) sjusthm: I see
(05:04:48 PM) sjusthm: 2014-05-07 20:58:16.606953 7f60b57d9700 1 -- 10.214.138.100:6789/0 <== client.4155 10.214.138.108:0/1004734 9 ==== pool_op(create pool 0 auid 0 tid 18 name .rgw.root v7) v4 ==== 74+0+0 (1826629628 0 0) 0x2558200 con 0x283d580
(05:04:53 PM) sjusthm: 2014-05-07 20:58:15.382685 7f60b57d9700 1 -- 10.214.138.100:6789/0 <== client.4155 10.214.138.108:0/1004734 7 ==== pool_op(create pool 0 auid 0 tid 1 name v0) v4 ==== 65+0+0 (2517894246 0 0) 0x260d400 con 0x283d580
(05:04:59 PM) sjusthm: and that
(05:05:08 PM) sjusthm: so same client creates those two pools
(05:05:14 PM) sjusthm: .rgw.root and ''
(05:05:21 PM) sjusthm: does radosgw create pools?
(05:05:27 PM) joshd: yes
(05:05:36 PM) sjusthm: if so is there a plausible way for it have created a pool with an empty name?
(05:05:37 PM) joshd: if you give it mon w permission it will
(05:05:39 PM) joshd: yes
(05:05:42 PM) sjusthm: ah, how?
(05:05:43 PM) joshd: it's a known bug
(05:05:46 PM) sjusthm: ah
(05:05:56 PM) sjusthm: just needs to be backported to dumpling?
(05:05:59 PM) joshd: something about default something, I forget the details
(05:06:13 PM) joshd: don't remember if it was fixed in firefly or not
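
The upshot of the above: radosgw, given mon 'w' caps, creates its own pools on demand, and nothing on the client side rejects an empty string before the pool_op(create ... name '') reaches the monitor. A hedged illustration of the kind of guard that would catch it, using the python-rados bindings (the create_pool_checked helper is purely illustrative, not the actual RGW fix):

import rados

def create_pool_checked(cluster, name):
    # Refuse to send a pool create for a blank name instead of letting the
    # monitor accept it and leave an unusable '' pool behind.
    if not name or not name.strip():
        raise ValueError("refusing to create a pool with an empty name")
    if not cluster.pool_exists(name):
        cluster.create_pool(name)

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
create_pool_checked(cluster, ".rgw.root")  # fine
create_pool_checked(cluster, "")           # raises instead of creating ''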

Actions #7

Updated by Samuel Just almost 10 years ago

  • Project changed from teuthology to rgw

looks like something that needs to be backported to dumpling?

Actions #8

Updated by Tamilarasi muthamizhan almost 10 years ago

  • Assignee changed from Tamilarasi muthamizhan to Yehuda Sadeh
Actions #9

Updated by Yehuda Sadeh almost 10 years ago

  • Status changed from New to Fix Under Review
  • Assignee changed from Yehuda Sadeh to Josh Durgin
Actions #10

Updated by Tamilarasi muthamizhan almost 10 years ago

recent log [just for reference]:
ubuntu@teuthology:/a/teuthology-2014-06-01_19:15:10-upgrade:dumpling-dumpling-testing-basic-vps/284079/

Actions #11

Updated by Josh Durgin almost 10 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Assignee changed from Josh Durgin to Yehuda Sadeh
  • Backport set to firefly, dumpling

commit:b300318113b162522759d4794b1cfa3d1d9398e4

Actions #12

Updated by Sylvain Munaut almost 10 years ago

Oh damn, I reported #8517 which turns out to be a duplicate of this.

However I used the bucket.get_data_extra_pool() helper in the fix rather than re-implementing the fallback logic at another place.
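
For readers following along, the shape of that fallback in rough Python terms (attribute and method names here are illustrative; the real change is in the RGW C++ code referenced by the commit in note #11):

class BucketPlacement(object):
    # Illustrative stand-in for a bucket's placement info.
    def __init__(self, data_pool, data_extra_pool=""):
        self.data_pool = data_pool
        self.data_extra_pool = data_extra_pool

    def get_data_extra_pool(self):
        # The point of the helper: never hand back an empty pool name.
        # Multipart/extra data falls back to the regular data pool when no
        # extra pool is configured, instead of targeting a pool named ''.
        return self.data_extra_pool or self.data_pool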

Actions #13

Updated by Sage Weil almost 10 years ago

this causes a similar failure in upgrade/firefly when mon_thrash tries to run:

2014-07-11T11:54:00.825 INFO:teuthology.orchestra.run.err:[10.214.132.29]: Invalid command:  missing required parameter pool(<poolname>)
2014-07-11T11:54:00.825 INFO:teuthology.orchestra.run.err:[10.214.132.29]: osd pool get <poolname> size|min_size|crash_replay_interval|pg_num|pgp_num|crush_ruleset|hit_set_type|hit_set_period|hit_set_count|hit_set_fpp|auid :  get pool parameter <var>
2014-07-11T11:54:00.825 INFO:teuthology.orchestra.run.err:[10.214.132.29]: Error EINVAL: invalid command
2014-07-11T11:54:00.838 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_firefly/teuthology/run_tasks.py", line 43, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/teuthology_firefly/teuthology/run_tasks.py", line 31, in run_one_task
    return fn(**kwargs)
  File "/home/teuthworker/src/teuthology_firefly/teuthology/task/sequential.py", line 48, in task
    mgr.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/teuthworker/src/teuthology_firefly/teuthology/task/mon_thrash.py", line 331, in task
    logger=log.getChild('ceph_manager'),
  File "/home/teuthworker/src/teuthology_firefly/teuthology/task/ceph_manager.py", line 422, in __init__
    self.pools[pool] = self.get_pool_property(pool, 'pg_num')
  File "/home/teuthworker/src/teuthology_firefly/teuthology/task/ceph_manager.py", line 788, in get_pool_property
    prop)
  File "/home/teuthworker/src/teuthology_firefly/teuthology/task/ceph_manager.py", line 438, in raw_cluster_cmd
    stdout=StringIO(),
  File "/home/teuthworker/src/teuthology_firefly/teuthology/orchestra/remote.py", line 106, in run
    r = self._runner(client=self.ssh, **kwargs)
  File "/home/teuthworker/src/teuthology_firefly/teuthology/orchestra/run.py", line 330, in run
    r.exitstatus = _check_status(r.exitstatus)
  File "/home/teuthworker/src/teuthology_firefly/teuthology/orchestra/run.py", line 326, in _check_status
    raise CommandFailedError(command=r.command, exitstatus=status, node=host)
CommandFailedError: Command failed on 10.214.132.29 with status 22: "adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get '' pg_num" 

Actions #14

Updated by Yuri Weinstein almost 10 years ago

How come we did not see these errors for a while, and now do in the brand-new upgrade/firefly?

Actions #15

Updated by Sage Weil almost 10 years ago

  • Status changed from Pending Backport to Resolved
  • Backport changed from firefly, dumpling to firefly

should be fixed now

Actions #16

Updated by Yuri Weinstein almost 10 years ago

I still see it on firefly:
http://pulpito.front.sepia.ceph.com/ubuntu-2014-07-15_21:01:54-upgrade:firefly-firefly-testing-basic-plana/363998/

"Command failed on 10.214.133.37 with status 22: "adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get '' pg_num""

Actions #18

Updated by Sage Weil almost 10 years ago

Ok, it installs dumpling, upgrades to v0.80.1, then runs radosgw. Is there a way to work around this bug (which I think is in 0.80.1), or do we need to remove this test from the upgrade tests?

Actions #19

Updated by Yuri Weinstein almost 10 years ago

Sage Weil wrote:

Ok, it installs dumpling, upgrades to v0.80.1, then runs radosgw. Is there a way to work around this bug (which I think is in 0.80.1), or do we need to remove this test from the upgrade tests?

Should I remove v0.80.1.yaml from ceph-qa-suite/suites/upgrade/firefly/1-install?

Actions #20

Updated by Sage Weil over 9 years ago

  • Status changed from New to Fix Under Review
  • Assignee changed from Yehuda Sadeh to Yuri Weinstein
Actions #21

Updated by Yuri Weinstein over 9 years ago

  • Status changed from Fix Under Review to 7
Actions #22

Updated by Sage Weil over 9 years ago

  • Status changed from 7 to Resolved