Bug #8311 (closed)

No pool name error in ubuntu-2014-05-06_21:02:54-upgrade:dumpling-dumpling-testing-basic-vps

Added by Yuri Weinstein almost 10 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Urgent
Target version: -
% Done: 0%
Source: Q/A
Backport: firefly
Severity: 2 - major

Description

Logs are in http://qa-proxy.ceph.com/teuthology/ubuntu-2014-05-06_21:02:54-upgrade:dumpling-dumpling-testing-basic-vps/240939/

2014-05-07T14:09:48.822 DEBUG:teuthology.orchestra.run:Running [10.214.138.100]: '/home/ubuntu/cephtest/adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get data pg_num'
2014-05-07T14:09:49.040 DEBUG:teuthology.orchestra.run:Running [10.214.138.100]: '/home/ubuntu/cephtest/adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get metadata pg_num'
2014-05-07T14:09:49.205 DEBUG:teuthology.orchestra.run:Running [10.214.138.100]: '/home/ubuntu/cephtest/adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get rbd pg_num'
2014-05-07T14:09:49.405 DEBUG:teuthology.orchestra.run:Running [10.214.138.100]: "/home/ubuntu/cephtest/adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get '' pg_num" 
2014-05-07T14:09:49.579 INFO:teuthology.task.rgw.client.0.out:[10.214.138.108]: 2014-05-07 21:09:49.580386 7fe09f6dc780 -1 shutting down
2014-05-07T14:09:49.595 INFO:teuthology.orchestra.run.err:[10.214.138.100]: Invalid command:  saw 0 of pool(<poolname>), expected 1
2014-05-07T14:09:49.595 INFO:teuthology.orchestra.run.err:[10.214.138.100]: Error EINVAL: invalid command
2014-05-07T14:09:49.615 ERROR:teuthology.run_tasks:Saw exception from tasks
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-dumpling/teuthology/run_tasks.py", line 27, in run_tasks
    manager.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/teuthworker/teuthology-dumpling/teuthology/task/thrashosds.py", line 155, in task
    logger=log.getChild('ceph_manager'),
  File "/home/teuthworker/teuthology-dumpling/teuthology/task/ceph_manager.py", line 322, in __init__
    self.pools[pool] = self.get_pool_property(pool, 'pg_num')
  File "/home/teuthworker/teuthology-dumpling/teuthology/task/ceph_manager.py", line 594, in get_pool_property
    prop)
  File "/home/teuthworker/teuthology-dumpling/teuthology/task/ceph_manager.py", line 335, in raw_cluster_cmd
    stdout=StringIO(),
  File "/home/teuthworker/teuthology-dumpling/teuthology/orchestra/remote.py", line 47, in run
    r = self._runner(client=self.ssh, **kwargs)
  File "/home/teuthworker/teuthology-dumpling/teuthology/orchestra/run.py", line 267, in run
    r.exitstatus = _check_status(r.exitstatus)
  File "/home/teuthworker/teuthology-dumpling/teuthology/orchestra/run.py", line 263, in _check_status
    raise CommandFailedError(command=r.command, exitstatus=status, node=host)
CommandFailedError: Command failed on 10.214.138.100 with status 22: "/home/ubuntu/cephtest/adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get '' pg_num" 
archive_path: /var/lib/teuthworker/archive/ubuntu-2014-05-06_21:02:54-upgrade:dumpling-dumpling-testing-basic-vps/240939
branch: dumpling
description: upgrade/dumpling/rgw/{0-cluster/start.yaml 1-dumpling-install/v0.67.1.yaml
  2-workload/testrgw.yaml 3-upgrade-sequence/upgrade-mon-osd-mds.yaml 4-final/osdthrash.yaml}
email: null
job_id: '240939'
kernel: &id001
  kdb: true
  sha1: f74d66a3ec1b62a663451083091ccb8341d721ec
last_in_suite: false
machine_type: vps
name: ubuntu-2014-05-06_21:02:54-upgrade:dumpling-dumpling-testing-basic-vps
nuke-on-error: true
os_type: ubuntu
overrides:
  admin_socket:
    branch: dumpling
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    fs: xfs
    log-whitelist:
    - slow request
    - scrub
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    sha1: 14650b282ecd344025714a4e743b255ae01b3ce0
  ceph-deploy:
    branch:
      dev: dumpling
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 14650b282ecd344025714a4e743b255ae01b3ce0
  s3tests:
    branch: dumpling
  workunit:
    sha1: 14650b282ecd344025714a4e743b255ae01b3ce0
owner: scheduled_ubuntu@yw
roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mon.c
  - osd.3
  - osd.4
  - osd.5
  - client.0
suite: upgrade:dumpling
targets:
  ubuntu@vpm036.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDAQxsNepwAjUm//hG2UtfuaV3xnVAEIbUYQtF5avojA73oJFnznq+ombYQ1hCpZ0fOt3Dng4+Ef6uUsNs0k9k7Wx4S0UU9LZ/4fH1NHdfkM25Jtw8RLSi+rrR/tAs/fyAa9gQIWVMNFxvxPD+cXkWBzgR+jLyBxlRCgMTodyLDnhl2vj246gKnyBi1Vp1gtHl5DLxcuF1LW8f+tGU+Wj1rvXJCOtnQyiQy9O4L5UrIlaZhkffRGGKEQU2tOZlNTp4g4g3r9JzXcz94y9ZdHau/AiXRdaAZNipdjp3yx1y1Jg81z0MB3J4p9lb0k27rDM3yuPoQpeo5kAokZhrP/QwV
  ubuntu@vpm037.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDuC+HJgGl/7kj50Zzbtx6khTHDY76vW0VvqwO+QE8B75uBiyBLR1UFeIO9H/p/wupeNDE7aGQfQEqLQ3suCytNVr2YKUgxoSz8hMcpSoGO9HW4qNxnC2AvWQGtoZ6aDHz9UuPbppSYp67D3nBhTO1r839DRPL/k8ea0QbWGWHLcO4idHHw7gTHLkobePbsSddnw5TZF6tj1Z+dNSMBlKYUYjwIg1Zdq6zX95kL5BqvZj2OlIs2G9PvVFrgsaVrMhW7L2ohd+XiSZzVa92zTShh6citT+/mm+V4jje/EAOdiuhPC07yuCw/zFME1SPmU22c2/eGFja6P8basy6IMCB/
tasks:
- internal.lock_machines:
  - 2
  - vps
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    tag: v0.67.1
- ceph: null
- parallel:
  - workload
  - upgrade-sequence
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- swift:
    client.0:
      rgw_server: client.0
teuthology_branch: dumpling
upgrade-sequence:
  sequential:
  - install.upgrade:
      all:
        branch: dumpling
  - ceph.restart:
    - mon.a
  - sleep:
      duration: 60
  - ceph.restart:
    - mon.b
  - sleep:
      duration: 60
  - ceph.restart:
    - mon.c
  - sleep:
      duration: 60
  - ceph.restart:
    - mds.a
  - sleep:
      duration: 60
  - ceph.restart:
    - osd.0
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.1
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.2
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.3
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.4
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.5
  - sleep:
      duration: 30
  - ceph.restart:
    - rgw.client.0
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.vps.19359
workload:
  sequential:
  - rgw:
    - client.0
  - s3tests:
      client.0:
        rgw_server: client.0

description: upgrade/dumpling/rgw/{0-cluster/start.yaml 1-dumpling-install/v0.67.1.yaml
2-workload/testrgw.yaml 3-upgrade-sequence/upgrade-mon-osd-mds.yaml 4-final/osdthrash.yaml}
duration: 973.9422459602356
failure_reason: 'Command failed on 10.214.138.100 with status 22: "/home/ubuntu/cephtest/adjust-ulimits
ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get '''' pg_num"'
flavor: basic
mon.a-kernel-sha1: f74d66a3ec1b62a663451083091ccb8341d721ec
mon.b-kernel-sha1: f74d66a3ec1b62a663451083091ccb8341d721ec
owner: scheduled_ubuntu@yw
success: false
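
For context on where the empty pool name bites: the thrashosds setup path in ceph_manager enumerates every pool in the cluster and queries its pg_num, so a pool whose name is the empty string turns into "ceph osd pool get '' pg_num" and fails with EINVAL. A minimal Python sketch of that pattern (hypothetical helper names, not the actual teuthology code):

import json
import subprocess

def raw_cluster_cmd(*args):
    # Run a ceph CLI command and return its stdout; raises on non-zero exit.
    return subprocess.check_output(("ceph",) + args).decode()

def list_pools():
    # Every pool name in the osdmap, including a bogus '' entry if one exists.
    osd_dump = json.loads(raw_cluster_cmd("osd", "dump", "--format=json"))
    return [p["pool_name"] for p in osd_dump["pools"]]

def get_pool_property(pool, prop):
    # With pool == '' this becomes "ceph osd pool get '' pg_num", which the
    # mon rejects with EINVAL -- the failure shown in the traceback above.
    out = raw_cluster_cmd("osd", "pool", "get", pool, prop)
    return int(out.split(":")[1])

pools = {pool: get_pool_property(pool, "pg_num") for pool in list_pools()}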


Related issues 3 (0 open, 3 closed)

Related to rgw - Bug #8213: RGW is creating empty pool names (Duplicate, 04/25/2014)
Related to rgw - Bug #9899: Error "coverage ceph osd pool get '' pg_num" in upgrade:dumpling-dumpling-distro-basic-multi run (Resolved, Sage Weil, 10/26/2014)
Has duplicate rgw - Bug #8517: Can't initiate multipart upload in firefly branch (Duplicate, 06/03/2014)

Actions #1

Updated by Sage Weil almost 10 years ago

  • Priority changed from Normal to Urgent
  • Source changed from other to Q/A
Actions #3

Updated by Yuri Weinstein almost 10 years ago

  • Severity changed from 3 - minor to 2 - major
Actions #4

Updated by Tamilarasi muthamizhan almost 10 years ago

  • Assignee set to Tamilarasi muthamizhan
Actions #6

Updated by Samuel Just almost 10 years ago

(05:04:35 PM) sjusthm: yehudasa: ping
(05:04:45 PM) sjusthm: in the mon log
(05:04:46 PM) sjusthm: I see
(05:04:48 PM) sjusthm: 2014-05-07 20:58:16.606953 7f60b57d9700 1 -- 10.214.138.100:6789/0 <== client.4155 10.214.138.108:0/1004734 9 ==== pool_op(create pool 0 auid 0 tid 18 name .rgw.root v7) v4 ==== 74+0+0 (1826629628 0 0) 0x2558200 con 0x283d580
(05:04:53 PM) sjusthm: 2014-05-07 20:58:15.382685 7f60b57d9700 1 -- 10.214.138.100:6789/0 <== client.4155 10.214.138.108:0/1004734 7 ==== pool_op(create pool 0 auid 0 tid 1 name v0) v4 ==== 65+0+0 (2517894246 0 0) 0x260d400 con 0x283d580
(05:04:59 PM) sjusthm: and that
(05:05:08 PM) sjusthm: so same client creates those two pools
(05:05:14 PM) sjusthm: .rgw.root and ''
(05:05:21 PM) sjusthm: does radosgw create pools?
(05:05:27 PM) joshd: yes
(05:05:36 PM) sjusthm: if so is there a plausible way for it have created a pool with an empty name?
(05:05:37 PM) joshd: if you give it mon w permission it will
(05:05:39 PM) joshd: yes
(05:05:42 PM) sjusthm: ah, how?
(05:05:43 PM) joshd: it's a known bug
(05:05:46 PM) sjusthm: ah
(05:05:56 PM) sjusthm: just needs to be backported to dumpling?
(05:05:59 PM) joshd: something about default something, I forget the details
(05:06:13 PM) joshd: don't remember if it was fixed in firefly or not
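
The upshot of the above: radosgw, given mon 'w' caps, creates its own pools on demand, and nothing on the client side rejects an empty string before the pool_op(create ... name '') reaches the monitor. A hedged illustration of the kind of guard that would catch it, using the python-rados bindings (the create_pool_checked helper is purely illustrative, not the actual RGW fix):

import rados

def create_pool_checked(cluster, name):
    # Refuse to send a pool create for a blank name instead of letting the
    # monitor accept it and leave an unusable '' pool behind.
    if not name or not name.strip():
        raise ValueError("refusing to create a pool with an empty name")
    if not cluster.pool_exists(name):
        cluster.create_pool(name)

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
create_pool_checked(cluster, ".rgw.root")  # fine
create_pool_checked(cluster, "")           # raises instead of creating ''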

Actions #7

Updated by Samuel Just almost 10 years ago

  • Project changed from teuthology to rgw

looks like something that needs to be backported to dumpling?

Actions #8

Updated by Tamilarasi muthamizhan almost 10 years ago

  • Assignee changed from Tamilarasi muthamizhan to Yehuda Sadeh
Actions #9

Updated by Yehuda Sadeh almost 10 years ago

  • Status changed from New to Fix Under Review
  • Assignee changed from Yehuda Sadeh to Josh Durgin
Actions #10

Updated by Tamilarasi muthamizhan almost 10 years ago

recent log [just for reference]:
ubuntu@teuthology:/a/teuthology-2014-06-01_19:15:10-upgrade:dumpling-dumpling-testing-basic-vps/284079/

Actions #11

Updated by Josh Durgin almost 10 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Assignee changed from Josh Durgin to Yehuda Sadeh
  • Backport set to firefly, dumpling

commit:b300318113b162522759d4794b1cfa3d1d9398e4

Actions #12

Updated by Sylvain Munaut almost 10 years ago

Oh damn, I reported #8517 which turns out to be a duplicate of this.

However I used the bucket.get_data_extra_pool() helper in the fix rather than re-implementing the fallback logic at another place.
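
For readers following along, the shape of that fallback in rough Python terms (attribute and method names here are illustrative; the real change is in the RGW C++ code referenced by the commit in note #11):

class BucketPlacement(object):
    # Illustrative stand-in for a bucket's placement info.
    def __init__(self, data_pool, data_extra_pool=""):
        self.data_pool = data_pool
        self.data_extra_pool = data_extra_pool

    def get_data_extra_pool(self):
        # The point of the helper: never hand back an empty pool name.
        # Multipart/extra data falls back to the regular data pool when no
        # extra pool is configured, instead of targeting a pool named ''.
        return self.data_extra_pool or self.data_pool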

Actions #13

Updated by Sage Weil almost 10 years ago

this causes a similar failure in upgrade/firefly when mon_thrash tries to run:

2014-07-11T11:54:00.825 INFO:teuthology.orchestra.run.err:[10.214.132.29]: Invalid command:  missing required parameter pool(<poolname>)
2014-07-11T11:54:00.825 INFO:teuthology.orchestra.run.err:[10.214.132.29]: osd pool get <poolname> size|min_size|crash_replay_interval|pg_num|pgp_num|crush_ruleset|hit_set_type|hit_set_period|hit_set_count|hit_set_fpp|auid :  get pool parameter <var>
2014-07-11T11:54:00.825 INFO:teuthology.orchestra.run.err:[10.214.132.29]: Error EINVAL: invalid command
2014-07-11T11:54:00.838 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_firefly/teuthology/run_tasks.py", line 43, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/teuthology_firefly/teuthology/run_tasks.py", line 31, in run_one_task
    return fn(**kwargs)
  File "/home/teuthworker/src/teuthology_firefly/teuthology/task/sequential.py", line 48, in task
    mgr.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/teuthworker/src/teuthology_firefly/teuthology/task/mon_thrash.py", line 331, in task
    logger=log.getChild('ceph_manager'),
  File "/home/teuthworker/src/teuthology_firefly/teuthology/task/ceph_manager.py", line 422, in __init__
    self.pools[pool] = self.get_pool_property(pool, 'pg_num')
  File "/home/teuthworker/src/teuthology_firefly/teuthology/task/ceph_manager.py", line 788, in get_pool_property
    prop)
  File "/home/teuthworker/src/teuthology_firefly/teuthology/task/ceph_manager.py", line 438, in raw_cluster_cmd
    stdout=StringIO(),
  File "/home/teuthworker/src/teuthology_firefly/teuthology/orchestra/remote.py", line 106, in run
    r = self._runner(client=self.ssh, **kwargs)
  File "/home/teuthworker/src/teuthology_firefly/teuthology/orchestra/run.py", line 330, in run
    r.exitstatus = _check_status(r.exitstatus)
  File "/home/teuthworker/src/teuthology_firefly/teuthology/orchestra/run.py", line 326, in _check_status
    raise CommandFailedError(command=r.command, exitstatus=status, node=host)
CommandFailedError: Command failed on 10.214.132.29 with status 22: "adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get '' pg_num" 

Actions #14

Updated by Yuri Weinstein almost 10 years ago

How come we did not see these errors for a while, and now do in the brand-new upgrade/firefly?

Actions #15

Updated by Sage Weil almost 10 years ago

  • Status changed from Pending Backport to Resolved
  • Backport changed from firefly, dumpling to firefly

should be fixed now

Actions #16

Updated by Yuri Weinstein almost 10 years ago

I still see it on firefly:
http://pulpito.front.sepia.ceph.com/ubuntu-2014-07-15_21:01:54-upgrade:firefly-firefly-testing-basic-plana/363998/

"Command failed on 10.214.133.37 with status 22: "adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get '' pg_num""

Actions #18

Updated by Sage Weil almost 10 years ago

Ok, it installs dumpling, upgrades to v0.80.1, then runs radosgw. Is there a way to work around this bug (which I think is in 0.80.1), or do we need to remove this test from the upgrade tests?

Actions #19

Updated by Yuri Weinstein almost 10 years ago

Sage Weil wrote:

Ok, it installs dumpling, upgrades to v0.80.1, then runs radosgw. Is there a way to work around this bug (which I think is in 0.80.1), or do we need to remove this test from the upgrade tests?

Should I remove v0.80.1.yaml from ceph-qa-suite/suites/upgrade/firefly/1-install?

Actions #20

Updated by Sage Weil over 9 years ago

  • Status changed from New to Fix Under Review
  • Assignee changed from Yehuda Sadeh to Yuri Weinstein
Actions #21

Updated by Yuri Weinstein over 9 years ago

  • Status changed from Fix Under Review to 7
Actions #22

Updated by Sage Weil over 9 years ago

  • Status changed from 7 to Resolved