Bug #8021 (closed): osd: ENOENT on clone on dumpling

Added by Yuri Weinstein about 10 years ago. Updated almost 10 years ago.

Status: Duplicate
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-04-06_22:35:23-upgrade:dumpling-x:stress-split-firefly-distro-basic-vps/175609/

2014-04-07T08:17:12.102 ERROR:teuthology.run_tasks:Manager failed: thrashosds
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-firefly/teuthology/run_tasks.py", line 92, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/teuthworker/teuthology-firefly/teuthology/task/thrashosds.py", line 172, in task
    thrash_proc.do_join()
  File "/home/teuthworker/teuthology-firefly/teuthology/task/ceph_manager.py", line 153, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
Exception: timed out waiting for admin_socket to appear after osd.3 restart
2014-04-07T08:17:12.103 DEBUG:teuthology.run_tasks:Unwinding manager ceph.restart
2014-04-07T08:17:12.104 DEBUG:teuthology.run_tasks:Unwinding manager install.upgrade
2014-04-07T08:17:12.104 DEBUG:teuthology.run_tasks:Unwinding manager ceph
2014-04-07T08:17:12.104 DEBUG:teuthology.orchestra.run:Running [10.214.138.110]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph pg dump --format json'
2014-04-07T08:17:12.515 INFO:teuthology.orchestra.run.err:[10.214.138.110]: dumped all in format json
2014-04-07T08:17:13.538 INFO:teuthology.task.ceph:Scrubbing osd osd.0
2014-04-07T08:17:13.538 DEBUG:teuthology.orchestra.run:Running [10.214.138.110]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd scrub osd.0'
2014-04-07T08:17:13.926 INFO:teuthology.orchestra.run.err:[10.214.138.110]: osd.0 instructed to scrub
2014-04-07T08:17:13.931 INFO:teuthology.task.ceph:Scrubbing osd osd.1
2014-04-07T08:17:13.931 DEBUG:teuthology.orchestra.run:Running [10.214.138.110]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd scrub osd.1'
2014-04-07T08:17:14.282 INFO:teuthology.orchestra.run.err:[10.214.138.110]: osd.1 instructed to scrub
2014-04-07T08:17:14.293 INFO:teuthology.task.ceph:Scrubbing osd osd.2
2014-04-07T08:17:14.293 DEBUG:teuthology.orchestra.run:Running [10.214.138.110]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd scrub osd.2'
2014-04-07T08:17:14.473 INFO:teuthology.orchestra.run.err:[10.214.138.110]: osd.2 instructed to scrub
2014-04-07T08:17:14.479 INFO:teuthology.task.ceph:Scrubbing osd osd.3
2014-04-07T08:17:14.479 DEBUG:teuthology.orchestra.run:Running [10.214.138.110]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd scrub osd.3'
2014-04-07T08:17:14.649 INFO:teuthology.orchestra.run.err:[10.214.138.110]: Error EAGAIN: osd.3 is not up
2014-04-07T08:17:14.655 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-firefly/teuthology/contextutil.py", line 29, in nested
    yield vars
  File "/home/teuthworker/teuthology-firefly/teuthology/task/ceph.py", line 1458, in task
    osd_scrub_pgs(ctx, config)
  File "/home/teuthworker/teuthology-firefly/teuthology/task/ceph.py", line 1090, in osd_scrub_pgs
    'ceph', 'osd', 'scrub', role])
  File "/home/teuthworker/teuthology-firefly/teuthology/orchestra/remote.py", line 106, in run
    r = self._runner(client=self.ssh, **kwargs)
  File "/home/teuthworker/teuthology-firefly/teuthology/orchestra/run.py", line 330, in run
    r.exitstatus = _check_status(r.exitstatus)
  File "/home/teuthworker/teuthology-firefly/teuthology/orchestra/run.py", line 326, in _check_status
    raise CommandFailedError(command=r.command, exitstatus=status, node=host)
CommandFailedError: Command failed on 10.214.138.110 with status 11: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd scrub osd.3'
2014-04-07T08:17:14.656 INFO:teuthology.misc:Shutting down mds daemons...
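
The thrashosds failure above ("timed out waiting for admin_socket to appear after osd.3 restart") is teuthology's ceph_manager giving up on osd.3's admin socket after the restart. A minimal, hypothetical sketch of that kind of poll (not the actual ceph_manager code; the socket path, timeout, and function name below are assumptions):

# Illustration only: poll an OSD admin socket after a restart, roughly what
# precedes the "timed out waiting for admin_socket" exception above.
# The asok path, timeout, and function name are assumptions.
import subprocess
import time


def wait_for_admin_socket(osd_id, timeout=300, interval=5):
    # Poll `ceph --admin-daemon <sock> version` until the socket answers.
    sock = '/var/run/ceph/ceph-osd.%d.asok' % osd_id  # assumed default path
    deadline = time.time() + timeout
    while time.time() < deadline:
        proc = subprocess.Popen(
            ['ceph', '--admin-daemon', sock, 'version'],
            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        proc.communicate()
        if proc.returncode == 0:
            return
        time.sleep(interval)
    raise Exception(
        'timed out waiting for admin_socket to appear after osd.%d restart'
        % osd_id)


if __name__ == '__main__':
    wait_for_admin_socket(3)

The job YAML for run 175609 follows.
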
archive_path: /var/lib/teuthworker/archive/teuthology-2014-04-06_22:35:23-upgrade:dumpling-x:stress-split-firefly-distro-basic-vps/175609
description: upgrade/dumpling-x/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/snaps-few-objects.yaml
  6-next-mon/monb.yaml 7-workload/rbd_api.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml
  rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/debian_7.0.yaml}
email: null
job_id: '175609'
kernel: &id001
  kdb: true
  sha1: distro
last_in_suite: false
machine_type: vps
name: teuthology-2014-04-06_22:35:23-upgrade:dumpling-x:stress-split-firefly-distro-basic-vps
nuke-on-error: true
os_type: debian
os_version: '7.0'
overrides:
  admin_socket:
    branch: firefly
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    sha1: 4aef403dbc2ba3dd572d13c43b5192f04941dc07
  ceph-deploy:
    branch:
      dev: firefly
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 4aef403dbc2ba3dd572d13c43b5192f04941dc07
  s3tests:
    branch: master
  workunit:
    sha1: 4aef403dbc2ba3dd572d13c43b5192f04941dc07
owner: scheduled_teuthology@teuthology
roles:
- - mon.a
  - mon.b
  - mds.a
  - osd.0
  - osd.1
  - osd.2
- - osd.3
  - osd.4
  - osd.5
  - mon.c
- - client.0
targets:
  ubuntu@vpm028.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCzkbBqAhv++OhEi25eMniDRQ/lFDa0wUt7LSUvSByR/WEFJCI3k2gGrNtLatzup/KD5EG2rX3caBDgkvrHBT+VhxOaQmJ9ZLObPth/nOTlBadph+V3KEhk3tY2TKohQLxdJwHA9k2tw3eXNawGYMTkZVfDX0+VNG+dmJj54kgsDOLwJbxCl+cNXdTL03Hi/t77O1V7A+8bKi3ekDkvYyeCKW7Os805YcW71LE+CFHA0nuFCpuFDeaigyskSJHaFuTyXsMrPw1W6wYQIfp4gaee85h0Ck4za6kew4uQhnurIiZA9SThx3vIcJ4tgLw5kXP5jh8jyLA+r8VhVqzqbnHr
  ubuntu@vpm043.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDaAeOPM25456Wc8ABsVqIauazuzI7L5clljnl8Elt1/LnetDvTMXBb/1x1eWGSLcTUf7ry2MS77ML+iosNWfSMJxlo4DtdPwNt6omzrAjO34bZYx1V/uTTjZedRGadcPJcNcEl+ep7ewH+For/2pw1N6U5WkYsFcTkULPzmASSPzUFoghD84yAQSGDDjPBMGkIkd1xImUA04ZrbhD0MvyPAq2MfIk4w20oB6nDHr4D9X3L3Bk2VhEBbZGx07BkUVv6lNI1BGzTDPwEhZqjdFRSeEAwoOkqk6pCBjiDB4rcs8nEQ6zufPWnxHPLPzWosKJdSbceOFHvKCwjfi8OQvdV
  ubuntu@vpm048.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDbt7K61Yi+MIUBkvBqsPV91YKh35IrHZNlomafOKy4kdViKmg7KjkP1dvnF2xF1s8BOaV7fx+nMzWl5IQhgSWXqtuG+OnvXpTpRoynLeoIrcJUPZZGNBbtVpcav5lhtSXhQB/NEv/OHCKbLT8QN/3rLkg5M+r6KpgEjspui8moMXQrhMI69xHgqXqEVma5lo6CVQP0bbmYEUasOSxipygMCI6SxpuOIDMNhfF5MHP3z3eTqbQQyEQkMTeNCE3S9wS00OPqJtGTS7oXJIh8Y78qpk1dkuXz4kyfOaIOenynLT/lW4hwGRxmcivsyCYvOApTLYvzyhoby0jcrK3GKtfP
tasks:
- internal.lock_machines:
  - 3
  - vps
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: dumpling
- ceph:
    fs: xfs
- install.upgrade:
    osd.0: null
- ceph.restart:
    daemons:
    - osd.0
    - osd.1
    - osd.2
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    thrash_primary_affinity: false
    timeout: 1200
- ceph.restart:
    daemons:
    - mon.a
    wait-for-healthy: false
    wait-for-osds-up: true
- rados:
    clients:
    - client.0
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
- ceph.restart:
    daemons:
    - mon.b
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rbd/test_librbd.sh
- install.upgrade:
    mon.c: null
- ceph.restart:
    daemons:
    - mon.c
    wait-for-healthy: false
    wait-for-osds-up: true
- ceph.wait_for_mon_quorum:
  - a
  - b
  - c
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rados/test-upgrade-firefly.sh
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rbd/test_librbd_python.sh
- rgw:
    client.0:
      idle_timeout: 120
- swift:
    client.0:
      rgw_server: client.0
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
teuthology_branch: firefly
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.vps.17032
description: upgrade/dumpling-x/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/snaps-few-objects.yaml
  6-next-mon/monb.yaml 7-workload/rbd_api.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml
  rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/debian_7.0.yaml}
duration: 4277.9789888858795
failure_reason: timed out waiting for admin_socket to appear after osd.3 restart
flavor: basic
owner: scheduled_teuthology@teuthology
success: false

Related issues: 1 (0 open, 1 closed)

Is duplicate of Ceph - Bug #8162: osd: dumpling advances last_backfill prematurely (Resolved, Samuel Just, 04/19/2014)

#1

Updated by Sage Weil about 10 years ago

  • Subject changed from "Exception:" ... "after osd.3 restart" in upgrade:dumpling-x:stress-split-firefly-distro-basic-vps to osd: ENOENT on clone on dumpling
  • Priority changed from Normal to High
  • Source changed from other to Q/A

I think this has been fixed on firefly but not dumpling.

  -296> 2014-04-07 14:22:03.666754 7f7ef8048700  0 filestore(/var/lib/ceph/osd/ceph-3) ENOENT on clone suggests osd bug
    -1> 2014-04-07 14:22:03.666755 7f7ef8048700  0 filestore(/var/lib/ceph/osd/ceph-3)  transaction dump:
{ "ops": [
        { "op_num": 0,
          "op_name": "clone",
          "collection": "3.12_head",
          "src_oid": "abaf32\/vpm04315965-29\/head\/\/3",
          "dst_oid": "abaf32\/vpm04315965-29\/a5\/\/3"},
        { "op_num": 1,
          "op_name": "setattr",
          "collection": "3.12_head",
          "oid": "abaf32\/vpm04315965-29\/a5\/\/3",
          "name": "_",
          "length": 226},
        { "op_num": 2,
          "op_name": "rmattr",
          "collection": "3.12_head",
          "oid": "abaf32\/vpm04315965-29\/a5\/\/3",
          "name": "snapset"},
        { "op_num": 3,
          "op_name": "setattr",
          "collection": "3.12_head",
          "oid": "abaf32\/vpm04315965-29\/head\/\/3",
          "name": "_",
          "length": 233},
        { "op_num": 4,
          "op_name": "setattr",
          "collection": "3.12_head",
          "oid": "abaf32\/vpm04315965-29\/head\/\/3",
          "name": "snapset",
          "length": 99}]}
     0> 2014-04-07 14:22:03.897748 7f7ef8048700 -1 os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int)' thread 7f7ef8048700 time 2014-04-07 14:22:03.679077
os/FileStore.cc: 2906: FAILED assert(0 == "unexpected error")

 ceph version 0.67.7-56-gc66b61f (c66b61f9dcad217429e4876d27881d9fb2e7666f)
 1: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int)+0x94f) [0x8caf4f]
 2: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x69) [0x8ce419]
 3: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x184) [0x8ce5b4]
 4: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68a) [0x9cbe3a]
 5: (ThreadPool::WorkThread::entry()+0x10) [0x9cd090]
 6: (()+0x6b50) [0x7f7f02175b50]
 7: (clone()+0x6d) [0x7f7f00552a7d]
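
The dump above shows op_num 0 cloning the head of vpm04315965-29 to clone a5; clone returned ENOENT because the source object was not present in osd.3's filestore (consistent with #8162, where dumpling backfill advances last_backfill prematurely), so FileStore::_do_transaction treated it as an unexpected error and asserted. A toy replay of that op list (illustration only, not FileStore code; apply_ops and its data layout are hypothetical):

# Illustration only: replay the dumped transaction against a toy "object
# store" (a set of oids) to show where the ENOENT comes from. Not FileStore
# code; apply_ops and the data layout are hypothetical.
import errno


def apply_ops(present, ops):
    for op in ops:
        if op['op_name'] == 'clone':
            if op['src_oid'] not in present:
                # The situation FileStore::_do_transaction logs as
                # "ENOENT on clone suggests osd bug" before asserting.
                raise OSError(errno.ENOENT,
                              'clone source missing: %s' % op['src_oid'])
            present.add(op['dst_oid'])
        # setattr/rmattr steps elided; they also require the oid to exist.


# On osd.3 the head object is missing, so op_num 0 already fails:
present = set()
ops = [{'op_num': 0, 'op_name': 'clone',
        'collection': '3.12_head',
        'src_oid': 'abaf32/vpm04315965-29/head//3',
        'dst_oid': 'abaf32/vpm04315965-29/a5//3'}]
apply_ops(present, ops)  # raises OSError with errno.ENOENT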

#3

Updated by Sage Weil almost 10 years ago

  • Status changed from New to Duplicate