Bug #8021
osd: ENOENT on clone on dumpling
Status: Duplicate
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags: -
Backport: -
Regression: -
Severity: 3 - minor
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Pull request ID: -
Crash signature (v1): -
Crash signature (v2): -
Description
2014-04-07T08:17:12.102 ERROR:teuthology.run_tasks:Manager failed: thrashosds
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-firefly/teuthology/run_tasks.py", line 92, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/teuthworker/teuthology-firefly/teuthology/task/thrashosds.py", line 172, in task
    thrash_proc.do_join()
  File "/home/teuthworker/teuthology-firefly/teuthology/task/ceph_manager.py", line 153, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
Exception: timed out waiting for admin_socket to appear after osd.3 restart
2014-04-07T08:17:12.103 DEBUG:teuthology.run_tasks:Unwinding manager ceph.restart
2014-04-07T08:17:12.104 DEBUG:teuthology.run_tasks:Unwinding manager install.upgrade
2014-04-07T08:17:12.104 DEBUG:teuthology.run_tasks:Unwinding manager ceph
2014-04-07T08:17:12.104 DEBUG:teuthology.orchestra.run:Running [10.214.138.110]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph pg dump --format json'
2014-04-07T08:17:12.515 INFO:teuthology.orchestra.run.err:[10.214.138.110]: dumped all in format json
2014-04-07T08:17:13.538 INFO:teuthology.task.ceph:Scrubbing osd osd.0
2014-04-07T08:17:13.538 DEBUG:teuthology.orchestra.run:Running [10.214.138.110]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd scrub osd.0'
2014-04-07T08:17:13.926 INFO:teuthology.orchestra.run.err:[10.214.138.110]: osd.0 instructed to scrub
2014-04-07T08:17:13.931 INFO:teuthology.task.ceph:Scrubbing osd osd.1
2014-04-07T08:17:13.931 DEBUG:teuthology.orchestra.run:Running [10.214.138.110]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd scrub osd.1'
2014-04-07T08:17:14.282 INFO:teuthology.orchestra.run.err:[10.214.138.110]: osd.1 instructed to scrub
2014-04-07T08:17:14.293 INFO:teuthology.task.ceph:Scrubbing osd osd.2
2014-04-07T08:17:14.293 DEBUG:teuthology.orchestra.run:Running [10.214.138.110]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd scrub osd.2'
2014-04-07T08:17:14.473 INFO:teuthology.orchestra.run.err:[10.214.138.110]: osd.2 instructed to scrub
2014-04-07T08:17:14.479 INFO:teuthology.task.ceph:Scrubbing osd osd.3
2014-04-07T08:17:14.479 DEBUG:teuthology.orchestra.run:Running [10.214.138.110]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd scrub osd.3'
2014-04-07T08:17:14.649 INFO:teuthology.orchestra.run.err:[10.214.138.110]: Error EAGAIN: osd.3 is not up
2014-04-07T08:17:14.655 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-firefly/teuthology/contextutil.py", line 29, in nested
    yield vars
  File "/home/teuthworker/teuthology-firefly/teuthology/task/ceph.py", line 1458, in task
    osd_scrub_pgs(ctx, config)
  File "/home/teuthworker/teuthology-firefly/teuthology/task/ceph.py", line 1090, in osd_scrub_pgs
    'ceph', 'osd', 'scrub', role])
  File "/home/teuthworker/teuthology-firefly/teuthology/orchestra/remote.py", line 106, in run
    r = self._runner(client=self.ssh, **kwargs)
  File "/home/teuthworker/teuthology-firefly/teuthology/orchestra/run.py", line 330, in run
    r.exitstatus = _check_status(r.exitstatus)
  File "/home/teuthworker/teuthology-firefly/teuthology/orchestra/run.py", line 326, in _check_status
    raise CommandFailedError(command=r.command, exitstatus=status, node=host)
CommandFailedError: Command failed on 10.214.138.110 with status 11: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd scrub osd.3'
2014-04-07T08:17:14.656 INFO:teuthology.misc:Shutting down mds daemons...
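The primary failure is the thrashosds exception: after restarting osd.3, teuthology polled for the daemon's admin socket and it never appeared (the FileStore assert quoted in the first update below is why the OSD never came back up). A minimal Python sketch of that kind of wait loop, assuming a hypothetical helper name and the stock /var/run/ceph socket path rather than the actual teuthology ceph_manager code:

import os
import time

def wait_for_admin_socket(osd_id, timeout=300):
    # Hypothetical sketch: poll for an OSD's admin socket after a restart.
    # The real logic lives in teuthology's task/ceph_manager.py.
    path = '/var/run/ceph/ceph-osd.%s.asok' % osd_id
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(path):
            return path  # daemon is up and its admin socket is listening
        time.sleep(1)
    # If the daemon asserts on startup (as osd.3 does here), the socket
    # never appears and the poll ends in exactly this exception.
    raise Exception('timed out waiting for admin_socket to appear '
                    'after osd.%s restart' % osd_id)

The job configuration that produced the failure follows.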
archive_path: /var/lib/teuthworker/archive/teuthology-2014-04-06_22:35:23-upgrade:dumpling-x:stress-split-firefly-distro-basic-vps/175609
description: upgrade/dumpling-x/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml 2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/snaps-few-objects.yaml 6-next-mon/monb.yaml 7-workload/rbd_api.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/debian_7.0.yaml}
email: null
job_id: '175609'
kernel: &id001
  kdb: true
  sha1: distro
last_in_suite: false
machine_type: vps
name: teuthology-2014-04-06_22:35:23-upgrade:dumpling-x:stress-split-firefly-distro-basic-vps
nuke-on-error: true
os_type: debian
os_version: '7.0'
overrides:
  admin_socket:
    branch: firefly
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    sha1: 4aef403dbc2ba3dd572d13c43b5192f04941dc07
  ceph-deploy:
    branch:
      dev: firefly
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 4aef403dbc2ba3dd572d13c43b5192f04941dc07
  s3tests:
    branch: master
  workunit:
    sha1: 4aef403dbc2ba3dd572d13c43b5192f04941dc07
owner: scheduled_teuthology@teuthology
roles:
- - mon.a
  - mon.b
  - mds.a
  - osd.0
  - osd.1
  - osd.2
- - osd.3
  - osd.4
  - osd.5
  - mon.c
- - client.0
targets:
  ubuntu@vpm028.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCzkbBqAhv++OhEi25eMniDRQ/lFDa0wUt7LSUvSByR/WEFJCI3k2gGrNtLatzup/KD5EG2rX3caBDgkvrHBT+VhxOaQmJ9ZLObPth/nOTlBadph+V3KEhk3tY2TKohQLxdJwHA9k2tw3eXNawGYMTkZVfDX0+VNG+dmJj54kgsDOLwJbxCl+cNXdTL03Hi/t77O1V7A+8bKi3ekDkvYyeCKW7Os805YcW71LE+CFHA0nuFCpuFDeaigyskSJHaFuTyXsMrPw1W6wYQIfp4gaee85h0Ck4za6kew4uQhnurIiZA9SThx3vIcJ4tgLw5kXP5jh8jyLA+r8VhVqzqbnHr
  ubuntu@vpm043.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDaAeOPM25456Wc8ABsVqIauazuzI7L5clljnl8Elt1/LnetDvTMXBb/1x1eWGSLcTUf7ry2MS77ML+iosNWfSMJxlo4DtdPwNt6omzrAjO34bZYx1V/uTTjZedRGadcPJcNcEl+ep7ewH+For/2pw1N6U5WkYsFcTkULPzmASSPzUFoghD84yAQSGDDjPBMGkIkd1xImUA04ZrbhD0MvyPAq2MfIk4w20oB6nDHr4D9X3L3Bk2VhEBbZGx07BkUVv6lNI1BGzTDPwEhZqjdFRSeEAwoOkqk6pCBjiDB4rcs8nEQ6zufPWnxHPLPzWosKJdSbceOFHvKCwjfi8OQvdV
  ubuntu@vpm048.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDbt7K61Yi+MIUBkvBqsPV91YKh35IrHZNlomafOKy4kdViKmg7KjkP1dvnF2xF1s8BOaV7fx+nMzWl5IQhgSWXqtuG+OnvXpTpRoynLeoIrcJUPZZGNBbtVpcav5lhtSXhQB/NEv/OHCKbLT8QN/3rLkg5M+r6KpgEjspui8moMXQrhMI69xHgqXqEVma5lo6CVQP0bbmYEUasOSxipygMCI6SxpuOIDMNhfF5MHP3z3eTqbQQyEQkMTeNCE3S9wS00OPqJtGTS7oXJIh8Y78qpk1dkuXz4kyfOaIOenynLT/lW4hwGRxmcivsyCYvOApTLYvzyhoby0jcrK3GKtfP
tasks:
- internal.lock_machines:
  - 3
  - vps
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: dumpling
- ceph:
    fs: xfs
- install.upgrade:
    osd.0: null
- ceph.restart:
    daemons:
    - osd.0
    - osd.1
    - osd.2
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    thrash_primary_affinity: false
    timeout: 1200
- ceph.restart:
    daemons:
    - mon.a
    wait-for-healthy: false
    wait-for-osds-up: true
- rados:
    clients:
    - client.0
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
- ceph.restart:
    daemons:
    - mon.b
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rbd/test_librbd.sh
- install.upgrade:
    mon.c: null
- ceph.restart:
    daemons:
    - mon.c
    wait-for-healthy: false
    wait-for-osds-up: true
- ceph.wait_for_mon_quorum:
  - a
  - b
  - c
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rados/test-upgrade-firefly.sh
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rbd/test_librbd_python.sh
- rgw:
    client.0:
      idle_timeout: 120
- swift:
    client.0:
      rgw_server: client.0
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
teuthology_branch: firefly
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.vps.17032
description: upgrade/dumpling-x/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml 2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/snaps-few-objects.yaml 6-next-mon/monb.yaml 7-workload/rbd_api.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/debian_7.0.yaml}
duration: 4277.9789888858795
failure_reason: timed out waiting for admin_socket to appear after osd.3 restart
flavor: basic
owner: scheduled_teuthology@teuthology
success: false
Updated by Sage Weil about 10 years ago
- Subject changed from Exception: "timed out waiting for admin_socket to appear after osd.3 restart" in upgrade:dumpling-x:stress-split-firefly-distro-basic-vps to osd: ENOENT on clone on dumpling
- Priority changed from Normal to High
- Source changed from other to Q/A
I think this has been fixed on firefly but not dumpling.
  -296> 2014-04-07 14:22:03.666754 7f7ef8048700  0 filestore(/var/lib/ceph/osd/ceph-3) ENOENT on clone suggests osd bug
    -1> 2014-04-07 14:22:03.666755 7f7ef8048700  0 filestore(/var/lib/ceph/osd/ceph-3) transaction dump:
{ "ops": [
        { "op_num": 0,
          "op_name": "clone",
          "collection": "3.12_head",
          "src_oid": "abaf32\/vpm04315965-29\/head\/\/3",
          "dst_oid": "abaf32\/vpm04315965-29\/a5\/\/3"},
        { "op_num": 1,
          "op_name": "setattr",
          "collection": "3.12_head",
          "oid": "abaf32\/vpm04315965-29\/a5\/\/3",
          "name": "_",
          "length": 226},
        { "op_num": 2,
          "op_name": "rmattr",
          "collection": "3.12_head",
          "oid": "abaf32\/vpm04315965-29\/a5\/\/3",
          "name": "snapset"},
        { "op_num": 3,
          "op_name": "setattr",
          "collection": "3.12_head",
          "oid": "abaf32\/vpm04315965-29\/head\/\/3",
          "name": "_",
          "length": 233},
        { "op_num": 4,
          "op_name": "setattr",
          "collection": "3.12_head",
          "oid": "abaf32\/vpm04315965-29\/head\/\/3",
          "name": "snapset",
          "length": 99}]}
     0> 2014-04-07 14:22:03.897748 7f7ef8048700 -1 os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int)' thread 7f7ef8048700 time 2014-04-07 14:22:03.679077
os/FileStore.cc: 2906: FAILED assert(0 == "unexpected error")
 ceph version 0.67.7-56-gc66b61f (c66b61f9dcad217429e4876d27881d9fb2e7666f)
 1: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int)+0x94f) [0x8caf4f]
 2: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x69) [0x8ce419]
 3: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x184) [0x8ce5b4]
 4: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68a) [0x9cbe3a]
 5: (ThreadPool::WorkThread::entry()+0x10) [0x9cd090]
 6: (()+0x6b50) [0x7f7f02175b50]
 7: (clone()+0x6d) [0x7f7f00552a7d]
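The failing op is op_num 0: a FileStore-level clone of the head object into the new snapshot clone (a5), which hit ENOENT because the source object was not on disk when the transaction was applied. The OSD generates this kind of transaction when a client overwrites an object covered by an existing snapshot; the snaps-few-objects / snaps-many-objects rados tasks in the job above exercise exactly that pattern. As a rough client-side illustration with the librados Python bindings (pool and object names are made up, and on a healthy cluster this sequence succeeds; it only shows the write pattern that makes the OSD emit a clone op):

import rados

# Connect with default config/keyring; 'rbd' is an illustrative pool name.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')

ioctx.write_full('vpm04315965-29', b'contents before snap')
ioctx.create_snap('snap-a5')  # take a pool snapshot
ioctx.write_full('vpm04315965-29', b'contents after snap')
# The overwrite forces the OSD to preserve the pre-snapshot data by
# cloning head -> clone inside a single filestore transaction; that
# clone is the op that returned ENOENT in the dump above.

ioctx.close()
cluster.shutdown()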
Updated by Yuri Weinstein about 10 years ago
I think I see it on firefly too, if it's the same issue.
See logs here:
http://qa-proxy.ceph.com/teuthology/teuthology-2014-04-09_22:35:02-upgrade:dumpling-x:stress-split-firefly-distro-basic-vps/182710/
Updated by Sage Weil almost 10 years ago
- Status changed from New to Duplicate