Bug #8315
osd: watch callback vs callback funky
Status: Closed
Description
Coredump from ceph-osd.5.log.gz in /a/teuthology-2014-05-07_19:15:07-upgrade:dumpling-dumpling-testing-basic-plana/242015/remote/ubuntu@plana47.front.sepia.ceph.com/log:
    -2> 2014-05-08 09:42:01.703766 7f9e458f8700  5 --OSD::tracker-- reqid: unknown.0.0:0, seq: 198, time: 2014-05-08 09:42:01.703766, event: reached_pg, request: MOSDPGPush(1.d 69 [PushOp(d0c18e1d/605.00000000/head//1, version: 5'1, data_included: [0~230], data_size: 230, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(d0c18e1d/605.00000000/head//1@5'1, copy_subset: [0~230], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:230, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false)),PushOp(d90270ad/mds_snaptable/head//1, version: 5'2, data_included: [0~46], data_size: 46, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(d90270ad/mds_snaptable/head//1@5'2, copy_subset: [0~46], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:46, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v1
    -1> 2014-05-08 09:42:01.703787 7f9e458f8700 10 osd.5 pg_epoch: 69 pg[1.d( v 5'2 lc 0'0 (0'0,5'2] local-les=69 n=2 ec=1 les/c 54/54 68/68/53) [1,5] r=1 lpr=69 pi=53-67/1 luod=0'0 active m=2] do_pending_flush doing pending flush
     0> 2014-05-08 09:42:01.711999 7f9e450f7700 -1 *** Caught signal (Aborted) **
 in thread 7f9e450f7700

 ceph version 0.67.8-5-g0e685c6 (0e685c68aa6a500aa7fa433cd9b8246f70c5383e)
 1: ceph-osd() [0x7fd33a]
 2: (()+0xfcb0) [0x7f9e59205cb0]
 3: (gsignal()+0x35) [0x7f9e5728b425]
 4: (abort()+0x17b) [0x7f9e5728eb8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f9e57bde69d]
 6: (()+0xb5846) [0x7f9e57bdc846]
 7: (()+0xb5873) [0x7f9e57bdc873]
 8: (()+0xb596e) [0x7f9e57bdc96e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1df) [0x8c6f5f]
 10: (Watch::get_delayed_cb()+0xc3) [0x6ddad3]
 11: (ReplicatedPG::handle_watch_timeout(std::tr1::shared_ptr<Watch>)+0x9e9) [0x5f0789]
 12: (ReplicatedPG::check_blacklisted_obc_watchers(ObjectContext*)+0x3ba) [0x5f0dfa]
 13: (ReplicatedPG::populate_obc_watchers(ObjectContext*)+0x60b) [0x5f16db]
 14: (ReplicatedPG::get_object_context(hobject_t const&, bool)+0x1c4) [0x5f2084]
 15: (ReplicatedPG::prep_object_replica_pushes(hobject_t const&, eversion_t, int, std::map<int, std::vector<PushOp, std::allocator<PushOp> >, std::less<int>, std::allocator<std::pair<int const, std::vector<PushOp, std::allocator<PushOp> > > > >*)+0x10e) [0x6022fe]
 16: (ReplicatedPG::recover_replicas(int, ThreadPool::TPHandle&)+0x657) [0x603847]
 17: (ReplicatedPG::start_recovery_ops(int, PG::RecoveryCtx*, ThreadPool::TPHandle&)+0x736) [0x621ff6]
 18: (OSD::do_recovery(PG*, ThreadPool::TPHandle&)+0x1b8) [0x68a578]
 19: (OSD::RecoveryWQ::_process(PG*, ThreadPool::TPHandle&)+0x11) [0x6c9ce1]
 20: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x8b7796]
 21: (ThreadPool::WorkThread::entry()+0x10) [0x8b95a0]
 22: (()+0x7e9a) [0x7f9e591fde9a]
 23: (clone()+0x6d) [0x7f9e573493fd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2014-05-08T09:43:23.799 ERROR:teuthology.misc:Saw exception from osd.5
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-dumpling/teuthology/misc.py", line 831, in stop_daemons_of_type
    daemon.stop()
  File "/home/teuthworker/teuthology-dumpling/teuthology/task/ceph.py", line 36, in stop
    run.wait([self.proc])
  File "/home/teuthworker/teuthology-dumpling/teuthology/orchestra/run.py", line 282, in wait
    proc.exitstatus.get()
  File "/usr/lib/python2.7/dist-packages/gevent/event.py", line 207, in get
    raise self._exception
CommandFailedError: Command failed on 10.214.132.31 with status 1: '/home/ubuntu/cephtest/adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage sudo /home/ubuntu/cephtest/daemon-helper kill ceph-osd -f -i 5'
archive_path: /var/lib/teuthworker/archive/teuthology-2014-05-07_19:15:07-upgrade:dumpling-dumpling-testing-basic-plana/242015
branch: dumpling
description: upgrade/dumpling/rbd/{0-cluster/start.yaml 1-dumpling-install/v0.67.4.yaml
  2-workload/rbd.yaml 3-upgrade-sequence/upgrade-mds-mon-osd.yaml 4-final/osdthrash.yaml}
email: null
job_id: '242015'
kernel: &id001
  kdb: true
  sha1: d35a5d13758c7a7aeb19dbd17e51b4c9a712ca2c
last_in_suite: false
machine_type: plana
name: teuthology-2014-05-07_19:15:07-upgrade:dumpling-dumpling-testing-basic-plana
nuke-on-error: true
os_type: ubuntu
overrides:
  admin_socket:
    branch: dumpling
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    fs: xfs
    log-whitelist:
    - slow request
    - scrub
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    sha1: 0e685c68aa6a500aa7fa433cd9b8246f70c5383e
  ceph-deploy:
    branch:
      dev: dumpling
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 0e685c68aa6a500aa7fa433cd9b8246f70c5383e
  s3tests:
    branch: dumpling
  workunit:
    sha1: 0e685c68aa6a500aa7fa433cd9b8246f70c5383e
owner: scheduled_teuthology@teuthology
roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mon.c
  - osd.3
  - osd.4
  - osd.5
  - client.0
suite: upgrade:dumpling
targets:
  ubuntu@plana47.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDIDs+Sl9mfdFX0VZy41zfHkMJlD1OtErcQsSPLq3wrQ5HPPflg8IRtS45LHsYBAyBcAl4fmeG1q45jOrHj6yibJhT/HzxO6TOikYKUgr3uWy804NsTAP/A2SJ4wA8Y+1WyiAjBs0EAnISCnIl5Arj17NYPBroDeclKqZEJpL5UbPCmMibqEmjIPNVNulukox/Dh9GJZWlxqD0aKfTRnIOM+6RWHv2h9mCYVtU2ZbMx+QR0sxji5IJzg4ITlGaR9g+U3DnOU95pkxb3x6mE+H8udB58FONty/DHaAC/rXGfjkg056R3iq8FJUpF3HKKjILLRHjLb3ueK3tN1zrEueCt
  ubuntu@plana49.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC8zmyBYSlUYcY+7hO2WPa2uECTc3U4k/Sm7DY/YZAEtc0Z8i1FpvFa1MA1PV3g2DAzo+E6JIBFzbFSrgXeDsEMFl1Qy2jdrvnq66veVOEyDk4y7I4wseE/eTSnhTQE720cnWq1DuVyFgcyRvYGJo3HHoFTgxLwvPYdhfLPq8elBNJmuOllsqCgsmX5rwqZmsQcP29aus/1s3IVF1TEHYmJLTwRb8qW6W3r55C2ZbSwrvvNXb7zFyydCgAfPC+8dfqiMfp3T7Gc56HUHAWvXU9EnxDINHQmLhrdccFyq0BVd+i2m1ubuXULK2RMImFojhEnkLTFH2Qa5hSegvLLffab
tasks:
- internal.lock_machines:
  - 2
  - plana
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    tag: v0.67.4
- ceph: null
- parallel:
  - workload
  - upgrade-sequence
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- workunit:
    clients:
      client.0:
      - rbd/test_lock_fence.sh
teuthology_branch: dumpling
upgrade-sequence:
  sequential:
  - install.upgrade:
      all:
        branch: dumpling
  - ceph.restart:
    - mds.a
  - sleep:
      duration: 60
  - ceph.restart:
    - mon.a
  - sleep:
      duration: 60
  - ceph.restart:
    - mon.b
  - sleep:
      duration: 60
  - ceph.restart:
    - mon.c
  - sleep:
      duration: 60
  - ceph.restart:
    - osd.0
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.1
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.2
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.3
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.4
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.5
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.plana.19218
workload:
  sequential:
  - workunit:
      clients:
        client.0:
        - rbd/import_export.sh
      env:
        RBD_CREATE_ARGS: --new-format
  - workunit:
      clients:
        client.0:
        - cls/test_cls_rbd.sh

description: upgrade/dumpling/rbd/{0-cluster/start.yaml 1-dumpling-install/v0.67.4.yaml
  2-workload/rbd.yaml 3-upgrade-sequence/upgrade-mds-mon-osd.yaml 4-final/osdthrash.yaml}
duration: 818.8523149490356
failure_reason: 'Command failed on 10.214.132.31 with status 1: ''/home/ubuntu/cephtest/adjust-ulimits
  ceph-coverage /home/ubuntu/cephtest/archive/coverage sudo /home/ubuntu/cephtest/daemon-helper
  kill ceph-osd -f -i 5'''
flavor: basic
mon.a-kernel-sha1: d35a5d13758c7a7aeb19dbd17e51b4c9a712ca2c
mon.b-kernel-sha1: d35a5d13758c7a7aeb19dbd17e51b4c9a712ca2c
owner: scheduled_teuthology@teuthology
success: false
Updated by Yuri Weinstein almost 10 years ago
I could not reproduce it on a re-run; logs are in /a/teuthology-2014-05-07_19:15:07-upgrade:dumpling-dumpling-testing-basic-plana/242015/
Updated by Samuel Just almost 10 years ago
- Priority changed from Normal to Urgent
Updated by Samuel Just almost 10 years ago
Tricky... check_blacklisted_obc_watchers does not clear watch->cb before calling handle_watch_timeout. Watch.cc suggests that watch->cb should be cleared (because it is currently in the process of being invoked) before handle_watch_timeout (which might replace cb with a DelayedWatchTimeout) is called. Seems like check_blacklisted_obc_watchers needs to "cancel", in some sense, the scheduled Handle[Delayed]WatchTimeout currently occupying cb.
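To make the suspected ordering problem concrete, here is a minimal, self-contained C++ sketch of the invariant the backtrace points at. The types and helpers are simplified stand-ins, not the actual Ceph classes or signatures: it only mirrors the shape of the bug, namely that Watch::get_delayed_cb() asserts no callback is currently registered, so anything that forces the timeout path (as check_blacklisted_obc_watchers does via handle_watch_timeout) must first cancel the callback already occupying cb.

// Illustrative sketch only -- simplified stand-ins, not the real Ceph code.
#include <cassert>
#include <cstdio>

struct Context {                  // stand-in for Ceph's Context
  virtual ~Context() {}
};
struct HandleDelayedWatchTimeout : Context {};

class Watch {
  Context *cb;                    // pending timeout callback, if any
public:
  Watch() : cb(0) {}
  void register_cb(Context *c) { assert(!cb); cb = c; }
  Context *get_delayed_cb() {
    assert(!cb);                  // analogue of the assert that fired in osd.5
    cb = new HandleDelayedWatchTimeout;
    return cb;
  }
  // Real code would also have to cancel the pending timer event.
  void cancel_cb() { delete cb; cb = 0; }
  bool has_cb() const { return cb != 0; }
};

// Timeout path: may re-arm the watch with a delayed callback.
void handle_watch_timeout(Watch &w) {
  w.get_delayed_cb();             // aborts if the old cb was never cleared
}

int main() {
  Watch w;
  w.register_cb(new HandleDelayedWatchTimeout); // a timeout is already scheduled

  // Buggy order (what the crash shows): forcing the timeout without
  // cancelling the scheduled callback trips the assert in get_delayed_cb():
  //   handle_watch_timeout(w);   // would abort here

  // Order suggested above: cancel the occupying callback, then time out.
  w.cancel_cb();
  handle_watch_timeout(w);
  std::printf("delayed cb re-armed: %d\n", w.has_cb());
  w.cancel_cb();
  return 0;
}

Under these assumptions, the fix described above amounts to performing the cancel step before handle_watch_timeout runs, instead of leaving the stale callback in place for get_delayed_cb() to trip over.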
Updated by Samuel Just almost 10 years ago
Also, the bug manifested in dumpling, but appears to exist in the same form in master as well.
Updated by Sage Weil almost 10 years ago
- Subject changed from osd.5 crashed in upgrade:dumpling-dumpling-testing-basic-plana suite to osd: watch callback vs callback funky
- Status changed from New to 12
Updated by Samuel Just over 9 years ago
- Status changed from 7 to Fix Under Review
Updated by Sage Weil over 9 years ago
- Status changed from Fix Under Review to Pending Backport