Bug #9788
closed
"Assertion: common/HeartbeatMap.cc: 79" placeholder for "hit suicide timeout" issues
Description
Error from 'scrap':
Assertion: common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
ceph version 0.67.11-22-gddc8a82 (ddc8a827d1baabc0bcb1df9ded37edc9820d8cac)
 1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x107) [0x816bb7]
 2: (ceph::HeartbeatMap::reset_timeout(ceph::heartbeat_handle_d*, long, long)+0x8e) [0x81705e]
 3: (ThreadPool::worker(ThreadPool::WorkThread*)+0x471) [0x8b6ae1]
 4: (ThreadPool::WorkThread::entry()+0x10) [0x8b8b70]
 5: (()+0x7e9a) [0x7f8b876b5e9a]
 6: (clone()+0x6d) [0x7f8b859a53fd]
['546345']
2014-10-15T00:16:27.453 ERROR:teuthology.run_tasks:Manager failed: radosbench
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 117, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "/var/lib/teuthworker/src/ceph-qa-suite_giant/tasks/radosbench.py", line 92, in task
    run.wait(radosbench.itervalues(), timeout=timeout)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 381, in wait
    check_time()
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 127, in __call__
    raise MaxWhileTries(error_msg)
MaxWhileTries: reached maximum tries (1500) after waiting for 9000 seconds
archive_path: /var/lib/teuthworker/archive/teuthology-2014-10-13_19:30:01-upgrade:dumpling-firefly-x:stress-split-giant-distro-basic-multi/546345
branch: giant
description: upgrade:dumpling-firefly-x:stress-split/{00-cluster/start.yaml 01-dumpling-install/dumpling.yaml 02-partial-upgrade-firefly/firsthalf.yaml 03-thrash/default.yaml 04-mona-upgrade-firefly/mona.yaml 05-workload/rbd-cls.yaml 06-monb-upgrade-firefly/monb.yaml 07-workload/radosbench.yaml 08-monc-upgrade-firefly/monc.yaml 09-workload/{rbd-python.yaml rgw-s3tests.yaml} 10-osds-upgrade-firefly/secondhalf.yaml 11-workload/snaps-few-objects.yaml 12-partial-upgrade-x/first.yaml 13-thrash/default.yaml 14-mona-upgrade-x/mona.yaml 15-workload/rbd-import-export.yaml 16-monb-upgrade-x/monb.yaml 17-workload/readwrite.yaml 18-monc-upgrade-x/monc.yaml 19-workload/radosbench.yaml 20-osds-upgrade-x/osds_secondhalf.yaml 21-final-workload/rados_stress_watch.yaml distros/ubuntu_12.04.yaml}
email: ceph-qa@ceph.com
job_id: '546345'
kernel: &id001
  kdb: true
  sha1: distro
last_in_suite: false
machine_type: plana,burnupi,mira
name: teuthology-2014-10-13_19:30:01-upgrade:dumpling-firefly-x:stress-split-giant-distro-basic-multi
nuke-on-error: true
os_type: ubuntu
os_version: '12.04'
overrides:
  admin_socket:
    branch: giant
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    sha1: 674781960b8856ae684520c3b0e9a6b8c2bc7bec
  ceph-deploy:
    branch:
      dev: giant
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 674781960b8856ae684520c3b0e9a6b8c2bc7bec
  s3tests:
    branch: giant
  workunit:
    sha1: 674781960b8856ae684520c3b0e9a6b8c2bc7bec
owner: scheduled_teuthology@teuthology
priority: 1000
roles:
- - mon.a
  - mon.b
  - mds.a
  - osd.0
  - osd.1
  - osd.2
  - mon.c
- - osd.3
  - osd.4
  - osd.5
- - client.0
suite: upgrade:dumpling-firefly-x:stress-split
suite_branch: giant
suite_path: /var/lib/teuthworker/src/ceph-qa-suite_giant
targets:
  ubuntu@mira076.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCnEuRqBgd2DrRVNhCSPfldQUUJ4HeQKbPVLUtbC8wNRrv2Nk9ZuVUn5cb1LBJQreJM/p17q4fIO8bZyApZ6RZu+Q9pW70WIE3U+Z6xtINgi9xq6/mqnMauuqkDYiePhR9CDCbVVfBp/zVDOJVeCdV9TG5AZ0Xt2YciQkaVmmvxdRr4v5zhdw6vDumnfZsI5K+J0p2hII8e2HUrUkMTVKO0mu1rXzIqGQFOSArPTfCLAOgQfUG5s/e6QMC4NI+BOy2cVp/8yCzKv6FPDDvdEknmLh9tQ9HbS8SyOGPtdj9wfoIKo7UbOnJiDSu2KOliyljEB3YUTrzNClM7W/pWpobV
  ubuntu@plana15.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDA0GHi/HAXxKVnAdyh6NHBoqEq2Qk7z6Hb3SFt5+mljUWThTAkVPTf4QdpSshH/D+5v4VJHXp7lHYhZZJCS50z3w+af8cmREqwUgnA0zEjKKaXaVIdkAfDkh7LH3vllIGah3PlMPKF6njfvuocJ1pr1QneCLTmbHVCYsdWTGgRW7te1fn7vhXDJbGZMumHL5k/HO7iRDaw9cNuozWuqI5/d8UwdvQ/rhhbSKNef3w2hh2C4CU/nCkOGXFVyJZdo2pSJ2k/jBcPWSh+V3qNtIpthDqzTDmmpD8BFdW9MXxO5pfFRDsInWdgTsxZOrWtPuQy9+an20KbU2N5F4JoQX6N
  ubuntu@plana78.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC8m+86JHGSyRkSWj9p/K6JUbRcPjB7TtLZ9OBudXAGZNgReiOJoCU5kkpwejl0uXXCOHe/DB/bH81JCQbqY3XCJjU5JZ1wBsL/owaErPSfbbaouNV2k1FQjiSXYtPzx+qwEOeOZtEBPQ4p04npai6NzPLX43OGx/UiAwpyEGfVxZedmci0VBtC7QdCQkP3sNJqSxFYdoVGjU5jv6BarPqV8LM4v00f8TmD1GdP51bfLGSKii6UU1IKXXR78ifb+9QUX4p/Clkl6Qgz8CJ70Iu+mcBZclJaGoAyuoKBhXE2oi2W1cQVquPqloxbN+VbbjoOL5OHbGg2euxyohZhgJaF
tasks:
- internal.lock_machines:
  - 3
  - plana,burnupi,mira
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.push_inventory: null
- internal.serialize_remote_roles: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: dumpling
- ceph:
    fs: xfs
- install.upgrade:
    osd.0:
      branch: firefly
- ceph.restart:
    daemons:
    - osd.0
    - osd.1
    - osd.2
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    thrash_primary_affinity: false
    timeout: 1200
- ceph.restart:
    daemons:
    - mon.a
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    branch: dumpling
    clients:
      client.0:
      - cls/test_cls_rbd.sh
- ceph.restart:
    daemons:
    - mon.b
    wait-for-healthy: false
    wait-for-osds-up: true
- radosbench:
    clients:
    - client.0
    time: 1800
- install.upgrade:
    mon.c: null
- ceph.restart:
    daemons:
    - mon.c
    wait-for-healthy: false
    wait-for-osds-up: true
- ceph.wait_for_mon_quorum:
  - a
  - b
  - c
- workunit:
    clients:
      client.0:
      - rbd/test_librbd_python.sh
- rgw:
    client.0: null
    default_idle_timeout: 300
- s3tests:
    client.0:
      rgw_server: client.0
- install.upgrade:
    osd.3:
      branch: firefly
- ceph.restart:
    daemons:
    - osd.3
    - osd.4
    - osd.5
- rados:
    clients:
    - client.0
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
- install.upgrade:
    osd.0: null
- ceph.restart:
    daemons:
    - osd.0
    - osd.1
    - osd.2
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    thrash_primary_affinity: false
    timeout: 1200
- ceph.restart:
    daemons:
    - mon.a
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    clients:
      client.0:
      - rbd/import_export.sh
    env:
      RBD_CREATE_ARGS: --new-format
- ceph.restart:
    daemons:
    - mon.b
    wait-for-healthy: false
    wait-for-osds-up: true
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 10
      read: 45
      write: 45
    ops: 4000
- ceph.restart:
    daemons:
    - mon.c
    wait-for-healthy: false
    wait-for-osds-up: true
- ceph.wait_for_mon_quorum:
  - a
  - b
  - c
- radosbench:
    clients:
    - client.0
    time: 1800
- install.upgrade:
    osd.3: null
- ceph.restart:
    daemons:
    - osd.3
    - osd.4
    - osd.5
- workunit:
    clients:
      client.0:
      - rados/stress_watch.sh
teuthology_branch: master
tube: multi
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.multi.3124
description: upgrade:dumpling-firefly-x:stress-split/{00-cluster/start.yaml 01-dumpling-install/dumpling.yaml 02-partial-upgrade-firefly/firsthalf.yaml 03-thrash/default.yaml 04-mona-upgrade-firefly/mona.yaml 05-workload/rbd-cls.yaml 06-monb-upgrade-firefly/monb.yaml 07-workload/radosbench.yaml 08-monc-upgrade-firefly/monc.yaml 09-workload/{rbd-python.yaml rgw-s3tests.yaml} 10-osds-upgrade-firefly/secondhalf.yaml 11-workload/snaps-few-objects.yaml 12-partial-upgrade-x/first.yaml 13-thrash/default.yaml 14-mona-upgrade-x/mona.yaml 15-workload/rbd-import-export.yaml 16-monb-upgrade-x/monb.yaml 17-workload/readwrite.yaml 18-monc-upgrade-x/monc.yaml 19-workload/radosbench.yaml 20-osds-upgrade-x/osds_secondhalf.yaml 21-final-workload/rados_stress_watch.yaml distros/ubuntu_12.04.yaml}
duration: 20570.046264886856
failure_reason: 'Command failed on plana78 with status 124: ''mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=674781960b8856ae684520c3b0e9a6b8c2bc7bec TESTDIR="/home/ubuntu/cephtest" CEPH_ID="0" adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/workunit.client.0/rbd/test_librbd_python.sh'''
flavor: basic
owner: scheduled_teuthology@teuthology
success: false
Updated by Yuri Weinstein over 9 years ago
suite:upgrade:dumpling
run: http://pulpito.front.sepia.ceph.com/teuthology-2014-10-14_17:00:01-upgrade:dumpling-dumpling-distro-basic-vps/
Assertion: common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
ceph version 0.67.11-22-gddc8a82 (ddc8a827d1baabc0bcb1df9ded37edc9820d8cac)
 1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x107) [0x816bb7]
 2: (ceph::HeartbeatMap::is_healthy()+0xa7) [0x817567]
 3: (ceph::HeartbeatMap::check_touch_file()+0x23) [0x817b13]
 4: (CephContextServiceThread::entry()+0x55) [0x8d3e65]
 5: (()+0x7e9a) [0x7fa3343cde9a]
 6: (clone()+0x6d) [0x7fa3326d631d]
['548219']
Updated by Samuel Just over 9 years ago
- Status changed from New to Rejected
Two osds, both on mira076, timed out:
osd5: a stat in the op_tp took 3 minutes (completed, surprisingly, right before the suicide)
2014-10-14 19:12:42.233398 - 2014-10-14 19:15:17.213734
osd3: a flush took 3 minutes (also, weirdly, completed right before the suicide)
2014-10-14 19:12:33.548599 - 2014-10-14 19:15:17.352885
I think it's safe to blame this on something environmental.
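For readers landing on this ticket later: the "completed right before the suicide" pattern in both timelines is what you'd expect from how a worker thread self-reports. The thread only resets its heartbeat timeout between work items (the `reset_timeout` frame in the backtraces above), so the check that fires the assert runs immediately after a slow item finally returns. A simplified sketch of that post-item check, not the actual Ceph code (names and the handle layout are illustrative):

```cpp
#include <ctime>

// Simplified per-thread heartbeat handle, modeled loosely on
// ceph::heartbeat_handle_d. Fields and values here are illustrative.
struct Handle {
    time_t suicide_deadline = 0;  // absolute time the thread must check in by; 0 = unarmed
};

// Mirrors the shape of HeartbeatMap::reset_timeout(): a worker calls this
// between work items, so the check only runs once the slow item has already
// completed. Returns false where the real code would
// assert(0 == "hit suicide timeout").
bool reset_timeout(Handle& h, time_t now, time_t suicide_grace) {
    if (h.suicide_deadline != 0 && now > h.suicide_deadline)
        return false;  // the previous item blew the grace period
    h.suicide_deadline = now + suicide_grace;  // arm the deadline for the next item
    return true;
}
```

In the osd3/osd5 timelines above, a roughly 3-minute stat/flush against a shorter suicide grace fails exactly this post-item check, which is why both operations appear to complete right before the suicide.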
Updated by Yuri Weinstein over 9 years ago
- Status changed from Rejected to New
suite:upgrade:firefly-x
next
Same issue in two jobs ['584644', '584647']:
Assertion: common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
ceph version 0.80.7-82-g1a9d000 (1a9d000bb679a7392b9dd115373c3827c9626694)
 1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x12b) [0xa116ab]
 2: (ceph::HeartbeatMap::is_healthy()+0xa7) [0xa11fd7]
 3: (OSD::handle_osd_ping(MOSDPing*)+0x7e8) [0x63c568]
 4: (OSD::heartbeat_dispatch(Message*)+0x563) [0x64e353]
 5: (DispatchQueue::entry()+0x5a2) [0xb64212]
 6: (DispatchQueue::DispatchThread::entry()+0xd) [0xb2fb2d]
 7: (()+0x79d1) [0x7f84148059d1]
 8: (clone()+0x6d) [0x7f841379586d]
['584644', '584647']
Updated by Yuri Weinstein over 9 years ago
Also seeing in run http://pulpito.front.sepia.ceph.com/teuthology-2014-11-04_19:00:01-rados-dumpling-distro-basic-multi/
Assertion: common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
ceph version 0.67.11-28-gea73bf5 (ea73bf5b6f8d9f2ec04bd2eb9809b62011fd66e0)
 1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x107) [0x816bb7]
 2: (ceph::HeartbeatMap::reset_timeout(ceph::heartbeat_handle_d*, long, long)+0x8e) [0x81705e]
 3: (ThreadPool::worker(ThreadPool::WorkThread*)+0x8bc) [0x8b6f2c]
 4: (ThreadPool::WorkThread::entry()+0x10) [0x8b8b70]
 5: (()+0x7e9a) [0x7fa23dd06e9a]
 6: (clone()+0x6d) [0x7fa23bff53fd]
['586835']
Updated by Samuel Just over 9 years ago
584644 and 584647 both stuck in sync, probably environmental.
Updated by Samuel Just over 9 years ago
- Status changed from New to Rejected
2014-11-05 09:29:31.507827 7fa236d5b700 10 filestore(/var/lib/ceph/osd/ceph-3) sync_entry commit took 150.696754, interval was 154.277379
2014-11-05 09:29:31.507842 7fa236d5b700 10 journal commit_finish thru 6538
2014-11-05 09:29:31.507844 7fa236d5b700 5 journal committed_thru 6538 (last_committed_seq 6505)
2014-11-05 09:29:31.507848 7fa236d5b700 10 journal header: block_size 4096 alignment 4096 max_size 104857600
2014-11-05 09:29:31.507850 7fa236d5b700 10 journal header: start 79454208
2014-11-05 09:29:31.507852 7fa236d5b700 10 journal write_pos 101711872
2014-11-05 09:29:31.507853 7fa236d5b700 10 journal committed_thru done
2014-11-05 09:29:31.507901 7fa236d5b700 15 filestore(/var/lib/ceph/osd/ceph-3) sync_entry committed to op_seq 6538
2014-11-05 09:29:31.507904 7fa232d53700 10 filestore(/var/lib/ceph/osd/ceph-3) _set_replay_guard 6553.0.3 done
2014-11-05 09:29:31.507915 7fa236d5b700 20 filestore(/var/lib/ceph/osd/ceph-3) sync_entry waiting for max_interval 5.000000
Another sync took >2min -- long enough for suicide timeout on things waiting for filestore.
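The other fatal path visible in these backtraces is the observer side: a separate thread (CephContextServiceThread via check_touch_file, or the heartbeat dispatcher via handle_osd_ping) walks every registered handle with is_healthy(), so a thread stuck waiting on a long filestore sync can be declared dead by a thread that is itself perfectly healthy. A rough sketch of that sweep, with illustrative names and a lookup return instead of the real assert:

```cpp
#include <ctime>
#include <string>
#include <vector>

// One entry per registered worker thread, modeled loosely on
// ceph::heartbeat_handle_d; names here are illustrative.
struct Handle {
    std::string name;         // e.g. "FileStore::op_tp thread"
    time_t suicide_deadline;  // absolute deadline; 0 = not armed
};

// Sketch of the observer sweep done by HeartbeatMap::is_healthy(): scan every
// handle and report the first thread that missed its suicide deadline. The
// real code asserts ("hit suicide timeout") instead of returning the offender.
const Handle* find_unhealthy(const std::vector<Handle>& handles, time_t now) {
    for (const auto& h : handles)
        if (h.suicide_deadline != 0 && now > h.suicide_deadline)
            return &h;  // this thread has not checked in within its grace
    return nullptr;     // every registered thread is healthy
}
```

This is why the aborting stack can be CephContextServiceThread::entry or DispatchQueue rather than the stuck filestore thread: whichever thread runs the sweep first hits the assert.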
Updated by Yuri Weinstein over 9 years ago
- Status changed from Rejected to New
Assertion: common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
ceph version 0.87-27-gccfd241 (ccfd2414c68afda55bf4cefa2441ea6d53d87cc6)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xb8249b]
 2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x2a9) [0xac0f99]
 3: (ceph::HeartbeatMap::is_healthy()+0xd6) [0xac1826]
 4: (ceph::HeartbeatMap::check_touch_file()+0x17) [0xac1f07]
 5: (CephContextServiceThread::entry()+0x154) [0xb969e4]
 6: (()+0x8182) [0x7fd903a1b182]
 7: (clone()+0x6d) [0x7fd901f85fbd]
['600134']
Updated by Samuel Just over 9 years ago
- Status changed from New to Rejected
I think this one is the giant messenger deadlock, #9921; updated 9921 and am closing this ticket again.
Updated by Yuri Weinstein over 9 years ago
- Subject changed from "Assertion: common/HeartbeatMap.cc: 79" in upgrade:dumpling-firefly-x:stress-split-giant-distro-basic-multi run to "Assertion: common/HeartbeatMap.cc: 79" placeholder for "hit suicide timeout" issues
- Status changed from Rejected to New
Assertion: common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
ceph version 0.88-230-g9ba17a3 (9ba17a321db06d3d76c9295e411c76842194b25c)
 1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x307) [0xa9a5d7]
 2: (ceph::HeartbeatMap::is_healthy()+0xbf) [0xa9ae3f]
 3: (OSD::handle_osd_ping(MOSDPing*)+0x751) [0x66e9b1]
 4: (OSD::heartbeat_dispatch(Message*)+0x42b) [0x66fb5b]
 5: (DispatchQueue::entry()+0x4fa) [0xae387a]
 6: (DispatchQueue::DispatchThread::entry()+0xd) [0xacf62d]
 7: (()+0x79d1) [0x7f86cbc3b9d1]
 8: (clone()+0x6d) [0x7f86ca9c3b6d]
['619498']
Updated by Yuri Weinstein over 9 years ago
One more in run http://pulpito.ceph.com/teuthology-2014-12-01_18:18:01-upgrade:firefly-x-giant-distro-basic-vps/
Assertion: common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
ceph version 0.87-40-g65f6814 (65f6814847fe8644f5d77a9021fbf13043b76dbe)
 1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x307) [0xa58007]
 2: (ceph::HeartbeatMap::is_healthy()+0xbf) [0xa5886f]
 3: (ceph::HeartbeatMap::check_touch_file()+0x28) [0xa58e58]
 4: (CephContextServiceThread::entry()+0x136) [0xaa5a76]
 5: (()+0x79d1) [0x7fa6fd07b9d1]
 6: (clone()+0x6d) [0x7fa6fc00b86d]
Updated by Yuri Weinstein about 9 years ago
On vps again
Run: http://pulpito.ceph.com/teuthology-2015-01-28_17:05:01-upgrade:giant-x-next-distro-basic-vps/
Job: 728083
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2015-01-28_17:05:01-upgrade:giant-x-next-distro-basic-vps/728083/
2015-01-28T22:24:44.499 INFO:tasks.rados.rados.0.vpm101.stdout:update_object_version oid 492 v 288 (ObjNum 1343 snap 0 seq_num 1343) dirty exists
2015-01-28T22:24:44.500 INFO:tasks.rados.rados.0.vpm101.stdout:2050: expect (ObjNum 1106 snap 0 seq_num 1106)
2015-01-28T22:24:45.555 INFO:tasks.ceph.osd.4.vpm011.stderr:common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7fbcbd374700 time 2015-01-29 01:24:38.219770
2015-01-28T22:24:45.555 INFO:tasks.ceph.osd.4.vpm011.stderr:common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
2015-01-28T22:24:46.936 INFO:tasks.rados.rados.0.vpm101.stdout:2047: done (8 left)
2015-01-28T22:24:46.937 INFO:tasks.rados.rados.0.vpm101.stdout:2049: done (7 left)
2015-01-28T22:24:46.937 INFO:tasks.rados.rados.0.vpm101.stdout:2050: done (6 left)
2015-01-28T22:24:46.937 INFO:tasks.rados.rados.0.vpm101.stdout:2051: done (5 left)
2015-01-28T22:24:46.937 INFO:tasks.rados.rados.0.vpm101.stdout:2056: read oid 108 snap -1
2015-01-28T22:24:46.937 INFO:tasks.rados.rados.0.vpm101.stdout:2057: write oid 166 current snap is 0
2015-01-28T22:24:46.937 INFO:tasks.rados.rados.0.vpm101.stdout:2057: seq_num 1346 ranges {634782=550181,1734880=618064,2634781=1}
2015-01-28T22:24:46.965 INFO:tasks.rados.rados.0.vpm101.stdout:2057: writing vpm1015289-166 from 634782 to 1184963 tid 1
2015-01-28T22:24:46.983 INFO:tasks.ceph.osd.4.vpm011.stderr: ceph version 0.91-388-g5064787 (50647876971a2fe65a02e4de3c0bc62fec4887c4)
2015-01-28T22:24:46.983 INFO:tasks.ceph.osd.4.vpm011.stderr: 1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x307) [0xbefd97]
2015-01-28T22:24:46.983 INFO:tasks.ceph.osd.4.vpm011.stderr: 2: (ceph::HeartbeatMap::is_healthy()+0xbf) [0xbf05ff]
2015-01-28T22:24:46.984 INFO:tasks.ceph.osd.4.vpm011.stderr: 3: (ceph::HeartbeatMap::check_touch_file()+0x28) [0xbf0be8]
2015-01-28T22:24:46.984 INFO:tasks.ceph.osd.4.vpm011.stderr: 4: (CephContextServiceThread::entry()+0x136) [0xa3ff06]
2015-01-28T22:24:46.984 INFO:tasks.ceph.osd.4.vpm011.stderr: 5: (()+0x79d1) [0x7fbcc1e9a9d1]
2015-01-28T22:24:46.984 INFO:tasks.ceph.osd.4.vpm011.stderr: 6: (clone()+0x6d) [0x7fbcc0c22b6d]
2015-01-28T22:24:46.984 INFO:tasks.ceph.osd.4.vpm011.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2015-01-28T22:24:46.984 INFO:tasks.ceph.osd.4.vpm011.stderr:2015-01-29 01:24:46.975908 7fbcbd374700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7fbcbd374700 time 2015-01-29 01:24:38.219770
2015-01-28T22:24:46.984 INFO:tasks.ceph.osd.4.vpm011.stderr:common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
2015-01-28T22:24:46.984 INFO:tasks.ceph.osd.4.vpm011.stderr:
2015-01-28T22:24:46.985 INFO:tasks.ceph.osd.4.vpm011.stderr: ceph version 0.91-388-g5064787 (50647876971a2fe65a02e4de3c0bc62fec4887c4)
2015-01-28T22:24:46.985 INFO:tasks.ceph.osd.4.vpm011.stderr: 1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x307) [0xbefd97]
2015-01-28T22:24:46.985 INFO:tasks.ceph.osd.4.vpm011.stderr: 2: (ceph::HeartbeatMap::is_healthy()+0xbf) [0xbf05ff]
2015-01-28T22:24:46.985 INFO:tasks.ceph.osd.4.vpm011.stderr: 3: (ceph::HeartbeatMap::check_touch_file()+0x28) [0xbf0be8]
2015-01-28T22:24:46.985 INFO:tasks.ceph.osd.4.vpm011.stderr: 4: (CephContextServiceThread::entry()+0x136) [0xa3ff06]
2015-01-28T22:24:46.985 INFO:tasks.ceph.osd.4.vpm011.stderr: 5: (()+0x79d1) [0x7fbcc1e9a9d1]
2015-01-28T22:24:46.985 INFO:tasks.ceph.osd.4.vpm011.stderr: 6: (clone()+0x6d) [0x7fbcc0c22b6d]
2015-01-28T22:24:46.985 INFO:tasks.ceph.osd.4.vpm011.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2015-01-28T22:24:46.986 INFO:tasks.ceph.osd.4.vpm011.stderr:
2015-01-28T22:24:46.993 INFO:tasks.rados.rados.0.vpm101.stdout:2057: writing vpm1015289-166 from 1734880 to 2352944 tid 2
2015-01-28T22:24:46.998 INFO:tasks.rados.rados.0.vpm101.stdout:2057: writing vpm1015289-166 from 2634781 to 2634782 tid 3
2015-01-28T22:24:46.999 INFO:tasks.rados.rados.0.vpm101.stdout:2053: expect (ObjNum 1301 snap 0 seq_num 1301)
2015-01-28T22:24:47.097 INFO:tasks.ceph.osd.4.vpm011.stderr: 0> 2015-01-29 01:24:46.975908 7fbcbd374700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7fbcbd374700 time 2015-01-29 01:24:38.219770
2015-01-28T22:24:47.097 INFO:tasks.ceph.osd.4.vpm011.stderr:common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
2015-01-28T22:24:47.097 INFO:tasks.ceph.osd.4.vpm011.stderr:
2015-01-28T22:24:47.097 INFO:tasks.ceph.osd.4.vpm011.stderr: ceph version 0.91-388-g5064787 (50647876971a2fe65a02e4de3c0bc62fec4887c4)
2015-01-28T22:24:47.097 INFO:tasks.ceph.osd.4.vpm011.stderr: 1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x307) [0xbefd97]
2015-01-28T22:24:47.097 INFO:tasks.ceph.osd.4.vpm011.stderr: 2: (ceph::HeartbeatMap::is_healthy()+0xbf) [0xbf05ff]
2015-01-28T22:24:47.097 INFO:tasks.ceph.osd.4.vpm011.stderr: 3: (ceph::HeartbeatMap::check_touch_file()+0x28) [0xbf0be8]
2015-01-28T22:24:47.097 INFO:tasks.ceph.osd.4.vpm011.stderr: 4: (CephContextServiceThread::entry()+0x136) [0xa3ff06]
2015-01-28T22:24:47.098 INFO:tasks.ceph.osd.4.vpm011.stderr: 5: (()+0x79d1) [0x7fbcc1e9a9d1]
2015-01-28T22:24:47.098 INFO:tasks.ceph.osd.4.vpm011.stderr: 6: (clone()+0x6d) [0x7fbcc0c22b6d]
2015-01-28T22:24:47.098 INFO:tasks.ceph.osd.4.vpm011.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2015-01-28T22:24:47.098 INFO:tasks.ceph.osd.4.vpm011.stderr:
2015-01-28T22:24:47.622 INFO:tasks.ceph.osd.4.vpm011.stderr:terminate called after throwing an instance of 'ceph::FailedAssertion'
2015-01-28T22:24:47.959 INFO:tasks.ceph.osd.4.vpm011.stderr:*** Caught signal (Aborted) **
2015-01-28T22:24:47.959 INFO:tasks.ceph.osd.4.vpm011.stderr: in thread 7fbcbd374700
2015-01-28T22:24:49.058 INFO:tasks.rados.rados.0.vpm101.stdout:2053: done (6 left)
2015-01-28T22:24:49.058 INFO:tasks.rados.rados.0.vpm101.stdout:2058: delete oid 478 current snap is 0
2015-01-28T22:24:49.058 INFO:tasks.rados.rados.0.vpm101.stdout:2052: expect (ObjNum 847 snap 0 seq_num 847)
2015-01-28T22:24:49.358 INFO:tasks.ceph.osd.4.vpm011.stderr: ceph version 0.91-388-g5064787 (50647876971a2fe65a02e4de3c0bc62fec4887c4)
2015-01-28T22:24:49.358 INFO:tasks.ceph.osd.4.vpm011.stderr: 1: ceph-osd() [0xa39a55]
2015-01-28T22:24:49.359 INFO:tasks.ceph.osd.4.vpm011.stderr: 2: (()+0xf710) [0x7fbcc1ea2710]
2015-01-28T22:24:49.359 INFO:tasks.ceph.osd.4.vpm011.stderr: 3: (gsignal()+0x35) [0x7fbcc0b6c925]
2015-01-28T22:24:49.359 INFO:tasks.ceph.osd.4.vpm011.stderr: 4: (abort()+0x175) [0x7fbcc0b6e105]
2015-01-28T22:24:49.360 INFO:tasks.ceph.osd.4.vpm011.stderr: 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7fbcc1426a5d]
2015-01-28T22:24:49.360 INFO:tasks.ceph.osd.4.vpm011.stderr: 6: (()+0xbcbe6) [0x7fbcc1424be6]
2015-01-28T22:24:49.360 INFO:tasks.ceph.osd.4.vpm011.stderr: 7: (()+0xbcc13) [0x7fbcc1424c13]
2015-01-28T22:24:49.360 INFO:tasks.ceph.osd.4.vpm011.stderr: 8: (()+0xbcd0e) [0x7fbcc1424d0e]
2015-01-28T22:24:49.361 INFO:tasks.ceph.osd.4.vpm011.stderr: 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x57a) [0xb270fa]
2015-01-28T22:24:49.361 INFO:tasks.ceph.osd.4.vpm011.stderr: 10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x307) [0xbefd97]
2015-01-28T22:24:49.361 INFO:tasks.ceph.osd.4.vpm011.stderr: 11: (ceph::HeartbeatMap::is_healthy()+0xbf) [0xbf05ff]
2015-01-28T22:24:49.361 INFO:tasks.ceph.osd.4.vpm011.stderr: 12: (ceph::HeartbeatMap::check_touch_file()+0x28) [0xbf0be8]
2015-01-28T22:24:49.361 INFO:tasks.ceph.osd.4.vpm011.stderr: 13: (CephContextServiceThread::entry()+0x136) [0xa3ff06]
2015-01-28T22:24:49.362 INFO:tasks.ceph.osd.4.vpm011.stderr: 14: (()+0x79d1) [0x7fbcc1e9a9d1]
2015-01-28T22:24:49.362 INFO:tasks.ceph.osd.4.vpm011.stderr: 15: (clone()+0x6d) [0x7fbcc0c22b6d]
2015-01-28T22:24:49.676 INFO:tasks.ceph.osd.4.vpm011.stderr:2015-01-29 01:24:49.355266 7fbcbd374700 -1 *** Caught signal (Aborted) **
2015-01-28T22:24:49.676 INFO:tasks.ceph.osd.4.vpm011.stderr: in thread 7fbcbd374700
2015-01-28T22:24:49.676 INFO:tasks.ceph.osd.4.vpm011.stderr:
2015-01-28T22:24:49.676 INFO:tasks.ceph.osd.4.vpm011.stderr: ceph version 0.91-388-g5064787 (50647876971a2fe65a02e4de3c0bc62fec4887c4)
2015-01-28T22:24:49.676 INFO:tasks.ceph.osd.4.vpm011.stderr: 1: ceph-osd() [0xa39a55]
2015-01-28T22:24:49.676 INFO:tasks.ceph.osd.4.vpm011.stderr: 2: (()+0xf710) [0x7fbcc1ea2710]
2015-01-28T22:24:49.677 INFO:tasks.ceph.osd.4.vpm011.stderr: 3: (gsignal()+0x35) [0x7fbcc0b6c925]
2015-01-28T22:24:49.677 INFO:tasks.ceph.osd.4.vpm011.stderr: 4: (abort()+0x175) [0x7fbcc0b6e105]
2015-01-28T22:24:49.677 INFO:tasks.ceph.osd.4.vpm011.stderr: 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7fbcc1426a5d]
2015-01-28T22:24:49.677 INFO:tasks.ceph.osd.4.vpm011.stderr: 6: (()+0xbcbe6) [0x7fbcc1424be6]
2015-01-28T22:24:49.677 INFO:tasks.ceph.osd.4.vpm011.stderr: 7: (()+0xbcc13) [0x7fbcc1424c13]
2015-01-28T22:24:49.677 INFO:tasks.ceph.osd.4.vpm011.stderr: 8: (()+0xbcd0e) [0x7fbcc1424d0e]
2015-01-28T22:24:49.677 INFO:tasks.ceph.osd.4.vpm011.stderr: 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x57a) [0xb270fa]
2015-01-28T22:24:49.677 INFO:tasks.ceph.osd.4.vpm011.stderr: 10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x307) [0xbefd97]
2015-01-28T22:24:49.678 INFO:tasks.ceph.osd.4.vpm011.stderr: 11: (ceph::HeartbeatMap::is_healthy()+0xbf) [0xbf05ff]
2015-01-28T22:24:49.678 INFO:tasks.ceph.osd.4.vpm011.stderr: 12: (ceph::HeartbeatMap::check_touch_file()+0x28) [0xbf0be8]
2015-01-28T22:24:49.678 INFO:tasks.ceph.osd.4.vpm011.stderr: 13: (CephContextServiceThread::entry()+0x136) [0xa3ff06]
2015-01-28T22:24:49.678 INFO:tasks.ceph.osd.4.vpm011.stderr: 14: (()+0x79d1) [0x7fbcc1e9a9d1]
2015-01-28T22:24:49.678 INFO:tasks.ceph.osd.4.vpm011.stderr: 15: (clone()+0x6d) [0x7fbcc0c22b6d]
2015-01-28T22:24:49.678 INFO:tasks.ceph.osd.4.vpm011.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by Sage Weil about 9 years ago
- Status changed from New to Closed
If you ever see this on vps it is generally the VM's fault. Let's only reopen this if we see it on bare metal.