Bug #8725
closed
mds crashed in upgrade:dumpling-x:stress-split-master-testing-basic-plana
% Done:
0%
Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
coredump info from /remote/plana57/log/ceph-mds.a.log.gz:
2014-07-01 12:05:07.682897 7f81a917f700  0 mds.0.cache creating system inode with ino:200
2014-07-01 12:05:22.660113 7f81a917f700  1 mds.0.1 creating_done
2014-07-01 12:05:22.797318 7f81a917f700  1 mds.0.1 handle_mds_map i am now mds.0.1
2014-07-01 12:05:22.797331 7f81a917f700  1 mds.0.1 handle_mds_map state change up:creating --> up:active
2014-07-01 12:05:22.797335 7f81a917f700  1 mds.0.1 active_start
2014-07-01 12:06:36.026362 7f81a566e700  0 -- 10.214.132.21:6812/4830 >> 10.214.132.21:6808/4560 pipe(0x2d6e780 sd=22 :0 s=1 pgs=0 cs=0 l=1 c=0x2d3d2c0).fault
2014-07-01 12:06:42.028396 7f81a5870700  0 -- 10.214.132.21:6812/4830 >> 10.214.132.21:6800/4558 pipe(0x2cfac80 sd=22 :0 s=1 pgs=0 cs=0 l=1 c=0x2d3d160).fault
2014-07-01 12:06:48.048396 7f81a566e700  0 -- 10.214.132.21:6812/4830 >> 10.214.132.21:6804/4559 pipe(0x2cfaa00 sd=21 :0 s=1 pgs=0 cs=0 l=1 c=0x2d3d000).fault
2014-07-01 12:12:39.418941 7f81a917f700  0 monclient: hunting for new mon
2014-07-01 12:12:40.538635 7f81a917f700 -1 *** Caught signal (Aborted) **
 in thread 7f81a917f700

 ceph version 0.67.9-20-g583e6e3 (583e6e3ef7f28bf34fe038e8a2391f9325a69adf)
 1: ceph-mds() [0x98ea4a]
 2: (()+0xfcb0) [0x7f81acefccb0]
 3: (gsignal()+0x35) [0x7f81ab3d3425]
 4: (abort()+0x17b) [0x7f81ab3d6b8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f81abd2669d]
 6: (()+0xb5846) [0x7f81abd24846]
 7: (()+0xb5873) [0x7f81abd24873]
 8: (()+0xb596e) [0x7f81abd2496e]
 9: (MDSMap::decode(ceph::buffer::list::iterator&)+0xe53) [0x801f43]
 10: (MDS::handle_mds_map(MMDSMap*)+0x5aa) [0x588f8a]
 11: (MDS::handle_core_message(Message*)+0x5bb) [0x58d67b]
 12: (MDS::_dispatch(Message*)+0x2f) [0x58ddaf]
 13: (MDS::ms_dispatch(Message*)+0x1d3) [0x58f843]
 14: (DispatchQueue::entry()+0x549) [0x95abe9]
 15: (DispatchQueue::DispatchThread::entry()+0xd) [0x87b50d]
 16: (()+0x7e9a) [0x7f81acef4e9a]
 17: (clone()+0x6d) [0x7f81ab4913fd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
-1018> 2014-07-01 12:05:07.551553 7f81ad318780  5 asok(0x2cda000) register_command perfcounters_dump hook 0x2ccf010
-1017> 2014-07-01 12:05:07.551587 7f81ad318780  5 asok(0x2cda000) register_command 1 hook 0x2ccf010
-1016> 2014-07-01 12:05:07.551598 7f81ad318780  5 asok(0x2cda000) register_command perf dump hook 0x2ccf010
-1015> 2014-07-01 12:05:07.551605 7f81ad318780  5 asok(0x2cda000) register_command perfcounters_schema hook 0x2ccf010
-1014> 2014-07-01 12:05:07.551608 7f81ad318780  5 asok(0x2cda000) register_command 2 hook 0x2ccf010
-1013> 2014-07-01 12:05:07.551611 7f81ad318780  5 asok(0x2cda000) register_command perf schema hook 0x2ccf010
-1012> 2014-07-01 12:05:07.551614 7f81ad318780  5 asok(0x2cda000) register_command config show hook 0x2ccf010
--
  -30> 2014-07-01 12:12:37.436373 7f81a5870700  2 -- 10.214.132.21:6812/4830 >> 10.214.132.21:6800/4558 pipe(0x2cfac80 sd=18 :0 s=1 pgs=0 cs=0 l=1 c=0x2d3d160).connect error 10.214.132.21:6800/4558, 111: Connection refused
  -29> 2014-07-01 12:12:37.436424 7f81a5870700  2 -- 10.214.132.21:6812/4830 >> 10.214.132.21:6800/4558 pipe(0x2cfac80 sd=18 :0 s=1 pgs=0 cs=0 l=1 c=0x2d3d160).fault 111: Connection refused
  -28> 2014-07-01 12:12:37.572533 7f81a707a700  5 mds.0.1 is_laggy 18.893766 > 15 since last acked beacon
  -27> 2014-07-01 12:12:37.572557 7f81a707a700  5 mds.0.1 tick bailing out since we seem laggy
  -26> 2014-07-01 12:12:38.679548 7f81a707a700 10 monclient: _send_mon_message to mon.c at 10.214.131.19:6789/0
  -25> 2014-07-01 12:12:38.679574 7f81a707a700  1 -- 10.214.132.21:6812/4830 --> 10.214.131.19:6789/0 -- mdsbeacon(4107/a up:active seq 114 v5) v2 -- ?+0 0x2d4a340 con 0x2cf3580
  -24> 2014-07-01 12:12:39.418625 7f81a717b700  2 -- 10.214.132.21:6812/4830 >> 10.214.131.19:6789/0 pipe(0x2cfa500 sd=8 :36066 s=2 pgs=8 cs=1 l=1 c=0x2cf3580).reader couldn't read tag, Success
  -23> 2014-07-01 12:12:39.418683 7f81a717b700  2 -- 10.214.132.21:6812/4830 >> 10.214.131.19:6789/0 pipe(0x2cfa500 sd=8 :36066 s=2 pgs=8 cs=1 l=1 c=0x2cf3580).fault 0: Success
  -22> 2014-07-01 12:12:39.418920 7f81a917f700 10 monclient: ms_handle_reset current mon 10.214.131.19:6789/0
  -21> 2014-07-01 12:12:39.418941 7f81a917f700  0 monclient: hunting for new mon
  -20> 2014-07-01 12:12:39.418944 7f81a917f700 10 monclient: _reopen_session rank -1 name
  -19> 2014-07-01 12:12:39.418950 7f81a917f700  1 -- 10.214.132.21:6812/4830 mark_down 0x2cf3580 -- pipe dne
  -18> 2014-07-01 12:12:39.419013 7f81a917f700 10 monclient: picked mon.b con 0x2d3d580 addr 10.214.132.21:6790/0
  -17> 2014-07-01 12:12:39.419039 7f81a917f700 10 monclient(hunting): _send_mon_message to mon.b at 10.214.132.21:6790/0
  -16> 2014-07-01 12:12:39.419047 7f81a917f700  1 -- 10.214.132.21:6812/4830 --> 10.214.132.21:6790/0 -- auth(proto 0 26 bytes epoch 1) v1 -- ?+0 0x2d646c0 con 0x2d3d580
  -15> 2014-07-01 12:12:39.419063 7f81a917f700 10 monclient(hunting): renew_subs
  -14> 2014-07-01 12:12:39.419068 7f81a917f700  5 mds.0.1 ms_handle_reset on 10.214.131.19:6789/0
  -13> 2014-07-01 12:12:39.420144 7f81a917f700  5 mds.0.1 ms_handle_connect on 10.214.132.21:6790/0
  -12> 2014-07-01 12:12:39.421258 7f81a917f700  1 -- 10.214.132.21:6812/4830 <== mon.2 10.214.132.21:6790/0 1 ==== auth_reply(proto 2 0 Success) v1 ==== 33+0+0 (2748226891 0 0) 0x2cdf600 con 0x2d3d580
  -11> 2014-07-01 12:12:39.421363 7f81a917f700 10 monclient(hunting): _send_mon_message to mon.b at 10.214.132.21:6790/0
  -10> 2014-07-01 12:12:39.421372 7f81a917f700  1 -- 10.214.132.21:6812/4830 --> 10.214.132.21:6790/0 -- auth(proto 2 128 bytes epoch 0) v1 -- ?+0 0x2d0c000 con 0x2d3d580
   -9> 2014-07-01 12:12:39.422600 7f81a917f700  1 -- 10.214.132.21:6812/4830 <== mon.2 10.214.132.21:6790/0 2 ==== auth_reply(proto 2 0 Success) v1 ==== 225+0+0 (310302456 0 0) 0x2d5b400 con 0x2d3d580
   -8> 2014-07-01 12:12:39.422676 7f81a917f700  1 monclient(hunting): found mon.b
   -7> 2014-07-01 12:12:39.422688 7f81a917f700 10 monclient: _send_mon_message to mon.b at 10.214.132.21:6790/0
   -6> 2014-07-01 12:12:39.422703 7f81a917f700  1 -- 10.214.132.21:6812/4830 --> 10.214.132.21:6790/0 -- mon_subscribe({mdsmap=6+,monmap=2+}) v2 -- ?+0 0x2d51000 con 0x2d3d580
   -5> 2014-07-01 12:12:39.422738 7f81a917f700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2014-07-01 12:12:09.422737)
   -4> 2014-07-01 12:12:39.423907 7f81a917f700  1 -- 10.214.132.21:6812/4830 <== mon.2 10.214.132.21:6790/0 3 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (131601739 0 0) 0x2cfbe00 con 0x2d3d580
   -3> 2014-07-01 12:12:39.423940 7f81a917f700 10 monclient: handle_subscribe_ack sent 2014-07-01 12:12:39.419065 renew after 2014-07-01 12:15:09.419065
   -2> 2014-07-01 12:12:40.535756 7f81a917f700  1 -- 10.214.132.21:6812/4830 <== mon.2 10.214.132.21:6790/0 4 ==== mdsmap(e 6) v1 ==== 598+0+0 (1390759494 0 0) 0x2d5ba00 con 0x2d3d580
   -1> 2014-07-01 12:12:40.535799 7f81a917f700  5 mds.0.1 handle_mds_map epoch 6 from mon.2
    0> 2014-07-01 12:12:40.538635 7f81a917f700 -1 *** Caught signal (Aborted) **
 in thread 7f81a917f700

 ceph version 0.67.9-20-g583e6e3 (583e6e3ef7f28bf34fe038e8a2391f9325a69adf)
 1: ceph-mds() [0x98ea4a]
 2: (()+0xfcb0) [0x7f81acefccb0]
 3: (gsignal()+0x35) [0x7f81ab3d3425]
 4: (abort()+0x17b) [0x7f81ab3d6b8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f81abd2669d]
 6: (()+0xb5846) [0x7f81abd24846]
 7: (()+0xb5873) [0x7f81abd24873]
 8: (()+0xb596e) [0x7f81abd2496e]
 9: (MDSMap::decode(ceph::buffer::list::iterator&)+0xe53) [0x801f43]
 10: (MDS::handle_mds_map(MMDSMap*)+0x5aa) [0x588f8a]
 11: (MDS::handle_core_message(Message*)+0x5bb) [0x58d67b]
 12: (MDS::_dispatch(Message*)+0x2f) [0x58ddaf]
 13: (MDS::ms_dispatch(Message*)+0x1d3) [0x58f843]
 14: (DispatchQueue::entry()+0x549) [0x95abe9]
 15: (DispatchQueue::DispatchThread::entry()+0xd) [0x87b50d]
 16: (()+0x7e9a) [0x7f81acef4e9a]
 17: (clone()+0x6d) [0x7f81ab4913fd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2014-07-01T13:44:23.549 INFO:teuthology.orchestra.run.plana57.stderr:dumped all in format json
2014-07-01T13:44:23.659 INFO:teuthology.misc:Shutting down mds daemons...
2014-07-01T13:44:23.659 ERROR:teuthology.misc:Saw exception from mds.a
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-master/teuthology/misc.py", line 1093, in stop_daemons_of_type
    daemon.stop()
  File "/home/teuthworker/teuthology-master/teuthology/task/ceph.py", line 61, in stop
    run.wait([self.proc], timeout=timeout)
  File "/home/teuthworker/teuthology-master/teuthology/orchestra/run.py", line 424, in wait
    proc.wait()
  File "/home/teuthworker/teuthology-master/teuthology/orchestra/run.py", line 102, in wait
    exitstatus=status, node=self.hostname)
CommandFailedError: Command failed on plana57 with status 1: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mds -f -i a'
2014-07-01T13:44:23.684 INFO:teuthology.misc:Shutting down osd daemons...
archive_path: /var/lib/teuthworker/archive/ubuntu-2014-07-01_11:38:37-upgrade:dumpling-x:stress-split-master-testing-basic-plana/337407
branch: master
description: upgrade/dumpling-x/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml 2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/rbd-cls.yaml 6-next-mon/monb.yaml 7-workload/rados_api_tests.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/ubuntu_14.04.yaml}
email: null
job_id: '337407'
kernel: &id001
  kdb: true
  sha1: 8362a1290d075f376ba68521ffb3b42ecaaecfea
last_in_suite: false
machine_type: plana
name: ubuntu-2014-07-01_11:38:37-upgrade:dumpling-x:stress-split-master-testing-basic-plana
nuke-on-error: true
os_type: ubuntu
os_version: '14.04'
overrides:
  admin_socket:
    branch: master
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    sha1: 1eca89df3586e07409773ff6797095bfc6ec2dcc
  ceph-deploy:
    branch:
      dev: master
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 1eca89df3586e07409773ff6797095bfc6ec2dcc
  s3tests:
    branch: master
  workunit:
    sha1: 1eca89df3586e07409773ff6797095bfc6ec2dcc
owner: yuriw
priority: 10
roles:
- - mon.a
  - mon.b
  - mds.a
  - osd.0
  - osd.1
  - osd.2
- - osd.3
  - osd.4
  - osd.5
  - mon.c
- - client.0
suite: upgrade:dumpling-x:stress-split
targets:
  ubuntu@plana21.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDVbQbk5qpDD2687Wu8iZt7sHmIQbrLr4Esj4NzOzmkkwvhj0p7GmO825mdpu/YP25mXQkrvuKlfuKHZ9QyxfyiCy051FeuPqhSk0IqYYaTVRslrvQ9uSa+IhqE23LxFhWQt7Kgl9DqG7377qqgEXTqBCj/LMD2ix4ugXYRTVFQIXibvZlTjEsNlcPD61R80ZcWa6Jd1jm4XPtqKlr5Sfe4DfWb/VomgHC/frSdmAQTRwikaMpHOonLAo2Hx6WQ/6TOgeDfXgla7wZzIVD3aHTAXVFzkqVb/V6brLn7hMP2Ok3dpo1nDFRTY7Q3/PTJFKeqVkZZgYv0GMTzFDD4NNKN
  ubuntu@plana34.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDhl4nIoX/Xy21FdSNkHIKvw+VnRxEXBW+XW4ES9FJSNkAQ3fwmZxHyA71PIjzb/pFLyWKroR/QlcQth76U9Kj3OmU1zdPgtgTHeMY6nXoY+4moEbFPJdkQyJq3oarBc1J2UXl5msnQAsK0k0AjOwLEDcpdAVuzztKry6hIKGiNlGs8Eueo0MFfI710HJZGB6HyDr51NmMfP8SqS6KAonacLyxwd8F71ygT0Y9p4LE1dPPVkS8bJ9eov6qx401O9ZvCVC2wjce9g7p15wHbQroPVRz2gm/GeIeCSTCvmm+08BOxuKS3gSEoUZOJO00BxmMWbJMyregrVNcMt563swgp
  ubuntu@plana57.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDFHCeGWMPGOLyScKFkduv7aJL9bpMUPZQATO9lxpWu1NtzYndPJtWcyUxgWlItu75SJwpXx/l2GhPYcDKrR1Nl37+dbgs5TeDTbr9YdQBuLPbkbIZMQqO4GqUjurEwLU3vFUZ0X7PTlUqn6qwpT+I2YJua19eF2cRQFIGYVZMzaezm47uh67cdKFh0RTA1pSJ2qM/WMn91boRWcsRQrmn4BeOzfpGfSPDRjrHXHiPx3Br4zcOi/3lOxNFcEeoBrA47PMxvxVIlbmxKDfNjHpQQT18VFWb+qcTAzf+zdBy3iDRFFS45fPrqlWjGn9sK74EbRQanDrZlrFkg2a/HIe5T
tasks:
- internal.lock_machines:
  - 3
  - plana
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.serialize_remote_roles: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: dumpling
- ceph:
    fs: xfs
- install.upgrade:
    osd.0: null
- ceph.restart:
    daemons:
    - osd.0
    - osd.1
    - osd.2
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    thrash_primary_affinity: false
    timeout: 1200
- ceph.restart:
    daemons:
    - mon.a
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    branch: dumpling
    clients:
      client.0:
      - cls/test_cls_rbd.sh
- ceph.restart:
    daemons:
    - mon.b
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rados/test-upgrade-firefly.sh
- install.upgrade:
    mon.c: null
- ceph.restart:
    daemons:
    - mon.c
    wait-for-healthy: false
    wait-for-osds-up: true
- ceph.wait_for_mon_quorum:
  - a
  - b
  - c
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rados/test-upgrade-firefly.sh
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rbd/test_librbd_python.sh
- rgw:
    client.0:
      idle_timeout: 300
- swift:
    client.0:
      rgw_server: client.0
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
teuthology_branch: master
tube: plana
verbose: false
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.plana.12155
client.0-kernel-sha1: 8362a1290d075f376ba68521ffb3b42ecaaecfea
description: upgrade/dumpling-x/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml 2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/rbd-cls.yaml 6-next-mon/monb.yaml 7-workload/rados_api_tests.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/ubuntu_14.04.yaml}
duration: 6957.407259941101
failure_reason: 'Command failed on plana57 with status 1: ''sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mds -f -i a'''
flavor: basic
mon.a-kernel-sha1: 8362a1290d075f376ba68521ffb3b42ecaaecfea
osd.3-kernel-sha1: 8362a1290d075f376ba68521ffb3b42ecaaecfea
owner: yuriw
success: false
Updated by Sage Weil almost 10 years ago
- Project changed from Ceph to CephFS
- Priority changed from Normal to High
Updated by Loïc Dachary over 9 years ago
Looks like a similar problem at upgrade:firefly-x:stress-split
2014-08-08T10:17:01.182 INFO:tasks.radosbench.radosbench.0.vpm182.stdout:   305      16        84        68   0.729339         0         -   24.8856
2014-08-08T10:17:03.110 INFO:tasks.radosbench.radosbench.0.vpm182.stdout:   306      16        84        68   0.725589         0         -   24.8856
2014-08-08T10:17:03.616 INFO:tasks.thrashosds.ceph_manager:no progress seen, keeping timeout for now
2014-08-08T10:17:03.617 INFO:teuthology.orchestra.run.vpm180:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph pg dump --format=json'
2014-08-08T10:17:04.179 INFO:tasks.radosbench.radosbench.0.vpm182.stdout:   307      16        84        68   0.723525         0         -   24.8856
2014-08-08T10:17:05.179 INFO:tasks.radosbench.radosbench.0.vpm182.stdout:   308      16        84        68   0.721606         0         -   24.8856
2014-08-08T10:17:06.210 INFO:tasks.radosbench.radosbench.0.vpm182.stdout:   309      16        84        68   0.719637         0         -   24.8856
2014-08-08T10:17:06.830 INFO:teuthology.orchestra.run.vpm180.stderr:dumped all in format json
2014-08-08T10:17:07.754 INFO:tasks.radosbench.radosbench.0.vpm182.stdout:   310      16        84        68   0.716708         0         -   24.8856
2014-08-08T10:17:08.777 INFO:tasks.ceph.mon.c.vpm181.stderr:     0> 2014-08-08 17:16:39.477530 7f903999a700 -1 *** Caught signal (Aborted) **
2014-08-08T10:17:08.778 INFO:tasks.ceph.mon.c.vpm181.stderr: in thread 7f903999a700
2014-08-08T10:17:08.778 INFO:tasks.ceph.mon.c.vpm181.stderr:
2014-08-08T10:17:08.778 INFO:tasks.ceph.mon.c.vpm181.stderr: ceph version 0.80.5-9-gb65cef6 (b65cef678777c1b87d25385595bf0df96168703e)
2014-08-08T10:17:08.778 INFO:tasks.ceph.mon.c.vpm181.stderr: 1: ceph-mon() [0x862b0f]
2014-08-08T10:17:08.779 INFO:tasks.ceph.mon.c.vpm181.stderr: 2: (()+0x10340) [0x7f903f127340]
2014-08-08T10:17:08.779 INFO:tasks.ceph.mon.c.vpm181.stderr: 3: (gsignal()+0x39) [0x7f903d7cef89]
2014-08-08T10:17:08.779 INFO:tasks.ceph.mon.c.vpm181.stderr: 4: (abort()+0x148) [0x7f903d7d2398]
2014-08-08T10:17:08.779 INFO:tasks.ceph.mon.c.vpm181.stderr: 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f903e0da6b5]
2014-08-08T10:17:08.779 INFO:tasks.ceph.mon.c.vpm181.stderr: 6: (()+0x5e836) [0x7f903e0d8836]
2014-08-08T10:17:08.779 INFO:tasks.ceph.mon.c.vpm181.stderr: 7: (()+0x5e863) [0x7f903e0d8863]
2014-08-08T10:17:08.779 INFO:tasks.ceph.mon.c.vpm181.stderr: 8: (()+0x5eaa2) [0x7f903e0d8aa2]
2014-08-08T10:17:08.779 INFO:tasks.ceph.mon.c.vpm181.stderr: 9: (MDSMap::decode(ceph::buffer::list::iterator&)+0xc7c) [0x6c354c]
2014-08-08T10:17:08.780 INFO:tasks.ceph.mon.c.vpm181.stderr: 10: (MDSMonitor::update_from_paxos(bool*)+0x645) [0x5fbb35]
2014-08-08T10:17:08.780 INFO:tasks.ceph.mon.c.vpm181.stderr: 11: (PaxosService::refresh(bool*)+0x19a) [0x5a2ada]
2014-08-08T10:17:08.780 INFO:tasks.ceph.mon.c.vpm181.stderr: 12: (Monitor::refresh_from_paxos(bool*)+0x6f) [0x54168f]
2014-08-08T10:17:08.780 INFO:tasks.ceph.mon.c.vpm181.stderr: 13: (Paxos::do_refresh()+0x24) [0x590354]
2014-08-08T10:17:08.780 INFO:tasks.ceph.mon.c.vpm181.stderr: 14: (Paxos::handle_commit(MMonPaxos*)+0x1b9) [0x5966f9]
2014-08-08T10:17:08.780 INFO:tasks.ceph.mon.c.vpm181.stderr: 15: (Paxos::dispatch(PaxosServiceMessage*)+0x1cb) [0x59d8ab]
2014-08-08T10:17:08.781 INFO:tasks.ceph.mon.c.vpm181.stderr: 16: (Monitor::dispatch(MonSession*, Message*, bool)+0x5a6) [0x571736]
2014-08-08T10:17:08.781 INFO:tasks.ceph.mon.c.vpm181.stderr: 17: (Monitor::_ms_dispatch(Message*)+0x215) [0x571ad5]
2014-08-08T10:17:08.781 INFO:tasks.ceph.mon.c.vpm181.stderr: 18: (Monitor::ms_dispatch(Message*)+0x20) [0x58f780]
2014-08-08T10:17:08.781 INFO:tasks.ceph.mon.c.vpm181.stderr: 19: (DispatchQueue::entry()+0x57a) [0x830eaa]
2014-08-08T10:17:08.781 INFO:tasks.ceph.mon.c.vpm181.stderr: 20: (DispatchQueue::DispatchThread::entry()+0xd) [0x74937d]
2014-08-08T10:17:08.781 INFO:tasks.ceph.mon.c.vpm181.stderr: 21: (()+0x8182) [0x7f903f11f182]
2014-08-08T10:17:08.781 INFO:tasks.ceph.mon.c.vpm181.stderr: 22: (clone()+0x6d) [0x7f903d89338d]
2014-08-08T10:17:08.781 INFO:tasks.ceph.mon.c.vpm181.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2014-08-08T10:17:08.782 INFO:tasks.ceph.mon.c.vpm181.stderr:
2014-08-08T10:17:09.297 INFO:tasks.radosbench.radosbench.0.vpm182.stdout:   311      16        84        68   0.713808         0         -   24.8856
Updated by Loïc Dachary over 9 years ago
And the same trace at upgrade:firefly-x:stress-split
2014-08-08T10:31:55.917 INFO:tasks.ceph.mds.a.vpm130.stderr: ceph version 0.80.5-9-gb65cef6 (b65cef678777c1b87d25385595bf0df96168703e)
2014-08-08T10:31:55.917 INFO:tasks.ceph.mds.a.vpm130.stderr: 1: ceph-mds() [0x7f777f]
2014-08-08T10:31:55.917 INFO:tasks.ceph.mds.a.vpm130.stderr: 2: (()+0x10340) [0x7f40b45c8340]
2014-08-08T10:31:55.917 INFO:tasks.ceph.mds.a.vpm130.stderr: 3: (gsignal()+0x39) [0x7f40b2e73f89]
2014-08-08T10:31:55.917 INFO:tasks.ceph.mds.a.vpm130.stderr: 4: (abort()+0x148) [0x7f40b2e77398]
2014-08-08T10:31:55.917 INFO:tasks.ceph.mds.a.vpm130.stderr: 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f40b377f6b5]
2014-08-08T10:31:55.917 INFO:tasks.ceph.mds.a.vpm130.stderr: 6: (()+0x5e836) [0x7f40b377d836]
2014-08-08T10:31:55.918 INFO:tasks.ceph.mds.a.vpm130.stderr: 7: (()+0x5e863) [0x7f40b377d863]
2014-08-08T10:31:55.918 INFO:tasks.ceph.mds.a.vpm130.stderr: 8: (()+0x5eaa2) [0x7f40b377daa2]
2014-08-08T10:31:55.918 INFO:tasks.ceph.mds.a.vpm130.stderr: 9: (MDSMap::decode(ceph::buffer::list::iterator&)+0xc7c) [0x83247c]
2014-08-08T10:31:55.918 INFO:tasks.ceph.mds.a.vpm130.stderr: 10: (MDS::handle_mds_map(MMDSMap*)+0x2f2) [0x58aaa2]
2014-08-08T10:31:55.918 INFO:tasks.ceph.mds.a.vpm130.stderr: 11: (MDS::handle_core_message(Message*)+0xb03) [0x58eed3]
2014-08-08T10:31:55.918 INFO:tasks.ceph.mds.a.vpm130.stderr: 12: (MDS::_dispatch(Message*)+0x32) [0x58f0f2]
2014-08-08T10:31:55.918 INFO:tasks.ceph.mds.a.vpm130.stderr: 13: (MDS::ms_dispatch(Message*)+0xa3) [0x590ad3]
2014-08-08T10:31:55.918 INFO:tasks.ceph.mds.a.vpm130.stderr: 14: (DispatchQueue::entry()+0x57a) [0x99d62a]
2014-08-08T10:31:55.919 INFO:tasks.ceph.mds.a.vpm130.stderr: 15: (DispatchQueue::DispatchThread::entry()+0xd) [0x8beb5d]
2014-08-08T10:31:55.919 INFO:tasks.ceph.mds.a.vpm130.stderr: 16: (()+0x8182) [0x7f40b45c0182]
2014-08-08T10:31:55.919 INFO:tasks.ceph.mds.a.vpm130.stderr: 17: (clone()+0x6d) [0x7f40b2f3838d]
2014-08-08T10:31:55.919 INFO:tasks.ceph.mds.a.vpm130.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by Loïc Dachary over 9 years ago
Another similar crash
2014-08-08T10:04:38.098 INFO:tasks.rados.rados.0.vpm184.stdout:190: writing vpm1845876-190 from 600791 to 600792 tid 1
2014-08-08T10:04:38.098 INFO:tasks.rados.rados.0.vpm184.stdout: waiting on 16
2014-08-08T10:04:38.428 INFO:tasks.rados.rados.0.vpm184.stdout:186: finishing write tid 1 to vpm1845876-186
2014-08-08T10:04:38.428 INFO:tasks.rados.rados.0.vpm184.stdout:186: finishing write tid 2 to vpm1845876-186
2014-08-08T10:04:38.432 INFO:tasks.rados.rados.0.vpm184.stdout:186: finishing write tid 3 to vpm1845876-186
2014-08-08T10:04:38.433 INFO:tasks.rados.rados.0.vpm184.stdout:186: finishing write tid 5 to vpm1845876-186
2014-08-08T10:04:39.211 INFO:tasks.ceph.mon.c.vpm183.stderr:terminate called after throwing an instance of 'ceph::buffer::malformed_input'
2014-08-08T10:04:39.211 INFO:tasks.ceph.mon.c.vpm183.stderr:  what():  buffer::malformed_input: __PRETTY_FUNCTION__ unknown encoding version > 4
2014-08-08T10:04:39.212 INFO:tasks.ceph.mon.c.vpm183.stderr:*** Caught signal (Aborted) **
2014-08-08T10:04:39.212 INFO:tasks.ceph.mon.c.vpm183.stderr: in thread 7f5aaa140700
2014-08-08T10:04:39.441 INFO:tasks.rados.rados.0.vpm184.stdout:175: finishing write tid 3 to vpm1845876-175
2014-08-08T10:04:39.915 INFO:tasks.ceph.mon.c.vpm183.stderr: ceph version 0.80.5-9-gb65cef6 (b65cef678777c1b87d25385595bf0df96168703e)
2014-08-08T10:04:39.915 INFO:tasks.ceph.mon.c.vpm183.stderr: 1: ceph-mon() [0x862b0f]
2014-08-08T10:04:39.915 INFO:tasks.ceph.mon.c.vpm183.stderr: 2: (()+0x10340) [0x7f5aaf8cd340]
2014-08-08T10:04:39.915 INFO:tasks.ceph.mon.c.vpm183.stderr: 3: (gsignal()+0x39) [0x7f5aadf74f89]
2014-08-08T10:04:39.916 INFO:tasks.ceph.mon.c.vpm183.stderr: 4: (abort()+0x148) [0x7f5aadf78398]
2014-08-08T10:04:39.916 INFO:tasks.ceph.mon.c.vpm183.stderr: 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f5aae8806b5]
2014-08-08T10:04:39.916 INFO:tasks.ceph.mon.c.vpm183.stderr: 6: (()+0x5e836) [0x7f5aae87e836]
2014-08-08T10:04:39.916 INFO:tasks.ceph.mon.c.vpm183.stderr: 7: (()+0x5e863) [0x7f5aae87e863]
2014-08-08T10:04:39.916 INFO:tasks.ceph.mon.c.vpm183.stderr: 8: (()+0x5eaa2) [0x7f5aae87eaa2]
2014-08-08T10:04:39.916 INFO:tasks.ceph.mon.c.vpm183.stderr: 9: (MDSMap::decode(ceph::buffer::list::iterator&)+0xc7c) [0x6c354c]
2014-08-08T10:04:39.916 INFO:tasks.ceph.mon.c.vpm183.stderr: 10: (MDSMonitor::update_from_paxos(bool*)+0x645) [0x5fbb35]
2014-08-08T10:04:39.917 INFO:tasks.ceph.mon.c.vpm183.stderr: 11: (PaxosService::refresh(bool*)+0x19a) [0x5a2ada]
2014-08-08T10:04:39.917 INFO:tasks.ceph.mon.c.vpm183.stderr: 12: (Monitor::refresh_from_paxos(bool*)+0x6f) [0x54168f]
2014-08-08T10:04:39.917 INFO:tasks.ceph.mon.c.vpm183.stderr: 13: (Paxos::do_refresh()+0x24) [0x590354]
2014-08-08T10:04:39.917 INFO:tasks.ceph.mon.c.vpm183.stderr: 14: (Paxos::handle_commit(MMonPaxos*)+0x1b9) [0x5966f9]
2014-08-08T10:04:39.917 INFO:tasks.ceph.mon.c.vpm183.stderr: 15: (Paxos::dispatch(PaxosServiceMessage*)+0x1cb) [0x59d8ab]
2014-08-08T10:04:39.917 INFO:tasks.ceph.mon.c.vpm183.stderr: 16: (Monitor::dispatch(MonSession*, Message*, bool)+0x5a6) [0x571736]
2014-08-08T10:04:39.917 INFO:tasks.ceph.mon.c.vpm183.stderr: 17: (Monitor::_ms_dispatch(Message*)+0x215) [0x571ad5]
2014-08-08T10:04:39.917 INFO:tasks.ceph.mon.c.vpm183.stderr: 18: (Monitor::ms_dispatch(Message*)+0x20) [0x58f780]
2014-08-08T10:04:39.917 INFO:tasks.ceph.mon.c.vpm183.stderr: 19: (DispatchQueue::entry()+0x57a) [0x830eaa]
2014-08-08T10:04:39.918 INFO:tasks.ceph.mon.c.vpm183.stderr: 20: (DispatchQueue::DispatchThread::entry()+0xd) [0x74937d]
2014-08-08T10:04:39.918 INFO:tasks.ceph.mon.c.vpm183.stderr: 21: (()+0x8182) [0x7f5aaf8c5182]
2014-08-08T10:04:39.918 INFO:tasks.ceph.mon.c.vpm183.stderr: 22: (clone()+0x6d) [0x7f5aae03938d]
2014-08-08T10:04:39.918 INFO:tasks.ceph.mon.c.vpm183.stderr:2014-08-08 17:04:39.915353 7f5aaa140700 -1 *** Caught signal (Aborted) **
2014-08-08T10:04:39.918 INFO:tasks.ceph.mon.c.vpm183.stderr: in thread 7f5aaa140700
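The `what():  buffer::malformed_input: __PRETTY_FUNCTION__ unknown encoding version > 4` line above shows the common mechanism in all of these traces: the `MDSMap::decode` in the older binary throws when the on-wire struct version is newer than it understands, and since nothing catches the exception, `std::terminate()` runs and the process aborts. The guard pattern can be sketched roughly like this. This is a simplified, self-contained illustration of Ceph's versioned-encoding convention, not the real `ENCODE_START`/`DECODE_START` macros; the names, version numbers, and byte layout here are illustrative only:

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Simplified stand-in for ceph::buffer::malformed_input.
struct malformed_input : std::runtime_error {
    explicit malformed_input(const std::string& w) : std::runtime_error(w) {}
};

// Toy versioned blob: [struct_v][compat_v][payload...], loosely modeled on
// the ENCODE_START convention (layout here is illustrative, not Ceph's).
std::vector<uint8_t> encode_map(uint8_t struct_v, uint8_t compat_v,
                                const std::string& payload) {
    std::vector<uint8_t> bl;
    bl.push_back(struct_v);
    bl.push_back(compat_v);
    bl.insert(bl.end(), payload.begin(), payload.end());
    return bl;
}

// This decoder was "built" understanding versions up to 4, like the
// dumpling/firefly-era binaries in the traces above.
constexpr uint8_t MAX_UNDERSTOOD_V = 4;

std::string decode_map(const std::vector<uint8_t>& bl) {
    if (bl.size() < 2)
        throw malformed_input("buffer too short");
    uint8_t struct_v = bl[0];
    uint8_t compat_v = bl[1];
    // If even the sender's declared compat version is newer than we
    // understand, decoding cannot proceed; an uncaught throw here is
    // exactly what produces the abort() backtrace in the logs.
    if (compat_v > MAX_UNDERSTOOD_V)
        throw malformed_input("unknown encoding version > " +
                              std::to_string(MAX_UNDERSTOOD_V));
    (void)struct_v;  // a real decoder would skip optional newer fields
    return std::string(bl.begin() + 2, bl.end());
}
```

In the daemon, the exception propagates out of the dispatch thread uncaught, so `__verbose_terminate_handler` calls `abort()`, matching frames 3-8 in every backtrace above.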
Updated by Sage Weil over 9 years ago
we probably have to do a reencoding trick like we do in MOSDMap?
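For context, the MOSDMap-style "reencoding trick" means the sender inspects the peer's advertised feature bits and re-encodes the map in the older wire format when the peer cannot decode the new one, rather than sending the new encoding unconditionally. A minimal sketch of the idea, with a hypothetical feature bit and illustrative version numbers (not the actual Ceph feature flags or MDSMap layout):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical feature bit for illustration; real Ceph feature flags differ.
constexpr uint64_t FEATURE_NEW_MDSMAP = 1ULL << 42;

// Choose the wire format based on the peer's feature bits, the way MOSDMap
// reencodes OSDMaps for old clients. Toy format: [struct_v][payload...].
std::vector<uint8_t> encode_for_peer(const std::string& payload,
                                     uint64_t peer_features) {
    std::vector<uint8_t> bl;
    if (peer_features & FEATURE_NEW_MDSMAP) {
        bl.push_back(5);  // new format the upgraded daemons understand
    } else {
        bl.push_back(4);  // legacy format; old decoders reject anything > 4
        // a real implementation would also drop fields the old format lacks
    }
    bl.insert(bl.end(), payload.begin(), payload.end());
    return bl;
}
```

With this in place an upgraded monitor never hands a pre-upgrade MDS (or peer monitor) an encoding it cannot parse, avoiding the `unknown encoding version > 4` abort seen in the traces above.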
Updated by Sage Weil over 9 years ago
- Status changed from New to Fix Under Review
- Assignee set to John Spray
Updated by Sage Weil over 9 years ago
- Status changed from Fix Under Review to Resolved