Bug #8725
mds crashed in upgrade:dumpling-x:stress-split-master-testing-basic-plana
Status:
closed
% Done:
0%
Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Coredump info from /*remote/plana57/log/ceph-mds.a.log.gz:
2014-07-01 12:05:07.682897 7f81a917f700 0 mds.0.cache creating system inode with ino:200
2014-07-01 12:05:22.660113 7f81a917f700 1 mds.0.1 creating_done
2014-07-01 12:05:22.797318 7f81a917f700 1 mds.0.1 handle_mds_map i am now mds.0.1
2014-07-01 12:05:22.797331 7f81a917f700 1 mds.0.1 handle_mds_map state change up:creating --> up:active
2014-07-01 12:05:22.797335 7f81a917f700 1 mds.0.1 active_start
2014-07-01 12:06:36.026362 7f81a566e700 0 -- 10.214.132.21:6812/4830 >> 10.214.132.21:6808/4560 pipe(0x2d6e780 sd=22 :0 s=1 pgs=0 cs=0 l=1 c=0x2d3d2c0).fault
2014-07-01 12:06:42.028396 7f81a5870700 0 -- 10.214.132.21:6812/4830 >> 10.214.132.21:6800/4558 pipe(0x2cfac80 sd=22 :0 s=1 pgs=0 cs=0 l=1 c=0x2d3d160).fault
2014-07-01 12:06:48.048396 7f81a566e700 0 -- 10.214.132.21:6812/4830 >> 10.214.132.21:6804/4559 pipe(0x2cfaa00 sd=21 :0 s=1 pgs=0 cs=0 l=1 c=0x2d3d000).fault
2014-07-01 12:12:39.418941 7f81a917f700 0 monclient: hunting for new mon
2014-07-01 12:12:40.538635 7f81a917f700 -1 *** Caught signal (Aborted) **
 in thread 7f81a917f700
 ceph version 0.67.9-20-g583e6e3 (583e6e3ef7f28bf34fe038e8a2391f9325a69adf)
 1: ceph-mds() [0x98ea4a]
 2: (()+0xfcb0) [0x7f81acefccb0]
 3: (gsignal()+0x35) [0x7f81ab3d3425]
 4: (abort()+0x17b) [0x7f81ab3d6b8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f81abd2669d]
 6: (()+0xb5846) [0x7f81abd24846]
 7: (()+0xb5873) [0x7f81abd24873]
 8: (()+0xb596e) [0x7f81abd2496e]
 9: (MDSMap::decode(ceph::buffer::list::iterator&)+0xe53) [0x801f43]
 10: (MDS::handle_mds_map(MMDSMap*)+0x5aa) [0x588f8a]
 11: (MDS::handle_core_message(Message*)+0x5bb) [0x58d67b]
 12: (MDS::_dispatch(Message*)+0x2f) [0x58ddaf]
 13: (MDS::ms_dispatch(Message*)+0x1d3) [0x58f843]
 14: (DispatchQueue::entry()+0x549) [0x95abe9]
 15: (DispatchQueue::DispatchThread::entry()+0xd) [0x87b50d]
 16: (()+0x7e9a) [0x7f81acef4e9a]
 17: (clone()+0x6d) [0x7f81ab4913fd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
-1018> 2014-07-01 12:05:07.551553 7f81ad318780 5 asok(0x2cda000) register_command perfcounters_dump hook 0x2ccf010
-1017> 2014-07-01 12:05:07.551587 7f81ad318780 5 asok(0x2cda000) register_command 1 hook 0x2ccf010
-1016> 2014-07-01 12:05:07.551598 7f81ad318780 5 asok(0x2cda000) register_command perf dump hook 0x2ccf010
-1015> 2014-07-01 12:05:07.551605 7f81ad318780 5 asok(0x2cda000) register_command perfcounters_schema hook 0x2ccf010
-1014> 2014-07-01 12:05:07.551608 7f81ad318780 5 asok(0x2cda000) register_command 2 hook 0x2ccf010
-1013> 2014-07-01 12:05:07.551611 7f81ad318780 5 asok(0x2cda000) register_command perf schema hook 0x2ccf010
-1012> 2014-07-01 12:05:07.551614 7f81ad318780 5 asok(0x2cda000) register_command config show hook 0x2ccf010
--
-30> 2014-07-01 12:12:37.436373 7f81a5870700 2 -- 10.214.132.21:6812/4830 >> 10.214.132.21:6800/4558 pipe(0x2cfac80 sd=18 :0 s=1 pgs=0 cs=0 l=1 c=0x2d3d160).connect error 10.214.132.21:6800/4558, 111: Connection refused
-29> 2014-07-01 12:12:37.436424 7f81a5870700 2 -- 10.214.132.21:6812/4830 >> 10.214.132.21:6800/4558 pipe(0x2cfac80 sd=18 :0 s=1 pgs=0 cs=0 l=1 c=0x2d3d160).fault 111: Connection refused
-28> 2014-07-01 12:12:37.572533 7f81a707a700 5 mds.0.1 is_laggy 18.893766 > 15 since last acked beacon
-27> 2014-07-01 12:12:37.572557 7f81a707a700 5 mds.0.1 tick bailing out since we seem laggy
-26> 2014-07-01 12:12:38.679548 7f81a707a700 10 monclient: _send_mon_message to mon.c at 10.214.131.19:6789/0
-25> 2014-07-01 12:12:38.679574 7f81a707a700 1 -- 10.214.132.21:6812/4830 --> 10.214.131.19:6789/0 -- mdsbeacon(4107/a up:active seq 114 v5) v2 -- ?+0 0x2d4a340 con 0x2cf3580
-24> 2014-07-01 12:12:39.418625 7f81a717b700 2 -- 10.214.132.21:6812/4830 >> 10.214.131.19:6789/0 pipe(0x2cfa500 sd=8 :36066 s=2 pgs=8 cs=1 l=1 c=0x2cf3580).reader couldn't read tag, Success
-23> 2014-07-01 12:12:39.418683 7f81a717b700 2 -- 10.214.132.21:6812/4830 >> 10.214.131.19:6789/0 pipe(0x2cfa500 sd=8 :36066 s=2 pgs=8 cs=1 l=1 c=0x2cf3580).fault 0: Success
-22> 2014-07-01 12:12:39.418920 7f81a917f700 10 monclient: ms_handle_reset current mon 10.214.131.19:6789/0
-21> 2014-07-01 12:12:39.418941 7f81a917f700 0 monclient: hunting for new mon
-20> 2014-07-01 12:12:39.418944 7f81a917f700 10 monclient: _reopen_session rank -1 name
-19> 2014-07-01 12:12:39.418950 7f81a917f700 1 -- 10.214.132.21:6812/4830 mark_down 0x2cf3580 -- pipe dne
-18> 2014-07-01 12:12:39.419013 7f81a917f700 10 monclient: picked mon.b con 0x2d3d580 addr 10.214.132.21:6790/0
-17> 2014-07-01 12:12:39.419039 7f81a917f700 10 monclient(hunting): _send_mon_message to mon.b at 10.214.132.21:6790/0
-16> 2014-07-01 12:12:39.419047 7f81a917f700 1 -- 10.214.132.21:6812/4830 --> 10.214.132.21:6790/0 -- auth(proto 0 26 bytes epoch 1) v1 -- ?+0 0x2d646c0 con 0x2d3d580
-15> 2014-07-01 12:12:39.419063 7f81a917f700 10 monclient(hunting): renew_subs
-14> 2014-07-01 12:12:39.419068 7f81a917f700 5 mds.0.1 ms_handle_reset on 10.214.131.19:6789/0
-13> 2014-07-01 12:12:39.420144 7f81a917f700 5 mds.0.1 ms_handle_connect on 10.214.132.21:6790/0
-12> 2014-07-01 12:12:39.421258 7f81a917f700 1 -- 10.214.132.21:6812/4830 <== mon.2 10.214.132.21:6790/0 1 ==== auth_reply(proto 2 0 Success) v1 ==== 33+0+0 (2748226891 0 0) 0x2cdf600 con 0x2d3d580
-11> 2014-07-01 12:12:39.421363 7f81a917f700 10 monclient(hunting): _send_mon_message to mon.b at 10.214.132.21:6790/0
-10> 2014-07-01 12:12:39.421372 7f81a917f700 1 -- 10.214.132.21:6812/4830 --> 10.214.132.21:6790/0 -- auth(proto 2 128 bytes epoch 0) v1 -- ?+0 0x2d0c000 con 0x2d3d580
-9> 2014-07-01 12:12:39.422600 7f81a917f700 1 -- 10.214.132.21:6812/4830 <== mon.2 10.214.132.21:6790/0 2 ==== auth_reply(proto 2 0 Success) v1 ==== 225+0+0 (310302456 0 0) 0x2d5b400 con 0x2d3d580
-8> 2014-07-01 12:12:39.422676 7f81a917f700 1 monclient(hunting): found mon.b
-7> 2014-07-01 12:12:39.422688 7f81a917f700 10 monclient: _send_mon_message to mon.b at 10.214.132.21:6790/0
-6> 2014-07-01 12:12:39.422703 7f81a917f700 1 -- 10.214.132.21:6812/4830 --> 10.214.132.21:6790/0 -- mon_subscribe({mdsmap=6+,monmap=2+}) v2 -- ?+0 0x2d51000 con 0x2d3d580
-5> 2014-07-01 12:12:39.422738 7f81a917f700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2014-07-01 12:12:09.422737)
-4> 2014-07-01 12:12:39.423907 7f81a917f700 1 -- 10.214.132.21:6812/4830 <== mon.2 10.214.132.21:6790/0 3 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (131601739 0 0) 0x2cfbe00 con 0x2d3d580
-3> 2014-07-01 12:12:39.423940 7f81a917f700 10 monclient: handle_subscribe_ack sent 2014-07-01 12:12:39.419065 renew after 2014-07-01 12:15:09.419065
-2> 2014-07-01 12:12:40.535756 7f81a917f700 1 -- 10.214.132.21:6812/4830 <== mon.2 10.214.132.21:6790/0 4 ==== mdsmap(e 6) v1 ==== 598+0+0 (1390759494 0 0) 0x2d5ba00 con 0x2d3d580
-1> 2014-07-01 12:12:40.535799 7f81a917f700 5 mds.0.1 handle_mds_map epoch 6 from mon.2
0> 2014-07-01 12:12:40.538635 7f81a917f700 -1 *** Caught signal (Aborted) **
 in thread 7f81a917f700
 ceph version 0.67.9-20-g583e6e3 (583e6e3ef7f28bf34fe038e8a2391f9325a69adf)
 1: ceph-mds() [0x98ea4a]
 2: (()+0xfcb0) [0x7f81acefccb0]
 3: (gsignal()+0x35) [0x7f81ab3d3425]
 4: (abort()+0x17b) [0x7f81ab3d6b8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f81abd2669d]
 6: (()+0xb5846) [0x7f81abd24846]
 7: (()+0xb5873) [0x7f81abd24873]
 8: (()+0xb596e) [0x7f81abd2496e]
 9: (MDSMap::decode(ceph::buffer::list::iterator&)+0xe53) [0x801f43]
 10: (MDS::handle_mds_map(MMDSMap*)+0x5aa) [0x588f8a]
 11: (MDS::handle_core_message(Message*)+0x5bb) [0x58d67b]
 12: (MDS::_dispatch(Message*)+0x2f) [0x58ddaf]
 13: (MDS::ms_dispatch(Message*)+0x1d3) [0x58f843]
 14: (DispatchQueue::entry()+0x549) [0x95abe9]
 15: (DispatchQueue::DispatchThread::entry()+0xd) [0x87b50d]
 16: (()+0x7e9a) [0x7f81acef4e9a]
 17: (clone()+0x6d) [0x7f81ab4913fd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
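Per the NOTE in the backtrace, the raw frame addresses only become readable against the matching 0.67.9-20-g583e6e3 build. A minimal sketch of how one might pull the bracketed addresses out of a saved backtrace and feed them to addr2line; the `bt.txt` file and the `/usr/bin/ceph-mds` path are assumptions for illustration, not from this job:

```shell
# Hypothetical two-frame excerpt of the backtrace, saved to a scratch file.
cat > /tmp/bt.txt <<'EOF'
 9: (MDSMap::decode(ceph::buffer::list::iterator&)+0xe53) [0x801f43]
 10: (MDS::handle_mds_map(MMDSMap*)+0x5aa) [0x588f8a]
EOF

# Extract just the bracketed hex addresses, one per line.
grep -o '\[0x[0-9a-f]*\]' /tmp/bt.txt | tr -d '[]'

# With the matching binary on hand, these could then be resolved to
# source lines (objdump -rdS <executable> works too, per the log's NOTE):
#   addr2line -C -f -e /usr/bin/ceph-mds 0x801f43 0x588f8a
```

Here frame 9, `MDSMap::decode`, is the interesting one: the abort fires while decoding mdsmap e6 from the mon during the dumpling-x upgrade.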
2014-07-01T13:44:23.549 INFO:teuthology.orchestra.run.plana57.stderr:dumped all in format json
2014-07-01T13:44:23.659 INFO:teuthology.misc:Shutting down mds daemons...
2014-07-01T13:44:23.659 ERROR:teuthology.misc:Saw exception from mds.a
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-master/teuthology/misc.py", line 1093, in stop_daemons_of_type
    daemon.stop()
  File "/home/teuthworker/teuthology-master/teuthology/task/ceph.py", line 61, in stop
    run.wait([self.proc], timeout=timeout)
  File "/home/teuthworker/teuthology-master/teuthology/orchestra/run.py", line 424, in wait
    proc.wait()
  File "/home/teuthworker/teuthology-master/teuthology/orchestra/run.py", line 102, in wait
    exitstatus=status, node=self.hostname)
CommandFailedError: Command failed on plana57 with status 1: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mds -f -i a'
2014-07-01T13:44:23.684 INFO:teuthology.misc:Shutting down osd daemons...
archive_path: /var/lib/teuthworker/archive/ubuntu-2014-07-01_11:38:37-upgrade:dumpling-x:stress-split-master-testing-basic-plana/337407
branch: master
description: upgrade/dumpling-x/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/rbd-cls.yaml
  6-next-mon/monb.yaml 7-workload/rados_api_tests.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml
  rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/ubuntu_14.04.yaml}
email: null
job_id: '337407'
kernel: &id001
  kdb: true
  sha1: 8362a1290d075f376ba68521ffb3b42ecaaecfea
last_in_suite: false
machine_type: plana
name: ubuntu-2014-07-01_11:38:37-upgrade:dumpling-x:stress-split-master-testing-basic-plana
nuke-on-error: true
os_type: ubuntu
os_version: '14.04'
overrides:
  admin_socket:
    branch: master
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    sha1: 1eca89df3586e07409773ff6797095bfc6ec2dcc
  ceph-deploy:
    branch:
      dev: master
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 1eca89df3586e07409773ff6797095bfc6ec2dcc
  s3tests:
    branch: master
  workunit:
    sha1: 1eca89df3586e07409773ff6797095bfc6ec2dcc
owner: yuriw
priority: 10
roles:
- - mon.a
  - mon.b
  - mds.a
  - osd.0
  - osd.1
  - osd.2
- - osd.3
  - osd.4
  - osd.5
  - mon.c
- - client.0
suite: upgrade:dumpling-x:stress-split
targets:
  ubuntu@plana21.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDVbQbk5qpDD2687Wu8iZt7sHmIQbrLr4Esj4NzOzmkkwvhj0p7GmO825mdpu/YP25mXQkrvuKlfuKHZ9QyxfyiCy051FeuPqhSk0IqYYaTVRslrvQ9uSa+IhqE23LxFhWQt7Kgl9DqG7377qqgEXTqBCj/LMD2ix4ugXYRTVFQIXibvZlTjEsNlcPD61R80ZcWa6Jd1jm4XPtqKlr5Sfe4DfWb/VomgHC/frSdmAQTRwikaMpHOonLAo2Hx6WQ/6TOgeDfXgla7wZzIVD3aHTAXVFzkqVb/V6brLn7hMP2Ok3dpo1nDFRTY7Q3/PTJFKeqVkZZgYv0GMTzFDD4NNKN
  ubuntu@plana34.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDhl4nIoX/Xy21FdSNkHIKvw+VnRxEXBW+XW4ES9FJSNkAQ3fwmZxHyA71PIjzb/pFLyWKroR/QlcQth76U9Kj3OmU1zdPgtgTHeMY6nXoY+4moEbFPJdkQyJq3oarBc1J2UXl5msnQAsK0k0AjOwLEDcpdAVuzztKry6hIKGiNlGs8Eueo0MFfI710HJZGB6HyDr51NmMfP8SqS6KAonacLyxwd8F71ygT0Y9p4LE1dPPVkS8bJ9eov6qx401O9ZvCVC2wjce9g7p15wHbQroPVRz2gm/GeIeCSTCvmm+08BOxuKS3gSEoUZOJO00BxmMWbJMyregrVNcMt563swgp
  ubuntu@plana57.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDFHCeGWMPGOLyScKFkduv7aJL9bpMUPZQATO9lxpWu1NtzYndPJtWcyUxgWlItu75SJwpXx/l2GhPYcDKrR1Nl37+dbgs5TeDTbr9YdQBuLPbkbIZMQqO4GqUjurEwLU3vFUZ0X7PTlUqn6qwpT+I2YJua19eF2cRQFIGYVZMzaezm47uh67cdKFh0RTA1pSJ2qM/WMn91boRWcsRQrmn4BeOzfpGfSPDRjrHXHiPx3Br4zcOi/3lOxNFcEeoBrA47PMxvxVIlbmxKDfNjHpQQT18VFWb+qcTAzf+zdBy3iDRFFS45fPrqlWjGn9sK74EbRQanDrZlrFkg2a/HIe5T
tasks:
- internal.lock_machines:
  - 3
  - plana
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.serialize_remote_roles: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: dumpling
- ceph:
    fs: xfs
- install.upgrade:
    osd.0: null
- ceph.restart:
    daemons:
    - osd.0
    - osd.1
    - osd.2
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    thrash_primary_affinity: false
    timeout: 1200
- ceph.restart:
    daemons:
    - mon.a
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    branch: dumpling
    clients:
      client.0:
      - cls/test_cls_rbd.sh
- ceph.restart:
    daemons:
    - mon.b
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rados/test-upgrade-firefly.sh
- install.upgrade:
    mon.c: null
- ceph.restart:
    daemons:
    - mon.c
    wait-for-healthy: false
    wait-for-osds-up: true
- ceph.wait_for_mon_quorum:
  - a
  - b
  - c
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rados/test-upgrade-firefly.sh
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rbd/test_librbd_python.sh
- rgw:
    client.0:
      idle_timeout: 300
- swift:
    client.0:
      rgw_server: client.0
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
teuthology_branch: master
tube: plana
verbose: false
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.plana.12155
client.0-kernel-sha1: 8362a1290d075f376ba68521ffb3b42ecaaecfea
description: upgrade/dumpling-x/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/rbd-cls.yaml
  6-next-mon/monb.yaml 7-workload/rados_api_tests.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml
  rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/ubuntu_14.04.yaml}
duration: 6957.407259941101
failure_reason: 'Command failed on plana57 with status 1: ''sudo adjust-ulimits ceph-coverage
  /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mds -f -i a'''
flavor: basic
mon.a-kernel-sha1: 8362a1290d075f376ba68521ffb3b42ecaaecfea
osd.3-kernel-sha1: 8362a1290d075f376ba68521ffb3b42ecaaecfea
owner: yuriw
success: false