Bug #21466
qa: fs.get_config on stopped MDS
Status: closed
% Done: 0%
Source: Development
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: kcephfs
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2017-09-16T23:29:00.018 INFO:teuthology.orchestra.run.smithi179:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 0 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-mds.a.asok config get mds_tick_interval'
2017-09-16T23:29:00.128 INFO:teuthology.orchestra.run.smithi179.stderr:admin_socket: exception getting command descriptions: [Errno 111] Connection refused
2017-09-16T23:29:00.145 INFO:tasks.cephfs_test_runner:test_replicated_delete_speed (tasks.cephfs.test_strays.TestStrays) ... ERROR
From: http://pulpito.ceph.com/yuriw-2017-09-16_21:36:24-kcephfs-luminous-testing-basic-smithi/1641261/
Oddly, we're missing debug messages for the respawn when the MDS is failed:
2017-09-16 23:28:12.823778 7f65f7073700 10 mds.beacon.a handle_mds_beacon up:active seq 7 rtt 0.000646
2017-09-16 23:28:28.563449 7fef1b9cb180 0 ceph version 12.2.0-250-gddf8424 (ddf84249fa8a8ec3655c39bac5331ab81c0307b1) luminous (stable), process (unknown), pid 9945
2017-09-16 23:28:28.565632 7fef1b9cb180 1 -- 0.0.0.0:6805/1093516685 _finish_bind bind my_inst.addr is 0.0.0.0:6805/1093516685
2017-09-16 23:28:28.568777 7fef1b9cb180 1 -- 0.0.0.0:6805/1093516685 start start
From: /ceph/teuthology-archive/yuriw-2017-09-16_21:36:24-kcephfs-luminous-testing-basic-smithi/1641261/remote/smithi179/log/ceph-mds.a.log.gz
The `mds fail` happened here:
2017-09-16T23:28:42.355 DEBUG:tasks.ceph.mds.a:waiting for process to exit
2017-09-16T23:28:42.355 INFO:teuthology.orchestra.run:waiting for 300
2017-09-16T23:28:42.400 INFO:tasks.ceph.mds.a:Stopped
2017-09-16T23:28:42.400 INFO:teuthology.orchestra.run.smithi179:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph mds fail a'
2017-09-16T23:28:43.096 INFO:teuthology.orchestra.run.smithi179.stderr:failed mds gid 5160
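Since the excerpts come from two different logs, it can help to put the quoted timestamps on one timeline. The sketch below only reorders timestamps already quoted in this report. It assumes the teuthology log and the MDS log share the same clock, and the event labels and the `parse_ts`/`timeline` helpers are made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the log excerpts in this report. Treating the
# teuthology log and the MDS log as sharing one clock is an assumption.
EVENTS = [
    ("2017-09-16 23:28:28.563449", "mds.a log: process start, pid 9945"),
    ("2017-09-16T23:28:42.400", "teuthology: tasks.ceph.mds.a Stopped"),
    ("2017-09-16T23:28:43.096", "teuthology: failed mds gid 5160"),
    ("2017-09-16T23:29:00.018", "teuthology: config get mds_tick_interval"),
    ("2017-09-16T23:29:00.128", "teuthology: admin_socket Connection refused"),
]


def parse_ts(text):
    """Parse either log's timestamp style ('T' or space separated)."""
    return datetime.strptime(text.replace("T", " "), "%Y-%m-%d %H:%M:%S.%f")


def timeline(events):
    """Return (seconds since first event, label) pairs in time order."""
    parsed = sorted((parse_ts(ts), label) for ts, label in events)
    start = parsed[0][0]
    return [((when - start).total_seconds(), label) for when, label in parsed]


if __name__ == "__main__":
    for offset, label in timeline(EVENTS):
        print(f"+{offset:7.3f}s  {label}")
```

Read this way, the `config get` on the admin socket is issued after teuthology logged `Stopped` for mds.a.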
The actual failure of the test seems to be that the respawned MDS in up:standby didn't respond to the admin socket:
2017-09-16T23:29:00.128 INFO:teuthology.orchestra.run.smithi179.stderr:admin_socket: exception getting command descriptions: [Errno 111] Connection refused
And the MDS log shows:
2017-09-16 23:28:36.572507 7fef15596700 10 mds.beacon.a handle_mds_beacon up:standby seq 3 rtt 0.000431
2017-09-16 23:28:38.573949 7fef13592700 1 -- 172.21.15.179:6805/1093516685 --> 172.21.15.112:6800/14978 -- mgrreport(unknown.a +0-0 packed 358) v4 -- 0x564540eaeb00 con 0
2017-09-16 23:28:40.572177 7fef12590700 10 mds.beacon.a _send up:standby seq 4
2017-09-16 23:28:40.572210 7fef12590700 1 -- 172.21.15.179:6805/1093516685 --> 172.21.15.179:6789/0 -- mdsbeacon(5160/a up:standby seq 4 v208) v7 -- 0x564540ec2680 con 0
2017-09-16 23:28:40.572742 7fef15596700 1 -- 172.21.15.179:6805/1093516685 <== mon.0 172.21.15.179:6789/0 12 ==== mdsbeacon(5160/a up:standby seq 4 v208) v7 ==== 126+0+0 (1487111955 0 0) 0x564540ec2680 con 0x564540e57800
2017-09-16 23:28:40.572791 7fef15596700 10 mds.beacon.a handle_mds_beacon up:standby seq 4 rtt 0.000595
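The `[Errno 111] Connection refused` above is what a client sees when nothing is listening on the admin socket path. As a rough illustration of how a test helper could check for that before running `config get`, here is a minimal sketch using only the Python standard library. The function name `admin_socket_reachable` is hypothetical and is not part of teuthology or the Ceph QA tasks, and this is not the fix that was applied for this bug.

```python
import errno
import socket


def admin_socket_reachable(path, timeout=1.0):
    """Return True if something accepts connections on the Unix socket at path.

    Hypothetical helper for illustration only. A missing socket file (ENOENT)
    or a stale one with no listener (ECONNREFUSED) both count as unreachable.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return True
    except OSError as exc:
        if exc.errno in (errno.ECONNREFUSED, errno.ENOENT):
            return False
        raise  # anything else (permissions, timeout) is left to the caller
    finally:
        sock.close()


if __name__ == "__main__":
    # Path taken from the failing command in this report.
    print(admin_socket_reachable("/var/run/ceph/ceph-mds.a.asok"))
```

A caller could skip or retry the admin socket query when this returns False, rather than letting the test error out.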