Bug #14684
test_scrub_checks fails (Closed)
Description
2016-02-03T22:32:28.995 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2016-02-03T22:32:28.995 INFO:tasks.cephfs_test_runner:  File "/var/lib/teuthworker/src/ceph-qa-suite_jewel/tasks/cephfs/test_scrub_checks.py", line 39, in test_scrub_checks
2016-02-03T22:32:28.995 INFO:tasks.cephfs_test_runner:    self._checks(1)
2016-02-03T22:32:28.995 INFO:tasks.cephfs_test_runner:  File "/var/lib/teuthworker/src/ceph-qa-suite_jewel/tasks/cephfs/test_scrub_checks.py", line 123, in _checks
2016-02-03T22:32:28.995 INFO:tasks.cephfs_test_runner:    self.asok_command(mds_rank, command, success_validator)
2016-02-03T22:32:28.995 INFO:tasks.cephfs_test_runner:  File "/var/lib/teuthworker/src/ceph-qa-suite_jewel/tasks/cephfs/test_scrub_checks.py", line 149, in asok_command
2016-02-03T22:32:28.996 INFO:tasks.cephfs_test_runner:    mds_id = self.fs.get_active_names()[mds_rank]
2016-02-03T22:32:28.996 INFO:tasks.cephfs_test_runner:IndexError: list index out of range
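The IndexError suggests the test indexes into the active-MDS list while a takeover is in flight, so rank 1 is momentarily absent from `get_active_names()`. A minimal sketch of a defensive poll before indexing; the helper name, timeout values, and polling approach are illustrative assumptions, not the actual fix:

```python
import time

def wait_for_active(get_active_names, mds_rank, timeout=30, interval=2):
    """Poll until the MDS map has an active daemon at `mds_rank`.

    `get_active_names` stands in for Filesystem.get_active_names();
    this is a hypothetical guard, not code from the test suite.
    """
    active = []
    elapsed = 0.0
    while elapsed < timeout:
        active = get_active_names()
        if len(active) > mds_rank:
            return active[mds_rank]
        # Rank not active yet (e.g. mid-failover); retry instead of
        # letting active[mds_rank] raise IndexError.
        time.sleep(interval)
        elapsed += interval
    raise RuntimeError(
        "rank %d never became active (last map: %r)" % (mds_rank, active))
```

This keeps a transient failover from crashing the test with IndexError, turning it into either a successful wait or an explicit timeout error.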
Updated by Greg Farnum about 8 years ago
- Priority changed from Normal to High
http://pulpito.ceph.com/teuthology-2016-02-10_14:03:02-fs-jewel---basic-smithi/4160/
2016-02-10T21:04:11.934 INFO:tasks.cephfs_test_runner:======================================================================
2016-02-10T21:04:11.934 INFO:tasks.cephfs_test_runner:ERROR: test_scrub_checks (tasks.cephfs.test_scrub_checks.TestScrubChecks)
2016-02-10T21:04:11.935 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2016-02-10T21:04:11.935 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2016-02-10T21:04:11.935 INFO:tasks.cephfs_test_runner:  File "/var/lib/teuthworker/src/ceph-qa-suite_jewel/tasks/cephfs/test_scrub_checks.py", line 39, in test_scrub_checks
2016-02-10T21:04:11.935 INFO:tasks.cephfs_test_runner:    self._checks(1)
2016-02-10T21:04:11.935 INFO:tasks.cephfs_test_runner:  File "/var/lib/teuthworker/src/ceph-qa-suite_jewel/tasks/cephfs/test_scrub_checks.py", line 133, in _checks
2016-02-10T21:04:11.935 INFO:tasks.cephfs_test_runner:    lambda j, r: json_validator(j, r, "return_code", -errno.ENODATA))
2016-02-10T21:04:11.936 INFO:tasks.cephfs_test_runner:  File "/var/lib/teuthworker/src/ceph-qa-suite_jewel/tasks/cephfs/test_scrub_checks.py", line 156, in asok_command
2016-02-10T21:04:11.936 INFO:tasks.cephfs_test_runner:    jout = json.loads(sout)
2016-02-10T21:04:11.936 INFO:tasks.cephfs_test_runner:  File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
2016-02-10T21:04:11.936 INFO:tasks.cephfs_test_runner:    return _default_decoder.decode(s)
2016-02-10T21:04:11.936 INFO:tasks.cephfs_test_runner:  File "/usr/lib/python2.7/json/decoder.py", line 369, in decode
2016-02-10T21:04:11.936 INFO:tasks.cephfs_test_runner:    raise ValueError(errmsg("Extra data", s, end, len(s)))
2016-02-10T21:04:11.936 INFO:tasks.cephfs_test_runner:ValueError: Extra data: line 2 column 1 - line 3 column 1 (char 17 - 24)
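The "Extra data" failure means the admin socket returned a valid JSON document followed by trailing text, and `json.loads()` rejects anything past the first document. A small illustration, using hypothetical socket output, of tolerating the trailing text with `json.JSONDecoder.raw_decode()`:

```python
import json

# Hypothetical asok output: one JSON document plus a trailing status
# line. This second line is exactly what makes json.loads() raise
# "ValueError: Extra data".
sout = '{"return_code": 0}\nscrub queued\n'

try:
    json.loads(sout)
except ValueError as e:
    print("strict parse failed:", e)

# raw_decode() parses the first JSON value and reports the index where
# it stopped, so any trailing non-JSON output can be handled separately.
decoder = json.JSONDecoder()
jout, end = decoder.raw_decode(sout)
trailing = sout[end:].strip()
print(jout, repr(trailing))
```

Whether the test should tolerate the extra output or the daemon should not emit it is a separate question; this only shows why the parse blows up and one way a test could be made robust to it.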
Updated by Greg Farnum about 8 years ago
http://pulpito.ceph.com/teuthology-2016-02-12_14:03:01-fs-jewel---basic-smithi/8245/
2016-02-12T23:59:13.690 INFO:tasks.cephfs_test_runner:ValueError: Extra data: line 2 column 1 - line 3 column 1 (char 17 - 24)
Strangely, we haven't seen it on the master branch yet.
Updated by Greg Farnum about 8 years ago
http://pulpito.ceph.com/teuthology-2016-02-05_14:03:02-fs-jewel---basic-smithi/8806/
Same as the original case
Updated by Greg Farnum about 8 years ago
gregf-2016-02-15_13:33:12-fs-greg-fs-testing-215-sure---basic-smithi/10011/
Same as the original failure; and here we have it on master(+).
Updated by Greg Farnum about 8 years ago
- Priority changed from High to Urgent
And again: http://pulpito.ceph.com/gregf-2016-02-15_18:08:49-fs-greg-fs-testing-215-1---basic-smithi/10771/
It doesn't seem to be consistent, although all these examples are on smithi, so maybe we've got a lovely timing bug?
Updated by Zheng Yan about 8 years ago
http://pulpito.ceph.com/teuthology-2016-02-19_14:03:01-fs-jewel---basic-smithi/17838/
2016-02-19T19:49:01.019 INFO:tasks.cephfs_test_runner:======================================================================
2016-02-19T19:49:01.019 INFO:tasks.cephfs_test_runner:ERROR: test_scrub_checks (tasks.cephfs.test_scrub_checks.TestScrubChecks)
2016-02-19T19:49:01.019 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2016-02-19T19:49:01.020 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2016-02-19T19:49:01.020 INFO:tasks.cephfs_test_runner:  File "/var/lib/teuthworker/src/ceph-qa-suite_jewel/tasks/cephfs/test_scrub_checks.py", line 39, in test_scrub_checks
2016-02-19T19:49:01.020 INFO:tasks.cephfs_test_runner:    self._checks(1)
2016-02-19T19:49:01.020 INFO:tasks.cephfs_test_runner:  File "/var/lib/teuthworker/src/ceph-qa-suite_jewel/tasks/cephfs/test_scrub_checks.py", line 123, in _checks
2016-02-19T19:49:01.020 INFO:tasks.cephfs_test_runner:    self.asok_command(mds_rank, command, success_validator)
2016-02-19T19:49:01.021 INFO:tasks.cephfs_test_runner:  File "/var/lib/teuthworker/src/ceph-qa-suite_jewel/tasks/cephfs/test_scrub_checks.py", line 149, in asok_command
2016-02-19T19:49:01.021 INFO:tasks.cephfs_test_runner:    mds_id = self.fs.get_active_names()[mds_rank]
2016-02-19T19:49:01.021 INFO:tasks.cephfs_test_runner:IndexError: list index out of range
Updated by Zheng Yan about 8 years ago
- Status changed from New to Fix Under Review
- Assignee set to John Spray
2016-02-20 03:49:00.334717 mon.0 172.21.15.7:6789/0 142 : cluster [INF] pgmap v35: 32 pgs: 32 active+clean; 12393 kB data, 593 MB used, 368 GB / 372 GB avail; 354 B/s rd, 96511 B/s wr, 400 op/s
2016-02-20 03:49:00.334819 mon.0 172.21.15.7:6789/0 143 : cluster [INF] mds.0 172.21.15.7:6808/19849 up:active
2016-02-20 03:49:00.392920 mon.0 172.21.15.7:6789/0 144 : cluster [INF] mdsmap e18: 1/1/1 up {0=a-s=up:active}, 1 up:standby
2016-02-20 03:49:00.515417 mon.0 172.21.15.7:6789/0 145 : cluster [INF] osdmap e16: 4 osds: 4 up, 4 in
2016-02-20 03:49:00.565680 mon.0 172.21.15.7:6789/0 146 : cluster [INF] mdsmap e19: 1/1/1 up {0=a=up:replay}
2016-02-20 03:49:00.565745 mon.0 172.21.15.7:6789/0 147 : cluster [INF] pgmap v36: 32 pgs: 32 active+clean; 12393 kB data, 593 MB used, 368 GB / 372 GB avail; 0 B/s rd, 0 B/s wr, 323 op/s
2016-02-20 03:49:01.599201 mon.0 172.21.15.7:6789/0 150 : cluster [INF] mds.? 172.21.15.7:6809/19849 up:boot
2016-02-20 03:49:01.599342 mon.0 172.21.15.7:6789/0 151 : cluster [INF] mds.0 172.21.15.18:6808/29561 up:reconnect
2016-02-20 03:49:01.599467 mon.0 172.21.15.7:6789/0 152 : cluster [INF] mdsmap e20: 1/1/1 up {0=a=up:reconnect}, 1 up:standby
There were MDS takeovers even though we didn't enable MDS thrashing.
2016-02-20 03:48:59.809408 7fd4c8036700 1 -- 172.21.15.7:6789/0 <== mds.0 172.21.15.7:6808/19849 20 ==== mdsbeacon(4179/a-s up:active seq 12 v17) v4 ==== 355+0+0 (3921659038 0 0) 0x7fd4db429c00 con 0x7fd4db7cb080
2016-02-20 03:48:59.809480 7fd4c8036700 20 mon.b@0(leader) e1 _ms_dispatch existing session 0x7fd4dba8cc40 for mds.? 172.21.15.7:6808/19849
2016-02-20 03:48:59.809489 7fd4c8036700 20 mon.b@0(leader) e1 caps allow *
2016-02-20 03:48:59.809500 7fd4c8036700 10 mon.b@0(leader).paxosservice(mdsmap 1..17) dispatch 0x7fd4db429c00 mdsbeacon(4179/a-s up:active seq 12 v17) v4 from mds.0 172.21.15.7:6808/19849 con 0x7fd4db7cb080
2016-02-20 03:48:59.809510 7fd4c8036700 5 mon.b@0(leader).paxos(paxos active c 1..124) is_readable = 1 - now=2016-02-20 03:48:59.809510 lease_expire=2016-02-20 03:49:04.074688 has v0 lc 124
2016-02-20 03:48:59.809526 7fd4c8036700 10 mon.b@0(leader).mds e17 preprocess_query mdsbeacon(4179/a-s up:active seq 12 v17) v4 from mds.0 172.21.15.7:6808/19849
2016-02-20 03:48:59.809541 7fd4c8036700 20 is_capable service=mds command= exec on cap allow *
2016-02-20 03:48:59.809545 7fd4c8036700 20 allow so far , doing grant allow *
2016-02-20 03:48:59.809546 7fd4c8036700 20 allow all
2016-02-20 03:48:59.809548 7fd4c8036700 12 mon.b@0(leader).mds e17 preprocess_beacon mdsbeacon(4179/a-s up:active seq 12 v17) v4 from mds.0 172.21.15.7:6808/19849 compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table}
2016-02-20 03:48:59.809562 7fd4c8036700 20 mon.b@0(leader).mds e17 preprocess_beacon health metrics for gid 4179 were updated
2016-02-20 03:49:00.334984 7fd4c9839700 10 mon.b@0(leader).mds e18 e18: 1/1/1 up {0=a-s=up:active}, 1 up:standby
2016-02-20 03:49:00.335006 7fd4c9839700 10 mon.b@0(leader).mds e18 no beacon from 4179 172.21.15.7:6808/19849 mds.0.2 up:active since 2016-02-20 03:48:43.808993
2016-02-20 03:49:00.335022 7fd4c9839700 10 mon.b@0(leader).mds e18 replacing 4179 172.21.15.7:6808/19849 mds.0.2 up:active with 4136/a 172.21.15.18:6808/29561
Looks like the monitor does not note the beacon when only the health metrics change, so it later thinks the MDS stopped beaconing and replaces it.
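If that reading is right, the failure mode can be illustrated with a toy model (this is not the real MDSMonitor code; class, method names, and the grace value are stand-ins): beacons that only carry a health-metric update return early from preprocessing without refreshing the last-beacon timestamp, so the periodic tick sees a "stale" beacon and fails the daemon over.

```python
MDS_BEACON_GRACE = 15.0  # seconds; stand-in for mds_beacon_grace

class BeaconTracker:
    """Toy model of monitor-side beacon bookkeeping, not Ceph code."""

    def __init__(self):
        self.last_beacon = {}  # gid -> timestamp of last *recorded* beacon

    def preprocess_beacon(self, gid, now, health_changed, buggy=True):
        if buggy and health_changed:
            # Modeled bug: the health update is handled and processing
            # stops, so the beacon time is never recorded for this gid.
            return
        self.last_beacon[gid] = now

    def laggy(self, gid, now):
        # The tick-side check: no recorded beacon within the grace period.
        return now - self.last_beacon.get(gid, 0.0) > MDS_BEACON_GRACE

tracker = BeaconTracker()
tracker.preprocess_beacon(4179, now=0.0, health_changed=False)
# Beacons keep arriving on time, but each carries a health-metric change:
for t in (4.0, 8.0, 12.0, 16.0):
    tracker.preprocess_beacon(4179, now=t, health_changed=True, buggy=True)
print(tracker.laggy(4179, now=16.5))  # monitor wrongly treats the MDS as laggy
```

This matches the log above: the MDS was sending beacons (`preprocess_beacon ... health metrics for gid 4179 were updated`) yet the monitor reported `no beacon from 4179` and replaced it with the standby.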
Updated by Sage Weil about 8 years ago
- Status changed from Fix Under Review to Resolved