Bug #44384
qa: FAIL: test_evicted_caps (tasks.cephfs.test_client_recovery.TestClientRecovery)
Description
2020-02-29T11:32:29.347 INFO:tasks.cephfs_test_runner:======================================================================
2020-02-29T11:32:29.347 INFO:tasks.cephfs_test_runner:FAIL: test_evicted_caps (tasks.cephfs.test_client_recovery.TestClientRecovery)
2020-02-29T11:32:29.347 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2020-02-29T11:32:29.347 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2020-02-29T11:32:29.348 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_batrick_ceph_wip-pdonnell-testing-20200229.001503/qa/tasks/cephfs/test_client_recovery.py", line 315, in test_evicted_caps
2020-02-29T11:32:29.348 INFO:tasks.cephfs_test_runner:    self.assertFalse(cap_waiter.finished)
2020-02-29T11:32:29.348 INFO:tasks.cephfs_test_runner:AssertionError: True is not false
From: /ceph/teuthology-archive/pdonnell-2020-02-29_02:56:38-kcephfs-wip-pdonnell-testing-20200229.001503-distro-basic-smithi/4811085/teuthology.log
See also
Failure: Test failure: test_evicted_caps (tasks.cephfs.test_client_recovery.TestClientRecovery)
2 jobs: ['4811085', '4811122']
suites intersection: ['clusters/1-mds-4-client.yaml', 'conf/{client.yaml', 'k-testing.yaml}', 'kcephfs/recovery/{begin.yaml', 'kclient/{mount.yaml', 'log-config.yaml', 'mds.yaml', 'mon.yaml', 'ms-die-on-skipped.yaml}}', 'osd-asserts.yaml', 'osd.yaml}', 'overrides/{distro/testing/{flavor/ubuntu_latest.yaml', 'overrides/{frag_enable.yaml', 'tasks/client-recovery.yaml}', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}']
suites union: ['clusters/1-mds-4-client.yaml', 'conf/{client.yaml', 'k-testing.yaml}', 'kcephfs/recovery/{begin.yaml', 'kclient/{mount.yaml', 'log-config.yaml', 'mds.yaml', 'mon.yaml', 'ms-die-on-skipped.yaml}}', 'objectstore-ec/bluestore-comp.yaml', 'objectstore-ec/bluestore-ec-root.yaml', 'osd-asserts.yaml', 'osd.yaml}', 'overrides/{distro/testing/{flavor/ubuntu_latest.yaml', 'overrides/{frag_enable.yaml', 'tasks/client-recovery.yaml}', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}']
Looks like yet another testing kernel regression.
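For context, the assertion at test_client_recovery.py:315 fires while the test still expects the background write to be blocked. A simplified sketch of the failing check (cap_waiter and write_background come from the traceback and the logs below; the mount name, the sleep duration, and the surrounding structure are approximations, not the exact qa code):

    # One client is hard-reset while holding caps on background_file.
    # A surviving client then starts a background write to the same file;
    # the write should block until the dead client's caps are revoked.
    cap_waiter = self.mount_b.write_background()
    time.sleep(5)
    # The dead client has not been evicted yet, so the write should
    # still be in flight here.
    self.assertFalse(cap_waiter.finished)

The assertion failing means the write completed even though the dead client's caps were still outstanding, which is why this was flagged as a likely kernel client problem.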
History
#1 Updated by Jeff Layton almost 4 years ago
Sorry for the long delay on this. I haven't heard of this happening since the original occurrence, but I decided to dig out the log and have a closer look. Basically, this test has one client do a write and expects that write to hang because the other client, which is down, hasn't given up its caps:
2020-02-29T11:30:43.006 DEBUG:tasks.cephfs.mount:File background_file became visible from 1 after 0s
2020-02-29T11:30:43.035 INFO:teuthology.orchestra.console:Performing hard reset of smithi196
2020-02-29T11:30:43.035 DEBUG:teuthology.orchestra.console:pexpect command: ipmitool -H smithi196.ipmi.sepia.ceph.com -I lanplus -U inktank -P ApGNXcA7 power reset
2020-02-29T11:30:43.096 INFO:teuthology.orchestra.console:Hard reset for smithi196 completed
2020-02-29T11:30:48.198 INFO:teuthology.orchestra.run.smithi160:> sudo adjust-ulimits daemon-helper kill python3 -c '
2020-02-29T11:30:48.199 INFO:teuthology.orchestra.run.smithi160:> import os
2020-02-29T11:30:48.199 INFO:teuthology.orchestra.run.smithi160:> import time
2020-02-29T11:30:48.199 INFO:teuthology.orchestra.run.smithi160:>
2020-02-29T11:30:48.199 INFO:teuthology.orchestra.run.smithi160:> fd = os.open("/home/ubuntu/cephtest/mnt.1/background_file", os.O_RDWR | os.O_CREAT, 0o644)
2020-02-29T11:30:48.199 INFO:teuthology.orchestra.run.smithi160:> try:
2020-02-29T11:30:48.200 INFO:teuthology.orchestra.run.smithi160:>     while True:
2020-02-29T11:30:48.200 INFO:teuthology.orchestra.run.smithi160:>         os.write(fd, b'"'"'content'"'"')
2020-02-29T11:30:48.200 INFO:teuthology.orchestra.run.smithi160:>         time.sleep(1)
2020-02-29T11:30:48.200 INFO:teuthology.orchestra.run.smithi160:>         if not False:
2020-02-29T11:30:48.200 INFO:teuthology.orchestra.run.smithi160:>             break
2020-02-29T11:30:48.200 INFO:teuthology.orchestra.run.smithi160:> except IOError as e:
2020-02-29T11:30:48.201 INFO:teuthology.orchestra.run.smithi160:>     pass
2020-02-29T11:30:48.201 INFO:teuthology.orchestra.run.smithi160:> os.close(fd)
2020-02-29T11:30:48.201 INFO:teuthology.orchestra.run.smithi160:> '
2020-02-29T11:31:06.852 INFO:teuthology.orchestra.run.smithi130:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-29T11:31:06.856 INFO:teuthology.orchestra.run.smithi132:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-29T11:31:06.861 INFO:teuthology.orchestra.run.smithi160:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-29T11:31:06.866 INFO:teuthology.orchestra.run.smithi162:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-29T11:31:06.874 INFO:teuthology.orchestra.run.smithi183:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-29T11:31:06.877 INFO:teuthology.orchestra.run.smithi196:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-29T11:31:23.208 INFO:teuthology.misc:Re-opening connections...
2020-02-29T11:31:23.208 INFO:teuthology.misc:trying to connect to ubuntu@smithi196.front.sepia.ceph.com
2020-02-29T11:31:23.210 INFO:teuthology.orchestra.remote:Trying to reconnect to host
2020-02-29T11:31:23.210 DEBUG:teuthology.orchestra.connection:{'username': 'ubuntu', 'hostname': 'smithi196.front.sepia.ceph.com', 'timeout': 60}
2020-02-29T11:31:23.240 DEBUG:tasks.ceph:Missed logrotate, EOFError
2020-02-29T11:31:53.241 INFO:teuthology.orchestra.run.smithi130:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-29T11:31:53.244 INFO:teuthology.orchestra.run.smithi132:> sudo logrotate /etc/logrotate.d/ceph-test.conf
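For readability, here is the python3 -c payload from the log above with the shell quoting unwrapped (evidently the qa write_background helper's script; the comment on the loop condition is my inference about a substituted template flag):

    import os
    import time

    fd = os.open("/home/ubuntu/cephtest/mnt.1/background_file", os.O_RDWR | os.O_CREAT, 0o644)
    try:
        while True:
            os.write(fd, b'content')
            time.sleep(1)
            if not False:  # looks like loop=False was substituted in: write once, then exit
                break
    except IOError as e:
        pass
    os.close(fd)

Note that any IOError from os.write() is silently swallowed by the bare except/pass.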
Unfortunately, it doesn't say what happened during the write attempt. Did it succeed? Did it fail with a hard error besides -EIO? We may need to doctor this test to log more info about what happened with the write.
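One low-effort option would be to have the background script report the write's outcome instead of swallowing it. A sketch of the idea, not an actual patch; the stderr reporting and the OSError handling are my assumptions:

    import errno
    import os
    import sys
    import time

    fd = os.open("/home/ubuntu/cephtest/mnt.1/background_file", os.O_RDWR | os.O_CREAT, 0o644)
    try:
        while True:
            n = os.write(fd, b'content')
            # Report successful writes so the job log shows when (and
            # whether) the write actually completed.
            print("write returned {0}".format(n), file=sys.stderr)
            time.sleep(1)
            if not False:
                break
    except OSError as e:
        # Report which errno we got (EIO, ESHUTDOWN, ...) rather than
        # silently eating the exception.
        print("write failed: errno {0} ({1})".format(
            e.errno, errno.errorcode.get(e.errno, "?")), file=sys.stderr)
    os.close(fd)

Since teuthology captures the command's stdout/stderr into the job log, this would tell us whether the write succeeded immediately, blocked as expected, or failed with some other error.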
#2 Updated by Patrick Donnelly almost 4 years ago
- Target version changed from v15.0.0 to v16.0.0
#3 Updated by Patrick Donnelly about 3 years ago
- Target version changed from v16.0.0 to v17.0.0
- Backport set to pacific,octopus,nautilus
#4 Updated by Jeff Layton almost 3 years ago
- Priority changed from Urgent to Normal
Lowering priority to Normal. Patrick have there been any more occurrences of this?
#5 Updated by Patrick Donnelly almost 3 years ago
- Status changed from New to Can't reproduce
Jeff Layton wrote:
Lowering priority to Normal. Patrick, have there been any more occurrences of this?
I have not seen it recently. We'll close it for now and I'll reopen if I see it again.