Bug #44384

qa: FAIL: test_evicted_caps (tasks.cephfs.test_client_recovery.TestClientRecovery)

Added by Patrick Donnelly about 4 years ago. Updated almost 3 years ago.

Status: Can't reproduce
Priority: Normal
Assignee:
Category: -
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport: pacific,octopus,nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): kceph
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2020-02-29T11:32:29.347 INFO:tasks.cephfs_test_runner:======================================================================
2020-02-29T11:32:29.347 INFO:tasks.cephfs_test_runner:FAIL: test_evicted_caps (tasks.cephfs.test_client_recovery.TestClientRecovery)
2020-02-29T11:32:29.347 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2020-02-29T11:32:29.347 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2020-02-29T11:32:29.348 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_batrick_ceph_wip-pdonnell-testing-20200229.001503/qa/tasks/cephfs/test_client_recovery.py", line 315, in test_evicted_caps
2020-02-29T11:32:29.348 INFO:tasks.cephfs_test_runner:    self.assertFalse(cap_waiter.finished)
2020-02-29T11:32:29.348 INFO:tasks.cephfs_test_runner:AssertionError: True is not false

From: /ceph/teuthology-archive/pdonnell-2020-02-29_02:56:38-kcephfs-wip-pdonnell-testing-20200229.001503-distro-basic-smithi/4811085/teuthology.log

See also

Failure: Test failure: test_evicted_caps (tasks.cephfs.test_client_recovery.TestClientRecovery)
2 jobs: ['4811085', '4811122']
suites intersection: ['clusters/1-mds-4-client.yaml', 'conf/{client.yaml', 'k-testing.yaml}', 'kcephfs/recovery/{begin.yaml', 'kclient/{mount.yaml', 'log-config.yaml', 'mds.yaml', 'mon.yaml', 'ms-die-on-skipped.yaml}}', 'osd-asserts.yaml', 'osd.yaml}', 'overrides/{distro/testing/{flavor/ubuntu_latest.yaml', 'overrides/{frag_enable.yaml', 'tasks/client-recovery.yaml}', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}']
suites union: ['clusters/1-mds-4-client.yaml', 'conf/{client.yaml', 'k-testing.yaml}', 'kcephfs/recovery/{begin.yaml', 'kclient/{mount.yaml', 'log-config.yaml', 'mds.yaml', 'mon.yaml', 'ms-die-on-skipped.yaml}}', 'objectstore-ec/bluestore-comp.yaml', 'objectstore-ec/bluestore-ec-root.yaml', 'osd-asserts.yaml', 'osd.yaml}', 'overrides/{distro/testing/{flavor/ubuntu_latest.yaml', 'overrides/{frag_enable.yaml', 'tasks/client-recovery.yaml}', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}']

Looks like yet another testing kernel regression.

History

#1 Updated by Jeff Layton almost 4 years ago

Sorry for the long delay on this. I haven't heard of this happening since the original occurrence, but I decided to dig out the log and have a closer look. Basically, this test has one client do a write and expects that write to hang because the other client (which is down) has not given up its caps; a de-quoted copy of that background write follows the log excerpt below:

2020-02-29T11:30:43.006 DEBUG:tasks.cephfs.mount:File background_file became visible from 1 after 0s
2020-02-29T11:30:43.035 INFO:teuthology.orchestra.console:Performing hard reset of smithi196
2020-02-29T11:30:43.035 DEBUG:teuthology.orchestra.console:pexpect command: ipmitool -H smithi196.ipmi.sepia.ceph.com -I lanplus -U inktank -P ApGNXcA7 power reset
2020-02-29T11:30:43.096 INFO:teuthology.orchestra.console:Hard reset for smithi196 completed
2020-02-29T11:30:48.198 INFO:teuthology.orchestra.run.smithi160:> sudo adjust-ulimits daemon-helper kill python3 -c '
2020-02-29T11:30:48.199 INFO:teuthology.orchestra.run.smithi160:> import os
2020-02-29T11:30:48.199 INFO:teuthology.orchestra.run.smithi160:> import time
2020-02-29T11:30:48.199 INFO:teuthology.orchestra.run.smithi160:>
2020-02-29T11:30:48.199 INFO:teuthology.orchestra.run.smithi160:> fd = os.open("/home/ubuntu/cephtest/mnt.1/background_file", os.O_RDWR | os.O_CREAT, 0o644)
2020-02-29T11:30:48.199 INFO:teuthology.orchestra.run.smithi160:> try:
2020-02-29T11:30:48.200 INFO:teuthology.orchestra.run.smithi160:>     while True:
2020-02-29T11:30:48.200 INFO:teuthology.orchestra.run.smithi160:>         os.write(fd, b'"'"'content'"'"')
2020-02-29T11:30:48.200 INFO:teuthology.orchestra.run.smithi160:>         time.sleep(1)
2020-02-29T11:30:48.200 INFO:teuthology.orchestra.run.smithi160:>         if not False:
2020-02-29T11:30:48.200 INFO:teuthology.orchestra.run.smithi160:>             break
2020-02-29T11:30:48.200 INFO:teuthology.orchestra.run.smithi160:> except IOError as e:
2020-02-29T11:30:48.201 INFO:teuthology.orchestra.run.smithi160:>     pass
2020-02-29T11:30:48.201 INFO:teuthology.orchestra.run.smithi160:> os.close(fd)
2020-02-29T11:30:48.201 INFO:teuthology.orchestra.run.smithi160:> '
2020-02-29T11:31:06.852 INFO:teuthology.orchestra.run.smithi130:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-29T11:31:06.856 INFO:teuthology.orchestra.run.smithi132:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-29T11:31:06.861 INFO:teuthology.orchestra.run.smithi160:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-29T11:31:06.866 INFO:teuthology.orchestra.run.smithi162:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-29T11:31:06.874 INFO:teuthology.orchestra.run.smithi183:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-29T11:31:06.877 INFO:teuthology.orchestra.run.smithi196:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-29T11:31:23.208 INFO:teuthology.misc:Re-opening connections...
2020-02-29T11:31:23.208 INFO:teuthology.misc:trying to connect to ubuntu@smithi196.front.sepia.ceph.com
2020-02-29T11:31:23.210 INFO:teuthology.orchestra.remote:Trying to reconnect to host
2020-02-29T11:31:23.210 DEBUG:teuthology.orchestra.connection:{'username': 'ubuntu', 'hostname': 'smithi196.front.sepia.ceph.com', 'timeout': 60}
2020-02-29T11:31:23.240 DEBUG:tasks.ceph:Missed logrotate, EOFError
2020-02-29T11:31:53.241 INFO:teuthology.orchestra.run.smithi130:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-29T11:31:53.244 INFO:teuthology.orchestra.run.smithi132:> sudo logrotate /etc/logrotate.d/ceph-test.conf

Unfortunately, the log doesn't say what happened during the write attempt. Did it succeed? Did it fail with a hard error other than -EIO? We may need to doctor this test to log more information about what happened with the write.
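
As a rough illustration of the kind of extra logging being suggested here (purely a sketch; this is not the actual qa helper, and the exact reporting mechanism would need to fit how teuthology launches the writer), the background write could report how it ended so the outcome shows up in the teuthology log:

import os
import sys

# Hypothetical, instrumented variant of the background writer: report
# whether the single write succeeded or why it failed, instead of
# silently swallowing the error.
path = "/home/ubuntu/cephtest/mnt.1/background_file"
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
try:
    n = os.write(fd, b'content')
    print("background write returned %d bytes" % n, file=sys.stderr)
except OSError as e:  # IOError is an alias of OSError on python3
    print("background write failed: errno=%s (%s)" % (e.errno, e.strerror),
          file=sys.stderr)
finally:
    os.close(fd)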

#2 Updated by Patrick Donnelly almost 4 years ago

  • Target version changed from v15.0.0 to v16.0.0

#3 Updated by Patrick Donnelly about 3 years ago

  • Target version changed from v16.0.0 to v17.0.0
  • Backport set to pacific,octopus,nautilus

#4 Updated by Jeff Layton almost 3 years ago

  • Priority changed from Urgent to Normal

Lowering priority to Normal. Patrick, have there been any more occurrences of this?

#5 Updated by Patrick Donnelly almost 3 years ago

  • Status changed from New to Can't reproduce

Jeff Layton wrote:

Lowering priority to Normal. Patrick, have there been any more occurrences of this?

I have not seen it recently. We'll close it for now and I'll reopen if I see it again.
