Bug #45434

qa: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failed

Added by Ramana Raja 7 months ago. Updated about 1 month ago.

Status: Triaged
Priority: Urgent
Assignee:
Category: -
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport: octopus,nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): qa-suite
Labels (FS): qa, qa-failure
Pull request ID:
Crash signature:

Description

http://pulpito.ceph.com/yuriw-2020-05-05_20:55:43-kcephfs-wip-yuri-testing-2020-05-05-1439-distro-basic-smithi/5026174/

2020-05-06T18:04:48.476 INFO:tasks.cephfs_test_runner:======================================================================
2020-05-06T18:04:48.476 INFO:tasks.cephfs_test_runner:ERROR: test_full_fsync (tasks.cephfs.test_full.TestClusterFull)
2020-05-06T18:04:48.476 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2020-05-06T18:04:48.476 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2020-05-06T18:04:48.476 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-yuri-testing-2020-05-05-1439/qa/tasks/cephfs/test_full.py", line 356, in test_full_fsync
2020-05-06T18:04:48.476 INFO:tasks.cephfs_test_runner:    self._remote_write_test(remote_script)
2020-05-06T18:04:48.476 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-yuri-testing-2020-05-05-1439/qa/tasks/cephfs/test_full.py", line 222, in _remote_write_test
2020-05-06T18:04:48.476 INFO:tasks.cephfs_test_runner:    is_fuse=isinstance(self.mount_a, FuseMount)
2020-05-06T18:04:48.476 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-yuri-testing-2020-05-05-1439/qa/tasks/cephfs/mount.py", line 191, in run_python
2020-05-06T18:04:48.477 INFO:tasks.cephfs_test_runner:    p.wait()
2020-05-06T18:04:48.477 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_py2/teuthology/orchestra/run.py", line 162, in wait
2020-05-06T18:04:48.477 INFO:tasks.cephfs_test_runner:    self._raise_for_status()
2020-05-06T18:04:48.477 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_py2/teuthology/orchestra/run.py", line 184, in _raise_for_status
2020-05-06T18:04:48.477 INFO:tasks.cephfs_test_runner:    node=self.hostname, label=self.label
2020-05-06T18:04:48.477 INFO:tasks.cephfs_test_runner:CommandFailedError: Command failed on smithi171 with status 1: 'sudo adjust-ulimits daemon-helper kill python3 -c \'\nimport time\nimport datetime\nimport subprocess\nimport os\n\n# Write some buffered data through before going full, all should be well\nprint("writing some data through which we expect to succeed")\nbytes = 0\nf = os.open("/home/ubuntu/cephtest/mnt.0/full_test_file", os.O_WRONLY | os.O_CREAT)\nbytes += os.write(f, b\'"\'"\'a\'"\'"\' * 4096)\nos.fsync(f)\nprint("fsync\'"\'"\'ed data successfully, will now attempt to fill fs")\n\n# Okay, now we\'"\'"\'re going to fill up the filesystem, and then keep\n# writing until we see an error from fsync.  As long as we\'"\'"\'re doing\n# buffered IO, the error should always only appear from fsync and not\n# from write\nfull = False\n\nfor n in range(0, int(373 * 1.1)):\n    try:\n        bytes += os.write(f, b\'"\'"\'x\'"\'"\' * 1024 * 1024)\n        print("wrote bytes via buffered write, moving on to fsync")\n    except OSError as e:\n        print("Unexpected error %s from write() instead of fsync()" % e)\n        raise\n\n    try:\n        os.fsync(f)\n        print("fsync\'"\'"\'ed successfully")\n    except OSError as e:\n        print("Reached fullness after %.2f MB" % (bytes / (1024.0 * 1024.0)))\n        full = True\n        break\n    else:\n        print("Not full yet after %.2f MB" % (bytes / (1024.0 * 1024.0)))\n\n    if n > 373 * 0.9:\n        # Be cautious in the last region where we expect to hit\n        # the full condition, so that we don\'"\'"\'t overshoot too dramatically\n        print("sleeping a bit as we\'"\'"\'ve exceeded 90% of our expected full ratio")\n        time.sleep(15.0)\n\nif not full:\n    raise RuntimeError("Failed to reach fullness after writing %d bytes" % bytes)\n\n# close() should not raise an error because we already caught it in\n# fsync.  There shouldn\'"\'"\'t have been any more writeback errors\n# since then because all IOs got cancelled on the full flag.\nprint("calling close")\nos.close(f)\nprint("close() did not raise error")\n\nos.unlink("/home/ubuntu/cephtest/mnt.0/full_test_file")\n\''
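
For readability, the core of the script embedded in the command above can be sketched as a small function. This is only an illustration, not the code in test_full.py: the name fill_until_fsync_fails and the injectable write/fsync parameters are assumptions added so the loop can be exercised without a full filesystem, and unlike the original script it returns where ENOSPC surfaced instead of re-raising on a write() failure.

```python
import errno
import os


def fill_until_fsync_fails(fd, max_chunks, chunk=b"x" * 1024 * 1024,
                           write=os.write, fsync=os.fsync):
    """Write buffered chunks, fsync after each, and report where the error surfaced.

    Hypothetical helper (not part of the Ceph QA suite). Returns a tuple of
    (source, bytes_written) where source is:
      "fsync" - fsync() raised OSError, which is what the test expects
      "write" - write() raised ENOSPC, which is the failure in this report
      "none"  - no error was seen within max_chunks iterations
    """
    written = 0
    for _ in range(max_chunks):
        try:
            written += write(fd, chunk)
        except OSError as e:
            if e.errno == errno.ENOSPC:
                return "write", written
            raise  # any other write error is unrelated to fullness
        try:
            fsync(fd)
        except OSError:
            # Like the original script, treat any fsync error as "full".
            return "fsync", written
    return "none", written
```

In the failing run the equivalent of the "write" branch was taken, whereas the test only tolerates the "fsync" outcome.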

I also see this earlier in teuthology.log:

2020-05-06T18:04:34.297 INFO:teuthology.orchestra.run.smithi171.stdout:Not full yet after 338.00 MB
2020-05-06T18:04:34.297 INFO:teuthology.orchestra.run.smithi171.stdout:sleeping a bit as we've exceeded 90% of our expected full ratio
2020-05-06T18:04:34.297 INFO:teuthology.orchestra.run.smithi171.stdout:Unexpected error [Errno 28] No space left on device from write() instead of fsync()
2020-05-06T18:04:34.298 INFO:teuthology.orchestra.run.smithi171.stderr:Traceback (most recent call last):
2020-05-06T18:04:34.298 INFO:teuthology.orchestra.run.smithi171.stderr:  File "<string>", line 23, in <module>
2020-05-06T18:04:34.298 INFO:teuthology.orchestra.run.smithi171.stderr:OSError: [Errno 28] No space left on device
2020-05-06T18:04:34.527 INFO:teuthology.orchestra.run.smithi171.stderr:daemon-helper: command failed with exit status 1
2020-05-06T18:04:34.546 DEBUG:teuthology.orchestra.run:got remote process result: 1
2020-05-06T18:04:34.548 INFO:tasks.cephfs_test_runner:test_full_fsync (tasks.cephfs.test_full.TestClusterFull) ... ERROR
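
When comparing runs, it can help to pull the last successful size and the write() error out of teuthology.log. The following is a minimal sketch that assumes the message formats shown in the excerpt above; summarize_full_test and the two regular expressions are hypothetical names, not part of teuthology.

```python
import re

# Patterns match the messages printed by the remote script in this report.
NOT_FULL = re.compile(r"Not full yet after ([\d.]+) MB")
WRITE_ERR = re.compile(r"Unexpected error (.+) from write\(\) instead of fsync\(\)")


def summarize_full_test(lines):
    """Return (last_ok_mb, write_error) for an iterable of log lines.

    last_ok_mb is the size from the last "Not full yet" message, or None.
    write_error is the error text reported from write(), or None.
    """
    last_ok_mb = None
    write_error = None
    for line in lines:
        m = NOT_FULL.search(line)
        if m:
            last_ok_mb = float(m.group(1))
            continue
        m = WRITE_ERR.search(line)
        if m:
            write_error = m.group(1)
    return last_ok_mb, write_error


if __name__ == "__main__":
    import sys
    # Usage: python summarize.py < teuthology.log
    print(summarize_full_test(sys.stdin))
```

A non-None write_error means ENOSPC was returned by write() rather than fsync(), which is the condition this bug tracks.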

History

#1 Updated by Ramana Raja 7 months ago

  • Description updated (diff)

#4 Updated by Patrick Donnelly 6 months ago

/ceph/teuthology-archive/pdonnell-2020-06-12_09:37:27-kcephfs-wip-pdonnell-testing-20200612.063208-distro-basic-smithi/5141828/teuthology.log

Looks like this only happens on the testing branch.

#5 Updated by Patrick Donnelly 6 months ago

  • Subject changed from octopus: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failed to qa: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failed
  • Status changed from New to Triaged
  • Priority changed from Normal to Urgent
  • Target version set to v16.0.0
  • Backport set to octopus,nautilus
  • Component(FS) qa-suite added
  • Labels (FS) qa added

#6 Updated by Patrick Donnelly about 2 months ago

  • Labels (FS) qa-failure added
