Bug #45434

qa: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failed

Added by Ramana Raja 10 months ago. Updated 25 days ago.

Severity:
3 - minor
Labels (FS):
qa, qa-failure


2020-05-06T18:04:48.476 INFO:tasks.cephfs_test_runner:======================================================================
2020-05-06T18:04:48.476 INFO:tasks.cephfs_test_runner:ERROR: test_full_fsync (tasks.cephfs.test_full.TestClusterFull)
2020-05-06T18:04:48.476 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2020-05-06T18:04:48.476 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2020-05-06T18:04:48.476 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-yuri-testing-2020-05-05-1439/qa/tasks/cephfs/", line 356, in test_full_fsync
2020-05-06T18:04:48.476 INFO:tasks.cephfs_test_runner:    self._remote_write_test(remote_script)
2020-05-06T18:04:48.476 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-yuri-testing-2020-05-05-1439/qa/tasks/cephfs/", line 222, in _remote_write_test
2020-05-06T18:04:48.476 INFO:tasks.cephfs_test_runner:    is_fuse=isinstance(self.mount_a, FuseMount)
2020-05-06T18:04:48.476 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-yuri-testing-2020-05-05-1439/qa/tasks/cephfs/", line 191, in run_python
2020-05-06T18:04:48.477 INFO:tasks.cephfs_test_runner:    p.wait()
2020-05-06T18:04:48.477 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_py2/teuthology/orchestra/", line 162, in wait
2020-05-06T18:04:48.477 INFO:tasks.cephfs_test_runner:    self._raise_for_status()
2020-05-06T18:04:48.477 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_py2/teuthology/orchestra/", line 184, in _raise_for_status
2020-05-06T18:04:48.477 INFO:tasks.cephfs_test_runner:    node=self.hostname, label=self.label
2020-05-06T18:04:48.477 INFO:tasks.cephfs_test_runner:CommandFailedError: Command failed on smithi171 with status 1: 'sudo adjust-ulimits daemon-helper kill python3 -c \'\nimport time\nimport datetime\nimport subprocess\nimport os\n\n# Write some buffered data through before going full, all should be well\nprint("writing some data through which we expect to succeed")\nbytes = 0\nf = os.open("/home/ubuntu/cephtest/mnt.0/full_test_file", os.O_WRONLY | os.O_CREAT)\nbytes += os.write(f, b\'"\'"\'a\'"\'"\' * 4096)\nos.fsync(f)\nprint("fsync\'"\'"\'ed data successfully, will now attempt to fill fs")\n\n# Okay, now we\'"\'"\'re going to fill up the filesystem, and then keep\n# writing until we see an error from fsync.  As long as we\'"\'"\'re doing\n# buffered IO, the error should always only appear from fsync and not\n# from write\nfull = False\n\nfor n in range(0, int(373 * 1.1)):\n    try:\n        bytes += os.write(f, b\'"\'"\'x\'"\'"\' * 1024 * 1024)\n        print("wrote bytes via buffered write, moving on to fsync")\n    except OSError as e:\n        print("Unexpected error %s from write() instead of fsync()" % e)\n        raise\n\n    try:\n        os.fsync(f)\n        print("fsync\'"\'"\'ed successfully")\n    except OSError as e:\n        print("Reached fullness after %.2f MB" % (bytes / (1024.0 * 1024.0)))\n        full = True\n        break\n    else:\n        print("Not full yet after %.2f MB" % (bytes / (1024.0 * 1024.0)))\n\n    if n > 373 * 0.9:\n        # Be cautious in the last region where we expect to hit\n        # the full condition, so that we don\'"\'"\'t overshoot too dramatically\n        print("sleeping a bit as we\'"\'"\'ve exceeded 90% of our expected full ratio")\n        time.sleep(15.0)\n\nif not full:\n    raise RuntimeError("Failed to reach fullness after writing %d bytes" % bytes)\n\n# close() should not raise an error because we already caught it in\n# fsync.  
There shouldn\'"\'"\'t have been any more writeback errors\n# since then because all IOs got cancelled on the full flag.\nprint("calling close")\nos.close(f)\nprint("close() did not raise error")\n\nos.unlink("/home/ubuntu/cephtest/mnt.0/full_test_file")\n\''
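The shell quoting makes the inline script above nearly unreadable. Un-escaped and lightly reworked for illustration (the function wrapper, the `nbytes` name, and the `expected_mb`/`chunk` parameters are additions here; the job itself hard-codes the path under `/home/ubuntu/cephtest/mnt.0` and an expected capacity of 373 MB), the logic it runs is:

```python
import os
import time


def write_until_fsync_fails(path, expected_mb=373, chunk=1024 * 1024):
    """Fill the filesystem with buffered writes, fsync()ing after each
    chunk, until fsync() reports ENOSPC (a readability sketch of the
    inline test script above, not the verbatim qa code)."""
    nbytes = 0
    f = os.open(path, os.O_WRONLY | os.O_CREAT)
    try:
        # A small initial write-through is expected to succeed.
        nbytes += os.write(f, b"a" * 4096)
        os.fsync(f)

        full = False
        for n in range(int(expected_mb * 1.1)):
            try:
                nbytes += os.write(f, b"x" * chunk)
            except OSError as e:
                # With buffered IO the error should only ever surface from
                # fsync() -- this branch is the failure seen in the log.
                print("Unexpected error %s from write() instead of fsync()" % e)
                raise

            try:
                os.fsync(f)
            except OSError:
                print("Reached fullness after %.2f MB" % (nbytes / (1024.0 * 1024.0)))
                full = True
                break

            if n > expected_mb * 0.9:
                # Near the expected full point, slow down so we don't
                # overshoot the full condition too dramatically.
                time.sleep(15.0)

        if not full:
            raise RuntimeError("Failed to reach fullness after writing %d bytes" % nbytes)
    finally:
        # close() should not raise: any writeback error was already
        # consumed by the fsync() calls above.
        os.close(f)
    return nbytes
```

The key invariant the test asserts is in the first `except` clause: with buffered IO, ENOSPC should surface from `fsync()`, not from `write()`.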

And I see this earlier in the teuthology.log:

2020-05-06T18:04:34.297 Not full yet after 338.00 MB
2020-05-06T18:04:34.297 sleeping a bit as we've exceeded 90% of our expected full ratio
2020-05-06T18:04:34.297 Unexpected error [Errno 28] No space left on device from write() instead of fsync()
2020-05-06T18:04:34.298 Traceback (most recent call last):
2020-05-06T18:04:34.298  File "<string>", line 23, in <module>
2020-05-06T18:04:34.298 OSError: [Errno 28] No space left on device
2020-05-06T18:04:34.527 command failed with exit status 1
2020-05-06T18:04:34.546 remote process result: 1
2020-05-06T18:04:34.548 INFO:tasks.cephfs_test_runner:test_full_fsync (tasks.cephfs.test_full.TestClusterFull) ... ERROR


#1 Updated by Ramana Raja 10 months ago

  • Description updated (diff)

#4 Updated by Patrick Donnelly 9 months ago


Looks like this occurs only on the testing branch.

#5 Updated by Patrick Donnelly 9 months ago

  • Subject changed from octopus: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failed to qa: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failed
  • Status changed from New to Triaged
  • Priority changed from Normal to Urgent
  • Target version set to v16.0.0
  • Backport set to octopus,nautilus
  • Component(FS) qa-suite added
  • Labels (FS) qa added

#6 Updated by Patrick Donnelly 5 months ago

  • Labels (FS) qa-failure added

#8 Updated by Patrick Donnelly 3 months ago

  • Assignee deleted (Zheng Yan)

#9 Updated by Patrick Donnelly about 2 months ago

  • Target version changed from v16.0.0 to v17.0.0
  • Backport changed from octopus,nautilus to pacific,octopus,nautilus
