Bug #14698
closedTest failure: test_full_fsync (tasks.cephfs.test_full.TestQuotaFull)
0%
Description
http://pulpito.ceph.com/gregf-2016-02-08_00:50:24-fs-greg-fs-testing-27-1---basic-mira/10806/
http://pulpito.ceph.com/teuthology-2016-02-05_14:03:02-fs-jewel---basic-smithi/8812/]
http://pulpito.ceph.com/gregf-2016-02-03_23:36:53-fs-greg-fs-testing-23---basic-mira/6032/
http://pulpito.ceph.com/gregf-2015-12-22_19:56:30-fs-master---basic-mira/2891/
This has been happening a lot lately, obviously, but is also very new. Not sure what we could have done to break it.
Updated by John Spray about 8 years ago
Different to http://tracker.ceph.com/issues/14258 ?
Updated by Greg Farnum about 8 years ago
Dunno — it's QuotaFull rather than ClusterFull so I created a new ticket.
Updated by Greg Farnum about 8 years ago
http://pulpito.ceph.com/gregf-2016-02-08_22:02:06-fs-greg-fs-testing-27-1---basic-mira/1009/
Also this failure seems to be consistent now, rather than very erratic. I think we busted something in our code or our environment.
Updated by Greg Farnum about 8 years ago
I'm confused — the log doesn't seem to include any of the output from the remote python script. Which makes me think we're not getting to the output points — which is possible.
So I added some debug spam, but John, it looks like this is trying to write an amount equal to the full size of the cluster, and we fsync it and expect that to pass. Perhaps that has in the past due to slow updates, but I actually wouldn't expect it to pass since any extra space used up ought to make it fail. Am I missing something here about this test just being racy and needing to be more forgiving? (And can't we just write slightly less than the full amount ahead of time, and then go over the line in the loop?)
Updated by John Spray about 8 years ago
The output is
2016-02-08T22:42:16.737 INFO:teuthology.orchestra.run.mira101:Running: 'sudo adjust-ulimits daemon-helper kill python -c \'\nimport time\nimport datetime\nimport subprocess\nimport os\n\n# Write some buffered data through before going full, all should be well\nbytes = 0\nf = os.open("/home/ubuntu/cephtest/mnt.0/full_test_file", os.O_WRONLY | os.O_CREAT)\nbytes += os.write(f, \'"\'"\'a\'"\'"\' * 4096)\nos.fsync(f)\n\n# Okay, now we\'"\'"\'re going to fill up the filesystem, and then keep\n# writing until we see an error from fsync. As long as we\'"\'"\'re doing\n# buffered IO, the error should always only appear from fsync and not\n# from write\nfull = False\n\nfor n in range(0, 32 + 1):\n try:\n bytes += os.write(f, \'"\'"\'x\'"\'"\' * 1024 * 1024)\n except OSError as e:\n print "Unexpected error %s from write() instead of fsync()" % e\n raise\n\n try:\n os.fsync(f)\n except OSError as e:\n print "Reached fullness after %.2f MB" % (bytes / (1024.0 * 1024.0))\n full = True\n break\n else:\n print "Not full yet after %.2f MB" % (bytes / (1024.0 * 1024.0))\n\n if n > 32 * 0.8:\n # Be cautious in the last region where we expect to hit\n # the full condition, so that we don\'"\'"\'t overshoot too dramatically\n time.sleep(15.0)\n\nif not full:\n raise RuntimeError("Failed to reach fullness after writing %d bytes" % bytes)\n\n# The error sticks to the inode until we dispose of it\ntry:\n os.close(f)\nexcept OSError:\n print "Saw error from close() as expected"\nelse:\n raise RuntimeError("Did not see expected error from close()")\n\nos.unlink("/home/ubuntu/cephtest/mnt.0/full_test_file")\n\'' 2016-02-08T22:42:37.258 INFO:teuthology.orchestra.run.mira005:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf' 2016-02-08T22:42:37.263 INFO:teuthology.orchestra.run.mira101:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf' 2016-02-08T22:43:07.284 INFO:teuthology.orchestra.run.mira005:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf' 2016-02-08T22:43:07.288 INFO:teuthology.orchestra.run.mira101:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf' 2016-02-08T22:43:37.308 INFO:teuthology.orchestra.run.mira005:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf' 2016-02-08T22:43:37.312 INFO:teuthology.orchestra.run.mira101:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf' 2016-02-08T22:43:50.904 INFO:teuthology.orchestra.run.mira101.stderr:Traceback (most recent call last): 2016-02-08T22:43:50.904 INFO:teuthology.orchestra.run.mira101.stderr: File "<string>", line 21, in <module> 2016-02-08T22:43:50.904 INFO:teuthology.orchestra.run.mira101.stderr:OSError: [Errno 28] No space left on device 2016-02-08T22:43:50.961 INFO:teuthology.orchestra.run.mira101.stderr:daemon-helper: command failed with exit status 1
(it's hard to see because it comes out at the point the command actually runs rather than when the test failure is reported from the exception)
Updated by John Spray about 8 years ago
The weirdness here is that the error is coming from os.write() rather than os.fsync().
The test expects that the client doesn't see the osdmap with the full flag apart from when it's actually trying to flush something (the client doesn't subscribe continuously). This behaviour is surprising because we're finishing our previous fsync(), so the objecter should be idle, and then before we hit the next write() we're apparently receiving an updated osdmap with the full flag set, causing a failure in write().
The other scenario that would cause this would be if we were for some reason doing a direct write instead of a buffered write.
Updated by John Spray about 8 years ago
aaaand here's why:
2016-02-09 06:42:18.603243 7ff73b7fe700 10 client.4445.objecter tick 2016-02-09 06:42:18.603261 7ff73b7fe700 2 client.4445.objecter tid 9 on osd.0 is laggy 2016-02-09 06:42:18.603268 7ff73b7fe700 10 client.4445.objecter _maybe_request_map subscribing (onetime) to next osd map
This is something I had seen on the openstack runs, but not on real hardware.
When OSDs are so slow that we have laggy ops, the clients start getting osdmaps all the time, and the subtler rules about exactly when we get things like full flags go out the window.
Updated by Greg Farnum about 8 years ago
- Assignee set to Greg Farnum
John Spray wrote:
The output is
[...](it's hard to see because it comes out at the point the command actually runs rather than when the test failure is reported from the exception)
Yeah, I was just confused because we weren't getting any of the debug outputs and I wasn't sure we were breaking out in the first write (which I guess is the case). I've got patches adding some gratuitous debugging output now.
...okay, this is a serious problem with the laggy op detection in Objecter: http://tracker.ceph.com/issues/14750
I'll run that through testing and make sure we're good. Glad our tests spotted that, even if indirectly!
Regarding the slow ops...that's troubling. I wouldn't have expected