Bug #14698: Test failure: test_full_fsync (tasks.cephfs.test_full.TestQuotaFull) - CephFS - Ceph

Actions

Copy link

Bug #14698

closed

Test failure: test_full_fsync (tasks.cephfs.test_full.TestQuotaFull)

Added by Greg Farnum about 8 years ago. Updated about 8 years ago.

Status:

Duplicate

Priority:

Urgent

Assignee:

Greg Farnum

Category:

Target version:

% Done:

Source:

other

Tags:

Backport:

Regression:

Severity:

3 - minor

Reviewed:

Affected Versions:

ceph-qa-suite:

Component(FS):

Labels (FS):

Pull request ID:

Crash signature (v1):

Crash signature (v2):

Description

http://pulpito.ceph.com/gregf-2016-02-08_00:50:24-fs-greg-fs-testing-27-1---basic-mira/10806/
http://pulpito.ceph.com/teuthology-2016-02-05_14:03:02-fs-jewel---basic-smithi/8812/]
http://pulpito.ceph.com/gregf-2016-02-03_23:36:53-fs-greg-fs-testing-23---basic-mira/6032/
http://pulpito.ceph.com/gregf-2015-12-22_19:56:30-fs-master---basic-mira/2891/

This has been happening a lot lately, obviously, but is also very new. Not sure what we could have done to break it.

Actions

Copy link

Updated by John Spray about 8 years ago

Different to http://tracker.ceph.com/issues/14258 ?

Actions

Copy link

Updated by Greg Farnum about 8 years ago

Dunno — it's QuotaFull rather than ClusterFull so I created a new ticket.

Actions

Copy link

Updated by Greg Farnum about 8 years ago

http://pulpito.ceph.com/gregf-2016-02-08_22:02:06-fs-greg-fs-testing-27-1---basic-mira/1009/

Also this failure seems to be consistent now, rather than very erratic. I think we busted something in our code or our environment.

Actions

Copy link

Updated by Greg Farnum about 8 years ago

I'm confused — the log doesn't seem to include any of the output from the remote python script. Which makes me think we're not getting to the output points — which is possible.

So I added some debug spam, but John, it looks like this is trying to write an amount equal to the full size of the cluster, and we fsync it and expect that to pass. Perhaps that has in the past due to slow updates, but I actually wouldn't expect it to pass since any extra space used up ought to make it fail. Am I missing something here about this test just being racy and needing to be more forgiving? (And can't we just write slightly less than the full amount ahead of time, and then go over the line in the loop?)

Actions

Copy link

Updated by John Spray about 8 years ago

The output is

2016-02-08T22:42:16.737 INFO:teuthology.orchestra.run.mira101:Running: 'sudo adjust-ulimits daemon-helper kill python -c \'\nimport time\nimport datetime\nimport subprocess\nimport os\n\n# Write some buffered data through before going full, all should be well\nbytes = 0\nf = os.open("/home/ubuntu/cephtest/mnt.0/full_test_file", os.O_WRONLY | os.O_CREAT)\nbytes += os.write(f, \'"\'"\'a\'"\'"\' * 4096)\nos.fsync(f)\n\n# Okay, now we\'"\'"\'re going to fill up the filesystem, and then keep\n# writing until we see an error from fsync.  As long as we\'"\'"\'re doing\n# buffered IO, the error should always only appear from fsync and not\n# from write\nfull = False\n\nfor n in range(0, 32 + 1):\n    try:\n        bytes += os.write(f, \'"\'"\'x\'"\'"\' * 1024 * 1024)\n    except OSError as e:\n        print "Unexpected error %s from write() instead of fsync()" % e\n        raise\n\n    try:\n        os.fsync(f)\n    except OSError as e:\n        print "Reached fullness after %.2f MB" % (bytes / (1024.0 * 1024.0))\n        full = True\n        break\n    else:\n        print "Not full yet after %.2f MB" % (bytes / (1024.0 * 1024.0))\n\n    if n > 32 * 0.8:\n        # Be cautious in the last region where we expect to hit\n        # the full condition, so that we don\'"\'"\'t overshoot too dramatically\n        time.sleep(15.0)\n\nif not full:\n    raise RuntimeError("Failed to reach fullness after writing %d bytes" % bytes)\n\n# The error sticks to the inode until we dispose of it\ntry:\n    os.close(f)\nexcept OSError:\n    print "Saw error from close() as expected"\nelse:\n    raise RuntimeError("Did not see expected error from close()")\n\nos.unlink("/home/ubuntu/cephtest/mnt.0/full_test_file")\n\''
2016-02-08T22:42:37.258 INFO:teuthology.orchestra.run.mira005:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2016-02-08T22:42:37.263 INFO:teuthology.orchestra.run.mira101:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2016-02-08T22:43:07.284 INFO:teuthology.orchestra.run.mira005:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2016-02-08T22:43:07.288 INFO:teuthology.orchestra.run.mira101:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2016-02-08T22:43:37.308 INFO:teuthology.orchestra.run.mira005:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2016-02-08T22:43:37.312 INFO:teuthology.orchestra.run.mira101:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2016-02-08T22:43:50.904 INFO:teuthology.orchestra.run.mira101.stderr:Traceback (most recent call last):
2016-02-08T22:43:50.904 INFO:teuthology.orchestra.run.mira101.stderr:  File "<string>", line 21, in <module>
2016-02-08T22:43:50.904 INFO:teuthology.orchestra.run.mira101.stderr:OSError: [Errno 28] No space left on device
2016-02-08T22:43:50.961 INFO:teuthology.orchestra.run.mira101.stderr:daemon-helper: command failed with exit status 1

(it's hard to see because it comes out at the point the command actually runs rather than when the test failure is reported from the exception)

Actions

Copy link

Updated by John Spray about 8 years ago

The weirdness here is that the error is coming from os.write() rather than os.fsync().

The test expects that the client doesn't see the osdmap with the full flag apart from when it's actually trying to flush something (the client doesn't subscribe continuously). This behaviour is surprising because we're finishing our previous fsync(), so the objecter should be idle, and then before we hit the next write() we're apparently receiving an updated osdmap with the full flag set, causing a failure in write().

The other scenario that would cause this would be if we were for some reason doing a direct write instead of a buffered write.

Actions

Copy link

Updated by John Spray about 8 years ago

aaaand here's why:

2016-02-09 06:42:18.603243 7ff73b7fe700 10 client.4445.objecter tick
2016-02-09 06:42:18.603261 7ff73b7fe700  2 client.4445.objecter  tid 9 on osd.0 is laggy
2016-02-09 06:42:18.603268 7ff73b7fe700 10 client.4445.objecter _maybe_request_map subscribing (onetime) to next osd map

This is something I had seen on the openstack runs, but not on real hardware.

When OSDs are so slow that we have laggy ops, the clients start getting osdmaps all the time, and the subtler rules about exactly when we get things like full flags go out the window.

Actions

Copy link

Updated by Greg Farnum about 8 years ago

Assignee set to Greg Farnum

John Spray wrote:

The output is
[...]

(it's hard to see because it comes out at the point the command actually runs rather than when the test failure is reported from the exception)

Yeah, I was just confused because we weren't getting any of the debug outputs and I wasn't sure we were breaking out in the first write (which I guess is the case). I've got patches adding some gratuitous debugging output now.

...okay, this is a serious problem with the laggy op detection in Objecter: http://tracker.ceph.com/issues/14750

I'll run that through testing and make sure we're good. Glad our tests spotted that, even if indirectly!

Regarding the slow ops...that's troubling. I wouldn't have expected

Actions

Copy link