Bug #17193
truncate can cause unflushed snapshot data lose
Description
Failure in test TestStrays.test_snapshot_remove
http://qa-proxy.ceph.com/teuthology/jspray-2016-08-30_12:07:21-kcephfs:recovery-master-testing-basic-mira/392448/teuthology.log
This differs from the main snapshot tests in that we unmount and remount between creating a snapshot and trying to read it back. I wonder if this is a bug in unmount, where we should be waiting to write back buffered data?
The sequence of operations is:
- write some data to snapdir/subdir/file_a
- snapshot snapdir
- write some other data to snapdir/subdir/file_a
- unlink snapdir/subdir/file_a and snapdir/subdir
- unmount the client
- mount the client again
- read back snapdir/.snap/<snapshot>/subdir/file_a and check the original data is still there
I haven't tried reproducing this by hand outside of the automated test; that would be the natural next step.
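For anyone attempting the manual reproduction, the sequence above could be scripted roughly as follows. This is only a sketch, not the code from the test suite: the function names `reproduce` and `remount_via_fstab`, the snapshot name `snap1`, and the payload sizes are hypothetical. It assumes a CephFS mount where snapshots are enabled and are created by making a directory under `.snap`, and that an `/etc/fstab` entry exists so the mount point can be remounted by path. The writes are deliberately not fsynced, and `write_bytes` opens the file with truncation on the second write.

```python
import os
import subprocess
from pathlib import Path


def remount_via_fstab(mount_point):
    """Unmount and mount again. Assumes /etc/fstab has an entry for mount_point."""
    subprocess.run(["umount", str(mount_point)], check=True)
    subprocess.run(["mount", str(mount_point)], check=True)


def reproduce(mount_point, remount, make_snapshot=None, snap_name="snap1"):
    """Run the reported sequence; return True if the snapshot kept the original data."""
    snapdir = Path(mount_point) / "snapdir"
    subdir = snapdir / "subdir"
    file_a = subdir / "file_a"
    original = b"original data\n" * 1024
    subdir.mkdir(parents=True)

    # Write some data to snapdir/subdir/file_a (buffered, no fsync).
    file_a.write_bytes(original)

    # Snapshot snapdir. On CephFS this is a mkdir under the hidden .snap
    # directory; make_snapshot is a hypothetical hook for other setups.
    if make_snapshot is None:
        os.mkdir(snapdir / ".snap" / snap_name)
    else:
        make_snapshot(snapdir, snap_name)

    # Write some other data to the same file (this truncates it first).
    file_a.write_bytes(b"other data\n" * 1024)

    # Unlink the file and its parent directory.
    file_a.unlink()
    subdir.rmdir()

    # Unmount the client and mount it again.
    remount()

    # Read the file back through the snapshot and compare with the original.
    snapped = snapdir / ".snap" / snap_name / "subdir" / "file_a"
    return snapped.read_bytes() == original
```

A run against a kernel mount might look like `reproduce("/mnt/cephfs", lambda: remount_via_fstab("/mnt/cephfs"))`, where `/mnt/cephfs` is a placeholder path. A return value of `False` would match the failure seen in the test; since the failure is intermittent, several runs may be needed.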
Updated by John Spray over 7 years ago
Added some more debugging for this to my wip qa-suite branch https://github.com/ceph/ceph-qa-suite/pull/1156/commits/de156bd4bb162bd5b35fd1a11e472605c75f0930
Updated by John Spray over 7 years ago
- Status changed from New to Resolved
This is no longer failing when running against the testing kernel.
Updated by John Spray over 7 years ago
- Status changed from Resolved to 12
This appears to be intermittent: after ~10 runs without a failure, it's back.
Updated by Zheng Yan over 7 years ago
- Project changed from Linux kernel client to CephFS
- Subject changed from kclient snapshot failure to truncate can cause unflushed snapshot data lose
- Category deleted (fs/ceph)
- Status changed from In Progress to Fix Under Review
Updated by John Spray over 7 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to jewel
Updated by Loïc Dachary over 7 years ago
- Copied to Backport #18103: jewel: truncate can cause unflushed snapshot data lose added
Updated by John Spray over 7 years ago
- Status changed from Pending Backport to 12
- Assignee set to Zheng Yan
It looks like the patch hasn't eliminated the failure:
http://pulpito.ceph.com/jspray-2016-12-06_12:37:38-kcephfs:recovery-master-testing-basic-smithi/611141
Zheng, could you take another look?
Updated by Zheng Yan over 7 years ago
2016-12-06T13:28:03.559 INFO:tasks.cephfs_test_runner: self.assertTrue(self.fs.data_objects_absent(file_a_ino, size_mb * 1024 * 1024))
It failed at the data pool empty check. It's a new issue.
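To make the failing assertion concrete: the check asks whether any RADOS objects backing the file's inode remain in the data pool. The sketch below lists the object names such a check would look for. It is an illustration under assumptions, not the test suite's implementation: `expected_data_objects` is a hypothetical helper, and it assumes the default file layout (4 MiB objects, no striping across objects) and the usual `<inode in hex>.<object index as 8 hex digits>` naming.

```python
def expected_data_objects(ino, size, object_size=4 * 1024 * 1024):
    """Return the data pool object names expected to back a file.

    Assumes the default layout: the file is cut into object_size chunks,
    each stored as "<inode hex>.<chunk index as 8 hex digits>".
    """
    if size <= 0:
        return []
    count = (size - 1) // object_size + 1
    return ["{0:x}.{1:08x}".format(ino, n) for n in range(count)]


# Example: a file with inode 0x10000000000 holding 8 MiB of data.
names = expected_data_objects(0x10000000000, 8 * 1024 * 1024)
print(names)
```

If any of these names still shows up when listing the data pool (for example with `rados -p <data pool> ls`) after the snapshot has been removed and the strays purged, the data pool is not yet empty for that file.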
Updated by John Spray over 7 years ago
- Status changed from 12 to Pending Backport
Updated by John Spray over 7 years ago
- Related to Bug #18211: test_snapshot_remove (tasks.cephfs.test_strays.TestStrays) failed at data pool empty check added
Updated by Greg Farnum over 7 years ago
So do we think this is fixed or not? Need to undo https://github.com/ceph/ceph-qa-suite/pull/1156/commits/5f1abf9c310c2732cc6bcd0ff2bd2e947dfb414e (to re-enable the test) when it's completed.
Updated by Nathan Cutler about 7 years ago
- Status changed from Pending Backport to Resolved
Updated by Nathan Cutler about 7 years ago
Re-enable test: https://github.com/ceph/ceph/pull/13200