truncate can cause loss of unflushed snapshot data
Failure in test TestStrays.test_snapshot_remove
This differs from the main snapshot tests in that we unmount and remount between creating a snapshot and trying to read it back. I wonder if this is a bug in unmount, where we should be waiting for buffered data to be written back?
The sequence of operations is:
- write some data to snapdir/subdir/file_a
- snapshot snapdir
- write some other data to snapdir/subdir/file_a
- unlink snapdir/subdir/file_a and snapdir/subdir
- unmount the client
- mount the client again
- read back snapdir/.snap/<snapshot>/subdir/file_a and check the original data is still there
I haven't tried reproducing this by hand outside of the automated test; that would be the next natural step.
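For anyone who wants to try this by hand, here is a rough sketch of the sequence above as a standalone Python script. It is not the code from TestStrays.test_snapshot_remove. The function names, the snapshot name "snap1", and the file contents are made up for illustration, and the unmount/mount step is left to a caller-supplied callable because it depends on which client is in use. The only CephFS behaviour it relies on is that a snapshot is taken by creating a directory under .snap.

```python
import os
from pathlib import Path


def cephfs_snapshot(directory: Path, name: str) -> None:
    # On CephFS, making a directory under .snap snapshots the parent.
    os.mkdir(directory / ".snap" / name)


def reproduce(mount_point, remount, take_snapshot=cephfs_snapshot,
              snap_name="snap1"):
    """Run the sequence from this ticket; return (expected, actual) bytes.

    `remount` is a hypothetical caller-supplied callable that unmounts
    and mounts the client again (for example by shelling out).
    """
    snapdir = Path(mount_point) / "snapdir"
    subdir = snapdir / "subdir"
    file_a = subdir / "file_a"
    original = b"original data\n"

    # Write some data to snapdir/subdir/file_a. There is deliberately no
    # fsync, so the data may still be buffered on the client.
    subdir.mkdir(parents=True)
    file_a.write_bytes(original)

    # Snapshot snapdir.
    take_snapshot(snapdir, snap_name)

    # Write some other data to the same file.
    file_a.write_bytes(b"other data\n")

    # Unlink snapdir/subdir/file_a and snapdir/subdir.
    file_a.unlink()
    subdir.rmdir()

    # Unmount the client and mount it again.
    remount()

    # Read the file back through the snapshot.
    snapped = snapdir / ".snap" / snap_name / "subdir" / "file_a"
    return original, snapped.read_bytes()
```

Calling reproduce("/mnt/cephfs", remount=my_remount) on a scratch filesystem and comparing the two returned values should show whether the original data survived. The mount point and my_remount are placeholders.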
#1 Updated by John Spray 7 months ago
Added some more debugging for this to my wip qa-suite branch https://github.com/ceph/ceph-qa-suite/pull/1156/commits/de156bd4bb162bd5b35fd1a11e472605c75f0930
#3 Updated by John Spray 4 months ago
- Status changed from Resolved to Verified
This appears to be intermittent: after ~10 runs without a failure, it's back.
#8 Updated by John Spray 4 months ago
- Status changed from Pending Backport to Verified
- Assignee set to Zheng Yan
It looks like the patch hasn't eliminated the failure:
Zheng, could you take another look?
#12 Updated by Greg Farnum 2 months ago
So do we think this is fixed or not? Once it is, we need to revert https://github.com/ceph/ceph-qa-suite/pull/1156/commits/5f1abf9c310c2732cc6bcd0ff2bd2e947dfb414e to re-enable the test.