truncate can cause unflushed snapshot data loss
Failure in test TestStrays.test_snapshot_remove
This differs from the main snapshot tests in that we unmount and remount the client between creating the snapshot and reading it back, so I wonder whether this is a bug in unmount: should we be waiting for buffered data to be written back before the unmount completes?
The sequence of operations is:
- write some data to snapdir/subdir/file_a
- snapshot snapdir
- write some other data to snapdir/subdir/file_a
- unlink snapdir/subdir/file_a and snapdir/subdir
- unmount the client
- mount the client again
- read back snapdir/.snap/<snapshot>/subdir/file_a and check the original data is still there
I haven't tried reproducing this by hand outside the automated test; that would be the natural next step.
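A by-hand reproduction of the sequence above could look roughly like the following sketch. It is not the TestStrays code; the function names, the file contents, and the injected `make_snapshot`/`remount` callables are all hypothetical. It relies on the CephFS convention that creating a directory under `.snap` takes a snapshot, and it leaves the actual unmount/mount to a caller-supplied callable rather than guessing at mount options. `local_copy_snapshot` is only a stand-in so the script can be dry-run on an ordinary filesystem.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable

ORIGINAL = b"original data\n"   # hypothetical contents captured by the snapshot
OVERWRITE = b"other data\n"     # hypothetical contents written after the snapshot


def cephfs_snapshot(snapdir: Path, name: str) -> None:
    """On CephFS, mkdir inside the virtual .snap directory creates a snapshot."""
    (snapdir / ".snap" / name).mkdir()


def local_copy_snapshot(snapdir: Path, name: str) -> None:
    """Hypothetical stand-in for testing without a cluster: copy the tree into .snap/<name>."""
    shutil.copytree(snapdir, snapdir / ".snap" / name,
                    ignore=shutil.ignore_patterns(".snap"))


def reproduce(root: Path,
              make_snapshot: Callable[[Path, str], None],
              remount: Callable[[], None],
              snap_name: str = "snap1") -> bool:
    """Run the reported sequence; return True if the snapshot still holds ORIGINAL."""
    snapdir = root / "snapdir"
    subdir = snapdir / "subdir"
    file_a = subdir / "file_a"
    subdir.mkdir(parents=True)

    file_a.write_bytes(ORIGINAL)       # write data (left buffered; no fsync on purpose)
    make_snapshot(snapdir, snap_name)  # snapshot snapdir
    file_a.write_bytes(OVERWRITE)      # reopen with "wb" (truncates), write other data
    file_a.unlink()                    # unlink file_a ...
    subdir.rmdir()                     # ... and then subdir
    remount()                          # unmount the client, then mount it again

    # Read the snapshotted copy back and compare it with the original data.
    snap_file = snapdir / ".snap" / snap_name / "subdir" / "file_a"
    return snap_file.read_bytes() == ORIGINAL


if __name__ == "__main__":
    # Dry run on a local temp dir; on a real cluster, pass a CephFS mount point,
    # cephfs_snapshot, and a remount callable of your own.
    with tempfile.TemporaryDirectory() as tmp:
        print("snapshot intact:", reproduce(Path(tmp), local_copy_snapshot, lambda: None))
```

Because `Path.write_bytes` opens the existing file in `"wb"` mode, the second write also truncates it, so this variant exercises a truncate between the snapshot and the unlink; whether the automated test does the same is an assumption. Since the failure is intermittent, the script would likely need to be run in a loop on a real mount.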
#1 Updated by John Spray over 3 years ago
Added some more debugging for this to my wip qa-suite branch https://github.com/ceph/ceph-qa-suite/pull/1156/commits/de156bd4bb162bd5b35fd1a11e472605c75f0930
#3 Updated by John Spray over 3 years ago
- Status changed from Resolved to 12
This appears to be intermittent: after ~10 runs without a failure, it's back.
#5 Updated by Zheng Yan over 3 years ago
- Project changed from Linux kernel client to fs
- Subject changed from kclient snapshot failure to truncate can cause unflushed snapshot data lose
- Category deleted (
- Status changed from In Progress to Fix Under Review
#8 Updated by John Spray over 3 years ago
- Status changed from Pending Backport to 12
- Assignee set to Zheng Yan
It looks like the patch hasn't eliminated the failure:
Zheng, could you take another look?
#12 Updated by Greg Farnum over 3 years ago
So do we think this is fixed or not? Once it is, we need to revert https://github.com/ceph/ceph-qa-suite/pull/1156/commits/5f1abf9c310c2732cc6bcd0ff2bd2e947dfb414e to re-enable the test.