Bug #17193

truncate can cause unflushed snapshot data lose

Added by John Spray about 1 year ago. Updated 9 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 09/01/2016
Due date:
% Done: 0%
Source: other
Tags:
Backport: jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Component(FS):
Needs Doc: No

Description

Failure in test TestStrays.test_snapshot_remove
http://qa-proxy.ceph.com/teuthology/jspray-2016-08-30_12:07:21-kcephfs:recovery-master-testing-basic-mira/392448/teuthology.log

This differs from the main snapshot tests in that we unmount and remount the client between creating a snapshot and reading it back. I wonder whether this is a bug in unmount, where we should be waiting for buffered data to be written back.

The sequence of operations is:

  • write some data to snapdir/subdir/file_a
  • snapshot snapdir
  • write some other data to snapdir/subdir/file_a
  • unlink snapdir/subdir/file_a and snapdir/subdir
  • unmount the client
  • mount the client again
  • read back snapdir/.snap/<snapshot>/subdir/file_a and check the original data is still there

I haven't tried reproducing this by hand outside of the automated test; that would be the natural next step.
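For anyone attempting the manual reproduction, the sequence above could be scripted roughly as follows. This is a minimal sketch, not the code of the automated test: `run_sequence`, `cephfs_snapshot` and the `remount` callback are hypothetical names introduced here, and the only CephFS-specific behaviour relied on is that creating a directory under `.snap` takes a snapshot. How the client is unmounted and mounted again is left to the caller, since that depends on the setup (kernel client here).

```python
import os
from pathlib import Path


def cephfs_snapshot(snapdir, name):
    """Take a CephFS snapshot by creating a directory under .snap."""
    os.mkdir(Path(snapdir) / ".snap" / name)


def run_sequence(snapdir, snapshot, remount, snap_name="snap1", size=1024 * 1024):
    """Run the reported sequence; return True if the snapshot kept the original data.

    snapshot(snapdir, name) and remount() are caller-supplied hooks
    (hypothetical helpers, not part of any Ceph API).
    """
    snapdir = Path(snapdir)
    subdir = snapdir / "subdir"
    file_a = subdir / "file_a"
    original = b"a" * size

    # 1. Write some data. Deliberately no fsync: the suspicion is that
    #    buffered data is not written back in time.
    subdir.mkdir(parents=True)
    file_a.write_bytes(original)

    # 2. Snapshot snapdir.
    snapshot(snapdir, snap_name)

    # 3. Write some other data to the same file. Note that write_bytes
    #    reopens the file with truncation; whether the original test
    #    truncates at this step is an assumption.
    file_a.write_bytes(b"b" * size)

    # 4. Unlink the file and its parent directory.
    file_a.unlink()
    subdir.rmdir()

    # 5 and 6. Unmount the client and mount it again.
    remount()

    # 7. Read the file back through the snapshot.
    snap_file = snapdir / ".snap" / snap_name / "subdir" / "file_a"
    return snap_file.read_bytes() == original
```

On a real mount this would be called as `run_sequence("/mnt/cephfs/snapdir", cephfs_snapshot, my_remount)`, where the mount point and `my_remount` are placeholders for the local setup. A `False` return would correspond to the failure reported here.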


Related issues

Related to fs - Bug #18211: test_snapshot_remove (tasks.cephfs.test_strays.TestStrays) failed at data pool empty check Resolved 12/09/2016
Copied to fs - Backport #18103: jewel: truncate can cause unflushed snapshot data lose Resolved

History

#1 Updated by John Spray about 1 year ago

#2 Updated by John Spray 12 months ago

  • Status changed from New to Resolved
  • Needs Doc set to No

This is no longer failing when running against the testing kernel.

#3 Updated by John Spray 11 months ago

  • Status changed from Resolved to Verified

#4 Updated by Zheng Yan 11 months ago

  • Status changed from Verified to In Progress

#5 Updated by Zheng Yan 11 months ago

  • Project changed from Linux kernel client to fs
  • Subject changed from kclient snapshot failure to truncate can cause unflushed snapshot data lose
  • Category deleted (fs/ceph)
  • Status changed from In Progress to Need Review

#6 Updated by John Spray 11 months ago

  • Status changed from Need Review to Pending Backport
  • Backport set to jewel

#7 Updated by Loic Dachary 11 months ago

  • Copied to Backport #18103: jewel: truncate can cause unflushed snapshot data lose added

#8 Updated by John Spray 11 months ago

  • Status changed from Pending Backport to Verified
  • Assignee set to Zheng Yan

It looks like the patch hasn't eliminated the failure:
http://pulpito.ceph.com/jspray-2016-12-06_12:37:38-kcephfs:recovery-master-testing-basic-smithi/611141

Zheng, could you take another look?

#9 Updated by Zheng Yan 10 months ago

2016-12-06T13:28:03.559 INFO:tasks.cephfs_test_runner: self.assertTrue(self.fs.data_objects_absent(file_a_ino, size_mb * 1024 * 1024))

It failed at the data pool empty check. That's a new issue.
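To make the failing assertion easier to follow: one plausible reading of the check is that it looks in the data pool for the objects backing `file_a` and expects none to remain. The sketch below only computes which object names one would expect for a file. It assumes the default CephFS layout (4 MiB objects, no striping across objects) and the usual `<inode in hex>.<object index as 8 hex digits>` naming; `expected_object_names` is a hypothetical helper, not the implementation of `data_objects_absent`.

```python
def expected_object_names(ino, size_bytes, object_size=4 * 1024 * 1024):
    """List the data pool object names expected to back a file.

    Assumes the default file layout: fixed-size objects, stripe count 1.
    """
    # Round up: a partially filled last object still exists.
    count = (size_bytes + object_size - 1) // object_size
    return ["{:x}.{:08x}".format(ino, index) for index in range(count)]
```

For example, a file of `size_mb = 8` with inode `0x10000000000` would map to `10000000000.00000000` and `10000000000.00000001`. If the check fails, at least one such object was presumably still present in the data pool after the unlink.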

#10 Updated by John Spray 10 months ago

  • Status changed from Verified to Pending Backport

#11 Updated by John Spray 10 months ago

  • Related to Bug #18211: test_snapshot_remove (tasks.cephfs.test_strays.TestStrays) failed at data pool empty check added

#12 Updated by Greg Farnum 9 months ago

So do we think this is fixed or not? Need to undo https://github.com/ceph/ceph-qa-suite/pull/1156/commits/5f1abf9c310c2732cc6bcd0ff2bd2e947dfb414e (to re-enable the test) when it's completed.

#13 Updated by Nathan Cutler 9 months ago

  • Status changed from Pending Backport to Resolved
