Bug #17193

truncate can cause unflushed snapshot data loss

Added by John Spray over 7 years ago. Updated about 7 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags: -
Backport: jewel
Regression: No
Severity: 3 - minor
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Component(FS): -
Labels (FS): -
Pull request ID: -
Crash signature (v1): -
Crash signature (v2): -

Description

Failure in test TestStrays.test_snapshot_remove
http://qa-proxy.ceph.com/teuthology/jspray-2016-08-30_12:07:21-kcephfs:recovery-master-testing-basic-mira/392448/teuthology.log

This differs from the main snapshot tests in that we do an unmount/remount between creating the snapshot and trying to read it back, so I wonder whether this is a bug in unmounting, where we should be waiting for buffered data to be written back before the client detaches.

The sequence of operations is:

  • write some data to snapdir/subdir/file_a
  • snapshot snapdir
  • write some other data to snapdir/subdir/file_a
  • unlink snapdir/subdir/file_a and snapdir/subdir
  • unmount the client
  • mount the client again
  • read back snapdir/.snap/<snapshot>/subdir/file_a and check the original data is still there

I haven't tried reproducing this by hand outside of the automated test; that would be the natural next step.
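
A rough sketch of what that manual reproduction might look like, following the steps above. This is a best-guess illustration rather than the teuthology test itself: the mount point, snapshot name, and mount arguments are assumptions, though creating a directory under .snap is the standard way CephFS snapshots are taken.

    # Manual reproduction sketch (illustrative; paths and mount args are assumed).
    import os
    import subprocess

    MNT = "/mnt/cephfs"                          # assumed kernel CephFS mount point
    SNAPDIR = os.path.join(MNT, "snapdir")
    SUBDIR = os.path.join(SNAPDIR, "subdir")
    FILE_A = os.path.join(SUBDIR, "file_a")

    os.makedirs(SUBDIR, exist_ok=True)

    # 1. write some data to snapdir/subdir/file_a
    with open(FILE_A, "w") as f:
        f.write("original data\n")

    # 2. snapshot snapdir (in CephFS, mkdir under .snap creates a snapshot)
    os.mkdir(os.path.join(SNAPDIR, ".snap", "snap1"))

    # 3. write some other data to the same file (open("w") truncates it first)
    with open(FILE_A, "w") as f:
        f.write("new data\n")

    # 4. unlink snapdir/subdir/file_a and remove snapdir/subdir
    os.unlink(FILE_A)
    os.rmdir(SUBDIR)

    # 5/6. unmount and remount the client (mount arguments are placeholders)
    subprocess.check_call(["umount", MNT])
    subprocess.check_call(["mount", "-t", "ceph", "mon-host:/", MNT,
                           "-o", "name=admin,secretfile=/etc/ceph/secret"])

    # 7. read back the snapshotted copy and check the original data is still there
    snap_copy = os.path.join(SNAPDIR, ".snap", "snap1", "subdir", "file_a")
    with open(snap_copy) as f:
        assert f.read() == "original data\n", "unflushed snapshot data was lost"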


Related issues (2): 0 open, 2 closed

Related to CephFS - Bug #18211: test_snapshot_remove (tasks.cephfs.test_strays.TestStrays) failed at data pool empty check (Resolved, Zheng Yan, 12/09/2016)

Copied to CephFS - Backport #18103: jewel: truncate can cause unflushed snapshot data lose (Resolved, Loïc Dachary)
Actions #1

Updated by John Spray over 7 years ago

Actions #2

Updated by John Spray over 7 years ago

  • Status changed from New to Resolved

This is no longer failing when running against the testing kernel.

Actions #3

Updated by John Spray over 7 years ago

  • Status changed from Resolved to 12
Actions #4

Updated by Zheng Yan over 7 years ago

  • Status changed from 12 to In Progress
Actions #5

Updated by Zheng Yan over 7 years ago

  • Project changed from Linux kernel client to CephFS
  • Subject changed from kclient snapshot failure to truncate can cause unflushed snapshot data lose
  • Category deleted (fs/ceph)
  • Status changed from In Progress to Fix Under Review
Actions #6

Updated by John Spray over 7 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to jewel
Actions #7

Updated by Loïc Dachary over 7 years ago

  • Copied to Backport #18103: jewel: truncate can cause unflushed snapshot data lose added
Actions #8

Updated by John Spray over 7 years ago

  • Status changed from Pending Backport to 12
  • Assignee set to Zheng Yan

It looks like the patch hasn't eliminated the failure:
http://pulpito.ceph.com/jspray-2016-12-06_12:37:38-kcephfs:recovery-master-testing-basic-smithi/611141

Zheng, could you take another look?

Actions #9

Updated by Zheng Yan over 7 years ago

2016-12-06T13:28:03.559 INFO:tasks.cephfs_test_runner: self.assertTrue(self.fs.data_objects_absent(file_a_ino, size_mb * 1024 * 1024))

It failed at the data pool empty check; that's a new issue.
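
For context, a minimal sketch of what this kind of check amounts to. It relies on the CephFS convention that a file's data is stored in RADOS objects named "<inode-hex>.<block-index>" in the data pool; the pool name and the helper itself are illustrative, not the actual tasks.cephfs helper.

    # Illustrative check that no data objects for an inode remain in the data pool.
    import subprocess

    def data_objects_absent(pool, ino):
        """Return True if no RADOS objects belonging to inode `ino` remain in `pool`."""
        prefix = "%x." % ino      # e.g. inode 0x10000000001 -> "10000000001."
        names = subprocess.check_output(["rados", "-p", pool, "ls"], text=True)
        return not any(name.startswith(prefix) for name in names.splitlines())

    # e.g. once the snapshot and stray are purged, file_a's objects should be gone:
    # assert data_objects_absent("cephfs_data", file_a_ino)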

Actions #10

Updated by John Spray over 7 years ago

  • Status changed from 12 to Pending Backport
Actions #11

Updated by John Spray over 7 years ago

  • Related to Bug #18211: test_snapshot_remove (tasks.cephfs.test_strays.TestStrays) failed at data pool empty check added
Actions #12

Updated by Greg Farnum over 7 years ago

So do we think this is fixed or not? We need to undo https://github.com/ceph/ceph-qa-suite/pull/1156/commits/5f1abf9c310c2732cc6bcd0ff2bd2e947dfb414e (to re-enable the test) once the fix is complete.

Actions #13

Updated by Nathan Cutler about 7 years ago

  • Status changed from Pending Backport to Resolved