Bug #41753

closed

avoid page cache for krbd discard round off tests

Added by Yang Dongsheng over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source:
Tags:
Backport: nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

While testing my krbd journaling code with teuthology, I found that some krbd_discard.t cases failed. I then reverted my code to verify, and they still failed.

So I created a simple test case to check it:
(1) rbd create test -s 4M --image-feature layering
(2) rbd map test
(3) cat truncate.t
  $ xfs_io -c 'pwrite -w 0 4M' /dev/rbd0 >/dev/null
  $ blkdiscard -o 512 -l 4193792 /dev/rbd0
  $ hexdump /dev/rbd0      <--------- hexdump first time
  0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
  *
  0010000 0000 0000 0000 0000 0000 0000 0000 0000
  *
  0400000
  $ hexdump /dev/rbd0      <--------- hexdump second time
  0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
  *
  0010000 0000 0000 0000 0000 0000 0000 0000 0000
  *
  0400000

(4) for i in `seq 1 200`; do cram -v truncate.t ; done

I can reproduce it easily with these steps, and I found that the failures mostly happen at the first hexdump.


Files

truncate_output (757 KB), Yang Dongsheng, 09/16/2019 04:12 AM

Related issues 1 (0 open, 1 closed)

Copied to rbd - Backport #41915: nautilus: avoid page cache for krbd discard round off tests (Resolved, Nathan Cutler)
Actions #1

Updated by Ilya Dryomov over 4 years ago

  • Category set to rbd
  • Assignee set to Ilya Dryomov

Hi Dongsheng,

I tried your test case a few times but didn't see any failures.

How does it fail -- what is the output?

Which kernel are you testing on?

Actions #2

Updated by Yang Dongsheng over 4 years ago

Ilya Dryomov wrote:

Hi Dongsheng,

I tried your test case a few times but didn't see any failures.

How does it fail -- what is the output?

Which kernel are you testing on?

output:

truncate.t: failed
--- truncate.t
+++ truncate.t.err
@@ -2,6 +2,10 @@
   $ blkdiscard -o 512 -l 4193792 /dev/rbd0
   $ hexdump /dev/rbd0
   0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
+  *
+  0000200 0000 0000 0000 0000 0000 0000 0000 0000
+  *
+  0001000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
   *
   0010000 0000 0000 0000 0000 0000 0000 0000 0000
   *
# Ran 1 tests, 0 skipped, 1 failed.

That's strange: krbd aligns the discard, so that part of the data should not be touched.
One correction to what I said above about failures happening at the first hexdump: I also got
one case where both the first and the second hexdump failed. It does happen, but it is far less
likely than a failure in only the first hexdump. Output below:

truncate.t: failed
--- truncate.t
+++ truncate.t.err
@@ -7,12 +7,20 @@
   $ hexdump /dev/rbd0
   0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
   *
+  0000200 0000 0000 0000 0000 0000 0000 0000 0000
+  *
+  0001000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
+  *
   0010000 0000 0000 0000 0000 0000 0000 0000 0000
   *
   0400000
   $ hexdump /dev/rbd0
   0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
   *
+  0000200 0000 0000 0000 0000 0000 0000 0000 0000
+  *
+  0001000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
+  *
   0010000 0000 0000 0000 0000 0000 0000 0000 0000
   *
   0400000
# Ran 1 tests, 0 skipped, 1 failed.

In addition, one case failed with a different output:

truncate.t: failed
--- truncate.t
+++ truncate.t.err
@@ -7,7 +7,13 @@
   $ hexdump /dev/rbd0
   0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
   *
+  0000200 0000 0000 0000 0000 0000 0000 0000 0000
+  *
+  0001000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
+  *
   0010000 0000 0000 0000 0000 0000 0000 0000 0000
+  *
+  03ff000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
   *
   0400000
   $ hexdump /dev/rbd0
# Ran 1 tests, 0 skipped, 1 failed.

ceph version:
I am using vstart.sh (commit id: 846d6c775a09d7a6fda02aecad4ada17f8bc4a35).

kernel version:
I tried 5.3-rc1, 5.3-rc8, and this commit 0c93e1b7a26b418247218d08a6d0b95d61c9c415 (rbd: round off and ignore discards that are too small)

For more information: I wanted to make sure the data is as expected before the discard, so I added a hexdump before the discard, as below:

  $ xfs_io -c 'pwrite -w 0 4M' /dev/rbd0 >/dev/null
  $ hexdump /dev/rbd0
  0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
  *
  0400000
  $ blkdiscard -o 512 -l 4193792 /dev/rbd0
  $ hexdump /dev/rbd0
  0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
  *
  0010000 0000 0000 0000 0000 0000 0000 0000 0000
  *
  0400000
  $ hexdump /dev/rbd0
  0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
  *
  0010000 0000 0000 0000 0000 0000 0000 0000 0000
  *
  0400000

I reran the test 10000 times as shown below; the output is attached.

for i in `seq 1 10000`; do echo "case $i" >> truncate_output; cram -v truncate.t >> truncate_output; done

Actions #3

Updated by Ilya Dryomov over 4 years ago

  • Status changed from New to In Progress

I see, this is an issue with the test. The page cache is truncated according to the original offset and length, before calling into the driver. All pages except for the first page will be dropped, but the first page may remain after being partially zeroed out (512~3584). Given that the test asserts on-disk state, we need to read directly from disk.
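
The arithmetic behind this explanation can be sketched in shell. The offset and length come from the blkdiscard call in the test case. The 4 KiB page size and the 64 KiB krbd round-off granularity (alloc_size) are assumptions: neither is stated in this report, although both are consistent with the hexdump outputs above.

```shell
#!/bin/sh
# Values taken from the test case: blkdiscard -o 512 -l 4193792
off=512
len=4193792

# Assumptions (not stated in this report): 4 KiB pages, 64 KiB alloc_size.
page=4096
alloc=65536

end=$((off + len))

# Page cache: truncated by the original offset and length. The first page is
# only partially covered (off is not page-aligned here), so it may stay
# cached with its tail zeroed, while fully covered pages are dropped.
first_page_end=$(( (off / page + 1) * page ))
cache_zero_off=$off
cache_zero_len=$((first_page_end - off))

# Driver: start rounded up and end rounded down to alloc_size.
disk_off=$(( (off + alloc - 1) / alloc * alloc ))
disk_end=$(( end / alloc * alloc ))

printf 'page cache may hold zeros at %d~%d (0x%x-0x%x)\n' \
    "$cache_zero_off" "$cache_zero_len" "$cache_zero_off" "$first_page_end"
printf 'on-disk discard covers 0x%x-0x%x\n' "$disk_off" "$disk_end"
```

Under these assumptions the cached first page is zeroed at 0x200-0x1000 while the on-disk discard only starts at 0x10000, which matches the extra lines in the failing diffs. One possible way to read from disk rather than from the page cache is an O_DIRECT read, for example `dd if=/dev/rbd0 iflag=direct bs=4M status=none | hexdump`.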

Actions #4

Updated by Ilya Dryomov over 4 years ago

  • Status changed from In Progress to Fix Under Review
Actions #5

Updated by Ilya Dryomov over 4 years ago

  • Project changed from Linux kernel client to rbd
  • Subject changed from "krbd: read after truncate to get wrong data" to "avoid page cache for krbd discard round off tests"
  • Category deleted (rbd)
  • Backport set to nautilus

This is going to need a backport to nautilus, so moving to rbd.

Actions #6

Updated by Ilya Dryomov over 4 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #7

Updated by Nathan Cutler over 4 years ago

  • Copied to Backport #41915: nautilus: avoid page cache for krbd discard round off tests added
Actions #8

Updated by Nathan Cutler over 4 years ago

  • Pull request ID set to 30452
Actions #9

Updated by Nathan Cutler over 4 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
