Bug #4364

closed

ObjectCacher: inconsistency after flatten

Added by Josh Durgin about 11 years ago. Updated almost 11 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Target version: -
% Done: 0%
Source: Development
Tags:
Backport: bobtail
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Using a vstart-created cluster with:

./ceph_test_librbd_fsx -l 40960 -N 1000 -S 2 -d rbd foo8

and ceph.conf including:

[client]
        keyring = /home/joshd/ceph/src/keyring
        log file = out/$name.log
        rbd cache = true
        ms inject socket failures = 500
        log max recent = 500
        log max new = 1000
        debug rbd = 20
        debug ms = 1
        debug objectcacher = 30

A read is then interpreted as starting with zeros, since the first bh (BufferHead) covering it is marked !exists in the cache:

truncating to largest ever: 0x774c
1 trunc from 0x0 to 0x774c
2 trunc from 0x774c to 0x8b7
3 write 0x5a51 thru     0x9fff  (0x45af bytes)
4 read  0x3036 thru     0x533f  (0x230a bytes)
5 punch from 0x1340 to 0x4d9b, (0x3a5b bytes)
8 read  0x4eca thru     0x9fff  (0x5136 bytes)
9 trunc from 0xa000 to 0x3c31
13 write        0x50a thru      0x9fff  (0x9af6 bytes)
14 punch        from 0x8db2 to 0xa000, (0x124e bytes)
16 punch        from 0x2519 to 0xa000, (0x7ae7 bytes)
18 trunc        from 0xa000 to 0x7621
19 punch        from 0x1430 to 0x7621, (0x61f1 bytes)
21 read 0x4969 thru     0x7620  (0x2cb8 bytes)
22 read 0x5211 thru     0x7620  (0x2410 bytes)
23 read 0x68b4 thru     0x7620  (0xd6d bytes)
24 write        0x1d41 thru     0x9fff  (0x82bf bytes)
27 clone        1 order 22 su 2097152 sc 2
28 trunc        from 0xa000 to 0x5b5c
29 trunc        from 0x5b5c to 0x277d
30 read 0x20f3 thru     0x277c  (0x68a bytes)
31 write        0x1a6c thru     0x9fff  (0x8594 bytes)
32 write        0x7922 thru     0x9fff  (0x26de bytes)
33 write        0x6c34 thru     0x80e2  (0x14af bytes)
35 flatten
37 read 0x807 thru      0x9fff  (0x97f9 bytes)
READ BAD DATA: offset = 0x807, size = 0x97f9, fname = foo8
OFFSET  GOOD    BAD     RANGE
0x  807 0x960d  0x0000  0x    0
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x  808 0x0d1f  0x0000  0x    1
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x  809 0x1f0d  0x0000  0x    2
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x  80a 0x0d6d  0x0000  0x    3
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x  80b 0x6d0d  0x0000  0x    4
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x  80c 0x0d9b  0x0000  0x    5
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x  80d 0x9b0d  0x0000  0x    6
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x  80e 0x0d48  0x0000  0x    7
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x  80f 0x480d  0x0000  0x    8
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x  810 0x0d59  0x0000  0x    9
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x  811 0x590d  0x0000  0x    a
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x  812 0x0dca  0x0000  0x    b
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x  813 0xca0d  0x0000  0x    c
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x  814 0x0dfc  0x0000  0x    d
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x  815 0xfc0d  0x0000  0x    e
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x  816 0x0ddd  0x0000  0x    f
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
LOG DUMP (37 total operations):
1(  1 mod 256): TRUNCATE UP     from 0x0 to 0x774c      ******WWWW
2(  2 mod 256): TRUNCATE DOWN   from 0x774c to 0x8b7    ******WWWW
3(  3 mod 256): WRITE    0x5a51 thru 0x9fff     (0x45af bytes) HOLE     ***WWWW
4(  4 mod 256): READ     0x3036 thru 0x533f     (0x230a bytes)
5(  5 mod 256): PUNCH    0x1340 thru 0x4d9a     (0x3a5b bytes)  ******PPPP
6(  6 mod 256): SKIPPED (no operation)
7(  7 mod 256): SKIPPED (no operation)
8(  8 mod 256): READ     0x4eca thru 0x9fff     (0x5136 bytes)
9(  9 mod 256): TRUNCATE DOWN   from 0xa000 to 0x3c31
10( 10 mod 256): SKIPPED (no operation)
11( 11 mod 256): SKIPPED (no operation)
12( 12 mod 256): SKIPPED (no operation)
13( 13 mod 256): WRITE    0x50a thru 0x9fff     (0x9af6 bytes) EXTEND   ***WWWW
14( 14 mod 256): PUNCH    0x8db2 thru 0x9fff    (0x124e bytes)
15( 15 mod 256): SKIPPED (no operation)
16( 16 mod 256): PUNCH    0x2519 thru 0x9fff    (0x7ae7 bytes)
17( 17 mod 256): SKIPPED (no operation)
18( 18 mod 256): TRUNCATE DOWN  from 0xa000 to 0x7621
19( 19 mod 256): PUNCH    0x1430 thru 0x7620    (0x61f1 bytes)
20( 20 mod 256): SKIPPED (no operation)
21( 21 mod 256): READ     0x4969 thru 0x7620    (0x2cb8 bytes)
22( 22 mod 256): READ     0x5211 thru 0x7620    (0x2410 bytes)
23( 23 mod 256): READ     0x68b4 thru 0x7620    (0xd6d bytes)
24( 24 mod 256): WRITE    0x1d41 thru 0x9fff    (0x82bf bytes) EXTEND
25( 25 mod 256): SKIPPED (no operation)
26( 26 mod 256): SKIPPED (no operation)
27( 27 mod 256): CLONE
28( 28 mod 256): TRUNCATE DOWN  from 0xa000 to 0x5b5c
29( 29 mod 256): TRUNCATE DOWN  from 0x5b5c to 0x277d
30( 30 mod 256): READ     0x20f3 thru 0x277c    (0x68a bytes)
31( 31 mod 256): WRITE    0x1a6c thru 0x9fff    (0x8594 bytes) EXTEND
32( 32 mod 256): WRITE    0x7922 thru 0x9fff    (0x26de bytes)
33( 33 mod 256): WRITE    0x6c34 thru 0x80e2    (0x14af bytes)
34( 34 mod 256): SKIPPED (no operation)
35( 35 mod 256): FLATTEN
36( 36 mod 256): SKIPPED (no operation)
37( 37 mod 256): READ     0x807 thru 0x9fff     (0x97f9 bytes)  ***RRRR***
Correct content saved for comparison
(maybe hexdump "foo8" vs "foo8.fsxgood")
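To see why zeros at 0x807 are unexpected, the operation log above can be replayed against a simple model that records which operation last defined each byte. This is only a sketch: the model (truncates and punches leave zeros, writes leave data, a write past the end zero-fills the gap) is an assumption about fsx semantics, and `OPS`, `replay`, and `MAX_SIZE` are illustrative names, not part of fsx or librbd. The offsets are copied from the LOG DUMP; clone and flatten are left out because they should not change image contents.

```python
# Simplified model of the fsx operation log above (an assumption, not fsx itself).
MAX_SIZE = 0xA000  # matches "-l 40960" on the fsx command line

# (op number, kind, args): "trunc" takes the new size,
# "write" and "punch" take inclusive start and end offsets.
OPS = [
    (1, "trunc", 0x774C),
    (2, "trunc", 0x8B7),
    (3, "write", 0x5A51, 0x9FFF),
    (5, "punch", 0x1340, 0x4D9A),
    (9, "trunc", 0x3C31),
    (13, "write", 0x50A, 0x9FFF),
    (14, "punch", 0x8DB2, 0x9FFF),
    (16, "punch", 0x2519, 0x9FFF),
    (18, "trunc", 0x7621),
    (19, "punch", 0x1430, 0x7620),
    (24, "write", 0x1D41, 0x9FFF),
    (28, "trunc", 0x5B5C),
    (29, "trunc", 0x277D),
    (31, "write", 0x1A6C, 0x9FFF),
    (32, "write", 0x7922, 0x9FFF),
    (33, "write", 0x6C34, 0x80E2),
]


def replay(ops, max_size=MAX_SIZE):
    """Return (owner, written): per-byte last op number and whether it wrote data."""
    owner = [0] * max_size
    written = [False] * max_size
    size = 0

    def mark(lo, hi, num, has_data):
        for off in range(lo, hi + 1):
            owner[off] = num
            written[off] = has_data

    for num, kind, *args in ops:
        if kind == "trunc":
            (new_size,) = args
            # Bytes between the old and new size are zeros either way.
            mark(min(size, new_size), max(size, new_size) - 1, num, False)
            size = new_size
        elif kind == "write":
            start, end = args
            if start > size:
                mark(size, start - 1, num, False)  # hole left by an extending write
            mark(start, end, num, True)
            size = max(size, end + 1)
        elif kind == "punch":
            start, end = args
            mark(start, end, num, False)
    return owner, written


owner, written = replay(OPS)
print(f"offset 0x807: last defined by op {owner[0x807]}, written data: {written[0x807]}")
```

Under this model the byte at 0x807 was last defined by a write rather than by a truncate or punch, which is consistent with fsx expecting non-zero data there.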

Exporting the image and comparing it to foo8.fsxgood shows that the actual image matches; only the cache was wrong.
I'm guessing this is a bug in the interaction between flatten and the cache, in which case you'd only see it when using librbd directly,
but it could be a more general problem with the cache.
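The comparison can be sketched roughly as follows, assuming the image has already been exported to a local file (for example with `rbd export`). The function name `mismatched_ranges` and the file name `foo8.export` in the usage note are hypothetical, chosen for illustration.

```python
import sys
from pathlib import Path


def mismatched_ranges(a: bytes, b: bytes):
    """Return a list of (start, end_inclusive) runs where a and b differ.

    If the lengths differ, the extra tail is reported as one final run.
    """
    ranges = []
    common = min(len(a), len(b))
    start = None
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i  # a new run of differing bytes begins here
        elif start is not None:
            ranges.append((start, i - 1))
            start = None
    if start is not None:
        ranges.append((start, common - 1))
    if len(a) != len(b):
        ranges.append((common, max(len(a), len(b)) - 1))
    return ranges


if __name__ == "__main__" and len(sys.argv) == 3:
    exported = Path(sys.argv[1]).read_bytes()
    good = Path(sys.argv[2]).read_bytes()
    diffs = mismatched_ranges(exported, good)
    if not diffs:
        print("files match")
    for lo, hi in diffs:
        print(f"mismatch 0x{lo:x} thru 0x{hi:x} (0x{hi - lo + 1:x} bytes)")
```

Saved under any script name and run with `foo8.export foo8.fsxgood` as its two arguments, it prints "files match" when the exported image agrees with the fsx reference copy, which is the result described above.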

Client log is attached.


Files

flatten_cache_bug.log (195 KB) Josh Durgin, 03/06/2013 07:14 PM
Actions #1

Updated by Josh Durgin about 11 years ago

  • Status changed from In Progress to 7

If this doesn't cause any problems, it should be backported to bobtail. Leaving in testing until then.

Actions #2

Updated by Sage Weil about 11 years ago

  • Status changed from 7 to Pending Backport
Actions #3

Updated by Josh Durgin almost 11 years ago

  • Status changed from Pending Backport to Resolved