Project

General

Profile

Bug #17562

backtrace check fails when scrubbing directory created by fsstress

Added by Zheng Yan about 6 years ago. Updated about 6 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

when scrubbing created by fsstress, there are lots of false backtrace errors. The bug is in inode_backtrace_t::compare. but the code is so confusing. I can't understand Greg's intention

One example of failure

[root@zhyan-desk1 mds]# ceph daemon mds.a scrub_path /fsstress/pf/d8/d62/d764/dc6d/d14b4/d20d6/d2831/d3a7f/d488/d511/d1df5/d1d00/d4821/d50b7/d1146/dc59/ repair
{
    "performed_validation": true,
    "passed_validation": false,
    "backtrace": {
        "checked": true,
        "passed": false,
        "read_ret_val": 0,
        "ondisk_value": "(269)1000000cd38:[<10000008937\/dc59 v1218>,<10000007705\/d54c v170>,<1000000723d\/d3cf v2426>,<100000051de\/d366 v2068>,<10000004b63\/db0 v4387>,<10000004b60\/d25 v3759>,<10000004a79\/d21 v4482>,<1000000002a\/d8 v2286>,<10000000000\/pf v16768>,<1\/fsstress v2348>]\/\/",
        "memoryvalue": "(269)1000000cd38:[<10000008937\/dc59 v19537>,<1000000af34\/d1146 v3524>,<10000039adc\/d50b7 v1321>,<10000019d91\/d4821 v288>,<1000000b3cc\/d1d00 v4928>,<100000086d2\/d1df5 v34596>,<100000081cd\/d511 v2529>,<10000005d8d\/d488 v28540>,<1000000947f\/d3a7f v28103>,<1000001cb01\/d2831 v19700>,<1000000a413\/d20d6 v28872>,<1000000ce1e\/d14b4 v13328>,<10000005552\/dc6d v23056>,<10000004dfd\/d764 v21852>,<10000004a79\/d62 v57237>,<1000000002a\/d8 v24004>,<10000000000\/pf v285463>,<1\/fsstress v42491>]\/\/",
        "error_str": "" 
    },
    "raw_stats": {
        "checked": true,
        "passed": true,
        "read_ret_val": 0,
        "ondisk_value.dirstat": "f(v0 m2016-10-13 17:01:16.208620 8=6+2)",
        "ondisk_value.rstat": "n(v0 rc2016-10-13 17:01:16.208620 b4457089 19=15+4)",
        "memory_value.dirrstat": "f(v0 m2016-10-13 17:01:16.208620 8=6+2)",
        "memory_value.rstat": "n(v7 rc2016-10-13 17:01:16.208620 b4457089 19=15+4)",
        "error_str": "" 
    },
    "return_code": 0
}

Breakpoint 1, inode_backtrace_t::compare (this=this@entry=0x55f3dd638c80, other=..., equivalent=equivalent@entry=0x7f08ee22690e, divergent=divergent@entry=0x7f08ee22690f)
    at /home/zhyan/Ceph/ceph-2/src/mds/inode_backtrace.cc:124
124      int min_size = MIN(ancestors.size(),other.ancestors.size());
(gdb) n
125      *divergent = false;
(gdb) 
126      if (min_size == 0)
(gdb) 
123    {
(gdb) 
129      if (ancestors[0].version > other.ancestors[0].version)
(gdb) 
130        comparator = 1;
(gdb) 
133      for (int i = 1; i < min_size; ++i) {
(gdb) 
134        if (*divergent) {
(gdb) 
141        if (ancestors[i].dirino != other.ancestors[i].dirino) {
(gdb) 
142          *equivalent = false;
(gdb) 
143          if (ancestors[i-1].version < other.ancestors[i-1].version) {
(gdb) 
147          } else if (ancestors[i-1].version > other.ancestors[i-1].version) {
(gdb) 
148            if (comparator < 0)
(gdb) 
150            return 1;
(gdb) 
168    }
(gdb) 
CInode::ValidationContinuation::_backtrace (this=0x55f3dd417ac0, rval=0) at /home/zhyan/Ceph/ceph-2/src/mds/CInode.cc:3843
3843          if (equivalent) {
(gdb) 
3846            if (divergent || memory_newer <= 0) {
(gdb) p equivalent
$1 = false
(gdb) p divergent
$2 = false

Associated revisions

Revision bbdfd137
Added by John Spray about 6 years ago

Merge pull request #11517 from ukernel/wip-17562

Fix #17562 (backtrace check fails when scrubbing directory created by fsstress)

Reviewed-by: John Spray <>
Reviewed-by: Greg Farnum <>

History

#1 Updated by Greg Farnum about 6 years ago

Well, in inode_backtrace_t::compare() we're trying to determine and return three different things, as described in the header comments.
1) Which version is newer (if either); that's the comparator and return value.
2) If the two backtraces both have the same dentries (regardless of versions on each); that's the "equivalent" pointer. (So if one says "a1/b1/c1" and the other says "a2/b2/c1" they would be equivalent.)
3) If the two backtraces have different dentries but the versions stuck on the dentries mean they shouldn't. (So if one says "a1/b1/c1" and the other says "a1/b1/d1" they would be divergent, but if the other said "a1/b2/d1" they would not.)

Does that make sense? The code is a little annoying but it's basically just walking through the dentry vectors of each backtrace and comparing their versions (until they differ) and making sure any difference is acceptable given the associated versions. Apparently it's not quite correct, but that's the idea.

#2 Updated by Greg Farnum about 6 years ago

  • Assignee changed from Greg Farnum to Zheng Yan

It looks to me like compare() is doing the right thing, but that the scrubbing code is declaring an error if the on-disk backtrace is out-of-date. (results->backtrace.passed = true; only if (equivalent)) I can't recall off-hand what the behavior we want there is, but probably it should be adjusted. (We have or had some cases where we wanted failures and flushed the journal to backing objects before scrubbing; that isn't the only case we need to handle for online scrub though!)

#3 Updated by Zheng Yan about 6 years ago

  • Status changed from New to Fix Under Review

#4 Updated by Zheng Yan about 6 years ago

  • Status changed from Fix Under Review to Resolved

Also available in: Atom PDF