Bug #18743

Scrub considers dirty backtraces to be damaged, puts in damage table even though it repairs

Added by John Spray almost 2 years ago. Updated 11 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
fsck/damage handling
Target version:
Start date:
01/31/2017
Due date:
% Done:

0%

Source:
Tags:
Backport:
luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:

Description

Two things are wrong here:

  • When running `scrub_path /teuthology-archive` on the lab filesystem, I get a flurry of "bad backtrace" errors on very recently written files, which makes me 99% certain these are false positives caused by dirty (not-yet-flushed) metadata.
  • The scrub code goes ahead and repairs the backtraces, but then puts an entry in the damage table anyway.
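The false positive can be sketched as follows. This is an illustrative model, not the actual Ceph MDS code (which is C++, in ScrubStack/CInode): the `Inode` class, `validate_backtrace` function, and flag names here are hypothetical stand-ins for the behaviour described above, where an inode marked `dirtyparent` has a backtrace update queued but not yet written to disk, so the on-disk read fails with -61 (ENODATA).

```python
from dataclasses import dataclass, field

ENODATA = 61  # matches the read_ret_val of -61 in the log below

@dataclass
class Inode:
    ino: str
    dirtyparent: bool            # backtrace update queued, not yet on disk
    ondisk_backtrace: list = field(default_factory=list)
    memory_backtrace: list = field(default_factory=list)

def validate_backtrace(inode):
    """Return (passed, is_damage).

    The reported bug: scrub records damage-table entries even when the
    mismatch is explained by dirty metadata and the backtrace is repaired.
    The check sketched here treats a dirtyparent mismatch as repairable,
    not as damage.
    """
    if not inode.ondisk_backtrace:
        read_ret_val = -ENODATA  # nothing on disk yet
    else:
        read_ret_val = 0
    passed = (read_ret_val == 0
              and inode.ondisk_backtrace == inode.memory_backtrace)
    if passed:
        return True, False
    if inode.dirtyparent:
        # False positive: rewrite the backtrace, but do not record damage.
        return False, False
    return False, True  # genuine on-disk damage

# A freshly written file, as in the mon.2.tgz example below: the in-memory
# backtrace exists, but nothing has been flushed to the data pool yet.
fresh = Inode("100221723c2", dirtyparent=True,
              memory_backtrace=["mon.2.tgz", "data", "768757"])
passed, damaged = validate_backtrace(fresh)
```

Under this model, `fresh` fails validation (so scrub still rewrites the backtrace) but is not flagged as damage, which is the behaviour the fix aims for.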

For example:

2017-01-31 11:50:45.992113 7f1be3c6c700  0 log_channel(cluster) log [WRN] : Scrub error on inode [inode 100221723c2 [2,head] /teuthology-archive/teuthology-2017-01-31_11:30:02-hadoop-kraken---basic-mira/768757/data/mon.2.tgz auth v8 dirtyparent s=119028 n(v0 b119028 1=1+0) (iversion lock) | dirtyparent=1 scrubqueue=0 dirty=1 0x7f1bfb354998] (/teuthology-archive/teuthology-2017-01-31_11:30:02-hadoop-kraken---basic-mira/768757/data/mon.2.tgz) see mds.mira049 log for details
2017-01-31 11:50:45.992150 7f1be3c6c700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 100221723c2 [2,head] /teuthology-archive/teuthology-2017-01-31_11:30:02-hadoop-kraken---basic-mira/768757/data/mon.2.tgz auth v8 dirtyparent s=119028 n(v0 b119028 1=1+0) (iversion lock) | dirtyparent=1 scrubqueue=0 dirty=1 0x7f1bfb354998]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0:[]\/\/","memoryvalue":"(0)100221723c2:[<100221723bf\/mon.2.tgz v8>,<100221707e0\/data v283>,<100221707df\/768757 v582>,<100011cf577\/teuthology-2017-01-31_11:30:02-hadoop-kraken---basic-mira v294521364>,<1\/teuthology-archive v139611246>]\/\/","error_str":"failed to read off disk; see retval"},"raw_stats":{"checked":false,"passed":false,"read_ret_val":0,"ondisk_value.dirstat":"f()","ondisk_value.rstat":"n()","memory_value.dirrstat":"f()","memory_value.rstat":"n()","error_str":""},"return_code":-61}
2017-01-31 11:50:46.012429 7f1be3c6c700  0 log_channel(cluster) log [WRN] : bad backtrace on inode [inode 100221723c3 [...2,head] /teuthology-archive/teuthology-2017-01-31_11:30:02-hadoop-kraken---basic-mira/768757/remote/ auth v416 dirtyparent f(v0 m2017-01-31 11:42:21.894689 3=0+3) n(v13 rc2017-01-31 11:43:03.254448 b2671453 108=98+10)/n(v12 rc2017-01-31 11:43:03.254448 b2671453 108=98+10) (inest lock dirty) (iversion lock) | dirtyscattered=1 lock=0 dirfrag=1 dirtyrstat=1 dirtyparent=1 scrubqueue=0 dirty=1 authpin=0 0x7f1bfb354f80], rewriting it at /teuthology-archive/teuthology-2017-01-31_11:30:02-hadoop-kraken---basic-mira/768757/remote

I suspect we could reproduce this rather easily by scrubbing during normal workloads in our automated testing (http://tracker.ceph.com/issues/17856).


Related issues

Copied to fs - Backport #22089: luminous: Scrub considers dirty backtraces to be damaged, puts in damage table even though it repairs Resolved

History

#1 Updated by Dan Mick almost 2 years ago

  • Description updated (diff)

#2 Updated by John Spray about 1 year ago

  • Status changed from New to Need Review
  • Backport set to luminous

https://github.com/ceph/ceph/pull/18538

Inspired to fix this while working on today's "[ceph-users] MDS damaged" thread.

#3 Updated by Patrick Donnelly about 1 year ago

  • Status changed from Need Review to Pending Backport

#4 Updated by Nathan Cutler about 1 year ago

  • Copied to Backport #22089: luminous: Scrub considers dirty backtraces to be damaged, puts in damage table even though it repairs added

#5 Updated by Nathan Cutler 11 months ago

  • Status changed from Pending Backport to Resolved
