Bug #22292

mds: scrub may mark repaired directory with lost dentries and not flush backtrace

Added by Patrick Donnelly over 6 years ago. Updated about 4 years ago.

Status:
New
Priority:
High
Assignee:
-
Category:
fsck/damage handling
Target version:
-
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS, qa-suite
Labels (FS):
qa, scrub
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Simple reproducer (with selected output):

$ ../src/vstart.sh -d -b -l -n
$ # mount at ./mnt
$ cp -a /usr mnt
^C
$ bin/ceph daemon mds.c flush journal
$ bin/rados --pool=cephfs_metadata_a listomapkeys 1000000001c.00000000
site-functions_head
$ bin/rados --pool=cephfs_metadata_a getxattr 1000000001c.00000000 parent
�       zsh]shareWlocaluusr+
$ bin/rados --pool=cephfs_metadata_a getxattr 1000000001d.00000000 parent
�"site-functions        zsh]shareWlocaluusr+
$ bin/rados --pool=cephfs_metadata_a rm 1000000001c.00000000
$ bin/ceph daemon mds.b scrub_path / recursive force repair
$ grep -B2 'scrub complete' < out/mds.b.log 
2017-11-30 18:11:42.798 7fe62a268700 10 log_client  logged 2017-11-30 18:11:37.052016 mds.b mds.0 127.0.0.1:6839/1510709700 1 : cluster [WRN] bad backtrace on inode 0x1000000001c(/usr/local/share/zsh), rewriting it
2017-11-30 18:11:42.798 7fe62a268700 10 log_client  logged 2017-11-30 18:11:37.052050 mds.b mds.0 127.0.0.1:6839/1510709700 2 : cluster [INF] Scrub repaired inode 0x1000000001c (/usr/local/share/zsh)
2017-11-30 18:11:42.798 7fe62a268700 10 log_client  logged 2017-11-30 18:11:37.346361 mds.b mds.0 127.0.0.1:6839/1510709700 3 : cluster [INF] scrub complete
$ bin/rados --pool=cephfs_metadata_a getxattr 1000000001c.00000000 parent
# ???? I thought scrub would flush the backtrace?
$ bin/ceph daemon mds.b flush journal
$ bin/rados --pool=cephfs_metadata_a getxattr 1000000001c.00000000 parent
�       zsh]sharellocal�usr:pdo
$ bin/rados --pool=cephfs_metadata_a listomapkeys 1000000001c.00000000
$

So the "site-functions" dentry is lost, but the directory is considered repaired.
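The binary blobs printed by `getxattr ... parent` above are encoded backtraces; they can be decoded with `ceph-dencoder` to see which ancestor dentries the backtrace actually records. A minimal sketch, assuming the vstart cluster and object/pool names from the reproducer, and assuming the xattr decodes as Ceph's `inode_backtrace_t` type (the `run` helper only echoes each command, so this is a dry run, not a live session):

```shell
# Dry-run helper: echoes each command instead of executing it.
# Drop the 'echo' to run against a real vstart cluster.
run() { echo "+ $1"; }

# Dump the raw 'parent' xattr (the backtrace) to a file.
run "bin/rados --pool=cephfs_metadata_a getxattr 1000000001c.00000000 parent > parent.bin"

# Decode it into readable JSON; each ancestor dentry appears in turn.
run "bin/ceph-dencoder type inode_backtrace_t import parent.bin decode dump_json"
```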

Actions #1

Updated by Patrick Donnelly over 6 years ago

Consensus from the scrub discussion is that this can be resolved by adding an appropriate warning to the scrub output stating that the directory inode was recreated and that the operator needs to run cephfs-data-scan to recover the lost dentries.
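The recovery the proposed warning would point operators at is the standard cephfs-data-scan pass over the data pool. A sketch of that workflow, assuming the vstart-style data pool name `cephfs_data_a` and that the file system is taken offline first (the `run` helper echoes instead of executing, so nothing here touches a cluster):

```shell
# Dry-run helper: echoes each command; remove the echo to execute for real.
run() { echo "+ $1"; }

# Rebuild file metadata from data-pool objects, then recreate dentries
# for orphaned inodes (such as the lost "site-functions" directory).
run "cephfs-data-scan scan_extents cephfs_data_a"
run "cephfs-data-scan scan_inodes cephfs_data_a"
run "cephfs-data-scan scan_links"
```

Note that scan_extents and scan_inodes walk every object in the data pool, so on a real cluster they are usually run with multiple parallel workers.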

Actions #2

Updated by Patrick Donnelly over 6 years ago

Greg also raised some good points: we should additionally mark the directory as damaged (in a persistent way that survives MDS fail-over, if such a mechanism doesn't already exist) and/or have the file system go read-only, so that the operator must resolve the damage before allowing other I/O that could create further inconsistency.
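For the "mark it damaged" option, the MDS already exposes a damage table that the operator can inspect and clear once recovery is done. A sketch of that operator flow, assuming MDS rank 0 and using a placeholder damage ID (the `run` helper echoes rather than executes):

```shell
# Dry-run helper: echoes each command instead of executing it.
run() { echo "+ $1"; }

# List entries in the damage table of MDS rank 0.
run "bin/ceph tell mds.0 damage ls"

# After repairing with cephfs-data-scan, clear the entry by its ID.
# (The ID 123 is a placeholder taken from 'damage ls' output.)
run "bin/ceph tell mds.0 damage rm 123"
```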

Actions #3

Updated by Patrick Donnelly about 6 years ago

  • Target version set to v14.0.0
  • Component(FS) qa-suite added
  • Labels (FS) qa added
Actions #4

Updated by Patrick Donnelly about 5 years ago

  • Target version changed from v14.0.0 to v15.0.0
Actions #5

Updated by Patrick Donnelly about 4 years ago

  • Assignee deleted (Douglas Fuller)
  • Target version deleted (v15.0.0)
Actions #6

Updated by Patrick Donnelly about 4 years ago

  • Labels (FS) scrub added