Bug #56605

open

Snapshot and xattr scanning in cephfs-data-scan

Added by Xiubo Li almost 2 years ago. Updated almost 2 years ago.

Status: In Progress
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): tools
Labels (FS): scrub, snapshots
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We are performing the recovery step by step using an alternate metadata pool. For details, see https://docs.ceph.com/en/nautilus/cephfs/disaster-recovery-experts/#using-an-alternate-metadata-pool-for-recovery.

We found that the snapshot information could not be recovered.

For example, given the path /mnt/cephfs/mydir/myfile, the 0th object of myfile in the data pool has a parent xattr, which holds the backtrace of the file. Then take a snapshot under /mnt/cephfs/.snap; let's assume its snapid is a.

Under mydir, the dentry key of myfile is:

# rados -p recovery listomapkeys 1000098a1a4.00000000
myfile_head

In the data pool, the parent xattr is set on the 0th object of myfile:

# ./bin/rados -p cephfs.a.data listxattr 1000098a1a5.00000000
layout
parent

After removing myfile, the dentry key becomes:

# rados -p recovery listomapkeys 1000098a1a4.00000000
myfile_a
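The two listings above show dentry keys of the form <name>_<suffix>: "head" for the live dentry, and, after the removal, a suffix that matches the snapid a assumed here. A minimal bash sketch for splitting such keys, assuming (not confirmed by this report) that a non-head suffix is always the snapid in lowercase hex; the function name parse_dentry_key is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a dirfrag omap key "<name>_<suffix>".
# Assumption: suffix is "head" for the live dentry, otherwise a hex snapid.
parse_dentry_key() {
    local key="$1"
    local name="${key%_*}"      # everything before the last underscore
    local suffix="${key##*_}"   # everything after the last underscore
    if [[ "$key" != *_* || -z "$name" || -z "$suffix" ]]; then
        echo "error: not a <name>_<suffix> key: $key" >&2
        return 1
    fi
    if [[ "$suffix" == head ]]; then
        echo "$name head"
    elif [[ "$suffix" =~ ^[0-9a-f]+$ ]]; then
        # Print the snapid in decimal as well, for readability.
        echo "$name snap $((16#$suffix))"
    else
        echo "error: unexpected suffix '$suffix' in $key" >&2
        return 1
    fi
}

parse_dentry_key myfile_head   # prints: myfile head
parse_dentry_key myfile_a      # prints: myfile snap 10
```

Splitting on the last underscore keeps dentry names that themselves contain underscores intact.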

After the removal, the 0th object of myfile also loses its parent xattr:

# rados -p cephfs.a.data getxattr 1000098a1a5.00000000 parent 
error getting xattr cephfs.a.data/1000098a1a5.00000000/parent: (2) No such file or directory
# ./bin/rados -p cephfs.a.data listxattr 1000098a1a5.00000000
error getting xattr set cephfs.a.data/1000098a1a5.00000000: (2) No such file or directory

However, the object 1000098a1a5.00000000 still exists in the data pool:

# ./bin/rados -p cephfs.a.data ls
1000098a39c.00000002
10000000009.00000000
10000000b2b.00000000
1000000042d.00000000
1000098a39c.00000000
1000098a1a5.00000000
...
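The object names in this listing appear to follow the pattern <inode number in hex>.<object index as 8 hex digits>, so 1000098a1a5.00000000 is the 0th object of inode 0x1000098a1a5, the one expected to carry the parent backtrace. A small bash sketch of that naming, under this assumption; data_object_name and zeroth_object_inodes are hypothetical helpers, not part of Ceph:

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the assumed data object naming scheme:
#   <inode number in hex>.<object index as 8 hex digits>

# Build the name of object number $2 (decimal) of inode $1 (hex string).
data_object_name() {
    printf '%s.%08x\n' "$1" "$2"
}

# Read "rados ls" output on stdin and print the inode (hex) of every
# 0th object, i.e. the objects expected to hold the "parent" xattr.
zeroth_object_inodes() {
    local obj
    while IFS= read -r obj; do
        case "$obj" in
            *.00000000) printf '%s\n' "${obj%.00000000}" ;;
        esac
    done
}
```

For example, piping "./bin/rados -p cephfs.a.data ls" into zeroth_object_inodes would list 1000098a1a5 among the inodes, even though its 0th object no longer has a parent xattr.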

So when running scan_inodes, it could not find a backtrace in object 1000098a1a5.00000000 and therefore could not add the myfile_a dentry to mydir/:

cephfs-data-scan scan_inodes --alternate-pool recovery --filesystem <original filesystem name> --force-corrupt --force-init <original data pool name>

Consequently, scan_links could not add snapid a to the SnapServer table:

cephfs-data-scan scan_links --filesystem recovery-fs
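The two passes quoted above must run in order, since scan_links works on what scan_inodes rebuilt. A dry-run bash sketch that prints (or, with DRY_RUN=0, runs) exactly these two commands; ORIG_FS, DATA_POOL, ALT_POOL, RECOVERY_FS and DRY_RUN are hypothetical placeholders, and the other steps from the linked guide are deliberately omitted:

```bash
#!/usr/bin/env bash
# Sketch only: the two cephfs-data-scan passes quoted in this report.
# All variables below are placeholders; defaults mirror the report.
set -euo pipefail

ORIG_FS="${ORIG_FS:-<original filesystem name>}"
DATA_POOL="${DATA_POOL:-<original data pool name>}"
ALT_POOL="${ALT_POOL:-recovery}"
RECOVERY_FS="${RECOVERY_FS:-recovery-fs}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Pass 1: rebuild inodes into the alternate metadata pool; this relies on
# the "parent" backtrace xattr of each file's 0th object.
run cephfs-data-scan scan_inodes --alternate-pool "$ALT_POOL" \
    --filesystem "$ORIG_FS" --force-corrupt --force-init "$DATA_POOL"

# Pass 2: per this report, this is where snapids reach the SnapServer table.
run cephfs-data-scan scan_links --filesystem "$RECOVERY_FS"
```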

Is this a bug? Or where else should we look for the parent xattr?