Feature #56140

cephfs: tooling to identify inode (metadata) corruption

Added by Venky Shankar almost 2 years ago. Updated about 1 year ago.

Status: Pending Backport
Priority: Normal
Category: fsck/damage handling
Target version:
% Done: 0%
Source:
Tags: backport_processed
Backport: quincy
Reviewed:
Affected Versions:
Component(FS): tools
Labels (FS):
Pull request ID:

Related issues (2 open, 0 closed):

Related to CephFS - Bug #54546: mds: crash due to corrupt inode and omap entry (New, Patrick Donnelly)
Copied to CephFS - Backport #59303: quincy: cephfs: tooling to identify inode (metadata) corruption (In Progress, Patrick Donnelly)
#1

Updated by Venky Shankar almost 2 years ago

  • Subject changed from "cephfs: tooling to identify inode (metadata" to "cephfs: tooling to identify inode (metadata) corruption"
  • Category set to fsck/damage handling
  • Assignee set to Patrick Donnelly
  • Target version set to v18.0.0
  • Backport set to quincy
  • Component(FS) tools added

Tracker https://tracker.ceph.com/issues/54546 relates to metadata corruption seen when running databases (esp. PostgreSQL) on Ceph File System. While we work on identifying the bug, we need to provide users a way to detect such corruption, which might exist in the cluster (possible if they ran PostgreSQL on CephFS).

The tool should scan the inodes in the metadata pool and alert on any anomalies in file metadata (and optionally fix them, which is a bit tricky atm). E.g., the corruption mentioned above clobbers an inode's "first" field (to -2, i.e. "head", or to a value which is not sane). Maybe we would want to include these checks as part of file system scrub.
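
To make that concrete, here is a minimal sketch of such a scan using the Python rados bindings. The pool name, the assumed position of the dentry's "first" snapid in the omap value, and the sanity threshold are all illustrative assumptions - this is a sketch, not the eventual tool:

```python
import re
import struct

import rados

# Dirfrag objects in the metadata pool are named "<inode-hex>.<frag-hex>";
# each omap entry in one is a dentry, keyed "<name>_head" for the head
# version. Other hex-named objects (e.g. journal objects) simply have no
# omap entries and fall through harmlessly.
DIRFRAG_RE = re.compile(r"^[0-9a-fA-F]+\.[0-9a-fA-F]{8}$")

# CEPH_NOSNAP ("head") is (uint64)-2. A dentry whose "first" equals it,
# or lies far beyond any plausible snapid, is suspect. The threshold is
# an illustrative assumption, not a value from any shipped tool.
CEPH_NOSNAP = 0xFFFFFFFFFFFFFFFE
MAX_SANE_SNAPID = 1 << 32


def scan_dirfrag(ioctx, oid):
    # Assumed layout: the omap value for a dentry starts with the dentry's
    # "first" snapid as a little-endian u64; the real encoding carries
    # more fields after it, which this sketch ignores.
    with rados.ReadOpCtx() as op:
        it, _ = ioctx.get_omap_vals(op, "", "", 100000)
        ioctx.operate_read_op(op, oid)
        for name, val in it:
            if len(val) < 8:
                continue
            (first,) = struct.unpack("<Q", val[:8])
            if first == CEPH_NOSNAP or first > MAX_SANE_SNAPID:
                print(f"suspect dentry: {oid}/{name} first=0x{first:x}")


def scan_metadata_pool(pool="cephfs_metadata", conf="/etc/ceph/ceph.conf"):
    cluster = rados.Rados(conffile=conf)
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)
        try:
            for obj in ioctx.list_objects():
                if DIRFRAG_RE.match(obj.key):
                    scan_dirfrag(ioctx, obj.key)
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()


if __name__ == "__main__":
    scan_metadata_pool()
```

Note that the scan issues only read operations against the pool, so it needs no running ceph-mds.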

#2

Updated by Patrick Donnelly almost 2 years ago

Venky Shankar wrote:

Tracker https://tracker.ceph.com/issues/54546 relates to metadata corruption seen when running databases (esp. PostgreSQL) on Ceph File System. While we work on identifying the bug, we need to provide users a way to detect such corruption, which might exist in the cluster (possible if they ran PostgreSQL on CephFS).

The tool should scan the inodes in the metadata pool and alert on any anomalies in file metadata (and optionally fix them, which is a bit tricky atm). E.g., the corruption mentioned above clobbers an inode's "first" field (to -2, i.e. "head", or to a value which is not sane). Maybe we would want to include these checks as part of file system scrub.

Normally, metadata pool scans are achieved via MDS scrub. Are you suggesting a separate tool?

#3

Updated by Venky Shankar almost 2 years ago

Patrick Donnelly wrote:

Venky Shankar wrote:

Tracker https://tracker.ceph.com/issues/54546 relates to metadata corruption seen when running databases (esp. PostgreSQL) on Ceph File System. While we work on identifying the bug, we need to provide users a way to detect such corruption, which might exist in the cluster (possible if they ran PostgreSQL on CephFS).

The tool should scan the inodes in the metadata pool and alert on any anomalies in file metadata (and optionally fix them, which is a bit tricky atm). E.g., the corruption mentioned above clobbers an inode's "first" field (to -2, i.e. "head", or to a value which is not sane). Maybe we would want to include these checks as part of file system scrub.

Normally, metadata pool scans are achieved via MDS scrub. Are you suggesting a separate tool?

I'm fine with having this as part of scrub.

#4

Updated by Venky Shankar almost 2 years ago

Posting an update here based on a discussion between me, Greg, and Patrick:

Short term plan: a helper script to identify corrupted files. This will be an offline script (it does not require ceph-mds to be available) that lists out the corrupted files. The script will not perform any remedial action on the corrupted files, though. The idea is to provide the user with a tool to assess the extent of corruption in the file system and perform any backup operation if required (such as copying the files out of the file system). This tool will be maintained in the ceph repo. A rough sketch of how such an offline report could point at affected files is below.
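
As a rough illustration of the "list out corrupted files" part without an MDS: a dirfrag object's name encodes the owning directory's inode number, and the object carries a raw "parent" backtrace xattr, so even a read-only scan can point the user at the affected directory. The helper below is hypothetical and leaves the backtrace decoding out:

```python
import rados

def locate_suspect(ioctx, oid):
    """Map a suspect dirfrag object back to its directory inode.

    Dirfrag objects in the metadata pool are named "<inode-hex>.<frag-hex>",
    so the directory's inode number can be read off the object name. The
    raw "parent" xattr holds an encoded backtrace naming the ancestry;
    decoding that blob is omitted from this sketch.
    """
    ino = int(oid.split(".")[0], 16)
    try:
        backtrace = ioctx.get_xattr(oid, "parent")
    except rados.Error:
        backtrace = None
    return ino, backtrace
```

Keeping the whole scan read-only (omap and xattr reads only) matches the intent above: report the damage, and leave remediation and backup decisions to the operator.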

Long term plan: fix the bug (obviously!), but at the same time assess whether it's really necessary to assert when certain conditions are not satisfied (such as this one in the unlink path).

#5

Updated by Patrick Donnelly over 1 year ago

  • Parent task deleted (#54546)
#6

Updated by Patrick Donnelly over 1 year ago

  • Related to Bug #54546: mds: crash due to corrupt inode and omap entry added
#7

Updated by Patrick Donnelly over 1 year ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 47542
#8

Updated by Venky Shankar over 1 year ago

  • Status changed from Fix Under Review to Resolved
#9

Updated by Ken Dreyer about 1 year ago

  • Status changed from Resolved to Pending Backport

https://github.com/ceph/ceph/pull/47542 will be in reef, and we need this in quincy.

#10

Updated by Backport Bot about 1 year ago

  • Copied to Backport #59303: quincy: cephfs: tooling to identify inode (metadata) corruption added
#11

Updated by Backport Bot about 1 year ago

  • Tags set to backport_processed
#12

Updated by Venky Shankar about 1 year ago

Ken Dreyer wrote:

https://github.com/ceph/ceph/pull/47542 will be in reef, and we need this in quincy.

The reason this wasn't a candidate for backport is that the tool was very much standalone and not packaged. Users of the tool fetched it directly from the ceph repo (although care was taken to have the tool work with older ceph versions - pacific, quincy and now reef).
