Bug #62925

open

cephfs-journal-tool: Add preventive measures in the tool to avoid corrupting a ceph file system

Added by Prashant D 8 months ago. Updated 3 months ago.

Status: Fix Under Review
Priority: Urgent
Assignee:
Category: Code Hygiene
Target version:
% Done: 0%
Source: Community (dev)
Tags:
Backport: reef,quincy,squid
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): tools
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The cephfs-journal-tool should only be used by experts who understand CephFS internals. Although the documentation at https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/#recovery-from-missing-metadata-objects clearly warns against using cephfs-journal-tool to reset the journal without the CephFS team's advice, some users still try the tool without understanding the consequences, which can result in an MDS crash as observed in https://tracker.ceph.com/issues/58878.

sh-4.4$ cephfs-journal-tool --rank ocs-storagecluster-cephfilesystem:0 event recover_dentries summary
Events by type:
  RESETJOURNAL: 1
Errors: 0
sh-4.4$ cephfs-journal-tool --rank ocs-storagecluster-cephfilesystem:0 journal reset
old journal was 8388608~48
new journal start will be 12582912 (4194256 bytes past old end)
writing journal head
writing EResetJournal entry
done
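
The offsets printed by the reset above can be checked by hand. The sketch below reproduces them under the assumption that the new journal start is the old end rounded up to the next 4 MiB boundary. The 4 MiB period and the rounding rule are assumptions for illustration and are not stated in this report.

```python
def round_up_to(value: int, multiple: int) -> int:
    """Round value up to the nearest multiple of `multiple`."""
    return ((value + multiple - 1) // multiple) * multiple


# Assumed journal layout period (4 MiB); not stated in the report.
PERIOD = 4 * 1024 * 1024

# Taken from the transcript line "old journal was 8388608~48" (offset~length).
old_start, old_len = 8388608, 48
old_end = old_start + old_len

# Assumed rule: the new start is the first period boundary past the old end.
new_start = round_up_to(old_end + 1, PERIOD)
gap = new_start - old_end

print(f"new journal start will be {new_start} ({gap} bytes past old end)")
```

With these assumptions the sketch yields 12582912 and 4194256, matching the tool's output.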

The tool should print a warning and prompt the user to confirm before it resets the journal. In addition, cephfs-journal-tool should either refuse to run while the file system is online or print a clear warning when a user attempts to run it against a live CephFS, especially for the "event recover_dentries summary" command, which writes any inodes/dentries recoverable from the journal to the RADOS store.
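
A minimal sketch of the requested guard logic follows, written in Python for illustration only (the tool itself is not written in Python). The names `confirm`, `may_reset_journal`, `fs_is_online`, and `assume_yes` are hypothetical and do not correspond to any existing cephfs-journal-tool option or function. How the tool would detect that the file system is online is left out, and this sketch picks the stricter of the two options by refusing outright in that case.

```python
import sys
from typing import Callable


def confirm(prompt: str, read_answer: Callable[[str], str] = input) -> bool:
    """Return True only if the user explicitly types 'yes'."""
    answer = read_answer(f"{prompt} Type 'yes' to continue: ")
    return answer.strip().lower() == "yes"


def may_reset_journal(
    fs_is_online: bool,
    assume_yes: bool = False,
    read_answer: Callable[[str], str] = input,
) -> bool:
    """Decide whether a destructive journal reset may proceed.

    fs_is_online: hypothetical flag; detecting this is out of scope here.
    assume_yes:   hypothetical non-interactive override for scripted use.
    """
    if fs_is_online:
        # Refuse outright instead of prompting when the file system is live.
        print("Refusing to reset: the file system appears to be online.",
              file=sys.stderr)
        return False
    if assume_yes:
        return True
    return confirm(
        "WARNING: resetting the journal can corrupt the file system.",
        read_answer,
    )
```

Passing `read_answer` as a parameter keeps the prompt testable without a terminal, for example `may_reset_journal(False, read_answer=lambda _: "no")`.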
