Bug #54557

open

scrub repair does not clear earlier damage health status

Added by Milind Changire about 2 years ago. Updated 5 months ago.

Status:
Pending Backport
Priority:
Normal
Category:
fsck/damage handling
Target version:
% Done:

0%

Source:
Community (dev)
Tags:
backport_processed
Backport:
reef, quincy
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
scrub, task(easy)
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

From Chris Palmer on the ceph-users@ceph.io mailing list ...

Reading this thread made me realise I had overlooked cephfs scrubbing, so I tried it on a small 16.2.7 cluster. The normal forward scrub showed nothing. However "ceph tell mds.0 scrub start ~mdsdir recursive" did find one backtrace error (putting the cluster into HEALTH_ERR). I then did a repair which, according to the log, did rewrite the inode, and subsequent scrubs have not found it.
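
For reference, a sketch of the command sequence described above; the rank (mds.0) and the ~mdsdir path are taken from this report, while the exact form of the initial forward scrub is an assumption:

    # forward scrub from the filesystem root (reported clean)
    ceph tell mds.0 scrub start / recursive
    # scrub the MDS-internal directory; this is what flagged the backtrace error
    ceph tell mds.0 scrub start ~mdsdir recursive
    # repair pass; the MDS log below shows the backtrace being rewritten
    ceph tell mds.0 scrub start ~mdsdir recursive,repair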

However, the cluster health is still HEALTH_ERR, and the MDS still shows the damage:

ceph@xxxx1:~$ ceph tell mds.0 damage ls 
2022-03-12T18:42:01.609+0000 7f1b817fa700  0 client.173985213 ms_handle_reset on v2:192.168.80.121:6824/939134894
2022-03-12T18:42:01.625+0000 7f1b817fa700  0 client.173985219 ms_handle_reset on v2:192.168.80.121:6824/939134894
[
    {
        "damage_type": "backtrace",
        "id": 3308827822,
        "ino": 256,
        "path": "~mds0" 
    }
]

What are the right steps from here? Has the error actually been corrected and just needs clearing, or is it still there?

In case it is relevant: there is one active MDS and two standbys. The log is from the node currently hosting rank 0.
From the mds log:

2022-03-12T18:13:41.593+0000 7f61d30c1700  1 mds.xxxx1 asok_command: scrub start {path=~mdsdir,prefix=scrub start,scrubops=[recursive]} (starting...)
2022-03-12T18:13:41.593+0000 7f61cb0b1700  0 log_channel(cluster) log [INF] : scrub queued for path: ~mds0
2022-03-12T18:13:41.593+0000 7f61cb0b1700  0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
2022-03-12T18:13:41.593+0000 7f61cb0b1700  0 log_channel(cluster) log [INF] : scrub summary: active paths [~mds0]
2022-03-12T18:13:41.601+0000 7f61cb0b1700  0 log_channel(cluster) log [WRN] : Scrub error on inode 0x100 (~mds0) see mds.xxxx1 log and `damage ls` output for details
2022-03-12T18:13:41.601+0000 7f61cb0b1700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x100 [...2,head] ~mds0/ auth v6798 ap=1 snaprealm=0x55d595484800 f(v0 10=0+10) n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)/n(v0 rc2019-10-29T10:52:34.302967+0000 11=0+11) (inest lock) (iversion lock) | dirtyscattered=0 lock=0 dirfrag=1 openingsnapparents=0 dirty=1 authpin=1 scrubqueue=0 0x55d595486000]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//[]","memoryvalue":"(11)0x100:[]//[]","error_str":"failed to read off disk; see retval"},"raw_stats":{"checked":true,"passed":true,"read_ret_val":0,"ondisk_value.dirstat":"f(v0 10=0+10)","ondisk_value.rstat":"n(v0 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","memory_value.dirstat":"f(v0 10=0+10)","memory_value.rstat":"n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","error_str":""},"return_code":-61}
2022-03-12T18:13:41.601+0000 7f61cb0b1700  0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
2022-03-12T18:13:45.317+0000 7f61cf8ba700  0 log_channel(cluster) log [INF] : scrub summary: idle

2022-03-12T18:13:52.881+0000 7f61d30c1700  1 mds.xxxx1 asok_command: scrub start {path=~mdsdir,prefix=scrub start,scrubops=[recursive,repair]} (starting...)
2022-03-12T18:13:52.881+0000 7f61cb0b1700  0 log_channel(cluster) log [INF] : scrub queued for path: ~mds0
2022-03-12T18:13:52.881+0000 7f61cb0b1700  0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
2022-03-12T18:13:52.881+0000 7f61cb0b1700  0 log_channel(cluster) log [INF] : scrub summary: active paths [~mds0]
2022-03-12T18:13:52.881+0000 7f61cb0b1700  0 log_channel(cluster) log [WRN] : bad backtrace on inode 0x100(~mds0), rewriting it
2022-03-12T18:13:52.881+0000 7f61cb0b1700  0 log_channel(cluster) log [INF] : Scrub repaired inode 0x100 (~mds0)
2022-03-12T18:13:52.881+0000 7f61cb0b1700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x100 [...2,head] ~mds0/ auth v6798 ap=1 snaprealm=0x55d595484800 DIRTYPARENT f(v0 10=0+10) n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)/n(v0 rc2019-10-29T10:52:34.302967+0000 11=0+11) (inest lock) (iversion lock) | dirtyscattered=0 lock=0 dirfrag=1 openingsnapparents=0 dirtyparent=1 dirty=1 authpin=1 scrubqueue=0 0x55d595486000]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//[]","memoryvalue":"(11)0x100:[]//[]","error_str":"failed to read off disk; see retval"},"raw_stats":{"checked":true,"passed":true,"read_ret_val":0,"ondisk_value.dirstat":"f(v0 10=0+10)","ondisk_value.rstat":"n(v0 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","memory_value.dirstat":"f(v0 10=0+10)","memory_value.rstat":"n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","error_str":""},"return_code":-61}
2022-03-12T18:13:52.881+0000 7f61cb0b1700  0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
2022-03-12T18:13:55.317+0000 7f61cf8ba700  0 log_channel(cluster) log [INF] : scrub summary: idle

2022-03-12T18:14:12.608+0000 7f61d30c1700  1 mds.xxxx1 asok_command: scrub start {path=~mdsdir,prefix=scrub start,scrubops=[recursive,repair]} (starting...)
2022-03-12T18:14:12.608+0000 7f61cb0b1700  0 log_channel(cluster) log [INF] : scrub queued for path: ~mds0
2022-03-12T18:14:12.608+0000 7f61cb0b1700  0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
2022-03-12T18:14:12.608+0000 7f61cb0b1700  0 log_channel(cluster) log [INF] : scrub summary: active paths [~mds0]
2022-03-12T18:14:12.608+0000 7f61cb0b1700  0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
2022-03-12T18:14:15.316+0000 7f61cf8ba700  0 log_channel(cluster) log [INF] : scrub summary: idle
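
Until a fix that auto-clears repaired entries is available, one possible manual workaround (a sketch, assuming the backtrace really was rewritten as the repair log above indicates) is to drop the stale entry from the MDS damage table by its id, after which cluster health should return to OK:

    # list the damage entries and note the id (3308827822 in the output above)
    ceph tell mds.0 damage ls
    # remove the stale entry once you are satisfied the repair succeeded
    ceph tell mds.0 damage rm 3308827822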


Related issues 2 (1 open, 1 closed)

Copied to CephFS - Backport #63810: reef: scrub repair does not clear earlier damage health status (Resolved, Neeraj Pratap Singh)
Copied to CephFS - Backport #63811: quincy: scrub repair does not clear earlier damage health status (In Progress, Neeraj Pratap Singh)