Actions
Bug #43543
closedmds: scrub on directory with recently created files may fail to load backtraces and report damage
% Done:
0%
Source:
Development
Tags:
Backport:
octopus,nautilus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
scrub
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
On a vstart cluster, copy a directory tree into CephFS and do a recursive scrub concurrently:
$ bin/ceph tell mds.a scrub start / recursive
Then in the MDS log:
2020-01-09T23:03:23.519+0000 7f88482f3700 10 MDSContext::complete: 19MDSIOContextWrapper 2020-01-09T23:03:23.519+0000 7f88482f3700 20 mds.0.cache.ino(0x1000000009f) ondisk_read_retval: -61 2020-01-09T23:03:23.519+0000 7f88482f3700 10 mds.0.cache.ino(0x1000000009f) scrub: inotable ino = 0x1000000009f 2020-01-09T23:03:23.519+0000 7f88482f3700 10 mds.0.cache.ino(0x1000000009f) scrub: inotable free says 0 2020-01-09T23:03:23.519+0000 7f88482f3700 10 MDSContext::complete: 16C_InodeValidated 2020-01-09T23:03:23.519+0000 7f88482f3700 0 log_channel(cluster) log [WRN] : Scrub error on inode 0x1000000009f (/bin/nl-neigh-delete) see mds.a log and `damage ls` output for details 2020-01-09T23:03:23.519+0000 7f88482f3700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x1000000009f [2,head] /bin/nl-neigh-delete auth v364 dirtyparent s=0 n(v0 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={4305=0-4194304@1} caps={4305=pAsxLsXsxFsxcrwb/pAsxXsxFxwb@2},l=4305 | request=0 caps=1 dirtyparent=1 scrubqueue=0 dirty=1 0x55fe1bf7f800]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(2)0x1000000009f:[<0x10000000000/nl-neigh-delete v364>,<0x1/bin v776>]//","error_str":"failed to read off disk; see retval"},"raw_stats":{"checked":false,"passed":false,"read_ret_val":0,"ondisk_value.dirstat":"f()","ondisk_value.rstat":"n()","memory_value.dirrstat":"f()","memory_value.rstat":"n()","error_str":""},"return_code":-61} 2020-01-09T23:03:23.519+0000 7f88482f3700 20 mds.0.cache.ino(0x1000000009f) scrub_finished 2020-01-09T23:03:23.519+0000 7f88482f3700 20 mds.0.cache.dir(0x10000000000) scrub_dentry_finished on dn [dentry #0x1/bin/nl-neigh-delete [2,head] auth (dversion lock) v=364 ino=0x1000000009f state=1610612736 | request=0 lock=0 inodepin=1 scrubparent=1 dirty=1 authpin=0 0x55fe1bfa6900]
Updated by Patrick Donnelly over 4 years ago
If you flush the journal:
2020-01-09T23:14:02.155+0000 7f88482f3700 10 mds.0.cache.ino(0x1000000009f) _stored_backtrace v 364 2020-01-09T23:14:02.155+0000 7f88482f3700 10 mds.0.cache.ino(0x1000000009f) auth_unpin by 0x55fe1bf7f800 on [inode 0x1000000009f [2,head] /bin/nl-neigh-delete auth v1570 dirtyparent s=13776 n(v0 rc2020-01-09T23:03:18.966105+0000 b13776 1=1+0) (iversion lock) | ptrwaiter=0 request=0 lock=0 caps=0 dirtyparent=1 scrubqueue=0 dirty=1 authpin=0 0x55fe1bf7f800] now 0 2020-01-09T23:14:02.155+0000 7f88482f3700 15 mds.0.cache.dir(0x10000000000) adjust_nested_auth_pins -1 on [dir 0x10000000000 /bin/ [2,head] auth v=7257 cv=7257/0 ap=1+1602 state=1610612769|complete|committing f(v0 m2020-01-09T23:03:55.149261+0000 1793=1793+0) n(v0 rc2020-01-09T23:03:55.162878+0000 b270047911 1777=1777+0) hs=1793+0,ss=0+0 dirty=1793 | child=1 dirty=1 authpin=1 0x55fe1b94d700] by 0x55fe1bf7f800 count now 1/1602 2020-01-09T23:14:02.155+0000 7f88482f3700 10 mds.0.cache.ino(0x1000000009f) clear_dirty_parent
and then a new scrub completes without error.
2020-01-09T23:16:05.892+0000 7f88482f3700 10 MDSContext::complete: 19MDSIOContextWrapper 2020-01-09T23:16:05.892+0000 7f88482f3700 20 mds.0.cache.ino(0x1000000009f) ondisk_read_retval: 0 2020-01-09T23:16:05.892+0000 7f88482f3700 10 mds.0.cache.ino(0x1000000009f) decoded 100 bytes of backtrace successfully 2020-01-09T23:16:05.892+0000 7f88482f3700 10 mds.0.cache.ino(0x1000000009f) scrub: inotable ino = 0x1000000009f 2020-01-09T23:16:05.892+0000 7f88482f3700 10 mds.0.cache.ino(0x1000000009f) scrub: inotable free says 0 2020-01-09T23:16:05.892+0000 7f88482f3700 10 MDSContext::complete: 16C_InodeValidated 2020-01-09T23:16:05.892+0000 7f88482f3700 10 mds.0.scrubstack _validate_inode_done scrub passed on inode [inode 0x1000000009f [2,head] /bin/nl-neigh-delete auth v1570 s=13776 n(v0 rc2020-01-09T23:03:18.966105+0000 b13776 1=1+0) (iversion lock) 0x55fe1bf7f800] 2020-01-09T23:16:05.892+0000 7f88482f3700 20 mds.0.cache.ino(0x1000000009f) scrub_finished 2020-01-09T23:16:05.892+0000 7f88482f3700 20 mds.0.cache.dir(0x10000000000) scrub_dentry_finished on dn [dentry #0x1/bin/nl-neigh-delete [2,head] auth (dversion lock) v=1570 ino=0x1000000009f state=1073741824 | request=0 lock=0 inodepin=0 scrubparent=1 dirty=0 authpin=0 0x55fe1bfa6900]
Updated by Zheng Yan over 4 years ago
This issue exists since scrub is first implemented. Should be easy to fix, just ignore checking backtrace if dirty_parent is set
Updated by Patrick Donnelly over 4 years ago
- Status changed from New to Triaged
- Assignee changed from Zheng Yan to Milind Changire
- Priority changed from Urgent to Normal
Updated by Patrick Donnelly about 4 years ago
- Priority changed from Normal to High
- Target version changed from v15.0.0 to v16.0.0
- Labels (FS) scrub added
Updated by Patrick Donnelly almost 4 years ago
- Status changed from Triaged to Fix Under Review
- Backport set to octopus,nautilus
Updated by Patrick Donnelly almost 4 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Nathan Cutler almost 4 years ago
- Copied to Backport #45851: octopus: mds: scrub on directory with recently created files may fail to load backtraces and report damage added
Updated by Nathan Cutler almost 4 years ago
- Copied to Backport #45852: nautilus: mds: scrub on directory with recently created files may fail to load backtraces and report damage added
Updated by Nathan Cutler almost 4 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
Actions