Bug #65020

open

qa: "Scrub error on inode 0x1000000356c (/volumes/qa/sv_0/2f8f6bb4-3ea9-47a0-bd79-a0f50dc149d5/client.0/tmp/clients/client7/~dmtmp/PARADOX) see mds.b log and `damage ls` output for details" in cluster log

Added by Patrick Donnelly about 2 months ago. Updated 3 days ago.

Status:
Triaged
Priority:
Urgent
Category:
fsck/damage handling
Target version:
% Done:

0%

Source:
Q/A
Tags:
Backport:
squid,reef
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
scrub
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Actions #1

Updated by Patrick Donnelly about 2 months ago

Maybe also related: https://pulpito.ceph.com/pdonnell-2024-03-20_18:16:52-fs-wip-batrick-testing-20240320.145742-distro-default-smithi/7612993

"2024-03-20T21:19:00.179790+0000 mds.c (mds.0) 1 : cluster [ERR] dir 0x10000000000 object missing on disk; some files may be lost (/dir)" in cluster log

Actions #2

Updated by Venky Shankar about 2 months ago

  • Assignee set to Milind Changire
Actions #3

Updated by Venky Shankar about 2 months ago

  • Status changed from New to Triaged
Actions #4

Updated by Milind Changire 5 days ago

Patrick Donnelly wrote in #note-1:

Maybe also related: https://pulpito.ceph.com/pdonnell-2024-03-20_18:16:52-fs-wip-batrick-testing-20240320.145742-distro-default-smithi/7612993

"2024-03-20T21:19:00.179790+0000 mds.c (mds.0) 1 : cluster [ERR] dir 0x10000000000 object missing on disk; some files may be lost (/dir)" in cluster log

This job doesn't have the log in the ignore list.

Actions #5

Updated by Venky Shankar 5 days ago

Isn't this same as: https://tracker.ceph.com/issues/48562 ?

Actions #6

Updated by Milind Changire 5 days ago

Venky Shankar wrote in #note-5:

Isn't this same as: https://tracker.ceph.com/issues/48562 ?

"object missing on disk" is not the issue here

Actions #7

Updated by Milind Changire 5 days ago

Patrick Donnelly wrote:

https://pulpito.ceph.com/pdonnell-2024-03-20_18:16:52-fs-wip-batrick-testing-20240320.145742-distro-default-smithi/7612910/

and many others.

More fallout after https://github.com/ceph/ceph/pull/55455 was merged.

This is the scrub error for the PARADOX item:

"raw_stats":{
    "checked":true,
    "passed":false,
    "read_ret_val":0,
    "ondisk_value.dirstat":"f(v0 m2024-03-20T19:09:04.770954+0000 25=25+0)",
    "ondisk_value.rstat":"n(v0 rc2024-03-20T19:09:04.804841+0000 b5230239 25=24+1)",
    "memory_value.dirstat":"f(v69 m2024-03-20T19:09:04.804841+0000 24=24+0)",
    "memory_value.rstat":"n(v7 rc2024-03-20T19:09:04.804841+0000 b5230239 25=24+1)",
    "error_str":"freshly-calculated rstats don't match existing ones" 
}

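So the on-disk and in-memory dirstat differ by one entry (25=25+0 on disk vs. 24=24+0 in memory), apart from the version and mtime. The on-disk and in-memory rstat values match except for the version (v0 vs. v7).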
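
For anyone comparing these by hand across many jobs, here is a minimal sketch that parses the dirstat/rstat summary strings from the scrub output above and reports the count differences. The helper names (`parse_stat`, `count_diff`) are hypothetical and not part of Ceph, and the reading of `N=A+B` as total = files + subdirs is my assumption based on the strings shown here.

```python
import re

# Hypothetical helper, not part of Ceph. Matches the summary strings shown in
# the scrub output, e.g. "f(v0 m<mtime> 25=25+0)" or "n(v0 rc<rctime> b<bytes> 25=24+1)".
# Assumption: "N=A+B" means total = files + subdirs.
STAT_RE = re.compile(
    r"^(?P<kind>[fn])\(v(?P<version>\d+)"
    r"(?: m(?P<mtime>\S+))?"
    r"(?: rc(?P<rctime>\S+))?"
    r"(?: b(?P<bytes>\d+))?"
    r" (?P<total>\d+)=(?P<files>\d+)\+(?P<subdirs>\d+)\)$"
)


def parse_stat(text):
    """Parse one dirstat ("f(...)") or rstat ("n(...)") summary string."""
    match = STAT_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized stat string: {text!r}")
    fields = match.groupdict()
    return {
        "kind": fields["kind"],
        "version": int(fields["version"]),
        "total": int(fields["total"]),
        "files": int(fields["files"]),
        "subdirs": int(fields["subdirs"]),
    }


def count_diff(ondisk, memory):
    """Return on-disk minus in-memory counts; versions and timestamps are ignored."""
    disk, mem = parse_stat(ondisk), parse_stat(memory)
    return {key: disk[key] - mem[key] for key in ("total", "files", "subdirs")}


if __name__ == "__main__":
    # Values copied from the raw_stats output in this note.
    print(count_diff(
        "f(v0 m2024-03-20T19:09:04.770954+0000 25=25+0)",
        "f(v69 m2024-03-20T19:09:04.804841+0000 24=24+0)",
    ))
    print(count_diff(
        "n(v0 rc2024-03-20T19:09:04.804841+0000 b5230239 25=24+1)",
        "n(v7 rc2024-03-20T19:09:04.804841+0000 b5230239 25=24+1)",
    ))
```

Versions and timestamps are deliberately left out of the comparison so that only the entry counts are diffed.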

Actions #8

Updated by Milind Changire 5 days ago

Milind Changire wrote in #note-6:

Venky Shankar wrote in #note-5:

Isn't this same as: https://tracker.ceph.com/issues/48562 ?

"object missing on disk" is not the issue here

Or at least the "object missing on disk" log doesn't show up in the cluster logs for the job listed in the description.

Actions #9

Updated by Venky Shankar 3 days ago

Milind Changire wrote in #note-8:

Milind Changire wrote in #note-6:

Venky Shankar wrote in #note-5:

Isn't this same as: https://tracker.ceph.com/issues/48562 ?

"object missing on disk" is not the issue here

Or at least the "object missing on disk" log doesn't show up in the cluster logs for the job listed in the description.

OK. So that was a separate issue, which is fixed by adding the string to the ignorelist. It got mentioned in https://tracker.ceph.com/issues/65020#note-1, so I wondered whether this is the same problem.
