Bug #16879 (closed)

scrub: inode wrongly marked free: 0x10000000002

Added by Jeff Layton over 7 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: fs
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I ran the "fs" test suite on a branch that has a pile of small, userland client-side patches. One of the tests (tasks/forward-scrub.yaml?) failed with this set of messages:

2016-07-30T19:05:32.061 INFO:teuthology.orchestra.run.mira088.stdout:2016-07-31 02:05:15.132372 mon.0 [INF] fsmap e28: 1/1/1 up {0=b=up:active}, 1 up:standby
2016-07-30T19:05:32.062 INFO:teuthology.orchestra.run.mira088.stdout:2016-07-31 02:05:18.542980 mon.0 [INF] pgmap v56: 24 pgs: 24 active+clean; 18456 kB data, 471 MB used, 2792 GB / 2793 GB avail; 551 B/s rd, 0 op/s
2016-07-30T19:05:32.062 INFO:teuthology.orchestra.run.mira088.stdout:2016-07-31 02:05:19.544472 mon.0 [INF] pgmap v57: 24 pgs: 24 active+clean; 18456 kB data, 471 MB used, 2792 GB / 2793 GB avail; 2648 B/s rd, 378 B/s wr, 2 op/s
2016-07-30T19:05:32.062 INFO:teuthology.orchestra.run.mira088.stdout:2016-07-31 02:05:17.032874 mds.0 [INF] _validate_inode_done [inode 10000000000 [2,head] /file1_sixmegs auth v8 s=6291456 n(v0 b6291456 1=1+0) (iversion lock) 0xa799160] r=0
2016-07-30T19:05:32.062 INFO:teuthology.orchestra.run.mira088.stdout:2016-07-31 02:05:17.032909 mds.0 [ERR] scrub: inode wrongly marked free: 0x10000000002
2016-07-30T19:05:32.062 INFO:teuthology.orchestra.run.mira088.stdout:2016-07-31 02:05:17.032918 mds.0 [ERR] inode table repaired for inode: 0x10000000002
2016-07-30T19:05:32.063 INFO:teuthology.orchestra.run.mira088.stdout:2016-07-31 02:05:17.032958 mds.0 [INF] _validate_inode_done [inode 10000000002 [2,head] /file3_sixmegs auth v24 s=6291456 n(v0 b6291456 1=1+0) (iversion lock) 0xa7985c0] r=0
2016-07-30T19:05:32.064 INFO:teuthology.orchestra.run.mira088.stdout:2016-07-31 02:05:17.033110 mds.0 [ERR] scrub: inode wrongly marked free: 0x10000000001
2016-07-30T19:05:32.064 INFO:teuthology.orchestra.run.mira088.stdout:2016-07-31 02:05:17.033116 mds.0 [ERR] inode table repaired for inode: 0x10000000001
2016-07-30T19:05:32.064 INFO:teuthology.orchestra.run.mira088.stdout:2016-07-31 02:05:17.033151 mds.0 [INF] _validate_inode_done [inode 10000000001 [2,head] /file2_sixmegs auth v22 s=6291456 n(v0 b6291456 1=1+0) (iversion lock) 0xa798b90] r=0
2016-07-30T19:05:32.064 INFO:teuthology.orchestra.run.mira088.stdout:2016-07-31 02:05:17.038881 mds.0 [INF] _validate_inode_done [inode 1 [...2,head] / auth v14 snaprealm=0xa680f40 f(v0 m2016-07-31 02:04:51.155487 3=3+0) n(v0 1=0+1) (iversion lock) | request=0 lock=0 dirfrag=1 scrubqueue=0 authpin=0 0xa7945d0] r=0
2016-07-30T19:05:32.065 INFO:teuthology.orchestra.run.mira088.stdout:2016-07-31 02:05:17.038887 mds.0 [INF] scrub complete with tag ''
2016-07-30T19:05:32.065 INFO:teuthology.orchestra.run.mira088.stdout:2016-07-31 02:05:23.543837 mon.0 [INF] pgmap v58: 24 pgs: 24 active+clean; 18456 kB data, 471 MB used, 2792 GB / 2793 GB avail; 2866 B/s rd, 409 B/s wr, 2 op/s
2016-07-30T19:05:32.065 INFO:teuthology.orchestra.run.mira088.stdout:2016-07-31 02:05:24.547678 mon.0 [INF] pgmap v59: 24 pgs: 24 active+clean; 18456 kB data, 471 MB used, 2792 GB / 2793 GB avail
2016-07-30T19:05:32.066 DEBUG:tasks.ceph.mds.a:waiting for process to exit
2016-07-30T19:05:32.066 INFO:teuthology.orchestra.run:waiting for 300
2016-07-30T19:05:32.067 DEBUG:tasks.ceph.mds.b:waiting for process to exit
2016-07-30T19:05:32.067 INFO:teuthology.orchestra.run:waiting for 300
2016-07-30T19:05:32.107 INFO:tasks.ceph.mds.a.mira088.stdout:starting mds.a at :/0
2016-07-30T19:05:32.108 INFO:tasks.ceph.mds.b.mira088.stdout:starting mds.b at :/0
2016-07-30T19:05:32.108 INFO:tasks.ceph.mds.b.mira088.stderr:2016-07-31 02:05:17.032909 7f48816b3700 -1 log_channel(cluster) log [ERR] : scrub: inode wrongly marked free: 0x10000000002
2016-07-30T19:05:32.108 INFO:tasks.ceph.mds.b.mira088.stderr:2016-07-31 02:05:17.032917 7f48816b3700 -1 log_channel(cluster) log [ERR] : inode table repaired for inode: 0x10000000002
2016-07-30T19:05:32.108 INFO:tasks.ceph.mds.b.mira088.stderr:2016-07-31 02:05:17.033109 7f48816b3700 -1 log_channel(cluster) log [ERR] : scrub: inode wrongly marked free: 0x10000000001
2016-07-30T19:05:32.109 INFO:tasks.ceph.mds.b.mira088.stderr:2016-07-31 02:05:17.033115 7f48816b3700 -1 log_channel(cluster) log [ERR] : inode table repaired for inode: 0x10000000001

The test run in question is here:

http://pulpito.ceph.com/jlayton-2016-07-29_18:51:42-fs-wip-jlayton-nlink---basic-mira/341531/

I don't think this is anything my patchset would have broken: it consists strictly of client-side changes to how capabilities are handled during lookups, while the failure seems to be related to MDS-side scrubbing of damaged filesystems.
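
Presumably the job is marked failed because of the [ERR] entries in the cluster log above. For anyone reproducing this, here is a minimal, hypothetical helper (not part of ceph-qa-suite; the teuthology.log filename is just an assumption about a locally downloaded copy of the job log) that pulls those lines out:

#!/usr/bin/env python
# Hypothetical helper, not part of ceph-qa-suite: print the cluster-log
# [ERR] lines (like the "inode wrongly marked free" ones above) from a
# locally downloaded teuthology.log.
import re
import sys

ERR_RE = re.compile(r'\[ERR\]\s*:?\s*(.*)')

def scrub_errors(path):
    with open(path) as f:
        for line in f:
            m = ERR_RE.search(line)
            if m:
                yield m.group(1).strip()

if __name__ == '__main__':
    path = sys.argv[1] if len(sys.argv) > 1 else 'teuthology.log'
    for err in scrub_errors(path):
        print(err)

Against the log excerpted above this should print the two "inode wrongly marked free" and two "inode table repaired" messages.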

Actions #1

Updated by Jeff Layton over 7 years ago

The message comes from CInode::validate_disk_state, but I haven't yet been able to figure out where the test itself comes from. The yaml file just says:

tasks:
  - cephfs_test_runner:
      modules:
        - tasks.cephfs.test_forward_scrub

...but I can't seem to locate any reference to test_forward_scrub in ceph itself or the qa suite.

Actions #3

Updated by Nathan Cutler over 7 years ago

I found it by looking at the "task" function in "tasks/cephfs_test_runner.py" - it says: "Run everything in tasks/cephfs/test_*.py".
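
In other words, anything matching tasks/cephfs/test_*.py gets picked up automatically. A rough sketch of that discovery pattern, assuming a plain unittest-based loader (illustrative only, not the actual cephfs_test_runner implementation):

# Illustrative only -- not the real tasks/cephfs_test_runner.py.
# Discover every tasks/cephfs/test_*.py module and run its unittest cases,
# which is how a newly added test_forward_scrub.py gets picked up without
# being listed anywhere explicitly.
import unittest

def run_cephfs_tests(start_dir='tasks/cephfs'):
    loader = unittest.TestLoader()
    suite = loader.discover(start_dir, pattern='test_*.py')
    return unittest.TextTestRunner(verbosity=2).run(suite)

if __name__ == '__main__':
    run_cephfs_tests()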

Actions #4

Updated by Jeff Layton over 7 years ago

Ahh thanks, Nathan. Ok, this is a recently-added test and my local ceph-qa-suite was missing it. A git pull fixed that. At this point, perhaps jspray can comment? Any idea why this failure would have occurred?

Actions #5

Updated by Jeff Layton over 7 years ago

Reran the test and it failed again (btw, thanks Nathan for the pointer on how to filter out failures and rerun only them):

http://pulpito.ceph.com/jlayton-2016-08-01_04:07:30-fs-wip-jlayton-nlink---basic-mira/345005

The branch this is based on is a few days old though, so maybe I ought to rebase to master and retry it. It's possible this is fixed now in mainline.

Actions #6

Updated by Jeff Layton over 7 years ago

Rebased onto current master branch, and still seeing the error. Rerunning the test now on a branch without any of my patches to see whether it passes.

Actions #7

Updated by Jeff Layton over 7 years ago

This test also fails with the master branch (as of earlier this morning):

http://pulpito.ceph.com/jlayton-2016-08-01_09:13:54-fs-wip-jlayton-control---basic-mira/345193/

Actions #8

Updated by John Spray over 7 years ago

  • Status changed from New to Fix Under Review

Actions #9

Updated by John Spray over 7 years ago

  • Status changed from Fix Under Review to Resolved