Bug #16879: scrub: inode wrongly marked free: 0x10000000002
Status: Closed
% Done: 0%
Description
I ran the "fs" testsuite on a branch that has a pile of small, userland client-side patches. One of the tests (tasks/forward-scrub.yaml ?) failed with this set of messages:
2016-07-30T19:05:32.061 INFO:teuthology.orchestra.run.mira088.stdout:2016-07-31 02:05:15.132372 mon.0 [INF] fsmap e28: 1/1/1 up {0=b=up:active}, 1 up:standby
2016-07-30T19:05:32.062 INFO:teuthology.orchestra.run.mira088.stdout:2016-07-31 02:05:18.542980 mon.0 [INF] pgmap v56: 24 pgs: 24 active+clean; 18456 kB data, 471 MB used, 2792 GB / 2793 GB avail; 551 B/s rd, 0 op/s
2016-07-30T19:05:32.062 INFO:teuthology.orchestra.run.mira088.stdout:2016-07-31 02:05:19.544472 mon.0 [INF] pgmap v57: 24 pgs: 24 active+clean; 18456 kB data, 471 MB used, 2792 GB / 2793 GB avail; 2648 B/s rd, 378 B/s wr, 2 op/s
2016-07-30T19:05:32.062 INFO:teuthology.orchestra.run.mira088.stdout:2016-07-31 02:05:17.032874 mds.0 [INF] _validate_inode_done [inode 10000000000 [2,head] /file1_sixmegs auth v8 s=6291456 n(v0 b6291456 1=1+0) (iversion lock) 0xa799160] r=0
2016-07-30T19:05:32.062 INFO:teuthology.orchestra.run.mira088.stdout:2016-07-31 02:05:17.032909 mds.0 [ERR] scrub: inode wrongly marked free: 0x10000000002
2016-07-30T19:05:32.062 INFO:teuthology.orchestra.run.mira088.stdout:2016-07-31 02:05:17.032918 mds.0 [ERR] inode table repaired for inode: 0x10000000002
2016-07-30T19:05:32.063 INFO:teuthology.orchestra.run.mira088.stdout:2016-07-31 02:05:17.032958 mds.0 [INF] _validate_inode_done [inode 10000000002 [2,head] /file3_sixmegs auth v24 s=6291456 n(v0 b6291456 1=1+0) (iversion lock) 0xa7985c0] r=0
2016-07-30T19:05:32.064 INFO:teuthology.orchestra.run.mira088.stdout:2016-07-31 02:05:17.033110 mds.0 [ERR] scrub: inode wrongly marked free: 0x10000000001
2016-07-30T19:05:32.064 INFO:teuthology.orchestra.run.mira088.stdout:2016-07-31 02:05:17.033116 mds.0 [ERR] inode table repaired for inode: 0x10000000001
2016-07-30T19:05:32.064 INFO:teuthology.orchestra.run.mira088.stdout:2016-07-31 02:05:17.033151 mds.0 [INF] _validate_inode_done [inode 10000000001 [2,head] /file2_sixmegs auth v22 s=6291456 n(v0 b6291456 1=1+0) (iversion lock) 0xa798b90] r=0
2016-07-30T19:05:32.064 INFO:teuthology.orchestra.run.mira088.stdout:2016-07-31 02:05:17.038881 mds.0 [INF] _validate_inode_done [inode 1 [...2,head] / auth v14 snaprealm=0xa680f40 f(v0 m2016-07-31 02:04:51.155487 3=3+0) n(v0 1=0+1) (iversion lock) | request=0 lock=0 dirfrag=1 scrubqueue=0 authpin=0 0xa7945d0] r=0
2016-07-30T19:05:32.065 INFO:teuthology.orchestra.run.mira088.stdout:2016-07-31 02:05:17.038887 mds.0 [INF] scrub complete with tag ''
2016-07-30T19:05:32.065 INFO:teuthology.orchestra.run.mira088.stdout:2016-07-31 02:05:23.543837 mon.0 [INF] pgmap v58: 24 pgs: 24 active+clean; 18456 kB data, 471 MB used, 2792 GB / 2793 GB avail; 2866 B/s rd, 409 B/s wr, 2 op/s
2016-07-30T19:05:32.065 INFO:teuthology.orchestra.run.mira088.stdout:2016-07-31 02:05:24.547678 mon.0 [INF] pgmap v59: 24 pgs: 24 active+clean; 18456 kB data, 471 MB used, 2792 GB / 2793 GB avail
2016-07-30T19:05:32.066 DEBUG:tasks.ceph.mds.a:waiting for process to exit
2016-07-30T19:05:32.066 INFO:teuthology.orchestra.run:waiting for 300
2016-07-30T19:05:32.067 DEBUG:tasks.ceph.mds.b:waiting for process to exit
2016-07-30T19:05:32.067 INFO:teuthology.orchestra.run:waiting for 300
2016-07-30T19:05:32.107 INFO:tasks.ceph.mds.a.mira088.stdout:starting mds.a at :/0
2016-07-30T19:05:32.108 INFO:tasks.ceph.mds.b.mira088.stdout:starting mds.b at :/0
2016-07-30T19:05:32.108 INFO:tasks.ceph.mds.b.mira088.stderr:2016-07-31 02:05:17.032909 7f48816b3700 -1 log_channel(cluster) log [ERR] : scrub: inode wrongly marked free: 0x10000000002
2016-07-30T19:05:32.108 INFO:tasks.ceph.mds.b.mira088.stderr:2016-07-31 02:05:17.032917 7f48816b3700 -1 log_channel(cluster) log [ERR] : inode table repaired for inode: 0x10000000002
2016-07-30T19:05:32.108 INFO:tasks.ceph.mds.b.mira088.stderr:2016-07-31 02:05:17.033109 7f48816b3700 -1 log_channel(cluster) log [ERR] : scrub: inode wrongly marked free: 0x10000000001
2016-07-30T19:05:32.109 INFO:tasks.ceph.mds.b.mira088.stderr:2016-07-31 02:05:17.033115 7f48816b3700 -1 log_channel(cluster) log [ERR] : inode table repaired for inode: 0x10000000001
The test run in question is here:
http://pulpito.ceph.com/jlayton-2016-07-29_18:51:42-fs-wip-jlayton-nlink---basic-mira/341531/
I don't think this is anything my patchset would have broken, as it's strictly client-side changes to how capabilities are handled during lookups, and the failure seems to be related to mds-side scrubbing of damaged filesystems.
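For intuition, here is a purely illustrative sketch (not Ceph's actual implementation — class and function names are assumptions) of the kind of consistency check behind these messages: a forward scrub visits every inode reachable from the root, and any reachable inode still marked free in the inode table is flagged as an error and repaired by re-marking it allocated.

```python
# Hypothetical sketch of a forward-scrub inode-table check.
# Not Ceph's real code; it only models the reported behavior:
# reachable inodes must not be in the inode table's free set.

class InodeTable:
    def __init__(self, free):
        self.free = set(free)  # inode numbers currently marked free

    def is_free(self, ino):
        return ino in self.free

    def repair_allocate(self, ino):
        # Repair: mark a wrongly-free inode as allocated again.
        self.free.discard(ino)

def scrub_inode(table, ino, log):
    # A reachable inode found in the free set is an inconsistency.
    if table.is_free(ino):
        log.append("scrub: inode wrongly marked free: 0x%x" % ino)
        table.repair_allocate(ino)
        log.append("inode table repaired for inode: 0x%x" % ino)

# Model the failing run: three files, two of whose inode numbers
# are (wrongly) still in the free set when the scrub reaches them.
log = []
table = InodeTable(free={0x10000000001, 0x10000000002})
for ino in (0x10000000000, 0x10000000001, 0x10000000002):
    scrub_inode(table, ino, log)
```

After the loop, the log holds one error/repair pair per wrongly-free inode, matching the shape of the cluster log above.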
Updated by Jeff Layton over 7 years ago
Message comes from CInode::validate_disk_state, but I haven't yet been able to figure out where the test itself comes from. The yaml file just says:
tasks:
  - cephfs_test_runner:
      modules:
        - tasks.cephfs.test_forward_scrub
...but I can't seem to locate any reference to test_forward_scrub in ceph itself or the qa suite.
Updated by Nathan Cutler over 7 years ago
I found it by looking at the "task" function in "tasks/cephfs_test_runner.py" - it says: "Run everything in tasks/cephfs/test_*.py"
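That convention can be sketched with a small glob-based discovery helper (an illustration only; the real runner does its own module loading, and the function name and directory layout here are assumptions):

```python
# Illustrative sketch of "run everything in tasks/cephfs/test_*.py":
# find matching files under a qa-suite checkout and turn each one
# into a dotted module name.
import glob
import os

def discover_test_modules(qa_dir):
    # Match test modules by filename convention.
    pattern = os.path.join(qa_dir, "tasks", "cephfs", "test_*.py")
    modules = []
    for path in sorted(glob.glob(pattern)):
        # tasks/cephfs/test_forward_scrub.py -> tasks.cephfs.test_forward_scrub
        name = os.path.splitext(os.path.basename(path))[0]
        modules.append("tasks.cephfs." + name)
    return modules
```

Under a scheme like this, an out-of-date qa-suite checkout simply has no matching file, which would explain why the module could not be found until the checkout was updated.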
Updated by Jeff Layton over 7 years ago
Ahh thanks, Nathan. Ok, this is a recently-added test and my local ceph-qa-suite was missing it. A git pull fixed that. At this point, perhaps jspray can comment? Any idea why this failure would have occurred?
Updated by Jeff Layton over 7 years ago
Reran the test and it failed again (btw, thanks Nathan for the pointer on how to filter out failures and rerun only them):
http://pulpito.ceph.com/jlayton-2016-08-01_04:07:30-fs-wip-jlayton-nlink---basic-mira/345005
The branch this is based on is a few days old though, so maybe I ought to rebase to master and retry it. It's possible this is fixed now in mainline.
Updated by Jeff Layton over 7 years ago
Rebased onto current master branch, and still seeing the error. Rerunning the test now on a branch without any of my patches to see whether it passes.
Updated by Jeff Layton over 7 years ago
This test also fails with the master branch (as of earlier this morning):
http://pulpito.ceph.com/jlayton-2016-08-01_09:13:54-fs-wip-jlayton-control---basic-mira/345193/
Updated by John Spray over 7 years ago
- Status changed from New to Fix Under Review
Updated by John Spray over 7 years ago
- Status changed from Fix Under Review to Resolved