Bug #989


rstats and fragstat messages in mds logs

Added by Brian Chrisman about 13 years ago. Updated almost 13 years ago.

Status: Resolved
Priority: Normal
Assignee: Greg Farnum
Category: -
Target version: v0.27
% Done: 0%


Description

I'd like to track down what's going wrong here. These messages generate about 500MB of MDS log while I'm copying around 4.1GB of files onto the cluster (kernel client).
I'm trying to get our IT group to set up a DMZ here to allow remote access.

2011-04-06 22:01:47.648612 waiting_locks --
2011-04-06 22:01:47.648630 2011-04-06 22:01:47.648863 7f6cd6122710 mds0.server handle_client_file_setlock: start: 0, length: 0, client: 4113, pid: 763, type: 4
2011-04-06 22:01:47.648879 2011-04-06 22:01:47.648889 7f6cd6122710 mds0.server state prior to lock change: ceph_lock_state_t. held_locks.size()=1, waiting_locks.size()=0, client_held_lock_counts -- {4113=1}
client_waiting_lock_counts -- {}
held_locks -- start: 1073741826, length: 510, client: 4113, pid: 763, type: 1
2011-04-06 22:01:47.648916 waiting_locks --
2011-04-06 22:01:47.648922 2011-04-06 22:01:47.648928 7f6cd6122710 mds0.server got unlock
2011-04-06 22:01:47.648968 7f6cd6122710 mds0.server state after lock change: ceph_lock_state_t. held_locks.size()=0, waiting_locks.size()=0, client_held_lock_counts -- {}
client_waiting_lock_counts -- {}
held_locks --
waiting_locks --
2011-04-06 22:01:47.648978 2011-04-06 22:01:48.947122 7f6cd6122710 mds0.cache.dir(602) mismatch between head items and fnode.fragstat! printing dentries
2011-04-06 22:01:48.947188 7f6cd6122710 mds0.cache.dir(602) [dentry #100/stray2/1000000012c [2,head] auth (dversion lock) v=2411 inode=0x7f6cd85dc910 state=new | inodepin dirty 0x7f6cc8346348]
2011-04-06 22:01:48.947200 7f6cd6122710 mds0.cache.dir(602) [dentry #100/stray2/10000000134 [2,head] auth (dversion lock) v=2413 inode=0x7f6cd85e4980 state=new | inodepin dirty 0x7f6cc8345838]
2011-04-06 22:01:48.947211 7f6cd6122710 mds0.cache.dir(602) [dentry #100/stray2/10000000135 [2,head] auth (dversion lock) v=2418 inode=0x7f6cd85de2c0 state=new | inodepin dirty 0x7f6cc8345be8]
2011-04-06 22:01:48.947223 7f6cd6122710 mds0.cache.dir(602) [dentry #100/stray2/10000000136 [2,head] auth (dversion lock) pv=2420 v=2414 ap=0+1 inode=0x7f6cd85deb50 state=new | inodepin dirty 0x7f6cc8343ab8]
2011-04-06 22:01:48.947234 7f6cd6122710 mds0.cache.dir(602) [dentry #100/stray2/10000000137 [2,head] auth (dversion lock) v=2415 inode=0x7f6cd863a2f0 state=new | inodepin dirty 0x7f6cc83422c0]
2011-04-06 22:01:48.947245 7f6cd6122710 mds0.cache.dir(602) [dentry #100/stray2/10000000138 [2,head] auth (dversion lock) v=2417 inode=0x7f6cd85d48a0 state=new | inodepin dirty 0x7f6cc8341b60]
2011-04-06 22:01:48.947253 7f6cd6122710 mds0.cache.dir(602) mismatch between child accounted_rstats and my rstats!
2011-04-06 22:01:48.947273 7f6cd6122710 mds0.cache.dir(602) [dentry #100/stray2/1000000012c [2,head] auth (dversion lock) v=2411 inode=0x7f6cd85dc910 state=new | inodepin dirty 0x7f6cc8346348] n(v0 1=1+0)
2011-04-06 22:01:48.947285 7f6cd6122710 mds0.cache.dir(602) [dentry #100/stray2/10000000134 [2,head] auth (dversion lock) v=2413 inode=0x7f6cd85e4980 state=new | inodepin dirty 0x7f6cc8345838] n(v0 b153 1=1+0)
2011-04-06 22:01:48.947296 7f6cd6122710 mds0.cache.dir(602) [dentry #100/stray2/10000000135 [2,head] auth (dversion lock) v=2418 inode=0x7f6cd85de2c0 state=new | inodepin dirty 0x7f6cc8345be8] n(v0 1=1+0)
2011-04-06 22:01:48.947308 7f6cd6122710 mds0.cache.dir(602) [dentry #100/stray2/10000000136 [2,head] auth (dversion lock) pv=2420 v=2414 ap=0+1 inode=0x7f6cd85deb50 state=new | inodepin dirty 0x7f6cc8343ab8] n(v0 1=1+0)
2011-04-06 22:01:48.947319 7f6cd6122710 mds0.cache.dir(602) [dentry #100/stray2/10000000137 [2,head] auth (dversion lock) v=2415 inode=0x7f6cd863a2f0 state=new | inodepin dirty 0x7f6cc83422c0] n(v0 1=1+0)
2011-04-06 22:01:48.947330 7f6cd6122710 mds0.cache.dir(602) [dentry #100/stray2/10000000138 [2,head] auth (dversion lock) v=2417 inode=0x7f6cd85d48a0 state=new | inodepin dirty 0x7f6cc8341b60] n(v0 1=1+0)
2011-04-06 22:01:48.947431 7f6cd6122710 mds0.cache.dir(602) mismatch between head items and fnode.fragstat! printing dentries
2011-04-06 22:01:48.947448 7f6cd6122710 mds0.cache.dir(602) [dentry #100/stray2/1000000012c [2,head] auth (dversion lock) v=2411 inode=0x7f6cd85dc910 state=new | inodepin dirty 0x7f6cc8346348]
2011-04-06 22:01:48.947459 7f6cd6122710 mds0.cache.dir(602) [dentry #100/stray2/10000000134 [2,head] auth (dversion lock) v=2413 inode=0x7f6cd85e4980 state=new | inodepin dirty 0x7f6cc8345838]
2011-04-06 22:01:48.947470 7f6cd6122710 mds0.cache.dir(602) [dentry #100/stray2/10000000135 [2,head] auth (dversion lock) v=2418 inode=0x7f6cd85de2c0 state=new | inodepin dirty 0x7f6cc8345be8]
2011-04-06 22:01:48.947481 7f6cd6122710 mds0.cache.dir(602) [dentry #100/stray2/10000000136 [2,head] auth (dversion lock) pv=2420 v=2414 ap=0+1 inode=0x7f6cd85deb50 state=new | inodepin dirty 0x7f6cc8343ab8]
2011-04-06 22:01:48.947521 7f6cd6122710 mds0.cache.dir(602) [dentry #100/stray2/10000000137 [2,head] auth (dversion lock) pv=2422 v=2415 ap=0+1 inode=0x7f6cd863a2f0 state=new | inodepin dirty 0x7f6cc83422c0]
2011-04-06 22:01:48.947532 7f6cd6122710 mds0.cache.dir(602) [dentry #100/stray2/10000000138 [2,head] auth (dversion lock) v=2417 inode=0x7f6cd85d48a0 state=new | inodepin dirty 0x7f6cc8341b60]
2011-04-06 22:01:48.947539 7f6cd6122710 mds0.cache.dir(602) mismatch between child accounted_rstats and my rstats!


Files

mdslog.gz (3.14 KB) mdslog.gz Brian Chrisman, 04/08/2011 04:26 PM
mds.debug20.check_rstats.log.1.bz2 (2.55 MB) mds.debug20.check_rstats.log.1.bz2 Brian Chrisman, 04/11/2011 06:43 PM
mds.debug20.check_rstats.log.2.bz2 (2.47 MB) mds.debug20.check_rstats.log.2.bz2 Brian Chrisman, 04/11/2011 06:43 PM
Actions #1

Updated by Sage Weil about 13 years ago

This may be a dup of #966; we may want to fix that one first (it's easy to reproduce).

Actions #2

Updated by Greg Farnum about 13 years ago

  • Assignee set to Greg Farnum

Snagging this on the assumption it's the same as #966, which I'll be looking at today.

Actions #3

Updated by Greg Farnum about 13 years ago

Well, I was unable to reproduce #966. Maybe Sage can at some point, but in the meantime let's look at this one. You're using a single MDS, correct? What version of the code? (git hash, please!)

It looks like the log you've put in the description is just a selection of excerpts, right? (I'm noticing the second+ delays between some lines.) Basically we need to get a log with good debugging output and then trace the directory in question to figure out where it's going wrong. :/

Actions #4

Updated by Brian Chrisman about 13 years ago

That excerpt was basically where the log first started going into the rstats dump output.
I'll grab the commit id... anything I can do to track down which directory we need to debug?

We're still working on getting a DMZ port opened.

Actions #5

Updated by Greg Farnum about 13 years ago

2011-04-06 22:01:47.648978 2011-04-06 22:01:48.947122 7f6cd6122710 mds0.cache.dir(602) mismatch between head items and fnode.fragstat! printing dentries

That prefix there consists of the date/time, the thread id, and the object whose function is printing out the error. So it's a CDir with ino 602. All the dentry printouts include their path, which tells us that ino 602 is the "#100/stray2" directory (where #100 is the ino of its parent directory). The stray directories are created by the MDS to keep track of unlinked inodes that clients still have caps on (so they potentially remain open), and I happen to know that ino #100 is the mds0 data directory inode. :)
It looks like there are only 6 entries in the directory, so hopefully it should be pretty easy to find the problem: at some point a modification occurred and it failed to propagate.
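
To make the two messages concrete: check_rstats is asserting (a) that the number of head dentries in the dirfrag equals fnode.fragstat's nfiles + nsubdirs, and (b) that the sum of the children's accounted rstats equals the directory's own rstat. A minimal, self-contained toy model of those two checks might look like this (an illustration only, not the Ceph code; the struct and field names just mirror the fragstat/rstat fields in the patch below, and the numbers are made up):

// Toy model of the two CDir::check_rstats() consistency checks -- NOT Ceph code.
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

struct frag_info_t { int64_t nfiles; int64_t nsubdirs; };                  // like fnode.fragstat
struct nest_info_t { int64_t rbytes; int64_t rfiles; int64_t rsubdirs; };  // like fnode.rstat

int main() {
  // Made-up directory state, loosely shaped like the stray dir in the excerpt.
  std::size_t num_head_items = 6;      // dentries actually present in the dirfrag
  frag_info_t fragstat = {1, 5};       // what fnode.fragstat claims: 1 file + 5 subdirs
  nest_info_t my_rstat = {153, 7, 0};  // note rfiles=7: one more file than the children account for

  // accounted_rstat of each child dentry's primary inode
  std::vector<nest_info_t> children = {
    {0, 1, 0}, {153, 1, 0}, {0, 1, 0}, {0, 1, 0}, {0, 1, 0}, {0, 1, 0},
  };

  nest_info_t sum = {0, 0, 0};
  for (const nest_info_t &c : children) {
    sum.rbytes   += c.rbytes;
    sum.rfiles   += c.rfiles;
    sum.rsubdirs += c.rsubdirs;
  }

  // Check 1: "mismatch between head items and fnode.fragstat!"
  bool fragstat_ok =
      num_head_items == std::size_t(fragstat.nfiles + fragstat.nsubdirs);

  // Check 2: "mismatch between child accounted_rstats and my rstats!"
  bool rstat_ok = sum.rbytes == my_rstat.rbytes &&
                  sum.rfiles == my_rstat.rfiles &&
                  sum.rsubdirs == my_rstat.rsubdirs;

  std::cout << "fragstat_ok=" << fragstat_ok << " rstat_ok=" << rstat_ok << std::endl;
  return 0;
}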

And I just realized that this error is printed at debug 1, but the parent dir doesn't print out its stats until debug 20. That'll make things rather harder. Bah. If you do run it again, you should apply something like this to the MDS code:

diff --git a/src/mds/CDir.cc b/src/mds/CDir.cc
index 625e295..140bb9c 100644
--- a/src/mds/CDir.cc
+++ b/src/mds/CDir.cc
@@ -215,6 +215,10 @@ bool CDir::check_rstats()
            << " fnode.fragstat.nsubdirs=" << fnode.fragstat.nsubdirs << dendl;
   if(!(get_num_head_items()==
       (fnode.fragstat.nfiles + fnode.fragstat.nsubdirs))) {
+    dout(1) << "get_num_head_items() = " << get_num_head_items()
+             << "; fnode.fragstat.nfiles=" << fnode.fragstat.nfiles
+             << " fnode.fragstat.nsubdirs=" << fnode.fragstat.nsubdirs << dendl;
+
     dout(1) << "mismatch between head items and fnode.fragstat! printing dentries" << dendl;
     for (map_t::iterator i = items.begin(); i != items.end(); ++i) {
       //if (i->second->get_linkage()->is_primary())
@@ -237,6 +241,9 @@ bool CDir::check_rstats()
   if ((!(sub_info.rbytes == fnode.rstat.rbytes)) ||
       (!(sub_info.rfiles == fnode.rstat.rfiles)) ||
       (!(sub_info.rsubdirs == fnode.rstat.rsubdirs))) {
+    dout(1) << "total of child dentrys: " << sub_info << dendl;
+    dout(1) << "my rstats:              " << fnode.rstat << dendl;
+
     dout(1) << "mismatch between child accounted_rstats and my rstats!" << dendl;
     for (map_t::iterator i = items.begin(); i != items.end(); ++i) {
       if (i->second->get_linkage()->is_primary()) {

Actions #6

Updated by Brian Chrisman about 13 years ago

I'll test this... the bigger issue I've found is that the actual problem I was investigating when I saw these log files was an NFS problem (with the current changes in master). I'd like to track this one down first since there's a clear mechanism, so I'll compile with the patch you listed above and reproduce (assuming it applies cleanly to master).
(I can copy the same files with scp to the node with the kernel client mounted without problem, but copying via NFS to the same node is extremely slow and then stalls at about 200MB... which is probably more dependent on files/inodes/caching/MDS issues than actual file sizes.)

Actions #7

Updated by Greg Farnum about 13 years ago

I actually just pushed a slightly cleaner one to master, so if you're pulling on a regular basis there's no need to patch. :)

Based on previous bugs like this, I expect that once we get good logs it'll just take a while of banging against them before we arrive at a pretty simple fix (*crosses fingers*), so hopefully it won't take too long!

Actions #8

Updated by Brian Chrisman about 13 years ago

I see more debug messages here now. This looks like one cycle of it, but I can turn my debugging up higher if necessary. We're seeing real problems with NFS (copy stalling), but it seems best to handle one thing at a time.

2011-04-08 23:01:05.709252 7f30ddc99710 mds0.7 cluster recovered.
2011-04-08 23:01:11.058398 7f30ddc99710 mds0.cache.dir(100) mismatch between head items and fnode.fragstat! printing dentries
2011-04-08 23:01:11.058451 7f30ddc99710 mds0.cache.dir(100) get_num_head_items() = 11; fnode.fragstat.nfiles=2 fnode.fragstat.nsubdirs=10
2011-04-08 23:01:11.058479 7f30ddc99710 mds0.cache.dir(100) [dentry #100/journal [2,head] auth (dversion lock) v=12979 inode=0x7f30cc751900 | inodepin dirty 0x7f30a50e87f0]
2011-04-08 23:01:11.058491 7f30ddc99710 mds0.cache.dir(100) [dentry #100/stray0 [2,head] auth (dversion lock) pv=13698 v=13385 ap=0+2 inode=0x7f30cc800060 | inodepin dirty 0x7f30a48f7020]
2011-04-08 23:01:11.058502 7f30ddc99710 mds0.cache.dir(100) [dentry #100/stray1 [2,head] auth (dversion lock) v=13386 inode=0x7f30cc883ac0 | inodepin dirty 0x7f30a4958440]
2011-04-08 23:01:11.058513 7f30ddc99710 mds0.cache.dir(100) [dentry #100/stray2 [2,head] auth (dversion lock) v=13415 ap=0+2 inode=0x7f30d000d7f0 | inodepin dirty 0x7f30a4012488]
2011-04-08 23:01:11.058524 7f30ddc99710 mds0.cache.dir(100) [dentry #100/stray3 [2,head] auth (dversion lock) pv=13693 v=13411 ap=0+2 inode=0x7f30dc02c530 | inodepin dirty 0x7f30e00e87d8]
2011-04-08 23:01:11.058565 7f30ddc99710 mds0.cache.dir(100) [dentry #100/stray4 [2,head] auth (dversion lock) pv=13694 v=13412 ap=0+2 inode=0x7f30a41684e0 | inodepin dirty 0x7f30a4251b20]
2011-04-08 23:01:11.058577 7f30ddc99710 mds0.cache.dir(100) [dentry #100/stray5 [2,head] auth (dversion lock) pv=13692 v=13400 ap=0+26 inode=0x7f30dc05bf40 | inodepin dirty 0x7f30a42a5028]
2011-04-08 23:01:11.058589 7f30ddc99710 mds0.cache.dir(100) [dentry #100/stray6 [2,head] auth (dversion lock) pv=13691 v=13402 ap=0+109 inode=0x7f30e01b0a60 | inodepin dirty 0x7f30a44218f0]
2011-04-08 23:01:11.058600 7f30ddc99710 mds0.cache.dir(100) [dentry #100/stray7 [2,head] auth (dversion lock) pv=13695 v=13382 ap=0+2 inode=0x7f30a4583140 | inodepin dirty 0x7f30a449fa00]
2011-04-08 23:01:11.058611 7f30ddc99710 mds0.cache.dir(100) [dentry #100/stray8 [2,head] auth (dversion lock) pv=13696 v=13383 ap=0+2 inode=0x7f30a4523490 | inodepin dirty 0x7f30a481b7c8]
2011-04-08 23:01:11.058622 7f30ddc99710 mds0.cache.dir(100) [dentry #100/stray9 [2,head] auth (dversion lock) pv=13697 v=13384 ap=0+2 inode=0x7f30cc6fc790 | inodepin dirty 0x7f30a488dcc8]
2011-04-08 23:01:11.058632 7f30ddc99710 mds0.cache.dir(100) mismatch between child accounted_rstats and my rstats!
2011-04-08 23:01:11.058653 7f30ddc99710 mds0.cache.dir(100) total of child dentrys: n(v0 rc2011-04-08 22:22:21.151356 b-183551 2171=1+2170)
2011-04-08 23:01:11.058663 7f30ddc99710 mds0.cache.dir(100) my rstats: n(v314 rc2011-04-08 22:48:24.641464 b-183551 2172=2+2170)
2011-04-08 23:01:11.058675 7f30ddc99710 mds0.cache.dir(100) [dentry #100/journal [2,head] auth (dversion lock) v=12979 inode=0x7f30cc751900 | inodepin dirty 0x7f30a50e87f0] n(v0 1=1+0)
2011-04-08 23:01:11.058688 7f30ddc99710 mds0.cache.dir(100) [dentry #100/stray0 [2,head] auth (dversion lock) pv=13698 v=13385 ap=0+2 inode=0x7f30cc800060 | inodepin dirty 0x7f30a48f7020] n(v1 rc2011-04-08 22:13:06.327309 b-22675 2=0+2)
2011-04-08 23:01:11.058701 7f30ddc99710 mds0.cache.dir(100) [dentry #100/stray1 [2,head] auth (dversion lock) v=13386 inode=0x7f30cc883ac0 | inodepin dirty 0x7f30a4958440] n(v1 rc2011-04-08 22:13:39.171658 b-23201 2=0+2)
2011-04-08 23:01:11.058714 7f30ddc99710 mds0.cache.dir(100) [dentry #100/stray2 [2,head] auth (dversion lock) v=13415 ap=0+2 inode=0x7f30d000d7f0 | inodepin dirty 0x7f30a4012488] n(v4 rc2011-04-08 22:20:52.603020 b-19051 657=0+657)
2011-04-08 23:01:11.058728 7f30ddc99710 mds0.cache.dir(100) [dentry #100/stray3 [2,head] auth (dversion lock) pv=13693 v=13411 ap=0+2 inode=0x7f30dc02c530 | inodepin dirty 0x7f30e00e87d8] n(v3 rc2011-04-08 22:22:16.684989 b-10674 960=0+960)
2011-04-08 23:01:11.058741 7f30ddc99710 mds0.cache.dir(100) [dentry #100/stray4 [2,head] auth (dversion lock) pv=13694 v=13412 ap=0+2 inode=0x7f30a41684e0 | inodepin dirty 0x7f30a4251b20] n(v3 rc2011-04-08 22:22:21.151356 b-8433 62=0+62)
2011-04-08 23:01:11.058755 7f30ddc99710 mds0.cache.dir(100) [dentry #100/stray5 [2,head] auth (dversion lock) pv=13692 v=13400 ap=0+26 inode=0x7f30dc05bf40 | inodepin dirty 0x7f30a42a5028] n(v1 rc2011-04-08 22:10:06.712412 b-3647 112=0+112)
2011-04-08 23:01:11.058768 7f30ddc99710 mds0.cache.dir(100) [dentry #100/stray6 [2,head] auth (dversion lock) pv=13691 v=13402 ap=0+109 inode=0x7f30e01b0a60 | inodepin dirty 0x7f30a44218f0] n(v1 rc2011-04-08 22:11:26.714829 b-9646 369=0+369)
2011-04-08 23:01:11.058782 7f30ddc99710 mds0.cache.dir(100) [dentry #100/stray7 [2,head] auth (dversion lock) pv=13695 v=13382 ap=0+2 inode=0x7f30a4583140 | inodepin dirty 0x7f30a449fa00] n(v1 rc2011-04-08 22:11:54.761663 b-42078 2=0+2)
2011-04-08 23:01:11.058795 7f30ddc99710 mds0.cache.dir(100) [dentry #100/stray8 [2,head] auth (dversion lock) pv=13696 v=13383 ap=0+2 inode=0x7f30a4523490 | inodepin dirty 0x7f30a481b7c8] n(v1 rc2011-04-08 22:12:19.347959 b-22802 2=0+2)
2011-04-08 23:01:11.058808 7f30ddc99710 mds0.cache.dir(100) [dentry #100/stray9 [2,head] auth (dversion lock) pv=13697 v=13384 ap=0+2 inode=0x7f30cc6fc790 | inodepin dirty 0x7f30a488dcc8] n(v1 rc2011-04-08 22:12:44.889210 b-21344 2=0+2)

Actions #9

Updated by Brian Chrisman about 13 years ago

Full section of that log attached.

Actions #10

Updated by Greg Farnum about 13 years ago

The more I look at this, the less sense it makes. The only things in the MDS dir are the journal and the stray dirs, yet somehow the MDS directory ends up thinking it has 2 files. Despite this, the size of the data matches up correctly with the children's stats -- yet all the sizes are negative?
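
(As an aside, for decoding the n(...) dumps in the log above -- this is my reading of the rstat print format, so treat it as a gloss rather than gospel: n(v<version> rc<recursive ctime> b<rbytes> <total>=<rfiles>+<rsubdirs>). The two summary lines then decode roughly as:

n(v0   rc2011-04-08 22:22:21.151356 b-183551 2171=1+2170)  -> children's sum: rbytes=-183551, 1 file, 2170 subdirs
n(v314 rc2011-04-08 22:48:24.641464 b-183551 2172=2+2170)  -> the dir's own rstat: rbytes=-183551, 2 files, 2170 subdirs

which is where the "2 files" and the negative byte counts come from.)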

I'm not at all sure the copy is the thing that caused this problem -- it's more likely that the copy just surfaces the debug output because it calls check_rstats a lot, but the place where it broke must be somewhere else. Have you been able to reproduce this on a clean fs? (I tried with a built Linux source tree and couldn't.)

You are only using 1 MDS, right?

Actions #11

Updated by Brian Chrisman about 13 years ago

This may be something popping out of the MDS changes intended for NFS support (inode lookup).
I'm going to retest by copying the same data set over via scp to ceph-kernel-mount.

There were a lot of other variables changing with this test as well (previously we were on 0.25).

Some more context:
3-node cluster; all nodes run all daemons, with 1 MDS active.
One node is exporting via NFS for testing purposes.
cp of a build tree is triggering the issue.
My apologies for not mentioning this earlier, as we'd run into a few other issues moving to the latest master.

Actions #12

Updated by Greg Farnum about 13 years ago

Well the MDS directory is a permanent thing -- once it's broken it's going to stay broken. Ideally we'd reproduce with lots of debugging on a fresh cluster. If we can't reproduce it we'll have to do code inspection, which is maybe possible but we'll need to know as much of what the cluster was doing as possible.

...which makes me realize: this is an old cluster previously created on an earlier version of Ceph? That's the problem, then... the rstats on some special inodes (like the MDS system inodes) weren't being initialized correctly. We'll have to hack together something that will fix them for upgrade users. Hmm.

Actions #13

Updated by Brian Chrisman about 13 years ago

Every test I run is on a freshly installed cluster.
We've automated all the underlying stuff such that it's pretty quick.

On the downside, that means that logfiles aren't always sticking around. :)
I'll have an scp test result in an hour or two.

Actions #14

Updated by Greg Farnum about 13 years ago

Okay. An old filesystem wouldn't have explained the negative rstats anyway.

Let me know what you come up with!

Actions #15

Updated by Brian Chrisman about 13 years ago

The log file, up until a ways after the first check_rstats problem, was just over 5MB, so I split it up here such that it has everything since fs creation.

Actions #16

Updated by Greg Farnum about 13 years ago

  • Status changed from New to 4
  • Target version set to v0.27

All right, try this patch and see if it fixes it?

diff --git a/src/mds/Server.cc b/src/mds/Server.cc
index 47649d4..cc54d88 100644
--- a/src/mds/Server.cc
+++ b/src/mds/Server.cc
@@ -5129,9 +5129,11 @@ void Server::_rename_prepare(MDRequest *mdr,
   int predirty_dir = silent ? 0:PREDIRTY_DIR;

   // sub off target
-  if (destdn->is_auth() && !destdnl->is_null())
+  if (destdn->is_auth() && !destdnl->is_null()) {
     mdcache->predirty_journal_parents(mdr, metablob, destdnl->get_inode(), destdn->get_dir(),
                                      (destdnl->is_primary() ? PREDIRTY_PRIMARY:0)|predirty_dir, -1);
+    mdcache->predirty_journal_parents(mdr, metablob, destdnl->get_inode(), straydn->get_dir(), PREDIRTY_PRIMARY|PREDIRTY_DIR, 1);
+  }

   // move srcdn
   int predirty_primary = (srcdnl->is_primary() && srcdn->get_dir() != destdn->get_dir()) ? PREDIRTY_PRIMARY:0;

We were simply failing to update bookkeeping when we renamed over an existing file -- the inode would get moved into the stray directory, but the rename paths didn't do proper bookkeeping for the move.
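
If you want to exercise the fixed path, the triggering operation described above -- renaming over an existing file -- can be produced with something as small as the sketch below, run from a directory on a CephFS mount. This is a hypothetical illustration of the operation, not a reproducer taken from this ticket; the file names are made up, and whether holding the target open matters depends on the client's cap handling.

// Rename one file over another, unlinking the target's inode -- the case that
// sends it to the MDS stray directory. Illustration only; run it from a
// directory on a CephFS mount.
#include <cstdio>

int main() {
  const char *src = "renametest.src";  // made-up names
  const char *dst = "renametest.dst";

  FILE *a = std::fopen(src, "w");
  FILE *b = std::fopen(dst, "w");
  if (!a || !b) { std::perror("fopen"); return 1; }
  std::fputs("source\n", a);
  std::fputs("target\n", b);
  std::fclose(a);

  // dst is still open here, so the client may still hold caps on its inode
  // when it gets renamed over.
  if (std::rename(src, dst) != 0) { std::perror("rename"); std::fclose(b); return 1; }
  std::fclose(b);
  return 0;
}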

Actions #17

Updated by Greg Farnum about 13 years ago

  • Status changed from 4 to Resolved

Seems to be working. Pushed to master in b7b66ac95e406076608df24424ac6d95d72a9ac6.

Actions #18

Updated by Sage Weil almost 13 years ago

  • Story points set to 2
  • Position set to 633