Bug #1084 (closed): blogbench won't finish: waiting for Fr cap forever

Added by Henry Chang almost 13 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Category: -
Target version: -
% Done: 0%

Description

Run blogbench with kclient: blogbench -d /mnt/ceph/henry/b5/
Blogbench won't finish and keeps waiting for Fr caps of several files.
Attached is the MDS log showing that the MDS did not issue the Fr cap of inode 10000058cb1 to the client on file_update_finish.

Note: "ls -l /proc/<blogbench pid>/fd/" (it may need to be executed multiple times) will help blogbench to finish.


Files

mds-050509-050511.log.gz (3.5 MB) - Henry Chang, 05/11/2011 08:03 PM
#1

Updated by Sage Weil almost 13 years ago

I initially thought something like this would work:

diff --git a/src/mds/Locker.cc b/src/mds/Locker.cc
index 3c7c190..04428c9 100644
--- a/src/mds/Locker.cc
+++ b/src/mds/Locker.cc
@@ -758,6 +758,10 @@ void Locker::eval_scatter_gathers(CInode *in)
 
 void Locker::eval(SimpleLock *lock, bool *need_issue)
 {
+  bool my_need_issue = false;
+  if (!need_issue)
+    need_issue = &my_need_issue;
+
   switch (lock->get_type()) {
   case CEPH_LOCK_IFILE:
     return file_eval((ScatterLock*)lock, need_issue);
@@ -767,6 +771,9 @@ void Locker::eval(SimpleLock *lock, bool *need_issue)
   default:
     return simple_eval(lock, need_issue);
   }
+
+  if (my_need_issue && lock->get_type() != CEPH_LOCK_DN)
+    issue_caps((CInode*)lock->get_parent());
 }

But I think that is going to cause lots of non-optimal lock thrashing. I suspect a better approach would be to make the eval() *need_issue argument non-optional and pass it down via drop_locks, so that callers like file_update_finish have more control over when things happen. I'll take a closer look tomorrow...
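A rough sketch of the shape that alternative could take (illustrative only, not the actual patch; the signatures and the caller below are hypothetical, and the real change would thread the flag through drop_locks):

    // Sketch: need_issue becomes mandatory, so eval() only records that a
    // re-issue is wanted and never calls issue_caps() itself.
    void Locker::eval(SimpleLock *lock, bool *need_issue)
    {
      assert(need_issue);  // no longer optional
      switch (lock->get_type()) {
      case CEPH_LOCK_IFILE:
        return file_eval((ScatterLock*)lock, need_issue);
      default:
        return simple_eval(lock, need_issue);
      }
    }

    // A caller such as file_update_finish() then decides when to issue:
    bool need_issue = false;
    eval(&in->filelock, &need_issue);
    if (need_issue)
      issue_caps(in);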

#2

Updated by Sage Weil almost 13 years ago

  • Target version set to v0.29
#3

Updated by Sage Weil almost 13 years ago

  • Story points set to 8
#4

Updated by Sage Weil almost 13 years ago

  • Assignee set to Sage Weil
#5

Updated by Sage Weil almost 13 years ago

  • Target version changed from v0.29 to v0.30
#6

Updated by Sage Weil almost 13 years ago

  • Position set to 5
#7

Updated by Sage Weil almost 13 years ago

  • Position deleted (14)
  • Position set to 13
#8

Updated by Greg Farnum almost 13 years ago

  • Status changed from New to In Progress
  • Assignee changed from Sage Weil to Greg Farnum

Henry, have you seen this lately? I get the impression you were seeing it very easily and I've not been able to reproduce it so far (although I'm going to work on Sage's patch anyway, since there's a bug even if it's hard to hit).

#9

Updated by Henry Chang almost 13 years ago

I haven't tested it for a while. I'll give it a try after the holidays.

#10

Updated by Greg Farnum almost 13 years ago

Having looked at this some more, I'm pretty sure it's a different issue than the one Sage described, though.

The eval family of functions is only supposed to issue_caps if they are removing caps, not if it's now possible for somebody to take more (after all, these functions only look at the lock -- they don't know anything about who wants caps). As best I can tell, they do this correctly -- they pass around a bool *need_issue, and when appropriate[1] they either set it to true or, if the pointer is NULL, call issue_caps themselves (the pattern is sketched at the end of this comment).
It is the responsibility of higher-level functions (like file_update_finish) to issue caps to clients that want more caps. It does attempt to do this:

    if (cap && (cap->wanted() & ~cap->pending())) {
      issue_caps(in, cap);
    }

However, in the partial log we've received, file_update_finish gets called after check_inode_max_size (called under handle_client_cap_release), and check_inode_max_size doesn't set the cap variable.
So that's why, in this case, the MDS isn't issuing caps again. I'll have to look around a bit more to figure out the proper fix, though.

[1]: On a cursory inspection -- maybe they've missed a case, but either way I don't think it could cause the bug we're seeing here.
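For reference, the need_issue convention described above looks roughly like this inside the eval-family functions (a paraphrase of the recurring pattern, not a verbatim excerpt from Locker.cc):

    // After an eval function decides that caps need to be revoked/changed:
    if (need_issue)
      *need_issue = true;   // let the caller batch the issue_caps() call
    else
      issue_caps(in);       // no flag supplied by the caller, issue immediately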

#11

Updated by Greg Farnum almost 13 years ago

The real question here is why the earlier callers of issue_caps think that the client (a loner) can't have the Fr cap it is asking for. It shows up right away in the requested caps but isn't part of the allowed set.

Optimizations may involve calculating caps in other places, but this is the actual bug.
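Conceptually, issue_caps only grants the intersection of what a client wants and what the current lock states allow; in rough pseudocode (the helper name below is hypothetical, and the real computation lives in the lock-state machinery):

    // Illustrative pseudocode, not the actual Locker::issue_caps() body.
    int allowed = caps_allowed_by_lock_states(in, client);  // hypothetical helper
    int wanted  = cap->wanted();
    // The symptom here: Fr is set in wanted immediately, but the lock
    // states never reach a point where Fr appears in allowed.
    int issuable = wanted & allowed;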

#12

Updated by Greg Farnum almost 13 years ago

Looked at this some more yesterday. Turns out the problem seems to be that the client closes the inode, and all caps on it are dropped. Then the MDS starts check_inode_max_size (and should it do this when an inode closes properly?). The client reopens the inode, but most of the caps are in sync state and the MDS has the write cap (from a force_wrlock, which might impact things on its own?) -- this prevents the inode from getting the file access caps it wants during the initial open.
When check_inode_max_size completes, the MDS drops the caps it's grabbed onto. In the log fragment we have, the client is then never issued any more caps. I'm not sure if this is because it would never be issued more caps (meaning the MDS needs to do something differently), or if it's because there was some other bug preventing it from being issued caps and the log we have is just incomplete.

I'm going to try to reproduce this bug, I think, but more information on the versions being used and how to reproduce would be appreciated!

#13

Updated by Henry Chang almost 13 years ago

I just tested it with the latest ceph stable branch (0.29+) and the kclient master branch (backported to 2.6.32 by me). I tested it 3 times, and the last run hung.

As before, "ls -l /proc/<blogbench pid>/fd/" will let blogbench finish. It seems to me that if the client sent the cap request to the MDS again, the MDS would be able to issue Fr to the client without problems. Thus, there should be no other bug preventing Fr from being issued. It was just that the MDS missed the chance to issue Fr on file_update_finish, and then there were no other "trigger points" to issue it later.

After applying the following diff, blogbench can finish successfully every time:

diff --git a/src/mds/Locker.cc b/src/mds/Locker.cc
index 3c7c190..36da98e 100644
--- a/src/mds/Locker.cc
+++ b/src/mds/Locker.cc
@@ -1242,7 +1242,7 @@ void Locker::file_update_finish(CInode *in, Mutation *mut, bool share, client_t
     if (gather)
       eval_cap_gather(in);
   } else {
-    if (cap && (cap->wanted() & ~cap->pending())) {
+    if (!cap || (cap && (cap->wanted() & ~cap->pending()))) {
       issue_caps(in, cap);
     }

This may not be the right fix; it's just to back up my thoughts.
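As an aside on the diff above: the patched condition can be written more simply, and the interesting case is cap == NULL; assuming the two-argument issue_caps here behaves as in later Ceph trees (a NULL cap means 're-check every client cap on the inode'), that fallback is what unwedges the blocked reader:

    // Equivalent, slightly simpler form of the patched condition:
    if (!cap || (cap->wanted() & ~cap->pending()))
      issue_caps(in, cap);  // cap == NULL: consider all client caps, not just one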

On the other side, I am also wondering if the bug is caused by my backport, since no one other than me seems to be able to reproduce it easily...

#14

Updated by Sage Weil almost 13 years ago

  • Assignee changed from Greg Farnum to Sage Weil
#15

Updated by Sage Weil almost 13 years ago

The mds_issue_caps branch has my current work on cleaning the issue_caps stuff up. I'm pretty happy with what the MDS code looks like at the moment. My blogbench runs have turned up a couple of bugs in the kclient (pushed to master), but so far it's holding up...

#16

Updated by Sage Weil almost 13 years ago

  • Status changed from In Progress to Resolved

I can't reproduce this on my mds_issue_caps branch, so I've merged it into master, commit:0f8fbd562cc58351541ac4f60a1dc98bdd1c91bf. Henry, can you let me know if you are still seeing problems here?

Thanks!

#17

Updated by Henry Chang almost 13 years ago

It is working great. I haven't seen this problem since applying the patches.

#18

Updated by John Spray over 7 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)
  • Target version deleted (v0.30)

Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.
