Bug #49500

closed

qa: "Assertion `cb_done' failed."

Added by Patrick Donnelly about 3 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
High
Category:
-
Target version:
% Done:
0%
Source:
Q/A
Tags:
Backport:
pacific,octopus,nautilus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Client, qa-suite
Labels (FS):
qa, qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):


Related issues 4 (0 open, 4 closed)

Copied from CephFS - Bug #49309: nautilus: qa: "Assertion `cb_done' failed." (Resolved, Jeff Layton)

Copied to CephFS - Backport #50188: octopus: qa: "Assertion `cb_done' failed." (Rejected)
Copied to CephFS - Backport #50189: nautilus: qa: "Assertion `cb_done' failed." (Rejected)
Copied to CephFS - Backport #50190: pacific: qa: "Assertion `cb_done' failed." (Resolved, singuliere _)
#1

Updated by Patrick Donnelly about 3 years ago

  • Copied from Bug #49309: nautilus: qa: "Assertion `cb_done' failed." added
#2

Updated by Jeff Layton about 3 years ago

With the most recent change to make that variable atomic, I doubt we're hitting cache-coherency problems. It seems more likely that the callback just didn't happen. Could it be that the cluster in this case is OK with 1000 inodes on the client and doesn't trigger the ino_release_cb?
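For illustration only (this is not the actual libcephfs test, and the callback signature below is a stand-in): the shape of the failing check is an atomic flag that the registered inode-release callback sets and the test later asserts on. Once the flag is atomic, the only way the assert can still fire is if the callback never ran:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* The flag the failing assertion checks; atomic per the recent change. */
static atomic_bool cb_done = false;

/* Hypothetical callback shape; the real ino_release_cb takes different
 * arguments, but the point is only that it flips the flag. */
static void ino_release_cb(void *handle)
{
    (void)handle;
    atomic_store(&cb_done, true);
}

int main(void)
{
    /* In the real test the client registers the callback and opens ~1000
     * files, and the MDS is expected to recall caps and invoke it. This
     * direct call is a stand-in for that MDS-driven path. */
    ino_release_cb(NULL);

    /* If no recall ever happens, this is the assertion that fires. */
    assert(atomic_load(&cb_done));
    return 0;
}
```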

#3

Updated by Jeff Layton about 3 years ago

Yeah, looking at the MDS logs from the above run, I don't see any occurrences of the word "recall" in there, and at least some of the dout(7) messages in Server::recall_client_state should have fired. I think that this test just didn't trigger any recalls. Have the MDS's default limits changed?

#4

Updated by Patrick Donnelly about 3 years ago

Jeff Layton wrote:

Yeah, looking at the MDS logs from the above run, I don't see any occurrences of the word "recall" in there, and at least some of the dout(7) messages in Server::recall_client_state should have fired. I think that this test just didn't trigger any recalls. Have the MDS's default limits changed?

Ah, yes that is probably it. I think it's caused by 63392e1b65fbead6ef8c7acd6a70e6ef5b322390 and the new mds_min_caps_working_set option.

#5

Updated by Jeff Layton about 3 years ago

I'm not sure that setting is enough to explain this. AFAICT, that setting is only consulted in notify_health(), so I think that should just affect health warnings.

This test was written when the logic to trigger cap recall was pretty simple. Once you hit ~1k caps outstanding, the MDS would ask the client to shrink its caps. This has evidently changed recently, but the test was not updated to take that into account.

Basically we want this test to find new inodes up until the point where we know that the MDS will start recalling them. What's the right way to do that now?
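As an editorial sketch of that intent (placeholder path and file count; the real test registers the callback through the libcephfs API and uses its own helpers): pin caps by holding files open past some threshold, then wait a bounded time for the recall-driven callback before failing. The open question is how to pick that threshold now that the recall heuristics have changed:

```c
#include <fcntl.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Set by the registered inode-release callback, as in the sketch above. */
static atomic_bool cb_done = false;

int main(int argc, char **argv)
{
    /* Placeholder mount path and file count, not the real test's values. */
    const char *dir = argc > 1 ? argv[1] : "/mnt/cephfs/ino_release_test";
    const int nfiles = 1100;
    char path[4096];

    /* Keep every fd open so the client pins a cap per new inode. */
    for (int i = 0; i < nfiles; i++) {
        snprintf(path, sizeof(path), "%s/file.%d", dir, i);
        if (open(path, O_CREAT | O_RDWR, 0644) < 0) {
            perror("open");
            return 1;
        }
    }

    /* Give the MDS a bounded window to recall caps and fire the callback. */
    for (int i = 0; i < 30 && !atomic_load(&cb_done); i++)
        sleep(1);

    if (!atomic_load(&cb_done)) {
        fprintf(stderr, "MDS never recalled caps; this is the failure mode seen in the run\n");
        return 1;
    }
    return 0;
}
```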

#6

Updated by Jeff Layton about 3 years ago

Maybe we could lower mds_max_caps_per_client for this test? It defaults to 1M now, but we could take that down to 500 or so for this test (and then reset it when we're done)?
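One possible shape of the test side of this idea, sketched editorially: have the suite set the lowered limit (e.g. with something like `ceph config set mds mds_max_caps_per_client 500`) and tell the test what it is, rather than baking in ~1000. The environment variable name below is invented for illustration:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical: the qa suite lowers mds_max_caps_per_client for the run and
 * exports the same value to the test, so the test overshoots it slightly
 * instead of hard-coding ~1000. TEST_MAX_CAPS is not an existing knob. */
static long caps_target(void)
{
    const char *env = getenv("TEST_MAX_CAPS");
    long limit = env ? strtol(env, NULL, 10) : 500;

    return limit + limit / 10;   /* overshoot by 10% to guarantee a recall */
}

int main(void)
{
    printf("opening %ld files to exceed the lowered cap limit\n", caps_target());
    return 0;
}
```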

#7

Updated by Patrick Donnelly about 3 years ago

Jeff Layton wrote:

Maybe we could lower mds_max_caps_per_client for this test? It defaults to 1M now, but we could take that down to 500 or so for this test (and then reset it when we're done)?

Looking at this test more closely... why it ever worked is unclear to me. The MDS does not normally drive recall for a client reaching 1k caps. What is supposed to trigger the release_cb call?

We can reduce `mds_max_caps_per_client` but I'd like to understand what's supposed to be tested. Just that the callback works?

#8

Updated by Jeff Layton about 3 years ago

Patrick Donnelly wrote:

Jeff Layton wrote:

Maybe we could lower mds_max_caps_per_client for this test? It defaults to 1M now, but we could take that down to 500 or so for this test (and then reset it when we're done)?

Looking at this test more closely... why it ever worked is unclear to me. The MDS does not normally drive recall for a client reaching 1k caps. What is supposed to trigger the release_cb call?

It used to do that, IIRC, but it was based on some rather fluid limits.

We can reduce `mds_max_caps_per_client` but I'd like to understand what's supposed to be tested. Just that the callback works?

Yes, just that the callback is called when inodes are being recalled (à la CEPH_SESSION_RECALL_STATE).

#9

Updated by Patrick Donnelly about 3 years ago

  • Status changed from New to Fix Under Review
  • Assignee changed from Jeff Layton to Patrick Donnelly
  • Pull request ID set to 40418
#10

Updated by Patrick Donnelly about 3 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Component(FS) Client added
#11

Updated by Backport Bot about 3 years ago

  • Copied to Backport #50188: octopus: qa: "Assertion `cb_done' failed." added
#12

Updated by Backport Bot about 3 years ago

  • Copied to Backport #50189: nautilus: qa: "Assertion `cb_done' failed." added
#13

Updated by Backport Bot about 3 years ago

  • Copied to Backport #50190: pacific: qa: "Assertion `cb_done' failed." added
#14

Updated by Loïc Dachary over 2 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

