Bug #21748

closed

client assertions tripped during some workloads

Added by Jeff Layton over 6 years ago. Updated over 6 years ago.

Status: Can't reproduce
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): Client
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We had a report of some crashes in ganesha here:

https://github.com/nfs-ganesha/nfs-ganesha/issues/215

Dan and I looked at the report and determined that it's probably this assertion in ll_setattrx:

assert(in == target.get());

Actions #1

Updated by Jeff Layton over 6 years ago

The right fix is probably to just remove that assertion. I don't think it's really valid anyway. cephfs turns the inode into a path when sending it to the MDS. There is nothing that prevents that path changing once the call is in flight.

Actions #2

Updated by Jeff Layton over 6 years ago

Actually, this is wrong (as Zheng pointed out). The call is made with a zero-length path that starts from the inode on which the setattr is being done. I wonder whether we ended up with a similar race, though: if we got a traceless reply, the client does a pathwalk in that case, and it could have landed somewhere other than where we expected.

Actions #3

Updated by Zheng Yan over 6 years ago

This shouldn't happen even for a traceless reply. I suspect the 'in' passed to ceph_ll_setattr doesn't belong to the 'cmount'.
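
To illustrate this theory, here is a minimal, self-contained sketch of how a pointer-identity check like assert(in == target.get()) could trip when an inode handle from one mount is used with another. ToyInode, ToyMount, add(), resolve() and owns() are hypothetical names invented for this example; they are not the real Ceph Client or Inode classes, and the real lookup logic is certainly more involved.

```cpp
#include <cstdint>
#include <memory>
#include <unordered_map>

// Hypothetical stand-in for a client-side inode object (not Ceph's Inode).
struct ToyInode {
    uint64_t ino;
};

// Hypothetical stand-in for one mount, each with its own inode cache.
struct ToyMount {
    std::unordered_map<uint64_t, std::shared_ptr<ToyInode>> cache;

    // Create (or return) this mount's object for the given inode number.
    ToyInode* add(uint64_t ino) {
        auto& slot = cache[ino];
        if (!slot) {
            slot = std::make_shared<ToyInode>(ToyInode{ino});
        }
        return slot.get();
    }

    // Look the inode number up in this mount's own cache, loosely
    // mimicking how the target of a reply would be resolved.
    std::shared_ptr<ToyInode> resolve(uint64_t ino) const {
        auto it = cache.find(ino);
        return it == cache.end() ? nullptr : it->second;
    }

    // True only if 'in' is the exact object this mount holds for that
    // inode number; this is the condition the assertion relies on.
    bool owns(const ToyInode* in) const {
        auto target = resolve(in->ino);
        return target && target.get() == in;
    }
};
```

In this toy model, two mounts that each cache inode number 42 hold two distinct objects, so a handle obtained from one mount fails the identity check on the other even though the inode numbers match.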

Actions #4

Updated by Jeff Layton over 6 years ago

Huh. That is an interesting theory. I don't see how ganesha would do that, but maybe. Unfortunately, the original problem reporter has gone unresponsive, so we don't have a lot to go on here.

Either way, I think the right solution here is probably to add some more logging (and maybe assertions) to catch these sorts of cases. Maybe with that we'll get a better handle on this problem.
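
As a rough sketch of what that extra logging might look like, the helper below builds a diagnostic message describing both inodes when they differ, so the details can be logged before any assertion fires. InodeInfo and describe_mismatch() are hypothetical names made up for this example; this is not code from the Ceph client, and the real change would presumably use the client's existing logging macros.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical summary of an inode: its number and the object's address.
struct InodeInfo {
    uint64_t ino;
    const void* addr;
};

// Returns an empty string when both handles refer to the same object.
// Otherwise returns a message naming the call site and both inodes,
// suitable for logging before a failing assertion.
std::string describe_mismatch(const char* where,
                              const InodeInfo& expected,
                              const InodeInfo& actual) {
    if (expected.addr == actual.addr) {
        return {};
    }
    std::ostringstream out;
    out << where << ": inode mismatch: expected ino 0x" << std::hex
        << expected.ino << " (" << expected.addr << "), got ino 0x"
        << actual.ino << " (" << actual.addr << ")";
    return out.str();
}
```

A caller could log the returned string whenever it is non-empty, which would at least show whether the two objects share an inode number (pointing at a handle from a different mount) or not (pointing at a lookup that landed elsewhere).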

Actions #5

Updated by Jeff Layton over 6 years ago

  • Status changed from New to Can't reproduce

No response in several months, and I've never seen this trip in my own testing. Closing for now. Please reopen if you have more information.
