Bug #47385

intermittent EACCES with recover_session=clean

Added by Ilya Dryomov 3 months ago. Updated 12 days ago.

Status: Fix Under Review
Priority: Normal
Assignee:
Category: fs/ceph
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature:

Description

$ sudo mount -t ceph -o recover_session=clean ... /mnt
$ ls /mnt
$ ceph daemon mds.a session evict ...
$ ls /mnt
ls: cannot access '/mnt': Permission denied
$ ls /mnt

Even though ls's stat() call blocks for a noticeable amount of time, it still fails with EACCES. Only the next call succeeds.
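
To observe the same behavior without going through ls, a small userspace probe can call stat() twice and record the errno of each attempt. This is only a sketch: stat_errno and probe_twice are hypothetical helper names, not part of any Ceph tooling, and the path to probe is whatever the mount point happens to be.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: stat() a path and return 0 on success,
 * or the errno value reported by the failed call.
 */
static int stat_errno(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) == 0)
        return 0;
    return errno;
}

/*
 * Hypothetical helper: probe the same path twice in a row and
 * record both results, mirroring the two "ls /mnt" calls above.
 */
static void probe_twice(const char *path, int results[2])
{
    results[0] = stat_errno(path);
    results[1] = stat_errno(path);
}
```

Going by the description above, on an affected mount right after the eviction one would expect results[0] to be EACCES and results[1] to be 0.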

History

#1 Updated by Jeff Layton 2 months ago

I'm pretty sure this comes from __do_request, which does this:

                if (session->s_state == CEPH_MDS_SESSION_REJECTED) {
                        err = -EACCES;
                        goto out_session;
                }

I think we need to have that return a more distinct error code in this situation, and have ceph_mdsc_submit_request wait until session recovery has taken place and then retry the call.
We'll also need to handle that case in __wake_requests and kick_requests.
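
The idea can be sketched in userspace C as follows. This is a simplified model, not the kernel code or the eventual patch: the struct, the ESESSION_REJECTED value, the max_retries parameter and all function names are hypothetical, and recovery is run synchronously here, whereas the real client would have to wait for the reconnect to finish.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical internal code, deliberately outside the usual errno range. */
#define ESESSION_REJECTED 10001

/* Simplified stand-ins for the MDS session states. */
enum session_state { SESSION_OPEN, SESSION_REJECTED };

struct session {
    enum session_state state;
    int recoveries;          /* number of recoveries performed so far */
};

/* Models the check in __do_request, but with a distinct error code. */
static int do_request(const struct session *s)
{
    if (s->state == SESSION_REJECTED)
        return -ESESSION_REJECTED;
    return 0;
}

/* Stand-in for session recovery; in this model it always succeeds. */
static void recover_session(struct session *s)
{
    s->state = SESSION_OPEN;
    s->recoveries++;
}

/*
 * Models the submit path: retry after recovery instead of handing the
 * rejection to the caller. EACCES is only returned once the retry
 * budget (an assumption of this sketch) is exhausted.
 */
static int submit_request(struct session *s, int max_retries)
{
    assert(s != NULL);
    for (int attempt = 0; ; attempt++) {
        int err = do_request(s);

        if (err != -ESESSION_REJECTED)
            return err;
        if (attempt >= max_retries)
            return -EACCES;
        recover_session(s);
    }
}
```

With one retry allowed, a request submitted against a rejected session succeeds after a single recovery; with no retries it fails with EACCES, which is the behavior reported in this bug.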

#2 Updated by Jeff Layton 2 months ago

Actually, we have the r_wait list_head and can just queue the request to the waiting_on_map queue. The problem, though, is that the reconnect calls ceph_umount_begin, which seems to make the request abort with EIO instead. I think we just need to fix up that handling so that the request gets restarted after a new session is established.

#3 Updated by Jeff Layton 2 months ago

I'm able to reproduce this and have started playing with some patches in this area, but the MDS behavior doesn't make a lot of sense to me. Here's what I'm seeing:

1/ start with a mount with -o recover_session=clean
2/ stat a file that's already present in it
3/ from cephadm shell:

# ceph tell mds.scratch.ceph.sdxhol client ls
[
    {
        "id": 914164,
        "entity": {
            "name": {
                "type": "client",
                "num": 914164
            },
...

# ceph daemon mds.scratch.ceph.sdxhol session evict 914164
# ceph tell mds.scratch.ceph.sdxhol client ls 
[]

4/ On the client, stat a file and let it do the recover-session dance. After a small delay, the stat comes back:

# ceph tell mds.scratch.ceph.sdxhol client ls
[
    {
        "id": 914164,
        "entity": {
            "name": {
                "type": "client",
                "num": 914164
            },
...

...note that the "id" is the same as the original one. Is that expected?

If I then evict it again, the client gets stuck and doesn't seem to reestablish the session. That might be a client-side bug (still investigating), but the duplicate id for the session seems odd at best.

#4 Updated by Jeff Layton 2 months ago

  • Assignee set to Jeff Layton

#5 Updated by Jeff Layton 2 months ago

Continuing work on this today:

Patrick seemed to think that we would regenerate the nonce on every reconnect and that should be enough to cause a new session ID to be allocated. It looks like that is done in the code, but the reconnected session doesn't seem to display it in the "client ls" output.

Looking further. With my (modest) changes so far, the first eviction seems to work correctly:

[ 1686.564800] ceph: mds0 rejected session
[ 1686.881323] ceph: auto reconnect after blocklisted

...but after the second eviction, I don't get the "auto reconnect after blocklisted" message:

[ 1721.571840] ceph: mds0 rejected session

Looking now to see why...

#6 Updated by Jeff Layton 2 months ago

Ahh, it seems to have been the time limit. In maybe_recover_session:

        if (fsc->last_auto_reconnect &&
            time_before(jiffies, fsc->last_auto_reconnect + HZ * 60 * 30))
                return;

When I comment that out, this seems to work as expected.

I think we probably ought to remove this timeout, but I need to go back through Zheng's design on this to make sure that won't be problematic.
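
To make the effect of that check concrete, here is a userspace sketch of the same rate limit. It assumes HZ is 1000 (the real value depends on kernel configuration) and that the timestamp is updated whenever a reconnect is allowed; time_before_ul, struct fs_client and may_auto_reconnect are hypothetical names that only mimic the kernel's time_before() and the fsc->last_auto_reconnect field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 1000 ticks per second; the kernel's HZ is config-dependent. */
#define HZ 1000UL

/* Wraparound-safe "a is earlier than b", in the style of time_before(). */
static bool time_before_ul(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

/* Hypothetical stand-in for the relevant part of the client state. */
struct fs_client {
    unsigned long last_auto_reconnect;   /* 0 means "never reconnected" */
};

/*
 * Returns true if an automatic reconnect is allowed at time "now"
 * (in ticks) and records it; returns false if the previous one
 * happened less than 30 minutes ago.
 */
static bool may_auto_reconnect(struct fs_client *fsc, unsigned long now)
{
    assert(fsc != NULL);
    if (fsc->last_auto_reconnect &&
        time_before_ul(now, fsc->last_auto_reconnect + HZ * 60 * 30))
        return false;
    fsc->last_auto_reconnect = now;
    return true;
}
```

Plugging in the timestamps from the log above, the first eviction at about 1686 s is allowed to reconnect, while the second at about 1721 s falls inside the 30-minute window and is refused, which matches the missing "auto reconnect" message.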

#7 Updated by Jeff Layton 2 months ago

I have a patchset that seems to smooth this over, but I'm not convinced it's actually safe. The problem as I see it is that we may end up taking cap references and making a call that then gets queued until the reconnect. The old session is torn down and caps are removed, and then we issue the call on the new session, but it seems likely we may not have been re-granted the caps we needed for that call.

I think what I'll do is post the patchset I have upstream as an RFC and we can discuss it there.

#8 Updated by Jeff Layton 20 days ago

  • Status changed from New to In Progress

The patchset is in the testing branch and has been for about a month with no issues.

#9 Updated by Jeff Layton 12 days ago

  • Status changed from In Progress to Fix Under Review
