Bug #55464: cephfs: mds/client error when client stale reconnect - CephFS - Ceph

Bug #55464

open

cephfs: mds/client error when client stale reconnect

Added by Mer Xuanyi almost 2 years ago. Updated 12 months ago.

Status: In Progress
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version: Ceph - v19.0.0
% Done:
Source: Community (dev)
Tags:
Backport: reef,quincy,pacific
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): Client, MDS
Labels (FS):
Pull request ID: 46050
Crash signature (v1):
Crash signature (v2):

Description

Options:
mds_session_blocklist_on_evict: false
mds_session_blocklist_on_timeout: false
client_reconnect_stale: true

We expect the client to keep working when the MDS reboots (or when the session goes stale because the network is temporarily unavailable).
In practice, this may lead to an MDS or client crash.
When the MDS reboots and enters the RECONNECT phase, the client detects the state change from the mdsmap and calls Client::send_reconnect.
In Client::send_reconnect, the client sends session->unsafe_requests, the unprocessed mds_requests, and the client_reconnect message.
If the reconnect times out, the MDS calls kill_session to evict the client (without blocklisting it). The client then calls Client::kick_requests_closed from Client::_closed_mds_session, which kicks and removes all in-flight requests. That looks fine, but Client::make_request checks whether request->reply is set and, if it is not, tries to resend the request once the MDS is active and the session is reopened. This can lead to many errors.
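
The client-side sequence described above can be modeled with a small toy simulation. This is only a sketch under stated assumptions, not Ceph's implementation: `Request`, `ToyClient`, `retry_after_reopen` and `simulate` are hypothetical names, and the other methods merely borrow the names of the functions mentioned in this report to mirror the behavior as described.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """Hypothetical stand-in for a client metadata request."""
    tid: int
    op: str
    reply: bool = False    # True once an early (unsafe) reply reached the caller
    kicked: bool = False   # True while the session closed under a waiting caller


@dataclass
class ToyClient:
    """Toy model of the client's request tracking (assumption, not Ceph code)."""
    requests: dict = field(default_factory=dict)   # tid -> in-flight Request
    log: list = field(default_factory=list)        # (action, tid) in send order

    def submit(self, req):
        self.requests[req.tid] = req
        self.log.append(("send", req.tid))

    def handle_early_reply(self, tid):
        # The caller sees success before the MDS has journaled the update.
        self.requests[tid].reply = True

    def send_reconnect(self):
        # MDS entered RECONNECT: replay unsafe requests, resend unprocessed ones.
        for req in self.requests.values():
            self.log.append(("replay" if req.reply else "resend", req.tid))

    def kick_requests_closed(self):
        # Session closed after eviction: every in-flight request is removed.
        # Requests that already got a reply are simply dropped; requests whose
        # caller is still waiting are marked as kicked so the caller can retry.
        dropped, waiting = [], []
        for req in list(self.requests.values()):
            del self.requests[req.tid]
            if req.reply:
                dropped.append(req.tid)
            else:
                req.kicked = True
                waiting.append(req)
        return dropped, waiting

    def retry_after_reopen(self, waiting):
        # Models the retry in the request loop once the session is reopened.
        for req in waiting:
            req.kicked = False
            self.submit(req)


def simulate():
    client = ToyClient()
    client.submit(Request(1, "mkdir test_dir"))
    client.handle_early_reply(1)
    client.submit(Request(2, "touch test_dir/test_file"))
    client.send_reconnect()                            # MDS reboots
    dropped, waiting = client.kick_requests_closed()   # reconnect timed out
    client.retry_after_reopen(waiting)                 # MDS active, session reopened
    return dropped, sorted(client.requests), client.log
```

In this model, `simulate()` drops request 1 (which already received its early reply) and resends request 2 on its own, even though request 2 depends on the result of request 1.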

A typical case involves two requests, req.1 (mkdir test_dir) and req.2 (touch test_dir/test_file), where req.1 has received an early_reply.
When the MDS reboots and the client's reconnect times out, the client drops req.1. req.2 is resent once the MDS becomes active and the session is reopened, but the Server cannot process it correctly because the ino of test_dir does not actually exist. In the end the MDS tells the client that the ino of test_dir is stale, and the client retries req.2 in an infinite loop.
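
The MDS side of this scenario can be sketched in the same hedged way. The `ToyMDS` class and the `retry_create` helper below are hypothetical, the use of `ESTALE` as the "stale ino" error is an assumption, and the retry loop is bounded only so that the example terminates (the report describes it as endless).

```python
import errno


class ToyMDS:
    """Toy MDS: journaled state survives a reboot, early-replied state does not."""

    def __init__(self):
        self.journaled = {}    # ino -> name, durable across reboots
        self.unsafe = {}       # ino -> name, early-replied but not yet journaled
        self.next_ino = 0x1000

    def _alloc(self):
        ino = self.next_ino
        self.next_ino += 1
        return ino

    def mkdir(self, name):
        # Early reply: the ino is handed out before the update is durable.
        ino = self._alloc()
        self.unsafe[ino] = name
        return ino

    def reboot(self):
        # Un-journaled updates are lost unless the client replays them.
        self.unsafe.clear()

    def create(self, parent_ino, name):
        # The parent must exist, otherwise the ino is reported as stale.
        if parent_ino not in self.journaled and parent_ino not in self.unsafe:
            return -errno.ESTALE
        ino = self._alloc()
        self.unsafe[ino] = name
        return ino


def retry_create(mds, parent_ino, name, max_attempts=5):
    """Resend the create until it stops failing with ESTALE (bounded here)."""
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        result = mds.create(parent_ino, name)
        if result != -errno.ESTALE:
            return result, attempts
    return -errno.ESTALE, attempts
```

If `mkdir` is followed by `reboot()` without a replay of req.1, every `create` under the returned ino fails in this model, so `retry_create` uses up all of its attempts. Without the reboot, the first attempt succeeds.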

Possible problems caused by this bug:

1. stale ino
2. client cache inconsistency / client crash (the ino is added to inode_map when the MDS early_reply is handled, but is then updated for a different request after the MDS reboots)
3. objecter inconsistency (the client writes data after getting the early_reply, but the request is dropped when the MDS reboots)
4. MDS crash (when the MDS allocates an ino, it finds that the ino is already in inode_map; only observed in Jewel, where a special OPEN event is journaled after mkdir)

PRs #29095 and #30969 (removed by commit a7a1b0a3) solved part of this problem, but they only take effect while the MDS has not yet switched to active.