Bug #65647

open

Evicted kernel client may get stuck after reconnect

Added by Mykola Golub 10 days ago. Updated 1 day ago.

Status:
Triaged
Priority:
Normal
Assignee:
Xiubo Li
Category:
Correctness/Safety
Target version:
v20.0.0
% Done:

0%

Source:
Tags:
Backport:
reef,squid
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Our customer was observing a sporadic "client isn't responding to mclientcaps(revoke)" issue, so they configured auto eviction without blocklisting:

mds               advanced  mds_cap_revoke_eviction_timeout        600.000000                                                                                            
mds               advanced  mds_session_blocklist_on_evict         false                                                                                                 
mds               advanced  mds_session_blocklist_on_timeout       false                                                                                                 
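
For reference, these are shown here as they appear in the config dump; applying the same options with ceph config set would look roughly like this (illustration only, the exact method used on the customer cluster is not stated here):

# illustrative: apply the auto-eviction-without-blocklisting options via the centralized config
ceph config set mds mds_cap_revoke_eviction_timeout 600
ceph config set mds mds_session_blocklist_on_evict false
ceph config set mds mds_session_blocklist_on_timeout false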

It works most of the time: after eviction the client opens a new connection and the k8s pods can successfully bind to the volume. But sometimes the client gets stuck: the pods fail to bind to the volume until the host is rebooted, although on the mds side the session is seen in "open" state and looks just idle. If we do a manual evict for such a session (ceph tell mds.0 session kill id) it helps and the pods can bind to the volume again.
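
For illustration, a minimal sketch of the mds-side inspection and the manual eviction mentioned above (the id is whatever "session ls" reports for the stuck client, e.g. 19707773 in the log excerpt below):

# list sessions with their state as seen by the MDS; the stuck session shows up as "open"
ceph tell mds.0 session ls
# manually evict the stuck session, after which the pods can bind to the volume again
ceph tell mds.0 session kill <id>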

From the mds log it looks like in these sporadic cases we always observe a "denied reconnect attempt" message, like below:

Apr 08 14:35:03 ceph04 ceph-mds[48012]: mds.0.server evict_cap_revoke_non_responders: evicting cap revoke non-responder client id 19707773
Apr 08 14:35:03 ceph04 ceph-mds[48012]: mds.0.63169 Evicting client session 19707773 (v1:192.168.100.45:0/4254479370)
Apr 08 14:35:03 ceph04 ceph-mds[48012]: log_channel(cluster) log [INF] : Evicting client session 19707773 (v1:192.168.100.45:0/4254479370)
Apr 08 14:35:10 ceph04 ceph-mds[48012]: mds.0.server no longer in reconnect state, ignoring reconnect, sending close
Apr 08 14:35:10 ceph04 ceph-mds[48012]: log_channel(cluster) log [INF] : denied reconnect attempt (mds is up:active) from client.19707773 v1:192.168.100.45:0/4254479370 after 1.68543e+07 (allowed interval 45)

Note, these are just a few lines from the logs, for illustration. The full log is attached.
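
As a hypothetical example, the relevant events can be pulled out of the journal on the MDS host with something like the following (the ceph-mds syslog identifier is taken from the lines above):

# grep the MDS daemon log for eviction and reconnect events
journalctl -t ceph-mds | grep -E 'Evicting client session|ignoring reconnect|denied reconnect attempt'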

So, normally, when a client successfully reconnects we may see only "no longer in reconnect state, ignoring reconnect, sending close" messages, which according to the code are printed when the client sends a "reconnect" message while the session is in the close state.

But "denied reconnect attempt" message seems to be possible only when the session is in open state. My interpretation of this (although may be incorrect) is the following. Normally (when it works) the sequence looks like the following:

1) on eviction, the mds closes the session
2) the client notices this and sends "reconnect" message
3) the mds sees the session in close state and "ignores reconnect, sending close".
4) the client sends "open" and the session becomes "open" again.

And in the problematic case it looks to me like after (4) the mds receives the "reconnect" message from (2) again (a resent dup?) and this makes it get stuck.

The client kernel version:

"kernel_version": "5.15.0-92-generic",

The mds version:

"ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)"

Files

Actions #1

Updated by Xiubo Li 10 days ago

I think you have enabled recover_session in the kclient?

In general, this option should not be used by most clients that have stable connections to the MDS's, but in environments where connectivity may be spotty, and availability is more important than correctness, admins may want to consider enabling it.

For more detail about this option please see https://ceph.io/en/news/blog/2020/automatic-cephfs-recovery-after-blacklisting/.
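
For context, recover_session is a mount option of the kernel client; a minimal, hypothetical mount example (monitor address and client name are placeholders):

# ask the kclient to automatically reconnect and drop its state after being blocklisted
mount -t ceph <mon_host>:/ /mnt/cephfs -o name=<client>,recover_session=clean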

Actions #2

Updated by Xiubo Li 10 days ago

Xiubo Li wrote in #note-1:

I think you have enabled recover_session in the kclient?

[...]

For more detail about this option please see https://ceph.io/en/news/blog/2020/automatic-cephfs-recovery-after-blacklisting/.

Pardon, please just ignore this; the reconnect was from the reset logic.

Actions #3

Updated by Xiubo Li 10 days ago

Mykola Golub wrote:

Our customer was observing a sporadic "client isn't responding to mclientcaps(revoke)" issue, so they configured auto eviction without blocklisting:
[...]
It works most of the time: after eviction the client opens a new connection and the k8s pods can successfully bind to the volume. But sometimes the client gets stuck: the pods fail to bind to the volume until the host is rebooted, although on the mds side the session is seen in "open" state and looks just idle. If we do a manual evict for such a session (ceph tell mds.0 session kill id) it helps and the pods can bind to the volume again.

From the mds log it looks like in these sporadic cases we always observe a "denied reconnect attempt" message, like below:
[...]
Note, these are just a few lines from the logs, for illustration. The full log is attached.

So, normally, when a client successfully reconnects we may see only "no longer in reconnect state, ignoring reconnect, sending close" messages, which according to the code are printed when the client sends a "reconnect" message while the session is in the close state.

But "denied reconnect attempt" message seems to be possible only when the session is in open state. My interpretation of this (although may be incorrect) is the following. Normally (when it works) the sequence looks like the following:

1) on eviction, the mds closes the session
2) the client notices this and sends "reconnect" message
3) the mds sees the session in close state and "ignores reconnect, sending close".
4) the client sends "open" and the session becomes "open" again.

I didn't see any new open-session logs just after the reconnect.

When the MDS resets the socket connection, the kclient will detect it and then try to send a reconnect, while the MDS will just deny it since it's not in the RECONNECT state any more.

All the logs look normal to me.

And in the problematic case it looks to me like after (4) the mds receives the "reconnect" message from (2) again (a resent dup?) and this makes it get stuck.

From the logs I didn't see this.

And do you have the kclient side logs? What are the errors there?
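
If it helps, the kind of client-side information that is commonly useful looks roughly like this (a sketch based on the standard kernel-client debugfs paths; adjust to the environment):

# kernel messages from the ceph/libceph modules
dmesg | grep -i ceph
# per-mount state exported via debugfs (requires root and a mounted debugfs)
cat /sys/kernel/debug/ceph/*/mds_sessions   # MDS sessions as seen by the client
cat /sys/kernel/debug/ceph/*/mdsc           # in-flight MDS requests
# optionally enable verbose output from the modules via dynamic debug
echo 'module ceph +p' > /sys/kernel/debug/dynamic_debug/control
echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control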

Is it possible to enable the mds debug logs? Let's see whether there are other logs about this.

debug mds = 20
debug ms = 1
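
A hedged sketch of how these could be applied at runtime without restarting the daemon (assuming the centralized config is in use):

# raise MDS and messenger debug levels
ceph config set mds debug_mds 20
ceph config set mds debug_ms 1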

The client kernel version:

"kernel_version": "5.15.0-92-generic",

The mds version:

"ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)"

Actions #4

Updated by Mykola Golub 9 days ago

Xiubo Li wrote in #note-3:

Is it possible to enable the mds debug logs? Let's see whether there are other logs about this.

Hi Xiubo, thank you for the reply. It is our customer's production cluster and the issue is very sporadic, I would say once per month or so, so keeping the debug log enabled all that time while waiting for the issue to reoccur would be problematic. It is also problematic to get access to the client hosts to collect debug info there (unless you can provide exact instructions on how to get the debug information we need; I don't have much experience with cephfs kernel client troubleshooting). Anyway, if I get any new data I will update the ticket.

The log I attached is not very verbose, especially for the ms subsystem, so my assumption about the "open session" is based on looking at the code [1]: both the "no longer in reconnect state, ignoring reconnect, sending close" and "denied reconnect attempt from " messages are printed by Server::handle_client_reconnect, i.e. it should be when handling a "reconnect" message, and the first message is printed when the session is not in the open state on the mds side, while the second is printed when the session is in the open state.

[1] https://github.com/ceph/ceph/blob/c61472c83ce76869b0fe076c19ed2f4bcc3fb4af/src/mds/Server.cc#L1471

Actions #5

Updated by Venky Shankar 4 days ago

Mykola Golub wrote in #note-4:

Xiubo Li wrote in #note-3:

Is it possible to enable the mds debug logs? Let's see whether there are other logs about this.

Hi Xiubo, thank you for the reply. It is our customer's production cluster and the issue is very sporadic, I would say once per month or so, so keeping the debug log enabled all that time while waiting for the issue to reoccur would be problematic. It is also problematic to get access to the client hosts to collect debug info there (unless you can provide exact instructions on how to get the debug information we need; I don't have much experience with cephfs kernel client troubleshooting). Anyway, if I get any new data I will update the ticket.

The log I attached is not very verbose, especially for the ms subsystem, so my assumption about the "open session" is based on looking at the code [1]: both the "no longer in reconnect state, ignoring reconnect, sending close" and "denied reconnect attempt from " messages are printed by Server::handle_client_reconnect, i.e. it should be when handling a "reconnect" message, and the first message is printed when the session is not in the open state on the mds side, while the second is printed when the session is in the open state.

[1] https://github.com/ceph/ceph/blob/c61472c83ce76869b0fe076c19ed2f4bcc3fb4af/src/mds/Server.cc#L1471

I don't get it. The denied reconnect attempt cluster log is right after the no longer in reconnect state ... log message.

    if (reconnect_all_deny) {
      dout(1) << "mds_deny_all_reconnect was set to speed up reboot phase, ignoring reconnect, sending close" << dendl;
    } else {
      dout(1) << "no longer in reconnect state, ignoring reconnect, sending close" << dendl;
    }
    mds->clog->info() << "denied reconnect attempt (mds is " 
       << ceph_mds_state_name(mds->get_state())
       << ") from " << m->get_source_inst()
       << " after " << delay << " (allowed interval " << g_conf()->mds_reconnect_timeout << ")";
    deny = true;

... and further down below in the function, there is:

    if (!error_str.empty()) {
      deny = true;
      dout(1) << " " << error_str << ", ignoring reconnect, sending close" << dendl;
      mds->clog->info() << "denied reconnect attempt from " 
                        << m->get_source_inst() << " (" << error_str << ")";
    }

The tracker description mentions denied reconnect attempt (mds is ... for the problematic case, but that should always show up in the cluster log. I'm missing something.... :/

Actions #6

Updated by Xiubo Li 4 days ago

Venky Shankar wrote in #note-5:

Mykola Golub wrote in #note-4:

Xiubo Li wrote in #note-3:

Is it possible to enable the mds debug logs? Let's see whether there are other logs about this.

Hi Xiubo, thank you for the reply. It is our customer's production cluster and the issue is very sporadic, I would say once per month or so, so keeping the debug log enabled all that time while waiting for the issue to reoccur would be problematic. It is also problematic to get access to the client hosts to collect debug info there (unless you can provide exact instructions on how to get the debug information we need; I don't have much experience with cephfs kernel client troubleshooting). Anyway, if I get any new data I will update the ticket.

The log I attached is not very verbose, especially for the ms subsystem, so my assumption about the "open session" is based on looking at the code [1]: both the "no longer in reconnect state, ignoring reconnect, sending close" and "denied reconnect attempt from " messages are printed by Server::handle_client_reconnect, i.e. it should be when handling a "reconnect" message, and the first message is printed when the session is not in the open state on the mds side, while the second is printed when the session is in the open state.

[1] https://github.com/ceph/ceph/blob/c61472c83ce76869b0fe076c19ed2f4bcc3fb4af/src/mds/Server.cc#L1471

I don't get it. The denied reconnect attempt cluster log is right after the no longer in reconnect state ... log message.

[...]

... and further down below in the function, there is:

[...]

The tracker description mentions denied reconnect attempt (mds is ... for the problematic case, but that should always show up in the cluster log. I'm missing something.... :/

All these logs look normal, and that's because in this case it would trigger the socket reset on the client side; the client then tried to reconnect to the sessions, but since the MDS was no longer in the reconnect state it denied the reconnect.

I didn't see any attempt to open the session before or after this.

Actions #7

Updated by Venky Shankar 4 days ago

  • Category set to Correctness/Safety
  • Assignee set to Xiubo Li
  • Target version set to v20.0.0
  • Backport set to reef,squid
Actions #8

Updated by Venky Shankar 4 days ago

  • Status changed from New to Triaged
Actions #9

Updated by Mykola Golub 3 days ago

Xiubo Li wrote in #note-6:

Venky Shankar wrote in #note-5:

The tracker description mentions denied reconnect attempt (mds is ... for the problematic case, but that should always show up in the cluster log. I'm missing something.... :/

All these logs look normal, and that's because in this case it would trigger the socket reset on the client side; the client then tried to reconnect to the sessions, but since the MDS was no longer in the reconnect state it denied the reconnect.

I didn't see any attempt to open the session before or after this.

Hi Xiubo and Venky,

In the attached logs you may find many cases where clients were auto-evicted. In most of the cases the clients successfully reconnected after this. In some cases they didn't (the client host needed a reboot, or one had to tell the mds `session kill` for the problematic session to allow the pods to successfully bind to the problematic volume). We were not able to check all the cases, so we cannot be 100% sure, but it seemed that for the problematic cases we always observed "denied reconnect attempt (mds is up:active) from client.*" messages. In other ("normal") cases we observed only "no longer in reconnect state, ignoring reconnect, sending close" messages.

If you think the behavior is expected, how do you explain the difference? Why do we observe "denied reconnect attempt" messages in some cases but not in others, and why do we have trouble with the client in the first case?

Note, normally (e.g. if I manually evict a client) I don't see the "denied reconnect attempt (mds is up:active) from client.*" message; I failed to reproduce it, and it looks to me like a race.

Actions #10

Updated by Venky Shankar 1 day ago

Mykola, thank you for the update. It's very likely these config settings have exposed a bug in the MDS. I did a quick search in our qa suite and there isn't any test where mds_session_blocklist_on_evict is disabled. I should also point you to [0], which was specifically introduced to recover a blocklisted session, and the respective tests exist in our qa suite.

I should mention that this is to be used in environments where availability matters more than correctness (which comes close to the config settings mentioned in the tracker description).

[0]: https://ceph.io/en/news/blog/2020/automatic-cephfs-recovery-after-blacklisting/
