Feature #51416

open

kclient: add debugging for mds failover events

Added by Patrick Donnelly almost 3 years ago. Updated almost 2 years ago.

Status:
Fix Under Review
Priority:
Normal
Category:
-
Target version:
-
% Done:

0%

Source:
Development
Tags:
Backport:
pacific
Reviewed:
Affected Versions:
Component(FS):
qa-suite
Labels (FS):
qa
Pull request ID:

Related issues 1 (1 open, 0 closed)

Related to CephFS - Bug #51410: kclient: fails to finish reconnect during MDS thrashing (testing branch) - New

Actions #1

Updated by Patrick Donnelly almost 3 years ago

  • Related to Bug #51410: kclient: fails to finish reconnect during MDS thrashing (testing branch) added
Actions #2

Updated by Jeff Layton almost 3 years ago

On IRC Patrick said:

19:00 < batrick> MDSMap epoch received
19:00 < batrick> mds failover in process
19:00 < batrick> would be a good start
Actions #3

Updated by Jeff Layton almost 3 years ago

We already have this dout() message when we get a new map:

        dout("check_new_map new %u old %u\n",
             newmap->m_epoch, oldmap->m_epoch);

By mds failover, do you mean the point where the client gives up on an MDS and migrates to another? Or do you mean some sort of export/import activity?

Actions #4

Updated by Patrick Donnelly almost 3 years ago

Jeff Layton wrote:

We already have this dout() message when we get a new map:

[...]

By mds failover, do you mean the point where the client gives up on an MDS and migrates to another? Or do you mean some sort of export/import activity?

I mean when an MDS is recovering (e.g. up:reconnect, up:replay, etc.). The client could note that, and perhaps also note how many requests it will need to replay, the caps it holds with that rank, etc.

Actions #5

Updated by Jeff Layton over 2 years ago

I can see where to add such a message, but I'm not that familiar with all of the different MDS states. Which ones, specifically, should we notify the user of? You mentioned:

up:replay
up:reconnect

...but there are a bunch of others. What about up and out states, or up:resolve, up:rejoin, etc.?

Actions #6

Updated by Patrick Donnelly over 2 years ago

Jeff Layton wrote:

I can see where to add such a message, but I'm not that familiar with all of the different MDS states. Which ones, specifically, should we notify the user of? You mentioned:
[...]
...but there are a bunch of others. What about up and out states, or up:resolve, up:rejoin, etc.?

All state transitions for a rank should be reported. (IMO)

Actions #7

Updated by Jeff Layton over 2 years ago

I'm not convinced it's beneficial to spam the kernel's ring buffer with these messages. Ceph is already a bit too chatty with this sort of thing as it is...

Since you said you want this for debugging purposes, do you need anything beyond the dout() message that already reports this?

                dout("check_new_map mds%d state %s%s -> %s%s (session %s)\n",
                     i, ceph_mds_state_name(oldstate),
                     ceph_mdsmap_is_laggy(oldmap, i) ? " (laggy)" : "",
                     ceph_mds_state_name(newstate),
                     ceph_mdsmap_is_laggy(newmap, i) ? " (laggy)" : "",
                     ceph_session_state_name(s->s_state));

If you want this mainly for teuthology testing, the best approach may be a new script that runs on the kclient before a test and selectively enables certain debug messages like this one.

Would that be acceptable?

Actions #8

Updated by Patrick Donnelly over 2 years ago

Jeff Layton wrote:

I'm not convinced it's beneficial to spam the kernel's ring buffer with these messages. Ceph is already a bit too chatty with this sort of thing as it is...

Since you said you want this for debugging purposes, do you need anything beyond the dout() message that already reports this?

[...]

If you want this mainly for teuthology testing, the best approach may be a new script that runs on the kclient before a test and selectively enables certain debug messages like this one.

How do we do that?

Would that be acceptable?

Probably.

Actions #9

Updated by Jeff Layton over 2 years ago

Basically, we'd just need to do something like this in a script before the test runs:

#!/bin/sh
modprobe ceph
echo 'module ceph func check_new_map +p' > /sys/kernel/debug/dynamic_debug/control

That would turn on all of the dout() messages in the check_new_map() function. We can get more specific than that using line numbers too, but that gets to be a bit more fragile in the face of kernel changes.
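A minimal sketch of how the enable step could be paired with a matching disable step, so a test can switch the messages off again afterwards. The function names and the DDEBUG_CONTROL override are hypothetical, chosen for illustration; only the control file path and the `module ceph func check_new_map` query come from the script above. Writing to the real control file needs root and a mounted debugfs.

```shell
#!/bin/sh
# Hypothetical helper, not an existing teuthology script.
# DDEBUG_CONTROL can be overridden to dry-run against a scratch file.
: "${DDEBUG_CONTROL:=/sys/kernel/debug/dynamic_debug/control}"

# Write one dynamic-debug command: $1 is the match query, $2 the flag change.
ddebug_set() {
    printf '%s %s\n' "$1" "$2" > "$DDEBUG_CONTROL"
}

# Turn on every dout() in check_new_map() (same query as the script above).
enable_check_new_map_debug() {
    ddebug_set 'module ceph func check_new_map' '+p'
}

# Turn the same callsites back off once the test is done.
disable_check_new_map_debug() {
    ddebug_set 'module ceph func check_new_map' '-p'
}
```

A test wrapper would run `modprobe ceph` first, call `enable_check_new_map_debug` before the workload, and call `disable_check_new_map_debug` on teardown.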

We could extend this idea to something more general too:

https://www.kernel.org/doc/html/latest/admin-guide/dynamic-debug-howto.html

Ilya, do you have anything like this already for rbd testing?

Actions #10

Updated by Jeff Layton over 2 years ago

Oh, actually, we can use the format directive, so something like this would probably also work:

#!/bin/sh
modprobe ceph
echo 'module ceph format "check_new_map mds" +p' > /sys/kernel/debug/dynamic_debug/control
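To check that a query like the one above actually matched something, the control file can be read back. The following is a rough sketch, assuming the usual control-file line layout of `file:line [module]function flags "format"`, where enabled callsites show a `p` in the flags field. The function name and the DDEBUG_CONTROL override are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list ceph callsites that currently have printing enabled.
# DDEBUG_CONTROL can point at a saved copy of the control file for offline checks.
: "${DDEBUG_CONTROL:=/sys/kernel/debug/dynamic_debug/control}"

list_enabled_ceph_callsites() {
    # Field 2 is "[module]function"; field 3 is the flags, e.g. "=p" or "=_".
    awk '$2 ~ /^\[ceph\]/ && $3 ~ /^=.*p/ { print $1, $2 }' "$DDEBUG_CONTROL"
}
```

If the function prints nothing after the format query has been written, the match string probably no longer corresponds to any dout() format in the running kernel.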
Actions #11

Updated by Ilya Dryomov over 2 years ago

No, we don't enable douts anywhere in the krbd suite.

Actions #12

Updated by Patrick Donnelly over 2 years ago

  • Status changed from New to Fix Under Review
  • Assignee changed from Jeff Layton to Patrick Donnelly
  • Target version set to v17.0.0
  • Source set to Development
  • Backport set to pacific
  • Component(FS) qa-suite added
  • Labels (FS) qa added
Actions #13

Updated by Patrick Donnelly over 2 years ago

  • Pull request ID set to 42512
Actions #14

Updated by Patrick Donnelly almost 2 years ago

  • Target version deleted (v17.0.0)