Bug #10151 (closed): mds client cache pressure health warning oscillates on/off

Added by Sage Weil over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Urgent
Category: -
Target version: -
% Done: 0%
Severity: 3 - minor

Description

Seeing this on the lab cluster. Not sure if it is a problem in the MDS health reporting or the mon, but it goes on and off every few seconds. It probably depends on whether you hit the leader mon?
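
A quick way to see the disagreement is to query each monitor directly; the Python sketch below (monitor addresses are placeholders, not the lab cluster's) polls health through each mon in turn using the -m option, so you can see which one is reporting the warning at any moment.

#!/usr/bin/env python3
# Hypothetical diagnostic sketch: poll health through each monitor directly
# to see whether only one of them (the leader) reports the warning.
# The monitor addresses below are placeholders.
import subprocess
import time

MON_ADDRS = ["10.214.131.1:6789", "10.214.131.2:6789", "10.214.131.3:6789"]

while True:
    for mon in MON_ADDRS:
        # "ceph -m <addr>" sends the command to a specific monitor.
        out = subprocess.check_output(
            ["ceph", "-m", mon, "health", "detail"]).decode().strip()
        print(f"{mon} -> {out}")
    print("---")
    time.sleep(5)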

Actions #1

Updated by John Spray over 9 years ago

Yes -- the leader is reporting the health warning but the peons are not.

The warning is "Client 2922132 failing to respond to cache pressure"; the session state is:

[
    { "id": 5443238,
      "num_leases": 0,
      "num_caps": 50766,
      "state": "open",
      "replay_requests": 0,
      "reconnecting": false,
      "inst": "client.5443238 10.214.131.141:0\/27885",
      "client_metadata": {}},
    { "id": 2922132,
      "num_leases": 0,
      "num_caps": 150,
      "state": "open",
      "replay_requests": 0,
      "reconnecting": false,
      "inst": "client.2922132 10.214.131.102:0\/951298617",
      "client_metadata": {}},
    { "id": 1756771,
      "num_leases": 0,
      "num_caps": 94,
      "state": "open",
      "replay_requests": 0,
      "reconnecting": false,
      "inst": "client.1756771 10.214.137.25:0\/1841820156",
      "client_metadata": {}},
    { "id": 4894101,
      "num_leases": 5476,
      "num_caps": 104401,
      "state": "open",
      "replay_requests": 0,
      "reconnecting": false,
      "inst": "client.4894101 10.214.137.23:0\/2571774570",
      "client_metadata": {}},
    { "id": 1756816,
      "num_leases": 0,
      "num_caps": 1,
      "state": "open",
      "replay_requests": 0,
      "reconnecting": false,
      "inst": "client.1756816 10.214.137.27:0\/2508210603",
      "client_metadata": {}}]

So aside from the inconsistency between mons, the warning looks bogus, as the named session only has 150 caps.
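
For reference, a dump like the one above comes from the MDS admin socket ("ceph daemon mds.<name> session ls"). Here is a minimal Python sketch, assuming an MDS daemon named "a" and an arbitrary threshold, that sorts the sessions by cap count so the heaviest cap holders stand out:

#!/usr/bin/env python3
# Sketch: list client sessions via the MDS admin socket and sort them by
# num_caps, to cross-check which client is actually holding the most caps.
# The MDS daemon name ("a") and the threshold are assumptions.
import json
import subprocess

MDS_NAME = "a"
THRESHOLD = 10000

out = subprocess.check_output(
    ["ceph", "daemon", f"mds.{MDS_NAME}", "session", "ls"])
sessions = json.loads(out)

for s in sorted(sessions, key=lambda s: s["num_caps"], reverse=True):
    flag = "  <-- holding many caps" if s["num_caps"] > THRESHOLD else ""
    print(f"client.{s['id']}: num_caps={s['num_caps']} "
          f"num_leases={s['num_leases']} state={s['state']}{flag}")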

Actions #2

Updated by John Spray over 9 years ago

  • Status changed from New to In Progress

Reproduced this locally just by running a vstart cluster with 3 mons and following the procedure from the mds_client_limits/_test_client_pin test.
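
For anyone without ceph-qa-suite handy, the same idea can be approximated by hand: hold enough files open on a CephFS mount that the client pins more caps than the MDS cache limit allows, then compare health as reported by each mon. A rough Python sketch follows; the mount point, file count, and a suitably small mds_cache_size are assumptions, not the exact test procedure.

#!/usr/bin/env python3
# Rough reproduction sketch, not the actual test_client_pin code.
# Assumptions: a CephFS mount at MOUNT, and an mds_cache_size small enough
# that NUM_FILES open files exceed it (configured separately). Keeping the
# files open pins their caps, so the client cannot release them when the
# MDS applies cache pressure, which should raise the health warning.
import os

MOUNT = "/mnt/cephfs"    # assumed CephFS mount point
NUM_FILES = 20000        # assumed to exceed the configured MDS cache size

handles = []
for i in range(NUM_FILES):
    f = open(os.path.join(MOUNT, f"pin_{i}"), "w")
    f.write("x")
    handles.append(f)    # keep the handle open so the caps stay pinned

input("Caps pinned; compare 'ceph health detail' on each mon, "
      "then press Enter to release...")

for f in handles:
    f.close()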

Actions #3

Updated by John Spray over 9 years ago

  • Status changed from In Progress to Fix Under Review
Actions #4

Updated by John Spray over 9 years ago

Opened the PR against master instead of next by mistake. The PR against next is https://github.com/ceph/ceph/pull/2996

Actions #5

Updated by Greg Farnum over 9 years ago

  • Status changed from Fix Under Review to Pending Backport

Merged to master as of commit:aa4d1478647ce416e9cf4e8fcd32411230639f40. I like to let things go through testing before backporting, so I'll let you do that, John.

Actions #6

Updated by John Spray over 9 years ago

  • Status changed from Pending Backport to Resolved

The version on next has a pass on client-limits (the one that exercises health): http://pulpito.front.sepia.ceph.com/sage-2014-12-01_11:11:17-fs-next-distro-basic-multi/628932/

Merged backport to giant:

commit c8b46d68c71f66d4abbda1230741cc4c7284193b
Author: John Spray <john.spray@redhat.com>
Date:   Mon Nov 24 11:00:25 2014 +0000

    mon: fix MDS health status from peons

    The health data was there, but we were attempting
    to enumerate MDS GIDs from pending_mdsmap (empty on
    peons) instead of mdsmap (populated from paxos updates)

    Fixes: #10151
    Backport: giant

    Signed-off-by: John Spray <john.spray@redhat.com>
    (cherry picked from commit 0c33930e3a90f3873b7c7b18ff70dec2894fce29)

    Conflicts:
        src/mon/MDSMonitor.cc
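
To make the failure mode described in the commit message concrete, here is a toy Python model (not Ceph code; the class, field names, and GID are invented for illustration) of why enumerating GIDs from the pending map yields nothing on a peon, while the committed map gives the same answer everywhere:

# Toy illustration only -- not Ceph source. The committed mdsmap is
# replicated to every mon through paxos, while the pending map is only
# built up on the leader as it prepares the next epoch.
class ToyMon:
    def __init__(self, is_leader, committed_gids, health_by_gid):
        self.mdsmap_gids = committed_gids                        # populated on all mons
        self.pending_gids = committed_gids if is_leader else []  # empty on peons
        self.health_by_gid = health_by_gid                       # health data present on all mons

    def health_before_fix(self):
        # Bug: enumerate GIDs from the pending map -> peons find none.
        return [self.health_by_gid[g] for g in self.pending_gids]

    def health_after_fix(self):
        # Fix: enumerate GIDs from the committed map -> leader and peons agree.
        return [self.health_by_gid[g] for g in self.mdsmap_gids]


health = {4107: "failing to respond to cache pressure"}   # GID is made up
leader = ToyMon(True, [4107], health)
peon = ToyMon(False, [4107], health)

print(leader.health_before_fix())   # ['failing to respond to cache pressure']
print(peon.health_before_fix())     # [] -> warning flaps depending on which mon you ask
print(peon.health_after_fix())      # ['failing to respond to cache pressure']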
