Project

General

Profile

Actions

Bug #9800

closed

client-limits test is not passing

Added by Greg Farnum over 9 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/teuthology-2014-10-13_23:04:01-fs-giant-distro-basic-multi/547170

The client isn't dropping its caps:

ubuntu@plana03:~$ sudo ceph -s
    cluster 6de65115-1809-449f-bb3e-29f304d73025
     health HEALTH_WARN mds0: Client plana03:1 failing to respond to capability release; mds0: Client plana03:0 failing to respond to cache pressure

I'm a little confused because while the MDS shows plana03:0 has 181 caps,

ubuntu@plana39:~$ sudo ceph daemon /var/run/ceph/ceph-mds.a.asok session ls
[
    { "id": 4176,
      "num_leases": 0,
      "num_caps": 0,
      "state": "closed",
      "replay_requests": 0,
      "reconnecting": false,
      "inst": "client.4176 10.214.131.37:0\/7083",
      "client_metadata": { "entity_id": "1",
          "hostname": "plana03",
          "mount_point": "\/home\/ubuntu\/cephtest\/mnt.1"}},
    { "id": 4177,
      "num_leases": 0,
      "num_caps": 181,
      "state": "open",
      "replay_requests": 0,
      "reconnecting": false,
      "inst": "client.4177 10.214.131.37:0\/7123",
      "client_metadata": { "entity_id": "0",
          "hostname": "plana03",
          "mount_point": "\/home\/ubuntu\/cephtest\/mnt.0"}},
    { "id": 4178,
      "num_leases": 0,
      "num_caps": 2,
      "state": "open",
      "replay_requests": 0,
      "reconnecting": false,
      "inst": "client.4178 10.214.131.37:0\/7166",
      "client_metadata": { "entity_id": "1",
          "hostname": "plana03",
          "mount_point": "\/home\/ubuntu\/cephtest\/mnt.1"}}]

...the client itself claims to have 503 caps!

ubuntu@plana03:~$ sudo ceph daemon /var/run/ceph/ceph-client.0.7123.asok mds_sessions
{ "id": 4177,
  "sessions": [
        { "mds": 0,
          "addr": "10.214.132.39:6805\/8408",
          "seq": 1,
          "cap_gen": 0,
          "cap_ttl": "2014-10-16 13:42:56.919437",
          "last_cap_renew_request": "2014-10-16 13:41:56.919437",
          "cap_renew_seq": 6448,
          "num_caps": 503,
          "state": "open"}],
  "mdsmap_epoch": 38}

The MDS log didn't have much of use, but it does have a stuck request from client.1 in addition to client.0 not dropping caps:

2014-10-15 01:54:12.079232 7fb64891a700  0 log_channel(default) log [WRN] : client.4178 isn't responding to mclientcaps(revoke), ino 100000001f5 pending pAsxLsFsxcrwb issued pAsxLsXsxFsxcrwb, sent 63.165651 seconds ago
2014-10-15 01:54:12.079268 7fb64891a700  0 log_channel(default) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 63.165923 secs
2014-10-15 01:54:12.079272 7fb64891a700  0 log_channel(default) log [WRN] : slow request 63.165923 seconds old, received at 2014-10-15 01:53:08.913324: client_request(client.4178:12 rmxattr #100000001f5 security.ima 2014-10-15 01:53:08.913046) currently failed to xlock, waiting
2

ubuntu@plana39:~$ sudo ceph daemon /var/run/ceph/ceph-mds.a.asok dump_ops_in_flight
{ "ops": [
        { "description": "client_request(client.4178:12 rmxattr #100000001f5 security.ima 2014-10-15 01:53:08.913046)",
          "initiated_at": "2014-10-15 01:53:08.913324",
          "age": "129435.891259",
          "duration": "0.000279",
          "type_data": [
                "failed to xlock, waiting",
                "client.4178:12",
                "client_request",
                { "client": "client.4178",
                  "tid": 12},
                [   
                    { "time": "2014-10-15 01:53:08.913324",
                      "event": "initiated"},
                    { "time": "2014-10-15 01:53:08.913603",
                      "event": "failed to xlock, waiting"}]]}],
  "num_ops": 1}

ubuntu@plana03:~$ sudo ceph daemon /var/run/ceph/ceph-client.0.7123.asok mds_requests
{}
ubuntu@plana03:~$ sudo ceph daemon /var/run/ceph/ceph-client.1.7166.asok mds_requests
{ "request": { "tid": 12,
      "op": "rmxattr",
      "path": "#100000001f5",
      "path2": "security.ima",
      "ino": "100000001f5",
      "hint_ino": "0",
      "sent_stamp": "2014-10-15 01:53:08.913049",
      "mds": 0,
      "resend_mds": -1,
      "send_to_auth": 0,
      "sent_on_mseq": 0,
      "retry_attempt": 0,
      "got_unsafe": 0,
      "uid": 1000,
      "gid": 1000,
      "oldest_client_tid": 12,
      "mdsmap_epoch": 0,
      "flags": 0,
      "num_retry": 0,
      "num_fwd": 0,
      "num_releases": 0}}

It was blocking machines so I killed it, and the logs are turned way down so there was nothing useful at all in the client ones. I'm hoping this is apparent upon inspection; otherwise it'll pop up again and we should perhaps add debugging output to this test.

Actions

Also available in: Atom PDF