Bug #9800
client-limits test is not passing
Status: Closed
% Done: 0%
Source: Q/A
Severity: 3 - minor
Description
/a/teuthology-2014-10-13_23:04:01-fs-giant-distro-basic-multi/547170
The client isn't dropping its caps:
ubuntu@plana03:~$ sudo ceph -s
    cluster 6de65115-1809-449f-bb3e-29f304d73025
     health HEALTH_WARN mds0: Client plana03:1 failing to respond to capability release; mds0: Client plana03:0 failing to respond to cache pressure
I'm a little confused: the MDS shows that plana03:0 (client.4177) holds 181 caps,
ubuntu@plana39:~$ sudo ceph daemon /var/run/ceph/ceph-mds.a.asok session ls
[
    {
        "id": 4176,
        "num_leases": 0,
        "num_caps": 0,
        "state": "closed",
        "replay_requests": 0,
        "reconnecting": false,
        "inst": "client.4176 10.214.131.37:0\/7083",
        "client_metadata": {
            "entity_id": "1",
            "hostname": "plana03",
            "mount_point": "\/home\/ubuntu\/cephtest\/mnt.1"
        }
    },
    {
        "id": 4177,
        "num_leases": 0,
        "num_caps": 181,
        "state": "open",
        "replay_requests": 0,
        "reconnecting": false,
        "inst": "client.4177 10.214.131.37:0\/7123",
        "client_metadata": {
            "entity_id": "0",
            "hostname": "plana03",
            "mount_point": "\/home\/ubuntu\/cephtest\/mnt.0"
        }
    },
    {
        "id": 4178,
        "num_leases": 0,
        "num_caps": 2,
        "state": "open",
        "replay_requests": 0,
        "reconnecting": false,
        "inst": "client.4178 10.214.131.37:0\/7166",
        "client_metadata": {
            "entity_id": "1",
            "hostname": "plana03",
            "mount_point": "\/home\/ubuntu\/cephtest\/mnt.1"
        }
    }
]
...the client itself claims to have 503 caps!
ubuntu@plana03:~$ sudo ceph daemon /var/run/ceph/ceph-client.0.7123.asok mds_sessions
{
    "id": 4177,
    "sessions": [
        {
            "mds": 0,
            "addr": "10.214.132.39:6805\/8408",
            "seq": 1,
            "cap_gen": 0,
            "cap_ttl": "2014-10-16 13:42:56.919437",
            "last_cap_renew_request": "2014-10-16 13:41:56.919437",
            "cap_renew_seq": 6448,
            "num_caps": 503,
            "state": "open"
        }
    ],
    "mdsmap_epoch": 38
}
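To make the MDS/client disagreement easy to spot on future runs, the two admin-socket outputs can be compared mechanically. This is a minimal sketch of hypothetical tooling, not part of Ceph or teuthology: it assumes the MDS `session ls` output and each client's `mds_sessions` output have already been captured as JSON text, and it relies only on the field names visible above (`id`, `state`, `num_caps`, `sessions`, `mds`). `cap_mismatches` and its `mds_rank` parameter are illustrative names.

```python
import json


def cap_mismatches(mds_session_ls, client_mds_sessions, mds_rank=0):
    """Report open sessions whose cap count differs between the MDS and the client.

    mds_session_ls: JSON text from the MDS admin socket's `session ls`.
    client_mds_sessions: list of JSON texts from client sockets' `mds_sessions`.
    mds_rank: rank of the MDS that produced `mds_session_ls` (mds0 here).
    """
    # MDS view keyed by client id; closed sessions hold no caps, so skip them.
    mds_view = {
        s["id"]: s["num_caps"]
        for s in json.loads(mds_session_ls)
        if s["state"] == "open"
    }
    mismatches = []
    for text in client_mds_sessions:
        client = json.loads(text)
        # Only the client's session with the same MDS rank is comparable.
        for sess in client["sessions"]:
            if sess["mds"] != mds_rank or client["id"] not in mds_view:
                continue
            if sess["num_caps"] != mds_view[client["id"]]:
                mismatches.append({
                    "client": client["id"],
                    "mds_num_caps": mds_view[client["id"]],
                    "client_num_caps": sess["num_caps"],
                })
    return mismatches


if __name__ == "__main__":
    # Trimmed copies of the outputs captured in this report.
    mds = ('[{"id": 4176, "num_caps": 0, "state": "closed"},'
           ' {"id": 4177, "num_caps": 181, "state": "open"},'
           ' {"id": 4178, "num_caps": 2, "state": "open"}]')
    client0 = '{"id": 4177, "sessions": [{"mds": 0, "num_caps": 503, "state": "open"}]}'
    print(cap_mismatches(mds, [client0]))
    # [{'client': 4177, 'mds_num_caps': 181, 'client_num_caps': 503}]
```

Only the client session for the queried rank is compared; with several active MDS ranks, each rank's `session ls` would need to be checked separately.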
The MDS log didn't contain much of use, but it does show a stuck request from client.1 (client.4178), in addition to client.0 not dropping caps:
2014-10-15 01:54:12.079232 7fb64891a700 0 log_channel(default) log [WRN] : client.4178 isn't responding to mclientcaps(revoke), ino 100000001f5 pending pAsxLsFsxcrwb issued pAsxLsXsxFsxcrwb, sent 63.165651 seconds ago
2014-10-15 01:54:12.079268 7fb64891a700 0 log_channel(default) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 63.165923 secs
2014-10-15 01:54:12.079272 7fb64891a700 0 log_channel(default) log [WRN] : slow request 63.165923 seconds old, received at 2014-10-15 01:53:08.913324: client_request(client.4178:12 rmxattr #100000001f5 security.ima 2014-10-15 01:53:08.913046) currently failed to xlock, waiting
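The revoke warning is easier to read once the two cap strings are diffed: whatever appears in `issued` but not in `pending` is what the MDS is asking client.4178 to give back. This is a minimal sketch assuming the usual Ceph cap-string layout: an optional leading `p` (pin), then the lock groups `A` (auth), `L` (link), `X` (xattr) and `F` (file), each followed by lowercase bits such as `s` (shared), `x` (exclusive), `c`, `r`, `w`, `b`. `parse_caps` and `revoked_caps` are hypothetical helper names.

```python
import re

# Assumed cap-string layout: optional "p" (pin), then lock groups A (auth),
# L (link), X (xattr), F (file), each followed by lowercase bits
# such as s (shared), x (exclusive), c, r, w, b.
_GROUPS = re.compile(r"(?:[ALXF][a-z]*)*")
_GROUP = re.compile(r"([ALXF])([a-z]*)")


def parse_caps(caps):
    """Expand e.g. 'pAsxLs' into {'p', 'As', 'Ax', 'Ls'}."""
    bits = set()
    rest = caps
    if rest.startswith("p"):
        bits.add("p")
        rest = rest[1:]
    # Reject anything that does not fit the assumed layout instead of guessing.
    if not _GROUPS.fullmatch(rest):
        raise ValueError(f"unrecognised cap string: {caps!r}")
    for group, flags in _GROUP.findall(rest):
        bits.update(group + flag for flag in flags)
    return bits


def revoked_caps(issued, pending):
    """Caps the client still holds but the MDS no longer wants it to have."""
    return sorted(parse_caps(issued) - parse_caps(pending))


if __name__ == "__main__":
    # Values from the client.4178 revoke warning above.
    print(revoked_caps("pAsxLsXsxFsxcrwb", "pAsxLsFsxcrwb"))  # ['Xs', 'Xx']
```

Here the only bits being revoked are `Xs` and `Xx`, the xattr caps on inode `100000001f5`. That appears consistent with the `rmxattr` on the same inode sitting in "failed to xlock, waiting", and the request and the unresponsive cap holder are the same client, client.4178.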
ubuntu@plana39:~$ sudo ceph daemon /var/run/ceph/ceph-mds.a.asok dump_ops_in_flight
{
    "ops": [
        {
            "description": "client_request(client.4178:12 rmxattr #100000001f5 security.ima 2014-10-15 01:53:08.913046)",
            "initiated_at": "2014-10-15 01:53:08.913324",
            "age": "129435.891259",
            "duration": "0.000279",
            "type_data": [
                "failed to xlock, waiting",
                "client.4178:12",
                "client_request",
                {
                    "client": "client.4178",
                    "tid": 12
                },
                [
                    {
                        "time": "2014-10-15 01:53:08.913324",
                        "event": "initiated"
                    },
                    {
                        "time": "2014-10-15 01:53:08.913603",
                        "event": "failed to xlock, waiting"
                    }
                ]
            ]
        }
    ],
    "num_ops": 1
}
ubuntu@plana03:~$ sudo ceph daemon /var/run/ceph/ceph-client.0.7123.asok mds_requests
{}
ubuntu@plana03:~$ sudo ceph daemon /var/run/ceph/ceph-client.1.7166.asok mds_requests
{
    "request": {
        "tid": 12,
        "op": "rmxattr",
        "path": "#100000001f5",
        "path2": "security.ima",
        "ino": "100000001f5",
        "hint_ino": "0",
        "sent_stamp": "2014-10-15 01:53:08.913049",
        "mds": 0,
        "resend_mds": -1,
        "send_to_auth": 0,
        "sent_on_mseq": 0,
        "retry_attempt": 0,
        "got_unsafe": 0,
        "uid": 1000,
        "gid": 1000,
        "oldest_client_tid": 12,
        "mdsmap_epoch": 0,
        "flags": 0,
        "num_retry": 0,
        "num_fwd": 0,
        "num_releases": 0
    }
}
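The op stuck on the MDS and the request client.1 is still waiting on can be tied together by their `client.<id>:<tid>` identifier. A minimal sketch with hypothetical helper names, assuming the `client_request(client.<id>:<tid> ...)` description format shown in `dump_ops_in_flight` above. Because `mds_requests` does not report the client's own id, the caller supplies it (4178 for client.1, per `session ls`). The client output is parsed with `object_pairs_hook` so that several requests, if they were emitted under repeated `request` keys, would not be collapsed by `json.loads`; that multi-request layout is an assumption, since only the single-request form appears here.

```python
import json
import re

# Description format as seen in dump_ops_in_flight: "client_request(client.<id>:<tid> ..."
_REQ = re.compile(r"client_request\(client\.(\d+):(\d+) ")


def stuck_mds_ops(dump_ops_in_flight):
    """Map (client_id, tid) -> age in seconds for client requests in flight on the MDS."""
    ops = {}
    for op in json.loads(dump_ops_in_flight)["ops"]:
        m = _REQ.match(op["description"])
        if m:
            ops[(int(m.group(1)), int(m.group(2)))] = float(op["age"])
    return ops


def client_tids(mds_requests):
    """Collect the tids of a client's outstanding requests from `mds_requests` output."""
    # Keep objects as (key, value) pair lists so repeated "request" keys survive.
    pairs = json.loads(mds_requests, object_pairs_hook=lambda p: p)
    return {dict(req)["tid"] for key, req in pairs if key == "request"}


def matched(mds_ops, client_id, tids):
    """Tids the client is waiting on that the MDS also reports as in flight, with their ages."""
    return {tid: mds_ops[(client_id, tid)] for tid in tids if (client_id, tid) in mds_ops}


if __name__ == "__main__":
    # Trimmed copies of the outputs captured in this report.
    ops_json = ('{"ops": [{"description": "client_request(client.4178:12 rmxattr '
                '#100000001f5 security.ima 2014-10-15 01:53:08.913046)", '
                '"age": "129435.891259"}], "num_ops": 1}')
    client1_json = '{"request": {"tid": 12, "op": "rmxattr", "ino": "100000001f5"}}'
    ops = stuck_mds_ops(ops_json)
    for tid, age in matched(ops, 4178, client_tids(client1_json)).items():
        print(f"client.4178 tid {tid} in flight for {age / 3600:.2f} h")
    # client.4178 tid 12 in flight for 35.95 h
```

Client.0's empty `{}` yields no tids, so only client.1's tid 12 lines up with the MDS op, which had been in flight for roughly 36 hours when it was dumped.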
It was blocking machines, so I killed it. Logging was turned way down, so the client logs contained nothing useful at all. I'm hoping the cause is apparent on inspection; otherwise it'll pop up again, and we should perhaps add debugging output to this test.