Bug #13334
closed
delayed revoke warning in test_client_recovery test
Description
http://pulpito.ceph.com/teuthology-2015-09-29_23:04:01-fs-infernalis---basic-multi/1077881/
2015-10-01T03:13:41.515 INFO:teuthology.run:Summary data: {description: 'fs/recovery/{clusters/2-remote-clients.yaml debug/mds_client.yaml dirfrag/frag_enable.yaml mounts/ceph-fuse.yaml tasks/client-recovery.yaml}', duration: 1106.058995962143, failure_reason: '"2015-10-01 03:06:53.013570 mds.0 10.214.134.104:6806/20139 5 : cluster [WRN] client.4537 isn''t responding to mclientcaps(revoke), ino 10000000000 pending pAsxLsXsxFcb issued pAsxLsXsxFsxcrwb, sent 60.308408 seconds ago" in cluster log', flavor: basic, owner: scheduled_teuthology@teuthology, success: false}
I think maybe we just need to whitelist this warning, since we do a lot of skewing around. But perhaps something has gone horribly wrong.
Updated by John Spray over 8 years ago
Yeah, this is racy because mds_session_timeout is 60s, and so is the threshold for emitting that warning.
Actually, we should probably change the timeouts in Ceph. With the two so close together, if a client dies and another client then tries to access a file the dead client held, users will see the same nondeterministic behaviour: sometimes they get the "isn't responding to" message before the client is evicted, and sometimes they don't.
But yeah, the test should whitelist this message anyway.
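As a rough illustration of what whitelisting would need to match, here is a minimal sketch. It assumes whitelist entries are treated as regular expressions searched against each cluster log line. The pattern and the helper `is_whitelisted` are hypothetical, not the entry that was actually added. The log line is the one from the failure_reason in the description, with the YAML quote-doubling undone.

```python
import re

# Cluster log line from the failure_reason in the description
# (the doubled '' from YAML quoting is written as a single ').
LOG_LINE = (
    "2015-10-01 03:06:53.013570 mds.0 10.214.134.104:6806/20139 5 : "
    "cluster [WRN] client.4537 isn't responding to mclientcaps(revoke), "
    "ino 10000000000 pending pAsxLsXsxFcb issued pAsxLsXsxFsxcrwb, "
    "sent 60.308408 seconds ago"
)

# Hypothetical whitelist entry. The parentheses are escaped because the
# entry is assumed to be used as a regular expression.
WHITELIST = [r"responding to mclientcaps\(revoke\)"]


def is_whitelisted(line, patterns):
    """Return True if any whitelist pattern is found anywhere in the line."""
    return any(re.search(pattern, line) for pattern in patterns)
```

Note that an unescaped `mclientcaps(revoke)` would be parsed as a regex group and would not match the literal parentheses in the log line.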
Updated by John Spray over 8 years ago
Oh, it's even simpler. Can just switch the order of locker->tick and server->find_idle_sessions to get rid of this behaviour when the timeouts are the same.
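To make the ordering argument concrete, here is a toy model of a single MDS tick. It is a sketch under stated assumptions, not Ceph code: the names `locker_tick` and `find_idle_sessions` only mirror the calls mentioned above, their bodies are hypothetical, and both thresholds are assumed to be 60s as described in the earlier comment.

```python
SESSION_TIMEOUT = 60.0    # assumed value of mds_session_timeout
REVOKE_WARN_AFTER = 60.0  # assumed threshold for the revoke warning


def tick(now, sessions, pending_revokes, evict_first):
    """Run one simulated tick and return the clients that were warned about.

    sessions maps client -> time it was last heard from.
    pending_revokes maps client -> time the revoke was sent.
    """
    warnings = []

    def find_idle_sessions():
        # Evict clients that have been silent for the whole session timeout,
        # dropping any revoke still pending for them.
        for client, last_seen in list(sessions.items()):
            if now - last_seen >= SESSION_TIMEOUT:
                del sessions[client]
                pending_revokes.pop(client, None)

    def locker_tick():
        # Warn about revokes that have been outstanding too long.
        for client, sent_at in pending_revokes.items():
            if now - sent_at >= REVOKE_WARN_AFTER:
                warnings.append(client)

    steps = [find_idle_sessions, locker_tick] if evict_first else [locker_tick, find_idle_sessions]
    for step in steps:
        step()
    return warnings


def run(evict_first):
    # A dead client: last heard from at t=0, revoke sent at t=0, tick at t=60.3.
    sessions = {"client.4537": 0.0}
    pending_revokes = {"client.4537": 0.0}
    return tick(60.3, sessions, pending_revokes, evict_first)
```

In this model, `run(evict_first=False)` reports the warning for `client.4537` before the eviction, while `run(evict_first=True)` evicts the client first and reports nothing.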
Updated by Greg Farnum over 8 years ago
- Status changed from New to Resolved