Bug #21604
mds: may recall all client caps (?) because dirty/pinned metadata is not flushed
Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: fs
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Testing tasks.cephfs.test_client_limits.TestClientLimits.test_client_pin with this patch:
$ git diff
diff --git a/qa/tasks/cephfs/test_client_limits.py b/qa/tasks/cephfs/test_client_limits.py
index 14ac2c4..1789b7f 100644
--- a/qa/tasks/cephfs/test_client_limits.py
+++ b/qa/tasks/cephfs/test_client_limits.py
@@ -90,8 +90,8 @@ class TestClientLimits(CephFSTestCase):
                 raise
             elif num_caps == mds_min_caps_per_client:
                 return True
-            elif num_caps == int((1.0-mds_max_ratio_caps_per_client)*(open_files+2)):
-                return True
+            #elif num_caps == int((1.0-mds_max_ratio_caps_per_client)*(open_files+2)):
+            #    return True
             else:
                 return False
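The branch the patch comments out accepts a second steady state, where the client holds `int((1.0-mds_max_ratio_caps_per_client)*(open_files+2))` caps. A minimal sketch of that arithmetic; the 0.8 ratio and 800 open files are assumptions, taken from the config default and the `n = 800` in the test run below:

```python
# Assumption: mds_max_ratio_caps_per_client defaults to 0.8 in this vintage.
mds_max_ratio_caps_per_client = 0.8
# Assumption: the test opens 800 files (n = 800 in the held-open process below).
open_files = 800

# The +2 accounts for the couple of extra caps the client holds beyond
# the opened files themselves (e.g. the mount root and the subdir).
ratio_target = int((1.0 - mds_max_ratio_caps_per_client) * (open_files + 2))
print(ratio_target)  # 160 -- matches "160 / 813 inodes have caps" in the MDS log
```

With the branch commented out, the test no longer stops at this 160-cap plateau and instead waits for the client to reach `mds_min_caps_per_client` (100 in the log below).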
$ ../src/vstart.sh -d -b -l -n
$ python2 ../qa/tasks/vstart_runner.py --interactive tasks.cephfs.test_client_limits.TestClientLimits.test_client_pin
[...]
2017-09-29 10:27:13,124.124 INFO:tasks.cephfs.test_client_limits:Terminating process holding files open
2017-09-29 10:27:13,124.124 INFO:__main__:kill
2017-09-29 10:27:13,125.125 INFO:__main__:kill: killing pid 8534 (['python', '-c', '\nimport sys\nimport time\nimport os\n\nn = 800\nabs_path = "/tmp/tmpXH62es/mnt.0/subdir/mount_a"\n\nif not os.path.exists(os.path.dirname(abs_path)):\n os.makedirs(os.path.dirname(abs_path))\n\nhandles = []\nfor i in range(0, n):\n fname = "{0}_{1}".format(abs_path, i)\n handles.append(open(fname, \'w\'))\n\nwhile True:\n time.sleep(1)\n'])
2017-09-29 10:27:13,143.143 INFO:__main__:run args=['./bin/ceph', 'fs', 'dump', '--format=json']
2017-09-29 10:27:13,143.143 INFO:__main__:Running ['./bin/ceph', 'fs', 'dump', '--format=json']
2017-09-29 10:27:13,483.483 INFO:__main__:run args=['./bin/ceph', 'daemon', 'mds.c', 'session', 'ls']
2017-09-29 10:27:13,484.484 INFO:__main__:Running ['./bin/ceph', 'daemon', 'mds.c', 'session', 'ls']
2017-09-29 10:27:13,662.662 INFO:tasks.cephfs.filesystem:_json_asok output: [ { "id": 4227, "num_leases": 0, "num_caps": 1, "state": "open", "replay_requests": 0, "completed_requests": 1, "reconnecting": false, "inst": "client.4227 127.0.0.1:0/1930023096", "client_metadata": { "ceph_sha1": "1aef50a1edbd2a41e7694ffb61bfd546e5648bf9", "ceph_version": "ceph version 12.1.2-2473-g1aef50a (1aef50a1edbd2a41e7694ffb61bfd546e5648bf9) mimic (dev)", "entity_id": "1", "hostname": "senta02", "mount_point": "/tmp/tmpXH62es/mnt.1", "pid": "8112", "root": "/" } }, { "id": 4229, "num_leases": 0, "num_caps": 802, "state": "open", "replay_requests": 0, "completed_requests": 800, "reconnecting": false, "inst": "client.4229 127.0.0.1:0/3107993577", "client_metadata": { "ceph_sha1": "1aef50a1edbd2a41e7694ffb61bfd546e5648bf9", "ceph_version": "ceph version 12.1.2-2473-g1aef50a (1aef50a1edbd2a41e7694ffb61bfd546e5648bf9) mimic (dev)", "entity_id": "0", "hostname": "senta02", "mount_point": "/tmp/tmpXH62es/mnt.0", "pid": "8075", "root": "/" } } ]
2017-09-29 10:27:18,667.667 INFO:__main__:run args=['./bin/ceph', 'fs', 'dump', '--format=json']
2017-09-29 10:27:18,742.742 INFO:__main__:Running ['./bin/ceph', 'fs', 'dump', '--format=json']
2017-09-29 10:27:20,075.075 INFO:__main__:run args=['./bin/ceph', 'daemon', 'mds.c', 'session', 'ls']
2017-09-29 10:27:20,075.075 INFO:__main__:Running ['./bin/ceph', 'daemon', 'mds.c', 'session', 'ls']
2017-09-29 10:27:20,245.245 INFO:tasks.cephfs.filesystem:_json_asok output: [ { "id": 4227, "num_leases": 0, "num_caps": 1, "state": "open", "replay_requests": 0, "completed_requests": 1, "reconnecting": false, "inst": "client.4227 127.0.0.1:0/1930023096", "client_metadata": { "ceph_sha1": "1aef50a1edbd2a41e7694ffb61bfd546e5648bf9", "ceph_version": "ceph version 12.1.2-2473-g1aef50a (1aef50a1edbd2a41e7694ffb61bfd546e5648bf9) mimic (dev)", "entity_id": "1", "hostname": "senta02", "mount_point": "/tmp/tmpXH62es/mnt.1", "pid": "8112", "root": "/" } }, { "id": 4229, "num_leases": 0, "num_caps": 100, "state": "open", "replay_requests": 0, "completed_requests": 800, "reconnecting": false, "inst": "client.4229 127.0.0.1:0/3107993577", "client_metadata": { "ceph_sha1": "1aef50a1edbd2a41e7694ffb61bfd546e5648bf9", "ceph_version": "ceph version 12.1.2-2473-g1aef50a (1aef50a1edbd2a41e7694ffb61bfd546e5648bf9) mimic (dev)", "entity_id": "0", "hostname": "senta02", "mount_point": "/tmp/tmpXH62es/mnt.0", "pid": "8075", "root": "/" } } ]
2017-09-29 10:27:20,245.245 INFO:__main__:run args=['./bin/ceph', 'log', 'Ended test tasks.cephfs.test_client_limits.TestClientLimits.test_client_pin']
2017-09-29 10:27:20,246.246 INFO:__main__:Running ['./bin/ceph', 'log', 'Ended test tasks.cephfs.test_client_limits.TestClientLimits.test_client_pin']
The MDS is recalling the client caps down to the minimum because it cannot (or does not) flush its dirty metadata, so the pinned inodes are never eligible for trimming:
2017-09-29 10:27:18.441826 7f19cf4e5700 7 mds.0.cache trim bytes_used=1MB limit=1GB reservation=0.05% count=0
2017-09-29 10:27:18.441841 7f19cf4e5700 7 mds.0.cache trim_lru trimming 0 items from LRU size=811 mid=0 pintail=811 pinned=811
2017-09-29 10:27:18.441851 7f19cf4e5700 7 mds.0.cache trim_lru trimmed 0 items
2017-09-29 10:27:18.441855 7f19cf4e5700 10 mds.0.cache trim_client_leases
2017-09-29 10:27:18.442331 7f19cf4e5700 2 mds.0.cache check_memory_usage total 430556, rss 48376, heap 313848, baseline 313848, buffers 0, 160 / 813 inodes have caps, 161 caps, 0.198032 caps per inode
2017-09-29 10:27:18.442367 7f19cf4e5700 10 mds.0.server recall_client_state 0.2, caps per client 100-128
2017-09-29 10:27:18.442373 7f19cf4e5700 10 mds.0.server session client.4229 127.0.0.1:0/3107993577 caps 160, leases 0
2017-09-29 10:27:18.442381 7f19cf4e5700 10 mds.0.16 send_message_client client.4229 127.0.0.1:0/3107993577 client_session(recall_state max_caps 100 max_leases 0) v2
2017-09-29 10:27:18.442387 7f19cf4e5700 1 -- 127.0.0.1:6848/563895976 --> 127.0.0.1:0/3107993577 -- client_session(recall_state max_caps 100 max_leases 0) v2 -- 0x557ffeed2d80 con 0
Once the client reaches the minimum number of caps, the test ends.
Issuing "flush journal" on the admin socket causes the MDS to finally flush the dirty metadata and unpin the inodes, allowing the cache to shrink.
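For reference, this is the manual workaround against the vstart cluster; a sketch assuming the active MDS is `mds.c` as in the log above:

```shell
# Flush the MDS journal so dirty metadata is written back and the inodes unpinned
./bin/ceph daemon mds.c flush journal

# Re-check per-session cap counts to confirm the cache can now shrink
./bin/ceph daemon mds.c session ls
```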
Updated by Greg Farnum over 6 years ago
This sounds like something controlled by config options. The MDS wants to batch up data when flushing to avoid gratuitous numbers of IOs. I don't know what the default sizes are, but if you're running a test with only one client and a very small cache those numbers may be much smaller than the default minimum size.
Updated by Patrick Donnelly over 6 years ago
- Assignee set to Patrick Donnelly
- Priority changed from High to Immediate
Updated by Patrick Donnelly over 6 years ago
- Priority changed from Immediate to Normal
Updated by Patrick Donnelly about 5 years ago
- Assignee deleted (Patrick Donnelly)