Explanation for cache pressure
Following up on the thread in the ceph-users mailing list. The SUSE team provided some valuable information about the recurring MDS issue "X clients failing to respond to cache pressure". It would be really helpful to the community to add some of that information to the docs.
Turning on debug logs for a short period of time revealed these messages:
2022-06-15 10:07:34.254 7fdbbd44a700 2 mds.beacon.stmailmds01b-8 Session chead015:cephfs_client (2757628057) is not releasing caps fast enough. Recalled caps at 390118 > 262144 (mds_recall_warning_threshold).
The explanation was:
Every second (the mds_cache_trim_interval config param) the MDS runs the "cache trim" procedure. One of the steps of this procedure is "recall client state". During this step it checks, for every client (session), whether it needs to recall caps. There are several criteria for this:
1) the cache is full (exceeds mds_cache_memory_limit) and needs some inodes to be released;
2) the client exceeds mds_max_caps_per_client (1M by default);
3) the client is inactive.
To determine whether a client (session) is inactive, the session's cache_liveness parameter is compared with the value `num_caps >> mds_session_cache_liveness_magnitude`, where mds_session_cache_liveness_magnitude is a config param (10 by default). If cache_liveness is smaller than this calculated value, the session is considered inactive and the MDS sends a "recall caps" request for all cached caps (actually the recall value is `num_caps - mds_min_caps_per_client(100)`). If the client does not release the caps fast enough, the same thing happens again the next second, i.e. the MDS sends "recall caps" with a high value again, and so on, and the "total" counter of "recall caps" for the session grows, eventually exceeding the mon warning limit. There is a throttling mechanism, controlled by the mds_recall_max_decay_threshold parameter (126K by default), which should reduce the growth rate of the "recall caps" counter, but it looks like it is not enough in this case. From the collected sessions, I see that during that 30 minute period the total num_caps for that client decreased by about 3500. ...
Here is an example. A client has 20k caps cached. At some moment the server decides the client is inactive (because the session's cache_liveness value is low). It starts to ask the client to release caps down to the mds_min_caps_per_client value (100 by default). To do this, every second it sends recall_caps asking the client to release `num_caps - mds_min_caps_per_client` caps (but not more than `mds_recall_max_caps`, which is 30k by default). The client starts to release caps, but at a rate of e.g. only 100 caps per second. So:
in the first second the MDS sends recall_caps = 20k - 100,
in the second second recall_caps = (20k - 100) - 100,
in the third second recall_caps = (20k - 200) - 100,
and so on. Every time it sends recall_caps it also updates the session's recall_caps value, which tracks how many recall_caps were sent in the last minute. I.e. the counter grows quickly, eventually exceeding mds_recall_warning_threshold, which is 128K by default, and ceph starts to report the "failing to respond to cache pressure" warning in the status.
Now, after we set mds_recall_max_caps to 3K, in this situation the MDS sends only 3K recall_caps per second, and the maximum value the session's recall_caps counter may reach (if the MDS is sending 3K every second for at least one minute) is 60 * 3K = 180K. I.e. it is still possible to reach mds_recall_warning_threshold, but only if a client is not "responding" for a long period, and as your experiments show, that is not the case.
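The inactivity check and the size of a single recall request described in the explanation can be sketched roughly as follows. This is only an illustration of the arithmetic, not the MDS implementation: the function name `recall_request` and its argument names are made up for this example, the defaults are the values quoted above, and only criterion 3 (an inactive client) is modelled.

```python
def recall_request(num_caps, cache_liveness,
                   liveness_magnitude=10,   # mds_session_cache_liveness_magnitude
                   min_caps=100,            # mds_min_caps_per_client
                   max_recall=30_000):      # mds_recall_max_caps
    """Hypothetical helper: how many caps an inactive session is asked to release.

    Returns 0 when the session counts as active. The "cache is full" and
    "too many caps" criteria are deliberately left out of this sketch.
    """
    # A session is treated as inactive when its liveness falls below
    # num_caps shifted right by the magnitude (i.e. num_caps // 2**magnitude).
    inactive = cache_liveness < (num_caps >> liveness_magnitude)
    if not inactive:
        return 0
    # Ask the client to go down to min_caps, capped at max_recall per request.
    return min(max(num_caps - min_caps, 0), max_recall)


if __name__ == "__main__":
    # 20k caps cached: the liveness cutoff is 20000 >> 10
    print(20_000 >> 10)
    print(recall_request(20_000, cache_liveness=5))    # low liveness
    print(recall_request(20_000, cache_liveness=50))   # high liveness
```

With 20k caps the cutoff is 20000 >> 10 = 19, so a session with a cache_liveness below 19 would be asked to release 20000 - 100 caps in one request.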
So we incrementally decreased mds_recall_max_caps until the warnings disappeared, which happened once we set mds_recall_max_caps = 3000. We haven't seen this warning since then.
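To get a feeling for why a lower mds_recall_max_caps delays the warning, here is a small simulation of the example from the explanation (20k caps, a client releasing 100 caps per second). It is a simplified model under stated assumptions: the per-session counter is treated as a plain sum over the last 60 seconds, as the explanation describes it, and throttling is ignored. The function `simulate` and its parameters are invented for this sketch, and 128K is taken as 128 * 1024.

```python
from collections import deque


def simulate(max_recall, num_caps=20_000, release_per_sec=100,
             min_caps=100, warn_threshold=128 * 1024,
             window=60, seconds=120):
    """Hypothetical model of the per-session recall counter.

    Returns (first_second_over_threshold_or_None, peak_counter).
    Assumption: the counter is the sum of recalls sent in the last
    `window` seconds; the real MDS behaviour may differ.
    """
    recent = deque(maxlen=window)   # recalls sent in the last `window` seconds
    first_warning = None
    peak = 0
    for second in range(1, seconds + 1):
        # Ask for everything above min_caps, capped at max_recall.
        recall = min(max(num_caps - min_caps, 0), max_recall)
        recent.append(recall)
        counter = sum(recent)
        peak = max(peak, counter)
        if first_warning is None and counter > warn_threshold:
            first_warning = second
        # The slow client only drops a few caps each second.
        num_caps = max(num_caps - release_per_sec, min_caps)
    return first_warning, peak


if __name__ == "__main__":
    for limit in (30_000, 3_000):
        print(limit, simulate(limit))
```

In this model the 30k setting crosses the threshold in the 7th second, while with 3000 it takes 44 seconds of a client that keeps lagging behind, and the counter levels off at 60 * 3000 = 180000.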
#4 Updated by Venky Shankar 2 months ago
Eugen Block wrote:
Venky Shankar wrote:
Eugen, thanks for the detailed explanation. It would be immensely helpful if you could carve all this into a PR :)
I wouldn't even know where to begin, to be honest. I'm not familiar with the procedures at all, I've never created a PR.
Fair enough. I'll push a PR, Eugen.