Bug #38031

closed

Monitor sends >16 MB MOSDMap messages, causing kernel client instability.

Added by Xiaoxi Chen about 5 years ago. Updated about 5 years ago.

Status: Duplicate
Priority: Normal
Assignee:
Category: Monitor
Target version: -
% Done: 0%
Source: Community (dev)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Ceph kernel clients (both CephFS and RBD) have a MAX_FRONT_SIZE (16 MB) limit for Ceph messages. If such a client receives a MOSDMap message whose front portion is bigger than MAX_FRONT_SIZE, it terminates the TCP connection and retries.

However, the retry never succeeds: as the cluster-side map epoch keeps increasing, each retry requests more map versions, so the response grows larger and larger. Worse, these endless retries, each requesting hundreds of map versions at a time, saturate the monitor's CPU and fill up its message queue.
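The feedback loop described above can be illustrated with a small toy model. This is only a sketch, not Ceph code: the function name, the per-map size, the count-only cap `max_maps`, and the epoch growth per retry are all hypothetical values chosen for illustration.

```python
MAX_FRONT_SIZE = 16 * 1024 * 1024  # 16 MiB front limit of the kernel client


def simulate_retries(client_epoch, cluster_epoch, map_bytes, max_maps,
                     epochs_per_retry, attempts):
    """Toy model of a kernel client trying to catch up on OSD maps.

    Hypothetical parameters (not taken from Ceph):
      map_bytes        - assumed encoded size of one incremental map
      max_maps         - assumed count-only cap on maps per message
      epochs_per_retry - assumed cluster epoch growth between retries
    Returns (front size of each reply, whether the client caught up).
    """
    sizes = []
    for _ in range(attempts):
        wanted = cluster_epoch - client_epoch
        sent = min(wanted, max_maps)
        front = sent * map_bytes
        sizes.append(front)
        if front <= MAX_FRONT_SIZE:
            # Message accepted: the client advances by the maps it received.
            client_epoch += sent
            if client_epoch >= cluster_epoch:
                return sizes, True
        # Otherwise the client drops the connection and learns nothing,
        # while the cluster keeps producing new epochs.
        cluster_epoch += epochs_per_retry
    return sizes, False
```

With an assumed 200 KiB per map and a client 90 epochs behind, every reply in this model is over the limit and grows until the count cap is reached, so the client never catches up.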

Under certain circumstances (depending on the cluster size and on how much consecutive map versions differ), the total size of N maps plus a small amount of other metadata can exceed MAX_FRONT_SIZE. Such a message triggers the endless retries described above, which then exhaust ceph-mon.
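As a rough back-of-the-envelope check of when N maps stop fitting, the arithmetic can be sketched as follows. The helper name, the average map size, and the metadata overhead are hypothetical inputs; real values depend on the cluster.

```python
MAX_FRONT_SIZE = 16 * 1024 * 1024  # 16 MiB front limit of the kernel client


def largest_safe_map_count(avg_map_bytes, overhead_bytes):
    """Largest N such that N * avg_map_bytes + overhead_bytes still fits.

    avg_map_bytes and overhead_bytes are assumed, illustrative inputs.
    """
    if avg_map_bytes <= 0:
        raise ValueError("avg_map_bytes must be positive")
    budget = MAX_FRONT_SIZE - overhead_bytes
    return max(budget // avg_map_bytes, 0)


# Example with assumed numbers: 200 KiB per map, 4 KiB of other metadata.
n = largest_safe_map_count(200 * 1024, 4096)
```

Under these assumed numbers, only 81 maps fit in one message, so a count-only cap higher than that would allow an oversized message.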

We hit this issue in our production environments, and have seen reports from other affected users, such as https://www.spinics.net/lists/ceph-users/msg50441.html

The proposed fix is to add an `osd_mon_messages_max_bytes` configuration option, so that `build_incremental` can cap both the number of maps and the number of bytes in a message.
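A minimal sketch of the double cap, written in Python for brevity rather than as the monitor's actual C++ code. The function `select_maps` and its parameters are hypothetical, and the choice to always send at least one map, even an oversized one, is an assumption made here so the client is never left with an empty reply.

```python
def select_maps(encoded_maps, max_maps, max_bytes):
    """Pick a prefix of encoded_maps that respects both caps.

    encoded_maps - list of bytes objects, oldest epoch first (hypothetical)
    max_maps     - cap on the number of maps per message
    max_bytes    - cap on total bytes, in the spirit of the proposed
                   osd_mon_messages_max_bytes option
    Assumption: at least one map is always selected.
    """
    chosen = []
    total = 0
    for blob in encoded_maps:
        if len(chosen) >= max_maps:
            break  # count cap reached
        if chosen and total + len(blob) > max_bytes:
            break  # byte cap would be exceeded by adding this map
        chosen.append(blob)
        total += len(blob)
    return chosen
```

For example, with five 40-byte maps, `max_maps=10`, and `max_bytes=100`, only the first two maps are selected, because a third would push the total to 120 bytes.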


Related issues 1 (0 open, 1 closed)

Is duplicate of RADOS - Bug #38040: osd_map_message_max default is too high? (Resolved, Sage Weil)

Actions #1

Updated by Xiaoxi Chen about 5 years ago

  • Category set to Monitor
Actions #2

Updated by Xiaoxi Chen about 5 years ago

  • Related to Bug #38040: osd_map_message_max default is too high? added
Actions #3

Updated by Ilya Dryomov about 5 years ago

  • Related to deleted (Bug #38040: osd_map_message_max default is too high?)
Actions #4

Updated by Ilya Dryomov about 5 years ago

  • Is duplicate of Bug #38040: osd_map_message_max default is too high? added
Actions #5

Updated by Ilya Dryomov about 5 years ago

  • Status changed from New to Duplicate

Xiaoxi, this looks like a duplicate rather than a related issue. Feel free to reopen if I'm mistaken.
