osd_map_message_max default is too high?
In a thread on ceph-users, three different users with fairly large clusters (~600 OSDs, ~3500 OSDs) reported running into a kernel client limit on the size of the front section of the message:
Dec 26 19:28:53 mon5 kernel: libceph: mon0 10.128.150.10:6789 io error
Dec 26 19:28:53 mon5 kernel: libceph: mon0 10.128.150.10:6789 session lost, hunting for new mon
Dec 26 19:28:53 mon5 kernel: libceph: mon2 10.128.150.12:6789 session established
Dec 26 19:28:58 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:28:58 mon5 kernel: libceph: mon2 10.128.150.12:6789 session lost, hunting for new mon
Dec 26 19:28:58 mon5 kernel: libceph: mon1 10.128.150.11:6789 session established
#define CEPH_MSG_MAX_FRONT_LEN (16*1024*1024)
The default for osd_map_message_max was reduced to 40 in luminous, but still appears to be too high. While CEPH_MSG_MAX_FRONT_LEN is just an arbitrary constant and I can certainly bump it, I'm not sure that's the right thing to do.
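A rough bit of arithmetic shows why a count-based limit is fragile. If all 40 maps in one message were full maps, each would have to average under CEPH_MSG_MAX_FRONT_LEN / 40 bytes (about 410 KiB) to fit, and the encoded map size grows with the number of OSDs. The helper below is purely illustrative (per_map_budget is a hypothetical name, not a Ceph function); it only computes that per-map budget for a few candidate values of osd_map_message_max.

```c
#include <stddef.h>

/* Value as quoted from the kernel client above. */
#define CEPH_MSG_MAX_FRONT_LEN (16*1024*1024)

/* Hypothetical helper: the average encoded size each map may have if
 * `max_maps` maps are to fit together in one message front.
 * Integer division, so the result is rounded down. */
size_t per_map_budget(size_t max_maps)
{
    return (size_t)CEPH_MSG_MAX_FRONT_LEN / max_maps;
}
```

For example, per_map_budget(40) gives 419430 bytes and per_map_budget(10) gives 1677721 bytes, so halving the count doubles the headroom per map but still does not bound the message size outright.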
Should osd_map_message_max be further reduced to 20 or 10 or better yet expressed in bytes?
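To make the "expressed in bytes" option concrete, here is a minimal sketch of a sender that caps a batch by both a byte budget and a map count. It assumes the sender knows each map's encoded size up front; maps_per_message and its parameters are hypothetical and do not describe how Ceph actually implements this.

```c
#include <stddef.h>

/* Hypothetical helper: how many of the next `n` encoded maps (sizes in
 * bytes, in send order) fit in one message without exceeding `max_bytes`
 * of front payload or `max_count` maps. Returns 0 if even the first map
 * is too large, leaving that case to the caller. */
size_t maps_per_message(const size_t *sizes, size_t n,
                        size_t max_bytes, size_t max_count)
{
    size_t total = 0;   /* bytes accumulated so far; never exceeds max_bytes */
    size_t count = 0;   /* maps accepted so far */

    while (count < n && count < max_count) {
        /* Stop before the map that would push the front over the limit.
         * Comparing against the remaining room avoids overflow. */
        if (sizes[count] > max_bytes - total)
            break;
        total += sizes[count];
        count++;
    }
    return count;
}
```

Keeping the count cap alongside the byte cap means small incremental maps are still batched as before, while a run of large full maps is split across several messages instead of tripping the receiver's limit.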
#14 Updated by Nathan Cutler 6 months ago
Luminous backport analysis:
- https://github.com/ceph/ceph/pull/26340 - two of three commits backported to luminous by https://github.com/ceph/ceph/pull/28640
- https://github.com/ceph/ceph/pull/26413 - not backported
- https://github.com/ceph/ceph/pull/26448 - not backported
So luminous should be OK, but it would be nice to get confirmation from a core dev - @Neha?