Bug #62560
Provide a ceph subcommand for a client-side connectivity test
Status: open
Description
Recently, I had to troubleshoot a customer's cluster where, after new OSD hosts were added, reading certain files from CephFS hung. We spent a lot of time on this, and in the end the cause turned out to be an ACL misconfiguration on one of the network switches: all cluster nodes could communicate with each other, but the client nodes were not permitted, at the network level, to reach the new OSD hosts.
To make troubleshooting easier, please provide a new ceph client subcommand (e.g., "ceph connectivity") that tries to connect to every MON, MGR, OSD, and MDS, and possibly also every RADOS gateway, and reports any failures such as connection resets and timeouts. It could also send some large dummy requests (say, 16 kilobytes) or provoke a large dummy reply in order to test for MTU-related issues.
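To illustrate the kind of check I have in mind, here is a rough Python sketch, not a proposal for the actual implementation. It only attempts a plain TCP connection to each daemon address and classifies the outcome. The names check_endpoint and check_all are made up for this example, and the list of endpoints is assumed to be supplied by the caller (a real subcommand would take the addresses from the cluster maps). Because a TCP handshake uses small packets, this sketch would have caught the ACL problem described above but would not detect MTU issues; that would need a large payload exchanged over the Ceph messenger protocol.

```python
import socket
from typing import Iterable, List, Tuple

# (daemon label, host, port); the label is purely informational.
Endpoint = Tuple[str, str, int]


def check_endpoint(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a plain TCP connect and return "ok" or a short failure label."""
    try:
        # create_connection resolves the host and applies the timeout
        # to the connection attempt.
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.timeout:
        # Typical when a firewall or ACL silently drops the packets.
        return "timeout"
    except ConnectionRefusedError:
        # Host reachable, but nothing is listening on that port.
        return "refused"
    except ConnectionResetError:
        return "reset"
    except OSError as exc:
        # Anything else: no route to host, name resolution failure, etc.
        return f"error: {exc}"


def check_all(
    endpoints: Iterable[Endpoint], timeout: float = 3.0
) -> List[Tuple[str, str, int, str]]:
    """Check every endpoint in turn and return (label, host, port, result)."""
    return [
        (label, host, port, check_endpoint(host, port, timeout))
        for label, host, port in endpoints
    ]
```

Run from a client node with something like check_all([("mon.a", "192.0.2.10", 3300), ("osd.12", "192.0.2.21", 6800)]), where the addresses and ports are placeholders, any entry whose result is not "ok" points at a daemon that the client cannot reach even if the cluster nodes themselves see it as healthy.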