improve logging statements
As discussed at some point, one of the annoyances is having very cryptic logging messages (cryptic to anyone who has never seen anything like them
or who is not a developer).
On the #ceph channel a day ago:
<user> can anyone help me interpret this line: 7fede0763700 0 -- :/1040921 >> 172.16.17.55:6789/0 pipe(0x7feddc022470 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7feddc0226e0).fault
<other user> the machine youre on is unable to contact the monitor at 172.16.17.55
If the machine is unable to contact the monitor, it would be really useful to have something that says that.
#4 Updated by John Spray about 4 years ago
The log message is fine. What users need is an added mechanism that recognises when a pipe fault actually indicates a network issue (the two are not the same).
Services with network problems are an intrinsically hard problem, because they can't phone home to register their complaints (i.e. we can't flag this issue in "ceph status", or have it appear in "ceph -w").
A workable solution might look like a health thread internal to the process that watches for issues like this and has special cases for things like "I've been up for a minute but none of my attempted mon connections worked so far", at which point it could emit more readable log messages that explain the situation.
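
To make the idea concrete, here is a minimal sketch of the bookkeeping such a health thread might do. It is written in Python for brevity and is not Ceph code: the class name MonConnectionHealth, its methods, the 60 second grace period and the wording of the message are all assumptions made up for illustration. The clock is injectable so that the "up for a minute" case can be exercised without waiting.

```python
import logging
import time

log = logging.getLogger("mon_health")


class MonConnectionHealth:
    """Hypothetical watcher for one process's monitor connection attempts."""

    def __init__(self, grace_seconds=60.0, clock=time.monotonic):
        self._clock = clock
        self._started = clock()      # when the process (or watcher) came up
        self._grace = grace_seconds  # assumed "up for a minute" threshold
        self._failed_addrs = []      # monitor addresses that faulted, in order seen
        self._connected = False      # True once any monitor connection worked
        self._warned = False         # ensures the readable message is emitted once

    def record_fault(self, addr):
        """Called by the messenger layer whenever a pipe to a monitor faults."""
        if addr not in self._failed_addrs:
            self._failed_addrs.append(addr)

    def record_success(self, addr):
        """Called when a monitor connection is established."""
        self._connected = True

    def check(self):
        """Return a readable warning once, or None if there is nothing to report."""
        if self._connected or self._warned:
            return None
        uptime = self._clock() - self._started
        # A single fault is not a network problem; only complain when the
        # grace period has passed and no attempt has succeeded at all.
        if uptime < self._grace or not self._failed_addrs:
            return None
        self._warned = True
        msg = ("unable to contact any monitor after %d seconds; tried %s; "
               "check network connectivity to these addresses"
               % (int(uptime), ", ".join(self._failed_addrs)))
        log.warning(msg)
        return msg
```

In a real daemon, check() would be called periodically from the health thread, and record_fault() / record_success() from wherever pipe faults and successful connections are already detected; access from multiple threads would then need a lock, which this sketch omits. The low-level pipe fault line stays in the log unchanged, and the readable message is added alongside it.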