<p>Ceph RADOS - Feature #6325: mon: mon_status should make it clear when the mon has connection issues</p>
<p><strong>Joao Eduardo Luis</strong>, 2013-09-16T08:20:37Z, <a href="https://tracker.ceph.com/issues/6325?journal_id=27614">journal 27614</a></p>
<ul><li><strong>Subject</strong> changed from <i>mon_status should make it clear when the mon has connection issues</i> to <i>mon: mon_status should make it clear when the mon has connection issues</i></li><li><strong>Assignee</strong> set to <i>Joao Eduardo Luis</i></li><li><strong>Source</strong> changed from <i>other</i> to <i>Community (dev)</i></li></ul>
<p><strong>Joao Eduardo Luis</strong>, 2013-09-17T02:04:01Z, <a href="https://tracker.ceph.com/issues/6325?journal_id=27640">journal 27640</a></p>
<ul><li><strong>Tracker</strong> changed from <i>Bug</i> to <i>Feature</i></li></ul>
<p><strong>Joao Eduardo Luis</strong>, 2013-09-17T02:12:42Z, <a href="https://tracker.ceph.com/issues/6325?journal_id=27642">journal 27642</a></p>
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>4</i></li></ul><p>Possible approach:</p>
<p>Since the Monitor class is a dispatcher of the messenger, we could add a new courtesy callback to the messenger, 'ms_handle_error()'. The Monitor would implement this callback and append each reported error to a list. From time to time, the monitor could check whether the entries on the list are still valid by attempting to reproduce them, although that feels more like the messenger's own responsibility.</p>
<p>Showing these errors on mon_status is just a matter of going through the list.</p>
<p>Given a TTL per error, the monitor would periodically pop expired entries off the head of the list.</p>
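<p>The list-with-TTL idea above can be sketched roughly as follows. This is a minimal, hypothetical C++ sketch, not Ceph code: ConnErrorList and its methods are illustrative names, and 'ms_handle_error()' is only the callback proposed in this comment. It assumes every error shares the same TTL and that timestamps come from a monotonic clock, so errors expire in arrival order and popping the head is sufficient.</p>

```cpp
#include <cassert>
#include <chrono>
#include <deque>
#include <string>
#include <vector>

// Hypothetical sketch of the proposed per-monitor error list with a TTL.
// record() is what the proposed ms_handle_error() callback would call,
// expire() would run from a periodic tick, and dump() feeds mon_status.
class ConnErrorList {
 public:
  using Clock = std::chrono::steady_clock;

  explicit ConnErrorList(Clock::duration ttl) : ttl_(ttl) {
    assert(ttl_ > Clock::duration::zero());
  }

  // Append an error. Entries stay in arrival order, so the oldest one
  // is always at the head of the list.
  void record(const std::string& peer, const std::string& what,
              Clock::time_point now) {
    errors_.push_back(Entry{peer, what, now});
  }

  // Pop the head of the list for as long as it has outlived the TTL.
  void expire(Clock::time_point now) {
    while (!errors_.empty() && now - errors_.front().when >= ttl_) {
      errors_.pop_front();
    }
  }

  // One line per error still on the list, e.g. "mon.b: connection refused".
  std::vector<std::string> dump() const {
    std::vector<std::string> out;
    for (const auto& e : errors_) {
      out.push_back(e.peer + ": " + e.what);
    }
    return out;
  }

 private:
  struct Entry {
    std::string peer;
    std::string what;
    Clock::time_point when;
  };

  Clock::duration ttl_;
  std::deque<Entry> errors_;
};
```

<p>Because entries are kept in arrival order with a single shared TTL, expire() only ever inspects the head, which matches "periodically pop the head of the list"; showing the errors in mon_status is then just a walk over dump().</p>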
<p>Then again, while this is a simple way to get the errors to the monitor, it feels a lot like something the messenger itself should handle: keep a list populated with the messenger's errors, then periodically check whether they still occur, while making sure we never try to reproduce an error more than a couple of times.</p>
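<p>A messenger-side variant with bounded re-checks might look like the hypothetical sketch below. ErrorRechecker and its Probe callback are assumptions for illustration only: the probe stands in for whatever "attempt to reproduce" would mean in practice (for example, a reconnect attempt). The sketch also assumes that an error which reaches the attempt cap simply stays listed without being probed again, leaving its removal to something else, such as the TTL described earlier.</p>

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical sketch of a messenger-side error table that is re-checked
// periodically, but never tries to reproduce the same error more than
// max_attempts times.
class ErrorRechecker {
 public:
  // Stand-in for "try to reproduce the error": returns true if the error
  // toward `peer` still occurs (e.g. a reconnect attempt fails again).
  using Probe = std::function<bool(const std::string& peer)>;

  explicit ErrorRechecker(int max_attempts) : max_attempts_(max_attempts) {
    assert(max_attempts_ > 0);
  }

  // A repeated error for a peer already listed keeps its existing count.
  void record(const std::string& peer) { attempts_.emplace(peer, 0); }

  // Called periodically: probe every error still under the cap and forget
  // the ones that no longer reproduce. Errors at the cap are not probed again.
  void recheck(const Probe& probe) {
    for (auto it = attempts_.begin(); it != attempts_.end();) {
      if (it->second >= max_attempts_) {
        ++it;
        continue;
      }
      ++it->second;
      if (probe(it->first)) {
        ++it;                      // still failing: keep it
      } else {
        it = attempts_.erase(it);  // resolved: drop it
      }
    }
  }

  bool has_error(const std::string& peer) const {
    return attempts_.count(peer) > 0;
  }

  int attempts(const std::string& peer) const {
    auto it = attempts_.find(peer);
    return it == attempts_.end() ? 0 : it->second;
  }

 private:
  int max_attempts_;
  std::map<std::string, int> attempts_;  // peer -> reproduction attempts so far
};
```

<p>Capping attempts per entry bounds the extra connection traffic that re-checking can generate, which is the concern behind not reproducing an error "more than a couple of times".</p>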
<p>Just a thought.</p>
<p><strong>Greg Farnum</strong> (gfarnum@redhat.com), 2013-09-17T12:19:34Z, <a href="https://tracker.ceph.com/issues/6325?journal_id=27660">journal 27660</a></p>
<p>Hmmm. The issue with doing this in the Messenger is that all those errors are expected to occur at some point: failures happen! Most of the time, the failures we're seeing are the other guy's fault, and the cluster will route around them appropriately (though of course the daemon can't know that in the moment). We can add some sort of error-reporting interface as you suggest, but we'll need to be careful in designing it: we'd probably want to associate each error with a Connection, and then we need to be sure the Connection stays valid long enough. (I forget whether we've got them properly ref-counted now or not.)</p><p>Then the daemon can use its greater knowledge to decide whether the error is a problem. I don't think a list of errors that we spit out is the right answer, though: there are good odds of it just filling up with garbage from disappearing clients that we don't care about. Instead, we'd design the interface well, and then the Monitor can look at incoming failures and keep track of failed connections to other monitors to analyze, e.g. "lost connection to mon.x" or "getting connect errors for every monitor!"</p>
<p><strong>Joao Eduardo Luis</strong>, 2017-07-11T13:54:23Z, <a href="https://tracker.ceph.com/issues/6325?journal_id=94861">journal 94861</a></p>
<ul><li><strong>Project</strong> changed from <i>Ceph</i> to <i>RADOS</i></li><li><strong>Category</strong> changed from <i>Monitor</i> to <i>Administration/Usability</i></li><li><strong>Status</strong> changed from <i>4</i> to <i>New</i></li></ul>
<p><strong>Joao Eduardo Luis</strong>, 2017-07-11T13:55:41Z, <a href="https://tracker.ceph.com/issues/6325?journal_id=94863">journal 94863</a></p>
<ul><li><strong>Component(RADOS)</strong> <i>Monitor</i> added</li></ul>
<p><strong>Joao Eduardo Luis</strong>, 2017-07-11T14:03:21Z, <a href="https://tracker.ceph.com/issues/6325?journal_id=94868">journal 94868</a></p>
<ul><li><strong>Target version</strong> set to <i>v13.0.0</i></li></ul>
<p><strong>Joao Eduardo Luis</strong>, 2020-07-31T11:38:24Z, <a href="https://tracker.ceph.com/issues/6325?journal_id=172104">journal 172104</a></p>
<ul><li><strong>Assignee</strong> deleted (<del><i>Joao Eduardo Luis</i></del>)</li></ul>
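<p>Greg Farnum's 2013-09-17 suggestion, having the Monitor itself keep track of failed connections to the other monitors while ignoring clients that come and go, could be sketched roughly as follows. This is a hypothetical illustration: MonPeerHealth and its methods are not Ceph APIs. Peers are keyed by entity name (for example "mon.b") rather than by Connection, which is one assumed way to sidestep the Connection-lifetime concern raised in that comment.</p>

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of monitor-side analysis of connection failures:
// only failures toward the other monitors are tracked, so clients that
// come and go never show up.
class MonPeerHealth {
 public:
  explicit MonPeerHealth(std::set<std::string> peers)
      : peers_(std::move(peers)) {}

  void handle_failure(const std::string& peer) {
    if (peers_.count(peer) > 0) {
      failing_.insert(peer);  // ignore anything that is not a known monitor
    }
  }

  void handle_success(const std::string& peer) { failing_.erase(peer); }

  // Lines that mon_status could report.
  std::vector<std::string> summary() const {
    assert(failing_.size() <= peers_.size());  // failing_ is a subset of peers_
    if (!peers_.empty() && failing_.size() == peers_.size()) {
      return {"getting connect errors for every monitor"};
    }
    std::vector<std::string> out;
    for (const auto& p : failing_) {
      out.push_back("lost connection to " + p);
    }
    return out;
  }

 private:
  std::set<std::string> peers_;    // the other monitors, e.g. from the monmap
  std::set<std::string> failing_;  // peers we currently cannot reach
};
```

<p>Because only known monitor peers are tracked, churn from disappearing clients never reaches the report, and the "every monitor" case is distinguished from a single lost peer, mirroring the two example messages in that comment.</p>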