Bug #13173
Multiple Ceph clusters connected to Calamari: another bug in request_collection.py
Description
While testing with multiple clusters, I ran into a problem:
I updated the pg_num of a pool on clusterA and, at the same time, updated a pool on clusterB.
Then I queried the API endpoint "/api/v2/request" and got the following result:
However, these two requests belong to different clusters:
Looking at the code, I found this:
In cluster_monitor.py:
def _set_favorite(self, minion_id):
    assert minion_id != self._favorite_mon
    self._requests.fail_all(minion_id)
    self._favorite_mon = minion_id
And in request_collection.py:
def fail_all(self, failed_minion):
    """
    For use when we lose contact with the minion that was in use for running
    requests: assume all these requests are never going to return now.
    """
    for request in self.get_all(UserRequest.SUBMITTED):
        with self._update_index(request):
            request.set_warn("Lost contact with server %s" % failed_minion)
            if request.jid:
                log.error("Giving up on JID %s" % request.jid)
                request.jid = None
            request.complete()
This fails every submitted request, regardless of which cluster it belongs to.
So I think this function should take an fsid parameter. Otherwise, when one cluster's _favorite_mon changes, it fails all requests, including those that do not belong to that cluster.
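A minimal sketch of the proposed fix, assuming fail_all() gains an fsid argument and each request records the fsid of the cluster it was submitted to. UserRequest and RequestCollection below are simplified, hypothetical stand-ins rather than Calamari's real classes, and the _update_index() bookkeeping is left out:

```python
import logging

log = logging.getLogger(__name__)


class UserRequest(object):
    """Hypothetical stand-in for Calamari's UserRequest, reduced to the
    fields that fail_all() touches."""
    SUBMITTED = "submitted"
    COMPLETE = "complete"

    def __init__(self, fsid, jid):
        self.fsid = fsid          # cluster this request was submitted to
        self.jid = jid            # salt job ID, if any
        self.state = UserRequest.SUBMITTED
        self.warning = None

    def set_warn(self, message):
        self.warning = message

    def complete(self):
        self.state = UserRequest.COMPLETE


class RequestCollection(object):
    """Hypothetical, simplified stand-in for request_collection.py:
    one collection holds requests for every cluster."""

    def __init__(self):
        self._requests = []

    def submit(self, request):
        self._requests.append(request)

    def get_all(self, state):
        return [r for r in self._requests if r.state == state]

    def fail_all(self, failed_minion, fsid):
        """Fail only the SUBMITTED requests that belong to cluster `fsid`."""
        for request in self.get_all(UserRequest.SUBMITTED):
            if request.fsid != fsid:
                # Another cluster's request: its mon is unaffected, skip it.
                continue
            request.set_warn("Lost contact with server %s" % failed_minion)
            if request.jid:
                log.error("Giving up on JID %s", request.jid)
                request.jid = None
            request.complete()


# Two in-flight requests on two different clusters.
collection = RequestCollection()
req_a = UserRequest(fsid="cluster-a-fsid", jid="jid-a")
req_b = UserRequest(fsid="cluster-b-fsid", jid="jid-b")
collection.submit(req_a)
collection.submit(req_b)

# clusterA loses its favorite mon; only clusterA's request is failed.
collection.fail_all("mon-a1", fsid="cluster-a-fsid")
print("%s %s" % (req_a.state, req_b.state))  # complete submitted
```

With the filter in place, clusterB's request keeps its JID and stays SUBMITTED, so a mon change on clusterA no longer completes clusterB's work.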
Associated revisions
cthulhu: Fail correct UserRequests when choosing a new favorite mon
cluster_monitor watches a specific cluster FSID
request_collection is a collection of all requests in all clusters
This fix changes the filter that request_collection uses when fail_all()
is called.
Fixes: #13173
Signed-off-by: Gregory Meno <gmeno@redhat.com>
History
#1 Updated by ceph zte over 8 years ago
def _set_favorite(self, minion_id):
    assert minion_id != self._favorite_mon
    self._requests.fail_all(minion_id)  # I think this should be self._favorite_mon
    self._favorite_mon = minion_id
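To illustrate the point of the comment above: fail_all() writes the server it is given into each request's warning, so passing the old favorite records the mon that was actually lost, not the one just chosen. A minimal sketch, where FakeRequests and this trimmed ClusterMonitor are hypothetical stand-ins rather than Calamari's real code:

```python
class FakeRequests(object):
    """Hypothetical stand-in that records which minion fail_all() reports."""

    def __init__(self):
        self.failed = []

    def fail_all(self, failed_minion):
        self.failed.append(failed_minion)


class ClusterMonitor(object):
    """Hypothetical, trimmed stand-in for cluster_monitor.py."""

    def __init__(self, requests, favorite_mon):
        self._requests = requests
        self._favorite_mon = favorite_mon

    def _set_favorite(self, minion_id):
        assert minion_id != self._favorite_mon
        # Report the mon we lost contact with (the old favorite),
        # before switching to the newly chosen one.
        self._requests.fail_all(self._favorite_mon)
        self._favorite_mon = minion_id


fake_requests = FakeRequests()
monitor = ClusterMonitor(fake_requests, favorite_mon="mon-a1")
monitor._set_favorite("mon-a2")
print("%s %s" % (fake_requests.failed, monitor._favorite_mon))  # ['mon-a1'] mon-a2
```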
#2 Updated by Christina Meno over 8 years ago
So the requests returned by "for request in self.get_all(UserRequest.SUBMITTED):" should be filtered on fsid. Is that correct?
#3 Updated by ceph zte over 8 years ago
I think it should be modified as follows: when one cluster changes its _favorite_mon, only complete the requests that have the same fsid, and the note should read "Lost contact with server <old monitor>"?
for request in self.get_all(UserRequest.SUBMITTED):
    if request.fsid != fsid:
        continue
    with self._update_index(request):
        request.set_warn("Lost contact with server %s" % failed_minion)
        if request.jid:
            log.error("Giving up on JID %s" % request.jid)
            request.jid = None
        request.complete()
def _set_favorite(self, minion_id):
    assert minion_id != self._favorite_mon
    self._requests.fail_all(self._favorite_mon)
    self._favorite_mon = minion_id
#4 Updated by ceph zte over 8 years ago
#5 Updated by Christina Meno over 8 years ago
OK, got it.
This module would likely benefit from more test coverage.
You can wrap code samples in <pre> tags to preserve their formatting; see http://tracker.ceph.com/help/wiki_syntax.html
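A regression test along these lines could cover the multi-cluster case. This is a sketch using Python's unittest with hypothetical stand-in classes; a real test would exercise Calamari's own RequestCollection and UserRequest instead:

```python
import unittest


class UserRequest(object):
    """Hypothetical stand-in: only the fields this test needs."""
    SUBMITTED = "submitted"
    COMPLETE = "complete"

    def __init__(self, fsid):
        self.fsid = fsid
        self.state = UserRequest.SUBMITTED

    def complete(self):
        self.state = UserRequest.COMPLETE


class RequestCollection(object):
    """Hypothetical stand-in with an fsid-filtered fail_all()."""

    def __init__(self, requests):
        self._requests = list(requests)

    def fail_all(self, failed_minion, fsid):
        # failed_minion is kept to mirror the real signature; unused here.
        for request in self._requests:
            if request.state == UserRequest.SUBMITTED and request.fsid == fsid:
                request.complete()


class FailAllTest(unittest.TestCase):
    def test_fail_all_only_touches_one_cluster(self):
        req_a = UserRequest("fsid-a")
        req_b = UserRequest("fsid-b")
        collection = RequestCollection([req_a, req_b])

        collection.fail_all("mon-a1", "fsid-a")

        self.assertEqual(req_a.state, UserRequest.COMPLETE)
        self.assertEqual(req_b.state, UserRequest.SUBMITTED)


# Run the test case without unittest.main(), which would call sys.exit().
suite = unittest.TestLoader().loadTestsFromTestCase(FailAllTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```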
#6 Updated by Christina Meno over 8 years ago
- Status changed from New to Resolved