Bug #13173

Many Ceph clusters connected to Calamari: another code bug in request_collection.py

Added by ceph zte over 8 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Regression: No
Severity: 3 - minor

Description

While testing with many clusters, I ran into a problem:
I updated the pg_num of one pool in clusterA and, at the same time, updated another pool in clusterB.
Then I called the API "/api/v2/request" and got the following result (attachment api_v2_request.png):

But these two requests belong to different clusters (attachments api_v2_cluster_fsid1_request.png and api_v2_cluster_fsid2_request.png):

Looking at the code, I found this in cluster_monitor.py:

def _set_favorite(self, minion_id):
    assert minion_id != self._favorite_mon
    self._requests.fail_all(minion_id)
    self._favorite_mon = minion_id

And in request_collection.py:

def fail_all(self, failed_minion):
    """
    For use when we lose contact with the minion that was in use for running
    requests: assume all these requests are never going to return now.
    """
    for request in self.get_all(UserRequest.SUBMITTED):
        with self._update_index(request):
            request.set_warn("Lost contact with server %s" % failed_minion)
            if request.jid:
                log.error("Giving up on JID %s" % request.jid)
                request.jid = None
            request.complete()

This fails every submitted request.

So I think this function should take an fsid parameter. Otherwise, when one cluster's _favorite_mon changes, it fails all requests, including those that do not belong to that cluster.
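A minimal sketch of the proposed fsid filter, assuming fsid is added as a second parameter (the thread does not fix the exact signature). FakeRequest and FakeRequestCollection are hypothetical stand-ins for UserRequest and RequestCollection, and the real _update_index() bookkeeping is left out.

```python
import logging
from dataclasses import dataclass
from typing import Optional

log = logging.getLogger(__name__)

SUBMITTED = "submitted"
COMPLETE = "complete"


@dataclass
class FakeRequest:
    """Hypothetical stand-in for UserRequest, with only the fields fail_all() touches."""
    fsid: str
    jid: Optional[str] = None
    state: str = SUBMITTED
    warning: Optional[str] = None

    def set_warn(self, message: str) -> None:
        self.warning = message

    def complete(self) -> None:
        self.state = COMPLETE


class FakeRequestCollection:
    """Hypothetical stand-in for RequestCollection: one collection shared by all clusters."""

    def __init__(self, requests):
        self._requests = list(requests)

    def get_all(self, state):
        return [r for r in self._requests if r.state == state]

    def fail_all(self, failed_minion: str, fsid: str) -> None:
        """Fail SUBMITTED requests, but only those belonging to cluster `fsid`."""
        for request in self.get_all(SUBMITTED):
            if request.fsid != fsid:
                continue  # request belongs to another cluster: leave it running
            request.set_warn("Lost contact with server %s" % failed_minion)
            if request.jid:
                log.error("Giving up on JID %s", request.jid)
                request.jid = None
            request.complete()
```

With one request from each cluster in the same collection, fail_all("mon-a1", "fsid1") completes only the fsid1 request; the fsid2 request stays SUBMITTED with its JID intact.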

api_v2_cluster_fsid1_request.png (22.7 KB) ceph zte, 09/19/2015 08:40 AM

api_v2_request.png (37.4 KB) ceph zte, 09/19/2015 08:40 AM

api_v2_cluster_fsid2_request.png (23.8 KB) ceph zte, 09/19/2015 08:40 AM

Associated revisions

Revision 4d3c5b96
Added by Gregory Meno over 8 years ago

cthulhu: Fail correct UserRequests when choosing a new favorite mon

cluster_monitor watches a specific cluster FSID
request_collection is a collection of all requests in all clusters
This fix changes the filter that request_collection uses when fail_all()
is called.

Fixes: #13173

Signed-off-by: Gregory Meno <>

History

#1 Updated by ceph zte over 8 years ago

def _set_favorite(self, minion_id):
    assert minion_id != self._favorite_mon
    self._requests.fail_all(minion_id)  # I think this should be self._favorite_mon
    self._favorite_mon = minion_id
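The suggestion above (report the old favorite mon, not the new one) can be sketched as follows. This assumes a hypothetical fail_all(failed_minion, fsid) signature and an fsid attribute on the monitor; RecordingRequests and ClusterMonitorSketch are illustrative stand-ins, not calamari classes.

```python
class RecordingRequests:
    """Hypothetical fake that records fail_all() calls instead of failing anything."""

    def __init__(self):
        self.calls = []

    def fail_all(self, failed_minion, fsid):
        self.calls.append((failed_minion, fsid))


class ClusterMonitorSketch:
    """Hypothetical, trimmed-down stand-in for ClusterMonitor (one per cluster FSID)."""

    def __init__(self, fsid, requests, favorite_mon):
        self.fsid = fsid
        self._requests = requests
        self._favorite_mon = favorite_mon

    def _set_favorite(self, minion_id):
        assert minion_id != self._favorite_mon
        # Fail requests against the mon we lost contact with (the OLD favorite),
        # and only for this monitor's cluster; then switch to the new mon.
        self._requests.fail_all(self._favorite_mon, self.fsid)
        self._favorite_mon = minion_id
```

The ordering matters: reading self._favorite_mon before reassigning it is what makes the warning name the unreachable mon.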

#2 Updated by Christina Meno over 8 years ago

The loop "for request in self.get_all(UserRequest.SUBMITTED):" should be filtered on fsid. Is that correct?

#3 Updated by ceph zte over 8 years ago

Oh? No?
I think it should be changed as follows: when one cluster changes its _favorite_mon, only complete the requests that have the same fsid, and the note should be "Lost contact with server xxx" (xxx being the old monitor)?

for request in self.get_all(UserRequest.SUBMITTED):
    if request.fsid != fsid:  # fsid: the new parameter proposed in the description
        continue
    with self._update_index(request):
        request.set_warn("Lost contact with server %s" % failed_minion)
        if request.jid:
            log.error("Giving up on JID %s" % request.jid)
            request.jid = None
        request.complete()

def _set_favorite(self, minion_id):
    assert minion_id != self._favorite_mon
    self._requests.fail_all(self._favorite_mon)
    self._favorite_mon = minion_id

#4 Updated by ceph zte over 8 years ago

Oh, no!
I think it should be changed as follows: when one cluster changes its _favorite_mon, only complete the requests that have the same fsid, and the note should be "Lost contact with server xxx" (xxx being the old monitor)?

for request in self.get_all(UserRequest.SUBMITTED):
    if request.fsid != fsid:  # fsid: the new parameter proposed in the description
        continue
    with self._update_index(request):
        request.set_warn("Lost contact with server %s" % failed_minion)
        if request.jid:
            log.error("Giving up on JID %s" % request.jid)
            request.jid = None
        request.complete()

def _set_favorite(self, minion_id):
    assert minion_id != self._favorite_mon
    self._requests.fail_all(self._favorite_mon)
    self._favorite_mon = minion_id

#5 Updated by Christina Meno over 8 years ago

Ok got it.

Seems like this module would benefit from some more test coverage.
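A minimal unittest sketch of the kind of coverage that would have caught this bug. It exercises a hypothetical free-function fail_all over mock requests rather than calamari's RequestCollection, whose constructor and index bookkeeping are not shown in this thread.

```python
import unittest
from unittest import mock

SUBMITTED = "submitted"


def fail_all(requests, failed_minion, fsid):
    """Hypothetical fsid-aware fail_all over a plain list of request objects."""
    for request in requests:
        if request.state != SUBMITTED or request.fsid != fsid:
            continue  # not submitted, or belongs to another cluster
        request.set_warn("Lost contact with server %s" % failed_minion)
        request.jid = None
        request.complete()


class FailAllTest(unittest.TestCase):
    def _request(self, fsid, state=SUBMITTED):
        # Mock auto-creates set_warn()/complete() so calls can be asserted on.
        req = mock.Mock()
        req.fsid = fsid
        req.state = state
        req.jid = "jid-" + fsid
        return req

    def test_only_matching_cluster_is_failed(self):
        mine = self._request("fsid1")
        other = self._request("fsid2")
        fail_all([mine, other], "mon-a", "fsid1")
        mine.set_warn.assert_called_once_with("Lost contact with server mon-a")
        mine.complete.assert_called_once_with()
        self.assertIsNone(mine.jid)
        other.complete.assert_not_called()
        self.assertEqual(other.jid, "jid-fsid2")

    def test_non_submitted_requests_are_ignored(self):
        done = self._request("fsid1", state="complete")
        fail_all([done], "mon-a", "fsid1")
        done.complete.assert_not_called()
```

Saved as a test module, it can be run with python -m unittest.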

You can wrap code samples in <pre> tags to preserve their formatting; see http://tracker.ceph.com/help/wiki_syntax.html

#6 Updated by Christina Meno over 8 years ago

  • Status changed from New to Resolved
