Bug #22570
closed
out of order caused by letting old msg from down peer be processed to RESETSESSION
Added by mingxin liu over 6 years ago.
Updated over 4 years ago.
Description
1. The slave acks two ops (op1, op2) to the primary.
2. op1 is dropped by a connection reset.
3. op2 is sent to the primary.
4. The primary uses the service's map to determine whether the slave is down (messages from a down peer need to be dropped, but the service's map lags behind the OSD's).
5. The primary processes op2 out of order, hitting:
assert(repop_queue.front() == repop);
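The steps above can be sketched as a toy model. This is not Ceph code: `Reply`, `Outcome`, `handle_reply`, `replay_scenario`, the epoch numbers, and the "drop if sent before the peer was marked down" rule are all hypothetical names and assumptions chosen for illustration. It only shows how a stale map view can let op2's ack through while op1 is still at the front of the queue.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical, simplified model of the race; none of these types exist in Ceph.
using epoch_t = std::uint32_t;

struct Reply {
    int op_id;           // which replicated op this ack belongs to
    epoch_t sent_epoch;  // map epoch at which the replica sent the ack
};

enum class Outcome { Dropped, InOrder, OutOfOrder };

// peer_down_at == 0 means "this map view still believes the peer is up".
Outcome handle_reply(std::deque<int>& repop_queue, const Reply& r,
                     epoch_t peer_down_at) {
    // Guard (assumed): discard acks sent before the peer was marked down.
    if (peer_down_at != 0 && r.sent_epoch < peer_down_at) {
        return Outcome::Dropped;
    }
    // Stand-in for assert(repop_queue.front() == repop): report instead of abort.
    if (repop_queue.empty() || repop_queue.front() != r.op_id) {
        return Outcome::OutOfOrder;
    }
    repop_queue.pop_front();
    return Outcome::InOrder;
}

// Replays steps 1-5: op1's ack was lost in the reset, only op2's ack arrives.
Outcome replay_scenario(bool use_laggy_service_map) {
    std::deque<int> repop_queue{1, 2};  // op1 and op2 still await their acks
    const epoch_t service_view = 0;     // laggy map: peer still looks up
    const epoch_t osd_view = 11;        // newer map: peer marked down at epoch 11
    const Reply op2_ack{2, 10};         // old ack, sent at epoch 10
    return handle_reply(repop_queue, op2_ack,
                        use_laggy_service_map ? service_view : osd_view);
}
```

In this model, `replay_scenario(true)` consults the laggy view, so the guard does not fire and op2 is matched against a queue whose front is still op1. `replay_scenario(false)` consults the newer view and drops the old ack before it reaches the queue.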
Do you have logs or more about how this happened? There are a bunch of guards to prevent exactly this in cases where a connection reset happens. They might be leaky, but we'll need a little more to go on in identifying what went wrong.
- Subject changed from out of order caused by letting old msg from down peer be processed to RESETSESSION and OSD peer connections fundamentally racy
- Status changed from New to 12
- Related to Bug #21143: bad RESETSESSION between OSDs? added
- Subject changed from RESETSESSION and OSD peer connections fundamentally racy to out of order caused by letting old msg from down peer be processed to RESETSESSION
- Status changed from 12 to Resolved
actually, see existing ticket #21143
- Project changed from RADOS to Messengers
- Status changed from Resolved to Pending Backport
- Backport set to luminous, mimic
- Copied to Backport #42586: luminous: out of order caused by letting old msg from down peer be processed to RESETSESSION added
- Pull request ID set to 19796
- Backport changed from luminous, mimic to luminous
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".