Bug #8504: msgr: FAILED assert(0 == "old msgs despite reconnect_seq feature") - Messengers - Ceph

Actions

Copy link

Bug #8504

closed

msgr: FAILED assert(0 == "old msgs despite reconnect_seq feature")

Added by Greg Farnum almost 10 years ago. Updated about 5 years ago.

Status:

Resolved

Priority:

High

Assignee:

Greg Farnum

Category:

Target version:

% Done:

Source:

Development

Tags:

Backport:

Regression:

Severity:

3 - minor

Reviewed:

Affected Versions:

ceph-qa-suite:

Pull request ID:

Crash signature (v1):

Crash signature (v2):

Description

ubuntu@teuthology:/a/gregf-2014-05-29_16:19:17-rados-wip-client-fast-dispatch-testing-basic-plana/280874/remote/plana65/log/ceph-osd.0.log.gz

It looks to me like the other end restarted entirely and we didn't reset our in_seq appropriately.

Related issues 2 (0 open — 2 closed)

Actions

Copy link

Updated by Greg Farnum almost 10 years ago

Probably a bug I and Guang introduced in commit:bdee119076dd0eb65334840d141ccdf06091e3c9, but perhaps not?

Actions

Copy link

Updated by Samuel Just almost 10 years ago

Priority changed from High to Urgent

Actions

Copy link

Updated by Samuel Just almost 10 years ago

Assignee set to Greg Farnum

Actions

Copy link

Updated by Greg Farnum almost 10 years ago

Status changed from New to In Progress
Priority changed from Urgent to High

Not a bug we introduced, but the existing patch only fixes the bug on the accept side. These crashes are on the connect side, which does not reset its expectations.

Actions

Copy link

Updated by Greg Farnum almost 10 years ago

Okay, so the crashed OSD (osd.5) was marked down by the other endpoint (osd.4). Then osd.5 reconnected to osd.4, which for some reason allowed it. (osd.4 marked the connection down again as soon as it got a message delivered.)
osd.4 sent back an "existing_seq" of 0 as part of the final connection reply, which means that it had already been set to 0 in the existing Pipe's in_seq member. Which only happens if accept() detects "is_reset_from_peer" by comparing the provided and existing connect_seq (here, we get the connect_seq value we expect from a not-reset peer).

So it looks like osd.4 called mark_down on the addr-based interface and nuked the Pipe from handle_osd_map()...but then it sent a pg_info out along the Connection interface from a PG. Since it wasn't a lazy send_message(), that starts a new Pipe in connect().
Meanwhile, osd.5 tried to reconnect, and so osd.4's accept() races with its connect(), resulting in the behavior we're witnessing. That's definitely not supposed to happen; get_con_osd_cluster() uses get_nextmap_reserved() to make sure it's up-to-date. It could happen if we get that Connection (thereby dropping the map reservation), and then the Connection is marked down, and then we use it to send a Message — so that may be something we need to fix. But in this case, it doesn't look like that's what happened, because an earlier Message in the same thread appears to have been dropped from send_message_osd_cluster() due to the map marking the OSD out.
So I'm still trying to understand what's actually happened here; I'm going to check out the other failure and see if I can find anything else there.

Actions

Copy link

Updated by Greg Farnum almost 10 years ago

The one I originally reported is much simpler:
osd.0 got marked down
osd.1 called mark_down and then forgot about it
osd.0 sent another message, opening a session
osd.1 said "oh, a new connection" and opened it up (with CEPH_MSGR_TAG_READY)
osd.0 kept around its old state, because it didn't get a RESET of any kind.

So I think we have two issues:
1) The OSD is (somehow) going backwards in time in terms of who it wants to connect to
2) the use of (mid-2012) policy.resetcheck is preventing us from exposing necessary information about when connections are actually going away. I want to discuss with Sam and Sage on these, respectively.

Actions

Copy link

Updated by Greg Farnum almost 10 years ago

Greg Farnum wrote:

So it looks like osd.4 called mark_down on the addr-based interface and nuked the Pipe from handle_osd_map()...but then it sent a pg_info out along the Connection interface from a PG. Since it wasn't a lazy send_message(), that starts a new Pipe in connect().
Meanwhile, osd.5 tried to reconnect, and so osd.4's accept() races with its connect(), resulting in the behavior we're witnessing. That's definitely not supposed to happen; get_con_osd_cluster() uses get_nextmap_reserved() to make sure it's up-to-date. It could happen if we get that Connection (thereby dropping the map reservation), and then the Connection is marked down, and then we use it to send a Message — so that may be something we need to fix.

Realized in discussion with Sage that this can't happen — if the Connection still exists, it's marked as "failed" by the mark_down() call.

Actions

Copy link

Updated by Greg Farnum almost 10 years ago

OSD bug:
1) In handle_osd_map(), we need to wait until nobody is using a (previously-)reserved map prior to doing any mark_down() calls. I think we can just insert a call to await_reserved_maps() here, but need to make sure.
2) The interface around OSDService::get_con_osd_cluster() is basically broken; we need a reserved map to stick around for as long as the ConnectionRef does. We can probably pull an RAII trick here...?

Messenger bug:
There is still an issue, illustrated in my original set of logs:
osd.a marks down osd.b and forgets about it (by not having any outstanding refs to the Connection)
osd.b sends a new message to osd.a, and attempts to reopen the Connection
osd.a doesn't have anything about osd.b, but happily echoes back the global and connect seqs it is given
osd.b crashes on the message seq values from osd.a, never having had a way to find out its Connection was reset on the other side :(

Actions

Copy link

Updated by Greg Farnum almost 10 years ago

get_con_osd_cluster might be okay, since a marked_down Connection won't send any outgoing messages, which is what we want. The problem is doing the Connection lookups and having them race with a mark_down. (send_message_osd_cluster is broken in the same way.)

Actions

Copy link

#10

Updated by Greg Farnum almost 10 years ago

Of course the lookup is supposed to be protected by the map reservation system, but since we're doing mark_down prior to waiting for map reservations we don't have the guarantees we want. (That's just problem 1, making the rest of the stuff look busted.)

Actions

Copy link

#11

Updated by Greg Farnum almost 10 years ago

Status changed from In Progress to 7

Okay, and after more discussion (thanks Sage!), the "Messenger bug" is the same as the OSD bug — the surviving OSD otherwise wouldn't try and send the dead OSD any Messages, and it only does so here because of the mark_down/get_connection() ordering.

Actions

Copy link

#12

Updated by Greg Farnum almost 10 years ago

Status changed from 7 to Pending Backport

Merged to next in commit:8697d6a10ee822e8a032a559c78e211e2182799a

Unfortunately back porting it will require a little more effort; there was some groundwork for fast dispatch which made it easier but doesn't exist in Firefly or Dumpling.

Actions

Copy link

#13

Updated by Greg Farnum almost 10 years ago

Status changed from Pending Backport to Resolved

This doesn't actually require any backports. :)

Prior to the fast dispatch work, the pre_publish_lock was held by both send_message_osd_cluster() and pre_publish_map(); the issue was triggered by changing send_message_osd_cluster to use the map reservation system instead.

Actions

Copy link

#14

Updated by Greg Farnum about 5 years ago

Project changed from Ceph to Messengers
Category deleted (~~msgr~~)

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

Ceph » Messengers

Custom queries

Bug #8504

msgr: FAILED assert(0 == "old msgs despite reconnect_seq feature")

Updated by Greg Farnum almost 10 years ago

Updated by Samuel Just almost 10 years ago

Updated by Samuel Just almost 10 years ago

Updated by Greg Farnum almost 10 years ago

Updated by Greg Farnum almost 10 years ago

Updated by Greg Farnum almost 10 years ago

Updated by Greg Farnum almost 10 years ago

Updated by Greg Farnum almost 10 years ago

Updated by Greg Farnum almost 10 years ago

Updated by Greg Farnum almost 10 years ago

Updated by Greg Farnum almost 10 years ago

Updated by Greg Farnum almost 10 years ago

Updated by Greg Farnum almost 10 years ago

Updated by Greg Farnum about 5 years ago