Cleanup #2432

closed

ceph-client: messenger: refactor to simplify state model

Added by Alex Elder almost 12 years ago. Updated over 11 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Tags:
Backport:
Reviewed:
Affected Versions:

Description

The client messenger code uses a mix of states and flags to track
what's going on. The result is a little fuzzy; working toward a
clear state model should make the code cleaner and easier to
verify.

Actions #1

Updated by Alex Elder almost 12 years ago

I worked on doing this for a good month but the job really isn't
complete. Nevertheless I think there was some progress.
- The socket underlying a ceph connection now has a clear state
diagram as well as simple functions that verify transitions are
valid (or rather, only what is expected).
- The two phases of a connection sequence have been teased apart
a bit.
- Flags are now held in a field distinct from the one that keeps
track of connection state in a ceph connection structure.
- Some state-related information was not being managed correctly;
for example NEGOTIATING state was being set but never cleared
once a connection was established.
- Connection state is no longer changed without holding the mutex
in the socket event handler; instead a flag is set, and con_work()
changes the state.
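
The last point (the socket event handler only sets a flag, and con_work()
changes the state under the mutex) can be sketched roughly as follows. This
is a small Python model, not the kernel code: the class names, the
SOCK_CLOSED flag name and the handler name are hypothetical, and a
threading.Lock stands in for the connection mutex.

```python
import threading
from enum import Enum, Flag, auto


class State(Enum):
    CONNECTED = auto()
    CLOSED = auto()


class ConFlag(Flag):
    NONE = 0
    SOCK_CLOSED = auto()  # hypothetical flag name, for illustration only


class Connection:
    def __init__(self):
        self.mutex = threading.Lock()        # stands in for the connection mutex
        self.state = State.CONNECTED         # state: only changed under the mutex
        self.flags = ConFlag.NONE            # flags: held in a separate field
        self._flags_lock = threading.Lock()  # models atomic flag updates

    def socket_state_change_closed(self):
        """Socket event handler: does not hold the mutex, so it only sets a flag."""
        with self._flags_lock:
            self.flags |= ConFlag.SOCK_CLOSED

    def con_work(self):
        """Worker: takes the mutex, consumes the flag, then changes the state."""
        with self.mutex:
            with self._flags_lock:
                closed = bool(self.flags & ConFlag.SOCK_CLOSED)
                self.flags &= ~ConFlag.SOCK_CLOSED
            if closed:
                self.state = State.CLOSED
```

In this model the state stays CONNECTED after the socket event and only
becomes CLOSED once con_work() has run.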

And as is my wont, the code underwent an ongoing series of
refactorizations along the way.

Actions #2

Updated by Alex Elder almost 12 years ago

I had worked out on paper some notes about a longer-term state/event
model that could be used for the client messenger. I'm turning my
attention away from the messenger now and I finally got around to
documenting my notes more formally yesterday, fleshing it all out
a bit (but not completely) as I did.

I sent it to Sage, who did not respond. But in case it is helpful
I'm going to add it here so it's saved with the bug that documents
the desire for doing something like this.

Wed Jun 27 07:38:01 PDT 2012

This is pretty detailed but it might be missing big swaths of stuff.
First I list the states that I think are related to the process of
establishing a ceph connection, then I list the events that I think
can cause a state change.

Here are states related to the ceph connection process:

- NEW
    Transient initial state
    --> Just until data structure is initialized

- CLOSED
    Connection data structure initialized
    Socket closed
    --> Waiting for an open request

- OPENING
    Peer address has been set
    Backoff delay has been reset <-- should happen when closed
    --> Waiting for worker to begin connection process

- CONNECTING
    Prepared to receive banner exchange response
    Banner exchange request message queued
    Socket gets created/opened and TCP connection initiated
    --> Waiting for TCP connection to be established
    <-- Should another state go here? I think so.
    --> Then waiting for receipt of banner exchange response

- NEGOTIATING
    Banner exchange response has been validated
    Prepared to receive connection info exchange response
    Connection info exchange request message queued
    --> Then waiting for receipt of conn info exchange response

- CONNECTED
    Connection info exchange response has been validated
    --> Ready to use; waiting for send/receive activity. We may
        have yet another (sub-)state diagram here, which might
        include backoff and standby
NEW -> CLOSED -> OPENING -> CONNECTING -> NEGOTIATING -> CONNECTED

In any of these after CLOSED we could get an event that would
change it to one of these additional states:
- DISCONNECTING
    Socket event indicated externally-initiated close
    --> Waiting for worker to complete the close operation under
        mutex
    Note: this might also be used for close initiated by a user of
    the connection--like a monitor client.
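
The state chain and the DISCONNECTING rule above could be written down as a
small check like the one below. It is only a sketch in Python: the names
ConState, advance() and can_disconnect() are made up for illustration, and
the numeric values just encode the order of the chain.

```python
from enum import IntEnum


class ConState(IntEnum):
    # Order follows: NEW -> CLOSED -> OPENING -> CONNECTING -> NEGOTIATING -> CONNECTED
    NEW = 0
    CLOSED = 1
    OPENING = 2
    CONNECTING = 3
    NEGOTIATING = 4
    CONNECTED = 5
    DISCONNECTING = 6  # additional state, not part of the forward chain


def advance(state):
    """Return the next state on the normal connect path, or raise."""
    if state >= ConState.CONNECTED:
        raise ValueError(f"no forward transition from {state.name}")
    return ConState(state + 1)


def can_disconnect(state):
    """DISCONNECTING is reachable from any connect-path state after CLOSED."""
    return ConState.CLOSED < state <= ConState.CONNECTED
```

For example, advance(ConState.OPENING) gives ConState.CONNECTING, and
can_disconnect() is false for NEW and CLOSED.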

Here are events related to the ceph connection process:

The "external" sources of events that can cause a state change are:
- externally-initiated requests (callers on local host)
    - connection init request
    - connection open request
    - connection message send request
    - (received messages are handled by connection's dispatch operation)
    - connection message revoke request
    - connection incoming message revoke request <-- not sure why
    - connection close request
- socket events
    - socket state change -> established
    - socket state change -> closed/close_wait
    - socket data ready
    - socket write space

In addition:
- activation of the worker thread (con_work()) can lead to a change
    - opened connection sending data
    - disconnected connection closing down connection
- any errors while doing otherwise normal processing
    - malloc failure
    - socket send failure
    - socket receive failure
- protocol/content errors
    - unexpected tag during connection
    - peer address returned not what was expected
    - bad sequence number received (we are ahead by > 1)
    - crc does not match associated content (EBADMSG)
    - out-of-range/invalid field value (-EIO)
- non-READY tags from the peer during connection
    - FEATURES (client lacks features needed by peer)
    - BADPROTOVER (client proto version different from what peer
      expected)
    - BADAUTHORIZER (client-provided authorizer data was not
      accepted by the peer)
    - RESETSESSION (peer found client and peer sequence number did
      not match--requesting reset)
    - RETRY_SESSION (peer found client's message sequence number was
      too old--requesting re-connect)
    - RETRY_GLOBAL (peer found client's global sequence number was
      too old--requesting re-connect)
    - WAIT (connection race; client shouldn't get this)
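
One way to keep the non-READY tags straight is to group them by what the
peer is telling the client. The sketch below does only that, in Python; the
tag names are the plain strings used in the list above, while the Category
names and classify() are hypothetical. It says nothing about how the
messenger should react to each category.

```python
from enum import Enum


class Category(Enum):
    REJECTED = "peer did not accept something the client offered"
    RESET_REQUESTED = "peer requests a session reset"
    RECONNECT_REQUESTED = "peer requests a re-connect"
    UNEXPECTED = "client should not receive this"


# Grouping taken from the parenthetical descriptions in the list above.
TAG_CATEGORY = {
    "FEATURES": Category.REJECTED,
    "BADPROTOVER": Category.REJECTED,
    "BADAUTHORIZER": Category.REJECTED,
    "RESETSESSION": Category.RESET_REQUESTED,
    "RETRY_SESSION": Category.RECONNECT_REQUESTED,
    "RETRY_GLOBAL": Category.RECONNECT_REQUESTED,
    "WAIT": Category.UNEXPECTED,
}


def classify(tag):
    """Return the category of a non-READY tag; unknown tags count as unexpected."""
    return TAG_CATEGORY.get(tag, Category.UNEXPECTED)
```

Treating an unknown tag the same as WAIT is an assumption made here for
simplicity; it corresponds loosely to "unexpected tag during connection"
in the protocol/content errors.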

The table shows the next state; blanks are unexpected and represent
errors. Lots of stuff needs to be filled in; I've just started with
the easy ones...

                       -------------------------------------------------------
                                               STATES
                       -------------------------------------------------------
| Event               |  NEW  |CLOSED |OPENING|CON'ING|NEG'ING|CONN'ED|DISCON |
|---------------------+-------+-------+-------+-------+-------+-------+-------|
| connection init     |CLOSED |       |       |       |       |       |       |
|---------------------+-------+-------+-------+-------+-------+-------+-------|
| connection open     |       |OPENING|       |       |       |       |       |
|---------------------+-------+-------+-------+-------+-------+-------+-------|
| con_work()          |       |       |CON'ING|       |       |       |CLOSED |
|---------------------+-------+-------+-------+-------+-------+-------+-------|
| socket established  |       |       |       |(same) |       |       |       |
|---------------------+-------+-------+-------+-------+-------+-------+-------|
| banner response     |       |       |       |NEG'ING|       |       |       |
|---------------------+-------+-------+-------+-------+-------+-------+-------|
| conn info response  |       |       |       |       |CONN'ED|       |       |
|=====================+=======+=======+=======+=======+=======+=======+=======|
| socket closed       |       |       |       |       |       |       |       |
|=====================+=======+=======+=======+=======+=======+=======+=======|
| "normal" errors     |       |       |       |       |       |       |       |
|---------------------+-------+-------+-------+-------+-------+-------+-------|
| protocol errors     |       |       |       |       |       |       |       |
|=====================+=======+=======+=======+=======+=======+=======+=======|
| FEATURES tag        |       |       |       |       |       |       |       |
|---------------------+-------+-------+-------+-------+-------+-------+-------|
| BADAUTHORIZER tag   |       |       |       |       |       |       |       |
|---------------------+-------+-------+-------+-------+-------+-------+-------|
| RESETSESSION tag    |       |       |       |       |       |       |       |
|---------------------+-------+-------+-------+-------+-------+-------+-------|
| RETRY_SESSION tag   |       |       |       |       |       |       |       |
|---------------------+-------+-------+-------+-------+-------+-------+-------|
| RETRY_GLOBAL tag    |       |       |       |       |       |       |       |
|=====================+=======+=======+=======+=======+=======+=======+=======|
| connection close    |       |       |       |       |       |DISCON |       |
|---------------------+-------+-------+-------+-------+-------+-------+-------|
| message send        |       |       |       |       |       |(same) |       |
|---------------------+-------+-------+-------+-------+-------+-------+-------|
| message revoke      |       |       |       |       |       |(same) |       |
|---------------------+-------+-------+-------+-------+-------+-------+-------|
| incoming revoke     |       |       |       |       |       |(same) |       |
|=====================+=======+=======+=======+=======+=======+=======+=======|
| socket data ready   |       |       |       |       |       |       |       |
|---------------------+-------+-------+-------+-------+-------+-------+-------|
| socket write space  |       |       |       |       |       |       |       |
|---------------------+-------+-------+-------+-------+-------+-------+-------|

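
The filled-in cells of the table above can be expressed as a lookup keyed on
(state, event), with every blank cell treated as an error. This is a rough
Python sketch, not messenger code: TRANSITIONS, UnexpectedEvent and
next_state() are hypothetical names, and the abbreviated column names
(CON'ING, NEG'ING, CONN'ED, DISCON) are spelled out in full.

```python
# Only the cells filled in so far; "(same)" entries map a state to itself.
TRANSITIONS = {
    ("NEW", "connection init"): "CLOSED",
    ("CLOSED", "connection open"): "OPENING",
    ("OPENING", "con_work()"): "CONNECTING",
    ("DISCONNECTING", "con_work()"): "CLOSED",
    ("CONNECTING", "socket established"): "CONNECTING",
    ("CONNECTING", "banner response"): "NEGOTIATING",
    ("NEGOTIATING", "conn info response"): "CONNECTED",
    ("CONNECTED", "connection close"): "DISCONNECTING",
    ("CONNECTED", "message send"): "CONNECTED",
    ("CONNECTED", "message revoke"): "CONNECTED",
    ("CONNECTED", "incoming revoke"): "CONNECTED",
}


class UnexpectedEvent(Exception):
    """Raised for a blank cell: the event is not expected in this state."""


def next_state(state, event):
    """Look up the next state; a missing (state, event) pair is an error."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise UnexpectedEvent(f"{event!r} in state {state}") from None
```

Feeding the events "connection init", "connection open", "con_work()",
"socket established", "banner response" and "conn info response" in order,
starting from NEW, walks the connection up to CONNECTED. As more cells get
filled in, they would simply become more entries in the dictionary.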
Actions #3

Updated by Sage Weil over 11 years ago

  • Status changed from New to Resolved