RFC 1644






Network Working Group                                          R. Braden
Request for Comments: 1644                                           ISI
Category: Experimental                                         July 1994

                T/TCP -- TCP Extensions for Transactions
                        Functional Specification

Status of this Memo

   This memo describes an Experimental Protocol for the Internet
   community, and requests discussion and suggestions for improvements.
   It does not specify an Internet Standard.  Distribution is unlimited.

Abstract

   This memo specifies T/TCP, an experimental TCP extension for
   efficient transaction-oriented (request/response) service.  This
   backwards-compatible extension could fill the gap between the current
   connection-oriented TCP and the datagram-based UDP.

   This work was supported in part by the National Science Foundation
   under Grant Number NCR-8922231.

Table of Contents

 1. INTRODUCTION ..................................................  2
 2.  OVERVIEW .....................................................  3
    2.1  Bypassing the Three-Way Handshake ........................  4
    2.2  Transaction Sequences ....................................  6
    2.3  Protocol Correctness .....................................  8
    2.4  Truncating TIME-WAIT State ............................... 12
    2.5  Transition to Standard TCP Operation ..................... 14
 3.  FUNCTIONAL SPECIFICATION ..................................... 17
    3.1  Data Structures .......................................... 17
    3.2  New TCP Options .......................................... 17
    3.3  Connection States ........................................ 19
    3.4  T/TCP Processing Rules ................................... 25
    3.5  User Interface ........................................... 28
 4.  IMPLEMENTATION ISSUES ........................................ 30
    4.1  RFC-1323 Extensions ...................................... 30
    4.2  Minimal Packet Sequence .................................. 31
    4.3  RTT Measurement .......................................... 31
    4.4  Cache Implementation ..................................... 32
    4.5  CPU Performance .......................................... 32
    4.6  Pre-SYN Queue ............................................ 33
 6.  ACKNOWLEDGMENTS .............................................. 34
 7.  REFERENCES ................................................... 34
 APPENDIX A.  ALGORITHM SUMMARY ................................... 35



Braden                                                          [Page 1]

RFC 1644                    Transaction/TCP                    July 1994


 Security Considerations .......................................... 38
 Author's Address ................................................. 38

1. INTRODUCTION

   TCP was designed around the virtual circuit model, to support
   streaming of data.  Another common mode of communication is a
   client-server interaction, a request message followed by a response
   message.  The request/response paradigm is used by application-layer
   protocols that implement transaction processing or remote procedure
   calls, as well as by a number of network control and management
   protocols (e.g., DNS and SNMP).  Currently, many Internet user
   programs that need request/response communication use UDP, and when
   they require transport protocol functions such as reliable delivery
   they must effectively build their own private transport protocol at
   the application layer.

   Request/response, or "transaction-oriented", communication has the
   following features:

   (a)  The fundamental interaction is a request followed by a response.

   (b)  An explicit open or close phase may impose excessive overhead.

   (c)  At-most-once semantics is required; that is, a transaction must
        not be "replayed" as the result of a duplicate request packet.

   (d)  The minimum transaction latency for a client should be RTT +
        SPT, where RTT is the round-trip time and SPT is the server
        processing time.

   (e)  In favorable circumstances, a reliable request/response
        handshake should be achievable with exactly one packet in each
        direction.

   This memo concerns T/TCP, a backwards-compatible extension of TCP to
   provide efficient transaction-oriented service in addition to
   virtual-circuit service.  T/TCP provides all the features listed
   above, except for (e); the minimum exchange for T/TCP is three
   segments.

   In this memo, we use the term "transaction" for an elementary
   request/response packet sequence.  This is not intended to imply any
   of the semantics often associated with application-layer transaction
   processing, like 3-phase commits.  It is expected that T/TCP can be
   used as the transport layer underlying such an application-layer
   service, but the semantics of T/TCP is limited to transport-layer
   services such as reliable, ordered delivery and at-most-once
   operation.

   An earlier memo [RFC-1379] presented the concepts involved in T/TCP.
   However, the real-world usefulness of these ideas depends upon
   practical issues like implementation complexity and performance.  To
   help explore these issues, this memo presents a functional
   specification for a particular embodiment of the ideas presented in
   RFC-1379.  The specific algorithms in this memo represent a later
   evolution than RFC-1379.  In particular, Appendix A in RFC-1379
   explained the difficulties in truncating TIME-WAIT state; later
   experience with an implementation of the RFC-1379 algorithms in a
   workstation showed that accumulation of TCB's in TIME-WAIT state is
   an intolerable problem.  This necessity led to a simple solution
   for truncating TIME-WAIT state, described in this memo.

   Section 2 introduces the T/TCP extensions, and section 3 contains the
   complete specification of T/TCP.  Section 4 discusses some
   implementation issues, and Appendix A contains an algorithmic
   summary.  This document assumes familiarity with the standard TCP
   specification [STD-007].

2.  OVERVIEW

   The TCP protocol is highly symmetric between the two ends of a
   connection.  This symmetry is not lost in T/TCP; for example, T/TCP
   supports TCP's symmetric simultaneous open from both sides (Section
   2.3 below).  However, transaction sequences use T/TCP in a highly
   unsymmetrical manner.  It is convenient to use the terms "client
   host" and "server host" for the host that initiates a connection and
   the host that responds, respectively.

   The goal of T/TCP is to allow each transaction, i.e., each
   request/response sequence, to be efficiently performed as a single
   incarnation of a TCP connection.  Standard TCP imposes two
   performance problems for transaction-oriented communication.  First,
   a TCP connection is opened with a "3-way handshake", which must
   complete successfully before data can be transferred.  The 3-way
   handshake adds an extra RTT (round trip time) to the latency of a
   transaction.

   The second performance problem is that closing a TCP connection
   leaves one or both ends in TIME-WAIT state for a time 2*MSL, where
   MSL is the maximum segment lifetime (defined to be 120 seconds).
   TIME-WAIT state severely limits the rate of successive transactions
   between the same (host,port) pair, since a new incarnation of the
   connection cannot be opened until the TIME-WAIT delay expires.  RFC-
   1379 explained why the alternative approach, using a different user
   port for each transaction between a pair of hosts, also limits the
   transaction rate: (1) the 16-bit port space limits the rate to
   2**16/240 transactions per second, and (2) more practically, an
   excessive amount of kernel space would be occupied by TCP state
   blocks in TIME-WAIT state [RFC-1379].
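
   As a rough check of limit (1), the arithmetic can be sketched as
   follows.  This is an illustration only: it assumes MSL = 120 seconds
   as above, so that a port stays in TIME-WAIT state for 2*MSL = 240
   seconds, and the names used are hypothetical, not part of T/TCP.

```python
# Illustrative arithmetic only; names are hypothetical, not part of T/TCP.
MSL_SECONDS = 120                      # maximum segment lifetime, as above
TIME_WAIT_SECONDS = 2 * MSL_SECONDS    # each used port is unavailable this long
PORT_SPACE = 2 ** 16                   # number of 16-bit port values


def max_transaction_rate(port_space: int = PORT_SPACE,
                         time_wait: int = TIME_WAIT_SECONDS) -> float:
    """Upper bound on transactions per second between a pair of hosts
    when every transaction uses a fresh port that then remains in
    TIME-WAIT state for 2*MSL."""
    return port_space / time_wait


rate = max_transaction_rate()
print(f"at most {rate:.0f} transactions per second")
```

   Under these assumptions the bound is roughly 273 transactions per
   second.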

   T/TCP solves these two performance problems for transactions, by (1)
   bypassing the 3-way handshake (3WHS) and (2) shortening the delay in
   TIME-WAIT state.

   2.1  Bypassing the Three-Way Handshake

      T/TCP introduces a 32-bit incarnation number, called a "connection
      count" (CC), that is carried in a TCP option in each segment.  A
      distinct CC value is assigned to each direction of an open
      connection.  A T/TCP implementation assigns monotonically
      increasing CC values to successive connections that it opens
      actively or passively.

      T/TCP uses the monotonic property of CC values in initial <SYN>
      segments to bypass the 3WHS, using a mechanism that we call TCP
      Accelerated Open (TAO).  Under the TAO mechanism, a host caches a
      small amount of state per remote host.  Specifically, a T/TCP host
      that is acting as a server keeps a cache containing the last valid
      CC value that it has received from each different client host.  If
      an initial <SYN> segment (i.e., a segment containing a SYN bit but
      no ACK bit) from a particular client host carries a CC value
      larger than the corresponding cached value, the monotonic property
      of CC's ensures that the <SYN> segment must be new and can
      therefore be accepted immediately.  Otherwise, the server host
      does not know whether the <SYN> segment is an old duplicate or was
      simply delivered out of order; it therefore executes a normal 3WHS
      to validate the <SYN>.  Thus, the TAO mechanism provides an
      optimization, with the normal TCP mechanism as a fallback.
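
      The server side of the TAO test can be sketched as follows.  This
      is a minimal illustration, not an implementation: the class and
      method names are hypothetical, a plain integer comparison stands
      in for the modular comparison described in Section 2.3, and a
      missing cache entry is treated as a failed test.

```python
# Hypothetical sketch of the server-side TAO test.  Names are
# illustrative; plain ">" ignores CC wraparound (see Section 2.3).
from typing import Dict


class TaoServerCache:
    """Per-client cache of the last valid CC seen in an initial SYN."""

    def __init__(self) -> None:
        self.cc: Dict[str, int] = {}   # plays the role of cache.CC[host]

    def tao_test(self, client: str, seg_cc: int) -> bool:
        """Return True if the initial SYN may be accepted immediately.

        On success the cached value advances to seg_cc.  Otherwise the
        cache is left unchanged and the caller is expected to fall
        back to a normal three-way handshake.
        """
        cached = self.cc.get(client)
        if cached is not None and seg_cc > cached:
            self.cc[client] = seg_cc
            return True
        return False


server = TaoServerCache()
server.cc["A"] = 10                       # x0, learned from an earlier connection
accepted_new = server.tao_test("A", 11)   # x > x0: accept, cache becomes 11
accepted_dup = server.tao_test("A", 11)   # x <= x0: fall back to 3WHS
```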

      The CC value carried in non-<SYN> segments is used to protect
      against old duplicate segments from earlier incarnations of the
      same connection (we call such segments 'antique duplicates' for
      short).  In the case of short connections (e.g., transactions),
      these CC values allow TIME-WAIT state delay to be safely
      truncated, as discussed in Section 2.3.

      T/TCP defines three new TCP options, each of which carries one
      32-bit CC value.  These options are named CC, CC.NEW, and CC.ECHO.
      The CC option is normally used; CC.NEW and CC.ECHO have special
      functions, as follows.

      (a)  CC.NEW

           Correctness of the TAO mechanism requires that clients
           generate monotonically increasing CC values for successive
           connection initiations.  These values can be generated using
           a simple global counter.  There are certain circumstances
           (discussed below in Section 2.3) when the client knows that
           monotonicity may be violated; in this case, it sends a CC.NEW
           rather than a CC option in the initial <SYN> segment.
           Receiving a CC.NEW causes the server to invalidate its cache
           entry and do a 3WHS.

      (b)  CC.ECHO

           When a server host sends a <SYN,ACK> segment, it echoes the
           connection count from the initial <SYN> in a CC.ECHO option,
           which is used by the client host to validate the <SYN,ACK>
           segment.

      Figure 1 illustrates the TAO mechanism bypassing a 3WHS.  The
      cached CC values, denoted by cache.CC[host], are shown on each
      side.  The server host compares the new CC value x in segment #1
      against x0, its cached value for client host A; this comparison is
      called the "TAO test".  Since x > x0, the <SYN> must be new and
      can be accepted immediately; the data in the segment can therefore
      be delivered to the user process B, and the cached value is
      updated.  If the TAO test failed (x <= x0), the server host would
      do a normal three-way handshake to validate the <SYN> segment, but
      the cache would not be updated.

          TCP A  (Client)                              TCP B (Server)
          _______________                              ______________

                                                          cache.CC[A]
                                                            V

                                                          [ x0 ]

        #1        --> <SYN,data1,CC=x> -->  (TAO test OK (x > x0) =>
                                                     data1->user_B and
                                                     cache.CC[A]= x; )

                                                           [ x ]
        #2       <-- <SYN,ACK(data1),data2,CC=y,CC.ECHO=x> <--
            (data2->user_A;)


              Figure 1. TAO: Three-Way Handshake is Bypassed


      The CC value x is echoed in a CC.ECHO option in the <SYN,ACK>
      segment (#2); the client side uses this option to validate the
      segment.  Since segment #2 is valid, its data2 is delivered to the
      client user process.  Segment #2 also carries B's CC value; this
      is used by A to validate non-SYN segments from B, as explained in
      Section 2.4.

      Implementing the T/TCP extensions expands the connection control
      block (TCB) to include the two CC values for the connection; call
      these variables TCB.CCsend and TCB.CCrecv (or CCsend, CCrecv for
      short).  For example, the sequence shown in Figure 1 sets
      TCB.CCsend = x and TCB.CCrecv = y at host A, and vice versa at
      host B.  Any segment that is received with a CC option containing
      a value SEG.CC different from TCB.CCrecv will be rejected as an
      antique duplicate.
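
      A minimal sketch of this check on received segments follows.  It
      assumes a TCB that holds only the two CC values; the type and
      function names are hypothetical, and the values 7 and 42 are
      arbitrary.

```python
# Hypothetical sketch: rejecting antique duplicates by CC value.
from dataclasses import dataclass


@dataclass
class Tcb:
    cc_send: int   # CC value this end places in segments it sends
    cc_recv: int   # CC value expected in segments from the peer


def segment_cc_ok(tcb: Tcb, seg_cc: int) -> bool:
    """Accept a segment only if its CC option matches the value the
    peer chose for this incarnation of the connection."""
    return seg_cc == tcb.cc_recv


tcb_a = Tcb(cc_send=7, cc_recv=42)     # host A: CCsend = x, CCrecv = y
current = segment_cc_ok(tcb_a, 42)     # segment from this incarnation
antique = segment_cc_ok(tcb_a, 41)     # segment from an earlier incarnation
```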

   2.2  Transaction Sequences

      T/TCP applies the TAO mechanism described in the previous section
      to perform a transaction sequence.  Figure 2 shows a minimal
      transaction, when the request and response data can each fit into
      a single segment.  This requires three segments and completes in
      one round-trip time (RTT).  If the TAO test had failed on segment
      #1, B would have queued data1 and the FIN for later processing,
      and then it would have returned a <SYN,ACK> segment to A, to
      perform a normal 3WHS.

       TCP A  (Client)                                    TCP B (Server)
       _______________                                    ______________

       CLOSED                                                     LISTEN

   #1  SYN-SENT*    --> <SYN,data1,FIN,CC=x> -->             CLOSE-WAIT*
                                                           (TAO test OK)
                                                         (data1->user_B)

                                                           <-- LAST-ACK*
   #2  TIME-WAIT   <-- <SYN,ACK(FIN),data2,FIN,CC=y,CC.ECHO=x>
     (data2->user_A)


   #3  TIME-WAIT          --> <ACK(FIN),CC=x> -->                 CLOSED

       (timeout)
         CLOSED

             Figure 2: Minimal T/TCP Transaction Sequence


      T/TCP extensions require additional connection states, e.g., the
      SYN-SENT*, CLOSE-WAIT*, and LAST-ACK* states shown in Figure 2.
      Section 3.3 describes these new connection states.

      To obtain the minimal 3-segment sequence shown in Figure 2, the
      server host must delay acknowledging segment #1 so the response
      may be piggy-backed on segment #2.  If the application takes
      longer than this delay to compute the response, the normal TCP
      retransmission mechanism in TCP B will send an acknowledgment to
      forestall a retransmission from TCP A.  Figure 3 shows an example
      of a slow server application.  Although the sequence in Figure 3
      does contain a 3-way handshake, the TAO mechanism has allowed the
      request data to be accepted immediately, so that the client still
      sees the minimum latency.

       TCP A  (Client)                                    TCP B (Server)
       _______________                                    ______________

       CLOSED                                                     LISTEN

   #1  SYN-SENT*       --> <SYN,data1,FIN,CC=x> -->          CLOSE-WAIT*
                                                        (TAO test OK =>
                                                          data1->user_B)

                                                               (timeout)
   #2  FIN-WAIT-1  <-- <SYN,ACK(FIN),CC=y,CC.ECHO=x> <--     CLOSE-WAIT*


   #3  FIN-WAIT-1      --> <ACK(SYN),FIN,CC=x> -->            CLOSE-WAIT


   #4  TIME-WAIT   <-- <ACK(FIN),data2,FIN,CC=y> <--            LAST-ACK
       (data2->user_A)

   #5  TIME-WAIT       --> <ACK(FIN),CC=x> -->                    CLOSED

         (timeout)
        CLOSED

                  Figure 3: Acknowledgment Timeout in Server


   2.3  Protocol Correctness

      This section fills in more details of the TAO mechanism and
      provides an informal sketch of why the T/TCP protocol works.

      CC values are 32-bit integers.  The TAO test requires the same
      kind of modular arithmetic that is used to compare two TCP
      sequence numbers.  We assume that the boundary between y < z and z
      < y for two CC values y and z occurs when they differ by 2**31,
      i.e., by half the total CC space.
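
      The modular comparison can be sketched as follows.  The function
      name is hypothetical, and the treatment of two values that differ
      by exactly 2**31 (neither is considered greater) is an assumption
      made for this illustration.

```python
# Hypothetical sketch of the 32-bit modular comparison of CC values.
CC_BITS = 32
CC_MODULUS = 1 << CC_BITS        # size of the CC space
CC_HALF = 1 << (CC_BITS - 1)     # half the CC space, 2**31


def cc_gt(y: int, z: int) -> bool:
    """Return True if CC value y is 'greater than' z, modulo 2**32.

    y counts as greater when it lies less than half the CC space
    ahead of z.  Values exactly 2**31 apart compare as not greater
    in either direction (an assumption of this sketch).
    """
    ahead = (y - z) % CC_MODULUS
    return 0 < ahead < CC_HALF


simple = cc_gt(5, 3)                   # ordinary case
wrapped = cc_gt(1, CC_MODULUS - 1)     # 1 is two counts ahead of 2**32 - 1
```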

      The essential requirement for correctness of T/TCP is this:

           CC values must advance at a rate slower than 2**31      [R1]
           counts per 2*MSL

      where MSL denotes the maximum segment lifetime in the Internet.
      The requirement [R1] is easily met with a 32-bit CC.  For example,
      it will allow 10**6 transactions per second with the very liberal
      MSL of 1000 seconds [RFC-1379].  This is well in excess of the
      transaction rates achievable with current operating systems and
      network latency.
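
      The bound implied by [R1] can be checked with a short
      calculation.  The function name below is hypothetical; the only
      inputs are the 2**31 limit and the MSL value used in the example
      above.

```python
# Illustrative arithmetic for requirement [R1]; the name is hypothetical.
def max_cc_rate(msl_seconds: float) -> float:
    """Largest rate (CC counts per second) at which CC values may
    advance while staying below 2**31 counts per 2*MSL."""
    return 2 ** 31 / (2 * msl_seconds)


limit = max_cc_rate(1000.0)    # MSL of 1000 seconds, as in the text
print(f"CC values may advance at up to about {limit:.0f} per second")
```

      With an MSL of 1000 seconds the limit is slightly above 10**6
      counts per second, consistent with the figure quoted above.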

      Assume for the present that successive connections from client A
      to server B contain only monotonically increasing CC values.  That
      is, if x(i) and x(i+1) are CC values carried in two successive
      initial <SYN> segments from the same host, then x(i+1) > x(i).
      Assuming the requirement [R1], the CC space cannot wrap within the
      range of segments that can be outstanding at one time.  Therefore,
      those successive <SYN> segments from a given host that have not
      exceeded their MSL must contain an ordered set of CC values:

             x(1) < x(2) < x(3) ... < x(n),

      where the modular comparisons have been replaced by simple
      arithmetic comparisons. Here x(n) is the most recent acceptable
      <SYN>, which is cached by the server.  If the server host receives
      a <SYN> segment containing a CC option with value y where y >
      x(n), that <SYN> must be newer; an antique duplicate SYN with CC
      value greater than x(n) must have exceeded its MSL and vanished.
      Hence, monotonic CC values and the TAO test prevent erroneous
      replay of antique <SYN>s.

      There are two possible reasons for a client to generate non-
      monotonic CC values: (a) the client may have crashed and
      restarted, causing the generated CC values to jump backwards; or
      (b) the generated CC values may have wrapped around the finite
      space.  Wraparound may occur because CC generation is global to
      all connections.  Suppose that host A sends a transaction to B,
      then sends more than 2**31 transactions to other hosts, and
      finally sends another transaction to B.  From B's viewpoint, CC
      will have jumped backward relative to its cached value.

      In either of these two cases, the server may see the CC value jump
      backwards only after an interval of at least MSL since the last
      <SYN> segment from the same client host.  In case (a), client host
      restart, this is because T/TCP retains TCP's explicit "Quiet Time"
      of an MSL interval [STD-007].  In case (b), wraparound, [R1]
      ensures that a time of at least MSL must have passed before the CC
      space wraps around.  Hence, there is no possibility that a TAO
      test will succeed erroneously due to either cause of non-
      monotonicity; i.e., there is no chance of replays due to TAO.

      However, although CC values jumping backwards will not cause an
      error, it may cause a performance degradation due to unnecessary
      3WHS's.  This results from the generated CC values jumping
      backwards through approximately half their range, so that all
      succeeding TAO tests fail until the generated CC values catch up
      to the cached value.  To avoid this degradation, a client host
      sends a CC.NEW option instead of a CC option in the case of either
      system restart or CC wraparound.  Receiving CC.NEW forces a 3WHS,
      but when this 3WHS completes successfully the server cache is
      updated to the new CC value.  To detect CC wraparound, the client
      must cache the last CC value it sent to each server.  It therefore
      maintains cache.CCsent[B] for each server B.  If this cached value
      is undefined or if it is larger than the next CC value generated
      at the client, then the client sends a CC.NEW instead of a CC
      option in the next SYN segment.
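
      The client-side choice between CC and CC.NEW can be sketched as
      follows.  This is an illustration under stated assumptions: the
      class and method names are hypothetical, an undefined cache entry
      is modelled as a missing dictionary key, and the cache entry is
      recorded only when the caller reports that the SYN was
      acknowledged.

```python
# Hypothetical sketch of the client-side choice between CC and CC.NEW.
from typing import Dict, Tuple

CC_MODULUS = 2 ** 32
CC_HALF = 2 ** 31


def cc_gt(y: int, z: int) -> bool:
    """Modular 'greater than' for 32-bit CC values."""
    return 0 < (y - z) % CC_MODULUS < CC_HALF


class ClientCcState:
    def __init__(self, start: int = 1) -> None:
        self.next_cc = start               # global CC generator
        self.cc_sent: Dict[str, int] = {}  # plays the role of cache.CCsent[]

    def choose_option(self, server: str) -> Tuple[str, int]:
        """Generate the next CC value and pick the option that will
        carry it in the initial SYN sent to `server`."""
        cc = self.next_cc
        self.next_cc = (self.next_cc + 1) % CC_MODULUS
        last = self.cc_sent.get(server)
        # Undefined entry, or new value not greater than the cached
        # one (restart or wraparound): monotonicity cannot be assumed.
        if last is None or not cc_gt(cc, last):
            return ("CC.NEW", cc)
        return ("CC", cc)

    def syn_acked(self, server: str, cc: int) -> None:
        """Record the CC value once the SYN carrying it is ACK'd."""
        self.cc_sent[server] = cc


client = ClientCcState(start=100)
first = client.choose_option("B")      # no cache entry yet
client.syn_acked("B", first[1])
second = client.choose_option("B")     # cache entry defined and smaller
```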

      This is illustrated in Figure 4, which shows the scenario for the
      first transaction from A to B after the client host A has crashed
      and recovered.  A similar sequence occurs if x is not greater than
      cache.CCsent[B], i.e., if there is a wraparound of the generated
      CC values.  Because segment #1 contains a CC.NEW option, the
      server host invalidates the cache entry and does a 3WHS; however,
      it still sets B's TCB.CCrecv for this connection to x.  TCP B uses
      this CCrecv value to validate the <ACK> segment (#3) that
      completes the 3WHS.  Receipt of this segment updates cache.CC[A],
      since the cache entry was previously undefined.  (If a 3WHS always
      updated the cache, then out-of-order SYN segments could cause the
      cached value to jump backwards, possibly allowing replays).
      Finally, the CC.ECHO option in the <SYN,ACK> segment #2 defines
      A's cache.CCsent entry.

      This algorithm delays updating cache.CCsent[] until the <SYN> has
      been ACK'd.  This allows the undefined cache.CCsent value to be
      used as a "first-time switch" to reliably resynchronize the
      cached value at the server after a crash or wraparound.

      When we use the term "cache", we imply that the value can be
      discarded at any time without introducing erroneous behavior
      although it may degrade performance.

      (a)  If a server host receives an initial <SYN> from client A but
           has no cached value cache.CC[A], the server simply forces a
           3WHS to validate the <SYN> segment.

      (b)  If a client host has no cached value cache.CCsent[B] when it
           needs to send an initial <SYN> segment, the client simply
           sends a CC.NEW option in the segment.  This forces a 3WHS at
           the server.

          TCP A  (Client)                                TCP B (Server)
          _______________                                ______________

          cache.CCsent[B]                                   cache.CC[A]
              V                                                  V

        (Crash and restart)
            [ ?? ]                                            [ x0 ]

        #1         --> <SYN,data1,CC.NEW=x> -->      (invalidate cache;
                                                            queue data1;
                                                        3-way handshake)

            [ ?? ]                                            [ ?? ]
        #2          <-- <SYN,ACK(SYN),CC=y,CC.ECHO=x> <--
          (cache.CCsent[B]= x;)

            [ x ]                                             [ ?? ]

        #3                  --> <ACK(SYN),CC=x> -->       data1->user_B;
                                                         cache.CC[A]= x;

            [ x ]                                              [ x ]

                      Figure 4.  Client Host Restarting


      So far, we have considered only correctness of the TAO mechanism
      for bypassing the 3WHS.  We must also protect a connection against
      antique duplicate non-SYN segments.  In standard TCP, such
      protection is one of the functions of the TIME-WAIT state delay.
      (The other function is the TCP full-duplex close semantics, which
      we need to preserve; that is discussed below in Section 2.5).  In
      order to achieve a high rate of transaction processing, it must be
      possible to truncate this TIME-WAIT state delay without exposure
      to antique duplicate segments [RFC-1379].

      For short connections (e.g., transactions), the CC values assigned
      to each direction of the connection can be used to protect against
      antique duplicate non-SYN segments.  Here we define "short" as a
      duration less than MSL.  Suppose that there is a connection that
      uses the CC values TCB.CCsend = x and TCB.CCrecv = y.  By the
      requirement [R1], neither x nor y can be reused for a new
      connection from the same remote host for a time at least 2*MSL.
      If the connection has been in existence for a time less than MSL,
      then its CC values will not be reused for a period that exceeds
      MSL, and therefore all antique duplicates with that CC value must
      vanish before it is reused.  Thus, for "short" connections we can
      guard against antique non-SYN segments by simply checking the CC
      value in the segment against TCB.CCrecv.  Note that this check
      does not use the monotonic property of the CC values, only that
      they not cycle in less than 2*MSL.  Again, the quiet time at
      system restart protects against errors due to crash with loss of
      state.

      If the connection duration exceeds MSL, safety from old duplicates
      still requires a TIME-WAIT delay of 2*MSL.  Thus, truncation of
      TIME-WAIT state is only possible for short connections.  (This
      problem has also been noticed by Shankar and Lee [ShankarLee93]).
      This difference in behavior for long and for short connections
      does create a slightly complex service model for applications
      using T/TCP.  An application has two different strategies for
      multiple connections.  For "short" connections, it should use a
      fixed port pair and use the T/TCP mechanism to get rapid and
      efficient transaction processing.  For connections whose durations
      are of the order of MSL or longer, it should use a different user
      port for each successive connection, as is the current practice
      with unmodified TCP.  The latter strategy will cause excessive
      overhead (due to TCB's in TIME-WAIT state) if it is applied to
      high-frequency short connections.  If an application makes the
      wrong choice, its attempt to open a new connection may fail with a
      "busy" error.  If connection durations may range between long and
      short, an application may have to be able to switch strategies
      when one fails.
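
      The distinction between "short" and long connections can be
      sketched as a pair of simple predicates.  This is an
      illustration only: the function names and the strategy labels are
      hypothetical, MSL is assumed to be 120 seconds, and the threshold
      is taken to be exactly MSL.

```python
# Hypothetical sketch of the short/long connection distinction.
MSL_SECONDS = 120.0    # assumed maximum segment lifetime


def may_truncate_time_wait(duration: float,
                           msl: float = MSL_SECONDS) -> bool:
    """TIME-WAIT truncation is only safe for 'short' connections,
    i.e. those that have existed for less than MSL."""
    return duration < msl


def port_strategy(expected_duration: float,
                  msl: float = MSL_SECONDS) -> str:
    """Suggest how an application might pick ports for successive
    connections to the same server."""
    if may_truncate_time_wait(expected_duration, msl):
        return "fixed-port-pair"          # rely on T/TCP truncation
    return "new-port-per-connection"      # current practice for TCP


transaction = port_strategy(0.5)      # a sub-second request/response
bulk = port_strategy(600.0)           # a ten-minute transfer
```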

   2.4  Truncating TIME-WAIT State

      Truncation of TIME-WAIT state is necessary to achieve high
      transaction rates.  As Figure 2 illustrates, a standard
      transaction leaves the client end of the connection in TIME-WAIT
      state.  This section explains the protocol implications of
      truncating TIME-WAIT state, when it is allowed (i.e., when the
      connection has been in existence for less than MSL).  In this
      case, the client host should be able to interrupt TIME-WAIT state
      to initiate a new incarnation of the same connection (i.e., using
      the same host and ports).  This will send an initial <SYN>
      segment.

      It is possible for the new <SYN> to arrive at the server before
      the retransmission state from the previous incarnation is gone, as
      shown in Figure 5.  Here the final <ACK> (segment #3) from the
      previous incarnation is lost, leaving retransmission state at B.
      However, the client received segment #2 and thinks the transaction
      completed successfully, so it can initiate a new transaction by
      sending <SYN> segment #4.  When this <SYN> arrives at the server
      host, it must implicitly acknowledge segment #2, signalling
      success to the server application, deleting the old TCB, and
      creating a new TCB, as shown in Figure 5.  Still assuming that the
      new <SYN> is known to be valid, the server host marks the new
      connection half-synchronized and delivers data3 to the server
      application.  (The details of how this is accomplished are
      presented in Section 3.3.)

      The earlier discussion of the TAO mechanism assumed that the
      previous incarnation was closed before a new <SYN> arrived at the
      server.  However, TAO cannot be used to validate the <SYN> if
      there is still state from the previous incarnation, as shown in
      Figure 5; in this case, it would be exceedingly awkward to perform
      a 3WHS if the TAO test should fail.  Fortunately, a modified
      version of the TAO test can still be performed, using the state in
      the earlier TCB rather than the cached state.

      (A)  If the <SYN> segment contains a CC or CC.NEW option, the
           value SEG.CC from this option is compared with TCB.CCrecv,
           the CC value in the still-existing state block of the
           previous incarnation.  If SEG.CC > TCB.CCrecv, the new <SYN>
           segment must be valid.

      (B)  Otherwise, the <SYN> is an old duplicate and is simply
           discarded.
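
      Rules (A) and (B) can be sketched as follows.  The function names
      and the returned labels are hypothetical, and the sketch covers
      only segments that carry a CC or CC.NEW option.

```python
# Hypothetical sketch of the modified TAO test, which compares against
# the old TCB of the previous incarnation instead of the per-host cache.
CC_MODULUS = 2 ** 32
CC_HALF = 2 ** 31


def cc_gt(y: int, z: int) -> bool:
    """Modular 'greater than' for 32-bit CC values."""
    return 0 < (y - z) % CC_MODULUS < CC_HALF


def check_syn_against_old_tcb(seg_cc: int, old_tcb_cc_recv: int) -> str:
    """Apply rules (A) and (B) to a new SYN that arrives while state
    from the previous incarnation still exists."""
    if cc_gt(seg_cc, old_tcb_cc_recv):
        return "accept"     # rule (A): the new SYN must be valid
    return "discard"        # rule (B): old duplicate


new_syn = check_syn_against_old_tcb(seg_cc=8, old_tcb_cc_recv=7)
old_syn = check_syn_against_old_tcb(seg_cc=7, old_tcb_cc_recv=7)
```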

      Truncating TIME-WAIT state may be looked upon as composing an
      extended state machine that joins the state machines of the two
      incarnations, old and new.  It may be described by introducing new
      intermediate states (which we call I-states), with transitions
      that join the two diagrams and share some state from each.  I-
      states are detailed in Section 3.3.

      Notice also segment #2' in Figure 5.  TCP's mechanism to recover
      from half-open connections (see Figure 10 of [STD-007]) would
      cause TCP A to send a RST when #2' arrives, which would
      incorrectly make B think that the previous transaction did not
      complete successfully.
      The half-open recovery mechanism must be defeated in this case, by
      A ignoring segment #2'.













Braden                                                         [Page 13]

RFC 1644                    Transaction/TCP                    July 1994



      TCP A  (Client)                                     TCP B (Server)
      _______________                                     ______________

      CLOSED                                                      LISTEN

  #1                --> <...,FIN,CC=x> -->                     LAST-ACK*

  #2         <-- <...ACK(FIN),data2,FIN,CC=y,CC.ECHO=x>  <---  LAST-ACK*
      TIME-WAIT
    (data2->user_A)


  #3  TIME-WAIT          --> <ACK(FIN),CC=x> --> X (DROP)

      (New Active Open)                           (New Passive Open)

  #4  SYN-SENT*    --> <SYN,data3,CC=z> ...

                                                               LISTEN-LA
  #2' (discard) <-- <...ACK(FIN),data2,FIN,CC=y> <--- (retransmit)

  #4  SYN-SENT*        ... <SYN,data3,CC=z> -->            ESTABLISHED*
                                                    SYN OK (see text) =>
                                                            {Ack seg #2;
                                                         Delete old TCB;
                                                         Create new TCB;
                                                        data3 -> user_B;
                                                        cache.CC[A]= z;}

        Figure 5: Truncating TIME-WAIT State: SYN as Implicit ACK


   2.5  Transition to Standard TCP Operation

      T/TCP includes all normal TCP semantics, and it will continue to
      operate exactly like TCP when the particular assumptions for
      transactions do not hold.  There is no limit on the size of an
      individual transaction, and the behavior of T/TCP should shift
      seamlessly from pure transaction operation, as shown in Figure 2,
      to pure streaming mode for sending large files.  All the sequences
      shown in [STD-007] are still valid, and the inherent symmetry of
      TCP is preserved.

      Figure 6 shows a possible sequence when the request and response
      messages each require two segments.  Segment #2 is a non-SYN
      segment that contains a TCP option.  To avoid compatibility
      problems with existing TCP implementations, the client side should
      send segment #2 only if cache.CCsent[B] is defined, i.e., only if
      host A knows that host B implements T/TCP.



          TCP A  (Client)                                 TCP B (Server)
          _______________                                 ______________

          CLOSED                                                  LISTEN


       #1  SYN-SENT*       --> <SYN,data1,CC=x>  -->        ESTABLISHED*
                                                       (TAO test OK =>
                                                        data1-> user)

       #2  SYN-SENT*       --> <data2,FIN,CC=x>  -->         CLOSE-WAIT*
                                                       (data2-> user)

                                                             CLOSE-WAIT*
       #3  FIN-WAIT-2  <-- <SYN,ACK(FIN),data3,CC=y,CC.ECHO=x> <--
            (data3->user)

       #4  TIME-WAIT   <-- <ACK(FIN),data4,FIN,CC=y> <--       LAST-ACK*
            (data4->user)

       #5  TIME-WAIT       --> <ACK(FIN),CC=x> -->                CLOSED


            Figure 6. Multi-Packet Request/Response Sequence

      Figure 7 shows a more complex example, one possible sequence with
      TAO combined with simultaneous open and close.  This may be
      compared with Figure 8 of [STD-007].





















          TCP A                                                    TCP B
          _______________                                 ______________

          CLOSED                                                  CLOSED

      #1  SYN-SENT*         -->  ...

      #2  CLOSING*     <--  <--            SYN-SENT*
          (TAO test OK =>
           data2->user_A)

      #3  CLOSING*      -->  ...

      #1'                       ...  -->    CLOSING*
                                                       (TAO test OK =>
                                                        data1->user_B)

      #4  TIME-WAIT   <--  <--     CLOSING*

      #5  TIME-WAIT    -->  ...

      #3'              ...  -->   TIME-WAIT

      #6  TIME-WAIT            <--  <---        TIME-WAIT

      #5' TIME-WAIT               ...  -->      TIME-WAIT

          (timeout)                                            (timeout)
            CLOSED                                                CLOSED

                  Figure 7: Simultaneous Open and Close





















3.  FUNCTIONAL SPECIFICATION

   3.1  Data Structures

      A connection count is an unsigned 32-bit integer, with the value
      zero excluded.  Zero is used to denote an undefined value.

      A host maintains a global connection count variable CCgen, and
      each connection control block (TCB) contains two new connection
      count variables, TCB.CCsend and TCB.CCrecv.  Whenever a TCB is
      created for the active or passive end of a new connection, CCgen
      is incremented by 1 and placed in TCB.CCsend of the TCB; however,
      if the previous CCgen value was 0xffffffff (-1), then the next
      value should be 1.  TCB.CCrecv is initialized to zero (undefined).
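
      The allocation rule above can be sketched as follows.  This is a
      minimal illustration, not part of the specification: the names
      next_cc, Host, and new_tcb are hypothetical, and a TCB is reduced
      to a dictionary holding only the two connection count fields.

```python
CC_MAX = 0xFFFFFFFF  # largest value of an unsigned 32-bit connection count


def next_cc(ccgen: int) -> int:
    """Return the connection count that follows ccgen, skipping zero."""
    # Zero means "undefined", so 0xffffffff wraps around to 1, not 0.
    return 1 if ccgen == CC_MAX else ccgen + 1


class Host:
    """Hypothetical holder for the global CCgen variable."""

    def __init__(self, ccgen: int = 1) -> None:
        self.ccgen = ccgen

    def new_tcb(self) -> dict:
        """Create the CC fields of a TCB for a new connection."""
        self.ccgen = next_cc(self.ccgen)
        # TCB.CCrecv starts at zero, i.e., undefined.
        return {"CCsend": self.ccgen, "CCrecv": 0}
```

      Keeping the wraparound in one helper means no caller can place
      the reserved value zero into TCB.CCsend.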

      T/TCP adds a per-host cache to TCP.  An entry in this cache for
      foreign host fh includes two CC values, cache.CC[fh] and
      cache.CCsent[fh].  It may include other values, as discussed in
      Sections 4.3 and 4.4.  According to [STD-007], a TCP is not
      permitted to send a segment larger than the default size 536,
      unless it has received a larger value in an MSS (Maximum Segment
      Size) option.  This could constrain the client to use the default
      MSS of 536 bytes for every request.  To avoid this constraint, a
      T/TCP may cache the MSS option values received from remote hosts,
      and we allow a TCP to use a cached MSS option value for the
      initial SYN segment.

      When the client sends an initial <SYN> segment containing data, it
      does not have a send window for the server host.  This is not a
      great difficulty; we simply define a default initial window; our
      current suggestion is 4K bytes.  Such a non-zero default should be
      conditioned upon the existence of a cached connection count for
      the foreign host, so that data may be included on an initial SYN
      segment only if cache.CC[foreign host] is non-zero.
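
      One possible shape for a per-host cache entry, and for the two
      defaults just described, is sketched below.  The CacheEntry class,
      its mss field, and the function initial_syn_limits are
      assumptions made for illustration; only the values 536 and 4096
      and the condition on cache.CC come from the text.

```python
from dataclasses import dataclass

DEFAULT_MSS = 536              # default segment size when no MSS is known
DEFAULT_INITIAL_WINDOW = 4096  # suggested default initial window (4K bytes)


@dataclass
class CacheEntry:
    """Hypothetical per-host cache entry; zero means 'undefined'."""
    cc: int = 0      # cache.CC[fh]
    ccsent: int = 0  # cache.CCsent[fh]
    mss: int = 0     # cached MSS option value from this host, if any


def initial_syn_limits(entry: CacheEntry) -> tuple:
    """Return (segment size, data window) to use for an initial SYN."""
    mss = entry.mss if entry.mss else DEFAULT_MSS
    # Data may ride on the initial SYN only if a connection count is cached.
    window = DEFAULT_INITIAL_WINDOW if entry.cc != 0 else 0
    return mss, window
```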

      In TCP, the window is dynamically adjusted to provide congestion
      control/avoidance [Jacobson88].  It is possible that a particular
      path might not be able to absorb an initial burst of 4096 bytes
      without congestive losses.  If this turns out to be a problem, it
      should be possible to cache the congestion threshold for the path
      and use this value to determine the maximum size of the initial
      packet burst created by a request.

   3.2  New TCP Options

      Three new TCP options are defined: CC, CC.NEW, and CC.ECHO.  Each
      carries a connection count SEG.CC.  The complete rules for sending
      and processing these options are given in Section 3.4 below.





      CC Option

         Kind: 11

         Length: 6

            +--------+--------+--------+--------+--------+--------+
            |00001011|00000110|    Connection Count:  SEG.CC      |
            +--------+--------+--------+--------+--------+--------+
             Kind=11  Length=6

         This option may be sent in an initial SYN segment, and it may
         be sent in other segments if a CC or CC.NEW option has been
         received for this incarnation of the connection.  Its SEG.CC
         value is the TCB.CCsend value from the sender's TCB.

      CC.NEW Option

         Kind: 12

         Length: 6

            +--------+--------+--------+--------+--------+--------+
            |00001100|00000110|    Connection Count:  SEG.CC      |
            +--------+--------+--------+--------+--------+--------+
             Kind=12  Length=6

         This option may be sent instead of a CC option in an initial
         <SYN> segment (i.e., SYN but not ACK bit), to indicate that the
         SEG.CC value may not be larger than the previous value.  Its
         SEG.CC value is the TCB.CCsend value from the sender's TCB.

      CC.ECHO Option

         Kind: 13

         Length: 6

            +--------+--------+--------+--------+--------+--------+
            |00001101|00000110|    Connection Count:  SEG.CC      |
            +--------+--------+--------+--------+--------+--------+
             Kind=13  Length=6

         This option must be sent (in addition to a CC option) in a
         segment containing both a SYN and an ACK bit, if the initial
         SYN segment contained a CC or CC.NEW option.  Its SEG.CC value
         is the SEG.CC value from the initial SYN.






         A CC.ECHO option should be sent only in a <SYN,ACK> segment and
         should be ignored if it is received in any other segment.
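
      All three options share the same six-byte layout, so one encoder
      and one decoder suffice.  The sketch below assumes the connection
      count is carried in network byte order, as is usual for TCP
      header fields; the function names are hypothetical.

```python
import struct

CC, CC_NEW, CC_ECHO = 11, 12, 13  # option kinds defined above
OPTION_LENGTH = 6                 # kind + length + 4-byte connection count


def encode_cc_option(kind: int, cc: int) -> bytes:
    """Build the 6-byte wire form of a CC, CC.NEW, or CC.ECHO option."""
    if kind not in (CC, CC_NEW, CC_ECHO):
        raise ValueError("not a T/TCP option kind")
    if not 1 <= cc <= 0xFFFFFFFF:
        raise ValueError("connection count must be a non-zero 32-bit value")
    # "!" selects network byte order with no padding: 1 + 1 + 4 bytes.
    return struct.pack("!BBI", kind, OPTION_LENGTH, cc)


def decode_cc_option(data: bytes) -> tuple:
    """Parse a 6-byte option and return (kind, SEG.CC)."""
    kind, length, cc = struct.unpack("!BBI", data[:OPTION_LENGTH])
    if kind not in (CC, CC_NEW, CC_ECHO) or length != OPTION_LENGTH:
        raise ValueError("malformed T/TCP option")
    return kind, cc
```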

   3.3  Connection States

      T/TCP requires new connection states and state transitions.
      Figure 8 shows the resulting finite state machine; see [RFC-1379]
      for a detailed development.  If all state names ending in stars
      are removed from Figure 8, the state diagram reduces to the
      standard TCP state machine (see Figure 6 of [STD-007]), with two
      exceptions:

      *    STD-007 shows a direct transition from SYN-RECEIVED to FIN-
           WAIT-1 state when the user issues a CLOSE call.  This
           transition is suspect; a more accurate description of the
           state machine would seem to require the intermediate SYN-
           RECEIVED* state shown in Figure 8.

      *    In STD-007, a user CLOSE call in SYN-SENT state causes a
           direct transition to CLOSED state.  The extended diagram of
           Figure 8 forces the connection to open before it closes,
           since calling CLOSE to terminate the request in SYN-SENT
           state is normal behavior for a transaction client.  In the
           case that no data has been sent in SYN-SENT state, it is
           reasonable for a user CLOSE call to immediately enter CLOSED
           state and delete the TCB.

      Each of the new states in Figure 8 bears a starred name, created
      by suffixing a star onto a standard TCP state.  Each "starred"
      state bears a simple relationship to the corresponding "unstarred"
      state.

      o    SYN-SENT* and SYN-RECEIVED* differ from the SYN-SENT and
           SYN-RECEIVED states, respectively, in recording the fact that
           a FIN needs to be sent.

      o    The other starred states indicate that the connection is
           half-synchronized (hence, a SYN bit needs to be sent).















      ________      g        ________
     |        |<------------|        |
     | CLOSED |------------>| LISTEN |
     |________|  h    ------|________|
          |          /        |     |
          |         /        i|    j|
          |        /          |     |
         a|     a'/           |    _V______               ________
          |      /     j      |   |ESTAB-  |       e'    | CLOSE- |
          |     /  -----------|-->| LISHED*|------------>|   WAIT*|
          |    /  /           |   |________|             |________|
          |   /  /            |    |     |                |     |
          |  /  /             |    |    c|              d'|    c|
      ____V_V_ /       _______V    |   __V_____           |   __V_____
     | SYN-   |   b'  |  SYN-  |c  |  |ESTAB-  |  e       |  | CLOSE- |
     |   SENT |------>|RECEIVED|---|->|  LISHED|----------|->|   WAIT |
     |________|       |________|   |  |________|          |  |________|
        |               |          |     |                |        |
        |               |          |     |              __V_____   |
        |               |          |     |             | LAST-  |  |
      d'|             d'|        d'|    d|             |  ACK*  |  |
        |               |          |     |             |________|  |
        |               |          |     |                    |    |
        |               |    ______V_    |        ________    |c'  |d
        |          k    |   |  FIN-  |   |  e''' |        |   |    |
        |        -------|-->| WAIT-1*|---|------>|CLOSING*|   |    |
        |       /       |   |________|   |       |________|   |    |
        |      /        |          |     |            |       |    |
        |     /         |        c'|     |          c'|       |    |
     ___V___ /      ____V___       V_____V_       ____V___    V____V__
    | SYN-   | b'' |  SYN-  |  c  |  FIN-  | e'' |        |  | LAST-  |
    |  SENT* |---->|RECEIVD*|---->| WAIT-1 |---->|CLOSING |  |   ACK  |
    |________|     |________|     |________|     |________|  |________|
                                        |               |           |
                                       f|              f|         f'|
                                     ___V____       ____V___     ___V____
                                    |  FIN-  | e   |TIME-   | T |        |
                                    | WAIT-2 |---->|   WAIT |-->| CLOSED |
                                    |________|     |________|   |________|


                 Figure 8A: Basic T/TCP State Diagram











    ________________________________________________________________
   |                                                                |
   |        Label          Event / Action                           |
   |        _____          ________________________                 |
   |                                                                |
   |          a            Active OPEN / create TCB, snd SYN        |
   |          a'           Active OPEN / snd SYN                    |
   |          b            rcv SYN [no TAO]/ snd ACK(SYN)           |
   |          b'           rcv SYN [no TAO]/ snd SYN,ACK(SYN)       |
   |          b''          rcv SYN [no TAO]/ snd SYN,FIN,ACK(SYN)   |
   |          c            rcv ACK(SYN) /                           |
   |          c'           rcv ACK(SYN) / snd FIN                   |
   |          d            CLOSE / snd FIN                          |
   |          d'           CLOSE / snd SYN,FIN                      |
   |          e            rcv FIN / snd ACK(FIN)                   |
   |          e'           rcv FIN / snd SYN,ACK(FIN)               |
   |          e''          rcv FIN / snd FIN,ACK(FIN)               |
   |          e'''         rcv FIN / snd SYN,FIN,ACK(FIN)           |
   |          f            rcv ACK(FIN) /                           |
   |          f'           rcv ACK(FIN) / delete TCB                |
   |          g            CLOSE / delete TCB                       |
   |          h            passive OPEN / create TCB                |
   |          i (= b')     rcv SYN [no TAO]/ snd SYN,ACK(SYN)       |
   |          j            rcv SYN [TAO OK] / snd SYN,ACK(SYN)      |
   |          k            rcv SYN [TAO OK] / snd SYN,FIN,ACK(SYN)  |
   |          T            timeout=2MSL / delete TCB                |
   |                                                                |
   |                                                                |
   |          Figure 8B.  Definition of State Transitions           |
   |________________________________________________________________|

      This simple correspondence leads to an alternative state model,
      which makes it easy to incorporate the new states in an existing
      implementation.  Each state in the extended FSM is defined by the
      triplet:

          (old_state, SENDSYN, SENDFIN)

      where 'old_state' is a standard TCP state and SENDFIN and SENDSYN
      are Boolean flags (see Figure 9).  The SENDFIN flag is turned on
      (on the client side) by a SEND(...  EOF=YES) call, to indicate
      that a FIN should be sent in a state which would not otherwise
      send a FIN.  The SENDSYN flag is turned on when the TAO test
      succeeds, to indicate that the connection is only half
      synchronized; as a result, a SYN will be sent in a state which
      would not otherwise send a SYN.







       ________________________________________________________________
      |                                                                |
      |   New state:         Old_state:    SENDSYN:      SENDFIN:      |
      |  __________         __________      ______        ______       |
      |                                                                |
      |  SYN-SENT*     =>   SYN-SENT        FALSE          TRUE        |
      |                                                                |
      |  SYN-RECEIVED* =>   SYN-RECEIVED    FALSE          TRUE        |
      |                                                                |
      |  ESTABLISHED*  =>   ESTABLISHED      TRUE         FALSE        |
      |                                                                |
      |  CLOSE-WAIT*   =>   CLOSE-WAIT       TRUE         FALSE        |
      |                                                                |
      |  LAST-ACK*     =>   LAST-ACK         TRUE         FALSE        |
      |                                                                |
      |  FIN-WAIT-1*   =>   FIN-WAIT-1       TRUE         FALSE        |
      |                                                                |
      |  CLOSING*      =>   CLOSING          TRUE         FALSE        |
      |                                                                |
      |                                                                |
      |           Figure 9: Alternative State Definitions              |
      |________________________________________________________________|
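
      The table in Figure 9 maps directly onto a lookup structure.  The
      following sketch is one way an implementation might convert
      between the two models; the names STARRED_STATES, to_triplet, and
      to_name are hypothetical.

```python
# Figure 9 as data: starred state -> (old_state, SENDSYN, SENDFIN).
STARRED_STATES = {
    "SYN-SENT*":     ("SYN-SENT",     False, True),
    "SYN-RECEIVED*": ("SYN-RECEIVED", False, True),
    "ESTABLISHED*":  ("ESTABLISHED",  True,  False),
    "CLOSE-WAIT*":   ("CLOSE-WAIT",   True,  False),
    "LAST-ACK*":     ("LAST-ACK",     True,  False),
    "FIN-WAIT-1*":   ("FIN-WAIT-1",   True,  False),
    "CLOSING*":      ("CLOSING",      True,  False),
}


def to_triplet(state: str) -> tuple:
    """Map any state name to (old_state, SENDSYN, SENDFIN)."""
    # A standard (unstarred) state has both flags off.
    return STARRED_STATES.get(state, (state, False, False))


def to_name(old_state: str, sendsyn: bool, sendfin: bool) -> str:
    """Map a triplet back to its state name in the extended FSM."""
    if not sendsyn and not sendfin:
        return old_state
    name = old_state + "*"
    if STARRED_STATES.get(name) != (old_state, sendsyn, sendfin):
        raise ValueError("triplet does not appear in Figure 9")
    return name
```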


      Here is a more complete description of these Boolean variables.

      *    SENDFIN

           SENDFIN is turned on by the SEND(...EOF=YES) call, and turned
           off when FIN-WAIT-1 state is entered.  It may only be on in
           SYN-SENT* and SYN-RECEIVED* states.

           SENDFIN has two effects.  First, it causes a FIN to be sent
           on the last segment of data from the user.  Second, it causes
           the SYN-SENT[*] and SYN-RECEIVED[*] states to transition
           directly to FIN-WAIT-1, skipping ESTABLISHED state.

      *    SENDSYN

           The SENDSYN flag is turned on when an initial SYN segment is
           received and passes the TAO test.  SENDSYN is turned off when
           the SYN is acknowledged (specifically, when there is no RST
           or SYN bit and SND.UNA < SEG.ACK).

           SENDSYN has three effects.  First, it causes the SYN bit to
           be set in segments sent with the initial sequence number
           (ISN).  Second, it causes a transition directly from LISTEN
           state to ESTABLISHED*, if there is no FIN bit, or otherwise





           to CLOSE-WAIT*.  Finally, it allows data to be received and
           processed (passed to the application) even if the segment
           does not contain an ACK bit.

      According to the state model of the basic TCP specification
      [STD-007], the server side must explicitly issue a passive OPEN
      call, creating a TCB in LISTEN state, before an initial SYN may be
      accepted.  To accommodate truncation of TIME-WAIT state within
      this model, it is necessary to add the five "I-states" shown in
      Figure 10.  The I-states are:  LISTEN-LA, LISTEN-LA*, LISTEN-CL,
      LISTEN-CL*, and LISTEN-TW.  These are 'bridge states' between the
      state diagrams of two successive incarnations.  In Figure 10, D is
      the duration of the previous connection, i.e., the elapsed time
      since the connection opened.  The transitions labeled with
      lower-case letters are taken from Figure 8.

      Fortunately, many TCP implementations have a different user
      interface model, in which the user can issue a generic passive
      open ("listen") call; thereafter, when a matching initial SYN
      arrives, a new TCB in LISTEN state is automatically generated.
      With this user model, the I-states of Figure 10 are unnecessary.

      For example, suppose an initial SYN segment arrives for a
      connection that is in LAST-ACK state.  If this segment carries a
      CC option and if SEG.CC is greater than TCB.CCrecv in the existing
      TCB, the "q" transition shown in Figure 10 can be made directly
      from the LAST-ACK state.  That is, the previous TCB is processed
      as if an ACK(FIN) had arrived, causing the user to be notified of
      a successful CLOSE and the TCB to be deleted.  Then processing of
      the new SYN segment is repeated, using a new TCB that is generated
      automatically.  The same principle can be used to avoid
      implementing any of the I-states.





















 ______________________________
| P: Passive OPEN /            |
|                              |
| Q: Rcv SYN, special TAO test |                     d'|     d|
|     (see text) / Delete TCB, |    ________        ___V____  |
|     create TCB, snd SYN      |   |LISTEN- |  P   | LAST-  | |
|                              |   |   LA*  |<-----|  ACK*  | |
| Q': (same as Q) if D < MSL   |   |________|      |________| |
|                              |    |     |            |      |
| R: Rcv ACK(FIN) / Delete TCB,|   Q|   c'|          c'|      |
|     create TCB               |    |     |            |      |
|                              |    |  ___V____        V______V
| S': Active OPEN if D < MSL / |    | |LISTEN- |  P   | LAST-  |
|     Delete TCB, create TCB,  |    | |  LA    |<-----|   ACK  |
|     snd SYN.                 |    | |________|      |________|
|______________________________|    |  |     |            |
                                    | Q|    R|           f|
         ________        ________   |  |     |            |
   e''' |        |  P   |LISTEN- |  |  |     V            V
   ---->|CLOSING*|----->|   CL*  |  |  |   LISTEN       CLOSED
        |________|      |________|  |  |
             |            |   Q|    |  |
           c'|          c'|    V    V  V
             |            |   ESTABLISHED*
         ____V___         V_______
    e'' |        |  P    |LISTEN- |
   ---->|CLOSING |------>|   CL   |
        |________|       |________|
             |           R|     Q|
            f|            V      V
             |         LISTEN   ESTABLISHED*
         ____V___                _________
     e  |TIME-   |  P           | LISTEN- |
   ---->|   WAIT |------------->|    TW   |
        |________|              |_________|
        /     |                  |    |  |
     S'/     T|                 T|  Q'|  |S'
      |  _____V_      h     _____V__  |  V
      | |        |-------->|        | |  SYN-SENT
      | | CLOSED |<--------| LISTEN | |
      | |________|   ------|________| |
      |   |        /        |   j|    |
      |  a|     a'/        i|    V    V
      |   |      /          |   ESTABLISHED*
      V   V     V           V
        SYN-SENT           ...

             Figure 10: I-States for TIME-WAIT Truncation





   3.4  T/TCP Processing Rules

      This section summarizes the rules for sending and processing the
      T/TCP options.

      INITIALIZATION

         I1:  All cache entries cache.CC[*] and cache.CCsent[*] are
              undefined (zero) when a host system initializes, and CCgen
              is set to a non-zero value.

         I2:  A new TCB is initialized with TCB.CCrecv = 0 and
              TCB.CCsend = current CCgen value; CCgen is then
              incremented.  If the result is zero, CCgen is incremented
              again.


      SENDING SEGMENTS

         S1:  Sending initial <SYN> Segment

              An initial <SYN> segment is sent with either a CC option
              or a CC.NEW option.  If cache.CCsent[fh] is undefined or
              if TCB.CCsend < cache.CCsent[fh], then the option
              CC.NEW(TCB.CCsend) is sent and cache.CCsent[fh] is set to
              zero.  Otherwise, the option CC(TCB.CCsend) is sent and
              cache.CCsent[fh] is set to TCB.CCsend.
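
         Rule S1 reduces to a single decision.  The sketch below uses
         a hypothetical function name and compares connection counts
         as plain integers, exactly as the rule is written; handling
         of counter wraparound is outside its scope.

```python
def choose_syn_option(tcb_ccsend: int, cache_ccsent: int) -> tuple:
    """Apply rule S1; return (option name, new cache.CCsent[fh]).

    A cache_ccsent of zero means the cache entry is undefined.
    """
    if cache_ccsent == 0 or tcb_ccsend < cache_ccsent:
        # The count is not known to be larger than the last one sent.
        return "CC.NEW", 0
    return "CC", tcb_ccsend
```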

         S2:  Sending <SYN,ACK> Segment

              If the sender's TCB.CCrecv is non-zero, then a <SYN,ACK>
              segment is sent with both a CC(TCB.CCsend) option and a
              CC.ECHO(TCB.CCrecv) option.

         S3:  Sending Non-SYN Segment

              A non-SYN segment is sent with a CC(TCB.CCsend) option if
              the TCB.CCrecv value is non-zero, or if the state is SYN-
              SENT or SYN-SENT* and cache.CCsent[fh] is non-zero (this
              last is required to send CC options in the segments
              following the first of a multi-segment request message;
              see segment #2 in Figure 6).
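
         Rules S2 and S3 can be combined into one function that lists
         the options to attach to an outgoing segment.  This is a
         sketch under stated assumptions: the function name and the
         (name, value) representation of an option are hypothetical,
         and the initial SYN of rule S1 is not covered.

```python
def cc_options(is_syn_ack: bool, state: str, tcb_ccsend: int,
               tcb_ccrecv: int, cache_ccsent: int) -> list:
    """Return the T/TCP options for a <SYN,ACK> or a non-SYN segment."""
    if is_syn_ack:
        # S2: echo the peer's count only if one was received.
        if tcb_ccrecv != 0:
            return [("CC", tcb_ccsend), ("CC.ECHO", tcb_ccrecv)]
        return []
    # S3: non-SYN segment.
    in_syn_sent = state in ("SYN-SENT", "SYN-SENT*")
    if tcb_ccrecv != 0 or (in_syn_sent and cache_ccsent != 0):
        return [("CC", tcb_ccsend)]
    return []
```

         The second S3 condition is the one that lets segment #2 of
         Figure 6 carry a CC option before anything has been heard
         from the server.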

      RECEIVING INITIAL <SYN> SEGMENT

         Suppose that a server host receives a segment containing a SYN
         bit but no ACK bit in LISTEN, SYN-SENT, or SYN-SENT* state.






         R1.1:If the <SYN> segment contains a CC or CC.NEW option,
              SEG.CC is stored into TCB.CCrecv of the new TCB.

         R1.2:If the segment contains a CC option and if the local cache
              entry cache.CC[fh] is defined and if
              SEG.CC > cache.CC[fh], then the TAO test is passed and the
              connection is half-synchronized in the incoming direction.
              The server host replaces the cache.CC[fh] value by SEG.CC,
              passes any data in the segment to the user, and processes
              a FIN bit if present.

              Acknowledgment of the SYN is delayed to allow piggybacking
              on a response segment.

         R1.3:If SEG.CC <= cache.CC[fh] (the TAO test has failed), or if
              cache.CC[fh] is undefined, or if there is no CC option
              (but possibly a CC.NEW option), the server host proceeds
              with normal TCP processing.  If the connection was in
              LISTEN state, then the host executes a 3-way handshake
              using the standard TCP rules.  In the SYN-SENT or SYN-
              SENT* state (i.e., the simultaneous open case), the TCP
              sends ACK(SYN) and enters SYN-RECEIVED state.

         R1.4:If there is no CC option (but possibly a CC.NEW option),
              then the server host sets cache.CC[fh] undefined (zero).
              Receiving an ACK for a SYN (following application of rule
              R1.3) will update cache.CC[fh], by rule R3.2.
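
         Rules R1.1 through R1.4 can be read as a single function of
         the received option and the cached value.  The sketch below
         is illustrative only: the function name and the encoding of
         the option as a string or None are assumptions, and the
         comparison is the plain integer comparison used in the rules.

```python
from typing import Optional


def receive_initial_syn(option: Optional[str], seg_cc: int,
                        cache_cc: int) -> tuple:
    """Return (TAO passed, TCB.CCrecv, new cache.CC[fh]).

    option is "CC", "CC.NEW", or None; zero means 'undefined'.
    """
    # R1.1: remember the peer's connection count in the new TCB.
    tcb_ccrecv = seg_cc if option in ("CC", "CC.NEW") else 0

    # R1.2: the TAO test needs a CC option and a defined cache entry.
    if option == "CC" and cache_cc != 0 and seg_cc > cache_cc:
        return True, tcb_ccrecv, seg_cc

    # R1.3: TAO failed; the caller falls back to normal TCP processing.
    # R1.4: without a CC option the cache entry becomes undefined.
    new_cache_cc = cache_cc if option == "CC" else 0
    return False, tcb_ccrecv, new_cache_cc
```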

         Suppose that an initial <SYN> segment containing a CC or CC.NEW
         option arrives in an I-state (i.e., a state with a name of the
         form 'LISTEN-xx', where xx is one of TW, LA, LA*, CL, or CL*):

         R1.5:If the state is LISTEN-TW, then the duration of the
              current connection is compared with MSL.  If duration >
              MSL then send a RST:

                <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK>

              drop the packet, and return.

         R1.6:Perform a special TAO test: compare SEG.CC with
              TCB.CCrecv.

              If SEG.CC is greater, then processing is performed as if
              an ACK(FIN) had arrived:  signal the application that the
              previous close completed successfully and delete the
              previous TCB.  Then create a new TCB in LISTEN state and
              reprocess the SYN segment against the new TCB.





              Otherwise, silently discard the segment.
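
         The decision made by rules R1.5 and R1.6 can be sketched as
         follows.  The function name and the three result strings are
         hypothetical; the caller is assumed to carry out the action
         named by the result.

```python
def syn_in_i_state(state: str, seg_cc: int, tcb_ccrecv: int,
                   duration: float, msl: float) -> str:
    """Classify an initial SYN that arrives in one of the I-states."""
    # R1.5: a long-lived previous connection cannot be truncated.
    if state == "LISTEN-TW" and duration > msl:
        return "RST"
    # R1.6: special TAO test against the old TCB, not the cache.
    if seg_cc > tcb_ccrecv:
        return "ACCEPT"   # treat as ACK(FIN), then reprocess the SYN
    return "DISCARD"      # old duplicate
```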

      RECEIVING <SYN,ACK> SEGMENT

         Suppose that a client host receives a <SYN,ACK> segment for a
         connection in SYN-SENT or SYN-SENT* state.

         R2.1:If SEG.ACK is not acceptable (see [STD-007]) and
              cache.CCsent[fh] is non-zero, then simply drop the segment
              without sending a RST.  (The new SYN that the client is
              (re-)transmitting will eventually acknowledge any
              outstanding data and FIN at the server.)

         R2.2:If the segment contains a CC.ECHO option whose SEG.CC is
              different from TCB.CCsend, then the segment is
              unacceptable and is dropped.

         R2.3:If cache.CCsent[fh] is zero, then it is set to TCB.CCsend.

         R2.4:If the segment contains a CC option, its SEG.CC is stored
              into TCB.CCrecv of the TCB.
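
         Rules R2.1 through R2.4 apply in order, which the following
         sketch makes explicit.  The function name, the verdict
         strings, and the use of None for an absent option are
         assumptions; the case of an unacceptable ACK with an
         undefined cache entry is left to [STD-007] and is only
         reported here.

```python
from typing import Optional


def receive_syn_ack(ack_acceptable: bool, echo_cc: Optional[int],
                    seg_cc: Optional[int], tcb_ccsend: int,
                    tcb_ccrecv: int, cache_ccsent: int) -> tuple:
    """Return (verdict, TCB.CCrecv, cache.CCsent[fh]) for a <SYN,ACK>."""
    if not ack_acceptable:
        if cache_ccsent != 0:
            return "drop", tcb_ccrecv, cache_ccsent      # R2.1: no RST
        return "standard", tcb_ccrecv, cache_ccsent      # per [STD-007]
    if echo_cc is not None and echo_cc != tcb_ccsend:
        return "drop", tcb_ccrecv, cache_ccsent          # R2.2
    if cache_ccsent == 0:
        cache_ccsent = tcb_ccsend                        # R2.3
    if seg_cc is not None:
        tcb_ccrecv = seg_cc                              # R2.4
    return "accept", tcb_ccrecv, cache_ccsent
```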

      RECEIVING <ACK> SEGMENT IN SYN-RECEIVED STATE

         R3.1:If a segment contains a CC option whose SEG.CC differs
              from TCB.CCrecv, then the segment is unacceptable and is
              dropped.

         R3.2:Otherwise, a 3-way handshake has completed successfully at
              the server side.  If the segment contains a CC option and
              if cache.CC[fh] is zero, then cache.CC[fh] is replaced by
              TCB.CCrecv.

      RECEIVING OTHER SEGMENT

         R4:  Any other segment received with a CC option is
              unacceptable if SEG.CC differs from TCB.CCrecv.  However,
              a RST segment is exempted from this test.
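
         The acceptability test shared by R3.1 and R4, and the cache
         update of R3.2, might be sketched as below.  Both function
         names are hypothetical, and None stands for a segment that
         carries no CC option.

```python
from typing import Optional


def cc_acceptable(seg_cc: Optional[int], tcb_ccrecv: int,
                  is_rst: bool = False) -> bool:
    """R3.1 / R4: is the segment acceptable with respect to its CC?"""
    if is_rst:
        return True            # R4 exempts RST segments from the test
    return seg_cc is None or seg_cc == tcb_ccrecv


def cache_cc_after_handshake(seg_cc: Optional[int], tcb_ccrecv: int,
                             cache_cc: int) -> int:
    """R3.2: new cache.CC[fh] once the 3-way handshake completes."""
    if seg_cc is not None and cache_cc == 0:
        return tcb_ccrecv
    return cache_cc
```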

      OPEN REQUEST

         To allow truncation of TIME-WAIT state, the following changes
         are made in the state diagram for OPEN requests (see Figure
         10):

         O1.1:A new passive open request is allowed in any of the
              states: LAST-ACK, LAST-ACK*, CLOSING, CLOSING*, or TIME-
              WAIT.  This causes a transition to the corresponding I-





              state (see Figure 10), which retains the previous state,
              including the retransmission queue and timer.

         O1.2:A new active open request is allowed in TIME-WAIT or
              LISTEN-TW state, if the elapsed time since the current
              connection opened is less than MSL.  The result is to
              delete the old TCB and create a new one, send a new SYN
              segment, and enter SYN-SENT or SYN-SENT* state (depending
              upon whether or not the SYN segment contains a FIN bit).

      Finally, T/TCP has a provision to improve performance for the case
      of a client that "sprays" transactions rapidly using many
      different server hosts and/or ports.  If TCB.CCrecv in the TCB is
      non-zero (and still assuming that the connection duration is less
      than MSL), then the TIME-WAIT delay may be set to min(K*RTO,
      2*MSL).  Here RTO is the measured retransmission timeout time and
      the constant K is currently specified to be 8.
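
      The shortened TIME-WAIT delay is simple arithmetic.  The sketch
      below assumes all times are in seconds and that the normal
      delay is 2*MSL, as in transition T of Figure 8B; the function
      name and argument order are hypothetical.

```python
K = 8  # multiplier currently specified for the shortened delay


def time_wait_delay(rto: float, msl: float, tcb_ccrecv: int,
                    duration: float) -> float:
    """Return the TIME-WAIT delay for a closing connection."""
    if tcb_ccrecv != 0 and duration < msl:
        # The peer sent CC options and the connection was short-lived.
        return min(K * rto, 2 * msl)
    return 2 * msl
```

      For example, with RTO = 1.5 and MSL = 120, a short transaction
      with a T/TCP peer waits 12 seconds instead of 240.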

   3.5  User Interface

      STD-007 defines a prototype user interface ("transport service")
      that implements the virtual circuit service model [STD-007,
      Section 3.8].  One addition to this interface is required for
      transaction processing: a new Boolean flag "end-of-file" (EOF),
      added to the SEND call.  A generic SEND call becomes:

        Send

          Format:  SEND (local connection name, buffer address,
               byte count, PUSH flag, URGENT flag, EOF flag [,timeout])

      The following text would be added to the description of SEND in
      [STD-007]:

          If the EOF (End-Of-File) flag is set, any remaining queued
          data is pushed and the connection is closed.  Just as with the
          CLOSE call, all data being sent is delivered reliably before
          the close takes effect, and data may continue to be received
          on the connection after completion of the SEND call.

      Figure 8A shows a skeleton sequence of user calls by which a
      client could initiate a transaction.  The SEND call initiates a
      transaction request to the foreign socket (host and port)
      specified in the passive OPEN call.  The predicate "recv_EOF"
      tests whether or not a FIN has been received on the connection;
      this might be implemented using the STATUS command of [STD-007],
      or it might be implemented by some operating-system-dependent
      mechanism.  When recv_EOF returns TRUE, the connection has been
      completely closed and the client end of the connection is in
      TIME-WAIT state.

     __________________________________________________________________
    |                                                                  |
    |                                                                  |
    | OPEN(local_port, foreign_socket, PASSIVE) -> conn_name;          |
    |                                                                  |
    | SEND(conn_name, request_buffer, length,                          |
    |                                    PUSH=YES, URG=NO, EOF=YES);   |
    |                                                                  |
    | while (not recv_EOF(conn_name)) {                                |
    |                                                                  |
    |    RECEIVE(conn_name, reply_buffer, length) -> count;            |
    |                                                                  |
    |    <Process reply_buffer.>                                       |
    | }                                                                |
    |                                                                  |
    |                                                                  |
    |             Figure 8A: Client Side User Interface                |
    |__________________________________________________________________|

      If a client is going to send a rapid series of such requests to
      the same foreign_socket, it should use the same local_port for
      all.  This will allow truncation of TIME-WAIT state.  Otherwise,
      it could leave local_port wild, allowing TCP to choose successive
      local ports for each call, realizing that each transaction may
      leave behind a significant control block overhead in the kernel.

      Figure 8B shows a basic sequence of server calls.  The server
      application waits for a request to arrive and then reads and
      processes it until a FIN arrives (recv_EOF returns TRUE).  At this
      time, the connection is half-closed.  The SEND call used to return
      the reply completes the close in the other direction.  It should
      be noted that the use of SEND(... EOF=YES) in Figure 8B instead of
      a SEND, CLOSE sequence is only an optimization; it allows
      piggybacking the FIN in order to minimize the number of segments.
      It should have little effect on transaction latency.

     __________________________________________________________________
    |                                                                  |
    |                                                                  |
    | OPEN(local_port, ANY_SOCKET, PASSIVE) -> conn_name;              |
    |                                                                  |
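    | <Wait for connection to open.>                                   |
    |                                                                  |
    | STATUS(conn_name) -> foreign_socket                              |
    |                                                                  |
    | while (not recv_EOF(conn_name)) {                                |
    |                                                                  |
    |    RECEIVE(conn_name, request_buffer, length) -> count;          |
    |                                                                  |
    |    <Process request_buffer.>                                     |
    | }                                                                |
    |                                                                  |
    | <Compute reply and store into reply_buffer.>                     |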
    |                                                                  |
    | SEND(conn_name, reply_buffer, length,                            |
    |                                  PUSH=YES, URG=NO, EOF=YES);     |
    |                                                                  |
    |                                                                  |
    |             Figure 8B: Server Side User Interface                |
    |__________________________________________________________________|
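
      The call pattern of Figures 8A and 8B can be approximated with
      the Berkeley sockets interface, as in the Python sketch below.
      This is an illustration under stated assumptions, not a T/TCP
      implementation: standard sockets have no EOF flag on a send
      call, so the sketch uses the SEND, CLOSE sequence that the text
      describes as the unoptimized equivalent; it runs over a local
      socket pair rather than a TCP connection; and the helper names
      and the upper-casing "service" are hypothetical.

```python
import socket
import threading


def serve_one(conn: socket.socket) -> None:
    """Server side: read the request until EOF, then reply and close."""
    request = bytearray()
    while True:
        chunk = conn.recv(4096)          # RECEIVE
        if not chunk:                    # empty read: peer has closed its side
            break
        request.extend(chunk)
    reply = bytes(request).upper()       # stand-in for computing the reply
    conn.sendall(reply)                  # SEND ...
    conn.close()                         # ... followed by CLOSE


def transact(conn: socket.socket, request: bytes) -> bytes:
    """Client side: send the request, half-close, read the reply to EOF."""
    conn.sendall(request)                # SEND
    conn.shutdown(socket.SHUT_WR)        # close our direction only
    reply = bytearray()
    while True:
        chunk = conn.recv(4096)          # RECEIVE until the peer closes
        if not chunk:
            break
        reply.extend(chunk)
    conn.close()
    return bytes(reply)


def demo() -> bytes:
    """Run one request/response exchange over a local socket pair."""
    client, server = socket.socketpair()
    worker = threading.Thread(target=serve_one, args=(server,))
    worker.start()
    reply = transact(client, b"hello")
    worker.join()
    return reply
```

      In both helpers an empty read plays the role of the recv_EOF
      predicate.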


4.  IMPLEMENTATION ISSUES

   4.1  RFC-1323 Extensions

      A recently-proposed set of TCP enhancements [RFC-1323] defines a
      Timestamps option, which carries two 32-bit timestamp values.
      This option is used to accurately measure round-trip time (RTT).
      The same option is also used in a procedure known as "PAWS"
      (Protect Against Wrapped Sequence) to prevent erroneous data
      delivery due to a combination of old duplicate segments and
      sequence number reuse at very high bandwidths.  The approach to
      transactions specified in this memo is independent of the RFC-1323
      enhancements, but implementation of RFC-1323 is desirable for all
      TCP's.

      The RFC-1323 extensions share several common implementation issues
      with the T/TCP extensions.  Both require that TCP headers carry
      options.  Accommodating options in TCP headers requires changes in
      the way that the maximum segment size is determined, to prevent
      inadvertent IP fragmentation.  Both require some additional state
      variables in the TCB, which may or may not cause implementation
      difficulties.
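
      To make the effect on segment size concrete, the following
      Python sketch subtracts the padded option bytes from the MSS.
      It is only an illustration: the function names are
      hypothetical, and the option lengths (10 bytes for the RFC-1323
      Timestamps option, 6 bytes for a T/TCP CC option) are
      assumptions of the example rather than values stated in this
      section.

```python
MAX_OPTION_BYTES = 40   # a TCP header is at most 60 bytes, 20 of them fixed

TIMESTAMPS_LEN = 10     # assumed length of the RFC-1323 Timestamps option
CC_LEN = 6              # assumed length of a CC, CC.NEW, or CC.ECHO option


def padded_option_bytes(option_lengths):
    """Total option space, rounded up to a 4-byte boundary."""
    total = sum(option_lengths)
    padded = (total + 3) // 4 * 4
    if padded > MAX_OPTION_BYTES:
        raise ValueError("options do not fit in the TCP header")
    return padded


def max_payload(mss, option_lengths):
    """Data bytes that fit in one segment once options are accounted for."""
    return mss - padded_option_bytes(option_lengths)
```

      Under these assumptions, a segment that carries both a
      Timestamps and a CC option on a path with an MSS of 1460 bytes
      has room for 1444 bytes of data.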



   4.2  Minimal Packet Sequence

      Most TCP implementations will require some small modifications to
      allow the minimal packet sequence for a transaction shown in
      Figure 2.

      Many TCP implementations contain a mechanism to delay
      acknowledgments of some subset of the data segments, to cut down
      on the number of acknowledgment segments and to allow piggybacking
      on the reverse data flow (typically character echoes).  To obtain
      minimal packet exchanges for transactions, it is necessary to
      delay the acknowledgment of some control bits, in an analogous
      manner.  In particular, the <SYN,ACK> segment that is to be sent
      in ESTABLISHED* or CLOSE-WAIT* state should be delayed.  Note that
      the amount of delay is determined by the minimum RTO at the
      transmitter; it is a parameter of the communication protocol,
      independent of the application.  We propose to use the same delay
      parameter (and if possible, the same mechanism) that is used for
      delaying data acknowledgments.

      To get the FIN piggy-backed on the reply data (segment #3 in
      Figure 2), those implementations that have an implied PUSH=YES on
      all SEND calls will need to augment the user interface so that
      PUSH=NO can be set for transactions.

   4.3  RTT Measurement

      Transactions introduce new issues into the problem of measuring
      round trip times [Jacobson88].

      (a)  With the minimal 3-segment exchange, there can be exactly one
           RTT measurement in each direction for each transaction.
           Since dynamic estimation of RTT cannot take place within a
           single transaction, it must take place across successive
           transactions.  Therefore, caching the measured RTT and RTT
           variance values is essential for transaction processing; in
           normal virtual circuit communication, such caching is only
           desirable.

      (b)  At the completion of a transaction, the values for RTT and
           RTT variance that are retained in the cache must be some
           average of previous values with the values measured during
           the transaction that is completing.  This raises the question
           of the time constant for this average; quite different
           dynamic considerations hold for transactions than for file
           transfers, for example.

      (c)  An RTT measurement by the client will yield the value:

                  T = RTT + min(SPT, ATO),

           where SPT (server processing time) was defined in the
           introduction, and ATO is the timeout period for sending a
           delayed ACK.  Thus, the measured RTT includes SPT, which may
           be arbitrarily variable; however, the resulting variability
           of the measured T cannot exceed ATO. (In a popular TCP
           implementation, for example, ATO = 200ms, so that the
           variance of SPT makes a relatively small contribution to the
           variance of RTT.)

      (d)  Transactions sample the RTT at random times, which are
           determined by the client and the server applications rather
           than by the network dynamics.  When there are long pauses
           between transactions, cached path properties will be poor
           predictors of current values in the network.

      Thus, the dynamics of RTT measurement for transactions differ from
      those for virtual circuits.  RTT measurements should work
      correctly for very short connections but reduce to the current TCP
      algorithms for long-lasting connections.  Further study of this
      issue is needed.
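
      Points (b) and (c) can be sketched in Python as follows.  The
      sketch is hedged in two ways.  First, the smoothing gains of
      1/8 and 1/4 are those commonly used with the estimator of
      [Jacobson88]; this memo leaves the appropriate time constant
      for transactions open, so they are placeholders.  Second, the
      class and function names, and the choice to seed the variance
      with half of the first sample, are assumptions of the example.

```python
from dataclasses import dataclass


@dataclass
class RttCacheEntry:
    """Per-host RTT state carried from one transaction to the next."""
    srtt: float = 0.0
    rttvar: float = 0.0
    valid: bool = False


def measured_sample(rtt: float, spt: float, ato: float) -> float:
    """Value seen by the client: T = RTT + min(SPT, ATO)."""
    return rtt + min(spt, ato)


def update_cache(entry: RttCacheEntry, sample: float,
                 gain: float = 0.125, var_gain: float = 0.25) -> RttCacheEntry:
    """Fold the single sample from a finished transaction into the cache."""
    if not entry.valid:
        # First transaction with this host: seed the estimator (assumption).
        entry.srtt = sample
        entry.rttvar = sample / 2
        entry.valid = True
        return entry
    err = sample - entry.srtt
    entry.srtt += gain * err
    entry.rttvar += var_gain * (abs(err) - entry.rttvar)
    return entry
```

      Because min(SPT, ATO) is bounded by ATO, even a very slow
      server adds at most ATO to each sample fed to update_cache.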

   4.4  Cache Implementation

      This extension requires a per-host cache of connection counts.
      This cache may also contain values of the smoothed RTT, RTT
      variance, congestion avoidance threshold, and MSS values.
      Depending upon the implementation details, it may be simplest to
      build a new cache for these values; another possibility is to use
      the routing cache that should already be included in the host
      [RFC-1122].

      Implementation of the cache may be simplified because it is
      consulted only when a connection is established; thereafter, the
      CC values relevant to the connection are kept in the TCB.  This
      means that a cache entry may be safely reused during the lifetime
      of a connection, avoiding the need for locking.
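
      One possible shape for such a cache is sketched below in
      Python.  All class, field, and function names are hypothetical,
      and keying the cache by a host address string is an assumption
      of the example.  The sketch shows the property described above:
      the cache is read once when the connection is opened, and the
      TCB then works from its own copy.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class HostCacheEntry:
    cc: int = 0                      # cache.CC[fh]; zero means undefined
    cc_sent: int = 0                 # cache.CCsent[fh]; zero means undefined
    srtt: Optional[float] = None     # smoothed RTT
    rttvar: Optional[float] = None   # RTT variance
    ssthresh: Optional[int] = None   # congestion avoidance threshold
    mss: Optional[int] = None


@dataclass
class Tcb:
    foreign_host: str
    cc_send: int
    cc_recv: int = 0
    srtt: Optional[float] = None
    rttvar: Optional[float] = None


class HostCache:
    def __init__(self) -> None:
        self._entries: Dict[str, HostCacheEntry] = {}

    def lookup(self, fh: str) -> HostCacheEntry:
        """Return the entry for foreign host fh, creating an empty one."""
        return self._entries.setdefault(fh, HostCacheEntry())


def open_connection(cache: HostCache, fh: str, cc_send: int) -> Tcb:
    """Consult the cache once and copy the values the connection needs."""
    entry = cache.lookup(fh)
    return Tcb(foreign_host=fh, cc_send=cc_send,
               srtt=entry.srtt, rttvar=entry.rttvar)
```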

   4.5  CPU Performance

      TCP implementations are customarily optimized for streaming of
      data at high speeds, not for opening or closing connections.
      Jacobson's Header Prediction algorithm [Jacobson90] handles the
      simple common cases of in-sequence data and ACK segments when
      streaming data.  To provide good performance for transactions, an
      implementation might be able to do an analogous "header
      prediction" specifically for the minimal request and the response
      segments.

      The overhead of UDP provides a lower bound on the overhead of
      TCP-based transaction processing.  It will probably not be
      possible to reach this bound for TCP transactions, since opening a
      TCP connection involves creating a significant amount of state
      that is not required by UDP.

      McKenney and Dove [McKenney92] have pointed out that transaction
      processing applications of TCP can stress the performance of the
      demultiplexing algorithm, i.e., the algorithm used to look up the
      TCB when a segment arrives.  They advocate the use of hash-table
      techniques rather than a linear search.  The effect of
      demultiplexing on performance may become especially acute for a
      transaction client using the extended TCP described here, due to
      TCB's left in TIME-WAIT state.  A high rate of transactions from a
      given client will leave a large number of TCB's in TIME-WAIT
      state, until their timeout expires.  If the TCP implementation
      uses a linear search for demultiplexing, all of these control
      blocks must be traversed in order to discover that the new
      association does not exist.  In this circumstance, performance of
      a hash table lookup should not degrade severely due to
      transactions.
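
      The difference between the two lookup strategies can be seen in
      a small Python sketch.  The data layout (a list of TCBs versus
      a dictionary keyed by the four-tuple of addresses and ports),
      the addresses, and the function names are illustrative
      assumptions; no real TCP implementation is being described.

```python
from typing import Dict, List, Optional, Tuple

# (local address, local port, foreign address, foreign port)
ConnKey = Tuple[str, int, str, int]


def linear_lookup(tcbs: List[Tuple[ConnKey, str]],
                  key: ConnKey) -> Tuple[Optional[str], int]:
    """Scan the TCB list; return (state or None, comparisons made)."""
    comparisons = 0
    for candidate, state in tcbs:
        comparisons += 1
        if candidate == key:
            return state, comparisons
    return None, comparisons


def hashed_lookup(table: Dict[ConnKey, str], key: ConnKey) -> Optional[str]:
    """Look the key up in a hash table, independent of table size on average."""
    return table.get(key)


def time_wait_backlog(count: int) -> List[Tuple[ConnKey, str]]:
    """Build TCBs left in TIME-WAIT by a client that used successive ports."""
    return [(("10.0.0.1", 1024 + i, "10.0.0.2", 80), "TIME-WAIT")
            for i in range(count)]
```

      With 1000 TCBs left in TIME-WAIT, linear_lookup makes 1000
      comparisons before concluding that a new association does not
      exist, while hashed_lookup answers with a single table probe on
      average.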

   4.6  Pre-SYN Queue

      Suppose that segment #1 in Figure 4 is lost in the network; when
      segment #2 arrives in LISTEN state, it will be ignored by the TCP
      rules (see [STD-007] p.66, "fourth other text or control"), and
      must be retransmitted.  It would be possible for the server side
      to queue any ACK-less data segments received in LISTEN state and
      to "replay" the segments in this queue when a SYN segment does
      arrive.  A data segment received with an ACK bit, which is the
      normal case for existing TCP's, would still generate a RST
      segment.

      Note that queueing segments in LISTEN state is different from
      queueing out-of-order segments after the connection is
      synchronized.  In LISTEN state, the sequence number corresponding
      to the left window edge is not yet known, so that the segment
      cannot be trimmed to fit within the window before it is queued.
      In fact, no processing should be done on a queued segment while
      the connection is still in LISTEN state.  Therefore, a new "pre-
      SYN queue" would be needed.  A timeout would be required, to flush
      the pre-SYN queue in case a SYN segment was not received.

      Although implementation of a pre-SYN queue is not difficult in BSD
      TCP, its limited contribution to throughput probably does not
      justify the effort.
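
      For readers who want to see the mechanism spelled out, here is
      a minimal Python sketch of a pre-SYN queue.  The Segment fields,
      the action strings, and the explicit "now" argument used in
      place of a timer are all assumptions of the example.  Held
      segments are stored untouched, since no processing is done on
      them while the connection is in LISTEN state.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple


@dataclass
class Segment:
    seq: int
    syn: bool = False
    ack: bool = False
    data: bytes = b""


class PreSynQueue:
    """Holds ACK-less data segments that arrive in LISTEN state."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self._held: Deque[Tuple[float, Segment]] = deque()

    def expire(self, now: float) -> None:
        """Flush segments whose SYN did not arrive within the timeout."""
        while self._held and now - self._held[0][0] >= self.timeout:
            self._held.popleft()

    def receive_in_listen(self, seg: Segment,
                          now: float) -> Tuple[str, List[Segment]]:
        """Return an action and the segments to process, in order."""
        self.expire(now)
        if seg.ack:
            # A segment carrying ACK still generates a RST in LISTEN state.
            return "RST", []
        if seg.syn:
            # Replay the held segments behind the SYN that unblocks them.
            replay = [seg] + [held for _, held in self._held]
            self._held.clear()
            return "PROCESS", replay
        # ACK-less data ahead of its SYN: hold it, unprocessed.
        self._held.append((now, seg))
        return "QUEUED", []
```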

6.  ACKNOWLEDGMENTS

   I am very grateful to Dave Clark for pointing out bugs in RFC-1379
   and for helping me to clarify the model.  I also wish to thank Greg
   Minshall, whose probing questions led to further elucidation of the
   issues in T/TCP.

7.  REFERENCES

    [Jacobson88] Jacobson, V., "Congestion Avoidance and Control", ACM
      SIGCOMM '88, Stanford, CA, August 1988.

    [Jacobson90] Jacobson, V., "4BSD Header Prediction", Comp Comm
      Review, v. 20, no. 2, April 1990.

    [McKenney92]  McKenney, P., and K. Dove, "Efficient Demultiplexing
      of Incoming TCP Packets", ACM SIGCOMM '92, Baltimore, MD, October
      1992.

    [RFC-1122]  Braden, R., Ed., "Requirements for Internet Hosts --
      Communications Layers", STD-3, RFC-1122, USC/Information Sciences
      Institute, October 1989.

    [RFC-1323]  Jacobson, V., Braden, R., and D. Borman, "TCP Extensions
      for High Performance", RFC-1323, LBL, USC/Information Sciences
      Institute, Cray Research, May 1992.

    [RFC-1379]  Braden, R., "Extending TCP for Transactions --
      Concepts", RFC-1379, USC/Information Sciences Institute,
      November 1992.

    [ShankarLee93]  Shankar, A. and D. Lee, "Modulo-N Incarnation
      Numbers for Cache-Based Transport Protocols", Report CS-TR-3046/
      UIMACS-TR-93-24, University of Maryland, March 1993.

    [STD-007]  Postel, J., "Transmission Control Protocol - DARPA
      Internet Program Protocol Specification", STD-007, RFC-793,
      USC/Information Sciences Institute, September 1981.


APPENDIX A.  ALGORITHM SUMMARY

   This appendix summarizes the additional processing rules introduced
   by T/TCP.  We define the following symbols:

   Options

       CC(SEG.CC):         TCP Connection Count (CC) Option
       CC.NEW(SEG.CC):     TCP CC.NEW option
       CC.ECHO(SEG.CC):    TCP CC.ECHO option

           Here SEG.CC is the option value in the segment.

   Per-Connection State Variables in TCB

       CCsend:             CC value to be sent in segments
       CCrecv:             CC value to be received in segments
       Elapsed:            Duration of connection

   Global Variables:

       CCgen:              CC generator variable
       cache.CC[fh]:       Cache entry: Last CC value received.
       cache.CCsent[fh]:   Cache entry: Last CC value sent.


   PSEUDO-CODE SUMMARY:

   Passive OPEN => {
       Create new TCB;
   }

   Active OPEN => {
       <Create new TCB>
       CCrecv = 0;
       CCsend = CCgen;
       If (CCgen == 0xffffffff) then Set CCgen = 1;
                                else Set CCgen = CCgen + 1.
       <Send initial {SYN} segment (see below)>
   }


   Send initial {SYN} segment => {

       If (cache.CCsent[fh] == 0 OR CCsend < cache.CCsent[fh] ) then {

             Include CC.NEW(CCsend) option in segment;
             Set cache.CCsent[fh] = 0;
       }
       else {

             Include CC(CCsend) option in segment;
             Set cache.CCsent[fh] = CCsend;
       }
   }


   Send {SYN,ACK} segment => {

       If (CCrecv != 0) then
             Include CC(CCsend), CC.ECHO(CCrecv) options in segment.
   }


   Receive {SYN} segment in LISTEN, SYN-SENT, or SYN-SENT* state => {

       If state == LISTEN then {
             CCrecv = 0;
             CCsend = CCgen;
             If (CCgen == 0xffffffff) then Set CCgen = 1;
                                      else Set CCgen = CCgen + 1.
       }

       If (Segment contains CC option  OR
             Segment contains CC.NEW option) then
                   Set CCrecv = SEG.CC.

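       if (Segment contains CC option  AND
             cache.CC[fh] != 0  AND
                   SEG.CC > cache.CC[fh] ) then {  /* TAO Test OK */

             Set cache.CC[fh] = CCrecv;
             <Mark connection half-synchronized>
             <Process data and/or FIN and return>
       }


       If (Segment does not contain CC option)  then
             Set cache.CC[fh] = 0;

       <Do normal TCP processing and return>.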
   }

   Receive {SYN} segment in LISTEN-TW, LISTEN-LA, LISTEN-LA*, LISTEN-CL,
       or LISTEN-CL* state => {

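       If (Segment contains CC option AND CCrecv != 0) then {

             If (state == LISTEN-TW AND Elapsed > MSL) then
                   <Send RST, drop segment, and return>.

             if (SEG.CC > CCrecv) then {
                   <Implicitly ACK FIN and data in retransmission queue>;
                   <Close and delete TCB>;
                   <Reprocess segment>.
                           /* Expect to match new TCB
                            * in LISTEN state.
                            */
             }
       }
       else
             <Treat same as normal TCP and return>.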
   }


   Receive {SYN,ACK} segment => {

       if (Segment contains CC.ECHO option  AND
                   SEG.CC != CCsend) then
             <Send a reset and discard segment; return>.

       if (Segment contains CC option) then {
             Set CCrecv = SEG.CC.

             if (cache.CC[fh] is undefined) then
                   Set cache.CC[fh] = CCrecv.
       }
   }


   Send non-SYN segment => {

       if (CCrecv != 0  OR
             (cache.CCsent[fh] != 0  AND
              state is SYN-SENT or SYN-SENT*)) then
                  Include CC(CCsend) option in segment.
   }


   Receive non-SYN segment in SYN-RECEIVED state => {

       if (Segment contains CC option  AND  RST bit is off) {
               if (SEG.CC != CCrecv)  then
                     <Segment is unacceptable; drop it and send an
                      ACK segment, as in normal TCP>.

               if (cache.CC[fh] is undefined)  then
                     Set cache.CC[fh] = CCrecv.
       }
   }


   Receive non-SYN segment in (state >= ESTABLISHED) => {

       if (Segment contains CC option  AND  RST bit is off) {
               if (SEG.CC != CCrecv)  then
                     <Segment is unacceptable; drop it and send an
                      ACK segment, as in normal TCP>.
       }
   }
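
   As a reading aid, three pieces of the summary above (advancing
   CCgen, choosing the option for an initial SYN, and the TAO test on
   a SYN received in LISTEN state) are transcribed into Python below.
   This is a sketch under stated assumptions, not a reference
   implementation: the class and method names are hypothetical,
   options are modelled as (name, value) tuples, the caches are
   dictionaries in which a missing or zero entry means "undefined",
   and CC values are compared as plain integers exactly as the
   pseudo-code is written.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

CC_MAX = 0xFFFFFFFF
Option = Tuple[str, int]   # ("CC" or "CC.NEW", value)


@dataclass
class TTcpHost:
    cc_gen: int = 1
    cache_cc: Dict[str, int] = field(default_factory=dict)       # cache.CC[fh]
    cache_cc_sent: Dict[str, int] = field(default_factory=dict)  # cache.CCsent[fh]

    def next_cc(self) -> int:
        """Return CCgen for use as CCsend, then advance it, skipping zero."""
        cc = self.cc_gen
        self.cc_gen = 1 if self.cc_gen == CC_MAX else self.cc_gen + 1
        return cc

    def initial_syn_option(self, fh: str, cc_send: int) -> Option:
        """Pick CC.NEW or CC for the initial SYN sent to foreign host fh."""
        sent = self.cache_cc_sent.get(fh, 0)
        if sent == 0 or cc_send < sent:
            self.cache_cc_sent[fh] = 0
            return ("CC.NEW", cc_send)
        self.cache_cc_sent[fh] = cc_send
        return ("CC", cc_send)

    def tao_test(self, fh: str, option: Optional[Option]) -> bool:
        """Handle the CC-related part of a SYN received in LISTEN state.

        Returns True when the TAO test succeeds; False means normal
        TCP processing follows.
        """
        if option is None or option[0] != "CC":
            # Segment does not contain a CC option.
            self.cache_cc[fh] = 0
            return False
        seg_cc = option[1]
        cached = self.cache_cc.get(fh, 0)
        if cached != 0 and seg_cc > cached:
            self.cache_cc[fh] = seg_cc
            return True
        return False
```

   A repeated SYN carrying the same CC value fails the test because
   the comparison is strict, which is the property the TAO test
   relies on to reject old duplicates.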


Security Considerations

   Security issues are not discussed in this memo.

Author's Address

   Bob Braden
   University of Southern California
   Information Sciences Institute
   4676 Admiralty Way
   Marina del Rey, CA 90292

   Phone: (310) 822-1511
   EMail: Braden@ISI.EDU