Internet Draft
INTERNET-DRAFT                                                  Lan Wang
Expires: October 1999                                     Andreas Terzis
<draft-wang-rsvp-state-compression-00.txt>                   Lixia Zhang
                                                                    UCLA
                                                              April 1999
         A Proposal for reducing RSVP Refresh Overhead using State
                                Compression

                  <draft-wang-rsvp-state-compression-00.txt>

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   As a soft-state protocol, RSVP specifies that each RSVP node sends
   periodic RSVP messages for each existing RSVP session.  The overhead
   due to such periodic refreshes goes up linearly with the number of
   active RSVP sessions.  One can reduce the overhead by using a longer
   refresh period, which unfortunately leads to longer delays in re-
   synchronizing RSVP state.  To overcome the dilemma, this draft
   describes the use of a "state-compression" approach to soft-state
   protocol refreshes.  In the absence of state changes this compression
   mechanism has a constant overhead while retaining the soft-state
   nature of RSVP.

draft-wang-rsvp-state-compression-00.txt                        [Page 1]

INTERNET-DRAFT                                                April 1999

1.  Introduction

   As a soft-state setup protocol, RSVP specifies that each RSVP node
   sends control messages for each active RSVP session to each of its
   neighbors serving the same session.  These control messages are sent
   immediately whenever a change of state occurs, for example when a new
   session starts, when the TSPEC or RSPEC parameters of an existing
   session change, or when a route change occurs that affects the paths
   of existing RSVP sessions.  In addition, an RSVP node also sends
   periodic refresh messages to its neighbors about all existing
   sessions.  Such periodic refreshes serve the purpose of keeping
   consistent RSVP reservation state along data flow paths.  There are
   potentially a number of causes that can make the reservation state
   between neighbor RSVP nodes out-of-synchronization. For example:

      - An RSVP message that carries state-change information gets lost.
      - A rare local event inadvertently changed RSVP state values.
      - Other locally induced state changes occur, for example when the
        current Kerberos key expires, a new key is obtained which needs
        to be carried to the next hop by the POLICY_DATA object in the
        next refresh.

   Instead of attempting to identify all the potential causes of changes
   and handle them individually, RSVP simply uses periodic refreshes to
   persistently re-synchronize the reservation state along the path.
   This approach assures consistent reservation state along the data
   flow path, even when the routers do not have an exhaustive list of
   all the possible causes of state inconsistency or how to correct them
   individually.

   The drawback of this simple approach is that its overhead grows
   linearly with the number of active RSVP sessions.  A common mechanism
   to reduce the periodic refresh overhead is by using a longer refresh
   period (example: RTCP, SDR announcements).  A longer refresh period,
   however, leads to longer delays in re-synchronizing state.

   In this draft we propose a "compression" mechanism for refresh
   overhead reduction.  The goal of this scheme is to improve the
   efficiency of soft-state protocols in their communication behavior,
   not to further simplify RSVP.  Rather, we applied *additional*
   processing (compression) to the protocol state information so that
   the number of refresh messages is reduced from being proportional to
   the number of sessions to being proportional to the number of
   neighbor nodes, while keeping the soft-state nature of RSVP.  When a
   state change is notified by explicit RSVP messages, or detected by
   the periodic refreshes, the cost of updating the "compressed" state
   is proportional to LOG(T) where T is the total number of RSVP
   sessions.  Section 7 presents a detailed overhead analysis of our

draft-wang-rsvp-state-compression-00.txt                        [Page 2]

INTERNET-DRAFT                                                April 1999

   compression mechanism.

2.  Refresh Overhead Reduction by State Compression

   Even in the absence of new control information generated by sources
   or destinations, an RSVP node sends one message per refresh period to
   its neighboring nodes for each of all the RSVP sessions to keep the
   state consistency.  Instead of sending one message per refresh period
   per session, one way to reduce the overhead is to compress the state
   of all RSVP sessions to a single signature, which can then be sent to
   neighbor nodes in place of all the "raw" refresh messages.  All we
   need is an algorithm that can produce a one-to-one mapping from the
   given set of RSVP state to a signature.

   In this draft we propose to use the MD5 algorithm [MD5] for RSVP
   state compression.  As stated in [MD5], "it is computationally
   infeasible to produce two messages having the same message digest, or
   to produce any message having a given pre-specified target message
   digest". We can therefore conclude, with a high level of assurance,
   that no two sets of different RSVP states should result in the same
   signature, and whenever the computed signatures at two neighboring
   nodes differ, the two nodes will be able to detect the differences
   from the states of shared RSVP sessions.  Whenever a state change is
   detected (the digests of two neighbors disagree), raw RSVP messages
   are transmitted to re-synchronize the state.

   As a further optimization, we also add an acknowledgment component
   (ACK) to the RSVP protocol.  The ACK is used for the following two
   purposes.

     1. To minimize the re-synchronization delay after an explicit state
        change request.  A node can request an ACK for each RSVP message
        that carries state-change information, and promptly retransmit
        the message until an acknowledgment is received.
     2. To minimize the re-synchronization cost when inconsistent state
        between RSVP neighboring nodes is detected but the exact
        difference is unknown.  A node should request an ACK for each
        digest sent, and it should log all the state changes since the
        last agreed-upon digest (which indicates that the two nodes have
        consistent state at that point in time).  When an inconsistency
        is detected, the first recovery effort of a node is to send
        those latest changes.

   We propose to compute the MD5 signature in a structured way (see
   Section 3.2), so that in cases where the retransmission of latest
   changes does not result in state re-synchronization, two neighbors

draft-wang-rsvp-state-compression-00.txt                        [Page 3]

INTERNET-DRAFT                                                April 1999

   can efficiently locate the portion of inconsistent state and send raw
   RSVP messages to re-synchronize.

2.1.  Definitions

      - Input Message

           The input to the MD5 Algorithm.  In this document an input
           message is some portion of the RSVP state stored at the local
           node.

      - MD5 Signature

           The result of the computation of the MD5 algorithm on an
           input message.

      - Digest

           A set of MD5 Signatures that represents a compressed version
           of the RSVP state shared between two neighboring RSVP nodes.

   The following definitions are taken from RFC 2209.

      - PSB

           Path State Block. Each PSB holds path state for a particular
           (session, sender) pair, defined by SESSION and
           SENDER_TEMPLATE objects, respectively.

      -RSB

           Reservation State Block. Each RSB holds a reservation request
           that arrived in a particular RESV message, corresponding to
           the triple:  (session, next hop, Filter_spec_list).

2.2.  Limitations of the New Approach

   The ability for two RSVP neighbors to exchange a digest instead of
   raw RSVP messages relies on the assumption that the two nodes know
   precisely all the RSVP sessions that go through these two nodes in
   sequence.  Whenever a unicast route change occurs, the upstream node

draft-wang-rsvp-state-compression-00.txt                        [Page 4]

INTERNET-DRAFT                                                April 1999

   must be able to detect the change and synchronize the state with the
   new downstream node (as well as tearing down the session with the old
   downstream neighbor). Similar actions need to be taken whenever a
   change in multicast routing occurs, which includes not only route
   changes but also multicast group membership changes (either addition
   or deletion).

   When two neighbor nodes of an RSVP session are directly connected,
   RSVP can expect to receive notification of route changes through the
   interface to the local routing module.  When the two nodes are
   interconnected through a non-RSVP cloud, however, there seems no
   known way to reliably detect all routing changes without the per
   session periodic refreshes.

   Furthermore, due to the fact that each multicast RSVP session may
   involve a different set of up- and down-stream neighbors than any
   other multicast session, it is difficult to compute a digest for
   aggregate multicast sessions.  Thus this first attempt focuses on
   refresh reduction for unicast RSVP sessions only.

   Finally, because an RSVP session is identified by the destination
   address, a unicast RSVP session can be a many-to-one session with
   multiple senders, thus the reservation parameters may change when
   crossing a merging point that has multiple upstream branches towards
   different senders. Consequently, computing the digest for each RSVP
   neighbor requires that the local node store data value that is not in
   the current protocol specification.  We further discuss the
   difficulty in handling multicast and many-to-one unicast session, as
   well as possible solutions, in appendix A.

   Therefore this first draft of the refresh compression mechanism
   limits the problem space to handling one-to-one unicast sessions
   whose paths do not cross non-RSVP clouds.  An RSVP node resorts to
   the current refresh scheme for all the sessions that have the
   "non_RSVP_cloud" bit set, or that have a multicast destination
   address.

2.3.  Benefits and Constraints of the New Scheme

   The design of the new scheme aims at the following features

      - efficient state re-synchronization when two RSVP neighbors
        discover state inconsistency.
      - choice of individual nodes to use either original RSVP refreshes
        or the refresh reduction, or switch in between over time to
        achieve the best tradeoff.

draft-wang-rsvp-state-compression-00.txt                        [Page 5]

INTERNET-DRAFT                                                April 1999

      - backward compatibility with the current RSVP specification.
      - incremental digest computation when part of the session(s)
        changes state.

   We also recognize the following constraints of the new scheme.

      - Generally speaking, individual RSVP sessions follow different
        paths.  However to reduce the refresh overhead to one refresh
        message per refresh period per neighbor, a digest represents the
        compressed state of aggregate sessions.  The only way to do so
        is to first group all RSVP sessions by each neighbor, then
        compute the digest.  That is, refreshing of compressed state
        must be done on a per neighbor basis.
      - In its current design, RSVP PATH messages carry the destination
        address of the session, thus all refresh messages automatically
        follow the latest unicast or multicast routing changes
        (including multicast group membership changes).  Although RSVP
        has an interface to the local routing module for prompt
        notification of all the changes to minimize the adaptation
        delay, all RSVP reservations eventually converge on new paths
        even in the absence of such notifications.  When we move from
        per session refresh to per hop refresh by digest, however, the
        protocol loses its automatic routing adaptation.  Thus the new
        scheme relies entirely on specific notification to adjust to any
        routing changes.
      - To detect all the RSVP state changes, the digest must be
        computed over the RSVP sessions state rather than over the
        messages that are exchanged. Because those messages represent
        partial, but not all causes of state changes, digests of the
        messages alone are inadequate in detecting all the state
        changes; two neighbor nodes may end up in inconsistent state
        even when all RSVP messages are delivered correctly.

   One challenge we encountered many times during proposing the
   compression refresh mechanism is that when an RSVP node processes the
   objects carried in a refresh message, this processing step may take
   other factors as part of the input, and produce new object values to
   be passed on to next neighbor; what is saved at the local node is the
   value of the received objects.  Change of RSPEC is a typical example
   in a multicast RSVP session.  The same issue exists even when we
   limit our consideration to unicast sessions only; ADSPEC and
   POLICY_DATA are two such objects whose new values sent to the next
   hop may be different from that stored at the local node.  To compare
   digest between neighbor node pairs a node must keep the latest values
   of all the objects it has sent to the neighbor.  This requirement
   leads to a necessary change to the current protocol specification.

draft-wang-rsvp-state-compression-00.txt                        [Page 6]

INTERNET-DRAFT                                                April 1999

3.  Digest Mechanism Description

3.1.  State Organization for Digest Computation

   RSVP maintains path state and reservation state for each active
   session [RFC2205].  A session is uniquely identified by a session
   object which contains the IP destination address, protocol ID and
   optionally a destination port number of the associated data packets.
   A Path State Block (PSB) is comprised of a sender template (i.e. IP
   address and port number of the sender), a Tspec that describes the
   sender's traffic characteristics, and possibly objects for policy
   control and advertisements.  A Reservation State Block (RSB) contains
   filterspecs (i.e. sender templates) of the senders for which the
   reservation is intended, the reservation style and a flowspec that
   quantifies the reservation.  It may also contain objects for policy
   control and confirmation.

   To "compress" a set of RSVP states into an MD5 signature, neighboring
   RSVP routers must order their states using the same rules since
   different orders of the states may result in different signatures.
   We propose that sessions be ordered by the increasing order of IP
   address, protocol ID and port number in session objects.  Ordering
   rules for multicast and many-to-one unicast sessions are described in
   appendix A.

3.2.  Structured Digest Computation

   As the number of sessions increases, a node may experience state
   changes more frequently and, with a flat structure, the overhead of
   digest computation increases linearly with the number of sessions.
   Therefore we need a scheme to facilitate partial state changes by
   incremental MD5 computation.

   As mentioned in section 3.1, a node orders all the RSVP sessions it
   shares with each neighbor node.  The node first computes a MD5
   signature for each session.  Table 1 shows the objects that should be
   included in the digest computation.  These objects should be fed into
   the computation procedure in the order as presented in the table.
   Although PSB and RSB also contain a few other fields such as incoming
   interface and outgoing interfaces, those fields have only local
   meaning to a specific router and therefore are excluded from the
   digest computation.

draft-wang-rsvp-state-compression-00.txt                        [Page 7]

INTERNET-DRAFT                                                April 1999

        RSVP Structure ||    Objects to Include
      _________________||_______________________________________
                       ||
           Session     ||  session object
      _________________||_______________________________________
                       ||
             PSB       ||  sender template, sender tspec,
                       ||  adspec, policy
      _________________||_______________________________________
                       ||
             RSB       ||  filterspec, flowspec, style, policy
      _________________||_______________________________________

           Table 1. Objects to Include in Digest Computation

   Assuming that N is the number of MD5 signatures that can be packed in
   one Digest message, a node constructs a digest tree for each of its
   neighbors in the following way.  First, it computes the signature of
   each session, making these signatures as the leaf node of the tree.
   If there are more than N leaf-level signatures, it groups each N
   signatures in one block and compute a level-1 signature for the
   group.  With a total of T sessions, this grouping results in
   Ceiling[T/N] level-1 signatures.  If the number of level-1 signatures
   is still larger than N, the node continues on to group each of N
   level-1 signatures to compute a level-2 signature to get
   Ceiling[T/N^2] level-2 signatures.  Let Ci be the number of level-i
   signatures, we repeat the grouping until Ci is smaller than or equal
   to N.

                              o   o   o  digest
                             /|\ /|\ /|\
                             ooo ooo ooo  level-1 signature
                            /|\ ....../|\
                           o o o ....o o o  leaf-level signatures

                    Figure 1. A Digest Tree with N=3

   There are two ways to define the value for N.  The first way is to
   define it statically in the specification.  One would like the
   largest N value that can fit a digest in the link MTU, but the value
   of MTU may be different at different places.  A second approach is to
   let two RSVP neighbor nodes negotiate N's value at startup time and
   then keep it constant.  This second approach may require new
   handshake message exchanges between two neighbor RSVP nodes.  In this
   draft we assume a configured value for N.

draft-wang-rsvp-state-compression-00.txt                        [Page 8]

INTERNET-DRAFT                                                April 1999

   A digest needs to be recomputed when the following events occur:
     o  a new session is added;
     o  a session is changed, i.e. a state is added to a session, or a
        state is modified or deleted from a session;
     o  an existing session is deleted.
   The cost of digest computation is summarized in section 7.

3.3.  Neighbor Data Structure

   A router needs to compute two digests for each neighbor, one digest
   for the states that are refreshed by messages from that neighbor and
   one for the states that the local sends refresh messages to the
   neighbor.  We refer to the two digests InDigest and OutDigest,
   respectively.  The OutDigest is used in the Digest message and the
   InDigest is sent in a DigestErr message to the corresponding
   neighbor.  Section 4 specifies the format of the new RSVP message
   types proposed for our scheme.

   To avoid having to traverse all the RSVP sessions to compute the
   digest for a specific neighbor, a router maintains a linked list that
   points to all the sessions shared with the neighbor. An example of
   the per neighbor structure is shown in Figure 2.

   The Neighbor structure holds all the information needed to send and
   receive digests to/from a RSVP neighbor node. A description of each
   field in the Neighbor structure follows:

   IP Address

      Holds the IP address of the interface this neighbor uses to
      communicate with the local node.

   OutSession

      OutSession contains the sessions to be included in the computation
      of the digest the local node is sending to this neighbor. The
      sessions included are those that have PSBs whose next downstream
      hop is the neighbor in question and RSBs that have that neighbor
      as the immediate upstream hop.

   OutDigest

      The digest sent to this neighbor.

   RefreshOutTimer

      Contains the next time a digest should be sent to this neighbor.

draft-wang-rsvp-state-compression-00.txt                        [Page 9]

INTERNET-DRAFT                                                April 1999

                            Session Struct       Session Struct
                                   ^                    ^
                                   |                    |
   Neighbor Struct                 |                    | SessDigest Struct
   +----------------- +    +-----------------+    +-----------------+
   | IP Address       |    | Session*        |    | Session*        |
   +----------------- +    +-----------------+    +-----------------+
   | OutSession       |--->| Next SessDigest |--->| Next SessDigest |---->...
   +----------------- +    +-----------------+    +-----------------+
   | OutDigest        |    | P/R             |    | P/R             |
   +----------------- +    +-----------------+    +-----------------+
   | RefreshOutTimer  |    | MD5 Signature   |    | MD5 Signature   |
   +----------------- +    +-----------------+    +-----------------+
   | TimestampLastSent|
   +------------------+
   |TimestampLastAcked|
   +------------------+
   | OutDigestTimeout |
   +------------------+
   | InSession        |---> Linked List of SessDigest structs
   +------------------+
   | InDigest         |
   +------------------+
   | CleanupInTimer   |
   +------------------+
   |   ...            |
   +------------------+

                   Figure 2.  Neighbor Structure

   TimestampLastSent

      Contains the timestamp of the latest digest sent to this neighbor.

   TimestampLastAcked

      This is the timestamp of the latest digest acknowledged by the
      neighbor, i.e. the time instant when state, between the local node
      and this neighbor, was last synchronized.

   OutDigestTimeout

      Contains the timeout period in milliseconds for digests sent to
      this neighbor. If an acknowledgment is not received by the end of
      this period, the digest should be retransmitted.

draft-wang-rsvp-state-compression-00.txt                       [Page 10]

INTERNET-DRAFT                                                April 1999

   InSession

      A pointer to a linked list of SessDigest objects. This list
      contains all the sessions that are refreshed by digests received
      from this neighbor. Specifically, this includes sessions whose
      PSBs have this neighbor as previous hop and sessions whose RSBs
      have this neighbor as next hop.

   InDigest

      The digest expected to be received from this neighbor.

   CleanupInTimer

      If no digest is received from this neighbor by the time contained
      in the CleanupInTimer field, all associated state should be
      cleaned up.

   The SessDigest structure contains information about a session that
   has the neighbor in question as a Next Hop. The SessDigest structure
   has the following fields:

   Session*

      A pointer to the portion of the RSVP state that holds information
      for the session represented by this structure.

   P/R

      This field indicates whether the PSB or the RSB of a Session will
      be used in the signature operation.

   MD5 signature

      Contains the MD5 signature for this session.

3.4.  Neighbor Discovery

   To use the compressed refresh scheme, a router needs to discover all
   the neighbors that are capable of exchanging digests (compression-
   capable).  For this purpose, when a RSVP router starts sending (raw)
   RSVP messages of a session, it should request that the neighbor(s)
   acknowledge the messages by including a TIMESTAMP object with the
   ACK_Requested flag on (see section 4).  If the local router receives
   an ACK message from a neighbor whose address is not currently on the

draft-wang-rsvp-state-compression-00.txt                       [Page 11]

INTERNET-DRAFT                                                April 1999

   Neighbor Structure list, it has discovered a new digest-capable
   neighbor. If it receives an error message from a neighbor who does
   not understand the new TIMESTAMP object, it discovers a neighbor that
   is compression-incapable.

   When a non-RSVP cloud exists between two RSVP neighbor nodes,
   although they can discover each other using the acknowledgments
   during initial message exchanges, the upstream node may not be able
   to detect a change of the downstream neighbor caused by a unicast
   route change.  As shown in Figure 3, node A originally has a
   downstream neighbor B for a particular unicast RSVP session.  After a
   route change, node C becomes A's downstream neighbor.  However, since
   A's outgoing interface is unchanged, A will not notice the route
   change, hence it will continue to include the path state of sender S
   when calculating the digest to B. Node B will not be aware of the
   change either as long as A sends B the same digest.  C may never get
   a path message from A.  Therefore, the digest scheme cannot be used
   crossing non-RSVP clouds until an effective way of detecting route
   changes is found.  RSVP routers can detect the existence of non-RSVP
   clouds between neighbors by comparing the TTL value in the IP header
   with the Send_TTL in the RSVP common header, as described in
   [RFC2209].

                           original route
                         ___________ --->
                 ->   ->|           |-- B ----| ->
          S - - --- A --| non-RSVP  |         |------- D
                        |  cloud    |         |
                        |___________|-- C ----|
                            new route --->

           Figure 3. An RSVP Session with a non-RSVP cloud

3.5.  Digest Refreshes

   RSVP states are refreshed by regular RSVP messages as well as by
   digest Messages.  However, different from a regular RSVP message
   which refreshes the state of one RSVP session, a digest message may
   carry multiple MD5 signature values, each representing the compressed
   state of multiple sessions.  If an individual signature in the digest
   matches the corresponding local value, all the RSVP states covered by
   that signature should be refreshed.  That is the local node should
   carry out the same operation as if one refresh message for each of
   all the covered sessions is received.

draft-wang-rsvp-state-compression-00.txt                       [Page 12]

INTERNET-DRAFT                                                April 1999

   The refresh rate of digest messages can be different from the refresh
   rates of individual sessions covered by that digest. For digest
   messages, the default refresh rate is 30 seconds as RFC 2205
   suggests.  If a different refresh rate is used, it SHOULD be carried
   in the TIME_VALUES object of the digest message. After a digest is
   received, all sessions covered by this digest MUST be refreshed at
   this rate.

4.  New RSVP Message Types and Objects

4.1.  TIMESTAMP Object

   TIMESTAMP Class =  Value to be assigned by IANA of form 0bbbbbbb

   TIMESTAMP object

   Class = TIMESTAMP Class, C_Type = 1

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Flags     |                  reserved                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Timestamp                                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Flags: 8 bits

      0x80 = ACK_Requested flag

      Indicates that the sender is requesting a message acknowledgment.
      Acknowledgments MUST be silently ignored when they are sent in
      response to messages whose ACK_Requested flag is not set.  This
      flag MUST be set when the message is a Digest message.  It MUST
      not be set in an ACK or DigestErr message.

   Reserved: 24 bits

      This field is reserved.  It MUST be set to zero on transmission
      and MUST be ignored on receipt.

draft-wang-rsvp-state-compression-00.txt                       [Page 13]

INTERNET-DRAFT                                                April 1999

   Timestamp: 32 bits

      It is the middle 32 bits (lower 16 bits of the integer part and
      higher 16 bits of the fractional part) of the NTP time value when
      the message is sent. If NTP time is not available, system timer
      can also be used.  When combined with the message generator's IP
      address, it uniquely identifies a message.

   We chose to use a timestamp to identify a message because it is
   always increasing and so a sender does not have to check if the value
   is in use or has been used before.  It also helps the receiver to
   identify which of the RSVP messages for the same state is the most
   recent one.  However, depending on the processing speed of the router
   and the existing timer granularity, we may have two consecutive
   messages with the same timestamp value.  Therefore, we define the
   timestamp to be max(current_timestamp, last_used_timestamp+1), where
   last_used_timestamp is the timestamp of the latest RSVP message that
   contains a TIMESTAMP object.

4.2.  DIGEST Object

   DIGEST Class =  Value to be assigned by IANA of form 10bbbbbb

   DIGEST object

   Class = DIGEST Class, C_Type = 1

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Level    |                      Group                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Reserved            |     Number of Signatures      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   //                   signature (16 bytes)                      //
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Level: 8 bits

      This is the level of this set of signatures in the corresponding
      digest tree.  The level of a signature is defined as follows:
        1. Leaf signatures are assigned level 0;

draft-wang-rsvp-state-compression-00.txt                       [Page 14]

INTERNET-DRAFT                                                April 1999

        2. The level of a non-leaf signature is (1 + the level of its
           immediate children).

   Group: 24 bits

      This is the group this set of signatures belong to (recall we
      divide signatures into groups of size N and compute a signature
      for each group to build a tree).  We number the groups at the same
      level of a digest tree increasingly from left to right and the
      left-most group is assigned group 0.  Level and group uniquely
      identify the location of a set of signatures in a digest tree.

   Reserved: 16 bits

      This field is reserved.  It MUST be set to zero on transmission
      and MUST be ignored on receipt.

   Number of Signatures: 16 bits

      Number of MD5 signatures contained in this object.

   Signature:

      a 16-byte MD5 signature.

4.3.  Digest Message

   The format of a Digest message is as follows:

      ::=  [  ]
                            [  ]

   A Digest message has a common RSVP header with the message type set
   to 14.  It MUST contain a TIMESTAMP object with the ACK_Requested
   flag set and a digest object.  The TIME_VALUES object may be included
   so that a router can inform its neighbor what timer value to use to
   time out states when using a digest scheme.

draft-wang-rsvp-state-compression-00.txt                       [Page 15]

INTERNET-DRAFT                                                April 1999

4.4.  ACK Message

   The format of an ACK message is as follows:

      ::=  [  ]
                        [  ... ]

   An ACK message has a message type of 15.  It MUST contain at least
   one TIMESTAMP object.  With multiple TIMESTAMP objects, the ACK
   message can be used to acknowledge multiple messages.

4.5.  DigestErr Message

   A DigestErr message has the following format:

      ::=  [  ]
                              

   The message type of DigestErr message is 16.  It MUST contain a
   TIMESTAMP object and a DIGEST object.  A DigestErr message acts as a
   negative acknowledgment to a Digest message.  The TIMESTAMP object
   identifies the Digest message being acknowledged.  The DIGEST object
   carries the local digest value of the sender of the DigestErr
   message.

5.  RSVP Message Operations

5.1.  Non-digest RSVP Messages

   A node sends non-digest RSVP messages in the same way as before, with
   the new option of asking for an ACK.  To request an acknowledgment,
   the node sends the message with a TIMESTAMP object and sets the
   ACK_Requested bit in the object.  It also sets a retransmission
   timer.  When the neighbor receives a message that requires
   acknowledgment, it processes the message as usual and sends back an
   ACK message which contains the timestamp of the message being
   acknowledged.  If there is any change in the state, it then updates
   the signature.

   If no ACK is received after timeout, the node retransmits the message

draft-wang-rsvp-state-compression-00.txt                       [Page 16]

INTERNET-DRAFT                                                April 1999

   up to a configured number of times.  The retransmission period SHOULD
   be shorter than regular refresh period and be proportional to the
   round-trip time between the two neighboring nodes.  Each
   retransmission MUST carry the same timestamp as that in the original
   message.  If the corresponding state is changed before an ACK is
   received, the original message becomes obsolete and no longer needs
   to be retransmitted.  If changing of the state triggers a new
   message, the timestamp of the new message should be
   max(current_timestamp, last_used_timestamp + 1). If the node does not
   receive an ACK after all the retransmissions, this message needs to
   be retransmitted later in the recovery process (see section 6.2).

5.2.  Digest Message

   Once a compression-capable node discovers a compression-capable
   neighbor, instead of sending one refresh message for each PATH/RESV
   state, it sends a digest, in one RSVP message per refresh period to
   each of the neighbors. Special treatments for RSVP states containing
   ADSPEC and POLICY_DATA objects are discussed in section 5.3.  Each
   digest MUST carry a timestamp object with the ACK_Requested flag set.

   When the neighbor receives the Digest message, it first locates the
   corresponding set of signatures in the local digest tree and then
   performs a binary comparison between each signature in the message
   and that in the local digest tree.

      o If any of the signatures does not match the local one, it should
        respond with a DigestErr message containing the local digest
        value after processing the Digest message.  Note that if there
        is any matching signature, RSVP states represented by that
        signature should be refreshed.
      o Otherwise, it is in a consistent state with the sender, so it
        sends an ACK message and refreshes all the states shared with
        the sender.

   The retransmission of a Digest message is handled in the same way as
   non-digest messages.  However, as in the original RSVP design where
   an RSVP node never stops sending refresh messages for each active
   session, a node should not stop sending digest refreshes even when a
   previous one failed to receive an acknowledgment.   If a neighbor
   node crashed and becomes alive again, it will find the digest value
   different from its own and the two routers will start
   resynchronization step.  When the digest value is changed, any
   pending retransmission of the obsolete Digest message should be
   canceled.

draft-wang-rsvp-state-compression-00.txt                       [Page 17]

INTERNET-DRAFT                                                April 1999

5.3.  Refresh of ADSPEC and POLICY_DATA

   An RSVP message may carry an ADSPEC object that contains resource
   information used by receivers to predict the end-to-end service.
   This object is updated hop-by-hop to gather information along a
   route.  Since it is opaque to RSVP, RSVP simply records the received
   ADSPEC in the corresponding state and, whenever it needs to send a
   refresh message for a state, it calls the traffic control module with
   the recorded ADSPEC to get an updated ADSPEC object. Therefore, the
   ADSPEC object sent by a node may be different from that received by
   the node.  POLICY_DATA object is handled in the same way except that
   it is submitted to the policy control module.

   To handle ADSPEC or POLICY_DATA objects correctly with the digest
   scheme, we need to solve the following two problems.  First, for each
   session with ADSPEC or POLICY_DATA object, we must trigger periodic
   updating of ADSPEC or POLICY_DATA objects.  Second, since ADSPEC and
   POLICY_DATA are opaque to RSVP, RSVP cannot tell whether the objects
   returned by the traffic control (or policy control) module changes
   after each call.

   For the first problem, we treat digest refreshes the same way as
   regular RSVP refresh messages.  In other words, before a node sends a
   digest, the node calls corresponding modules to update these objects.

   There are three possible solutions to the second problem.  First, a
   node can simply assume that the object returned by the traffic
   control (or policy control) module changes after each call at refresh
   period, which represents a change in RSVP state and hence a regular
   RSVP PATH state should be sent to carry the new object value to the
   next hop.  This approach requires no changes to RSVP specification
   but it essentially falls back to the original RSVP refresh mechanism
   whenever ADSPEC or POLICY_DATA object exists in an RSVP session.

   The second option is to modify the RSVP/traffic control and
   RSVP/policy control interfaces so that they return a change bit that
   indicates whether the updated object is different from the previously
   returned one.  The previously returned object can be an argument
   passed to the interface.  For example, the original RSVP/traffic
   control interface for ADSPEC is

          Call: TC_Advertise( Interface, Adspec,
                              Non_RSVP_Hop_flag ) -> New_Adspec,

   where Adspec is the ADSPEC from received RSVP messages.  We can
   change it to

draft-wang-rsvp-state-compression-00.txt                       [Page 18]

INTERNET-DRAFT                                                April 1999

          Call: TC_Advertise( Interface, Adspec, Old_Adspec
                              Non_RSVP_Hop_flag ) -> New_Adspec, change bit,

   and Old_Adspec is the ADSPEC object returned by the previous call.
   When the changed bit is set, a regular RSVP refresh must be sent.

   The third option is to let RSVP perform a binary comparison between
   the previously and newly returned objects after each call to traffic
   control (policy control) module to find out whether the object has
   changed.  This third option can be a short-term fix while we work on
   the proposed changes in RSVP interfaces to other modules.

5.4.  Error Message for Unknown Object

   The class number of TIMESTAMP object is of the form 0bbbbbbb.
   According to RFC 2205, the receiver of a message containing this
   object should send back an error message (PathErr or ResvErr
   depending on the original message) if it does not understand the
   object.  Upon receiving the error message, the sender should do the
   following things:

     1. stop sending digests to the receiver;
     2. withhold sending TIMESTAMP object in future messages until it
        receives from that neighbor a message containing a TIMESTAMP
        object;
     3. continue regular RSVP refreshes.

6.  RSVP State Re-synchronization

6.1.  Discovery of Inconsistency

   Two RSVP neighbors may become out-of-sync due to a number of reasons.

      - A state-changing RSVP message is lost, and the sender did not
        ask for ACK.
      - A neighbor crashed and lost part or all of its state.
      - Other unknown reasons.

   Receipt of a DigestErr message indicates inconsistency between two
   nodes.  The timestamp and digest value in the DigestErr message help
   the two neighbors to localize the problem.

draft-wang-rsvp-state-compression-00.txt                       [Page 19]

INTERNET-DRAFT                                                April 1999

      o If the timestamp acknowledged is smaller than the timestamp of
        the last Digest message sent (TimestampLastSent), this error
        message is for an obsolete message.  This message should be
        ignored since it may not represent the current state of the
        neighbor.
      o If they are equal, the node should start the recovery process
        described in the next section.

6.2.  Recovery

   To help state synchronization, each node needs to keep a log of the
   RSVP messages sent after the last synchronization point
   (TimestampLastAcked) for each neighbor.  We record the following
   information for a message:

      o message type
      o a flag indicating whether the message contains a TIMESTAMP
        object
      o timestamp of the message (if no TIMESTAMP object is present in
        the message, use the timestamp that could be assigned when the
        message is sent)
      o a flag indicating whether ACK is required for the message
      o a pointer to the corresponding RSVP state
      o objects required to reconstruct the message that cannot be
        obtained from the RSVP state (e.g. ERROR_SPEC).  Note that if
        the message is a teardown message, all the objects in the
        message should be stored here since the corresponding state has
        already been deleted.)

   When a node receives an ACK message, it first checks if the message
   acknowledges the latest Digest message.  If so, it should update
   TimestampLastAcked to the value carried by the TIMESTAMP object in
   the ACK message and remove all the RSVP messages sent before
   TimestampLastAcked. Otherwise, only the acked message needs to be
   removed. Any message rendered obsolete by a new RSVP message should
   be removed from the log.

   If the node detects any discrepancy with a neighbor, it can
   retransmit all the messages on the log followed by a Digest message.
   It may set the ACK_Requested flag in these messages and a message
   should be removed from the log if the corresponding ack message is
   received.  There is a tradeoff between the number of messages
   contained in the log and the bandwidth used to acknowledge messages.
   The more messages with ACK_Requested set, the less messages we may
   have on the log.  However, this requires more ACK messages sent from
   the neighbor.  In the extreme case where the log is full, older
   messages should be purged to accommodate new RSVP messages.

draft-wang-rsvp-state-compression-00.txt                       [Page 20]

INTERNET-DRAFT                                                April 1999

   If retransmission of the logged messages cannot lead to consistency,
   i.e.  the node still receives a DigestErr, regular refresh messages
   should be sent for the problematic part of the digest tree.  One
   possible cause of the problem is that some of the "synchronized"
   states were accidentally modified or erased by the neighbor, while
   RSVP messages for these states have already been deleted from the
   log.  Another possible scenario is that some messages were not
   received by the neighbor and they are removed from the log because of
   limited logging space.

   To find the states that are inconsistent, the node first compares the
   digest value in the DigestErr message with its own.  When it finds
   the first mismatching signature (call it S1), it sends a Digest
   message containing the signatures used to compute S1.  A DigestErr is
   expected for this Digest message since at least one of the children
   signatures does not match.  The node again looks for the first
   mismatching signature (S2) in the DigestErr message and sends the
   children of S2 in a Digest message.  This procedure is repeated until
   the leaf session (Sn) causing the problem is found.

   Following refreshing of Sn, the node sends the top-level digest to
   the neighbor.  If the inconsistency is caused by a single session,
   walking down the tree once can solve the problem.  Otherwise, the
   node will receive another DigestErr and it has to continue to
   identify the other problematic sessions by traversing the tree.

   There is a tradeoff between the latency of the recovery procedure and
   the transmission efficiency.  For example, if the tree has many
   levels, we need many RTTs to exchange the signatures to find the
   leaf-level sessions that contribute to the inconsistency.  However,
   if we have sufficient resources, we can stop at an intermediate level
   of the tree and refresh all the states represented by the mismatching
   signature at that level.

7.  MD5 Computation Cost

   RFC 1231 [MD5] provides a detailed description of the MD5 algorithm
   computation. The algorithm divides the message in 64-byte chunks and
   applies a four step process to each one of the 64-byte chunks. In
   each of the four steps, a number of bit-wise logical operations are
   applied to the 64-byte chunk. The results from the computation on the
   nth chunk are used as input for the computation of the (n+1)th chunk.
   The computation cost of MD5 is a linear function of the size of the
   message as measured in number of bytes.

   Given T total unicast RSVP sessions shared between two RSVP neighbors

draft-wang-rsvp-state-compression-00.txt                       [Page 21]

INTERNET-DRAFT                                                April 1999

   the time needed to create the digest is proportional to the size of
   the RSVP state in bytes. The node first computes the MD5 signature
   for each of all the sessions, then computes the MD5 signature for
   each of N session signatures. So on and so forth, this computation
   continues to build a forest of signature trees from bottom up, until
   the number of trees is less than or equal to N, the number of
   signatures that can be carried in one packet. The list of all the
   highest level signatures, the digest, should be carried in one IP
   packet without fragmentation.

   When the parameters in one of the sessions change, all the signatures
   on the path from that leaf to the root of the tree must be
   recomputed. The number of signatures in this path, is bounded by the
   height of the tree. Assuming T total sessions and an N-ary tree, this
   number is:

           Ceiling[log_N(T)]

   (log_N() is the logarithm of base N)

   For example with T=1000 and N=80 (80 MD5 signature is 1280 bytes),
   the number of signatures to recompute is 2.

   Insertion and deletion times will depend on the particular data
   structure used to implement the digest tree. In the N-ary tree
   described above the worst case time for insertion or deletion maybe
   O(T) since a insertion or deletion may cause a complete tree
   restructure. On the other hand, other data structures, such as the BB
   tree described in [BB1],[BB2],  have insertion and deletion times of
   O(log_2(T)). The tradeoff in this case is between the increase in the
   recomputation time (from log_N(T) to log_2(T)) and the decrease in
   the insertion/deletion time. State ordering after an arbitrary number
   of session insertions and deletions, provided that these changes are
   reliably delivered to both neighbors, is guaranteed if both neighbors
   start from a consistent state and use the same insertion/deletions
   algorithms.

   Neighboring RSVP nodes SHOULD use the same data structure to store
   the digest tree. otherwise even though RSVP state between two
   neighboring nodes is consistent the two neighbors will possibly
   compute different digests.

draft-wang-rsvp-state-compression-00.txt                       [Page 22]

INTERNET-DRAFT                                                April 1999

8.  Discussion

   Motivated by the requirements from using RSVP for MPLS traffic
   engineering, Berger et al have proposed a few extensions to the basic
   RSVP protocol [REDUCT] for two purposes.

     1. to assure prompt delivery of new RSVP control message by adding
        an acknowledgment option.
     2. to reduce the state refresh overhead by eliminating refresh
        messages.  Instead, a "Hello" exchange is added between RSVP
        neighbors to keep track of the liveness of each other.

   The overhead reduction by compression scheme proposed in this draft
   shares a similar ACK component with the scheme proposed in [REDUCT].
   Both schemes also face the same challenges posed by non-RSVP clouds
   and multicast RSVP sessions; when facing either of these conditions,
   both schemes resort back to the current RSVP refresh mechanism.

   However we believe that stopping refreshes entirely represents a
   fundamental departure from the soft-state approach adopted in RSVP
   design.  As we have argued at a few places in this draft, reliable
   RSVP message delivery alone is neither necessary nor sufficient for
   assuring consistent reservation state inside the network.  To keep
   the soft-state approach while making the protocol scale with the
   number of sessions, the proposed state compression mechanism pays the
   cost of computing and maintaining a tree of session signatures.

   When sessions or session parameters change, maintaining the tree of
   session signatures has a computational overhead that is proportional
   to the log of the total number of sessions (see a quick analysis in
   Section 7).  When large numbers of RSVP sessions exist in a network,
   we believe that the proposed scheme brings substantial performance
   gain from reduced refresh overhead, especially when most of the
   sessions are relatively stable.  However the feasibility of using
   this approach in a highly dynamic environment where sessions are
   constantly added and deleted remains an open issue.

9.  Compatibility

   The extensions introduced here are fully compatible with the existing
   version of  RSVP. The TIMESTAMP has an assigned value whose form is
   0bbbbbbb. The RSVP specification dictates that objects of this class
   should generate an error message when received by nodes that do not
   support them. An RSVP node that sends a message with a TIMESTAMP
   object and subsequently receives an "Unknown Object Class" error,
   MUST NOT send any more messages with attached TIMESTAMP object and

draft-wang-rsvp-state-compression-00.txt                       [Page 23]

INTERNET-DRAFT                                                April 1999

   MUST NOT start sending Digest objects but SHOULD use regular
   refreshes instead.

   Regarding Digest messages, a node SHOULD only start sending Digest
   messages only when it discovers that its particular peer is
   compression-capable using the procedure outlined in Section 3.4.
   Implementations that support the TIMESTAMP object, MUST also support
   the digest mechanism and vice-versa.

10.  IANA Considerations

   IANA must provide three new Msg Types for the three new messages
   defined in this draft: Digest, ACK and DigestErr. Also IANA must
   provide Class Numbers for the objects defined here. Specifically:

      The TIMESTAMP object requires a Class-Num value of the form
        0bbbbbbb.
      The DIGEST object requires a a Class-Num value of the form
        10bbbbbb.

11.  Security Considerations

   No new security issues are raised in this document.  See [RFC2205]
   for a general discussion on RSVP security issues.

12.  Acknowledgments

   The phrase "overhead reduction by state compression" was suggested by
   Van Jacobson of cisco.  Our design also benefited from discussion
   with Steve McCanne of UC Berkeley regarding the roles of
   acknowledgment in soft-state protocol design.

13.  References

   [BB1] R. Bayer. "Symmetric binary B-trees: Data structure and
   maintenance algorithms". Acta Informatica, 1:290-306, 1972

   [BB2] L. Guibas, R. Sedgewick, "A diocrhomatic framework for balance
   trees", In Proceedings of the 19th Annual Symposium on Foundations of
   Computer Science, pages 8-21. IEEE Computer Society, 1978.

draft-wang-rsvp-state-compression-00.txt                       [Page 24]

INTERNET-DRAFT                                                April 1999

   [REDUCT] Lou Berger et al, "RSVP Refresh Reduction Extensions",
   draft-berger-rsvp-refresh-reduct-00.txt, March 1999.

   [MD5]    R. Rivest, "The MD5 Message-Digest Algorithm", RFC 1321,
   April 1992.

   [RFC2205] Braden, R. Ed., Zhang, L., et al, "Resource ReserVation
   Protocol -- Version 1 Functional Specification", RFC 2205, September
   1997.

   [RFC2209] Braden, R., Zhang, L, "Resource ReSerVation Protocol (RSVP)
   -- Version 1 Message Processing Rules", RFC 2209, September 1997.

Appendix A.  Considerations for Multicast and Many-to-one Unicast
   Sessions

   In a multicast session or many-to-one unicast session, there may be
   multiple path states and reservation states in each session, so
   routers also need to order path and reservation states in the same
   way.  The order of path states can be determined by sender templates
   and the order of reservation states can be based on filterspecs and
   Rspecs.

   In addition to ordering path and reservation states, a router may
   need to keep information about the reservation request sent to each
   upstream neighbor if it is a merging or splitting point because
     1. the reservation request it received from a downstream neighbor
        may not be the same as the one sent to an upstream neighbor;
     2. currently reservation states only record reservation made on a
        link to a downstream neighbor;
     3. the sender of a digest has to compute the digest based on what
        flowspec and filterspec were sent to its upstream neighbor.

   For a multicast session, complication arises for neighbor discovery
   if a router is attached to a broadcast LAN.  First, there may be both
   compression-capable and compression-incapable downstream neighbors on
   the LAN.  In this case, both regular RSVP refreshes and digests
   should be sent to the same interface.  Furthermore, a router may not
   detect a change in the downstream neighbors when a downstream router
   on the broadcast LAN joins or leaves a group without affecting the
   list of outgoing interfaces of the associated RSVP states.  One
   solution for this problem is to change the RSVP/routing interface so
   that it reports a route change when the list of downstream neighbor
   routers changes.

   If a message with ACK request is sent to a multicast session over a

draft-wang-rsvp-state-compression-00.txt                       [Page 25]

INTERNET-DRAFT                                                April 1999

   multi-access interface (PATH and PATH-TEAR), the receiving node must
   add a random delay before sending the ACK to avoid ACK implosion.
   The random interval SHOULD be between zero (0) and a configured
   maximum time.  The configured maximum SHOULD be set in proportion to
   the refresh and retransmission interval.

14.  Authors' Addresses

      Lan Wang
      UCLA
      4805 Boelter Hall
      Los Angeles, CA 90095

      Phone:    310-267-2190
      Email:    lanw@cs.ucla.edu

      Andreas Terzis
      UCLA
      4677 Boelter Hall
      Los Angeles, CA 90095

      Phone:    310-267-2190
      Email:    terzis@cs.ucla.edu

      Lixia Zhang
      UCLA
      4531G Boelter Hall
      Los Angeles, CA  90095

      Phone:    310-825-2695
      EMail:    lixia@cs.ucla.edu

draft-wang-rsvp-state-compression-00.txt                       [Page 26]