Internet Draft
INTERNET-DRAFT Lan Wang
Expires: October 1999 Andreas Terzis
<draft-wang-rsvp-state-compression-00.txt> Lixia Zhang
UCLA
April 1999
A Proposal for reducing RSVP Refresh Overhead using State
Compression
<draft-wang-rsvp-state-compression-00.txt>
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
As a soft-state protocol, RSVP specifies that each RSVP node sends
periodic RSVP messages for each existing RSVP session. The overhead
due to such periodic refreshes goes up linearly with the number of
active RSVP sessions. One can reduce the overhead by using a longer
refresh period, which unfortunately leads to longer delays in re-
synchronizing RSVP state. To overcome the dilemma, this draft
describes the use of a "state-compression" approach to soft-state
protocol refreshes. In the absence of state changes this compression
mechanism has a constant overhead while retaining the soft-state
nature of RSVP.
draft-wang-rsvp-state-compression-00.txt [Page 1]
INTERNET-DRAFT April 1999
1. Introduction
As a soft-state setup protocol, RSVP specifies that each RSVP node
sends control messages for each active RSVP session to each of its
neighbors serving the same session. These control messages are sent
immediately whenever a change of state occurs, for example when a new
session starts, when the TSPEC or RSPEC parameters of an existing
session change, or when a route change occurs that affects the paths
of existing RSVP sessions. In addition, an RSVP node also sends
periodic refresh messages to its neighbors about all existing
sessions. Such periodic refreshes serve the purpose of keeping
consistent RSVP reservation state along data flow paths. There are
potentially a number of causes that can make the reservation state
between neighbor RSVP nodes out-of-synchronization. For example:
- An RSVP message that carries state-change information gets lost.
- A rare local event inadvertently changed RSVP state values.
- Other locally induced state changes occur, for example when the
current Kerberos key expires, a new key is obtained which needs
to be carried to the next hop by the POLICY_DATA object in the
next refresh.
Instead of attempting to identify all the potential causes of changes
and handle them individually, RSVP simply uses periodic refreshes to
persistently re-synchronize the reservation state along the path.
This approach assures consistent reservation state along the data
flow path, even when the routers do not have an exhaustive list of
all the possible causes of state inconsistency or how to correct them
individually.
The drawback of this simple approach is that its overhead grows
linearly with the number of active RSVP sessions. A common mechanism
to reduce the periodic refresh overhead is by using a longer refresh
period (example: RTCP, SDR announcements). A longer refresh period,
however, leads to longer delays in re-synchronizing state.
In this draft we propose a "compression" mechanism for refresh
overhead reduction. The goal of this scheme is to improve the
efficiency of soft-state protocols in their communication behavior,
not to further simplify RSVP. Rather, we applied *additional*
processing (compression) to the protocol state information so that
the number of refresh messages is reduced from being proportional to
the number of sessions to being proportional to the number of
neighbor nodes, while keeping the soft-state nature of RSVP. When a
state change is notified by explicit RSVP messages, or detected by
the periodic refreshes, the cost of updating the "compressed" state
is proportional to LOG(T) where T is the total number of RSVP
sessions. Section 7 presents a detailed overhead analysis of our
draft-wang-rsvp-state-compression-00.txt [Page 2]
INTERNET-DRAFT April 1999
compression mechanism.
2. Refresh Overhead Reduction by State Compression
Even in the absence of new control information generated by sources
or destinations, an RSVP node sends one message per refresh period to
its neighboring nodes for each of all the RSVP sessions to keep the
state consistency. Instead of sending one message per refresh period
per session, one way to reduce the overhead is to compress the state
of all RSVP sessions to a single signature, which can then be sent to
neighbor nodes in place of all the "raw" refresh messages. All we
need is an algorithm that can produce a one-to-one mapping from the
given set of RSVP state to a signature.
In this draft we propose to use the MD5 algorithm [MD5] for RSVP
state compression. As stated in [MD5], "it is computationally
infeasible to produce two messages having the same message digest, or
to produce any message having a given pre-specified target message
digest". We can therefore conclude, with a high level of assurance,
that no two sets of different RSVP states should result in the same
signature, and whenever the computed signatures at two neighboring
nodes differ, the two nodes will be able to detect the differences
from the states of shared RSVP sessions. Whenever a state change is
detected (the digests of two neighbors disagree), raw RSVP messages
are transmitted to re-synchronize the state.
As a further optimization, we also add an acknowledgment component
(ACK) to the RSVP protocol. The ACK is used for the following two
purposes.
1. To minimize the re-synchronization delay after an explicit state
change request. A node can request an ACK for each RSVP message
that carries state-change information, and promptly retransmit
the message until an acknowledgment is received.
2. To minimize the re-synchronization cost when inconsistent state
between RSVP neighboring nodes is detected but the exact
difference is unknown. A node should request an ACK for each
digest sent, and it should log all the state changes since the
last agreed-upon digest (which indicates that the two nodes have
consistent state at that point in time). When an inconsistency
is detected, the first recovery effort of a node is to send
those latest changes.
We propose to compute the MD5 signature in a structured way (see
Section 3.2), so that in cases where the retransmission of latest
changes does not result in state re-synchronization, two neighbors
draft-wang-rsvp-state-compression-00.txt [Page 3]
INTERNET-DRAFT April 1999
can efficiently locate the portion of inconsistent state and send raw
RSVP messages to re-synchronize.
2.1. Definitions
- Input Message
The input to the MD5 Algorithm. In this document an input
message is some portion of the RSVP state stored at the local
node.
- MD5 Signature
The result of the computation of the MD5 algorithm on an
input message.
- Digest
A set of MD5 Signatures that represents a compressed version
of the RSVP state shared between two neighboring RSVP nodes.
The following definitions are taken from RFC 2209.
- PSB
Path State Block. Each PSB holds path state for a particular
(session, sender) pair, defined by SESSION and
SENDER_TEMPLATE objects, respectively.
-RSB
Reservation State Block. Each RSB holds a reservation request
that arrived in a particular RESV message, corresponding to
the triple: (session, next hop, Filter_spec_list).
2.2. Limitations of the New Approach
The ability for two RSVP neighbors to exchange a digest instead of
raw RSVP messages relies on the assumption that the two nodes know
precisely all the RSVP sessions that go through these two nodes in
sequence. Whenever a unicast route change occurs, the upstream node
draft-wang-rsvp-state-compression-00.txt [Page 4]
INTERNET-DRAFT April 1999
must be able to detect the change and synchronize the state with the
new downstream node (as well as tearing down the session with the old
downstream neighbor). Similar actions need to be taken whenever a
change in multicast routing occurs, which includes not only route
changes but also multicast group membership changes (either addition
or deletion).
When two neighbor nodes of an RSVP session are directly connected,
RSVP can expect to receive notification of route changes through the
interface to the local routing module. When the two nodes are
interconnected through a non-RSVP cloud, however, there seems no
known way to reliably detect all routing changes without the per
session periodic refreshes.
Furthermore, due to the fact that each multicast RSVP session may
involve a different set of up- and down-stream neighbors than any
other multicast session, it is difficult to compute a digest for
aggregate multicast sessions. Thus this first attempt focuses on
refresh reduction for unicast RSVP sessions only.
Finally, because an RSVP session is identified by the destination
address, a unicast RSVP session can be a many-to-one session with
multiple senders, thus the reservation parameters may change when
crossing a merging point that has multiple upstream branches towards
different senders. Consequently, computing the digest for each RSVP
neighbor requires that the local node store data value that is not in
the current protocol specification. We further discuss the
difficulty in handling multicast and many-to-one unicast session, as
well as possible solutions, in appendix A.
Therefore this first draft of the refresh compression mechanism
limits the problem space to handling one-to-one unicast sessions
whose paths do not cross non-RSVP clouds. An RSVP node resorts to
the current refresh scheme for all the sessions that have the
"non_RSVP_cloud" bit set, or that have a multicast destination
address.
2.3. Benefits and Constraints of the New Scheme
The design of the new scheme aims at the following features
- efficient state re-synchronization when two RSVP neighbors
discover state inconsistency.
- choice of individual nodes to use either original RSVP refreshes
or the refresh reduction, or switch in between over time to
achieve the best tradeoff.
draft-wang-rsvp-state-compression-00.txt [Page 5]
INTERNET-DRAFT April 1999
- backward compatibility with the current RSVP specification.
- incremental digest computation when part of the session(s)
changes state.
We also recognize the following constraints of the new scheme.
- Generally speaking, individual RSVP sessions follow different
paths. However to reduce the refresh overhead to one refresh
message per refresh period per neighbor, a digest represents the
compressed state of aggregate sessions. The only way to do so
is to first group all RSVP sessions by each neighbor, then
compute the digest. That is, refreshing of compressed state
must be done on a per neighbor basis.
- In its current design, RSVP PATH messages carry the destination
address of the session, thus all refresh messages automatically
follow the latest unicast or multicast routing changes
(including multicast group membership changes). Although RSVP
has an interface to the local routing module for prompt
notification of all the changes to minimize the adaptation
delay, all RSVP reservations eventually converge on new paths
even in the absence of such notifications. When we move from
per session refresh to per hop refresh by digest, however, the
protocol loses its automatic routing adaptation. Thus the new
scheme relies entirely on specific notification to adjust to any
routing changes.
- To detect all the RSVP state changes, the digest must be
computed over the RSVP sessions state rather than over the
messages that are exchanged. Because those messages represent
partial, but not all causes of state changes, digests of the
messages alone are inadequate in detecting all the state
changes; two neighbor nodes may end up in inconsistent state
even when all RSVP messages are delivered correctly.
One challenge we encountered many times during proposing the
compression refresh mechanism is that when an RSVP node processes the
objects carried in a refresh message, this processing step may take
other factors as part of the input, and produce new object values to
be passed on to next neighbor; what is saved at the local node is the
value of the received objects. Change of RSPEC is a typical example
in a multicast RSVP session. The same issue exists even when we
limit our consideration to unicast sessions only; ADSPEC and
POLICY_DATA are two such objects whose new values sent to the next
hop may be different from that stored at the local node. To compare
digest between neighbor node pairs a node must keep the latest values
of all the objects it has sent to the neighbor. This requirement
leads to a necessary change to the current protocol specification.
draft-wang-rsvp-state-compression-00.txt [Page 6]
INTERNET-DRAFT April 1999
3. Digest Mechanism Description
3.1. State Organization for Digest Computation
RSVP maintains path state and reservation state for each active
session [RFC2205]. A session is uniquely identified by a session
object which contains the IP destination address, protocol ID and
optionally a destination port number of the associated data packets.
A Path State Block (PSB) is comprised of a sender template (i.e. IP
address and port number of the sender), a Tspec that describes the
sender's traffic characteristics, and possibly objects for policy
control and advertisements. A Reservation State Block (RSB) contains
filterspecs (i.e. sender templates) of the senders for which the
reservation is intended, the reservation style and a flowspec that
quantifies the reservation. It may also contain objects for policy
control and confirmation.
To "compress" a set of RSVP states into an MD5 signature, neighboring
RSVP routers must order their states using the same rules since
different orders of the states may result in different signatures.
We propose that sessions be ordered by the increasing order of IP
address, protocol ID and port number in session objects. Ordering
rules for multicast and many-to-one unicast sessions are described in
appendix A.
3.2. Structured Digest Computation
As the number of sessions increases, a node may experience state
changes more frequently and, with a flat structure, the overhead of
digest computation increases linearly with the number of sessions.
Therefore we need a scheme to facilitate partial state changes by
incremental MD5 computation.
As mentioned in section 3.1, a node orders all the RSVP sessions it
shares with each neighbor node. The node first computes a MD5
signature for each session. Table 1 shows the objects that should be
included in the digest computation. These objects should be fed into
the computation procedure in the order as presented in the table.
Although PSB and RSB also contain a few other fields such as incoming
interface and outgoing interfaces, those fields have only local
meaning to a specific router and therefore are excluded from the
digest computation.
draft-wang-rsvp-state-compression-00.txt [Page 7]
INTERNET-DRAFT April 1999
RSVP Structure || Objects to Include
_________________||_______________________________________
||
Session || session object
_________________||_______________________________________
||
PSB || sender template, sender tspec,
|| adspec, policy
_________________||_______________________________________
||
RSB || filterspec, flowspec, style, policy
_________________||_______________________________________
Table 1. Objects to Include in Digest Computation
Assuming that N is the number of MD5 signatures that can be packed in
one Digest message, a node constructs a digest tree for each of its
neighbors in the following way. First, it computes the signature of
each session, making these signatures as the leaf node of the tree.
If there are more than N leaf-level signatures, it groups each N
signatures in one block and compute a level-1 signature for the
group. With a total of T sessions, this grouping results in
Ceiling[T/N] level-1 signatures. If the number of level-1 signatures
is still larger than N, the node continues on to group each of N
level-1 signatures to compute a level-2 signature to get
Ceiling[T/N^2] level-2 signatures. Let Ci be the number of level-i
signatures, we repeat the grouping until Ci is smaller than or equal
to N.
o o o digest
/|\ /|\ /|\
ooo ooo ooo level-1 signature
/|\ ....../|\
o o o ....o o o leaf-level signatures
Figure 1. A Digest Tree with N=3
There are two ways to define the value for N. The first way is to
define it statically in the specification. One would like the
largest N value that can fit a digest in the link MTU, but the value
of MTU may be different at different places. A second approach is to
let two RSVP neighbor nodes negotiate N's value at startup time and
then keep it constant. This second approach may require new
handshake message exchanges between two neighbor RSVP nodes. In this
draft we assume a configured value for N.
draft-wang-rsvp-state-compression-00.txt [Page 8]
INTERNET-DRAFT April 1999
A digest needs to be recomputed when the following events occur:
o a new session is added;
o a session is changed, i.e. a state is added to a session, or a
state is modified or deleted from a session;
o an existing session is deleted.
The cost of digest computation is summarized in section 7.
3.3. Neighbor Data Structure
A router needs to compute two digests for each neighbor, one digest
for the states that are refreshed by messages from that neighbor and
one for the states that the local sends refresh messages to the
neighbor. We refer to the two digests InDigest and OutDigest,
respectively. The OutDigest is used in the Digest message and the
InDigest is sent in a DigestErr message to the corresponding
neighbor. Section 4 specifies the format of the new RSVP message
types proposed for our scheme.
To avoid having to traverse all the RSVP sessions to compute the
digest for a specific neighbor, a router maintains a linked list that
points to all the sessions shared with the neighbor. An example of
the per neighbor structure is shown in Figure 2.
The Neighbor structure holds all the information needed to send and
receive digests to/from a RSVP neighbor node. A description of each
field in the Neighbor structure follows:
IP Address
Holds the IP address of the interface this neighbor uses to
communicate with the local node.
OutSession
OutSession contains the sessions to be included in the computation
of the digest the local node is sending to this neighbor. The
sessions included are those that have PSBs whose next downstream
hop is the neighbor in question and RSBs that have that neighbor
as the immediate upstream hop.
OutDigest
The digest sent to this neighbor.
RefreshOutTimer
Contains the next time a digest should be sent to this neighbor.
draft-wang-rsvp-state-compression-00.txt [Page 9]
INTERNET-DRAFT April 1999
Session Struct Session Struct
^ ^
| |
Neighbor Struct | | SessDigest Struct
+----------------- + +-----------------+ +-----------------+
| IP Address | | Session* | | Session* |
+----------------- + +-----------------+ +-----------------+
| OutSession |--->| Next SessDigest |--->| Next SessDigest |---->...
+----------------- + +-----------------+ +-----------------+
| OutDigest | | P/R | | P/R |
+----------------- + +-----------------+ +-----------------+
| RefreshOutTimer | | MD5 Signature | | MD5 Signature |
+----------------- + +-----------------+ +-----------------+
| TimestampLastSent|
+------------------+
|TimestampLastAcked|
+------------------+
| OutDigestTimeout |
+------------------+
| InSession |---> Linked List of SessDigest structs
+------------------+
| InDigest |
+------------------+
| CleanupInTimer |
+------------------+
| ... |
+------------------+
Figure 2. Neighbor Structure
TimestampLastSent
Contains the timestamp of the latest digest sent to this neighbor.
TimestampLastAcked
This is the timestamp of the latest digest acknowledged by the
neighbor, i.e. the time instant when state, between the local node
and this neighbor, was last synchronized.
OutDigestTimeout
Contains the timeout period in milliseconds for digests sent to
this neighbor. If an acknowledgment is not received by the end of
this period, the digest should be retransmitted.
draft-wang-rsvp-state-compression-00.txt [Page 10]
INTERNET-DRAFT April 1999
InSession
A pointer to a linked list of SessDigest objects. This list
contains all the sessions that are refreshed by digests received
from this neighbor. Specifically, this includes sessions whose
PSBs have this neighbor as previous hop and sessions whose RSBs
have this neighbor as next hop.
InDigest
The digest expected to be received from this neighbor.
CleanupInTimer
If no digest is received from this neighbor by the time contained
in the CleanupInTimer field, all associated state should be
cleaned up.
The SessDigest structure contains information about a session that
has the neighbor in question as a Next Hop. The SessDigest structure
has the following fields:
Session*
A pointer to the portion of the RSVP state that holds information
for the session represented by this structure.
P/R
This field indicates whether the PSB or the RSB of a Session will
be used in the signature operation.
MD5 signature
Contains the MD5 signature for this session.
3.4. Neighbor Discovery
To use the compressed refresh scheme, a router needs to discover all
the neighbors that are capable of exchanging digests (compression-
capable). For this purpose, when a RSVP router starts sending (raw)
RSVP messages of a session, it should request that the neighbor(s)
acknowledge the messages by including a TIMESTAMP object with the
ACK_Requested flag on (see section 4). If the local router receives
an ACK message from a neighbor whose address is not currently on the
draft-wang-rsvp-state-compression-00.txt [Page 11]
INTERNET-DRAFT April 1999
Neighbor Structure list, it has discovered a new digest-capable
neighbor. If it receives an error message from a neighbor who does
not understand the new TIMESTAMP object, it discovers a neighbor that
is compression-incapable.
When a non-RSVP cloud exists between two RSVP neighbor nodes,
although they can discover each other using the acknowledgments
during initial message exchanges, the upstream node may not be able
to detect a change of the downstream neighbor caused by a unicast
route change. As shown in Figure 3, node A originally has a
downstream neighbor B for a particular unicast RSVP session. After a
route change, node C becomes A's downstream neighbor. However, since
A's outgoing interface is unchanged, A will not notice the route
change, hence it will continue to include the path state of sender S
when calculating the digest to B. Node B will not be aware of the
change either as long as A sends B the same digest. C may never get
a path message from A. Therefore, the digest scheme cannot be used
crossing non-RSVP clouds until an effective way of detecting route
changes is found. RSVP routers can detect the existence of non-RSVP
clouds between neighbors by comparing the TTL value in the IP header
with the Send_TTL in the RSVP common header, as described in
[RFC2209].
original route
___________ --->
-> ->| |-- B ----| ->
S - - --- A --| non-RSVP | |------- D
| cloud | |
|___________|-- C ----|
new route --->
Figure 3. An RSVP Session with a non-RSVP cloud
3.5. Digest Refreshes
RSVP states are refreshed by regular RSVP messages as well as by
digest Messages. However, different from a regular RSVP message
which refreshes the state of one RSVP session, a digest message may
carry multiple MD5 signature values, each representing the compressed
state of multiple sessions. If an individual signature in the digest
matches the corresponding local value, all the RSVP states covered by
that signature should be refreshed. That is the local node should
carry out the same operation as if one refresh message for each of
all the covered sessions is received.
draft-wang-rsvp-state-compression-00.txt [Page 12]
INTERNET-DRAFT April 1999
The refresh rate of digest messages can be different from the refresh
rates of individual sessions covered by that digest. For digest
messages, the default refresh rate is 30 seconds as RFC 2205
suggests. If a different refresh rate is used, it SHOULD be carried
in the TIME_VALUES object of the digest message. After a digest is
received, all sessions covered by this digest MUST be refreshed at
this rate.
4. New RSVP Message Types and Objects
4.1. TIMESTAMP Object
TIMESTAMP Class = Value to be assigned by IANA of form 0bbbbbbb
TIMESTAMP object
Class = TIMESTAMP Class, C_Type = 1
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Flags | reserved |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Timestamp |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Flags: 8 bits
0x80 = ACK_Requested flag
Indicates that the sender is requesting a message acknowledgment.
Acknowledgments MUST be silently ignored when they are sent in
response to messages whose ACK_Requested flag is not set. This
flag MUST be set when the message is a Digest message. It MUST
not be set in an ACK or DigestErr message.
Reserved: 24 bits
This field is reserved. It MUST be set to zero on transmission
and MUST be ignored on receipt.
draft-wang-rsvp-state-compression-00.txt [Page 13]
INTERNET-DRAFT April 1999
Timestamp: 32 bits
It is the middle 32 bits (lower 16 bits of the integer part and
higher 16 bits of the fractional part) of the NTP time value when
the message is sent. If NTP time is not available, system timer
can also be used. When combined with the message generator's IP
address, it uniquely identifies a message.
We chose to use a timestamp to identify a message because it is
always increasing and so a sender does not have to check if the value
is in use or has been used before. It also helps the receiver to
identify which of the RSVP messages for the same state is the most
recent one. However, depending on the processing speed of the router
and the existing timer granularity, we may have two consecutive
messages with the same timestamp value. Therefore, we define the
timestamp to be max(current_timestamp, last_used_timestamp+1), where
last_used_timestamp is the timestamp of the latest RSVP message that
contains a TIMESTAMP object.
4.2. DIGEST Object
DIGEST Class = Value to be assigned by IANA of form 10bbbbbb
DIGEST object
Class = DIGEST Class, C_Type = 1
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Level | Group |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Reserved | Number of Signatures |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
// signature (16 bytes) //
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Level: 8 bits
This is the level of this set of signatures in the corresponding
digest tree. The level of a signature is defined as follows:
1. Leaf signatures are assigned level 0;
draft-wang-rsvp-state-compression-00.txt [Page 14]
INTERNET-DRAFT April 1999
2. The level of a non-leaf signature is (1 + the level of its
immediate children).
Group: 24 bits
This is the group this set of signatures belong to (recall we
divide signatures into groups of size N and compute a signature
for each group to build a tree). We number the groups at the same
level of a digest tree increasingly from left to right and the
left-most group is assigned group 0. Level and group uniquely
identify the location of a set of signatures in a digest tree.
Reserved: 16 bits
This field is reserved. It MUST be set to zero on transmission
and MUST be ignored on receipt.
Number of Signatures: 16 bits
Number of MD5 signatures contained in this object.
Signature:
a 16-byte MD5 signature.
4.3. Digest Message
The format of a Digest message is as follows:
::= [ ]
[ ]
A Digest message has a common RSVP header with the message type set
to 14. It MUST contain a TIMESTAMP object with the ACK_Requested
flag set and a digest object. The TIME_VALUES object may be included
so that a router can inform its neighbor what timer value to use to
time out states when using a digest scheme.
draft-wang-rsvp-state-compression-00.txt [Page 15]
INTERNET-DRAFT April 1999
4.4. ACK Message
The format of an ACK message is as follows:
::= [ ]
[ ... ]
An ACK message has a message type of 15. It MUST contain at least
one TIMESTAMP object. With multiple TIMESTAMP objects, the ACK
message can be used to acknowledge multiple messages.
4.5. DigestErr Message
A DigestErr message has the following format:
::= [ ]
The message type of DigestErr message is 16. It MUST contain a
TIMESTAMP object and a DIGEST object. A DigestErr message acts as a
negative acknowledgment to a Digest message. The TIMESTAMP object
identifies the Digest message being acknowledged. The DIGEST object
carries the local digest value of the sender of the DigestErr
message.
5. RSVP Message Operations
5.1. Non-digest RSVP Messages
A node sends non-digest RSVP messages in the same way as before, with
the new option of asking for an ACK. To request an acknowledgment,
the node sends the message with a TIMESTAMP object and sets the
ACK_Requested bit in the object. It also sets a retransmission
timer. When the neighbor receives a message that requires
acknowledgment, it processes the message as usual and sends back an
ACK message which contains the timestamp of the message being
acknowledged. If there is any change in the state, it then updates
the signature.
If no ACK is received after timeout, the node retransmits the message
draft-wang-rsvp-state-compression-00.txt [Page 16]
INTERNET-DRAFT April 1999
up to a configured number of times. The retransmission period SHOULD
be shorter than regular refresh period and be proportional to the
round-trip time between the two neighboring nodes. Each
retransmission MUST carry the same timestamp as that in the original
message. If the corresponding state is changed before an ACK is
received, the original message becomes obsolete and no longer needs
to be retransmitted. If changing of the state triggers a new
message, the timestamp of the new message should be
max(current_timestamp, last_used_timestamp + 1). If the node does not
receive an ACK after all the retransmissions, this message needs to
be retransmitted later in the recovery process (see section 6.2).
5.2. Digest Message
Once a compression-capable node discovers a compression-capable
neighbor, instead of sending one refresh message for each PATH/RESV
state, it sends a digest, in one RSVP message per refresh period to
each of the neighbors. Special treatments for RSVP states containing
ADSPEC and POLICY_DATA objects are discussed in section 5.3. Each
digest MUST carry a timestamp object with the ACK_Requested flag set.
When the neighbor receives the Digest message, it first locates the
corresponding set of signatures in the local digest tree and then
performs a binary comparison between each signature in the message
and that in the local digest tree.
o If any of the signatures does not match the local one, it should
respond with a DigestErr message containing the local digest
value after processing the Digest message. Note that if there
is any matching signature, RSVP states represented by that
signature should be refreshed.
o Otherwise, it is in a consistent state with the sender, so it
sends an ACK message and refreshes all the states shared with
the sender.
The retransmission of a Digest message is handled in the same way as
non-digest messages. However, as in the original RSVP design where
an RSVP node never stops sending refresh messages for each active
session, a node should not stop sending digest refreshes even when a
previous one failed to receive an acknowledgment. If a neighbor
node crashed and becomes alive again, it will find the digest value
different from its own and the two routers will start
resynchronization step. When the digest value is changed, any
pending retransmission of the obsolete Digest message should be
canceled.
draft-wang-rsvp-state-compression-00.txt [Page 17]
INTERNET-DRAFT April 1999
5.3. Refresh of ADSPEC and POLICY_DATA
An RSVP message may carry an ADSPEC object that contains resource
information used by receivers to predict the end-to-end service.
This object is updated hop-by-hop to gather information along a
route. Since it is opaque to RSVP, RSVP simply records the received
ADSPEC in the corresponding state and, whenever it needs to send a
refresh message for a state, it calls the traffic control module with
the recorded ADSPEC to get an updated ADSPEC object. Therefore, the
ADSPEC object sent by a node may be different from that received by
the node. POLICY_DATA object is handled in the same way except that
it is submitted to the policy control module.
To handle ADSPEC or POLICY_DATA objects correctly with the digest
scheme, we need to solve the following two problems. First, for each
session with ADSPEC or POLICY_DATA object, we must trigger periodic
updating of ADSPEC or POLICY_DATA objects. Second, since ADSPEC and
POLICY_DATA are opaque to RSVP, RSVP cannot tell whether the objects
returned by the traffic control (or policy control) module changes
after each call.
For the first problem, we treat digest refreshes the same way as
regular RSVP refresh messages. In other words, before a node sends a
digest, the node calls corresponding modules to update these objects.
There are three possible solutions to the second problem. First, a
node can simply assume that the object returned by the traffic
control (or policy control) module changes after each call at refresh
period, which represents a change in RSVP state and hence a regular
RSVP PATH state should be sent to carry the new object value to the
next hop. This approach requires no changes to RSVP specification
but it essentially falls back to the original RSVP refresh mechanism
whenever ADSPEC or POLICY_DATA object exists in an RSVP session.
The second option is to modify the RSVP/traffic control and
RSVP/policy control interfaces so that they return a change bit that
indicates whether the updated object is different from the previously
returned one. The previously returned object can be an argument
passed to the interface. For example, the original RSVP/traffic
control interface for ADSPEC is
Call: TC_Advertise( Interface, Adspec,
Non_RSVP_Hop_flag ) -> New_Adspec,
where Adspec is the ADSPEC from received RSVP messages. We can
change it to
draft-wang-rsvp-state-compression-00.txt [Page 18]
INTERNET-DRAFT April 1999
Call: TC_Advertise( Interface, Adspec, Old_Adspec
Non_RSVP_Hop_flag ) -> New_Adspec, change bit,
and Old_Adspec is the ADSPEC object returned by the previous call.
When the changed bit is set, a regular RSVP refresh must be sent.
The third option is to let RSVP perform a binary comparison between
the previously and newly returned objects after each call to traffic
control (policy control) module to find out whether the object has
changed. This third option can be a short-term fix while we work on
the proposed changes in RSVP interfaces to other modules.
5.4. Error Message for Unknown Object
The class number of TIMESTAMP object is of the form 0bbbbbbb.
According to RFC 2205, the receiver of a message containing this
object should send back an error message (PathErr or ResvErr
depending on the original message) if it does not understand the
object. Upon receiving the error message, the sender should do the
following things:
1. stop sending digests to the receiver;
2. withhold sending TIMESTAMP object in future messages until it
receives from that neighbor a message containing a TIMESTAMP
object;
3. continue regular RSVP refreshes.
6. RSVP State Re-synchronization
6.1. Discovery of Inconsistency
Two RSVP neighbors may become out-of-sync due to a number of reasons.
- A state-changing RSVP message is lost, and the sender did not
ask for ACK.
- A neighbor crashed and lost part or all of its state.
- Other unknown reasons.
Receipt of a DigestErr message indicates inconsistency between two
nodes. The timestamp and digest value in the DigestErr message help
the two neighbors to localize the problem.
draft-wang-rsvp-state-compression-00.txt [Page 19]
INTERNET-DRAFT April 1999
o If the timestamp acknowledged is smaller than the timestamp of
the last Digest message sent (TimestampLastSent), this error
message is for an obsolete message. This message should be
ignored since it may not represent the current state of the
neighbor.
o If they are equal, the node should start the recovery process
described in the next section.
6.2. Recovery
To help state synchronization, each node needs to keep a log of the
RSVP messages sent after the last synchronization point
(TimestampLastAcked) for each neighbor. We record the following
information for a message:
o message type
o a flag indicating whether the message contains a TIMESTAMP
object
o timestamp of the message (if no TIMESTAMP object is present in
the message, use the timestamp that could be assigned when the
message is sent)
o a flag indicating whether ACK is required for the message
o a pointer to the corresponding RSVP state
o objects required to reconstruct the message that cannot be
obtained from the RSVP state (e.g. ERROR_SPEC). Note that if
the message is a teardown message, all the objects in the
message should be stored here since the corresponding state has
already been deleted.)
When a node receives an ACK message, it first checks if the message
acknowledges the latest Digest message. If so, it should update
TimestampLastAcked to the value carried by the TIMESTAMP object in
the ACK message and remove all the RSVP messages sent before
TimestampLastAcked. Otherwise, only the acked message needs to be
removed. Any message rendered obsolete by a new RSVP message should
be removed from the log.
If the node detects any discrepancy with a neighbor, it can
retransmit all the messages on the log followed by a Digest message.
It may set the ACK_Requested flag in these messages and a message
should be removed from the log if the corresponding ack message is
received. There is a tradeoff between the number of messages
contained in the log and the bandwidth used to acknowledge messages.
The more messages with ACK_Requested set, the less messages we may
have on the log. However, this requires more ACK messages sent from
the neighbor. In the extreme case where the log is full, older
messages should be purged to accommodate new RSVP messages.
draft-wang-rsvp-state-compression-00.txt [Page 20]
INTERNET-DRAFT April 1999
If retransmission of the logged messages cannot lead to consistency,
i.e. the node still receives a DigestErr, regular refresh messages
should be sent for the problematic part of the digest tree. One
possible cause of the problem is that some of the "synchronized"
states were accidentally modified or erased by the neighbor, while
RSVP messages for these states have already been deleted from the
log. Another possible scenario is that some messages were not
received by the neighbor and they are removed from the log because of
limited logging space.
To find the states that are inconsistent, the node first compares the
digest value in the DigestErr message with its own. When it finds
the first mismatching signature (call it S1), it sends a Digest
message containing the signatures used to compute S1. A DigestErr is
expected for this Digest message since at least one of the children
signatures does not match. The node again looks for the first
mismatching signature (S2) in the DigestErr message and sends the
children of S2 in a Digest message. This procedure is repeated until
the leaf session (Sn) causing the problem is found.
Following refreshing of Sn, the node sends the top-level digest to
the neighbor. If the inconsistency is caused by a single session,
walking down the tree once can solve the problem. Otherwise, the
node will receive another DigestErr and it has to continue to
identify the other problematic sessions by traversing the tree.
There is a tradeoff between the latency of the recovery procedure and
the transmission efficiency. For example, if the tree has many
levels, we need many RTTs to exchange the signatures to find the
leaf-level sessions that contribute to the inconsistency. However,
if we have sufficient resources, we can stop at an intermediate level
of the tree and refresh all the states represented by the mismatching
signature at that level.
7. MD5 Computation Cost
RFC 1231 [MD5] provides a detailed description of the MD5 algorithm
computation. The algorithm divides the message in 64-byte chunks and
applies a four step process to each one of the 64-byte chunks. In
each of the four steps, a number of bit-wise logical operations are
applied to the 64-byte chunk. The results from the computation on the
nth chunk are used as input for the computation of the (n+1)th chunk.
The computation cost of MD5 is a linear function of the size of the
message as measured in number of bytes.
Given T total unicast RSVP sessions shared between two RSVP neighbors
draft-wang-rsvp-state-compression-00.txt [Page 21]
INTERNET-DRAFT April 1999
the time needed to create the digest is proportional to the size of
the RSVP state in bytes. The node first computes the MD5 signature
for each of all the sessions, then computes the MD5 signature for
each of N session signatures. So on and so forth, this computation
continues to build a forest of signature trees from bottom up, until
the number of trees is less than or equal to N, the number of
signatures that can be carried in one packet. The list of all the
highest level signatures, the digest, should be carried in one IP
packet without fragmentation.
When the parameters in one of the sessions change, all the signatures
on the path from that leaf to the root of the tree must be
recomputed. The number of signatures in this path, is bounded by the
height of the tree. Assuming T total sessions and an N-ary tree, this
number is:
Ceiling[log_N(T)]
(log_N() is the logarithm of base N)
For example with T=1000 and N=80 (80 MD5 signature is 1280 bytes),
the number of signatures to recompute is 2.
Insertion and deletion times will depend on the particular data
structure used to implement the digest tree. In the N-ary tree
described above the worst case time for insertion or deletion maybe
O(T) since a insertion or deletion may cause a complete tree
restructure. On the other hand, other data structures, such as the BB
tree described in [BB1],[BB2], have insertion and deletion times of
O(log_2(T)). The tradeoff in this case is between the increase in the
recomputation time (from log_N(T) to log_2(T)) and the decrease in
the insertion/deletion time. State ordering after an arbitrary number
of session insertions and deletions, provided that these changes are
reliably delivered to both neighbors, is guaranteed if both neighbors
start from a consistent state and use the same insertion/deletions
algorithms.
Neighboring RSVP nodes SHOULD use the same data structure to store
the digest tree. otherwise even though RSVP state between two
neighboring nodes is consistent the two neighbors will possibly
compute different digests.
draft-wang-rsvp-state-compression-00.txt [Page 22]
INTERNET-DRAFT April 1999
8. Discussion
Motivated by the requirements from using RSVP for MPLS traffic
engineering, Berger et al have proposed a few extensions to the basic
RSVP protocol [REDUCT] for two purposes.
1. to assure prompt delivery of new RSVP control message by adding
an acknowledgment option.
2. to reduce the state refresh overhead by eliminating refresh
messages. Instead, a "Hello" exchange is added between RSVP
neighbors to keep track of the liveness of each other.
The overhead reduction by compression scheme proposed in this draft
shares a similar ACK component with the scheme proposed in [REDUCT].
Both schemes also face the same challenges posed by non-RSVP clouds
and multicast RSVP sessions; when facing either of these conditions,
both schemes resort back to the current RSVP refresh mechanism.
However we believe that stopping refreshes entirely represents a
fundamental departure from the soft-state approach adopted in RSVP
design. As we have argued at a few places in this draft, reliable
RSVP message delivery alone is neither necessary nor sufficient for
assuring consistent reservation state inside the network. To keep
the soft-state approach while making the protocol scale with the
number of sessions, the proposed state compression mechanism pays the
cost of computing and maintaining a tree of session signatures.
When sessions or session parameters change, maintaining the tree of
session signatures has a computational overhead that is proportional
to the log of the total number of sessions (see a quick analysis in
Section 7). When large numbers of RSVP sessions exist in a network,
we believe that the proposed scheme brings substantial performance
gain from reduced refresh overhead, especially when most of the
sessions are relatively stable. However the feasibility of using
this approach in a highly dynamic environment where sessions are
constantly added and deleted remains an open issue.
9. Compatibility
The extensions introduced here are fully compatible with the existing
version of RSVP. The TIMESTAMP has an assigned value whose form is
0bbbbbbb. The RSVP specification dictates that objects of this class
should generate an error message when received by nodes that do not
support them. An RSVP node that sends a message with a TIMESTAMP
object and subsequently receives an "Unknown Object Class" error,
MUST NOT send any more messages with attached TIMESTAMP object and
draft-wang-rsvp-state-compression-00.txt [Page 23]
INTERNET-DRAFT April 1999
MUST NOT start sending Digest objects but SHOULD use regular
refreshes instead.
Regarding Digest messages, a node SHOULD only start sending Digest
messages only when it discovers that its particular peer is
compression-capable using the procedure outlined in Section 3.4.
Implementations that support the TIMESTAMP object, MUST also support
the digest mechanism and vice-versa.
10. IANA Considerations
IANA must provide three new Msg Types for the three new messages
defined in this draft: Digest, ACK and DigestErr. Also IANA must
provide Class Numbers for the objects defined here. Specifically:
The TIMESTAMP object requires a Class-Num value of the form
0bbbbbbb.
The DIGEST object requires a a Class-Num value of the form
10bbbbbb.
11. Security Considerations
No new security issues are raised in this document. See [RFC2205]
for a general discussion on RSVP security issues.
12. Acknowledgments
The phrase "overhead reduction by state compression" was suggested by
Van Jacobson of cisco. Our design also benefited from discussion
with Steve McCanne of UC Berkeley regarding the roles of
acknowledgment in soft-state protocol design.
13. References
[BB1] R. Bayer. "Symmetric binary B-trees: Data structure and
maintenance algorithms". Acta Informatica, 1:290-306, 1972
[BB2] L. Guibas, R. Sedgewick, "A diocrhomatic framework for balance
trees", In Proceedings of the 19th Annual Symposium on Foundations of
Computer Science, pages 8-21. IEEE Computer Society, 1978.
draft-wang-rsvp-state-compression-00.txt [Page 24]
INTERNET-DRAFT April 1999
[REDUCT] Lou Berger et al, "RSVP Refresh Reduction Extensions",
draft-berger-rsvp-refresh-reduct-00.txt, March 1999.
[MD5] R. Rivest, "The MD5 Message-Digest Algorithm", RFC 1321,
April 1992.
[RFC2205] Braden, R. Ed., Zhang, L., et al, "Resource ReserVation
Protocol -- Version 1 Functional Specification", RFC 2205, September
1997.
[RFC2209] Braden, R., Zhang, L, "Resource ReSerVation Protocol (RSVP)
-- Version 1 Message Processing Rules", RFC 2209, September 1997.
Appendix A. Considerations for Multicast and Many-to-one Unicast
Sessions
In a multicast session or many-to-one unicast session, there may be
multiple path states and reservation states in each session, so
routers also need to order path and reservation states in the same
way. The order of path states can be determined by sender templates
and the order of reservation states can be based on filterspecs and
Rspecs.
In addition to ordering path and reservation states, a router may
need to keep information about the reservation request sent to each
upstream neighbor if it is a merging or splitting point because
1. the reservation request it received from a downstream neighbor
may not be the same as the one sent to an upstream neighbor;
2. currently reservation states only record reservation made on a
link to a downstream neighbor;
3. the sender of a digest has to compute the digest based on what
flowspec and filterspec were sent to its upstream neighbor.
For a multicast session, complication arises for neighbor discovery
if a router is attached to a broadcast LAN. First, there may be both
compression-capable and compression-incapable downstream neighbors on
the LAN. In this case, both regular RSVP refreshes and digests
should be sent to the same interface. Furthermore, a router may not
detect a change in the downstream neighbors when a downstream router
on the broadcast LAN joins or leaves a group without affecting the
list of outgoing interfaces of the associated RSVP states. One
solution for this problem is to change the RSVP/routing interface so
that it reports a route change when the list of downstream neighbor
routers changes.
If a message with ACK request is sent to a multicast session over a
draft-wang-rsvp-state-compression-00.txt [Page 25]
INTERNET-DRAFT April 1999
multi-access interface (PATH and PATH-TEAR), the receiving node must
add a random delay before sending the ACK to avoid ACK implosion.
The random interval SHOULD be between zero (0) and a configured
maximum time. The configured maximum SHOULD be set in proportion to
the refresh and retransmission interval.
14. Authors' Addresses
Lan Wang
UCLA
4805 Boelter Hall
Los Angeles, CA 90095
Phone: 310-267-2190
Email: lanw@cs.ucla.edu
Andreas Terzis
UCLA
4677 Boelter Hall
Los Angeles, CA 90095
Phone: 310-267-2190
Email: terzis@cs.ucla.edu
Lixia Zhang
UCLA
4531G Boelter Hall
Los Angeles, CA 90095
Phone: 310-825-2695
EMail: lixia@cs.ucla.edu
draft-wang-rsvp-state-compression-00.txt [Page 26]