Internet Draft
MPLS Working Group                                   Philip Matthews
INTERNET-DRAFT                                       Nortel Networks
Expiration Date: August 2000                           February 2000


         LDP/CR-LDP Session Reestablishment -- I'll Be Back
                <draft-matthews-mpls-ldp-ibb-00.txt>

Status of this Memo

This document is an Internet-Draft and is in full conformance with
all provisions of section 10 of RFC 2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups.  Note that
other groups may also distribute working documents as Internet-
Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

This contribution proposes modifications to the LDP and CR-LDP 
protocols that allow an LDP or CR-LDP session to be reestablished 
using a new TCP connection if the old TCP connection goes down 
unexpectedly. It also proposes that, in certain situations, an LSR 
continue to use the label bindings associated with a session for a 
short time after the session goes down, to allow forwarding to 
continue uninterrupted while the two peer LSRs attempt to 
reestablish the session. These modifications allow an LSR to easily 
implement hitless software upgrades and hitless activity switches.


Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in
this document are to be interpreted as described in RFC 2119.






Matthews                  Expires August 2000               [Page 1]

Internet-Draft          Session Reestablishment        February 2000


1. Introduction

Many recent router architectures decouple the control plane from the 
data plane, so that packet forwarding can continue even if the 
control software is interrupted. One source of interruptions is a 
control switch; for example, when a router switches to a new version 
of the control software, or switches to a backup control processor in 
a control-redundant system. It is possible to design a router to make 
these interruptions very brief; however, the nature of the TCP 
protocol is such that it is difficult to keep a TCP connection up 
across a control switch.

The current specifications of the LDP and CR-LDP protocols ([LDP] and 
[CR-LDP]) state that if the TCP connection associated with an LDP or 
CR-LDP session goes down, then the session itself is terminated and 
all label bindings are discarded. For that reason, it is difficult 
today to build an LSR which can keep its LDP and CR-LDP sessions up 
across a control switch.

This contribution proposes modifications to the LDP and CR-LDP 
protocols that allow an LDP or CR-LDP session to be reestablished 
using a new TCP connection if the old TCP connection goes down. It 
also proposes that, in certain situations, an LSR continue to use 
the label bindings associated with a session for a short time after 
the session goes down, to allow forwarding to continue uninterrupted 
while the two peer LSRs attempt to reestablish the session. These 
changes allow a router to undergo a control switch with minimal 
disruption to the surrounding network.

This contribution proposes that the two peer LSRs negotiate at 
session establishment time whether they wish to allow the session to 
be restarted or not. If this capability is not agreed to, then the 
session operates as specified in [LDP] and [CR-LDP], and the new 
procedures described here are not used. The negotiation procedure is 
such that an LSR which implements these modifications can establish 
a session with a peer without any a priori knowledge of whether the 
peer supports these new procedures or not.

2. Overview of the Method

Say X and Y are two peer LSRs. When X and Y first establish an LDP 
or CR-LDP session, they include a new TLV, the Session 
Reestablishment Capability TLV, in the Initialization messages they 
exchange to negotiate the use of the procedures described in this 
draft.

Once Session Reestablishment Capability has been negotiated, the two 
peers use the message id field present in all LDP and CR-LDP 
messages to track those messages that have been sent to their peer 
LSR but not yet processed. To enable them to do this, the two LSRs 
treat the message id field as a 32-bit unsigned sequence number, 
incrementing it by one with each new message sent, and rolling it 
over to 0 after 2**32 - 1 is reached. This form of message id 
allocation is not required by the base LDP and CR-LDP specifications 
[LDP] and [CR-LDP], but is required by the procedures described in 
this draft.

Now say LSR X sends a message M to LSR Y. After it does so, X 
keeps a record of M against the eventuality that the TCP connection 
carrying M is broken before Y receives the message.

When Y receives M, it first processes the message according to the 
normal LDP or CR-LDP procedures. Y also records its new state in 
some manner that allows the state to be remembered across a session 
restart event. (For example, it may write the new state into 
non-volatile memory.)

LSR Y then acks message M by using a new TLV, the Message Ack TLV, 
which contains the message id that X assigned to M. This Message Ack 
TLV is piggybacked on some message that Y happens to be sending back 
to X.

When X receives the ack, it knows that message M has been processed, 
so it can now discard the record it kept of M.

Now say some event happens that causes the TCP connection to drop. 
For example, Y might have control redundancy enabled and experience 
an activity switch. In this case, neither X nor Y has any prior 
warning of the event. Alternatively, Y may be undergoing a software 
upgrade. In this case, Y may be able to shut down the LDP or CR-LDP 
session gracefully by sending a Notification message to X containing 
a new status code, the I'll-Be-Back status code, which indicates that 
Y hopes to reestablish the session shortly. In either case, Y is able 
to continue forwarding labelled packets without interruption (or with 
only a very brief interruption).

To reestablish the session, X and Y first establish a new TCP 
connection, and then exchange Initialization messages. These 
Initialization messages contain a new TLV, the Want To Reestablish 
TLV, which indicates the willingness of each peer to reestablish the 
previous LDP or CR-LDP session. In the Initialization message sent 
by X, the Want To Reestablish TLV contains the message id of the 
last message that X managed to receive and process from Y before the 
old TCP connection went down. Similarly, the Initialization message 
from Y includes a Want To Reestablish TLV giving the message id of 
the last message that Y had received and processed from X.

Once Initialization messages have been successfully exchanged, the 
session has been reestablished. At this point, both peers know 
precisely which messages were sent but not processed, and can resend 
the missed messages. However, an LSR is not forced to send the 
missed messages in the precise way that they were sent originally: 
it is free to send whatever messages it wishes to in whatever order 
it wishes to.
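
As an illustration only, the record keeping described in this 
overview could be sketched as follows. The class name SentLog and 
its methods are hypothetical and are not defined by this draft; the 
sketch assumes that each message is recorded in the order it was 
sent, and that the Last Message ID Processed value received when the 
session is reestablished is handled like an ordinary ack.

```python
from collections import OrderedDict


class SentLog:
    """Messages sent to the peer but not yet acked (hypothetical helper)."""

    def __init__(self):
        # message ID -> message, kept in the order the messages were sent
        self._pending = OrderedDict()

    def record(self, msg_id, message):
        """Remember a message that has just been sent."""
        self._pending[msg_id] = message

    def ack(self, last_processed):
        """Discard every record up to and including last_processed."""
        if last_processed not in self._pending:
            return  # nothing new is acked (for example, a repeated ack)
        while self._pending:
            msg_id, _ = self._pending.popitem(last=False)
            if msg_id == last_processed:
                break

    def unprocessed(self):
        """Messages the peer has not reported as processed, in send order."""
        return list(self._pending.values())
```

After the session is reestablished, X could call ack() with the 
value from Y's Want To Reestablish TLV and then use unprocessed() to 
decide what, if anything, needs to be sent again.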

If the session is not reestablished, either because Y does not 
recover from the event, or because X and Y decide not to reestablish 
the session for some reason, then after a short interval X and Y 
both discard the label bindings associated with the session.

3. New TLVs and Status Codes

The following subsections describe the new TLVs and status codes 
introduced by the proposed method.

3.1 Session Reestablishment Capability TLV

The Session Reestablishment Capability TLV can appear in the 
Initialization message to indicate willingness to follow the 
procedures described in this draft.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1|0|    Session Reestab Cap    |      Length                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Reserved                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Max Session Down Interval                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Reserved
This field must be set to zeros on transmission, and ignored on 
reception. (Future enhancements to this procedure might use this 
field for flags or other purposes).

Max Session Down Interval
The maximum interval (in milliseconds) this LSR is willing to 
allow between the time it determines the TCP connection is broken 
and the time it determines the session has been successfully 
reestablished. Note that both ends propose Max Session Down 
Intervals -- the actual value is the minimum of the two proposed 
values. 

The Session Reestablishment Capability TLV is an "optional" TLV 
according to the terminology of [LDP] and [CR-LDP]. It MUST appear 
only in the Initialization message and only when the LSR wishes to 
use the procedures described in this draft.


Because it is an optional TLV, the TLV has the U bit set to indicate 
that it should be ignored if it is not understood. This allows an 
LSR to propose the use of these procedures, but revert easily to 
standard [LDP] or [CR-LDP] operation if its peer does not understand 
the TLV. (See the procedures section below.)
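
The layout shown above could be encoded along the following lines. 
This is a sketch under stated assumptions: the type code 0x3F01 is 
purely hypothetical (this draft does not assign one), the function 
names are illustrative, and the Length field is assumed to count 
only the octets of the value, as in the base LDP TLV encoding.

```python
import struct

# Hypothetical type code: this draft does not assign a value.
SESSION_REESTAB_CAP_TYPE = 0x3F01


def encode_session_reestab_cap(max_down_ms, tlv_type=SESSION_REESTAB_CAP_TYPE):
    """Build the TLV with U=1, F=0, a zero Reserved field and the interval."""
    first16 = 0x8000 | (tlv_type & 0x3FFF)      # U bit set, F bit clear
    value = struct.pack("!II", 0, max_down_ms)  # Reserved, Max Session Down Interval
    return struct.pack("!HH", first16, len(value)) + value


def decode_session_reestab_cap(data):
    """Return (u_bit, f_bit, tlv_type, max_down_ms); Reserved is ignored."""
    first16, length = struct.unpack("!HH", data[:4])
    u_bit = bool(first16 & 0x8000)
    f_bit = bool(first16 & 0x4000)
    tlv_type = first16 & 0x3FFF
    _reserved, max_down_ms = struct.unpack("!II", data[4:4 + length])
    return u_bit, f_bit, tlv_type, max_down_ms
```

All fields are packed in network byte order. The decoder ignores the 
Reserved field, as required on reception.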

3.2 Want To Reestablish TLV

The Want To Reestablish TLV can appear in the Initialization message 
to indicate willingness to reestablish a previous LDP or CR-LDP 
session that had been prematurely terminated.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1|0|   Want To Reestablish     |      Length                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Reserved                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  Last Message ID Processed                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Reserved
This field must be set to zeros on transmission, and ignored on 
reception. (Future enhancements to this procedure might use this 
field for flags or other purposes).

Last Message ID Processed
The ID of the last message which the sending LSR received and 
processed from the receiving LSR. The sending LSR may have 
received later messages from the receiving LSR, but the sending 
LSR did not complete processing of them and thus does not remember 
them. 

The Want To Reestablish TLV is an "optional" TLV according to the 
terminology of [LDP] and [CR-LDP]. It MUST appear only in the 
Initialization message and only when the LSR wishes to restart a 
session using the procedures described in this draft.

Because it is an optional TLV, the TLV has the U bit set to indicate 
that it should be ignored if it is not understood. This allows an 
LSR to propose the restart of a session, but revert easily to 
standard [LDP] or [CR-LDP] operation if its peer does not understand 
the TLV. (See the procedures section below.)

3.3 Message Ack TLV

The Message Ack TLV can appear in any message to indicate 
acknowledgement of a message which the sending LSR has received from 
the receiving LSR.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1|0|    Message Ack            |      Length                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  Last Message ID Processed                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Last Message ID Processed
The ID of the last message which the sending LSR received and 
processed from the receiving LSR. 

Note that the ack is cumulative; that is, the use of this TLV acks 
not only the message specified but all previous messages. The 
receiving LSR MUST be able to accept gaps in the sequence of message 
IDs acked using this TLV. For example, it is acceptable for an LSR 
to include a Message Ack TLV with a value of 5, then not include any 
Message Ack TLV for a period of time, and then include a Message Ack 
TLV with a value of 12. This latter Message Ack TLV acks all 
messages from 6 to 12 inclusive.
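
The cumulative rule can be illustrated with a short calculation. The 
function name newly_acked is hypothetical; the sketch assumes that 
message IDs are 32-bit sequence numbers that wrap modulo 2**32, as 
required by the procedures section of this draft.

```python
MOD = 2**32


def newly_acked(prev_ack, new_ack):
    """List the message IDs covered by new_ack that prev_ack did not cover."""
    count = (new_ack - prev_ack) % MOD
    return [(prev_ack + i) % MOD for i in range(1, count + 1)]
```

With the values from the example above, newly_acked(5, 12) yields 
the IDs 6 through 12. The modulo step lets the same calculation work 
when the sequence wraps from 2**32 - 1 back to 0.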

The Message Ack TLV is an "optional" TLV according to the 
terminology of [LDP] and [CR-LDP]. It MAY appear in any message, but 
SHOULD appear only if the use of the procedures described in this 
draft has been agreed to by both peers.

Because it is an optional TLV, the TLV has the U bit set to indicate 
that it should be ignored if it is not understood. It also has the F 
bit cleared to indicate that it should not be forwarded to any other 
LSRs.

3.4 Status codes

This draft defines the following new status codes. See the 
procedures section for how they are used.

Status Code                                   E      Status Data

I'll Be Back                                  1      (tbd)
Session Rejected/Parameters Max Session       1      (tbd)
   Down Interval
Session Rejected/No Previous Session          1      (tbd)
Session Rejected/Parameters Last Message      1      (tbd)
   ID Processed
Session Rejected/Session Parameter Changed    1      (tbd)
Bad Message ID                                1      (tbd)
Bad Message Ack                               1      (tbd)
Out of Message IDs                            1      (tbd)


4. New Procedures

4.1 Session Establishment

The procedures for session initialization are as specified in 
Section 2.5 of [LDP] with the following modifications.

a) An LSR which wishes to follow the procedures described in this 
draft includes a Session Reestablishment Capability TLV in the 
Initialization message it sends to its peer. An LSR which does 
not wish to follow the procedures described here does not include 
this TLV.

b) An LSR which receives an Initialization message containing a 
Session Reestablishment Capability TLV, and which recognizes this 
TLV but does not wish to follow the procedures described here, 
ignores the TLV when processing the Initialization message. In 
particular, it SHOULD NOT send an "Unknown TLV" Status Code in 
reply.

c) An LSR which receives an Initialization message containing a 
Session Reestablishment Capability TLV, and which wishes to follow 
the procedures described here, computes the minimum of the Max 
Session Down Interval specified in the message and its own Max 
Session Down Interval. If this value is acceptable, then it 
considers the TLV acceptable when processing the Initialization 
message, and it MUST use this computed value as the actual Max 
Session Down Interval for the duration of the session. If this 
value is not acceptable, then it SHOULD send an error 
Notification with a status code of "Session Rejected/Parameters 
Max Session Down Interval".

d) An LSR MUST both send a Session Reestablishment Capability TLV 
which is acceptable to its peer, and receive a Session 
Reestablishment Capability TLV which is acceptable to it, in order 
to use the procedures defined here for the remainder of the session. 
If it either does not send or does not receive a Session 
Reestablishment Capability TLV, then it SHOULD follow the procedures 
described in [LDP] and [CR-LDP]. If both peers include Session 
Reestablishment Capability TLVs in their Initialization messages, 
but the computed Max Session Down Interval is not acceptable to one 
or both peers, then the session is torn down as specified in [LDP].
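
The negotiation rules above could be sketched as follows. This is 
illustrative only. The draft does not say how an LSR decides whether 
a computed interval is acceptable, so the sketch assumes a locally 
configured lower bound (min_acceptable_ms); that parameter, the 
function name and the returned labels are all hypothetical.

```python
def negotiate(local_ms, peer_ms, min_acceptable_ms):
    """Decide how the session proceeds after Initialization is exchanged.

    local_ms / peer_ms are the proposed Max Session Down Intervals, or
    None when that side sent no Session Reestablishment Capability TLV.
    """
    if local_ms is None or peer_ms is None:
        # The capability was not both sent and received: base LDP/CR-LDP.
        return ("base-ldp", None)
    agreed = min(local_ms, peer_ms)  # the actual value is the minimum
    if agreed < min_acceptable_ms:
        # Reject with "Session Rejected/Parameters Max Session Down Interval".
        return ("reject", agreed)
    return ("reestablishment", agreed)
```

Because each peer may apply its own acceptability test, the session 
uses the new procedures only if both peers reach the third outcome.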

4.2 Message IDs

The procedures for using Message IDs are as specified in [LDP] or 
[CR-LDP] with the following modifications.

a) Each LSR treats the Message ID field as an unsigned 32-bit 
sequence number.

b) An LSR MAY use any value it wishes for the Message ID of the 
Initialization message. The value it uses becomes the initial 
sequence number. Subsequent messages are sent with consecutive 
increasing sequence numbers, continuing with 0 after 2**32 - 1 is 
used. 

c) When a session is reestablished, the old sequence of message IDs 
is broken and a new sequence is established with the message ID 
of the reestablishing Initialization message. For example, some 
implementations MAY elect to use the next number in the old 
sequence as the message ID of the Initialization message, while 
others MAY elect to restart the sequence at some fixed value.

d) An LSR which receives a message with a message ID that is not one 
greater than the message ID of the previous message (modulo 
2**32), MUST terminate the session with a status code of "Bad 
Message ID".

e) An LSR MUST NOT reuse a Message ID until it has received an ack 
for its previous use. This ensures that the LSR can uniquely 
match message acks to messages. If an LSR is getting close to 
exhausting the Message ID space, then it MAY elect to stop sending 
messages for a while to allow its peer a chance to ack some 
messages. Regardless of whether it pauses or not, an LSR must 
reserve one Message ID for a Notification message (with status 
code "Out of Message IDs") which it can use to terminate the 
session.
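
The sequence number rules in b) and d) reduce to arithmetic modulo 
2**32. A minimal sketch follows; the function names are 
hypothetical.

```python
MOD = 2**32


def next_message_id(current):
    """Message ID for the next message sent; continues with 0 after 2**32 - 1."""
    return (current + 1) % MOD


def received_id_is_valid(previous, received):
    """True if received is exactly one greater than previous, modulo 2**32."""
    return received == (previous + 1) % MOD
```

A receiver for which received_id_is_valid() returns False would 
terminate the session with the "Bad Message ID" status code.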

4.3 Message Acks

The procedures for processing received messages are as specified in 
[LDP] or [CR-LDP] with the following additions.

a) When processing a message, each LSR arranges to record in some 
way its new local state. Note that this does not require the LSR 
to remember the message or even remember the transition it 
underwent from its old local state to its new local state. 
However, the processing SHOULD be done in a manner that is as 
atomic as possible, so that if a fault occurs during processing, 
the LSR restarts the session with the old state.

b) As part of the local state, each LSR keeps the message ID of the 
last message it processed.

c) Whenever an LSR sends a message to its peer, the LSR MAY elect to 
include a Message Ack TLV. The value of the Message Ack TLV 
SHOULD be the value of the last Message ID processed. In certain 
implementations, the routine filling in the Message Ack TLV may 
not learn of messages that have been newly processed for some 
time; in these implementations, the routine SHOULD use the most 
accurate value it knows. In all cases, an LSR MUST NOT ack a 
message that has not yet been processed.

d) An LSR MUST ack messages within a relatively short time after 
processing them.

e) The sequence of Message Ack values MUST be monotonically 
non-decreasing (modulo 2**32). The value may repeat, but it may not 
go backwards, nor can it jump ahead to a message that has not 
been sent yet. If an LSR receives a Message Ack TLV which does 
not obey these rules, then it MUST terminate the session with a 
Notification message with a status code of "Bad Message Ack".
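
One way to check rule e) is to compare distances modulo 2**32. The 
sketch below is an assumption about how an implementation might do 
this, not a required algorithm; the function and parameter names are 
hypothetical. It relies on the rule in the Message IDs section that 
a Message ID is never reused before it is acked, so that fewer than 
2**32 messages are ever outstanding.

```python
MOD = 2**32


def ack_is_valid(prev_ack, new_ack, last_sent):
    """Check a received Message Ack value against rule e).

    prev_ack  -- the last Message Ack value accepted from the peer
    new_ack   -- the Message Ack value just received
    last_sent -- the Message ID of the last message sent to the peer
    """
    window = (last_sent - prev_ack) % MOD   # messages still awaiting an ack
    advance = (new_ack - prev_ack) % MOD    # how far the ack moved forward
    return advance <= window
```

A value that goes backwards appears as a very large forward distance 
and therefore fails the same comparison as a value that jumps past 
the last message sent.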

4.4 Session Termination

A session between peers who have negotiated the use of the Session 
Reestablishment Capability can be terminated in the following ways.

a) One or both peers can experience an event that causes the TCP 
connection to be terminated without warning. Events of this 
nature might include activity switches in a control redundant 
system.

b) One or both peers can terminate the session using a Notification 
message with a status code of "I'll Be Back".

c) One or both peers can terminate the session because their local 
TCP gave up, or because their local keepalive timer expired.

d) One or both peers can terminate the session using a Notification 
message with a status code OTHER than "I'll Be Back".

Sessions terminated in the fourth way SHOULD NOT be restarted, and 
an LSR SHOULD reject any attempts to restart such sessions.

Sessions terminated in one of the first three ways are candidates 
for restarting. An LSR SHOULD continue to use the labels received 
from its peer and honor the labels which it has distributed to its 
peer until it determines either that the session has been restarted 
or that the session cannot be successfully restarted. If an LSR 
determines that a session cannot be successfully restarted, it 
SHOULD discard any label bindings associated with the session.

An LSR determines that a session cannot be successfully restarted 
when one of the following occurs:

a) An interval longer than the computed max session down interval 
has elapsed since the LSR detected that the old TCP connection 
was broken. 

b) A new session has been established, but the peers did not agree 
to make this session a continuation of the old session.
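
The decision of whether to keep using the label bindings while the 
session is down could be sketched as follows. The enumeration and 
function names are hypothetical. The sketch assumes that a session 
terminated in the fourth way is treated as in the base 
specifications, with its bindings discarded at once; this draft does 
not state that explicitly.

```python
from enum import Enum


class Cause(Enum):
    TCP_LOST = "a"             # connection terminated without warning
    ILL_BE_BACK = "b"          # Notification with status "I'll Be Back"
    TIMEOUT = "c"              # local TCP gave up or keepalive timer expired
    OTHER_NOTIFICATION = "d"   # Notification with any other status code


def is_restart_candidate(cause):
    """Only sessions terminated in the first three ways may be restarted."""
    return cause is not Cause.OTHER_NOTIFICATION


def keep_bindings(cause, elapsed_ms, max_down_ms):
    """True while the LSR should keep using the session's label bindings.

    elapsed_ms is the time since the LSR detected the broken connection;
    max_down_ms is the computed Max Session Down Interval.
    """
    return is_restart_candidate(cause) and elapsed_ms <= max_down_ms
```

The second condition for giving up, a new session that is not a 
continuation of the old one, is an event rather than a timer and is 
not modelled here.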


4.5 Session Reestablishment

The procedures for reestablishing a session are a modification of 
the procedures for establishing the session originally (as described 
in the section "Session Establishment" above).

a) The two LSR peers use the LDP Identifier and Receiver LDP 
Identifier fields of the Initialization message to uniquely 
identify the session being reestablished.

b) An LSR indicates its willingness to reestablish the previous 
session by including the Want To Reestablish TLV in its 
Initialization message. 

c) A previous session can only be reestablished if both peers 
include the Want To Reestablish TLV in their Initialization 
messages, and each peer accepts the value of the Want To 
Reestablish TLV that it receives.

d) If an LSR receives an Initialization message containing a Want To 
Reestablish TLV, but it has no record of a previous session 
(perhaps because an interval greater than the computed max 
session down interval has elapsed since the previous session was 
terminated), then it rejects the Initialization message with a 
"Session Rejected/No Previous Session" status code.

e) If an LSR receives an Initialization message containing a Want To 
Reestablish TLV, but it cannot reestablish the previous session 
at that point for some reason, then it rejects the Initialization 
message with a "Session Rejected/Parameters Last Message ID 
Processed" status code. (This could happen if the peer proposed a 
value which was out-of-range, or if, despite the peer proposing a 
reasonable value, the local LSR simply cannot reestablish the 
session at that point, due to some internal restriction).

f) The reestablished session must have the same session parameters 
as the original session. Note that this does not mean that the 
Initialization messages used to reestablish the session must have 
exactly the same parameters as in the original exchange. Rather, 
it is the parameters that result from comparing the received 
Initialization message with the local configuration that must be 
the same. A simple way to implement this is to send the computed 
session parameters from the original session in the 
reestablishing Initialization message.

g) If a peer detects that a session will be established with changed 
session parameters, then it SHOULD reject the session with a 
status code of "Session Rejected/Session Parameter Changed".
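
The checks in d), e) and g) could be combined as in the following 
sketch. It is illustrative only: the function and parameter names 
are hypothetical, and "out-of-range" is interpreted here as a Last 
Message ID Processed value that lies outside the window between the 
last value the peer acked on the old connection and the last message 
sent on it. An implementation may have further internal reasons for 
refusing, which are not modelled.

```python
MOD = 2**32


def check_reestablish(have_previous, prev_ack, last_sent,
                      peer_last_processed, old_params, new_params):
    """Return None to accept, or the status code used to reject.

    prev_ack            -- last Message Ack value received on the old connection
    last_sent           -- Message ID of the last message sent on it
    peer_last_processed -- value from the peer's Want To Reestablish TLV
    old_params/new_params -- computed session parameters, old and new
    """
    if not have_previous:
        return "Session Rejected/No Previous Session"
    window = (last_sent - prev_ack) % MOD
    if (peer_last_processed - prev_ack) % MOD > window:
        return "Session Rejected/Parameters Last Message ID Processed"
    if new_params != old_params:
        return "Session Rejected/Session Parameter Changed"
    return None
```

The parameters compared are the computed session parameters, not the 
raw contents of the two Initialization messages, in line with f).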

5. Security Considerations

There seems to be no difficulty in using these procedures with LDP 
or CR-LDP sessions that are protected using the MD5 signature 
option.

6. Areas for Further Study

This section discusses some possible areas for further study.

a) It might be useful to allow the session to be reestablished with 
new values for one or more session parameters. This would serve 
two purposes: one, it would provide a simple way to renegotiate 
session parameters, and two, it would provide a simple way of 
taking advantage of the new capabilities of upgraded control 
software. The main question to be answered here is: which session 
parameter changes can be reasonably supported? It is easy to see 
how a change in the KeepAlive interval can be accommodated, but 
what about changes to the label advertisement discipline or a 
decrease in the ATM label range?

b) It might also be useful to formalize methods of changing the 
transport addresses associated with the session. This would be 
particularly useful in control redundancy situations where the 
primary and backup LDP/CR-LDP entities have different IP 
addresses.

c) If the LSR which causes the TCP connection to drop plays the 
passive role in restarting the new session, then it must wait 
until its peer LSR initiates the session restart. If the 
underlying cause was an activity switch on the passive LSR, then 
the active LSR will not notice a problem until either the 
KeepAlive timer expires or the local TCP times out. This may take 
a while. It would be nice if the passive LSR could somehow kick 
the active LSR into action sooner. Unfortunately, there are 
security implications in providing such a mechanism. One solution 
might be to add an "I've Come Back" flag to the Hello message and 
then extend MD5 protection to these messages.

7. Acknowledgements

The original inspiration for this draft was the proposal by David 
Ward and John Scudder for restarting BGP sessions [WARD]. I have 
borrowed some of their terms, but the nature of LDP and CR-LDP 
(specifically DoD mode) forced me to adopt a different approach.

Thanks also to Peter Ashwood-Smith for helpful comments when I was 
working out the technical details behind this proposal.


8. References

[CR-LDP] Constraint-Based LSP Setup using LDP, 
         draft-ietf-mpls-cr-ldp-04.txt

[LDP]    LDP Specification, draft-ietf-mpls-ldp-05.txt

[WARD]   BGP Notification Cease: I'll Be Back, 
         draft-ward-bgp4-ibb-00.txt


9. Author's Address

   Philip Matthews 
   Nortel Networks Corp.             
   P.O. Box 3511 Station C,         
   Ottawa, ON K1Y 4H7                
   Canada                          
   Phone: +1 613-768-3262          
   philipma@nortelnetworks.com