Internet Draft MPLS Working Group Philip Matthews INTERNET-DRAFT Nortel Networks Expiration Date: August 2000 February 2000 LDP/CR-LDP Session Reestablishment -- I'll Be Back <draft-matthews-mpls-ldp-ibb-00.txt> Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of section 10 of RFC 2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. Abstract This contribution proposes modifications to the LDP and CR-LDP protocols that allow an LDP or CR-LDP session to be reestablished using a new TCP connection if the old TCP connection goes down unexpectedly. It also proposes that, in certain situations, an LSR continue to use the label bindings associated with a session for a short time after the session goes down, to allow forwarding to continue uninterrupted while the two peer LSRs attempt to reestablish the session. These modifications allow an LSR to easily implement hitless software upgrades and hitless activity switches. Conventions used in this document The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119. Matthews Expires August 2000 [Page 1] Internet-Draft Session Reestablishment February 2000 1. Introduction Many recent router architectures decouple the control plane from the data plane, so that packet forwarding can continue even if the control software gets interrupted. One source of interruptions occurs during control switches; for example, when a router switches to a new version of the control software, or switches to a backup control processor in a control redundant system. It is possible to design a router to make these interruptions very brief, however, the nature of the TCP protocol is such that it is difficult to keep a TCP connection up across a control switch. The current specification of the LDP and CR-LDP protocols ([LDP] and [CR-LDP]) state that if the TCP connection associated with an LDP or CR-LDP session goes down, then the session itself is terminated and all label bindings are discarded. For that reason, it is difficult today to build an LSR which can keep its LDP and CR-LDP sessions up across a control switch. This contribution proposes modifications to the LDP and CR-LDP protocols that allow an LDP or CR-LDP session to be reestablished using a new TCP connection if the old TCP connection goes down. It also proposes that, in certain situations, an LSR continue to use the label bindings associated with a session for a short time after the session goes down, to allow forwarding to continue uninterrupted while the two peer LSRs attempt to reestablish the session. These changes allow a router to undergo a control switch with minimal disruption to the surrounding network. This contribution proposes that the two peer LSRs negotiate at session establishment time whether they wish to allow the session to be restarted or not. If this capability is not agreed to, then the session operates as specified in [LDP] and [CR-LDP], and the new procedures described here are not used. The negotiation procedure is such that an LSR which implements these modifications can establish a session with a peer without any a priori knowledge of whether the peer supports these new procedures or not. 2. Overview of the Method Say X and Y are two peer LSRs. When X and Y first establish an LDP or CR-LDP session, they include a new TLV, the Session Reestablishment Capability TLV, in the Initialization messages they exchange to negotiate the use of the procedures described in this draft. Once Session Reestablishment Capability has been negotiated, the two peers use the message id field present in all LDP and CR-LDP messages to track those messages that have been sent to their peer LSR but not yet processed. To enable them to do this, the two LSRs Matthews Expires August 2000 [Page 2] Internet-Draft Session Reestablishment February 2000 treat the message id field as a 32-bit unsigned sequence number, incrementing it by one with each new message sent, and rolling it over to 0 after 2**31 - 1 is reached. This form of message id allocation is not required by the base LDP and CR-LDP specifications [LDP] and [CR-LDP], but is required by the procedures described in this draft. Now say LSR X sends a message M to LSR Y. After it does so, X remembers that it has sent message M against the eventuality that TCP connection carrying M may be broken before Y receives the message. When Y receives M, then it first processes the message according to the normal LDP or CR-LDP procedures. Y also records its new state in some manner that allows the state to be remembered across a session restart event. (For example, it may write the new state into non- volatile memory). LSR Y then acks message M by using a new TLV, the Message Ack TLV, which contains the message id that X assigned to M. This Message Ack TLV is piggybacked on some message that Y happens to be sending back to X. When X receives the ack, it knows that message M has been processed, so it can now discard the record it kept of M. Now say some event happens that causes the TCP connection to drop. For example, Y might have control redundancy enabled and experience an activity switch. In this case, neither X or Y have any prior warning of the event. Alternatively, Y may be undergoing a software upgrade. In this case, Y may be able to shutdown the LDP session gracefully by sending a Notification message to X containing a new status code, the I'll-Be-Back status code, which indicates that Y hopes to reestablish the LDP or CR-LDP session shortly. In either case, Y is able to continue forwarding labelled packets without interruption (or with only a very brief interruption). To reestablish the session, they first establish a new TCP connection, and then exchange Initialization messages. These Initialization messages contain a new TLV, the Want To Reestablish TLV, which indicates the willingness of each peer to reestablish the previous LDP or CR-LDP session. In the Initialization message sent by X, the Want To Reestablish TLV contains the message id of the last message that X managed to receive and process from Y before the old TCP connection went down. Similarly, the Initialization message from Y includes a Want To Reestablish TLV giving the message id of the last message that Y had received and processed from X. Once Initialization messages have been successfully exchanged, the session has been reestablished. At this point, both peers know Matthews Expires August 2000 [Page 3] Internet-Draft Session Reestablishment February 2000 precisely which messages were sent but not received, and can resend the missed messages. However, an LSR is not forced to send the missed messages in the precise way that they sent originally: it is free to send whatever messages it wishes to in whatever order it wishes to. If the session is not reestablished, either because Y does not recover from the event, or because X and Y decide not to reestablish the session for some reason, then after a short interval X and Y both discard the label bindings associated with the session. 3. New TLVs and Status Codes The following subsections describe the new TLVs introduced by the proposed method. 3.1 Session Reestablishment Capability TLV The Session Restart Capability TLV can appear in the Initialization message to indicate willingness to follow the procedures described in this draft. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |1|0| Session Reestab Cap | Length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Reserved | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Max Session Down Interval | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Reserved This field must be set to zeros on transmission, and ignored on reception. (Future enhancements to this procedure might use this field for flags or other purposes). Max Session Down Interval The maximum interval (in milliseconds) this LSR is willing to allow between the time it determines the TCP connection is broken and the time it determines the session has been successfully reestablished. Note that both ends propose Max Session Down Intervals -- the actual value is the minimum of the two proposed values. The Session Reestablishment Capability TLV is an "optional" TLV according to the terminology of [LDP] and [CR-LDP]. It MUST appear only in the Initialization message and only when the LSR wishes to use the procedures described in this draft. Matthews Expires August 2000 [Page 4] Internet-Draft Session Reestablishment February 2000 Because it is an optional TLV, the TLV has the U bit set to indicate that it should be ignored if it is not understood. This allows an LSR to propose the use of these procedures, but revert easily to standard [LDP] or [CR-LDP] operation if its peer does not understand the TLV. (See the procedures section below.) 3.2 Want To Reestablish TLV The Want To Reestablish TLV can appear in the Initialization message to indicate willingness to reestablish a previous an [LDP] or [CR- LDP] session that had been prematurely terminated. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |1|0| Want To Reestablish | Length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Reserved | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Last Message ID Processed | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Reserved This field must be set to zeros on transmission, and ignored on reception. (Future enhancements to this procedure might use this field for flags or other purposes). Last Message ID Processed The ID of the last message which the sending LSR received and processed from the receiving LSR. The sending LSR may have received later messages from the receiving LSR, but the sending LSR did not complete processing of them and thus does not remember them. The Want To Reestablish TLV is an "optional" TLV according to the terminology of [LDP] and [CR-LDP]. It MUST appear only in the Initialization message and only when the LSR wishes to restart a session using the procedures described in this draft. Because it is an optional TLV, the TLV has the U bit set to indicate that it should be ignored if it is not understood. This allows an LSR to propose the restart of a session, but revert easily to standard [LDP] or [CR-LDP] operation if its peer does not understand the TLV. (See the procedures section below.) Matthews Expires August 2000 [Page 5] Internet-Draft Session Reestablishment February 2000 3.3 Message Ack TLV The Message Ack TLV can appear in any message to indicate acknowledgement of a message which the sending LSR has received from the receiving LSR. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |1|0| Message Ack | Length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Last Message ID Processed | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Last Message ID Processed The ID of the last message which the sending LSR received and processed from the receiving LSR. Note that the ack is cumulative; that is, the use of this TLV acks not only the message specified but all previous messages. The receiving LSR MUST be able to accept gaps in the sequence of message IDs acked using this TLV. For example, it is acceptable for an LSR to include a Message Ack TLV with a value of 5, then not include any Message Ack TLV for a period of time, and then include a Message Ack TLV with a value of 12. This latter Message Ack TLV acks all messages from 6 to 12 inclusive. The Message Ack TLV is an "optional" TLV according to the terminology of [LDP] and [CR-LDP]. It MAY appear in any message, but SHOULD appear only if the use of the procedures described in this draft has been agreed to by both peers. Because it is an optional TLV, the TLV has the U bit set to indicate that it should be ignored if it is not understood. It also has the F bit cleared to indicate that it should not be forwarded to any other LSRs. 3.4 Status codes This draft defines the following new status codes. See the procedures section for how they are used. Status Code E Status Data I'll Be Back 1 (tbd) Session Rejected/ 1 (tbd) Parameters Max Session Down Interval Session Rejected/ 1 (tbd) No Previous Session Matthews Expires August 2000 [Page 6] Internet-Draft Session Reestablishment February 2000 Session Rejected/ 1 (tbd) Parameters Last Message ID Processed Session Rejected/ 1 (tbd) Session Parameter Changed Bad Message ID 1 (tbd) Bad Message Ack 1 (tbd) Out of Message IDs 1 (tbd) 4. New Procedures 4.1 Session Establishment The procedures for session initialization are as specified in Section 2.5 of [LDP] with the following modifications. a) An LSR which wishes to follow the procedures described in this draft includes a Session Reestablishment Capability TLV in the Initialization message it sends to its peer. An LSR which does not wish to follow the procedures described here does not include this TLV. b) An LSR which receives an Initialization message containing a Session Reestablishment Capability TLV and which recognizes this TLV, but does not wish to follow the procedures described here ignores the TLV when processing the Initialization message. In particular, it SHOULD NOT send an "Unknown TLV" Status Code in reply. c) An LSR which receives an Initialization message containing a Session Reestablishment Capability, and which wishes to follow the procedures described here, computes the minimum of the Max Session Down Interval specified in the message and its own Max Session Down Interval. If this value is acceptable, then it considers the TLV acceptable when processing the Initialization message, and it MUST use this computed value as the actual Max Session Down Interval for the duration of the session. If this value is not acceptable, then it SHOULD send an error Notification with a status code of "Session Rejected/Parameters Max Session Down Interval". d) An LSR MUST both send a Session Reestablishment Capability TLV which is acceptable to its peer, and receive a Session Reestablishment Capability which is acceptable to it in order to use the procedures defined here for the remainder of the session. If it does not either send or receive a Session Reestablishment Capability TLV, then it SHOULD follow the procedures described in [LDP] and [CR-LDP]. If both peers include Session Reestablishment Matthews Expires August 2000 [Page 7] Internet-Draft Session Reestablishment February 2000 Capability TLVs in their Initialization messages, but the computed Max Session Down Interval is not acceptable to one or both peers, then the session is torn down as specified in [LDP]. 4.2 Message IDs The procedures for using Message IDs are as specified in [LDP] or [CR-LDP] with the following modifications. a) Each LSR treats the Message ID field as an unsigned 32-bit sequence number. b) An LSR MAY use any value it wishes for the Message ID of the Initialization message. The value it uses becomes the initial sequence number. Subsequent messages are sent with consecutive increasing sequence numbers, continuing with 0 after 2**32 - 1 is used. c) When a session is reestablished, the old sequence of message IDs is broken and a new sequence is established with the message ID of the reestablishing Initialization message. For example, some implementations MAY elect to use the next number in the old sequence as the message ID of the Initialization message, while others MAY elect to restart the sequence at some fixed value. d) An LSR which receives a message with a message ID that is not one greater than the message ID of the previous message (module 2**32), MUST terminate the session with a status code of "Bad Message ID". e) An LSR MUST NOT reuse a Message ID until it has received an ack for its previous use. This ensures that the LSR can uniquely match message acks to messages. If an LSR is getting close to exhausting this interval, then it MAY elect to stop sending messages for a while to allow its peer a chance to ack some messages. Regardless of whether it pauses or not, an LSR must reserve the Message ID for a Notification message (with status code "Out of Message IDs") which it can use to terminate the session. 4.3 Message Acks The procedures for processing received messages are as specified in [LDP] or [CR-LDP] with the following additions. a) When processing a message, each LSR arranges to record in some way its new local state. Note that this does not require the LSR to remember the message or even remember the transition it underwent from its old local state to its new local state. Matthews Expires August 2000 [Page 8] Internet-Draft Session Reestablishment February 2000 However, the processing SHOULD be done in a manner that is as atomic as possible, so that if a fault occurs during processing, the LSR restarts the session with the old state. b) As part of the local state, each LSR keeps the message ID of the last message it processed. c) Whenever an LSR sends a message to its peer, the LSR MAY elect to include a Message Ack TLV. The value of the Message Ack TLV SHOULD be the value of the last Message ID processed. In certain implementations, the routine filling in the Message Ack TLV may not learn of messages that have been newly processed for some time; in these implementations, the routine SHOULD use the most accurate value it knows. In all cases, an LSR MUST NOT ack a message that has not yet been processed. d) An LSR MUST ack messages within a relatively short time after processing them. e) The sequence of Message Ack values MUST be monotonically increasing (modulo 2**32). The value may repeat, but it may not go backwards, nor can it jump ahead to a message that has not been sent yet. If an LSR receives a Message Ack TLV which does not obey these rules, then it MUST terminate the session with a Notification message with a status code of "Bad Message Ack". 4.4 Session Termination A sessions between peers who have negotiated the use of the Session Restart capability can be terminated in the following ways. a) One or both peers can experience an event that causes the TCP connection to be terminated without warning. Events of this nature might include activity switches in a control redundant system. b) One or both peers can terminate the session using a Notification message with a status code of "I'll Be Back". c) One or both peers can terminate the session because their local TCP gave up, or because their local keepalive timer expired. d) One or both peers can terminate the session using a Notification message with a status code OTHER than "I'll Be Back". Sessions terminated in the fourth way SHOULD NOT restarted, and an LSR SHOULD reject any attempts to restart such sessions. Sessions terminated in one of the first three ways are candidates for restarting. An LSR SHOULD continue to use the labels received from its peer and honor the labels which it has distributed to its Matthews Expires August 2000 [Page 9] Internet-Draft Session Reestablishment February 2000 peer until it determines that either the session has been restarted or it determines that the session cannot be successfully restarted. If an LSR determines that an session cannot be successfully restarted, it SHOULD discard any label bindings associated with the session. An LSR determines that a session cannot be successfully restarted when one of the following occurs: a) An interval longer than the computed max session down interval has elapsed since the LSR detected that the old TCP connection was broken. b) A new session has been established, but the peers did not agree to make this session a continuation of the old session. 4.5 Session Reestablishment The procedures for reestablishing a session are an modification of the procedures for establishing the session originally (as described in the section "Session Establishment" above). a) The two LSR peers use the LDP Identifier and Receiver LDP Identifier fields of the Initialization message to uniquely identify the session being reestablished. b) An LSR indicates its willingness to reestablish the previous session by including the Want To Reestablish TLV in its Initialization message. c) A previous session can only be reestablished if both peers include the Want To Reestablish TLV in their Initialization messages, and each peer accept the value of the Want To Reestablish TLV that its receives. d) If an LSR receives an Initialization message containing Want To Reestablish TLV, but it has no record of a previous session (perhaps because an interval greater than the computed max session down interval has elapsed since the previous session was terminated), then it rejects the Initialization message with a "Session Rejected/No Previous Session" status code. e) If an LSR receives an Initialization message containing Want To Reestablish TLV, but it cannot reestablish the previous session at that point for some reason, then it rejects the Initialization message with a "Session Rejected/Parameters Last Message ID Processed" status code. (This could happen if the peer proposed a value which was out-of-range, or if, despite the peer proposing a Matthews Expires August 2000 [Page 10] Internet-Draft Session Reestablishment February 2000 reasonable value, the local LSR simply cannot reestablish the session at that point, due to some internal restriction). f) The reestablished session must have the same session parameters as the original session. Note that this does not mean that the Initialization messages used to reestablish the session must have exactly the same parameters as in the original exchange. Rather, it is the parameters that result from comparing the received Initialization message and the local configuration must be the same. A simple way to implement this is to send the computed session parameters from the original session in the reestablishing Initialization message. g) If a peer detects that a session will be established with changed session parameters, then it SHOULD reject the session with a status code of "Session Rejected/Session Parameter Changed". 5. Security Considerations There seems to be no difficulty in using these procedures with LDP or CR-LDP sessions that are protected using the MD5 signature option. 6. Areas for Further Study This section discusses some possible areas for further study. a) It might be useful to allow the session to be reestablished with new value for one or more session parameters. This would serve two purposes: one, it would provide a simple way to renegotiate session parameters, and two, it would provide a simple way of taking advantage of the new capabilities of upgraded control software. The main question to be answered here is: which session parameter changes can be reasonable supported? It is easy to see how a change in the KeepAlive interval can be accommodated, but what about changes to the label advertisement discipline or a decrease in the ATM label range? b) It might also be useful to formalize methods of changing the transport addresses associated with the session. This would be particularly useful in control redundancy situations where the primary and backup LDP/CR-LDP entities have different IP addresses. c) If the LSR which causes the TCP connection to drop plays the passive role in restarting the new session, then it must wait until its peer LSR initiates the session restart. If the underlying cause was an activity switch on the passive LSR, then the active LSR will not notice a problem until either the KeepAlive timer expires or the local TCP times out. This may take Matthews Expires August 2000 [Page 11] Internet-Draft Session Reestablishment February 2000 a while. It would be nice if the passive LSR could somehow kick the active LSR into action sooner. Unfortunately, there are security implications in providing such a mechanism. One solution might be to add an "I've Come Back" flag to the Hello message and then extend MD5 protection to these messages. 7. Acknowledgements The original inspiration for this draft was the proposal by David Ward and John Scudder for restarting BGP sessions [WARD]. I have borrowed some of their terms, but the nature of LDP and CR-LDP (specifically DoD mode) forced me to adopt a different approach. Thanks also to Peter Ashwood-Smith for helpful comments when I was working out the technical details behind this proposal. 8. References [CR-LDP] Constraint-Based LSP Setup using LDP, draft-ietf-mpls-cr-ldp-04.txt [LDP] LDP Specification, draft-ietf-mpls-ldp-05.txt [WARD] BGP Notification Cease: I'll Be Back, draft-ward-bgp4-ibb- 00.txt 9. Author's Address Philip Matthews Nortel Networks Corp. P.O. Box 3511 Station C, Ottawa, ON K1Y 4H7 Canada Phone: +1 613-768-3262 philipma@nortelnetworks.com Matthews Expires August 2000 [Page 12]