Internet Draft MPLS Working Group Peter Ashwood-Smith Internet Draft Bilel Jamoussi Expiration Date: September 2000 Don Fedyk Darek Skalecki Nortel Networks February 2000 IMPROVING TOPOLOGY DATA BASE ACCURACY WITH LSP FEEDBACK VIA CR-LDP draft-ietf-mpls-te-feed-00.txt Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. Abstract One key component of traffic engineering is a concept known as constraint based routing. In constraint based routing a topology database is maintained on all participating nodes. This database contains a complete list of all the links in the network that participate in traffic engineering and for each of these links a set of constraints which those links can meet. Bandwidth, for example, is one essential constraint. Since the bandwidth available changes Ashwood-Smith, et. al. [Page 1]Internet Draft LSP Feedback with CR-LDP January, 2000 as new LSPs are established and terminated the topology database will develop inconsistencies with respect to the real network. It is not possible to increase the flooding rates arbitrarily to keep the database discrepancies from growing. We propose a new mechanism whereby a source node can learn about the successes or failures of its path selections by receiving feedback from the paths it is attempting. This fed-back information can be incorporated into subsequent route computations, which greatly improves the accuracy of the overall routing solution by significantly reducing the database discrepancies. 1. Introduction Because the network is a distributed system, it is necessary to have a mechanism to advertise information about links to all nodes in the network [IS-IS], [OSPF]. A node can then build a topology map of the network. This information is required to be as up-to-date as possible for accurate traffic engineered paths. Information about link or node failures must be rapidly propagated through the network so that recovery can be initiated. Other information about links that may be useful for reasons of quality of service include parameters such as available bandwidth, and delay. The information in this topology database is often out of date with respect to the real network. Available bandwidth is the most critical of these attributes and it can drift substantially with respect to reality due to the low frequency of link state updates that can be sustained in a very large topology. We refer to the deviation in the topology database available bandwidth as being optimistic if the database shows more available bandwidth than there really is, or pessimistic if the topology database shows less bandwidth than there really is. This distinction is important because we shall propose an efficient algorithm to deal with optimistic databases without resorting to shorter flooding intervals. One of the major problems for a constraint based routing system is dealing with changing constraints. Obviously, since bandwidth is one of the essential constraints, dealing with the rapid changes in reserved bandwidth poses some interesting challenges. In smaller networks, one can resort to higher frequency flooding but this obviously does not scale. The basic proposal is to add to the signaling protocol the ability to piggyback actual link bandwidth availability information at every link that the signaling traverses. This is done as part of the reverse messaging on success or failure (mapping, release, withdraw or notification). What this means is that every time signaling messages flow backwards toward a source to tell it of the success, failure or termination of a request, that message contains a detailed slice of bandwidth availability information for the exact path that the message has followed. This slice of reservation information, which is very up to date, is received by the source Ashwood-Smith, et. al. January 2000 [Page 2]Internet Draft LSP Feedback with CR-LDP January, 2000 node and inserted into the source node's topology database prior to making any further source route computations. The result is that the source node's topology database will tend to stay synchronized with the slices of the network through which it is establishing paths. This is nothing more than learning from successes and failures and represents an intelligent alternative to either waiting for floods or introducing non-determinism (guessing) into the source algorithms. Operating a constraint based routing system without such feedback is inefficient at best since a source node will continue to give out incorrect route over and over again until it gets an IGP update. This could be minutes away and as a result the worst case blocking time for a new route is the minimum repeatable flooding interval (often several minutes in big networks). Alternatives to feedback mechanisms involve adding some non-determinism (randomness) to the routing algorithm in the hopes that it will stumble onto a path that works. These sorts of approaches are seen in ATM dynamic routing systems, which do not have these forms of feedback. In order to get a good understanding of how the feedback works, imagine a network with precisely one path (with sufficient unreserved bandwidth) available from the source to the destination. Further, imagine that the topology database at the source is significantly out of date with respect to the real network in that the source topology database sees sufficient bandwidth available on many different routes to the destination. We call this being optimistic with respect to the network since the source thinks that more bandwidth is available than there really is. When such an optimistic source selects its first path it will likely contain links that do not in reality have sufficient unreserved bandwidth. Therefore, the path is only established up to the link that does not have sufficient bandwidth. A notification message is formatted that contains the actual unreserved bandwidth for this blocking link which flows back toward the source, collapsing the partially created path as it goes. In addition, at every link that this notification traverses, the current unreserved bandwidth information for each corresponding link is appended to the vector of unreserved bandwidth along the path. In this manner, an accurate view of the slice through the network we are traversing is constructed. Eventually this message arrives back at the source node, where the vector is taken and inserted into the topology database. This node has just learned from its mistake and is now slightly less optimistic with respect to the real network conditions. Path selection can be attempted again but this time the node will not make the same mistake it made the previous time. The link in question, at which rejection occurred the first time, will not even be eligible this time around, so a source route computation is guaranteed to produce a different path (or none). The same procedure Ashwood-Smith, et. al. January 2000 [Page 3]Internet Draft LSP Feedback with CR-LDP January, 2000 may be repeated as many times as is necessary, each time learning from its mistakes, until eventually no paths remain in the source topology to the destination, or we actually find a path that works. This tendency to converge either to a solution or determine that there is no solution is an important property of a routing system (it actually behaves a lot like a depth first search). This property is not present with flooding mechanisms alone since the source node must randomly hunt, or continually make the same mistakes, or abort until the next flood arrives. In addition to feeding back bandwidth on failure, we also recommend feedback on success. This has important consequences on our ability to spread load or to spill over to new links as existing links fill. It is true that spilling over to new links does not require feedback on success since we could simply wait for a feedback on failure, but we can achieve better load spreading earlier. Finally, when a path is torn down the release/withdraw messages also contain bandwidth information that can be fed back into the source topology database. This is very important during failure scenarios where the links we need to use to reroute the path share common sub- segments with the failed path. Without the feedback, the common sub- segments may not indicate sufficient available bandwidth until we get a flood that may mean many seconds without a connection. With feedback at least we will be up to date with respect to available bandwidth up to the point of failure in the path. Also since failure involves many paths tearing down and re-establishing this is the time that it is most critical to have an accurate view. When preemption is being employed it is also extremely important that the topology database inconsistencies be small. If not, high setup priority LSPs may unnecessarily preempt lower holding priority LSPs to obtain bandwidth that, had they had a more up to date view of unreserved bandwidth, they would have been able to find elsewhere. Since preempted LSPs may in turn preempt other LSPs in a domino like effect, the results of such database inconsistencies can have wide reaching ripple like impacts. These feedback mechanisms help reduce these occurrences significantly. There are a number of network conditions where feedback shows its value. One can think of a constraint-based network as being in one of three conditions. The first is called ramp-up, this is when the rate of arriving reservations exceeds the rate of departing reservations. The second is called steady-state, this is when the rate of arriving reservations is about the same as the rate of departing reservations. Finally, the ramp-down condition is that which has a greater rate of departing reservations than arriving reservations. These three network conditions show distinctly different types of error in the topology databases. In particular an optimistic view of available bandwidth by a source node is characteristic of the ramp- Ashwood-Smith, et. al. January 2000 [Page 4]Internet Draft LSP Feedback with CR-LDP January, 2000 up condition of a network. A pessimistic view of available bandwidth by a source node is characteristic of the ramp-down condition of a network. If one plots the average error in the topology databases with respect to the real network for the three different network conditions, one will see the error slowly go positive during ramp up, slowly go negative during ramp down, and drift slowly around 0 for the steady condition. The effect of flooding on this plot is to periodically snap the error back to 0 at flooding intervals. The effect of the feedback algorithm is to bring an optimistic error back to zero without having to wait for the flood interval. On average then, the feedback algorithm tends to halve the absolute error, keeping it mostly negative or pessimistic. This makes sense since a routing system will never give paths to links that it thinks do not have resources and as a result its pessimistic view of the world stays that way until it gets a flood. This relieves the IGP updates of the most urgent requirement of flooding when bandwidth is consumed. Availability of new bandwidth occurs when paths are released or new links become available. New links are accompanied by floods. Significant releases of bandwidth can be broadcast at relatively low frequencies in the order of several minutes with little operational impact. Extensive operational experience with this feedback protocol in proprietary Nortel Networks (pre-standard CR-LDP) products has shown it to work very well for networks up to 1000 nodes with significant flooding intervals damped to several minutes. Without this protocol, these networks would block setups for up to several minutes. With this protocol, the blocking in most cases is reduced to a small number of retry attempts which is usually sub-second depending mostly on the propagation delays in the network. These feedback algorithms have been particularly beneficial in cases of failure recovery during which the network is in a sudden condition of ramp-up. Since a large number of reservations must be remade, it is highly likely that we will exceed the limits of certain key links in the network. Without feedback, the rerouting must block until a flood arrives telling us of the situation at those key links at which time rerouting can continue. With feedback, the rerouting simply continues until a feedback indicates that a link is full. In addition since reservation-balancing algorithms are also often used, feedback allows the balancing algorithms to make better distribution decisions based on immediate feedback. We have also explored through simulation and implementation a variety of mechanisms to deal with the pessimistic error in the database. One simple proposal is to use selective forgetting. In this algorithm, a reserved bandwidth value slowly drops back to zero over a relatively short time interval. The theory being that you shift the network back to an optimistic state (by forgetting your pessimism) where the feedback algorithm will again correctly operate. These algorithms have not shown any great advantage and are actually non-optimal when the error is purely optimistic. Ashwood-Smith, et. al. January 2000 [Page 5]Internet Draft LSP Feedback with CR-LDP January, 2000 Other algorithmic permutations we have explored include such variations as: Feeding-back to all intermediate nodes, information learned from control messages upstream of that intermediate node. Feeding back in both directions so that both the source and destination node's databases stay synchronized. Allowing a request to continue to its destination despite there being insufficient bandwidth at some intermediate hop. Then, rejecting the request with a full bandwidth vector slice all the way to the destination instead of just to the point of rejection. Our simulations have not show significant benefits relative to the simpler algorithm proposed here. However, it is an interesting research topic to explore and quantify the different feedback algorithms and their impacts on blocking times so we do not want to discourage the interested reader from exploring these concepts more fully. 2. Adding feedback TLVs to CR-LDP Two new TLVs are optionally added to the CR-LDP mapping, notification, and withdraw messages. There may be an arbitrary number of these TLV in any order or position in the message. It is recommended that they be placed such that they can be read and applied to a topology database by scanning the message forwards and walking the topology database from the point where the last link feedback TLV left off. Each TLV consists of the 8 unreserved bandwidth values for each holding priority 0 through 7 as IEEE floating point numbers (the units are unidirectional bytes per second). Following this are the IP addresses of the two ends of the interface. Two TLVs are possible, one for IPV4 and one for IPV6 addressing of the link. 2.1 Bandwidth directionality considerations The order of the two addresses in the feedback TLV implies the direction in which the bandwidth is available. For example if the first address is A and the second address is B the bandwidth is unreserved in the A to B direction. It is possible for an implementation to provide both the A to B direction and the B to A direction as part of the same feedback message. This is done by simply including a TLV with A,B as the addresses of the link and a different TLV with B,A as the addresses of the link. Should CR-LDP evolve to be able to support bi- directional traffic flow and reservations it is expected that bi- directional feedback would also be implemented via this mechanism. Ashwood-Smith, et. al. January 2000 [Page 6]Internet Draft LSP Feedback with CR-LDP January, 2000 3. IPV4 specified link feedback TLV 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |U|F| 0x830 | Length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | BANDWIDTH UNRESERVED AT HOLDING PRIORITY 0 (IEEE float) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ . . . . . . . . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | BANDWIDTH UNRESERVED AT HOLDING PRIORITY 7 (IEEE float) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | IPV4 address of interface (near end) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | IPV4 address of interface (far end) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 4. IPV6 specified link feedback TLV 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |U|F| 0x831 | Length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | BANDWIDTH UNRESERVED AT HOLDING PRIORITY 0 (IEEE float) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ . . . . . . . . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | BANDWIDTH UNRESERVED AT HOLDING PRIORITY 7 (IEEE float) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | IPV6 address of interface (near end) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | IPV6 address of interface (far end) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 5. Detailed Procedures On receipt of a withdraw, notification, or mapping message pertaining to a request made by CR-LDP (as opposed to LDP), a feedback TLV of the appropriate format for the interface over which the message was received is inserted into the message before forwarding it back to the source of the request. The 8 bandwidth values are filled in with the outgoing bandwidth available on this interface for each of the 8 holding priorities in bytes per second. Finally the interface's address and far end address are placed in the TLV. On receipt of a CR-LDP request message which cannot be satisfied. A notification message is formatted normally. The 8-bandwidth values Ashwood-Smith, et. al. January 2000 [Page 7]Internet Draft LSP Feedback with CR-LDP January, 2000 are filled in with the outgoing bandwidth available on this interface for each of the 8 holding priorities in bytes per second. Finally, the interface's address and far end address are placed in the TLV. On receipt of a CR-LDP request message which has been satisfied and which results in a mapping being generated. No feedback TLV is added since the previous node will insert the proper TLV when it receives the reverse flowing mapping. When an LDP session goes down either because of a link failure, TCP/IP timeout, keepalive timeout, adjacency timeout etc. Other LDP sessions in the module must generate either notification, withdraw or release messages for LSPs that traversed the LDP in question. In the case that the LSP was created by CR-LDP and that a withdraw or notification is about to be generated, LDP will insert a feedback TLV for the interface which just went down that contains 0's for all the bandwidth values and attach to it the proper interface addresses. When the LDP session that originated a CR-LDP label request receives a mapping that contains feedback TLV's it is recommended that these bandwidth values overwrite the corresponding values in the node's topology database. Doing so permits this node to immediately synchronize its topology with respect to the real bandwidth reservations along the path that was just established. When the LDP session that originated a CR-LDP label request receives a notification that contains feedback TLV's it is recommended that these bandwidth values overwrite the corresponding values in the nodes topology database. Doing so permits this node to immediately synchronize its topology with respect to the real bandwidth reservations along the path that just failed to establish. The source node may then re-compute a path knowing that the computation will take into account the failure if it was caused by the topology database being in error with respect to the real network state. 6. IGP considerations Implementations MUST NOT permit bandwidth information learned by this feedback mechanism to be re-flooded via IS-IS, OSPF or any other IGP. The bandwidth information learned via these feedback mechanisms is to be used ONLY for source route computations on the nodes that are directly on the path that fed back the bandwidth. Normally only the source node of the LSP, or perhaps intermediate gateway nodes will use this information. It is however permitted for intermediate nodes that are forwarding this feedback information to store it for their own local source route computations. There is a possibility of a race condition between the bandwidth information that is received via feedback and that which is received via a normal IGP flood. While there may be a discrepancy between the Ashwood-Smith, et. al. January 2000 [Page 8]Internet Draft LSP Feedback with CR-LDP January, 2000 two, both are within a few 100 milliseconds of being correct. Solutions to allow us to determine which information is most up to date (say by adding a sequence number) do not add any significant benefit. Constraint based, source routed systems will always have errors in the local topology database with respect to the real network. We can reduce these errors through reduced flooding intervals, path following feedback and selective flooding but we cannot realistically reduce the errors below the second or so range. As a result propagation delay order race conditions are noise with respect to the average expected errors. An implementation SHOULD therefore consider the most recently received update (IGP or feedback) as being the most up to date. 7. Future considerations Constraint based routing systems such as CR-LDP will in the future offer other forms of constraint than simply reserved bandwidth. Actual utilization levels, current congestion levels, number of discrete channels/wavelengths available etc. are all possible constraints that change rapidly and which must be taken into consideration when computing a route. It is expected that this mechanism will be used to feedback these and other new forms of link constraining data. 8. RSVP consideration Nothing precludes the use of such feedback mechanisms with a similar TLV structure in the RSVP Resv and other reverse flowing message although repeatedly applying a fed-back update into a local topology database is wasteful and probably should be damped. 9. Intellectual Property Consideration The IETF has been notified of intellectual property rights claimed in regard to some or all of the specification contained in this document. For more information consult the online list of claimed rights. 10. Security Considerations This document raises no new security considerations for CR-LDP, RSVP or MPLS in general. 11. Acknowledgments The authors would like to thank Keith Dysart for his guidance, and Jerzy Miernik for helping implement these concepts and bringing them to life. 12. References Ashwood-Smith, et. al. January 2000 [Page 9]Internet Draft LSP Feedback with CR-LDP January, 2000 [CR-LDP] Constraint-Based LSP Setup using LDP, draft-ietf-mpls-cr- ldp-04.txt [LDP] LDP Specification, draft-ietf-mpls-ldp-05.txt [IS-IS] Extensions to IS-IS for traffic engineering, draft-ietf- isis-traffic-01.txt 13. Author's Addresses Peter Ashwood-Smith Bilel Jamoussi Nortel Networks Corp. Nortel Networks Corp. P.O. Box 3511 Station C, 600 Technology Park Drive Ottawa, ON K1Y 4H7 Billerica, MA 01821 Canada USA Phone: +1 613-763-4534 phone: +1 978-288-4506 petera@nortelnetworks.com jamoussi@nortelnetworks.com Darek Skalecki Don Fedyk Nortel Networks Corp. Nortel Networks Corp. P.O. Box 3511 Station C, 600 Technology Park Drive Ottawa, On K1Y 4H7 Billerica, MA 01821 Canada USA Phone: +1 613-765-2252 Phone: +1 978-228-3041 dareks@nortelnetworks.com dwfedyk@nortelnetworks.com Full Copyright Statement Copyright (C) The Internet Society (date). All Rights Reserved. This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English. The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns. Ashwood-Smith, et. al. January 2000 [Page 10]