Internet Draft Internet Engineering Task Force Curtis Villamizar INTERNET-DRAFT UUNET draft-villamizar-mpls-omp-01 February 25, 1999 MPLS Optimized Multipath (MPLS--OMP) Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet- Drafts as reference mate- rial or to cite them other than as ``work in progress.'' The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. Copyright (C) The Internet Society (February 25, 1999). All Rights Reserved. This draft specifies a precedure for routing MPLS label switched paths and balancing load across these paths. The algorithms presented here have been coded into a simulator and performance has been evaluated for a limited set of cases. Some details of the algorithms are likely to change as simulations progress. The portions of this document for which simulation experience is limited are Section 6, Section 7, Ap- pendix B.7. This draft contains notes pointing out sections which are more likely to change. INTERNET-DRAFT MPLS Optimized Multipath (MPLS--OMP) February 25, 1999 Abstract MPLS ingress routers may establish one or more explicit paths to a given egress to the MPLS domain. Load can be balanced across a com- plex topology using MPLS and the technique referred to here as MPLS-- OMP (Multiprotocol Label Switching -- Optimized Multipath). The tech- nique requires the use of a link state interior gateway protocol (IGP) such as OSPF or IS--IS with extensions to the link state protocol to flood loading information. It requires the ingress router be capa- ble of computing a hash with a sufficiently fine level of granularity based on the IP source and destination and selecting a forwarding entry based on the outcome of the hash. MPLS Optimized Mulitpath is an extension to MPLS. MPLS-OMP requires no additions or modification to the MPLS protocols. MPLS-OMP does require that the IGP be capable of flooding loading information as described in the OSPF-OMP and ISIS-OMP documents. At the MPLS ingress an algorithm is applied to select alternate paths where needed and adjust forwarding. Forwarding is adjusted gradually enough to insure stability yet fast enough to track long term changes in loading. The load adjustment algorithm itself is fully defined in a related docu- ment describing OSPF--OMP. The application of this technique to MPLS is described here. 1 Overview Networks are often heavily loaded. Topologies often evolve to include multiple paths. Multiple paths may be initially designed to provide redundancy but also result from incremental addition of circuits to accommodate traffic growth. The redundant paths provide a potential to distribute traffic loading and reduce congestion. Optimized Mulit- path (OMP) provides a means to make better use of this potential to distribute loading. Please refer to [12] for a more complete intro- duction to OMP. MPLS involves a forwarding technique. The MPLS framework and ar- chitecture are defined by [2, 3]. The generic forwarding encapsula- tion is defined by [7]. MPLS--OMP involves the setup of MPLS Label Switched Paths (LSPs) with strictly routed Explicit Routes (ERLSPs). There are two methods of setting up LSPs, the Label Distribution Pro- tocol (LDP) [11, 1], and use of RSVP extensions [6, 5]. Both methods of setting up LSPs support strict (and loose) routed ERLSPs. In an underutilized topology, criteria such as minimum delay dominate routing decisions. Often IGP link costs (costs in OSPF, metrics in IS--IS) are set roughly according to link propagation delay. If the entire topology is not underutilized, some consideration is given to Villamizar Expires August 25, 1999 [Page 2] INTERNET-DRAFT MPLS Optimized Multipath (MPLS--OMP) February 25, 1999 lower capacity links or more heavily utilized links to reduce their utilization somewhat. As utilization grows further adjusting link costs to distribute load becomes inadequate. When using MPLS--OMP, link costs may be set roughly according to link propagation delay without regard to link capacity or expected uti- lization. An initial set of LSP will be set up according to these costs as described in Section 4. There will initially one or more LSP between any ingress and any egress and these LSP will take the path of least cost. There will be exactly one LSP between any ingress and egress except where multiple paths exist with equal cost. If metrics are set with sufficient precision, for example taking delay in tenths of a millisecond and using this as cost (such precision is possible with OSPF, but may not be in IS--IS where link metrics are limited to values of 0-63), then few if any equal cost paths will exist. MPLS uses link loading and loss information flooded by OSPF--OMP or ISIS--OMP as described in Section 2. Each link along a given path is examined to see if loading persistently exceeds a specific set of criteria. Alternate paths will be created when criteria is exceeded. This criteria includes the severity of the loading, the amount of time the condition has persisted, and the ingress relative contribution to the loading. This is described in Section 5. The distribution of load among a set of alternate paths is deter- mined by the amount of number space from a IP source and destination hash computation that is allocated to each path. For example using a 16 bit hash such as CRC--16, if there are two paths, one may be given 80% of the 16 bit number space or the values 0-52428 and the other given 52429-65535 to accomplish a split of approximately 4:1. The initial load applied to a newly created alternate LSP is small. As loading on the alternate path persists in being lower than loading on other paths in use, more load is moved to the path with lower load- ing. The increment of change is gradually increased. At some point, what was among the less loaded paths becomes more heavily loaded than all other paths. At this point, this path is designated the most heavily loaded, the load increment toward the previous most heavily loaded path is at least halved, and load is moved away from the new most heavily loaded path. At this point load will begin to converge such that loading on the most heavily loaded link on each paths tends toward equality. See Section 8 and Section 9 for details. When more than one path between an ingress and an egress exists, after load balancing, there may still be an excessive load on the entire set of paths. The same criteria in terms of severity of loading, persis- tence of the condition, and ingress contribution used when examining a single path to see if an alternate is needed is used to examine a set of paths. For each path currently in use, the loading on the most heavily loaded link of the path is considered to be the ``path load- ing''. The lowest path loading is considered to be the loading on the Villamizar Expires August 25, 1999 [Page 3] INTERNET-DRAFT MPLS Optimized Multipath (MPLS--OMP) February 25, 1999 set of paths in considering whether to add an additional path. The algorithm described in Section 5 covers this case as well. If loading is reduced or failed links are restored, then sets of paths may be found to have excess capacity. Paths which are less desir- able in terms of cost, and so therefore also in terms of delay, can be deleted. This is described in Section 6. The impact of link state changes on the path layouts is discussed in Section 7. An algorithms is described that insures that path selec- tion returns to an optimal state after arbitrary link state changes have occurred and the topology has then become quiescent. This overview provides a qualitative description of the MPLS--OMP algorithms. The details of these algorithms are provided in the sec- tions that follow. 2 Use of ISIS--OMP and OSPF--OMP Flooding Both OSPF and IS--IS are link state protocols commonly used as in- terior gateway protocols (IGPs) [10, 8, 4]. MPLS relies on an IGP to determine the feasibility of paths and the relative preference of paths. MPLS--OMP also relies upon the IGP to provide link loading information. The flooding rate and encapsulation of link loading information provided by OSPF or IS--IS is defined in [12] and [9]. The OSPF or IS--IS flooding of link loading information must be en- abled (as defined in [12, 9]). The OSPF or IS--IS load adjustment should be disabled since the amount of traffic forwarded directly by IP (without MPLS encapsulation) will be minimal and have little or no impact on loading. 3 Link, Path, and Path Set Equivalent Load Link loading and link loss is determined from information flooded by OSPF or IS--IS as described in Section 2. ``Link equivalent load'' on each individual link is computed based on link utilization and link loss as described in [12]. A path is a series of links. ``Path equivalent load'' is the maximum link equivalent load along the path. A ``path set'' is a set of paths. ``Path set equivalent load'' is the minimum path equivalent load among a set of paths. Path and path set equivalent load are defined precisely in Appendix B.2. Villamizar Expires August 25, 1999 [Page 4] INTERNET-DRAFT MPLS Optimized Multipath (MPLS--OMP) February 25, 1999 4 Initial Path and Initial Loading From the standpoint of a given ingress, a given egress at some point either at some epoch, or after an outage, makes a transition from completely unreachable to reachable. When this transition occurs, a single initial LSP is created from the ingress to the egress without regard to traffic loading. The shortest path based on IGP metrics is used. If multiple paths have identical shortest metrics, then all of these paths may be used as an initial path set. The best path cri- teria may be further relaxed as described and recommended in [12] to create a larger number of initial paths. All traffic to the given egress is put on this path or initially split among the set of paths. Further details are found in Appendix B.3 5 Creating Alternate Paths A path set may consist of one or more paths. Many path sets will contain only the initial path created when the egress first became reachable as described in Section 4. The path set loading is deter- mined as described in Section 3. The path set loading is compared to a set of thresholds to determine if an alternate path should be created. The set of thresholds should be configurable as described in Appendix A. The symbolic names MIN_NEW_PATH, MAX_NEW_PATH, NEW_PATH_INCR, PATH_COMP_OFFS, LOAD_COMP_OFFS, and OVERLOAD_PERSIST will be used here. MIN_NEW_PATH, MAX_NEW_PATH, NEW_PATH_INCR, PATH_COMP_OFFS, and LOAD_COMP_OFFS are in units of equivalent load, where 1.0 is full link utilization. OVERLOAD_PERSIST is in units of time, generally seconds. A array of values will be maintained for each path set. The array will be indexed such that there is an array entry for each value from MIN_NEW_PATH to MAX_NEW_PATH inclusive of MAX_NEW_PATH in increments of NEW_PATH_INCR. For example, if MIN_NEW_PATH is 0.45, MAX_NEW_PATH is 1.10, and NEW_PATH_INCR is 0.05, there would be array entries cor- responding to 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 1.00, 1.05, and 1.10. Each array entry can contain a timestamp or a value illegal as a timestamp referred to here as the constant ``NEVER''. Each array element is initialized to NEVER. If the path set loading is less than MIN_NEW_PATH then all array entries are set to NEVER. If the path set loading is greater than MIN_NEW_PATH, then each array element that corresponds to a loading below or equal to the path set loading that contains the value NEVER is set to the current time. Each array entry that corresponds to a value above the current path load is set to NEVER. Villamizar Expires August 25, 1999 [Page 5] INTERNET-DRAFT MPLS Optimized Multipath (MPLS--OMP) February 25, 1999 When some array values are not all set to NEVER, a conditional de- fined in Appendix B.4 is periodically checked for each array entry. If the conditional is true for any array entry, the decision process is terminated and the search for an alternate path is undertaken. To determine if an alternate path is available, and if so deter- mine the best alternate path, an SPF calculation is performed after temporarily removing all links from the OSPF or IS--IS link state database whose link equivalent load equals or exceed the path set equivalent load. This SPF may result in no path between the ingress and egress. If so, no additional paths are available which can re- move load from the path set under consideration, and no action can be taken. Otherwise, the SPF will yield one or more best paths between ingress and egress. LSPs for these paths are set up and added to the path set. When a path is added to a path set, the time stamp on the array en- tries should be artificially advanced by PATH_ADD_WAIT. If adding PATH_ADD_WAIT puts the time stamp on an array entry into the future that entry is set to NEVER. This will insure that additions to this path set are spaced out in time by at least this amount of time. The algorithms described above are provided in a more concise and precise form in Appendix B.4. When a path is added to a path set, an initial loading and initial rate at which this loading is increased is set. These initial values and the loading adjustment algorithm is provided in Section 8. The arrays are updated when new loading information arrives or every PATH_CHECK seconds and the arrays checked every PATH_CHECK seconds. This parameters is configurable as described in Appendix A. 6 Deleting from Underutilized Path Sets Please note that simulations of this portion of the algorithm have been completed but the behavior of this portion of the algorithm needs to be studied through further simulations. Some details may change. Normal time of day load fluctuations or restoral of link failures can result in path sets that are significantly underutilized. When this occurs, one path at a time can be dropped from the path set. Param- eters are defined in Appendix A which determine the conditions under which a path can be dropped. The loading on the most heavily loaded path in the path set must be below PATH_DEL_SHARE. When this condi- tion becomes true a timestamp must be retained. If the condition is no longer true, the timestamp must be invalidated. If this condi- tion remains true for a period of time, referred to here as ``time to Villamizar Expires August 25, 1999 [Page 6] INTERNET-DRAFT MPLS Optimized Multipath (MPLS--OMP) February 25, 1999 deletion'', then a path can be selected for deletion. The time to deletion must be dependent on loading contribution in or- der to avoid many ingress routers performing deletions at once. The computation must favor deletions from path sets with small traffic contributions. This favors eliminating paths with marginal ben- efit. The time to deletion is computed as follows. The constant PATH_DEL_PERSIST seconds is multiplied by one plus the ratio of total absolute (bytes/second) traffic contributed by the path set to the sum of absolute excess bandwidth on the path set. The best candidate for deletion is the path which has the highest IGP cost. If a subset of paths containing more than one path have ex- actly equal IGP cost and are the highest, then the path among this subset with the lowest traffic share is chosen. Only one path should be deleted at a time. The total contribution of traffic to the path set, as measured by the absolute LSP traffic levels, must not exceed PATH_DEL_EXCESS times the excess bandwidth on the remaining paths. When a path is selected as the best path to delete, this criteria must be checked. If this path cannot be deleted and there is no other path of equal IGP cost, then no deletion should take place. Deleting a path of low IGP cost and lower bandwidth might still be possible but such a path should not be deleted, since the lower IGP cost path that would be deleted is considered better than some of the remaining paths. Once a path is selected for deletion, its traffic share should be divided among the remaining paths in proportion to the absolute band- width available on each remaining path. The LSP can then be deleted, after traffic is moved and the path is known to be idle. After a path is deleted, the timer entry for that path should be set to NEVER. This will insure that path deletions on this path set are spaced out in time. The algorithms for deleting paths are desribed in Appendix B.5. 7 Handling Link State Changes Please note that simulations of this portion of the algorithm have been completed but the behavior of this portion of the algorithm needs to be studied through further simulations. Some details may change. Some initial paths may be placed prior to complete convergence of the link state protocol. In addition, when a failed link is restored, there may not be an overload condition to force traffic back onto the restored link. If this were not corrected the network could settle into a state where path layout and possibly also loading was subopti- Villamizar Expires August 25, 1999 [Page 7] INTERNET-DRAFT MPLS Optimized Multipath (MPLS--OMP) February 25, 1999 mal. The adjustments that occur due to a link failure will occur immedi- ately or quickly. In some cases, all paths from a given ingress to a given egress will be lost. In this case, if connectivity is still feasible, the best remaining path in terms of cost is used, as de- scribed in Section 4. If paths remain, load is shifted to these paths as if the links had been deleted as described in Section 6. The result of a link loss on a well utilized topology will generally be a period of suboptimal loading and congestion. As this condition persists, alternate paths will be added as described in Section 5. The most heavily congested links will be offloaded first, with the ingress contributing the largest share of traffic reacting first. The less loaded links and lesser contributors will react if congestion continues. In some cases, paths will be added faster than the balanc- ing makes use of some of the new paths. As the balancing progresses, path sets which have created excess links will begin to prune the less beneficial links as described in Section 6. When a link is restored to service, the SPF will yield multiple better paths to some egress than paths currently in use, which may also be less loaded than paths in use. If the topology is not overloaded, these paths may not be immediately considered. Two conditions have been described which may cause better path to not be selected and may not be forced into consideration later by loading conditions. The first is where an initial path is selected prior to link state database convergence. The second is when a link is re- stored to service. This condition should not be allowed to persist indefinitely. When a SPF is required due to a link transition to a full adjacency, those path sets for which the new best path is better than some paths in use and is not overloaded are marked. Where the existing set of paths is heavily loaded the new best paths will be used very quickly. For the remainder, the better paths will be added to path sets based on an event timer. This algorithm is described in detail in Ap- pendix B.7. 8 Load Balancing There is no load balancing done on a path set with only one path. When there is more than one path, fine adjustments can be made in the amount of traffic on each path by adjusting boundary values in a source/destination hash number space associated with each path. This technique is described in detail in [12]. Villamizar Expires August 25, 1999 [Page 8] INTERNET-DRAFT MPLS Optimized Multipath (MPLS--OMP) February 25, 1999 The terms ``traffic share'' and ``move increment'' are defined in [12]. The loading on a particular path is determined by its traffic share. The rate of increase in loading, when offloading traffic from other paths, is determined by the move increment. Both of these vari- ables are in units of hash table space. For example, if a 16 bit hash is used, these variables are in the range 0---65535. A path set is constructed according to the rules described in Sec- tion 4 and Section 5. When a new path is added to an existing path set, the initial traffic share is set to a small value as described in Section 5. 9 MPLS--OMP Forwarding The load balance itself utilizes the ``next hop structures'' described in [12]. The next hop structures themselves will differ slightly in that an ingress to an MPLS domain will be adding a label to the label stack in addition to making a forwarding decision. The algorithm for determining when to adjust load and for adjusting load is exactly as described in [12]. MPLS--OMP forwarding is described in detail in Appendix B.8. Acknowledgments Simulations were performed using hypotheitc cases and using UUNET's topology and loading. Simulations involving UUNET's network would not have been possible without data gathered by Dave Cleary, Yong Xue, and Yong Zhue of UUNET. References [1] Ross Callon, Andy Malis, Joel Halpern, Juha Heinanen, N. Feld- man, Eric Gray, Kenneth Sundell, P. Doolan, Loa Andersson, A. Fredette, Pasi Vaananen, Tom Worster, Bilel Jamoussi, Liwen Wu, Muckai Girish, Timothy Kilty, and Ram Dantu. Constraint- based lsp setup using ldp. Internet Draft (Work in Progress) draft-ietf-mpls-cr-ldp-00, Internet Engineering Task Force, 1 1999. ftp://ftp.isi.edu/internet-drafts/draft-ietf-mpls-cr-ldp- 00.txt. [2] Ross Callon, George Swallow, N. Feldman, A. Viswanathan, P. Doolan, and A. Fredette. A framework for multiprotocol la- bel switching. Internet Draft (Work in Progress) draft-ietf- mpls-framework-02, Internet Engineering Task Force, 11 1997. Villamizar Expires August 25, 1999 [Page 9] INTERNET-DRAFT MPLS Optimized Multipath (MPLS--OMP) February 25, 1999 ftp://ftp.isi.edu/internet-drafts/draft-ietf-mpls-framework- 02.txt. [3] Ross Callon, A. Viswanathan, and E. Rosen. Multiprotocol label switching architecture. Internet Draft (Work in Progress) draft- ietf-mpls-arch-04, Internet Engineering Task Force, 2 1999. ftp://ftp.isi.edu/internet-drafts/draft-ietf-mpls-arch-04.txt. [4] R.W. Callon. Use of osi is-is for routing in tcp/ip and dual en- vironments. Technical Report RFC 1195, Internet Engineering Task Force, 1990. ftp://ftp.isi.edu/in-notes/rfc1195.txt. [5] Bruce Davie, Tony Li, Y Rekhter, and E. Rosen. Explicit route support in mpls. Internet Draft (Work in Progress) draft-davie- mpls-explicit-routes-00, Internet Engineering Task Force, 11 1997. ftp://ftp.isi.edu/internet-drafts/draft-davie-mpls- explicit-routes-00.txt. [6] Bruce Davie, Y Rekhter, A. Viswanathan, S. Blake, Vijay Srini- vasan, and E. Rosen. Use of label switching with rsvp. Inter- net Draft (Work in Progress) draft-ietf-mpls-rsvp-00, Internet Engineering Task Force, 3 1998. ftp://ftp.isi.edu/internet- drafts/draft-ietf-mpls-rsvp-00.txt. [7] D. Farinacci, Tony Li, A. Conta, Y Rekhter, Dan Tappan, E. Rosen, and G. Fedorkow. Mpls label stack encoding. Internet Draft (Work in Progress) draft-ietf-mpls-label-encaps-03, Inter- net Engineering Task Force, 9 1998. ftp://ftp.isi.edu/internet- drafts/draft-ietf-mpls-label-encaps-03.txt. [8] ISO/IEC. Iso/iec 10589 - information processing systems - telecommunications and information exchange between systems - intermediate system to intermediate system intra-domain routing information exchange protocol for use in conjunction with the protocol for providing the connectionless-mode network service (iso 8473). Technical report, International Organization for Standardization, 1992. ftp://merit.edu/pub/iso/iso10589.ps.gz. [9] Tony Li and Curtis Villamizar. Is-is optimized multi- path (isis-omp). Internet Draft (Work in Progress) draft- ietf-isis-omp-00, Internet Engineering Task Force, 1 1999. ftp://ftp.isi.edu/internet-drafts/draft-ietf-isis-omp-00.txt. [10] J. Moy. Ospf version 2. Technical Report RFC 2328, In- ternet Engineering Task Force, 1998. ftp://ftp.isi.edu/in- notes/rfc2328.txt. [11] Bob Thomas, N. Feldman, P. Doolan, Loa Andersson, and A. Fre- dette. Ldp specification. Internet Draft (Work in Progress) draft-ietf-mpls-ldp-03, Internet Engineering Task Force, 2 1999. ftp://ftp.isi.edu/internet-drafts/draft-ietf-mpls-ldp-03.txt. [12] Curtis Villamizar. Ospf optimized multipath (ospf-omp). Inter- net Draft (Work in Progress) draft-ietf-ospf-omp-01, Internet Engineering Task Force, 10 1998. ftp://ftp.isi.edu/internet- drafts/draft-ietf-ospf-omp-01.txt. Villamizar Expires August 25, 1999 [Page 10] INTERNET-DRAFT MPLS Optimized Multipath (MPLS--OMP) February 25, 1999 Security Considerations MPLS--OMP is a method of setting up MPLS LSPs and balancing load across MPLS LSPs. MPLS--OMP, like MPLS itself, depends on the in- tegrity of information obtained from the IGP. MPLS--OMP requires a link state protocol IGP. MPLS--OMP relies upon loading information provided by the IGP in addition to the underlying routing function- ality that is minimally required for any use of MPLS. If the IGP is compromised such that integrity of the information provided by the IGP is no longer maintained, substantial denial of service is possible. Use of MPLS--OMP does not add exposure to the IGP, nor does it provide any means to leverage attacks on the IGP beyond the existing potential for denial of service. MPLS--OMP relies upon the ability of LDP or MPLS RSVP extensions to set up explicit LSPs. As a technique which uses these capabilities with no signaling mechanisms of its own, MPLS--OMP adds no further ex- posure to these signaling mechanisms. If sufficiently misconfigured, MPLS--OMP could add significant load to MPLS LSRs if misconfiguration is sufficient that paths are repeatedly created and deleted. Author's Address Curtis Villamizar UUNET Network Architecture GroupFull Copyright Statement Copyright (C) The Internet Society (February 25, 1999). All Rights Reserved. This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implmentation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are in- cluded on all such copies and derivative works. However, this doc- ument itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other In- ternet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English. Villamizar Expires August 25, 1999 [Page 11] INTERNET-DRAFT MPLS Optimized Multipath (MPLS--OMP) February 25, 1999 The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns. This document and the information contained herein is provided on an ``AS IS'' basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEER- ING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUD- ING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MER- CHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. A Configuration Options MPLS--OMP configurable parameters do not control the rate of traffic adjustment or the rate of change in traffic adjustment. These rates should be kept as defined in OSPF--OMP [12] in order to insure that stability is not impacted. The MPLS--OMP configurable parameters control the conditions for cre- ating new paths, the initial state of new paths, and the conditions for deleting paths. Parameters: Conditions for Creating New Paths Parameter Name Units Range Default MIN_NEW_PATH unitless 0.50-1.00 0.45 MAX_NEW_PATH unitless see below 1.10 NEW_PATH_INCR unitless 0.01-0.25 0.05 PATH_COMP_OFFS unitless 0.01-1.00 0.25 LOAD_COMP_OFFS unitless 0.01-1.00 0.25 OVERLOAD_PERSIST seconds 60-3600 60 MAX_NEW_PATH >= MIN_NEW_PATH + NEW_PATH_INCR MAX_NEW_PATH <= 10.0 Parameters: Conditions for Deleting Paths Parameter Name Units Range Default PATH_DEL_SHARE hash space 0.00-1.00 0.30 PATH_DEL_PERSIST seconds 60-86400 1200 PATH_DEL_EXCESS unitless 0.00-1.00 0.50 PATH_DEL_SHARE must be significantly less than MIN_NEW_PATH to insure sufficient hysteresis. Villamizar Expires August 25, 1999 [Page 12] INTERNET-DRAFT MPLS Optimized Multipath (MPLS--OMP) February 25, 1999 Parameters: Timer based processing Parameter Name Units Range Default PATH_CHECK seconds 15-300 60 PATH_ADD_WAIT seconds 15-3600 240 PATH_DEL_WAIT seconds 15-3600 240 SPF_PERSIST seconds 60-3600 3600 SPF_RECHECK seconds 60-86400 900 It is possible to misconfigure parameters sufficiently to induce pathologic behavior such as frequent LSP setup and tear down. Care should be taken when deviating from default values. B Concise Statement of Algorithms MPLS--OMP uses links state information and link loading information flooded via a link state IGP. MPLS--OMP uses this information to de- termine where to place MPLS LSPs and to adjust load where multiple LSPs are established per traffic flow. The traffic flows are assumed to be aggregates of very large numbers of end-to-end flows such that dividing traffic using a hash over the IP source and IP destination provides a means of distributing load. The MPLS--OMP algorithms are stated here in pseudocode with limited accompanying text. All upper case variables are generally constants or parameters defined in Appendix A. The pseudocode syntax is not formally defined but is sufficiently similar to the ``C'' or perl programming languages that it should be understandable. B.1 Flooding Loading Information The flooding of loading information is the responsibility of the link state IGP. The data formats and flooding rate is provided in the OSPF- -OMP and ISIS--OMP documents [12, 9]. The MPLS routers are all as- sumed to participate in the IGP routing. The ingress router MPLS implementation and IGP implementation must be sufficiently coupled such that links state changes and loading information is passed from the IGP to MPLS. Villamizar Expires August 25, 1999 [Page 13] INTERNET-DRAFT MPLS Optimized Multipath (MPLS--OMP) February 25, 1999 B.2 Link, Path, and Path Set Equivalent Load Link loading and link loss is determined from information flooded by OSPF or IS--IS. The variables ``Utilization'' and ``Loss'' below are the utilization and loss values flooded by the IGP. The utilization and loss directly yield a ``link equivalent load''. The maximum uti- lization of any link on a path yields the ``path equivalent load''. The minimum utilization of a set of paths associated with reaching a specific egress are a ``path set equivalent load''. EventHandler NewLoadingInfo(Link): { if (Loss{Link} < 0.005) { LinkEquivLoad{Link} = Utilization{Link}; } else { if (Loss{Link} <= 0.09) { LossComp = 10 * sqrt(Loss{Link}); } else { LossComp = 3; } LinkEquivLoad{Link} = Utilization{Link} * LossComp; } } Path equivalent load is the maximum link equivalent load on a given path. Function ComputePathLoad(Path) { PathEquivLoad = 0; foreach Link ( Links{Path} ) { Load = LinkEquivLoad{Link} if (Load > PathEquivLoad) PathEquivLoad = Load; } PathEquivLoad{Path} = PathEquivLoad; } The path set equivalent load is the minimum path equivalent load of a set of LSPs between a given ingress and a given egress. Some later computations also require the maximum loading of any path on a path set. Both can be computed at the same time. Villamizar Expires August 25, 1999 [Page 14] INTERNET-DRAFT MPLS Optimized Multipath (MPLS--OMP) February 25, 1999 Function ComputePathSetLoad(PathSet) { undefine PathSetEquivLoad; PathSetMaxLoad = 0; foreach Path ( Paths{PathSet} ) { ComputePathLoad(Path); Load = PathEquivLoad{Path}; if (!defined(PathSetEquivLoad)) { PathSetEquivLoad = Load; } else if (Load < PathSetEquivLoad) { PathSetEquivLoad = Load; } if (Load > PathSetMaxLoad) PathSetMaxLoad = Load; } PathSetEquivLoad{PathSet} = PathSetEquivLoad; PathSetMaxLoad{PathSet} = PathSetMaxLoad; } The pseudocode fragments above and elsewhere in this section do not implement any form of data structure threading to allow all parame- ters dependent on a given link to be updated when the link utilization changes. This sort of data structure threading to reflect depen- dencies among computed parameters can save considerable time when processing the periodic timer events. This type of implementation op- timization is not expressed in the pseudocode but may be nessecary to produce an implementation that scales well with respect to the number of path set, paths, and links. B.3 Initial Path and Initial Loading At some point a path that had been unreachable becomes reachable. When this occurs, an initial path is set according to the results of the SPF calculation. The following pseudocode describes the initial- ization. EventHandler NewlyReachable(Egress): { new Paths = SPF(Egress); new PathSet; Egress{PathSet} = Egress; undefined Paths{PathSet}; insert PathSets PathSet; foreach Path ( Paths ) { insert Paths{PathSet} Path; PathSetBelowMax{PathSet} = NEVER; for (Util = MIN_NEW_PATH + NEW_PATH_INCR; Villamizar Expires August 25, 1999 [Page 15] INTERNET-DRAFT MPLS Optimized Multipath (MPLS--OMP) February 25, 1999 Util += NEW_PATH_INCR; Util <= MAX_NEW_PATH) { PathSetArray{PathSet,Util} = NEVER; } } ComputePathSetLoad(PathSet); LastRecheckSPF{PathSet} = NOW; } This initialization may also occur when a link failure makes all of the paths to a given egress currently in use unfeasible. In this case, the old path set corresponding to this egress is deleted. A new path must be selected and the egress is treated as if it had just become reachable. B.4 Creating Alternate Paths Deciding whether to create an alternate path and then creating one can be broken down into a number of smaller procedures. Every PATH_CHECK seconds the set of arrays used to determine whether to add or delete a path are checked. Function AdjustPathArrays(PathSet) { Load = PathSetEquivLoad{PathSet}; for (Util = MIN_NEW_PATH + NEW_PATH_INCR; Util += NEW_PATH_INCR; Util <= MAX_NEW_PATH) { if (Util > Load) { if (PathSetArray{PathSet,Util} == NEVER) break; else PathSetArray{PathSet,Util} = NEVER; } else if (PathSetArray{PathSet,Util} == NEVER) { PathSetArray{PathSet,Util} = NOW; } } } The function below is used in pseudocode later in this section. When loading is sufficient to require checking the path's contributions to the loadings on the available paths, the following computation is made. Villamizar Expires August 25, 1999 [Page 16] INTERNET-DRAFT MPLS Optimized Multipath (MPLS--OMP) February 25, 1999 Function GetPathContribution(PathSet): { TotalTraffic = 0; TotalCapacity = 0; foreach Path ( Paths{PathSet} ) { TotalTraffic += LoadOnLSP{Path}; undefine Capacity; foreach Link ( Links{Path} ) { if (!defined(Capacity)) Capacity = LinkCapacity{Link}; else if (LinkCapacity{Link} < Capacity) Capacity = LinkCapacity{Link}; } TotalCapacity += Capacity; } return TotalTraffic / TotalCapacity; } Every PATH_CHECK seconds a check is made to determine whether to try to create a new path. An elapsed time is computed. The elapsed time is the difference between the current time and the time stored in the array entry. A load compensation factor is computed. It is the ratio of two differences, the difference between the loading correspond- ing to the array entry and MIN_NEW_PATH and the difference between MAX_NEW_PATH and MIN_NEW_PATH. A traffic contribution factor is com- puted. This is ratio of the current total path set traffic to the sum of the capacities of each path. The capacity of an individual path is taken to be the capacity of the slowest link on that path. The OVER- LOAD_PERSIST constant is compared to the product of the elapsed time, the load compensation factor, and the traffic contribution factor. If the product is the greater of the two, the condition is true for this array entry and an atempt is made to create a new path. Function CheckForNewPath(PathSet) { undefine PathContribution; for (Util = MIN_NEW_PATH + NEW_PATH_INCR; Util += NEW_PATH_INCR; Util <= MAX_NEW_PATH) { Then = PathSetArray{PathSet,Util}; if (Then == NEVER) return 1; Elapsed = NOW - Then; LoadComp = LOAD_COMP_OFFS + ((MAX_NEW_PATH - Util) / MAX_NEW_PATH); if (!defined(PathContribution)) PathContribution = PATH_COMP_OFFS + GetPathContribution(PathSet); if (Elapsed * LoadComp * PathContribution Villamizar Expires August 25, 1999 [Page 17] INTERNET-DRAFT MPLS Optimized Multipath (MPLS--OMP) February 25, 1999 > OVERLOAD_PERSIST) { CreateNewPath(PathSet); break; } } } When the decision has been made to attempt to create a new path, the first step is to search for a candidate path. If a new path is available it is initialized. The set of arrays is incremented by PATH_ADD_WAIT whether a new path is found or not in order to defer repeating the check for this amount of time. Function CreateNewPath(PathSet) { NewPaths = FindAlternatePath(PathSet); if (defined(NewPaths)) { BestNewPath = GetBestNewPath(NewPaths); if (defined(BestNewPath)) InitilizeNewPath(PathSet,BestNewPath); } AddWaitToPathSet(PathSet); } Function GetBestNewPath(NewPaths) { MostCapacity = 0; undefine BestNewPath foreach NewPath ( NewPaths ) ComputePathLoad(NewPath); undefine Capacity; foreach Link ( Links{Path} ) { ExcessCapacity = LinkCapacity{Link} * (1 - PathEquivLoad{NewPath}); if (!defined(Capacity)) Capacity = ExcessCapacity; else if (ExcessCapacity < Capacity) Capacity = ExcessCapacity; } if (MostCapacity < Capacity) { MostCapacity = Capacity; BestNewPath = NewPath; } } return BestNewPath; } Function AddWaitToPathSet(PathSet) { for (Util = MIN_NEW_PATH + NEW_PATH_INCR; Villamizar Expires August 25, 1999 [Page 18] INTERNET-DRAFT MPLS Optimized Multipath (MPLS--OMP) February 25, 1999 Util += NEW_PATH_INCR; Util <= MAX_NEW_PATH) { if (PathSetArray{PathSet,Util} == NEVER) break; else PathSetArray{PathSet,Util} += PATH_ADD_WAIT; if (PathSetArray{PathSet,Util} >= NOW) PathSetArray{PathSet,Util} = NEVER; } } Finding an alternate path involves running an SPF calculation with some of the links disabled. Note that the SPF can return more than one equal cost path. There can also be enough links disabled that the SPF returns no path at all. Function FindAlternatePath(PathSet) { Load = PathSetEquivLoad{PathSet}; foreach Link ( LinkStateDataBase ) { if (LinkEquivLoad{Link} >= Load) TemporarilyDisableLink(Link); } NewPaths = SPF(Egress{PathSet}); EnableAllLinks(); return NewPaths; } The best path criteria should not be relaxed when performing the SPF (relaxing the best path criteria is defined in [12]). It is also very useful to compute some form of digest over the SPF conditions and cache the completed SPF results, indexed using the digest. A limited set of SPF calculations computed under a set of gradually changing conditions will be required periodically. Caching SPF results can very significantly reduce the amount of work required. Once an alternate path has been found, the traffic share and the move increment are initialized. The new path is added to the path set. Function InitilizeNewPath(PathSet,NewPath): { TargetShare = 0; TargetIncr = INITIAL_MOVE_INCREMENT; insert Paths{PathSet} NewPath Villamizar Expires August 25, 1999 [Page 19] INTERNET-DRAFT MPLS Optimized Multipath (MPLS--OMP) February 25, 1999 TrafficShare{PathSet,NewPath} = TargetShare; MoveIncrement{PathSet,NewPath} = TargetIncr; } B.5 Deleting Alternate Paths Every PATH_CHECK seconds each path set is checked to see if the set of paths contain sufficient excess bandwidth to drop one of the paths. A simple single threshold is used but the ratio of contributed traf- fic to excess capacity is used to determine the time delay before deletion. This should avoid synchronized actions taken by multiple ingress routers. Function CheckPathExcess(PathSet) { MaxLoad = PathSetMaxLoad{PathSet}; if (MaxLoad >= PATH_DEL_SHARE) { PathSetBelowMax{PathSet} = NEVER; return; } if (PathSetBelowMax{PathSet} == NEVER) { PathSetBelowMax{PathSet} = NOW; return; } Elapsed = NOW - PathSetBelowMax{PathSet}; Traffic = 0; ExcessCapacity = 0; foreach Path ( Paths{PathSet} ) { Traffic += LoadOnLSP{Path}; Excess = ExcessCapacity(Path); if (Excess > 0) ExcessCapacity += Excess; } TimeToDeletion = PATH_DEL_PERSIST * (1 + (Traffic / ExcessCapacity)); if (Elapsed >= TimeToDeletion) { DeletePath(PathSet); PathSetBelowMax{PathSet} += PATH_DEL_WAIT; } } Once the decision is made to drop a path, a path must be selected to drop. The load must be moved to the remaining paths and storage and and other resources recovered. Villamizar Expires August 25, 1999 [Page 20] INTERNET-DRAFT MPLS Optimized Multipath (MPLS--OMP) February 25, 1999 Function DeletePath(PathSet): { HighCost = 0; foreach Path ( Paths{PathSet} ) { Cost = PathLinkCosts(Path); if ((Cost > HighCost) || (Cost == HighCost && TrafficShare{PathSet,Path} < LowShare) { PathToDelete = Path; HighCost = Cost; LowShare = TrafficShare{PathSet,Path}; } } foreach Path ( Paths{PathSet} ) { if (Path != PathToDelete) TotalCapacity += Capacity{Path}; } if (Capacity{PathToDelete} > (TotalCapacity * $PATH_DEL_EXCESS)) return; RemovePathFromPathSet(PathSet,PathToDelete); } Function RemovePathFromPathSet(PathSet,PathToDelete) { while (TrafficShare{PathSet,PathToDelete} != 0) { TargetShare = TrafficShare{PathSet,PathToDelete}; foreach Path ( Paths{PathSet} ) { if (Path == PathToDelete) continue; Shares = TargetShare * Capacity{Path} / TotalCapacity; if (Shares < 1) Shares = 1; if (TrafficShare{PathSet,PathToDelete} < Shares) Shares = TrafficShare{PathSet,PathToDelete}; TrafficShare{PathSet,PathToDelete} -= Shares; TrafficShare{PathSet,Path} += Shares; if (TrafficShare{PathSet,PathToDelete} == 0) break; } } delete Paths{PathSet} PathToDelete; RecoverStorage(PathToDelete); } B.6 Load Balancing There are no differences in the algorithm to balance a set of paths in MPLS--OMP from the methods in OSPF--OMP. The path set in MPLS--OMP Villamizar Expires August 25, 1999 [Page 21] INTERNET-DRAFT MPLS Optimized Multipath (MPLS--OMP) February 25, 1999 maps directly into a next hop structure as described in the OSPF--OMP document [12]. Behavior differs slightly in that in MPLS--OMP the decision made at the ingress determines the entire path of the traffic while in OSPF--OMP the decision at the ingress is affected by deci- sions at every downstream router. Regardless the algorithm to adjust the traffic share is identical. B.7 Handling Link State Change Link state changes may make some paths infeasible and make make new paths available. An SPF calculation may involve one or many links state changes. When an SPF calculation is required, first the set of paths is checked to see if any of those paths have become infeasible. These paths are deleted. This may also render some egress temporarily unreachable. If so, the path set structures are deleted as well. After processing the set of paths that must be deleted, the set of egress that are now unreachable are checked. If there are feasible paths, these paths are created. The remaining egress are checked to see if paths now exist that are less loaded than the path set and have a lower cost than some of the paths in use. If so, these path sets are marked with a time stamp. Function RemovePathFromPathSet(PathSet,BadPath) { RemovePathFromPathSet(PathSet,BadPath); if (NumberPathsInPathSet(PathSet) == 0) { Egress = Egress{PathSet}; delete PathSets PathSet; RecoverStorage(PathSet); if (defined(SPF(Egress))) { NewlyReachable(Egress); } } } The loading on some paths may be reduced by balancing done for other egress or balancing done by other routers. For this reason, the SPF must be rechecked periodically to see if better paths than the ones being used exist. This check is performed every SPF_RECHECK seconds. Path sets are marked as they would be if an SPF was just completed. Villamizar Expires August 25, 1999 [Page 22] INTERNET-DRAFT MPLS Optimized Multipath (MPLS--OMP) February 25, 1999 Function RecheckSPF { foreach PathSet ( PathSets ) { Elapsed = NOW - LastRecheckSPF{PathSet}; RelativeLoad = GetPathContribution(PathSet); if (Elapsed < SPF_PERSIST * (1 + RelativeLoad)) continue; LastRecheckSPF{PathSet} = NOW; Paths = FindAlternatePath(PathSet); BestNewPath = GetBestNewPath(NewPaths); if (!defined(BestNewPath)) continue; NewCost = PathLinkCosts(BestNewPath); foreach Path ( PathSet ) { if (PathLinkCosts(Path) > NewCost) { InitilizeNewPath(PathSet,BestNewPath); AddWaitToPathSet(PathSet); return; } } } } The constant SPF_PERSIST is used with a compensation factor for load and traffic contribution similar to the compensations made in Ap- pendix B.4. If the time criteria is met, then the path is added. The SPF must be rechecked to see if other paths are available which are better than some of the paths in use. If a few such paths are added the path set can end up with some excess capacity and the worst paths can be dropped as described in Appendix B.5. B.8 MPLS--OMP Forwarding MPLS--OMP requires that the ingress router perform a hash on the IP source and IP destination in order to to spread traffic over a set of paths. This is a simple operation, but access to the IP header is not something that can always be taken for granted. In the case where an MPLS domain receives only IP encapsulated traffic from neighboring domains, the IP header is accessible. LSRs interior to the MPLS domain have no need to look at the IP header. The case where the IP header is not immediately accessible is where traffic is exchanged across MPLS domains that is already MPLS en- capsulated. The purpose of supporting a label stack is to allow the LSP for multiple MPLS domains to be indicated at the ingress of the first MPLS domain. Supplying an MPLS label to an upstream MPLS domain that only applies to a continuously changing specific set of source Villamizar Expires August 25, 1999 [Page 23] INTERNET-DRAFT MPLS Optimized Multipath (MPLS--OMP) February 25, 1999 and destination hash values is not likely to be feasible. If so, the top label must be assigned at the MPLS domain ingress. Labels fur- ther down the stack might be present and relevant to downstream MPLS domains. In such a scenario it may make sense to use the label pro- vided to the ingress to indicate traffic classification only, with the egress and hash determined by finding the IP header and looking at both IP source and IP destination. This isolates the upstream MPLS domain from any change to internal routing. The use of MPLS across MPLS domains as a means of passing traffic classification is beyond the scope of this document. The inability to quickly find the IP header in an MPLS stack of unknown depth is a cause for concern. This may be an issue that has to be addressed in the MPLS encapsulation. Alternately label numbers provided to the upstream MPLS domain may be assigned such that the label number can reliably indicate the label stack depth of incoming traffic. Given that the IP header is at the top of the stack or can be located quickly enough, a hash is performed on the IP source and destination. The hash number space is subdivided on arbitrary boundaries among the set of possible LSPs. This is indicated in the figure below. +-------+-------+-------+-------+ | IP Source Address | +-------+-------+-------+-------+ | IP Destination Address | +-------+-------+-------+-------+ | V hash function | V +-------+-------+ | hash value | +-------+-------+ | V +-------+-------+-------+-------+ | hash boundary | MPLS LSP #1 | +-------+-------+-------+-------+ | hash boundary | MPLS LSP #2 | +-------+-------+-------+-------+ ... +-------+-------+-------+-------+ | hash boundary | MPLS LSP #N | +-------+-------+-------+-------+ | V selection Villamizar Expires August 25, 1999 [Page 24] INTERNET-DRAFT MPLS Optimized Multipath (MPLS--OMP) February 25, 1999 | V MPLS label Conceptually the selection process involves walking through the set of hash boundary LSP pairs. If the hash value exceeds the boundary value, the next LSP is tried. If the last LSP in the data structure is reached, it is used regardless of the boundary. The selection would yield a MPLS label to use which would in turn yield all of the nessecary forwarding information. The actual algorithm used is unlikely to be the exact algorithm de- scribed here since a linear walk through a set of data structures can be inefficient if the number of entries is large. Instead an algo- rithm would be used that would have the same net effect but accomplish its task more efficiently when a large number of LSPs were in use. C Comparison of MPLS--OMP and OSPF-OMP Dynamics ISIS--OMP and OSPF--OMP differ in fairly insignificant ways. The behavior of MPLS--OMP and OSPF--OMP differ in more significant ways. In OSPF--OMP the paths are a direct result of the SPF calculation. As a result, in OSPF--OMP there are limitations to the set of paths on which traffic can be split for a given ingress and egress that can result in an imperfect load balance. This limitation can be overcome by carefully selecting the link costs. The most serious difficulty arises when links fail and the OSPF costs are suboptimal for the re- maining topology. MPLS--OMP differs from OSPF--OMP in the way in which the candidate set of paths are created. MPLS--OMP is capable of optimal load balancing in situations in which OSPF--OMP does not create a perfect balance across critical cutsets in the network topology. The immediate ben- efit is relief from the tedious task (one of N--P complete time com- plexity) of selecting link costs. A second benefit is an improvement in the behavior in the presence of link failure. D Simulations of MPLS--OMP Behavior Simulations using MPLS-OMP have been in progress. At the time of this writing all functionality defined in this document has been included in the simulator. The set of simulations under which the simula- tor has been tested have so far been focused on the addition of MPLS Villamizar Expires August 25, 1999 [Page 25] INTERNET-DRAFT MPLS Optimized Multipath (MPLS--OMP) February 25, 1999 LSP and subsequent balancing using some hypothetical examples and the topology and traffic data measured from the UUNET backbone as it existed in October 1998. The set of conditions under which the simulator has been tested are not yet as comprehensive as the set of conditions under which OSPF-OMP has been tested. In particular, dele- tion of old paths as load is reduced and restoration of traffic flow to lower cost paths after link failure and link restoration have not been adequately tested. These simulations will be made available at http://engr.ans.net/mpls- omp/. Villamizar Expires August 25, 1999 [Page 26]