Network Working Group
Internet Draft
Expiration Date: May 1998

Eric C. Rosen, Cisco Systems, Inc.
Andre Fredette, Bay Networks, Inc.
Tony Li, Juniper Networks, Inc.
Keith McCloghrie, Cisco Systems, Inc.
Milan Merhar, Lucent Technologies

November 1997

Comparison of MPLS LAN Encapsulation Proposals

draft-rosen-mpls-lan-encaps-compar-00.txt

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

To learn the current status of any Internet-Draft, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Abstract

[1] describes how to encode an MPLS label stack as a "shim" between the data link and network layer headers of a labeled frame, but [1] does not require that this encoding be used to encode the top of the label stack on LAN media. This document examines the alternative encapsulations that have been proposed for LANs. One alternative is to use the shim as the MPLS encapsulation on LAN interfaces [2]. Another alternative is to encode the top label in the MAC header, rather than in the shim [3]. We describe the implications of each approach.

Rosen, et al. [Page 1] Internet Draft draft-rosen-mpls-lan-encaps-compar-01.txt November 1997

Table of Contents

    1      Introduction
    2      Frame Size and Fragmentation
    3      Time to Live
    4      Interactions with Installed Equipment
    4.1    MAC Address Filtering
    4.2    Effect on 'Source Address Learning' in LAN Bridges
    4.3    Size of Bridge Forwarding Tables
    4.4    Environments with Mixed Bridging/Routing
    4.5    Uniqueness of Labels
    4.6    Protocol Layering
    5      Hardware Implementation
    6      Leveraging the ATM Encapsulation
    7      Summary
    8      Authors' Addresses
    9      Bibliography

1. Introduction

In [1], there is a proposal for encoding an MPLS label stack as a "shim" between the data link layer header and the network layer header. This is sometimes referred to as the "generic" encapsulation of MPLS messages, since it is independent of the underlying data link.

It has been proposed to use the generic encapsulation on PPP interfaces [1] and on LAN interfaces [2]. In both cases, the data link layer header would have a protocol codepoint which identifies the frame as containing an MPLS packet. In the proposal of [2], the data link layer and MAC layer would remain unaffected, with MPLS being, from the data link's point of view, just another higher layer. In this draft, we will refer to this proposal as the "MPLS-SHIM proposal", or just "MPLS-SHIM".

In the proposal of [3], the top of the label stack is encoded directly into the MAC layer header. It is proposed to carry the top entry of the label stack in the field of the MAC layer header which is conventionally used to hold the MAC Destination Address. That is, the MAC Destination Address field would be redefined as follows:

   +--------------------+------------+---------+-----------+
   | OUI Prefix (24)    | Label (20) | CoS (3) | Stack (1) |
   +--------------------+------------+---------+-----------+

where "Label", "CoS", and "Stack" have the same meaning as defined in [1]. The 24-bit OUI prefix is used to indicate that this 48-bit field contains an MPLS label stack entry, rather than a real MAC address. In this draft, we will refer to this proposal as the "MPLS-MAC proposal", or just "MPLS-MAC".

2. Frame Size and Fragmentation

An MPLS-MAC frame carrying the same information as an MPLS-SHIM frame is four bytes shorter, since the top label stack entry is carried in the MAC Destination Address field rather than in a four-byte shim entry. If a frame needs to carry one label, and the original, unlabeled frame is already at the maximum size (MTU) for the LAN, MPLS-SHIM will require the frame to be fragmented, while MPLS-MAC will not. If the frame needs to carry two or more labels, however, both schemes require fragmentation. Thus in either case, MPLS must have procedures for fragmenting labeled packets; these procedures can be found in [1].

The use of shorter packets on the LAN may reduce the number of packets which need to get fragmented. However, as is discussed in [1], packets would in fact never need to get fragmented unless they came from one of the relatively small number of end systems which (a) do not support Path MTU discovery, and (b) emit 1500-byte IP datagrams even when the source and destination are not on the same subnet (normally this implies that the source and destination have the same classful network number). Thus the difference between the amount of fragmentation caused by MPLS-SHIM and the amount caused by MPLS-MAC is quite small.

3. Time to Live

Unlike MPLS-SHIM, MPLS-MAC does not propose to carry an 8-bit TTL value in the top label stack entry.
However, doing so is not ruled out by the use of the MAC header to carry the label stack entry. For example, if MPLS were assigned 256 OUI prefixes, the TTL could be encoded in the choice of prefix. Whether this particular technique is practical, given IEEE fees and policies with respect to OUI assignment, is certainly arguable. The point, though, is that the presence or absence of TTL is not a fundamental difference between MPLS-SHIM and MPLS-MAC.

4. Interactions with Installed Equipment

The Ethernet and IEEE 802.3 data link protocols assume that the "address" fields in the frame headers contain MAC addresses. In MPLS-MAC, these fields carry a completely different kind of information, with completely different semantics. On a LAN in which MPLS-MAC is in use, there is not one data link protocol being used, but two:

   - The existing data link layer protocol (Ethernet or 802.3), which continues to be used for unlabeled packets.

   - A new data link layer protocol (specified, e.g., in [3]) for carrying labeled packets.

Existing LANs, existing LAN bridges and switches, and existing LAN troubleshooting and administrative tools and procedures have all been designed around the assumption that all frames on the LAN use certain data link layer protocols. To add a new data link layer protocol which assigns different semantics to the addressing fields is extremely likely to cause problems. We cannot hope to exhibit all such problems here, but we can certainly point out a few of them.

4.1. MAC Address Filtering

Ordinarily, a LAN device (such as a host or a router) does not receive a frame unless the frame's MAC Destination Address field ("DA field", or just "DA") satisfies one of the following conditions:

   - contains the 48-bit MAC address of the device itself, or

   - contains the broadcast address or a multicast address.
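
The receive rule above can be illustrated with a minimal sketch. The function name nic_accepts and the sample addresses are hypothetical and do not come from any of the proposals. The sketch relies only on two properties of IEEE 802 addresses: the broadcast address is all ones, and a group (multicast) address has the least significant bit of its first octet set. It deliberately models the simple rule as stated, not the multicast filtering of any particular NIC.

```python
def nic_accepts(da: bytes, own_mac: bytes) -> bool:
    """Return True if a non-promiscuous LAN device would receive the frame.

    Simplified model (an assumption): every group address is accepted,
    whereas real NICs are usually programmed with specific groups.
    """
    if len(da) != 6 or len(own_mac) != 6:
        raise ValueError("MAC addresses are 6 octets long")
    # Condition 1: the DA is the device's own 48-bit unicast address.
    if da == own_mac:
        return True
    # Condition 2: the DA is a group address. The broadcast address
    # ff:ff:ff:ff:ff:ff also has this bit set, so it is covered here.
    return bool(da[0] & 0x01)


# Hypothetical sample addresses for illustration only.
OWN = bytes.fromhex("00005e005301")
OTHER_UNICAST = bytes.fromhex("00005e005302")
MULTICAST = bytes.fromhex("01005e000001")
BROADCAST = bytes.fromhex("ffffffffffff")
```

With these samples, a frame addressed to OTHER_UNICAST is not received, while frames addressed to OWN, MULTICAST, or BROADCAST are.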

Bridges and switches, on the other hand, run in "promiscuous mode"; they receive all frames.

In MPLS-MAC, if one needs to send a labeled packet to an LSR, one does not put the LSR's MAC address in the DA; rather, one puts in one of the labels assigned and distributed by the LSR. So how does the LSR receive the frame?

In general, LAN NICs can be programmed with a small number of unicast MAC addresses (often only one, certainly fewer than a dozen) which they will receive. That is, the number of unicast addresses which can be programmed into a LAN NIC is MUCH smaller than the number of labels which an LSR can assign and distribute.

Therefore, if one is using MPLS-MAC, one must operate every LSR LAN interface in promiscuous mode. Running in promiscuous mode can be quite costly, especially if the LAN is heavily loaded, as every frame must be examined. Generally one does not run a system in promiscuous mode unless it has been explicitly designed to run in that mode (e.g., is a bridge or a switch). It must be possible, however, to run MPLS in devices which attach to a LAN but which are not bridges or switches, such as routers. Using promiscuous mode for LSRs on LANs may impose significant additional development costs on new equipment, and may not be practical on installed systems.

What if the labels were encoded not as unicast addresses, but as multicast addresses? The same problem occurs, in that most LAN NICs cannot be programmed to receive ONLY multicasts whose DA fields contain one of a specified set of values. When these NICs are programmed to receive a particular multicast address, they receive all frames with that address in the DA, but they also receive frames with other multicast DAs, and software must be used to filter out the undesired frames. The more multicast addresses one programs into the NIC, the more undesired multicast addresses one will receive. Some NICs could even end up receiving all multicast frames.
If the traffic on the LAN consists largely of labeled frames, this is essentially no different from running in promiscuous mode, and may be prohibitively expensive.

4.2. Effect on 'Source Address Learning' in LAN Bridges

Suppose it is desired to send a labeled packet from one LSR to another, where both are on the same 802.1D extended LAN (bridged Ethernet), and where traffic from one to the other must traverse one or more bridges. The LSRs do not recognize the presence of the bridges; 802.1D defines transparent bridges.

Since the DA contains a label instead of the MAC address of the target LSR, the bridges will flood such frames until such time as their forwarding databases have entries which are keyed by label values, where those label values are used as if they were MAC addresses. The (bridged) LAN would have a serious performance problem if it were necessary to flood all such frames.

Bridges populate their forwarding databases by "learning". They learn which MAC addresses are reachable over which ports by looking at the MAC Source Address fields (SA fields, or just "SA") in frames which they see on the Ethernets to which they are attached. This mapping between MAC addresses and ports is maintained in a forwarding table entry. These entries are aged out after some period (a configurable parameter with a default of 300 seconds) of non-use.

Thus to enable MPLS-MAC to be used in a bridged LAN environment, it is necessary to send frames which carry labels in their SA fields, as well as in their DA fields. In order to keep the bridge forwarding tables properly populated, an LSR must transmit, for each label it is using (or more accurately, for each label/TTL/CoS combination), at least one frame every aging period which has that label value in its SA field.
Failure to do this would result in the flooding of labeled frames along the spanning tree, which would have a significant detrimental impact on LAN performance.

There are a number of different ways to do this, but all are problematic:

   - When one sends the control message (an LDP message) which distributes a particular label, one can put that label in the SA field of the control message. However, this requires that a separate control message be sent for each label, and that each such message be refreshed every aging period. With a large number of labels, this creates high control overhead. It also requires that all the LSRs and bridges be configured with the same aging period.

   - It is sometimes suggested that some new protocol, such as GARP, be used for populating the bridge forwarding tables with labels. However, the existing installed base of bridges does not support GARP. Many existing bridges cannot be upgraded to support GARP. This would have a significant negative impact on the ability to deploy MPLS in existing bridged environments.

   - As an LSR sends ordinary data frames out a particular interface, it could cycle through the list of labels it has distributed out that interface, writing one such label into the SA of each frame. This does not create any extra overhead on the Ethernet itself, but does create additional processing overhead for each transmitted frame (i.e., additional processing in the "fast path"). It also impacts certain administrative tools and procedures:

      * When one is having LAN problems, one frequently troubleshoots by using a sniffer to find the frames that are causing problems. Then one looks at the SA of the frames to find the system that is at fault, so that the system can be located, examined and repaired. If the SA is overwritten with a label, this sort of troubleshooting technique can no longer be used.

      * Administrators frequently filter the SA values at hub and switch ports, in order to ensure that only specific systems can access portions of the network. If the SA is overwritten with a label, this sort of filtering can no longer be done, and a valuable security tool is removed from the administrator's portfolio.

      * A variety of automated topology discovery tools depend on the SA of a frame containing the actual MAC address of the system which originated the frame. If the SA is overwritten with a label, these tools will no longer produce correct results.

To summarize this section:

   - Since MPLS-SHIM does not alter the use of the SA and DA, it has no effect on the source address learning procedures, or upon tools which examine the SA.

   - MPLS-MAC must include some sort of "source address spoofing" procedure so that existing bridges do not flood all labeled packets down the spanning tree. If the SA is overwritten in data frames, there are issues of compatibility with existing tools. If the SA is overwritten in control frames, additional overhead is required.

4.3. Size of Bridge Forwarding Tables

In many large bridged LANs, the bridges operate with forwarding tables that are very nearly full, containing just the actual MAC addresses of systems that are on the bridged LAN. There simply may not be enough space in the bridge forwarding tables to accommodate labels as well as MAC addresses, especially if the number of labels in use in the LAN is large. If the table overflows, packets with DA field values that did not make it into the table will get flooded along the spanning tree. That is, MPLS-MAC could result in many packets, labeled or unlabeled, getting flooded along the spanning tree.
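
The interaction between source address learning, a bounded forwarding table, and flooding can be sketched as follows. This is a toy model under stated assumptions, not a description of any real bridge: the class name LearningBridge is hypothetical, aging of entries is omitted, and a full table is assumed to simply stop learning new addresses (real bridges may behave differently when full).

```python
from typing import Dict, Hashable, List


class LearningBridge:
    """Toy transparent bridge: learns from the SA, forwards on the DA."""

    def __init__(self, ports: List[int], capacity: int) -> None:
        self.ports = list(ports)
        self.capacity = capacity  # maximum number of table entries
        self.table: Dict[Hashable, int] = {}  # address (or label) -> port

    def forward(self, in_port: int, sa: Hashable, da: Hashable) -> List[int]:
        """Return the list of ports the frame is sent out of."""
        # Learning: record which port the SA was seen on, but only if the
        # entry already exists or there is still room (assumed policy).
        if sa in self.table or len(self.table) < self.capacity:
            self.table[sa] = in_port
        out_port = self.table.get(da)
        if out_port is None:
            # Unknown DA: flood out of every port except the arrival port.
            return [p for p in self.ports if p != in_port]
        # Known DA: forward on one port, or filter if it is the arrival port.
        return [] if out_port == in_port else [out_port]
```

In this model, once the table holds `capacity` entries, any further source value, whether a real MAC address or a label written into the SA, is never learned, so every frame destined to it is flooded.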

One could of course encode the labels as if they were "multicast addresses"; this would keep them out of bridge forwarding tables (unless the bridge has a VLAN implementation which keeps multicast addresses in the same forwarding table as unicast addresses). While this would prevent the table size from increasing, the cost would be that all labeled packets get flooded along the spanning tree. Any time frames need to get flooded along the spanning tree, there is a significant degradation in LAN performance, which affects both labeled and unlabeled frames.

4.4. Environments with Mixed Bridging/Routing

Consider the following topology (which is part of a larger topology):

           RI1                     RX1
            |                       |
         -----------------------------LAN1
            |           |
           RI2          B          RX2
            |           |           |
   LAN2 -------------------------------

In this topology:

   - There are two LANs, LAN1 and LAN2.

   - B is a conventional bridge connecting them.

   - B is configured to filter (i.e., discard) IP packets, but to pass packets of other protocols.

   - LAN1 and LAN2 are both in the spanning tree.

   - All the R systems are LSRs running IP routing:

      * RI1, RX1, and RI2 are IP routing neighbors.

      * RI2 and RX2 are IP routing neighbors.

   - There is an LDP connection between every pair of IP routing neighbors.

   - RX1 and RX2 are LSRs running IPX routing in addition to IP routing. They are NOT IP routing neighbors, but they ARE IPX routing neighbors.

   - There is an LDP connection between every pair of IPX routing neighbors.

This topology represents a fairly common situation in which IP is being routed between two LANs, but IPX (for example) is being bridged.

Since RI1 and RX2, for example, are NOT in the same broadcast domain with respect to IP (and with respect to the Label Distribution Protocol, LDP, which sits on top of IP), there is nothing to stop them from assigning the same label. That is, there is no way they can coordinate their label assignments.

Suppose L is a label which they both use. Then each will generate frames with L in the MPLS-MAC Source Address field. This means that bridge B will "learn" that L is located on each of the two LANs.

Suppose that RX1 sends a frame with L in the MAC Destination Address field, intending the frame for RI1. There is no way to prevent B from relaying that frame to LAN2, where it will be received by RX2. This causes unintended packet duplication. RX2 will of course misinterpret the frame, but eventually that frame will reach a point where it gets unlabeled, and forwarded according to the IP address. If the packet makes it back to RX1 somehow, we have created a loop. (Though even in the absence of a loop, the packet duplication is bad enough.)

One might think that this problem could be avoided by telling B not to learn from frames with labels in the SA. But remember that RX1 and RX2 have an LDP connection between them, and each will be distributing labels to the other, where the labels correspond to IPX routes. B MUST learn from frames with labels in the SA, or else the labeled IPX frames will not be passed from one LAN to the other.

4.5. Uniqueness of Labels

In MPLS-SHIM, there is no need for the LSRs on a LAN to coordinate their use of labels; each label has only local significance.

In MPLS-MAC, the set of labels that can be used on a LAN must be partitioned, and each LSR on the LAN must be assigned a distinct set of labels which it can use. The reason is that the labels will appear in the SA field, and bridge learning presupposes that the SA field contains a value which is unique throughout the LAN.

The need to partition the set of labels this way imposes scaling limitations, either in the number of LSRs that can exist on the LAN, or the number of labels that each can use.
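
The scaling limit can be made concrete with a little arithmetic. The sketch below assumes the 20-bit Label field shown in section 1 and an even split of the label space among the LSRs on the LAN. The even split and the function name labels_per_lsr are illustrative assumptions, since no particular partitioning scheme is specified here.

```python
LABEL_BITS = 20  # width of the Label field in the redefined DA (section 1)


def labels_per_lsr(num_lsrs: int) -> int:
    """Labels available to each LSR under an assumed even partition."""
    if num_lsrs < 1:
        raise ValueError("there must be at least one LSR on the LAN")
    # The whole label space holds 2**20 = 1,048,576 distinct values.
    return (1 << LABEL_BITS) // num_lsrs
```

For example, labels_per_lsr(16) gives 65536 and labels_per_lsr(100) gives 10485: each LSR added to the LAN shrinks the share available to every other LSR.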

As LSRs are added to or removed from the LAN, it will be necessary to change the way the labels are partitioned, and/or the way the labels are assigned to particular LSRs. This may result in periods of time during which labels are not unique throughout the LAN. This can have a negative effect on bridge learning, causing additional flooding of packets, as well as packet duplication and looping.

In section 4.4, we exhibited a realistic scenario in which MPLS-MAC can cause packet duplication and/or looping to occur, even though everything is properly configured and operating correctly. It is also worth considering the effects if the coordination procedures for partitioning labels were to fail. The result could be uncontrollable packet duplication and/or looping. Distributed coordination procedures like this have certainly been known to fail in practice.

4.6. Protocol Layering

A fundamental design approach that has aided the specification and deployment of new networking protocols is the maintenance of protocol layer separation. When this design approach is followed, lower layers can be reused as is. This enables one to take advantage of the many years of testing and debugging of the lower layers, and it minimizes the amount of new work that must be done, and new alternatives that must be considered.

MPLS-SHIM maintains this layered approach. MPLS-MAC overloads and changes the semantics of the data link layer, by stealing fields from the data link header and assigning them semantics which are different from the semantics which the data link layer assigns them. In section 4, we have given several examples of how this overloading can cause problems that may not have been foreseen.
Even if it is possible to consider each such problem one at a time and develop a fix, the risk is high that some problems will go undiscovered until the protocol is deployed in some unforeseen way.

5. Hardware Implementation

An important consideration is the ability to implement an LSR out of standard hardware.

It is clear that MPLS-SHIM, if implemented in hardware, would require hardware that differs significantly from that in standard LAN switches and bridges. MPLS requires that the top label change each time a transit frame is switched. For MPLS-SHIM, the Destination Address must also match the address of the next LSR in the path, so at every hop at least the Destination Address and the top label would change.

Similarly, MPLS-MAC, if implemented in hardware, would require hardware that replaces the Destination Address field every time a transit labeled packet is switched. Replacing the Destination Address is not a function of either existing 802.1D bridges or the newer 802.1p and 802.1Q bridges.

MPLS-MAC also requires that the MAC Source Address field be overwritten as packets pass through (see section 4). This is another function that does not exist in current bridges/switches, and is not envisioned to exist in future bridges/switches.

It does not appear that there is any way to use standard LAN switching/bridging hardware to provide MPLS functionality, regardless of the proposal adopted.

6. Leveraging the ATM Encapsulation

The MPLS encoding for ATM is rather analogous to the MPLS-MAC procedures. In MPLS over ATM, the top of the stack is encoded in the VPI/VCI field of the ATM cell header, and processed by standard ATM switching hardware. Shouldn't it be possible to do the same thing for LANs?

The functionality provided by MPLS is very similar to the functionality provided by the ATM "forwarding plane". Both sets of procedures are based on the lookup of a "label", and the replacement of the label by another label before forwarding.
Both the MPLS label and the ATM VPI/VCI have the semantics of a "connection" identifier. Thus it is very easy to map MPLS functionality onto ATM forwarding plane functionality, by mapping the MPLS label swap directly to ATM's VPI/VCI replacement. MPLS does not change the semantics of any of the fields used by the ATM forwarding plane.

However, the semantics of a MAC address field is that of an "address" which identifies either a unique physical device, or a unique virtual device within one particular physical device. In either case, this is expected to be a globally unique mapping. This is very different from the semantics of an MPLS label, which is an identifier having only local significance.

As we have shown, the forwarding functionality of LAN switches is very different from the forwarding functionality of MPLS, as no LAN address replacement operation exists which is equivalent to the VPI/VCI replacement in ATM. So it does not appear to be possible to map MPLS functionality directly into LAN switching functionality.

7. Summary

   - Frame Size

     MPLS-MAC generates frames which are four bytes shorter than those generated by MPLS-SHIM.

   - Time to Live

     Although MPLS-MAC, as proposed in [3], does not have a TTL field, such a field could be added; the handling of TTL is not a fundamental difference between the proposals.

   - Installed Equipment

      * MPLS-MAC requires ALL LSRs on a LAN to run in promiscuous mode; MPLS-SHIM does not. Running in promiscuous mode may have a significant performance impact.

      * MPLS-MAC causes a large increase in the size of the forwarding tables in existing bridges/switches. MPLS-SHIM does not. If the maximum forwarding table size is reached, flooding down the spanning tree results, reducing the effective forwarding capacity for both MPLS and non-MPLS traffic.

      * MPLS-SHIM does not overwrite the SA. MPLS-MAC does.
        This has an effect on source address learning in existing bridges. The procedures introduced to compensate for this effect may impact the use of existing administrative tools, or may cause extra overhead. It also appears that such procedures will not produce correct results in LANs with mixed bridging/routing.

      * MPLS-SHIM maintains protocol layering, allowing the LAN data link protocol to be used without changes. MPLS-MAC overloads fields of the LAN data link protocol by assigning them new semantics, thereby introducing significant risk of additional unforeseen problems.

   - Hardware Implementation

     Neither MPLS-SHIM nor MPLS-MAC enables one to implement MPLS on standard (or soon-to-be-standard) bridging/switching hardware.

8. Authors' Addresses

   Eric C. Rosen
   Cisco Systems, Inc.
   250 Apollo Drive
   Chelmsford, MA, 01824
   E-mail: erosen@cisco.com

   Andre N. Fredette
   Bay Networks, Inc.
   3 Federal St.
   Billerica, MA 01821
   Phone: (978) 916-8524
   E-mail: fredette@baynetworks.com

   Tony Li
   Juniper Networks, Inc.
   385 Ravendale Dr.
   Mountain View, CA 94043
   E-mail: tli@juniper.net
   Voice: +1 650 526 8006
   Fax: +1 650 526 8001

   Keith McCloghrie
   Cisco Systems, Inc.
   170 Tasman Drive
   San Jose, CA, 95134
   E-mail: kzm@cisco.com

   Milan J. Merhar
   Lucent Technologies
   300 Baker Ave.
   Concord, MA, 01742-2168
   Voice: (978) 287-2841
   Fax: (978) 287-2810
   E-mail: milan@lucent.com

9. Bibliography

   [1] "MPLS Label Stack Encoding", draft-ietf-mpls-label-encaps-00.txt, Rosen, Rekhter, Tappan, Farinacci, Fedorkow, Li, Conta.

   [2] "MPLS Label Stack Encoding on LAN Media", draft-rosen-mpls-lan-encaps-00.txt, Rosen, Rekhter, Tappan, Farinacci, Fedorkow, Li, Conta.

   [3] "Labels for MPLS over LAN Media", draft-srinivasan-mpls-lans-label-00.txt, Bussiere, Esaki, Ghanwani, Matsuzawa, Pace, Srinivasan.