Network Working Group                                     Dino Farinacci
Internet Draft                                             Yakov Rekhter
Expires: April, 1999                                       cisco Systems
                                                           November 1998


          Multicast Label Binding and Distribution using PIM

                <draft-farinacci-multicast-tagsw-01.txt>


Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   To learn the current status of any Internet-Draft, please check the
   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

Abstract

   This document describes a method for advertising labels for multicast
   flows.  It strives to use downstream label assignment to be
   consistent with unicast label distribution.  This proposal is
   media-type independent.  Therefore, it works for
   multi-access/multicast capable LANs, point-to-point links, and NBMA
   networks.

1.0 Overview

   We propose to use PIM and combine the (*,G) and (S,G) join state with
   label assignment and distribution.  Labels and multicast routes will
   be sent together in one message.

1.1 Goals

   i.   We are motivated to have the upstream Label Switch Router (LSR)
        use one label for multicast data delivery on a network so we can
        make use of data-link multicast delivery where available.

Farinacci & Rekhter     Multicast Tagging using PIM             [Page 1]

Internet Draft                                             November 1998

   ii.  We are motivated to use downstream label assignment to achieve:

        o Simplicity and consistency with unicast label assignment.
        o A per-interface Label Information Base (LIB) that guarantees
          unique label assignments on any interface.

        o Consistent algorithms for label assignment and distribution
          among different media types.

        o Advertisement of routing table state and its associated label
          binding information together in a single control message,
          which reduces race conditions.

        o No label reallocation or reassignment when there are RPF
          changes (i.e., when the multicast distribution tree takes a
          different shape).

        o Improved utilization of label space by randomizing label
          assignment among all downstream routers joining for a group.

   iii. We want the design to work with dense-mode or sparse-mode
        operation.

2.0 Proposal

   An LSR that supports multicast sends PIM Join messages on behalf of
   hosts that join groups.  It sends Join messages to upstream
   neighboring LSRs toward the RP for the shared-tree (*,G) or toward a
   source for a source-tree (S,G).

   If the LSR creates the state for the group, it will assign a label
   for the respective (*,G) or (S,G) state.  It includes the label in
   the Join message associated with the multicast routing table entry.
   The entry is created in its LIB using the label as its incoming label
   component.

   The upstream LSR, when it receives the Join, will cache the new
   multicast routing table state along with the label.  An entry is
   created in the LIB and the label is used as the outgoing component.
   This label will be used by the upstream LSR to forward multicast data
   packets.

   Since PIM Join messages are multicast on a LAN, other downstream LSRs
   that are interested in the group will hear the message and can cache
   the binding of multicast routing table state and label state
   together.  Since the upstream LSR is going to forward data packets
   using the advertised label, they must be ready to accept the data
   packet with that advertised label.

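   The binding steps described above can be sketched in a few lines of
   Python.  This is only an illustration under stated assumptions, not
   part of the proposal: the class and function names, the group address
   224.1.1.1, and the starting label values 100, 200, and 300 are all
   hypothetical, and the sketch assumes every downstream LSR on the LAN
   is interested in the group and therefore caches what it overhears.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Route = Tuple[str, str]  # (source or "*", group address)


@dataclass
class Join:
    sender: str
    route: Route
    label: int


@dataclass
class Lsr:
    name: str
    next_label: int  # hypothetical: next free label in this LSR's range
    incoming: Dict[Route, int] = field(default_factory=dict)  # LIB, incoming
    outgoing: Dict[Route, int] = field(default_factory=dict)  # LIB, outgoing

    def join(self, route: Route) -> Join:
        """Downstream side: assign a label only if no binding is cached."""
        if route not in self.incoming:
            self.incoming[route] = self.next_label
            self.next_label += 1
        return Join(self.name, route, self.incoming[route])

    def receive_join(self, join: Join, is_upstream: bool) -> None:
        if is_upstream:
            # The upstream LSR forwards data with the advertised label.
            self.outgoing[join.route] = join.label
        else:
            # Other downstream LSRs cache the binding they overheard.
            self.incoming.setdefault(join.route, join.label)


def multicast_on_lan(join: Join, upstream: Lsr, lan: List[Lsr]) -> None:
    """Deliver a Join to every LSR on the LAN except its sender."""
    for lsr in lan:
        if lsr.name != join.sender:
            lsr.receive_join(join, is_upstream=lsr is upstream)


upstream = Lsr("U", next_label=100)
a = Lsr("A", next_label=200)
b = Lsr("B", next_label=300)
lan = [upstream, a, b]
route = ("*", "224.1.1.1")

multicast_on_lan(a.join(route), upstream, lan)  # A joins first and assigns
b_join = b.join(route)                          # B reuses the cached label
multicast_on_lan(b_join, upstream, lan)
print(upstream.outgoing[route], a.incoming[route], b_join.label)
# prints: 200 200 200
```

   In this sketch B never consumes a label from its own range, because
   it overheard A's Join before sending its own, so the upstream LSR
   needs only one outgoing label for the route on the LAN.
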
   The first downstream LSR that joins for a group is the label assigner
   (called the Label Allocation Server in other forums) on a LAN for a
   multicast route.  All other downstream LSRs that send PIM Join
   messages will use the same label that the assigner selected.

   An LSR that sends a PIM Join message with a label of 0 indicates that
   it does not know the label for the associated multicast routing table
   entry.  When this occurs, the assigner can trigger a PIM Join message
   making the label known.

   This algorithm works on point-to-point links because there is only
   one downstream LSR on the link, which always becomes the label
   assigner.

   On NBMA networks, all PIM routers are known to each other through
   pseudo-broadcast mechanisms provided by the data-link layer.
   However, PIM Join messages are unicast to the upstream LSR.
   Therefore, other downstream LSRs will not hear the label assigner's
   advertisement.  To overcome this issue, we have each downstream LSR
   become the label assigner on NBMA networks.  Since the upstream LSR
   is going to pseudo-broadcast the data anyway, it can supply a label
   for each packet that goes to each respective downstream LSR.

2.1 Corner cases

   Multiple downstream LSRs cannot assign the same label value for any
   multicast route because they partition the label space into
   non-overlapping ranges according to [4].  When an LSR is enabled on
   an interface, it obtains a unique label range for the LAN.

   When the label assigner leaves the group, the label that it assigned
   still remains active.  The downstream LSR with the next highest IP
   address becomes the owner of that label and may change it if it sees
   fit.  However, it is not required to change it.  All downstream LSRs
   can continue to use the assignment in their Join messages.

   If two systems both join for the first time (neither has state) at
   the same time and each chooses a different label value, the upstream
   LSR will use the label of the downstream LSR with the highest IP
   address.
   The lower-addressed LSR will hear the higher-addressed LSR's Join too
   and will also use its label.

   If the label assigner crashes, the downstream LSR with the highest IP
   address assigns a new label to the multicast routes that were
   assigned by the crashed LSR and triggers a Join message so that all
   other LSRs on the LAN use the new label.

   When a LAN partitions due to a layer-2 switch failure, the same logic
   applies as when an LSR stops joining for a group.  When the partition
   heals, there may be an RPF neighbor change in one of the partitions.
   When there is an RPF neighbor change and the downstream routers
   trigger joins to their new RPF neighbor with a different label
   assignment than the other partition is using, one of two resolutions
   occurs:

   1) The LSR which is the allocator in the partition of the new RPF
      neighbor will trigger a join if it has a higher IP address than
      the allocator in the other partition.  The downstream routers in
      the other partition use the new label assignment immediately.

   2) If the LSR which is the allocator in the partition of the new RPF
      neighbor has a lower IP address, all downstream routers and the
      new RPF neighbor will switch to the label assigned by the
      allocator in the other partition.

   If an RPF change occurs (the topology changed so the upstream LSR is
   different), the PIM protocol spec indicates that a PIM Join may be
   triggered to get on the new distribution tree as soon as possible.
   In this case, if the label assigner becomes the upstream LSR, then
   the downstream LSR that now has the highest IP address may become the
   label assigner.  It may change the label if it sees fit.  Otherwise,
   the same label is used.

3.0 Coexistence of Label-Capable and Label-Incapable multicast routers

   An upstream router will know whether all routers on a subnet are LSRs
   or not.  If there are any label incapable routers, the upstream
   router will not label encapsulate multicast data packets.
   The PIM Hello message will indicate if the router is label capable.
   The PIM Hello message is sent by every multicast capable router.  If
   the upstream router detects any non-PIM neighbors on the subnet, it
   will assume that they are label incapable and will not label
   encapsulate multicast data packets.

   As an optimization, if the upstream router knows that all downstream
   routers interested in the group are LSRs, it may label encapsulate
   multicast data packets even though there are other label incapable
   routers on the subnet.

   Related to the above cases, if there is a group member on a LAN,
   co-located with a multicast LSR, only a single packet will be
   forwarded.  It is the responsibility of the upstream router to
   decapsulate the labeled packet and forward it on the LAN as an IP
   packet so the member can receive it.  The downstream routers may
   forward the IP packet or label encapsulate it.

4.0 Label Conflict Resolution

   The use of different data-link layer code-points (e.g., Ethertypes,
   PPP protocol types) for unicast and multicast label switching makes
   it possible to distinguish labels associated with unicast routes from
   labels associated with multicast routes.  Therefore, the assignment
   of labels for unicast routes can be done completely independently of
   the assignment of labels for multicast routes, without creating any
   risk of ambiguity.  For example, the same label value could be
   allocated for a unicast route and for a multicast route.

5.0 Modifications to PIMv2

   PIMv2 has a packet format for each address type it may support when
   encoding both multicast and unicast addresses.  We will define a new
   address type called "Label Address" for unicast address encoding.
   The label will accompany the source address in the Encoded Source
   Address format as specified in [2].  The label value will be in a
   32-bit quantity following the source address.
   So, for example, an IPv4 Label Address format would look like:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                   |  Rsrvd  |S|W|R|   Mask Len    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Source Address                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             Label                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Current Multicast Route Timer                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Label
        A 20-bit value assigned by the LSR sending the Join/Prune
        message.  The 20-bit value is inserted in the low-order 20 bits
        of this 32-bit field.

   Current Multicast Route Timer
        The sender of a Join/Prune message inserts the current time left
        before expiration for the multicast route table entry described
        by the Source Address (either the (S,G) or (*,G) entry).  This
        is needed so all routers on a common multi-access subnet can
        time-out the entry close to the same time without each other
        recreating the state when the source goes inactive.

   Refer to [2] for other field descriptions not specified here.

6.0 Label Distribution for dense-mode groups

   In dense-mode PIM, there is no downstream Join message traveling
   upstream to perform the binding of multicast routes with labels.
   However, since we don't want a separate algorithm for dense-mode
   groups, we extend this basic design for dense-mode PIM.

   When a downstream LSR creates (S,G) state from the receipt of 1)
   data, or 2) Join/Prune or Graft messages, it will start a periodic
   timer to send Join messages with label assignment information
   present.  The messages look no different and are treated on receipt
   no differently than in the sparse-mode case.

   The periodic Join message will be multicast on the LAN with an
   upstream target address of 0.0.0.0.

   All multicast LSRs on the LAN must know the group operates in
   dense-mode.
   This is accomplished using standard PIM mechanisms.

7.0 Security Considerations

   Security considerations are not discussed in this memo.

8.0 Acknowledgments

   The authors would like to thank Fred Baker and Eric Rosen from cisco
   Systems for their insightful comments on this draft.

9.0 Authors' Addresses

   Dino Farinacci
   Cisco Systems, Inc.
   170 Tasman Drive
   San Jose, CA, 95134
   Email: dino@cisco.com

   Yakov Rekhter
   Cisco Systems, Inc.
   170 Tasman Drive
   San Jose, CA, 95134
   Email: yakov@cisco.com

10.0 References

   [1] Multiprotocol Label Switching Architecture,
       draft-ietf-mpls-arch-02.txt, Rosen, Viswanathan, Callon, July,
       1998.

   [2] Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol
       Specification, RFC 2362, Estrin, Farinacci, Helmy, Thaler,
       Deering, Handley, Jacobson, Liu, Sharma, Wei, June, 1998.

   [3] LDP Specification, draft-ietf-mpls-ldp-01.txt, Andersson, Doolan,
       Feldman, Fredette, Thomas, August, 1998.

   [4] Partitioning Label Space among Multicast Routers on a Common
       Subnet, Farinacci, October, 1998.

   [5] MPLS Label Stack Encoding, draft-ietf-mpls-label-encaps-03.txt,
       Rosen, Rekhter, Tappan, Fedorkow, Li, Conta, September, 1998.