Network Working Group                                       Arup Acharya
Internet Draft                                         Frederic Griffoul
<draft-acharya-ipsofacto-mpls-mcast-00.txt>               Furquan Ansari
                                                  C&C Research Labs, NEC
                                                       February 23, 1999
Expires August 23, 1999

                 IP Multicast Support in MPLS Networks
              <draft-acharya-ipsofacto-mpls-mcast-00.txt>

Status of This Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

Multicast support in an MPLS network has yet to be defined. This document discusses both dense-mode and sparse-mode IP multicast within the context of an MPLS network. Unlike unicast routing, dense-mode multicast routing trees are established in a data-driven manner, and it is not possible to topologically aggregate such trees, which are rooted at different sources. In sparse-mode multicast, source-specific trees may coexist with a core/shared tree, and it is not possible to assign a common label to traffic from different sources on a branch of the shared tree. This leads us to suggest a per-source, traffic-driven label allocation scheme for supporting all three types of multicast routing trees (dense mode, shared tree, source tree) in an MPLS network.

Acharya, Griffoul & Ansari                                      [Page 1]

Internet Draft    draft-ipsofacto-mpls-mcast-00.txt        February 1999

Table of Contents

   1. Introduction ... 3
   2. Dense-mode multicast: problem definition ... 4
   3. Sparse-mode multicast: problem definition ... 6
      3.1 Existing proposals for PIM-SM in MPLS ... 6
      3.2 Shared tree/source tree co-existence problem ... 6
      3.3 Per-source label assignment ... 7
   4. Building block for proposed MPLS multicast ... 8
      4.1 Assumptions ... 8
      4.2 Upstream "implicit" label distribution ... 9
         4.2.1 Label assignment ... 9
         4.2.2 Label withdrawal ... 10
      4.3 Downstream LDP-based label distribution ... 11
      4.4 Comparison of the distribution procedures ... 13
   5. Proposed Solution for PIM-DM in MPLS ... 13
      5.1 Basic operations ... 13
      5.2 Label Binding triggered by PIM-Graft ... 14
      5.3 Label Reclamation triggered by PIM-Prune ... 14
      5.4 Label Reclamation triggered by PIM inactivity timer ... 14
      5.5 Example ... 15
   6. Proposed solution for PIM-SM in MPLS ... 16
      6.1 Source-specific/shortest-path tree ... 16
      6.2 Shared tree ... 17
         6.2.1 Label Reclamation ... 17
         6.2.2 Example ... 18
   7. Proposed solution for DVMRP and MOSPF in MPLS ... 19
   8. Effects of L3 topology change on multicast LSP ... 20
      8.1 Loops ... 20
      8.2 Change of upstream router ... 20
   9. Conclusions ... 20
   10. Security Considerations ... 21
   11. Acknowledgments ... 21
   12. References ... 21
   13. Authors Addresses ... 22
   Appendix A: LDP Multicast FEC Definitions ... 23
   Appendix B: LDP Initialization Session Multicast Parameter ... 24

Table of Abbreviations

   DVMRP    Distance Vector Multicast Routing Protocol
   IGMP     Internet Group Management Protocol
   IP       Internet Protocol
   LSP      Label Switched Path
   LSR      Label Switching Router
   MFC      Multicast Forwarding Cache
   MRT      Multicast Routing Table
   NHLF     Next Hop Label Forwarding
   PIM-DM   Protocol Independent Multicast-Dense Mode
   PIM-SM   Protocol Independent Multicast-Sparse Mode
   RP       Rendezvous Point
   (S,G)    (Source, Group) pair
   (*,G)    (Match any source, Group) pair
   UL       Unused or Unassigned Label
   iif      Incoming Interface
   oif      Outgoing Interface

1. Introduction

This document considers the problem of supporting IP multicast efficiently within an MPLS environment. Both PIM dense-mode and sparse-mode multicast routing protocols are discussed. We observe that, in dense-mode operation, multicast routing entries do not exist prior to the arrival of data packets and, unlike unicast routing entries, cannot be aggregated. This suggests that labels need to be assigned on a per-flow (source, group) basis in a traffic-driven fashion. In the case of sparse mode, we observe that source-specific trees may coexist with the shared/core tree for a multicast group, so nodes of the shared tree may prune data packets based on the source. This implies that a single label cannot be assigned to all flows on the shared tree, independent of the source. We suggest a data-driven, per-source assignment of labels to traffic on the shared tree. For the three different types of trees (dense mode, sparse-mode shared and sparse-mode source-specific), we present a common scheme for implicitly distributing and binding labels to multicast FECs.

Presently, support for multicast in MPLS networks [7] is undefined, and this document suggests a possible solution for forwarding multicast traffic at layer 2. For a review of multicast routing protocols and their implications for an MPLS environment, the reader is referred to [1]. For PIM-SM, [3] suggests that the multicast forwarding cache (MFC), which contains forwarding entries for currently active multicast flows, be used as a trigger to set up a label-switched path (LSP), but no specific methods for label binding are suggested. It notes that coexistence of shared and source-specific trees in PIM-SM is problematic for L2 forwarding and suggests that L3 forwarding be used in such situations.

In this document, we present a data-driven scheme for label assignment to set up LSPs for both dense-mode and sparse-mode multicast. The scheme is based on our prior work on IP switching over ATM, IPSOFACTO ([IPSO1, IPSO2]).

2. Dense-Mode Multicast: problem definition

The current MPLS specifications for unicast traffic [ARCH,LDP] advocate control-driven label binding and downstream label assignment. In this section, we point out why such a topology-driven approach is not suitable for the multicast dense-mode case.

Let us consider a unicast example to see how topology-driven label binding works.

      [NET1]        [NET2]
        |             |
        R1            R2
          \          /
         A \        / B
            \      /
              R3----[NET3]
             /    C
          D /
           /
          R4
          |
        [NET4]

              Figure 1

Let us assume the LSR R3 is either a packet-switched LSR or a VC-merge capable ATM LSR (i.e. it supports label aggregation). A partial view of the R3 label tables is:

   Next Hop | IIF | Incoming Label | OIF | Outgoing Label
   ---------+-----+----------------+-----+---------------
   R2       | A   | l1             | B   | l2
   R2       | C   | l3             | B   | l2
   R2       | D   | l3             | B   | l2

The key points of MPLS unicast forwarding are the following:

   1. Routing table updates trigger the creation or destruction of label bindings.

   2. The label bindings are advertised using a dedicated Label Distribution Protocol (LDP). This happens before any data is received on the corresponding ports, so all packets are forwarded at layer 2.

   3. All packets whose destination is NET2 are aggregated in R3: they are forwarded to R2 on interface B using a single common label l2.

Now let us suppose a multicast group G has members in NET2 and NET4 and a source S1 in NET1. According to PIM-DM, R3 receives the packets to G on interface A and forwards them on its outgoing interfaces B, C and D. R3 creates the following multicast routing table entry:

   (S1, G) iif={A} oif={B, C, D} prune={}

Packets are then forwarded at layer 3 since no label has been assigned to (S1, G) so far.
Subsequently, a PRUNE message is received on interface C (since NET3 has no member of G) and the multicast routing table entry is modified to:

   (S1, G) iif={A} oif={B, D} prune={C}

The interface C is added again to the outgoing interface list after the Prune timer expires.

Note the following points:

   1. There is no routing entry at the LSR R3 corresponding to (S1, G) prior to the arrival of data from S1.

   2. It is not possible to aggregate multicast routing entries in Dense Mode. Suppose a source S2 in NET2 starts sending traffic to G. R3 creates a new multicast routing table entry:

         (S2, G) iif={B} oif={A, C, D} prune={}

      which is then modified, after receiving PRUNE messages from interfaces A and C, to:

         (S2, G) iif={B} oif={D} prune={A, C}

      The (S2, G) entry cannot be aggregated with the entry for (S1, G), since the incoming and outgoing interfaces are different.

   3. A given routing table entry changes dynamically (even without any change in the unicast routing/network topology) due to periodic pruning of branches and/or arrival of new members.

   4. All packets are forwarded at L3 until incoming and outgoing labels are assigned to the (S1, G) entry.

Points (1) and (3) lead us to conclude that label assignment for dense-mode traffic needs to be hop-by-hop and traffic-driven. Furthermore, from (2), each (S, G) entry needs to be assigned separate incoming and outgoing labels.

When the first packet from source S to destination G is received by an LSR, multicast IP forwarding carries out the RPF check and creates an (S, G) entry in the multicast routing table. Once this (S, G) entry exists, the procedure to bind a label to the (S, G) FEC is activated. Until such labels are assigned, all packets are forwarded at L3; the label bindings therefore need to be made as quickly as possible after the routing entry is created, to keep L3 processing to a minimum.
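
The first-packet behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `DenseModeLsr`, its methods, the `rpf_iif` lookup table and the label numbering starting at 100 are all hypothetical and are not defined by this document. The second source is called S2 here. The sketch shows that (S, G) state and labels appear only when traffic arrives, and that the entries for two sources of the same group end up with different interface sets and therefore cannot share labels.

```python
from dataclasses import dataclass, field


@dataclass
class DenseModeLsr:
    """Toy LSR that creates (S, G) state and labels only when traffic arrives."""
    interfaces: set                      # all PIM-DM interfaces of this LSR
    rpf_iif: dict                        # source -> interface towards the source
    next_label: int = 100                # hypothetical start of the free-label pool
    mrt: dict = field(default_factory=dict)       # (S, G) -> (iif, set of oifs)
    bindings: dict = field(default_factory=dict)  # (S, G) -> {oif: outgoing label}

    def on_first_packet(self, source, group, iif):
        # RPF check: accept only packets arriving on the interface towards S.
        if self.rpf_iif.get(source) != iif:
            return None
        key = (source, group)
        if key not in self.mrt:
            # Dense mode floods on every interface except the incoming one.
            oifs = self.interfaces - {iif}
            self.mrt[key] = (iif, oifs)
            # One outgoing label per branch, allocated per (S, G) entry.
            self.bindings[key] = {}
            for oif in sorted(oifs):
                self.bindings[key][oif] = self.next_label
                self.next_label += 1
        return self.mrt[key]

    def on_prune(self, source, group, oif):
        # A Prune removes the branch and frees the label bound to it.
        _iif, oifs = self.mrt[(source, group)]
        oifs.discard(oif)
        self.bindings[(source, group)].pop(oif, None)


# R3 from Figure 1: S1 is reached via interface A, S2 via interface B.
r3 = DenseModeLsr(interfaces={"A", "B", "C", "D"}, rpf_iif={"S1": "A", "S2": "B"})
r3.on_first_packet("S1", "G", "A")
r3.on_first_packet("S2", "G", "B")
r3.on_prune("S1", "G", "C")
r3.on_prune("S2", "G", "A")
r3.on_prune("S2", "G", "C")
print(r3.mrt[("S1", "G")])   # iif A with oifs B and D
print(r3.mrt[("S2", "G")])   # iif B with oif D only
```

The two entries differ in both incoming and outgoing interfaces, which is why the sketch keeps one binding table per (S, G) key rather than one per group.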

Arrival of a PIM Graft (S, G) message requires adding an outgoing branch to the existing LSP. From (3), labels need to be withdrawn in two cases: on Prune (S, G) reception and/or emission, and on activity timer expiration.

3. Sparse Mode Multicast: Problem definition

3.1 Existing proposals for PIM-SM in MPLS

[FAR1] suggests a piggybacking methodology to assign and distribute labels for multicast traffic for sparse-mode trees. The idea is that PIM Join messages are augmented to carry labels. This requires changes to existing PIM message formats, and [OOMS1] lists other drawbacks of the piggybacking approach. As we discuss below, it is also not possible to assign a single label, common to all sources, for sparse-mode shared trees, and thus the piggybacking approach is not adequate for this case.

[OOMS2] recognizes the (*, G)/(S, G) coexistence problem but only proposes to have recourse to IP L3 forwarding.

3.2 Shared tree/source tree co-existence problem

PIM-SM allows receivers to join a shared tree (*, G) for the group G with a common core/Rendezvous Point (RP) as the root, or a shortest-path (S, G) tree rooted at a specific source S. A receiver may thus receive traffic for a given source S through the (S, G) tree, and for other sources through the shared tree. Note also that some members may receive the source traffic from the shared (*, G) tree while other members may receive it from the (S, G) tree. Consequently, the source Designated Router needs to forward the source traffic on both the (*, G) and (S, G) trees.

In an MPLS context, a problem arises when a node on the shared (*, G) tree needs to forward data differently depending on the source, for instance because some members have joined a source-specific shortest-path tree. Let us consider the case of Figure 2.
The node R2 is not interested in receiving S1's traffic from the (*, G) tree, since it has joined the source-specific tree for S1. It sends a Prune(S1, G) message to R1 to prevent S1's traffic from being forwarded on link 1. As a result, R1 forwards traffic from S1 on interface 3 only, while traffic from S2 is forwarded on interfaces 1 and 3. To accomplish the same forwarding behaviour at L2 within an MPLS network, a common label cannot be assigned to all traffic on R1's incoming link 2; the traffic from S1 on R1's interface 2 must be assigned a label distinct from that of S2.

         R2 -------> Join(S1,G) -------> S1
          \    /                          |
           \ 1 /                          |
            \ /                           |
          +----+          2            +----+
    +---> | R1 |--------------------| RP |
 Prune(S1,G) +----+     ------>        +----+
          /          Join(*,G)            \
         / 3                               \
        /                                   \
       R3                                    S2

   at R1:
      (*, G)  iif={2} oif={1, 3}
      (S1, G) iif={2} oif={3}

                    Figure 2

It is easy to see that such selective forwarding may be necessary at different points of the shared tree depending on the source of the traffic. For PIM-SM, a naive topology-driven procedure to assign labels leads to incorrect data delivery.

3.3 Per-source label assignment

Support for PIM-SM shortest-path trees can be the same as for PIM-DM trees: a label is assigned in a hop-by-hop, traffic-driven way for each (S, G) entry.

To solve the (S, G)/(*, G) coexistence problem without resorting to IP forwarding, source-specific labels are to be assigned on intermediate nodes of the shared tree. Multiple labels will be associated with one (*, G) entry, one label per active source.

In order to unambiguously distinguish a per-source (*, G) label binding from an (S, G) binding, we propose to introduce a (G, S) FEC representing IP packets from source S forwarded on the (*, G) tree. The other obvious FEC, the (S, G) FEC, represents IP packets from source S forwarded on the (S, G) tree.

PIM-SM could then be supported using per-source label assignment.
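
The need for per-source labels at R1 in Figure 2 can be illustrated with a minimal Python sketch. The helper `oifs_for`, the dictionary layout and the label values 12 and 22 are assumptions made for illustration only; the interface numbers and routing entries are the ones shown for R1 in Figure 2.

```python
def oifs_for(mrt, source, group):
    """Pick outgoing interfaces: an (S, G) entry overrides the (*, G) entry."""
    if (source, group) in mrt:
        return mrt[(source, group)]["oif"]
    return mrt[("*", group)]["oif"]


# MRT at R1 in Figure 2, after S1 has been pruned from link 1.
mrt_r1 = {
    ("*", "G"): {"iif": 2, "oif": {1, 3}},
    ("S1", "G"): {"iif": 2, "oif": {3}},
}

# One incoming label per active source on link 2 (label values are made up).
in_label = {"S1": 12, "S2": 22}

# Per-source label table: incoming label -> outgoing interfaces.
label_table = {in_label[s]: oifs_for(mrt_r1, s, "G") for s in ("S1", "S2")}
print(label_table)

# A single label for the whole (*, G) entry could map to only one oif set,
# which would not match what S1's traffic needs.
print(mrt_r1[("*", "G")]["oif"] == oifs_for(mrt_r1, "S1", "G"))
```

With distinct incoming labels, the L2 table reproduces the L3 behaviour: S1's packets leave on interface 3 only, while S2's packets leave on interfaces 1 and 3.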
More details are given in section 6.

4. Building block for proposed MPLS multicast

4.1 Assumptions

Our proposal is based on the following basic assumptions:

   1. There is a label table associated with each interface of a multicast-capable LSR.

   2. On a multi-access link, multicast-capable LSRs must use disjoint label spaces for binding labels to FECs.

An exact mechanism to achieve (2) through extensions to LDP is deferred to a later draft. [FAR2] describes a solution for (2); however, it augments PIM-Hello messages to achieve disjoint multicast labels across PIM-capable LSRs on a multi-access link. [FAR2] proposes label allocation from the downstream node; however, such a partitioned label space can be used for upstream label allocation as well.

In the rest of the document, we use the term "Unused Label" or UL to denote a free multicast label, i.e. a label within the multicast range with no current binding.

We propose two types of label bindings: the first uses upstream allocation with an "implicit" distribution; the second uses downstream allocation based on explicit LDP-like control messages. For both approaches, a label binding is initiated when a FEC is detected in the multicast flows.

4.2 Upstream "implicit" label distribution

4.2.1 Label assignment

This proposal imposes an additional requirement:

   3. When a multicast-capable LSR receives a packet with a label that has no current binding on the incoming interface, L3 processing is invoked.

When a multicast-capable LSR detects a new multicast FEC, it invokes L3 routing to determine the outgoing interfaces. For each outgoing interface, it selects a UL and binds the UL to the corresponding multicast tree. It then forwards the packet downstream.
A downstream LSR receives the packet with the UL, invokes L3 routing (since the incoming label has no binding) to determine the outgoing interfaces, and again selects a UL for each of those interfaces. An entry is added to the label table consisting of the incoming interface/label and the list of outgoing interfaces/labels. Subsequent traffic on the corresponding multicast tree is label-switched at L2.

In Figure 3, consider a new multicast flow arriving on interface 1. The UL selected by the upstream LSR is A, and reception of the packet invokes L3 processing. As a result of L3 processing, interfaces 2, 3 and 4 are selected as the outgoing interfaces. ULs X, Y and Z are then picked for interfaces 2, 3 and 4 respectively, and a copy of the packet is forwarded on each of those interfaces with the corresponding label. An entry is added to the label table:

   < input = (1, A), output = {(2,X), (3,Y), (4,Z)} >

Subsequent packets that arrive at interface 1 with label A are switched at L2, without invoking L3 processing. Thus, only the first packet undergoes L3 processing.

                     L3 Processing
                      ^         |
                      |         v
                   +-------------+ (2)
                   |             |-------->  UL=X
       UL=A    (1) |             | (3)
    -------------->|   R (LSR)   |-------->  UL=Y
                   |             | (4)
                   |             |-------->  UL=Z
                   +-------------+

                      Figure 3

Note that this scheme works well for both point-to-point and multi-access interfaces. A partitioned label space between multicast and unicast traffic avoids a situation where a label l is allocated by a downstream LSRd for unicast traffic from LSRu1, and is then subsequently allocated by another LSRu2 for multicast traffic downstream.

        LSRu1                 LSRu2
          |                     |
          |  ^                  |  |
          |  | l                |  | l
          |  |                  |  v
    ------+---------------------+------
                     |
                     |
                   LSRd

                  Figure 4

A disjoint label space amongst multicast LSRs ensures that no two LSRs, e.g. LSRu1 and LSRu2, assign the same label on a common multi-access link.
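
The Figure 3 walk-through can be sketched as follows. This is a minimal Python illustration, not a definition of the mechanism: the class `ImplicitLsr`, its `receive` method, the per-interface free-label lists and the `route` callback standing in for the L3 lookup are all hypothetical names introduced here.

```python
class ImplicitLsr:
    """Toy LSR using upstream-allocated, implicitly distributed labels."""

    def __init__(self, name, free_labels, route):
        self.name = name
        # Free multicast labels (ULs) per outgoing interface.
        self.free = {oif: list(labels) for oif, labels in free_labels.items()}
        self.route = route            # hypothetical L3 lookup: flow -> list of oifs
        self.table = {}               # (iif, in_label) -> [(oif, out_label), ...]
        self.l3_lookups = 0           # counts packets that needed L3 processing

    def receive(self, iif, label, flow):
        key = (iif, label)
        if key not in self.table:
            # Unbound incoming label: fall back to L3 and pick a UL per branch.
            self.l3_lookups += 1
            self.table[key] = [
                (oif, self.free[oif].pop(0)) for oif in self.route(flow)
            ]
        # Bound label: the packet is switched at L2 using the stored entry.
        return self.table[key]


r = ImplicitLsr(
    "R",
    free_labels={2: ["X"], 3: ["Y"], 4: ["Z"]},
    route=lambda flow: [2, 3, 4],
)
first = r.receive(1, "A", ("S", "G"))    # first packet: L3 processing, binding created
second = r.receive(1, "A", ("S", "G"))   # later packets: label table hit only
print(first)          # [(2, 'X'), (3, 'Y'), (4, 'Z')]
print(r.l3_lookups)   # 1
```

No control message is exchanged in the sketch: the downstream neighbour would learn each binding in the same way, from the first packet that arrives carrying a label it has not yet bound.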
Moreover, since there can only be one forwarder on the link for a given (S, G), a per-source upstream label binding requires no further coordination among multicast LSRs on a common link.

4.2.2 Label withdrawal

Once a label has been assigned on an LSR's outgoing interface, there needs to be a mechanism to reclaim that label. To prevent traffic from being switched along the wrong LSP, it is sufficient that the following relation holds:

   (relation 1) if "l" is a UL on an outgoing interface of LSRu, then "l" must also be a UL on the corresponding incoming interface of any LSRd on the same link as LSRu.

Note that traffic is not forwarded incorrectly at L2 if l is a UL on LSRd's incoming interface but not a UL on LSRu's outgoing interface. In this case, any traffic that LSRu sends with label l invokes L3 processing at LSRd.

In our multicast solution for MPLS, we need to ensure that a label is reclaimed as a UL on the downstream LSR(s) first, and only then on the upstream LSR.

When the label withdrawal is triggered by a routing protocol control message, such as a PIM Prune, the L2 label can be reclaimed immediately without additional coordination, since the control message is sent from the downstream to the upstream node.

In the case where the label binding for a FEC is broken due to expiration of the activity timer at an LSR, an explicit control message needs to be sent to revoke the label binding. On a point-to-point link, we propose to send an LDP Label Release message from the downstream to the upstream LSR. Alternatively, the upstream LSR may send a Label Withdraw message to the downstream node, followed by a Label Release response.

In the case of a multi-access link, a similar functionality needs to be supported. However, LDP as currently defined operates over a point-to-point (TCP) reliable connection between adjacent LSRs.
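
The ordering constraint behind relation 1 can be checked with a small Python model of a single point-to-point link. The class `Link` and its methods are hypothetical names used only for this sketch; the model tracks which labels are unused on LSRu's outgoing side and on LSRd's incoming side, and reports whether relation 1 holds halfway through a reclamation.

```python
class Link:
    """One point-to-point link seen from LSRu (outgoing) and LSRd (incoming)."""

    def __init__(self, labels):
        self.unused_up = set(labels)     # ULs on LSRu's outgoing interface
        self.unused_down = set(labels)   # ULs on LSRd's incoming interface

    def relation_1_holds(self):
        # Every UL at LSRu must also be a UL at LSRd.
        return self.unused_up <= self.unused_down

    def bind(self, label):
        # Implicit binding: LSRu picks a UL, LSRd learns it from the first packet.
        self.unused_up.discard(label)
        self.unused_down.discard(label)

    def reclaim(self, label, downstream_first=True):
        # Returns whether relation 1 held at the intermediate step.
        if downstream_first:
            first, second = self.unused_down, self.unused_up
        else:
            first, second = self.unused_up, self.unused_down
        first.add(label)
        ok_midway = self.relation_1_holds()
        second.add(label)
        return ok_midway


link = Link({"l1", "l2"})
link.bind("l1")
print(link.reclaim("l1", downstream_first=True))    # True
link.bind("l1")
print(link.reclaim("l1", downstream_first=False))   # False
```

In the second case there is a window in which LSRu could reuse l1 for a new FEC while LSRd still switches it along the old LSP, which is the misdelivery that relation 1 is meant to rule out.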
An analogous mechanism for the multi-party interactions (e.g. Label Release/Withdraw) over a multi-access link is to be discussed in a subsequent draft.

4.3 Explicit label allocation

An alternative to the above mechanism is to use explicit control messages to bind a label to a FEC. On point-to-point links, we propose to use the Label Distribution Protocol [LDP] in downstream label distribution mode, along with new definitions for multicast FEC elements.

This approach is useful if requirement (3) above cannot be met by an LSR. In that case, the traffic for a new FEC is first forwarded on a default routed path (e.g. (VPI=0,VCI=32) for LDP over ATM VC).

   To members <-- LSRd1
                       \
                        LSRu --- <---- from Source
                       /
   To members <-- LSRd2

                  Figure 5

As shown in Figure 5, LSRu will initially receive packets (on a default, routed path) that belong to a FEC for which it has no label binding. Two options are then possible:

   -- LSRu detects a new multicast FEC and sends a Label Request message to all the next hops (for the MRT entry corresponding to the FEC). Each downstream LSR selects a free multicast label for its corresponding incoming port and eventually sends a Label Mapping message for the FEC to LSRu.

   -- No Label Request is sent by LSRu. Instead, the arrival of packets at LSRd1 and LSRd2 on the routed path triggers an unsolicited Label Mapping message to LSRu.

Besides traffic-driven multicast FEC detection, an LSR initiates a label binding procedure when the oif list of an MRT entry is modified, e.g. on arrival of PIM-DM Graft or PIM-SM Join(*, G) messages.

On point-to-point links, the above LDP procedures can be used without additional protocol support. Multicast FEC elements and the LDP initialization session multicast extension are defined in Appendices A and B.

As noted in the previous section, LDP messages are currently not defined for multi-party interactions.
In this document, we assume that such a mechanism exists for assigning and withdrawing multicast labels on a multi-access link, without specifying the exact mechanism. Such a multicast analogue for LDP, e.g. periodic link-local multicast of label bindings, will be described in a subsequent draft.

4.4 Comparison of the distribution procedures

For multicast traffic, upstream label allocation is simpler since there can only be one upstream node (per link), and therefore only one entity that binds the label. In downstream allocation schemes, there may be multiple receivers (on a multi-access link) and one of them needs to be chosen as the label allocator. Additionally, if the original allocator of a label (on a multi-access link) leaves the multicast tree, either the label binding for the tree needs to be changed and/or another LSR needs to be elected as the label allocator.

For traffic-driven approaches, upstream allocation is preferable since it allows the label binding (and consequently L2 switching) to happen earlier than with downstream allocation.

In general, the advantage of implicit coordination is that only the first packet carrying a UL requires L3 processing. In contrast, an explicit control message to propagate labels incurs a delay between the arrival of a traffic stream and label binding. During this interval, each incoming packet is processed at L3 and requires an L3 copy-and-forward operation for each outgoing branch of the multicast tree.

In the next sections, we describe in more detail our proposed solution to support PIM-DM and PIM-SM in MPLS. Although we focus on the upstream label distribution procedure, the solutions are equally applicable with downstream-on-demand LDP-based label distribution, assuming that the necessary multicast extensions will be defined for LDP at a later time.

5. Proposed Solution for PIM-DM in MPLS

5.1 Basic operations

In PIM-DM, there is a one-to-one mapping between a multicast routing entry and an LSP, so that the only FEC to be considered is the (S, G) FEC. We use the building block described in section 4 to propose a solution for PIM-DM as follows.

When a multicast packet with source S and destination G is received at an incoming interface, the UL associated with the packet triggers PIM-DM processing, e.g. the RPF check, followed by selection of the outgoing interfaces. An (S, G) routing table entry is installed. A UL is selected for each outgoing link, and the packet is forwarded onto the next hop using the selected labels. A corresponding LSP <(iif, label), set of (oif, label)> is created.

Following the first packet that is processed at L3 (which triggers the LSP setup), all other packets are forwarded at L2.

In all the solutions, all PIM-DM control messages, Prune and Graft, can be sent on a single-hop LSP between adjacent LSRs.

5.2 Label Binding triggered by PIM-Graft

Arrival of a Graft(S, G) message requires adding an outgoing branch to the existing LSP. For upstream implicit label allocation, this means selecting a UL on the link on which the Graft(S, G) was received.

5.3 Label Reclamation triggered by PIM-Prune

Subsequent to setting up the LSP, arrival of a PIM Prune message removes the corresponding outgoing branch of the LSP, i.e. the previously assigned label is now marked as a UL.

Suppose LSR1 is upstream of LSR2 and the label assigned for an (S, G) FEC on the LSR1--LSR2 link is L1.
Once Layer 3 processing at LSR2 sends the Prune to LSR1, LSR2 marks the incoming label L1 as a UL on the LSR1--LSR2 link (so that any subsequent assignment of L1 by LSR1 to a new FEC will trigger L3 processing at LSR2). LSR1 marks L1 as a UL on receiving the Prune, and modifies the LSP associated with the (S, G) entry. LSR1 is now free to assign L1 to a new FEC.

5.4 Label Reclamation triggered by PIM inactivity timer

In PIM-DM, the (S, G) forwarding state is associated with an inactivity timer ([PIM-DM]), which is used to remove inactive (S, G) entries, i.e. flows with no traffic for a specified amount of time T. In an L3 router, this is achieved by resetting the timer whenever a packet is forwarded using the (S, G) entry. When forwarding traffic in L2 mode, no traffic will be observed at L3; we therefore propose that the inactivity timer be reset based on forwarding activity on the LSP. If no activity is observed within T, both the LSP and the multicast routing entry should be removed.

To ensure that a label is reclaimed as a UL on the incoming interface of an LSRd before it is reclaimed on the outgoing interface of an LSRu on the same link, LSRu will send an LDP Label Withdraw message (see section 4.2.2).

5.5 Example

Let us come back to the example of section 2. A multicast group G has members in NET2 and NET4, and a source S1 in NET1 sends traffic to G.

      [NET1]        [NET2]
        |             |
        R1            R2
          \          /
         A \        / B
            \      /
              R3----[NET3]
             /    C
          D /
           /
          R4
          |
        [NET4]

R3 receives the first packet to G on interface A with an unused label l1. This unused label has been assigned by the upstream router R1. The packet has to be forwarded on the outgoing interfaces B, C and D.
R3 creates the following multicast routing table entry:

   (S1, G) iif={A} oif={B, C, D} prune={}

At the same time, R3 chooses three unused labels, one for each outgoing interface, and stores the following bindings:

   +-------------+-----+----------------+----------------------+
   | FEC Element | IIF | Incoming Label | OIF - Outgoing label |
   +-------------+-----+----------------+----------------------+
   |             |     |                | B   l2               |
   | (S1, G)     | A   | l1             | C   l3               |
   |             |     |                | D   l4               |
   +-------------+-----+----------------+----------------------+

   Figure 6: R3's DM bindings after first packet arrival

Subsequently, a PRUNE message is received on interface C, since NET3 has no member of G, and the multicast routing table entry is modified to:

   (S1, G) iif={A} oif={B, D} prune={C}

while the label binding is now:

   +-------------+-----+----------------+----------------------+
   | FEC Element | IIF | Incoming Label | OIF - Outgoing label |
   +-------------+-----+----------------+----------------------+
   |             |     |                | B   l2               |
   | (S1, G)     | A   | l1             |                      |
   |             |     |                | D   l4               |
   +-------------+-----+----------------+----------------------+

   Figure 7: R3's DM bindings after Prune arrival

The label l3 on interface C is again in the pool of unused labels.

6. Proposed solution for PIM-SM in MPLS

Unlike PIM-DM, an entry in the MRT already exists in a sparse-mode (SM) tree prior to the arrival of data packets. SM trees are either source-specific shortest-path trees (SPT) or shared trees (RPT). The MRT entries for an SM source tree are similar to those of a dense-mode tree: both are (S, G) entries. However, while DM entries are installed on arrival of the first packet, SM entries are established and refreshed via periodic PIM-Join messages towards the sender. For an SM shared tree, a single (*, G) entry in the MRT is used to forward traffic from multiple sources.

6.1 Source-specific/shortest-path tree

Since MRT entries for a source-specific tree are (S, G) entries, it is natural to do a one-to-one mapping of the L3 tree to an LSP.
[FAR1] suggests piggybacking the label on PIM-Join messages. This requires modifying L3 protocol messages.

The solution that we propose for label assignment/binding is the same as that for PIM-DM, i.e. labels are assigned to (S, G) routing entries in a data-driven fashion, using upstream implicit distribution. Expiration of the L3 forwarding state (e.g. non-arrival of Join(S, G) messages) leads either to removal of an outgoing branch from the (S, G) entry (and of the corresponding label of the LSP) or to the removal of both the MRT entry and the LSP (if it is the last branch to be deleted). Note that in this scheme, the downstream LSR marks an incoming label as a UL before the same label is marked as a UL on the outgoing interface of the upstream LSR. Thus, the label is correctly reclaimed (section 4.2.2).

Both solutions use one label for every branch; however, in our proposed solution, the PIM protocol messages are unchanged and no labels are assigned until the source becomes active.

6.2 Shared tree

The need for assigning source-specific labels on the intermediate nodes of a shared tree was described in section 3.2. Our proposed solution is similar to that for PIM-DM and SM source trees, as follows.

When the first multicast data packet from source S (via the core/RP) is received at an incoming interface of an LSR on the shared tree, the UL associated with the packet triggers L3 routing. If a matching MRT entry exists (either a (*, G) or an (S, G) entry), then a UL is selected for each outgoing interface of the matching entry, and the packet is forwarded onto the next hop(s) using the selected label(s). A corresponding LSP <(iif, label), set of (oif, label)> is created.

If the matching MRT entry was an (S, G) entry, then, as with a source-specific PIM-DM tree, there can be at most one LSP associated with the entry.
If the matching MRT entry was a (*, G) entry, then multiple LSPs may be associated with the entry, one LSP per active source. For each active source S, the association between the MRT entry and the LSP should be explicitly recorded at the LSR.

It is possible that, at a later time, the arrival of a PIM-Prune(S, G) message triggers creation of an (S, G) entry (e.g. when a downstream node of the shared tree starts to receive data from the source-specific tree for S); the oif set for this newly created (S, G) entry will equal that of the (*, G) entry minus the interface on which the Prune was received. This should trigger modification of the LSP, i.e. the label associated with the outgoing interface on which the Prune is received is now marked as a UL.

PIM-SM allows a sender to transmit packets either as encapsulated messages (PIM-Register) to the RP, or as native multicast (which typically happens when the RP joins the source-specific tree). In the former case, an end-to-end LSP cannot be created since the LSP between the source and the RP may have been set up using labels for an aggregate (unicast) route; additionally, the data packets need to be decapsulated at L3. In the latter case, i.e. when the RP receives native multicast packets, an end-to-end LSP can be created.

6.2.1 Label Reclamation

All LSPs associated with a (*, G) MRT entry are reclaimed when the L3 forwarding state times out, due to non-arrival of PIM-Join(*, G) messages from all downstream nodes.

Arrival of a Prune (S, G) message triggers label reclamation of an LSP associated with a (*, G) entry (which then becomes an (S, G) entry (section 6.2)), or of an LSP associated with an (S, G) entry, if such an LSP exists.

When there is (*, G) state at L3 and there are multiple active sources, an LSP per source is set up.
   However, when a source S goes inactive, there is no L3 mechanism
   that can act as a trigger to reclaim the LSP. LSPs set up with
   PIM-DM face a similar situation, but since PIM-DM maintains
   per-source timers at L3, LSP reclamation is triggered by the
   expiration of such timers. In a PIM-SM shared tree, there is no
   per-source timer maintained at L3 (as part of the protocol
   definition; specific implementations may use a per-source MFC
   entry). In order to reclaim labels, we propose that the many-to-one
   mapping between a MRT entry and multiple LSPs be associated with an
   activity timer per LSP, which is used in the same fashion as PIM-DM
   activity timers (see 5.4). Note however that this is not a change
   to the L3 protocol (PIM-SM), but an additional data structure
   maintained with the L3 to L2 mapping entries. As with PIM-DM, once
   the L2 LSP inactivity timer expires, the LSR must send an LDP Label
   Withdraw to each of the LSP's downstream nodes, as described in
   section 4.2.2.

6.2.2 Example

   Let us consider the case of section 3.2:

           R2 -------> Join(S1,G) -------> S1
             \       /                     |
              \  1  /                      |
               \   /                       |
               +----+  2                 +----+
         +---> | R1 |--------------------| RP |
   Prune(S1,G) +----+      ------>       +----+
              /           Join(*,G)            \
             / 3                                \
            /                                    \
          R3                                      S2

   Initially both R1 and R2 have joined the shared (*, G) tree, so
   that the LSP and MRT entries at R1 look like:

      MRT: (* , G)  iif={2}      oif={1, 3}

      LSP: (G, S1)  in={2,L12}   out={(1,L11);(3,L13)}
           (G, S2)  in={2,L22}   out={(1,L21);(3,L23)}

   Note that we have a per-source LSP for the group G, bound to the
   FECs (G, S1) and (G, S2) as defined in section 3.3. The incoming
   labels L12 and L22 are distinct.

   Now R2 joins the (S1, G) specific tree and we suppose R1 is not
   part of the (S1, G) tree. R2 eventually sends a Prune(S1, G)
   message to R1.
   The MRT entries for G become:

      MRT: (* , G)  iif={2}      oif={1, 3}
           (S1, G)  iif={2}      oif={3}

   Moreover the Prune(S1, G) message leads to the removal of one
   outgoing branch of the (G, S1) LSP:

      LSP: (G, S1)  in={2,L12}   out={(3,L13)}
           (G, S2)  in={2,L22}   out={(1,L21);(3,L23)}

   With this procedure, R2 still receives the traffic from S2 on an
   LSP following the L3 shared tree, while the traffic from S1 follows
   a shortest-path tree. R3 is not affected and keeps on receiving all
   the traffic to G on the (*, G) interface.

7. Proposed solution for DVMRP and MOSPF in MPLS

   DVMRP [DVMRP] is supported in the same fashion as PIM-DM: both are
   flood-and-prune techniques which create a (S, G) entry in the MRT
   on arrival of the first data packet. The difference between the two
   is mainly at L3, e.g. DVMRP uses RIP-specific information to
   disambiguate equal-cost paths, while PIM-DM uses explicit
   PIM-Assert messages. Our proposed solution for PIM-DM is equally
   applicable to setting up LSPs when the L3 protocol is DVMRP.

   MOSPF is not a flood-and-prune technique [MOSPF]. It uses
   link-state advertisements to flood group membership to all routers
   within an area. On arrival of the first data packet, a
   shortest-path (S, G) tree computation is triggered, and a (S, G)
   entry is installed in the MRT. Again, our proposed solution for
   PIM-DM in MPLS is equally applicable to setting up LSPs when the L3
   protocol is MOSPF.

8. Effects of L3 topology change on multicast LSP

8.1 Loops

   Multicast packet forwarding in a L3 router is preceded by a Reverse
   Path Forwarding (RPF) check, i.e. a packet is forwarded only if it
   arrives on the "right" interface, as specified in a matching
   routing entry ((S, G) or (*, G)). Thus, L3 routing for multicast
   packets never creates routing loops. In our solution, the L3 entry
   is mapped to a L2 forwarding path, and so the LSP is also
   loop-free.
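
   The RPF check described above can be sketched in a few lines. This
   is a minimal illustration, not a definitive implementation: the
   dictionary layout and the function names are hypothetical, and the
   sample table reuses the R1 entries of the example in section 6.2.2
   after the Prune(S1, G) has been processed.

```python
# Hypothetical MRT at R1 after the Prune(S1, G) of section 6.2.2.
MRT = {
    ("S1", "G"): {"iif": 2, "oifs": [3]},
    ("*", "G"): {"iif": 2, "oifs": [1, 3]},
}


def lookup(mrt, source, group):
    """Prefer the (S, G) entry and fall back to the (*, G) entry."""
    return mrt.get((source, group)) or mrt.get(("*", group))


def rpf_forward(mrt, source, group, arrival_iif):
    """Return the outgoing interfaces if the RPF check passes, else []."""
    entry = lookup(mrt, source, group)
    if entry is None or entry["iif"] != arrival_iif:
        return []              # wrong interface or no state: drop
    return list(entry["oifs"])
```

   Since an LSP mirrors its MRT entry, its single incoming
   (interface, label) pair sits on the same interface that this check
   accepts, which is why the L2 path inherits the loop-free property.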
8.2 Change of upstream router

   A change in unicast routing entries at L3 may lead to a change in
   the multicast routing tree at L3 as well. A given router R may thus
   be associated with a new upstream router Ru of the multicast tree,
   and/or a different set of downstream routers Rd.

   A change in a L3 MRT entry triggers a corresponding change in an
   existing LSP as follows. If the incoming interface of the L3 MRT
   entry changes, then the incoming label of an existing LSP for that
   entry is marked UL (and a new LSP will be set up mirroring the
   changed L3 MRT entry). If a downstream interface is deleted from
   the MRT entry, then the corresponding L2 label is marked UL. (That
   label will also be reclaimed by the downstream LSR as it notices
   that its upstream router/LSR has changed.)

9. Conclusions

   In this document, we first make the following observations for
   existing multicast routing protocols (PIM, DVMRP, MOSPF):

   1a. Dense-mode trees are created in a data-driven fashion; no L3
       messages are used to create the tree.

   1b. Dense-mode trees are created on a per-source basis, with no
       known mechanisms to aggregate different (S, G) trees.

   1c. Source-specific sparse-mode trees are set up via explicit L3
       control messages, but like dense-mode trees, multiple (S, G)
       trees cannot be aggregated.

   1d. Nodes of a shared sparse-mode tree may forward traffic
       selectively based on the traffic source.

   From these observations, it appears that:

   2a. The (S, G) structure of DM and source-specific SM trees at L3
       favours a per-source label assignment.

   2b. Sparse-mode trees should also be mapped to a per-source LSP to
       avoid L3 routing at intermediate nodes of the shared tree.

   This led us to suggest a per-source LSP setup that is applicable to
   all three trees. No changes are needed to any L3 routing protocol.

   Further, at the level of individual nodes, we observe that:

   3a. Data-driven creation of a MRT entry at DM tree nodes can be
       coupled with label assignment, thus avoiding L3 processing
       beyond the first packet.

   3b. PIM-Prune messages can be exploited to trigger immediate
       reclamation of labels on the upstream and downstream nodes of
       the pruned branch (DM or SM).

   3c. Nodes on a shared SM tree need to perform data-driven
       per-source label assignment since the sources are not known
       a priori (see 1d and 2b).

   As a result, we presented a basic building block, using the dual
   notions of "unused labels" and "implicit binding", to achieve a
   data-driven, per-source LSP that binds labels to FECs at the
   earliest possible time, i.e. the first packet.

10. Security Considerations

   Security considerations are not addressed in this document.

11. Acknowledgments

   Ajay Bakre, Kojiro Watanabe and D. Raychaudhuri at Princeton;
   Sibylle Schaller, Jurgen Roethig and Heiner Stuettgen at
   Heidelberg.

12. References

   [OOMS1]   D. Ooms, W. Livens, B. Sales, M. Ramahlo, "Framework for
             IP Multicast in MPLS", draft-ooms-mpls-multicast-00.txt,
             August 1998.

   [OOMS2]   D. Ooms, W. Livens, B. Sales, "MPLS for PIM-SM",
             draft-ooms-mpls-pimsm-00.txt, November 1998.

   [PIM-SM]  D. Estrin, D. Farinacci, A. Helmy, D. Thaler, S. Deering,
             M. Handley, V. Jacobson, C. Liu, P. Sharma, L. Wei,
             "Protocol Independent Multicast (PIM), Sparse Mode
             Protocol: Specification", RFC 2362, June 1998.

   [PIM-DM]  S. Deering, D. Estrin, D. Farinacci, V. Jacobson,
             A. Helmy, D. Meyer, L. Wei, "Protocol Independent
             Multicast Version 2 Dense Mode Specification",
             draft-ietf-pim-v2-dm-01.txt.

   [DVMRP]   T. Pusateri, "Distance Vector Multicast Routing
             Protocol", draft-ietf-idmr-dvmrp-v3-07.

   [MOSPF]   J. Moy, "Multicast Extensions to OSPF",
             draft-ietf-mospf-mospf-01.txt.
   [IPSO1]   A. Acharya, R. Dighe, F. Ansari, "IPSOFACTO: IP Switching
             Over Fast ATM Cell Transport",
             draft-acharya-ipsw-fast-cell-00.txt.

   [IPSO2]   A. Acharya, R. Dighe, F. Ansari, "IP Switching Over Fast
             ATM Cell Transport (IPSOFACTO) : Switching Multicast
             Flows", Globecom 97.

   [LDP]     L. Andersson, P. Doolan, N. Feldman, A. Fredette,
             B. Thomas, "LDP Specification",
             draft-ietf-mpls-ldp-03.txt, January 1999.

   [ARCH]    E. Rosen, A. Viswanathan, R. Callon, "Multiprotocol Label
             Switching Architecture", draft-ietf-mpls-arch-03.txt,
             February 1999.

   [FAR1]    D. Farinacci, Y. Rekhter, "Multicast Label Binding and
             Distribution using PIM",
             draft-farinacci-multicast-tagsw-01.txt, November 1998.

   [FAR2]    D. Farinacci, "Partitioning Label Space among Multicast
             Routers on a Common Subnet",
             draft-farinacci-multicast-tag-part-01.txt, November 1998.

13. Authors' Addresses

   Arup Acharya
   C&C Research Labs, NEC USA
   4 Independence Way, Princeton, NJ, USA
   Phone : 1 609 951 2992
   Fax   : 1 609 951 2499
   E-mail: arup@ccrl.nj.nec.com

   Frederic Griffoul
   C&C Research Labs, NEC Europe Ltd.
   Adenauerplatz 6
   D-69115 Heidelberg, Germany
   Phone : 49 6221 905 1120
   Fax   : 49 6221 905 1155
   E-mail: griffoul@ccrle.nec.de

   Furquan Ansari
   C&C Research Labs, NEC USA
   4 Independence Way, Princeton, NJ, USA
   Phone : 1 609 951 2965
   Fax   : 1 609 951 2499
   E-mail: furquan@ccrl.nj.nec.com

Appendix A: LDP Multicast FEC Definitions

   In order to use LDP for multicast traffic, three new FEC elements
   need to be defined:

   - the source-group (S, G) element, type 0x04
   - the group (*, G) element, type 0x05
   - the group-source (G, S) element, type 0x06

   The source-group element corresponds to a PIM-DM or PIM-SM
   source-specific multicast routing entry. The group element
   corresponds to a PIM-SM shared entry. The group-source FEC is
   required to support per-source PIM-SM LSPs, as described in
   sections 3.3 and 6.2.
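
   As an illustration only, the three element types could be packed
   as shown below, following the field layout given in the value
   encodings of this appendix (one-octet type, two-octet Address
   Family, one-octet length in bits, then the addresses). The
   function names are hypothetical, and the sketch assumes IPv4
   addresses, i.e. Address Family 1 and a length of 32 bits.

```python
import socket
import struct

SRC_GRP, GRP, GRP_SRC = 0x04, 0x05, 0x06   # element types listed above
AF_IPV4 = 1      # IPv4 in the Address Family Numbers list (assumed family)
ADDR_BITS = 32   # length in bits of an IPv4 address


def encode_source_group(source, group, elem_type=SRC_GRP):
    """Type, Address Family, S/G Len, Source Address, Group Address."""
    header = struct.pack("!BHB", elem_type, AF_IPV4, ADDR_BITS)
    return header + socket.inet_aton(source) + socket.inet_aton(group)


def encode_group(group):
    """Type, Address Family, Grp Len, Group Address."""
    header = struct.pack("!BHB", GRP, AF_IPV4, ADDR_BITS)
    return header + socket.inet_aton(group)


def encode_group_source(source, group):
    """Same layout as the source-group element, with type 0x06."""
    return encode_source_group(source, group, elem_type=GRP_SRC)
```

   For an IPv4 (S, G) pair the source-group and group-source elements
   are each 12 octets long, and the group element is 8 octets long.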
   Note that the (G, S) FEC definition impacts the processing of the
   LDP messages. For instance, when searching for the Next Hop of a
   (G, S) FEC, the lookup must be performed only on the (*, G)
   entries.

   The group FEC could be used in Label Withdraw/Release messages to
   break label bindings related to a (*, G) routing entry that has
   been removed.

   Source-group element value encoding:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  SrcGrp (4)   |        Address Family         |    S/G Len    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Source Address                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Group Address                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Address Family:
      Two-octet quantity containing a value from ADDRESS FAMILY
      NUMBERS in RFC1700 that encodes the address family of both the
      source and the group address.

   S/G Len:
      One-octet unsigned integer containing the length in bits of the
      source address that follows. The group address length in bits is
      also S/G Len, so that the length of the FEC element after the
      S/G Len field is 2 * S/G Len bits.

   Source Address:
      An address encoded according to the Address Family field.

   Group Address:
      An address encoded according to the Address Family field.

   Group element value encoding:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Grp (5)    |        Address Family         |    Grp Len    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Group Address                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Address Family:
      Two-octet quantity containing a value from ADDRESS FAMILY
      NUMBERS in RFC1700 that encodes the address family of the group
      address.
   Grp Len:
      One-octet unsigned integer containing the length in bits of the
      group address that follows.

   Group Address:
      An address encoded according to the Address Family field.

   Group-source element value encoding:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  GrpSrc (6)   |        Address Family         |    G/S Len    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Source Address                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Group Address                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Address Family:
      Two-octet quantity containing a value from ADDRESS FAMILY
      NUMBERS in RFC1700 that encodes the address family of both the
      source and the group address.

   G/S Len:
      One-octet unsigned integer containing the length in bits of the
      source address that follows. The group address length in bits is
      also G/S Len, so that the length of the FEC element after the
      G/S Len field is 2 * G/S Len bits.

   Source Address:
      An address encoded according to the Address Family field.

   Group Address:
      An address encoded according to the Address Family field.

Appendix B: LDP Initialization Session Multicast Parameter

   During the LDP session establishment procedure, Label Switching
   Routers have to advertise their multicast label binding support and
   the advertisement discipline. We propose to add a Multicast Session
   Parameters TLV in the optional parameters list of the LDP
   Initialization message (see [LDP]). If the Multicast Session
   Parameters are not present in the Initialization message received
   from LSR1 by LSR2, LSR2 will consider LSR1 as non-multicast
   capable.
   The encoding of the Multicast Session Parameters experimental TLV
   is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |U|F| Mcast Sess Parms (0x3F01) |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | A |                         Reserved                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   A = Multicast Label Advertisement Discipline
      Indicates the type of Multicast Label advertisement.
      00 means upstream "implicit" distribution.
      01 means downstream-on-demand LDP-based distribution.

   If one LSR proposes upstream "implicit" and the other proposes
   downstream-on-demand, a default discipline must be imposed.
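
   The TLV above could be packed and interpreted along the following
   lines. This is a sketch under stated assumptions rather than a
   normative encoding: it assumes that A occupies the two high-order
   bits of the value word, that Length counts the four octets of the
   value field, and that the U and F bits default to zero. The
   function names are hypothetical, and since this document does not
   say which discipline is the default, the caller supplies it.

```python
import struct

MCAST_SESS_PARMS_TYPE = 0x3F01   # TLV type from the encoding above
UPSTREAM_IMPLICIT = 0b00         # A = 00
DOWNSTREAM_ON_DEMAND = 0b01      # A = 01


def encode_mcast_session_params(discipline, u_bit=0, f_bit=0):
    """Pack U, F, the 14-bit type, the length, then A plus reserved bits."""
    first16 = (u_bit << 15) | (f_bit << 14) | MCAST_SESS_PARMS_TYPE
    value = discipline << 30     # assumption: A is the top two bits
    return struct.pack("!HHI", first16, 4, value)


def decode_mcast_session_params(data):
    """Return (u_bit, f_bit, discipline); raise ValueError if malformed."""
    if len(data) != 8:
        raise ValueError("unexpected TLV size")
    first16, length, value = struct.unpack("!HHI", data)
    if first16 & 0x3FFF != MCAST_SESS_PARMS_TYPE or length != 4:
        raise ValueError("not a Multicast Session Parameters TLV")
    return first16 >> 15, (first16 >> 14) & 1, value >> 30


def negotiate(local, remote_tlv, default):
    """Pick the session discipline; None means the peer lacks multicast."""
    if remote_tlv is None:       # TLV absent: peer is non-multicast capable
        return None
    _, _, remote = decode_mcast_session_params(remote_tlv)
    return local if local == remote else default
```

   When both LSRs propose the same discipline it is used as is; only
   a mismatch falls back to the caller-supplied default.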