Network Working Group                                     Dino Farinacci
Internet Draft                                             Yakov Rekhter
Expiration Date: December 1999                             Eric C. Rosen
                                                     Cisco Systems, Inc.

                                                               June 1999

        Using PIM to Distribute MPLS Labels for Multicast Routes

                 draft-farinacci-mpls-multicast-00.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document specifies a method of distributing MPLS labels for
   multicast routes.  The labels are distributed in the same PIM
   messages that are used to create the corresponding routes.  The
   method is media-type independent, and therefore works for multi-
   access/multicast capable LANs, point-to-point links, and NBMA
   networks.

Farinacci, Rekhter & Rosen                                      [Page 1]

Internet Draft   draft-farinacci-mpls-multicast-00.txt         June 1999

Table of Contents

    1          Overview  ...........................................   2
    2          Proposal  ...........................................   3
    2.1        Piggybacking  .......................................   3
    2.2        Labels for LANs with Multiple Downstream Nodes  .....   5
    2.3        Labels for Point-to-Point Links  ....................   5
    2.4        Labels for NBMA Networks  ...........................   5
    2.5        Corner cases  .......................................   6
    2.6        When NOT to Send a Labelled Multicast Packet  .......   7
    2.7        No Conflict between Unicast and Multicast Labels  ...   7
    3          Modifications to PIMv2  .............................   7
    4          Label Distribution for dense-mode groups  ...........   8
    5          Security Considerations  ............................   9
    6          Acknowledgments  ....................................   9
    7          References  .........................................   9

1. Overview

   PIM [2] is used to combine MPLS label distribution with the
   distribution of (*,G) join state, (S,G) join state, or (S,G)RPT-bit
   prune state. Labels and multicast routes are sent together in one
   message.

   The design of this method has been motivated by the following goals:

     o If an interface attaches to a network with data-link broadcast
       capability, an LSR should never have to send more than one copy
       of a given multicast data packet out that interface.  However, it
       is NOT a goal for that LSR to be able to send the same packet,
       with the same label, out multiple interfaces.

     o When an interface supports data link multicasting, it must be
       possible to have a single Label Information Base (LIB) for that
       interface.  That is, the receiver of a labeled packet should be
       able to interpret the label without knowing who the transmitter
       is.

     o When a LAN contains multiple label distribution peers, it should
       be possible to use data link multicast to distribute the label
       distribution control packets themselves.  Other aspects of label
       distribution methodology should remain as consistent with unicast
       label distribution as possible.  Multicast label distribution

Farinacci, Rekhter & Rosen                                      [Page 2]

Internet Draft   draft-farinacci-mpls-multicast-00.txt         June 1999

       procedures should not depend on the media type.

     o Once the label for a particular multicast tree on a given LAN has
       been assigned, unicast routing changes should not cause
       redistribution or reassignment of the label for that group on
       that LAN.

     o When a multicast routing table change requires a label
       distribution change, the latency between the two should be
       minimized, both to improve performance and to minimize the
       possibility of race conditions.

     o The procedures should work with either dense-mode or sparse-mode
       operation.

2. Proposal

2.1. Piggybacking

   An LSR that supports multicast sends PIM Join/Prune messages on
   behalf of hosts that join groups.  It sends Join/Prune messages to
   upstream neighboring LSRs toward the RP for the shared tree (*,G) or
   toward a source for a source tree (S,G).  Labels are distributed by
   being associated with addresses in the join list or the prune list.
   In particular:

      1. If an LSR, Rd, joins the shared tree for a group, the
         Join/Prune message it sends upstream will contain the group
         address followed by a join-list.  The join-list will contain an
         element which contains the address of the RP.  This element
         will also contain a label, and this label can be used by the
         upstream LSR, Ru, when it sends multicast data down the shared
         tree.  Intuitively, this label represents the route downstream
         from the current node along the shared tree.

      2. If an LSR, Rd, joins a source tree for a group, the Join/Prune
         message it sends upstream will contain the group address
         followed by a join-list.  The join-list will contain an element
         which contains the address of the source.  This element will
         also contain a label, and this label can be used by the
         upstream LSR, Ru, when it sends multicast data down the source
         tree.  Intuitively, this label represents the route downstream
         from the current node along the specified source tree.

      3. Suppose an LSR, Rd, has (S,G)RPT-bit state with a null output
         interface list.  This indicates that all of its downstream
         neighbors on the shared tree for G have pruned source S from
         the shared tree.  Rd sends a Join/Prune message upstream (on
         the shared tree), containing the group address followed by a
         prune-list.  The prune-list contains an element which contains
         the address of the source.  In this case, no label is included
         in the element.

      4. Suppose an LSR, Rd, as the result of receiving, from a
         downstream neighbor on the shared tree, a Join/Prune message
         such as described in 3, creates (S,G)RPT-bit state with a non-
         null output interface list.  In this case, it may send a
         Join/Prune message upstream on the shared tree, containing the
         group address followed by a prune-list.  An element of the
         prune list will contain the address S and a corresponding
         label.  However, a special bit (the "don't prune" bit) in the
         element will be set indicating to the upstream LSR that the
         source S is not really to be pruned from the shared tree.  The
         result is that the upstream LSR, Ru, will still send packets
         from S to G to Rd, and will label those packets as specified.
         When Rd receives such packets, it forwards them according to
         the output interface list of the (S,G)RPT-bit entry.
         Intuitively, this label represents a route along the shared
         tree, but only for packets from the specified source.

      5. An LSR which receives a Join/Prune message as described in 4
         may send a corresponding Join/Prune message (with the "don't
         prune" bit set) to its upstream LSR on the shared tree. Again,
         this label represents a route along the shared tree, but only
         for packets from the specified source.

   Rules 3-5 above ensure that if a source is pruned off the shared tree
   at some point, any packets from that source that are sent down the
   shared tree will have a label that implicitly identifies the source.
   Thus if those packets encounter a node with (S,G)RPT-bit state, they
   will be sent according to the output interface list of the (S,G)RPT-
   bit entry, NOT according to the output interface list of the (*,G)
   entry.
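
   The five rules above can be sketched in code.  The following Python
   fragment is a minimal illustration only and is not part of the
   specification: the Element class, its field names, and the helper
   functions are hypothetical, and a real implementation would carry
   these elements in the PIMv2 encoding of section 3.  Rules 4 and 5
   say an LSR "may" send the message; the sketch models the case where
   it chooses to do so.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Element:
    """One join-list or prune-list entry (hypothetical in-memory form)."""
    in_prune_list: bool        # True if carried in the prune-list
    address: str               # RP address for (*,G), else the source address
    label: Optional[int]       # None when no label accompanies the address
    dont_prune: bool = False   # the "don't prune" bit


def shared_tree_join(rp: str, label: int) -> Element:
    """Rule 1: join the shared tree; the element carries the RP address."""
    return Element(False, rp, label)


def source_tree_join(source: str, label: int) -> Element:
    """Rule 2: join a source tree; the element carries the source address."""
    return Element(False, source, label)


def rpt_prune(source: str, oif_list: list,
              label: Optional[int] = None) -> Element:
    """Rules 3-5: (S,G)RPT-bit state sent upstream on the shared tree."""
    if not oif_list:
        # Rule 3: null output interface list, so S is really pruned
        # and no label is included.
        return Element(True, source, None)
    # Rules 4-5: some downstream neighbor still wants S via the shared tree.
    if label is None:
        raise ValueError("a label is required when the oif list is non-null")
    return Element(True, source, label, dont_prune=True)


def upstream_keeps_forwarding(element: Element) -> bool:
    """Ru's view: does it still send this traffic to Rd?"""
    return (not element.in_prune_list) or element.dont_prune
```

   With these helpers, rpt_prune("198.51.100.7", []) yields an element
   with no label that Ru treats as a real prune, while
   rpt_prune("198.51.100.7", ["eth1"], 18) yields a labelled element
   with the "don't prune" bit set, so Ru keeps sending packets from
   that source with label 18.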

2.2. Labels for LANs with Multiple Downstream Nodes

   Since PIM Join/Prune messages are multicast on a LAN, other
   downstream LSRs that are interested in the group will hear the
   message.  They must cache the binding of multicast routing table
   state and label state together. Since the upstream LSR is going to
   forward data packets using the advertised label, they must be ready
   to accept the data packet with that advertised label.

   The first downstream LSR that joins a group is the label assigner on
   that LAN for that multicast route. All other downstream LSRs that
   send PIM Join/Prune messages will use the same label that the
   assigner selected.  An LSR that sends a PIM Join/Prune message with a
   label of 0 indicates that it does not know the label for the
   associated multicast routing table entry.  When this occurs, the
   assigner can trigger a PIM Join/Prune message to make the label known.
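
   As a rough sketch of the state a downstream LSR might keep for one
   LAN, consider the Python fragment below.  The class and method names
   are hypothetical, and the sketch assumes that the caller already
   knows whether it is the first LSR to join (how that is determined is
   outside the sketch).  Conflicting labels are not handled here; see
   section 2.5.

```python
UNKNOWN_LABEL = 0  # a label of 0 means "I do not know the label"


class LanLabelCache:
    """Per-LAN label bindings kept by one downstream LSR (illustrative)."""

    def __init__(self, allocate_label):
        self._allocate = allocate_label   # callable returning a fresh label
        self._bindings = {}               # route -> (label, assigned_by_me)

    def label_for_join(self, route, first_joiner):
        """Pick the label to place in our own Join/Prune for this route."""
        known = self._bindings.get(route)
        if known:
            return known[0]
        if first_joiner:
            # The first downstream LSR to join becomes the label assigner.
            label = self._allocate()
            self._bindings[route] = (label, True)
            return label
        return UNKNOWN_LABEL

    def on_join_heard(self, route, label):
        """Process a Join/Prune multicast on the LAN by another LSR.

        Returns the label to advertise in a triggered Join/Prune,
        or None when no triggered message is needed.
        """
        known = self._bindings.get(route)
        if label == UNKNOWN_LABEL:
            # The sender does not know the label; only the assigner answers.
            return known[0] if known and known[1] else None
        if known is None:
            # Cache the binding: data will arrive carrying this label.
            self._bindings[route] = (label, False)
        return None
```

   In this sketch an LSR that joins late advertises label 0, the
   assigner answers with a triggered message carrying the real label,
   and the late joiner then caches and reuses that label.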

2.3. Labels for Point-to-Point Links

   The procedure of section 2.2 works on point-to-point links because
   there is only one downstream LSR on the link, and it always becomes
   the label assigner.

2.4. Labels for NBMA Networks

   On NBMA networks, all PIM routers are known to each other through
   pseudo-broadcast mechanisms provided by the data-link layer. However,
   PIM Join messages are unicast to the upstream LSR, so other
   downstream LSRs will not hear the label assigner's advertisement.
   We therefore treat an NBMA network with one upstream and n downstream
   LSRs as n point-to-point links, from the upstream LSR to each of the
   downstream LSRs.  Each downstream LSR then assigns its own label, and
   the upstream LSR must replicate the multicast data packets.  The
   procedure of section 2.2 thus applies on each of these links.

   Note that this is not incompatible with the use of native point-to-
   multipoint capabilities at the data link layer.
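
   The replication performed by the upstream LSR on an NBMA network can
   be pictured as follows.  This is only a sketch: the function name
   and the representation of a packet as a (next hop, label, payload)
   tuple are assumptions made for illustration.

```python
def replicate_for_nbma(payload, label_by_downstream):
    """Return one (downstream LSR, label, payload) copy per downstream LSR.

    label_by_downstream maps each downstream LSR's address to the label
    that LSR assigned in its own unicast Join message.
    """
    return [(lsr, label, payload)
            for lsr, label in sorted(label_by_downstream.items())]
```

   Each copy carries the label chosen by the downstream LSR it is sent
   to, so the labels of different copies need not match.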

2.5. Corner cases

   Multiple downstream LSRs cannot assign the same label value for any
   multicast route because they partition the label space into
   non-overlapping ranges according to [4].  When an LSR is enabled on
   an interface, it obtains a unique label range for the LAN.

   When the label assigner leaves the group, the label that it assigned
   remains active.  The downstream LSR with the next-highest IP address
   becomes the owner of that label and may change it if it sees fit,
   but it is not required to do so.  All downstream LSRs can continue
   to use the assignment in their Join messages.

   If two systems join for the first time (neither has state) at the
   same time and each chooses a different label value, the upstream LSR
   uses the label of the downstream LSR with the highest IP address.
   The lower-addressed LSR will also hear the higher-addressed LSR's
   Join and will use its label.

   If the label assigner crashes, the downstream LSR with the highest
   IP address assigns new labels to the multicast routes whose labels
   were assigned by the crashed LSR, and triggers a Join message so
   that all other LSRs on the LAN use the new labels.
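
   The two address-based rules above (simultaneous first joins, and
   takeover after a crash) might be sketched as follows.  The function
   names are hypothetical.  Addresses are compared numerically through
   Python's standard ipaddress module, since comparing dotted-decimal
   strings would order "10.0.0.2" above "10.0.0.10".

```python
import ipaddress


def _addr(ip):
    """Parse an address string so that comparisons are numeric."""
    return ipaddress.ip_address(ip)


def winning_label(joins):
    """Simultaneous first joins: the highest-addressed LSR's label wins.

    joins is a non-empty iterable of (sender address, label) pairs.
    """
    _, label = max(joins, key=lambda j: _addr(j[0]))
    return label


def new_assigner(downstream_lsrs, crashed):
    """After the assigner crashes, the highest remaining address takes over."""
    remaining = [ip for ip in downstream_lsrs if ip != crashed]
    return max(remaining, key=_addr) if remaining else None
```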

   When a LAN partitions due to a layer-2 switch failure, the same
   logic applies as when an LSR stops joining a group.  When the
   partition heals, there may be an RPF neighbor change in one of the
   partitions.  When there is an RPF neighbor change and the downstream
   routers trigger joins to their new RPF neighbor with a different
   label assignment than the other partition is using, one of two
   resolutions occurs:

      1) The LSR which is the allocator in the partition of the new RPF
         neighbor will trigger a join if it has a higher IP address than
         the allocator in the other partition.  The downstream routers
         in the other partition use the new label assignment
         immediately.

      2) If the LSR which is the allocator in the partition of the new
         RPF neighbor has a lower IP address, all downstream routers and
         the new RPF neighbor will switch to the label assigned by the
         allocator in the other partition.

   If an RPF change occurs (the topology changed so the upstream LSR is
   different), the PIM protocol spec indicates that a PIM Join may be
   triggered to get on the new distribution tree as soon as possible. In
   this case, if the label assigner becomes the upstream LSR, then the
   new highest IP addressed downstream LSR may become the label
   assigner. It may change the label if it sees fit. Otherwise, the same
   label is used.

2.6. When NOT to Send a Labelled Multicast Packet

   PIM Hello messages, sent periodically by all PIM-capable routers,
   will indicate if the router is MPLS-capable.  An upstream router on a
   LAN will therefore know if all routers on the same LAN are LSRs or
   not.  If there are ANY MPLS-incapable routers which are interested in
   a particular group, the upstream router will transmit to the LAN only
   unlabelled multicast data packets for that group.

   If there are any group members on a LAN, only unlabelled multicast
   data for that group will be transmitted onto that LAN.

   Routers that support non-PIM multicast are assumed, for the purposes
   of this procedure, to be MPLS-incapable.
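
   The decision described in this section can be condensed into a short
   predicate.  The following Python sketch uses hypothetical names
   (Neighbor, send_labelled) and assumes that the upstream router
   already tracks, per LAN, which neighbors are interested in the group
   and whether any group members are present on the LAN.

```python
from dataclasses import dataclass


@dataclass
class Neighbor:
    """A router on the LAN as seen by the upstream router (illustrative)."""
    address: str
    mpls_capable: bool   # learned from PIM Hello; non-PIM routers are False
    interested: bool     # interested in the group in question


def send_labelled(neighbors, local_group_members):
    """Return True if data for the group may be sent labelled on this LAN."""
    if local_group_members:
        # Group members on the LAN must receive unlabelled packets.
        return False
    # Any interested MPLS-incapable router forces unlabelled transmission.
    return all(n.mpls_capable for n in neighbors if n.interested)
```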

2.7. No Conflict between Unicast and Multicast Labels

   MPLS uses different data-link layer code-points [5] to distinguish
   multicast labeled packets from unicast labeled packets.  Therefore,
   the assignment of labels for unicast routes is completely independent
   from the assignment of labels for multicast routes.  For example, the
   same label value could be allocated for a unicast route and for a
   multicast route, without any possibility of ambiguity.

3. Modifications to PIMv2

   PIMv2 has a packet format for each address type it may support when
   encoding both multicast and unicast addresses. We will define a new
   address type called "Label Address" for unicast address encoding.
   The label will accompany the source address in the Encoded Source
   Address format as specified in [2].  The label value will be in a
   32-bit quantity following the source address. We also take one bit
   from the PIMv2 reserved field to be the "don't prune" bit (shown
   below as the "D" bit).  So, for example, an IPv4 Label Address format
   would look like:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Rsrvd |D|S|W|R|   Mask Len    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Source Address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Farinacci, Rekhter & Rosen                                      [Page 7]

Internet Draft   draft-farinacci-mpls-multicast-00.txt         June 1999

   |                            Label                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Current Multicast Route Timer                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Label
      If the high-order bit is clear, the low-order 20 bits are a label
      value (as described in [5]) assigned by the LSR sending the
      Join/Prune message.  All other bits should be set to 0 by the
      sender and should be ignored by the receiver.

      If the high-order bit is set, the low-order 28 bits are a label
      value in the VPI/VCI format (as described in [7]) assigned by the
      LSR sending the Join/Prune message.  All other bits should be set
      to 0 by the sender and should be ignored by the receiver.

   Current Multicast Route Timer
      The sender of a Join/Prune message inserts the time remaining
      before expiration of the multicast route table entry described by
      the Source Address (either the (S,G) or (*,G) entry).  This allows
      all routers on a common multi-access subnet to time out the entry
      at close to the same time, so that they do not recreate each
      other's state when the source goes inactive.

   Refer to [2] for other field descriptions not specified here.
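
   To make the bit-level rules for the Label field concrete, the Python
   sketch below encodes and decodes the 32-bit field and builds the
   octet that holds the D, S, W and R bits.  The function and constant
   names are hypothetical.  The sketch assumes network byte order and
   assumes that the four flag bits occupy the low-order bits of their
   octet, in the order drawn in the diagram above; S, W and R are read
   as the Sparse, WC and RPT bits of [2].

```python
import struct

VPI_VCI_FLAG = 0x80000000   # high-order bit of the 32-bit Label field
LABEL_MASK = 0x000FFFFF     # low-order 20 bits: label value as in [5]
VPI_VCI_MASK = 0x0FFFFFFF   # low-order 28 bits: VPI/VCI format as in [7]

# Flag positions within the octet drawn as "Rsrvd |D|S|W|R".
D_BIT, S_BIT, W_BIT, R_BIT = 0x08, 0x04, 0x02, 0x01


def encode_label(value, vpi_vci=False):
    """Return the 4-octet Label field; all unused bits are sent as 0."""
    limit = VPI_VCI_MASK if vpi_vci else LABEL_MASK
    if not 0 <= value <= limit:
        raise ValueError("label value out of range")
    word = (VPI_VCI_FLAG | value) if vpi_vci else value
    return struct.pack("!I", word)


def decode_label(field):
    """Return (value, is_vpi_vci); bits outside the value are ignored."""
    (word,) = struct.unpack("!I", field)
    if word & VPI_VCI_FLAG:
        return word & VPI_VCI_MASK, True
    return word & LABEL_MASK, False


def flags_octet(dont_prune=False, sparse=False, wildcard=False, rpt=False):
    """Build the octet holding the reserved bits and the D, S, W, R flags."""
    return ((D_BIT if dont_prune else 0) | (S_BIT if sparse else 0)
            | (W_BIT if wildcard else 0) | (R_BIT if rpt else 0))
```

   Note that decode_label masks off the bits that the receiver is told
   to ignore, so a field with stray bits set in the unused positions
   still decodes to the same label value.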

4. Label Distribution for dense-mode groups

   In dense-mode PIM, there is no downstream Join message traveling
   upstream to perform the binding of multicast routes with labels.
   However, since we don't want a separate algorithm for dense-mode
   groups, we extend this basic design for dense-mode PIM.

   When a downstream LSR creates (S,G) state from the receipt of 1)
   data, or 2) Join/Prune or Graft messages, it will start a periodic
   timer to send Join messages with label assignment information
   present. The messages look no different and are treated on receipt no
   differently than in the sparse-mode case.

   The periodic Join message will be multicast on the LAN with an
   upstream target address of 0.0.0.0. All multicast LSRs on the LAN
   must know the group operates in dense-mode. This is accomplished
   using standard PIM mechanisms.
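
   The only visible difference between the dense-mode and sparse-mode
   periodic Join is the upstream target address.  The sketch below
   shows this with a hypothetical dictionary representation of the
   message; the function name and the dictionary keys are assumptions
   made for illustration.

```python
DENSE_MODE_UPSTREAM_TARGET = "0.0.0.0"


def periodic_join(source, group, label, dense_mode, rpf_neighbor=None):
    """Describe the periodic Join an LSR multicasts on the LAN."""
    if dense_mode:
        # Dense mode: the upstream target address is 0.0.0.0.
        target = DENSE_MODE_UPSTREAM_TARGET
    else:
        # Sparse mode: the message names the upstream (RPF) neighbor.
        target = rpf_neighbor
    return {
        "upstream_target": target,
        "group": group,
        "join_list": [(source, label)],   # label piggybacked on the source
    }
```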

5. Security Considerations

   Security considerations are not discussed in this memo.

6. Acknowledgments

   The authors would like to thank Fred Baker for his comments.  We also
   thank the authors of [6] for their critique of an earlier version.

Authors' Addresses

      Dino Farinacci
      Cisco Systems, Inc.
      170 Tasman Drive
      San Jose, CA, 95134
      Email: dino@cisco.com

      Yakov Rekhter
      Cisco Systems, Inc.
      170 Tasman Drive
      San Jose, CA, 95134
      Email: yakov@cisco.com

      Eric C. Rosen
      Cisco Systems, Inc.
      250 Apollo Drive
      Chelmsford, MA, 01824
      Email: erosen@cisco.com

7. References

   [1] "Multiprotocol Label Switching Architecture", draft-ietf-mpls-
   arch-05.txt, Rosen, Viswanathan, Callon, April 1999.

   [2] "Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol
   Specification", RFC 2362, Estrin, Farinacci, Helmy, Thaler, Deering,
   Handley, Jacobson, Liu, Sharma, Wei, June 1998.

   [3] "LDP Specification", <draft-ietf-mpls-ldp-05.txt>, Andersson,
   Doolan, Feldman, Fredette, Thomas, June 1999.

   [4] "Partitioning Label Space among Multicast Routers on a Common
   Subnet", Farinacci, October 1998.

   [5] "MPLS Label Stack Encoding", Rosen, Rekhter, Farinacci, Tappan,
   Fedorkow, Li, Conta, April 1999.

   [6] "Framework for IP Multicast in MPLS", Ooms, Livens, Sales,
   Ramalho, Acharya, Griffoul, Ansari, May 1999.

   [7] "MPLS using LDP and ATM VC Switching", Davie, Lawrence,
   McCloghrie, Rekhter, Rosen, Swallow, Doolan, April 1999.
