Internet Draft Data Networking Group                     Pankaj K. Jha
Internet Draft                                   Cypress Semiconductor
draft-jha-optical-hdt-00.txt                            November, 2000
Expiration Date: April, 2001

          A Hybrid Data Transport Protocol for Optical Networks

1. Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026 [1].

Internet drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress".

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

2. Abstract

Next-generation optical network configurations are increasingly becoming a complex mix of SONET/SDH and DWDM-based direct data-over-fiber links. As SONET/SDH networks are primarily geared for TDM support, transporting IP/ATM packets along with PDH (Plesiochronous Digital Hierarchy) channels (such as T1/T3/E1/E3) results in inefficient utilization of provisioned bandwidth. In addition, hybrid networks of ring and mesh configurations for short and long haul LAN/MAN/WAN communications need a unified end-to-end protocol that can seamlessly transport packets (native or otherwise) over any mix of optical networks. This draft describes a Hybrid Data Transport (HDT) protocol that provides a unified multi-service optical transport across SONET/SDH and direct data-over-fiber networks.
In addition, it allows transmission of any combination of Ethernet, T1/T3, ATM, IP (or any other protocol), NxT1/T3, fractional T1 (in increments of DS0), lower-rate SONET/SDH, or any raw data stream over SONET/SDH networks.

3. Conventions used in this document

Jha                       Expires April, 2001                    [Page 1]

INTERNET-DRAFT           Hybrid Data Transport             November, 2000

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [2].

TABLE OF CONTENTS

1. Status of this Memo
2. Abstract
3. Conventions used in this document
4. Glossary
5. Assumptions & Nomenclature
6. Organization of this Draft
7. Objectives for Multiservice Transport over Optical Networks
8. Conventional Approaches for Multiservice Transport
   8.1 Bandwidth Allocation on SONET for Data Transport
      8.1.1 Virtual concatenation
      8.1.2 Inverse Multiplexing
      8.1.3 Disadvantages of VT-based Packet Transports
      8.1.4 Using entire SONET SPE for Packet Data
      8.1.5 Bandwidth Considerations for Data over Fiber Networks
   8.2 Traditional Data Transport Protocols
      8.2.1 Packet-Over-SONET (POS)
      8.2.2 ATM VP Multiplexing
      8.2.3 Ethernet-like Treatment of SONET
9. A Unified Data Link Layer for Optical Networks
   9.1 Data Link Header for Control Packets
   9.2 Data Link Layer for Data Packets
10. Hybrid Data Transport
11. Motivations for HDT
12. Advantages of HDT
13. HDT Frame Structure Overview
   13.1 Payload Header (PH) Structure
      13.1.1 Core Header (cHdr)
      13.1.2 Next Fragment Offset
      13.1.3 Header with MPLS labels
      13.1.4 Header with OAM bytes
      13.1.5 MPLS labels followed by OAM bytes
      13.1.6 Core Header HEC (cHEC)
   13.2 User Payload
      13.2.1 Payload
      13.2.2 Payload CRC (pCRC)
14. HDT Framing Examples
15. Transmission of HDT Frames
   15.1 SONET Networks
      15.1.1 Concatenated SONET
      15.1.2 Non-concatenated - Inverse Multiplexing
      15.1.3 Virtual Concatenation over SONET
   15.2 Data-over-Fiber and WDM Networks
16. Protocol Implementation Considerations
17. Preserving Payload CRC for Robust Transmission
   17.1 Payload Error Suppression with MPLS Shim Headers
   17.2 Fault Detection and Isolation with an External MPLS stack
18. Data Transport over Optical Networks using HDT
   18.1 Protocol-independent operation for MPLS
   18.2 Transport without MPLS
   18.3 Transport using MPLS
      18.3.1 MPLS as a Shim Header within Layer 2 Frames
      18.3.2 MPLS in Payload Header for True Multiservice Transport
      18.3.3 Interconnection of MPLS Networks over Public Networks
19. Single MPLS Control Plane for Multiservice Transport
20. Minimization of Number of LSPs
21. Multiservice Switching over WDM Links
22. Frame Delineation Methods
   22.1 HDLC
   22.2 SDL (Simple Data Link, rfc2823)
      22.2.1 Length (LHdr)
      22.2.2 Length HEC (LHEC)
      22.2.3 Payload CRC (pCRC)
23. DC-balancing
24. Scrambling
25. Special SDL Frames
   25.1 Null/Idle Frame
   25.2 Scrambler State
   25.3 A/B Messages
   25.4 Single ATM Cell Transport
26. HDT Payload Header (PH) Structure Details
   26.1 Core Header (cHdr)
      26.1.1 Payload Identifier (PI)
      26.1.2 Header Extension (HEX)
      26.1.3 Bandwidth Allocation (BA)
      26.1.4 Multi-Frame Rate Channel (MFRC)
      26.1.5 Fragment Indication (FI)
      26.1.6 Payload CRC
      26.1.7 Tail-end Padding (TEP)
      26.1.8 Reserved bits
      26.1.9 Time-to-Live (TTL)
      26.1.10 Header Length (HLEN)
27. Data Transport Operations
   27.1 Transmit Operation
   27.2 Receive Operation
28. End-to-end OAM Support
29. ATM cell transport
   29.1 Single Cell
   29.2 Multiple ATM Cells
   29.3 ATM over Frame Relay
30. Instant Bandwidth Allocation with Statistical Multiplexing
   30.1 Dynamic Bandwidth Region (DBR) Allocation at Nodes
   30.2 Example of a Multiservice Transport Network
   30.3 Data Transmission at a Sending Node
   30.4 Processing at an Intermediate Node
   30.5 Processing at a Destination Node
31. Segmented Dynamic Bandwidth Allocation
32. Fragmentation of Packets
33. Tail-end Padding
34. Multi-Frame Rate Channels
35. Bandwidth Reuse on SONET
36. Fault-resilient Packet Networks with Recovery and Restoration
37. Security Considerations
38. Acknowledgments
39. Intellectual Property Considerations
40. Author's Address
41. Full Copyright Statement
42. References

4. Glossary

DBR      Dynamic Bandwidth Region. A set of bytes allocated to a SONET node in which the node can send fixed-bandwidth data.
DoF      Direct Data-over-Fiber networks, e.g., over WDM or dark fiber. Data is sent directly over fiber with no SONET/SDH synchronous framing. Packets are delineated using standard frame delineation such as HDLC, SDL, or ATM HEC.
Frame    A frame in this draft refers to a user packet that has been given frame delineation, using either HDLC (0x7E) or SDL (length/CRC construct), before transmission on SONET or WDM.
HDT      Hybrid Data Transport protocol.
LSP      Label Switched Path.
LSR      Label Switch Router.
NBMA     Non-Broadcast Multiple Access network.
O-E-O    Optical-Electrical-Optical conversion at an intermediate node.
Packet   A packet in this document refers to a user payload, or a user payload encapsulated with an HDT header and/or MPLS labels.
POS      Packet-over-SONET.
PDH      Plesiochronous Digital Hierarchy (such as T1/T3/E1/E3).
SDL      Simple Data Link (RFC 2823). A frame delineation technique that combines the length of the frame and a CRC computed over the length value into a 32-bit construct, which precedes every packet.
SPE      Synchronous Payload Envelope (for SONET).
VT       Virtual Tributary.

5. Assumptions & Nomenclature

Throughout this document SONET is used to refer to SONET/SDH, with similar implications for SDH; protocol architecture and operation are the same for both SONET and SDH transport mechanisms. All references to T1/T3 have similar implications for E1/E3. Similarly, IP is used to refer to any generic variable-length packet-oriented data. Fixed-bandwidth channels are referred to as TDM channels, although not all fixed-bandwidth channels are necessarily used in TDM fashion. All discussions relating to PDH or T1/T3 apply to E1/E3 as well.

Since the packet structure of the Hybrid Data Transport protocol does not change across SONET/SDH and direct data-over-fiber networks, all packet descriptions for SONET also apply to data-over-fiber networks unless stated otherwise.

A detailed discussion of SDL (Simple Data Link) is not provided here; refer to [6] for details on this protocol.

HDT works in a unified manner over any NBMA network configuration, such as optical networks (both SONET and DoF) and DSL. Since the framing is consistent across these media, an HDT frame could travel from an LSR at the CPE over DSL lines to an access multiplexer and then across optical networks, providing end-to-end MPLS-based transport. This draft uses optical networks for all protocol discussions.

6. Organization of this Draft

This draft first presents a brief overview of current methods and issues in multiservice transport over SONET networks. This is followed by a detailed description of the protocol framing structure and operational examples for different networking configurations. Because multiprotocol transport is more complex in SONET networks, this draft initially uses SONET to illustrate protocol functionality. Protocol operation and packet formats, however, are the same over SONET and direct Data-over-Fiber (DoF) networks.
Hence all discussions apply equally to both types of networks.

7. Objectives for Multiservice Transport over Optical Networks

As more and more packet data is sent over SONET and direct data-over-fiber (DoF) optical networks, a need has emerged for a unified way of sending all types of traffic over any mix of short- and long-haul optical networks. Desired features of such a protocol include, but are not limited to:

o Unified transport over any optical network configuration mix for any type of data.

o Ability to transport multiple data types over a fiber (or a wavelength).

o Support for native packet transport over any mix of optical networks.

o Use of a single control plane for link setup and transport of different types of data.

o Sharing of a single MPLS LSP by multiple data types.

o Consistent protocol routing as a packet travels across any mix of SONET and non-SONET networks.

o Instant bandwidth reservation for any type of data on a packet-by-packet basis at any node, with fine granularity (preferably 64 kbps).

o Data-independent support for MPLS-based switching, so that intermediate nodes can switch and add/drop packets with only MPLS label-switching logic and minimal additional support.

o Ability to use MPLS (or native protocol) based protection switching and fault restoration for all types of data.

o Add/drop operation for any type of packet at any node in an optical network.

o Robust frame delineation over optical networks for any data type and packet size.

8. Conventional Approaches for Multiservice Transport

Constraints in optical networking have made the development of packet data transport protocols quite challenging. Until recently, SONET has been the primary form of optical networking, owing to the prevalence of the TDM-style transport required for PDH (Plesiochronous Digital Hierarchy) channels such as T1/T3.
As more packet (and ATM) data is sent over these networks, new architectures have had to be developed. One of the primary considerations when sending packets over SONET is continued support for TDM-style channels such as PDH (T1/T3). Despite the growing popularity of packet transport, TDM support must remain for the foreseeable future. This requirement has driven many techniques for preserving TDM-style partitioning of the SONET payload area.

Regardless of SONET line speed, each byte position in the SONET payload represents 64 kbps (DS0) of bandwidth: irrespective of the number of bytes in the payload, each byte position repeats every 125 microseconds, and 8 bits every 125 microseconds is 64 kbps. To provide TDM channels, a SONET payload is divided into many fixed-size slots called virtual tributaries (VTs). The bandwidth of a VT is NxDS0, where N is the number of bytes the VT occupies in each frame. The location of each VT inside the payload is fixed, with each VT repeating every 125 microseconds.

The size of an individual VT (and consequently its bandwidth) is significantly smaller than what typical packet transport standards require (such as 10 Mbps LAN or higher). Special methods are needed to carry high-bandwidth LAN traffic over these low-bandwidth VTs.

Sending packet data over SONET involves two steps: allocating SONET bandwidth for high-bandwidth packet data, and choosing an efficient transport protocol.

8.1 Bandwidth Allocation on SONET for Data Transport

Methods of multiservice transport over SONET use techniques that fall into one of two broad categories:

o Use one or more VT (virtual tributary) channels of SONET for data traffic, while the remaining VTs continue to transport PDH (such as T1/T3) channels. Special methods are often used to combine the low-speed VTs into the higher bandwidth required for IP or ATM traffic.

o Abandon support for T1/T3 altogether and use the entire SONET payload for sending IP packets or ATM cells. Support for PDH is provided through emulation techniques over IP or ATM.

+-+-+-----+-----+---+-----+-----+ +-+-+-----+-----+---+-----+-----+
| | |     |     |   |     |     | | | |     |     |   |     |     |
|T|P|     |     |<--GbE-------->| |T|P|     |     |<--GbE-------->|
|O|O|     |     |   |     |     | |O|O|     |     |   |     |     |
|H|H| STS | STS |...| STS | STS | |H|H| STS | STS |...| STS | STS |
| | |  1  |  1  |   |  1  |  1  | | | |  1  |  1  |   |  1  |  1  |
| | |     |     |   |     |     | | | |     |     |   |     |     |
| | |<---ATM--->|   |     |     | | | |<---ATM--->|   |     |     |
+-+-+-----+-----+---+-----+-----+ +-+-+-----+-----+---+-----+-----+
|<----- 125uSec Duration ------>|

Figure 1: Combining lower-rate channels for high bandwidth transport

These fixed-bandwidth circuits are provisioned for the entire session. Because of the way they are provisioned, the scalability of SONET networks is extremely limited, and SONET bandwidth is used inefficiently. A session stays provisioned even when there is no data to send on the channel; the only way to reuse the channel is to re-provision it. For instance, if multiple STS-1s or STS-3s are combined (using inverse multiplexing or virtual concatenation) to carry gigabit Ethernet, those channels are dedicated to that purpose and no other data can be transmitted on them.

8.1.1 Virtual concatenation

In this method some VTs are used to transport data such as ATM and IP while others continue to carry regular PDH traffic. Since individual VTs have limited bandwidth (due to the small number of bytes in each VT), additional steps must be taken to send high-bandwidth packet data. Virtual concatenation [8] combines several VT channels into a larger virtual pipe to carry higher-bandwidth traffic. This technique allows the creation of multiple, independent pipes that can carry different types of traffic with different bandwidth requirements.
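
As a rough illustration of the group sizes involved, the sketch below computes how many members a virtually concatenated group needs for a given client rate. The per-member payload capacities (VT1.5 about 1.600 Mbps, STS-1 about 48.384 Mbps, STS-3c about 149.76 Mbps) are assumptions taken from commonly quoted SONET/SDH virtual-concatenation figures, not values specified in this draft; the function names are hypothetical, and the standards' limits on maximum group size are not modeled.

```python
# Hypothetical helper for sizing a virtually concatenated group.
# Per-member payload capacities in kbit/s are ASSUMED values, not taken
# from this draft; integers avoid floating-point rounding surprises.
MEMBER_PAYLOAD_KBPS = {
    "VT1.5": 1_600,
    "STS-1": 48_384,
    "STS-3c": 149_760,
}


def vcat_group_size(client_kbps: int, member: str) -> int:
    """Smallest member count whose combined payload carries client_kbps."""
    capacity = MEMBER_PAYLOAD_KBPS[member]
    return -(-client_kbps // capacity)  # integer ceiling division


def vcat_label(client_kbps: int, member: str) -> str:
    """Name the group in the usual <member>-<X>v notation."""
    return f"{member}-{vcat_group_size(client_kbps, member)}v"


if __name__ == "__main__":
    print(vcat_label(10_000, "VT1.5"))      # VT1.5-7v   (10 Mbps Ethernet)
    print(vcat_label(1_000_000, "STS-1"))   # STS-1-21v  (gigabit Ethernet)
    print(vcat_label(1_000_000, "STS-3c"))  # STS-3c-7v  (gigabit Ethernet)
```

Under these assumptions, 10 Mbps Ethernet needs seven VT1.5s (the same count used in 8.1.3), and gigabit Ethernet needs 21 STS-1s or 7 STS-3cs; every member stays dedicated to that client whether or not it is sending.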

Virtual concatenation, however, allocates a fixed bandwidth for packet traffic and cannot adjust bandwidth usage on a packet-by-packet basis. While the concatenated bandwidth can be changed through software in a reasonably short time, bandwidth utilization remains poor because the network's bandwidth requirement changes from packet to packet. Another problem is that one virtual pipe may be overloaded with traffic while others are underutilized, and virtual concatenation cannot dynamically balance load across the different channels.

8.1.2 Inverse Multiplexing

In this method, data from a high-bandwidth source (such as a 10 Mbps Ethernet LAN) is sent over multiple low-bandwidth tributaries (such as VT1.5s) using inverse multiplexing protocols. At the receiving end, the high-bandwidth data is recovered by recombining packets from the low-bandwidth streams.

8.1.3 Disadvantages of VT-based Packet Transports

                              ATM
                               ^
                               |
                             +-+-+
                     +------>| S |-------+
                     | +-----| 6 |<----+ |
                     | |     +---+     | |
                     | v               | v
                    +---+             +---+
Gigabit Ethernet ---+ S |             | S +--> ATM
OC-12 --------------+ 1 |             | 5 +--> T1
                    +---+             +---+
                     ^ |     SONET     ^ |
                     | v               | v
                    +---+   OC-192    +---+
T1 -----------------+ S |             | S +--> OC-12
Frame Relay --------+ 2 |             | 4 +--> Gigabit Ethernet
                    +---+             +---+
                     ^ |     +---+     | |
                     | +---->| S |-----+ |
                     +-------| 3 |<------+
                             +-+-+
                               |
                               v
                          Frame Relay

Figure 2: Multiservice Transport Network

Consider the SONET network in Figure 2, which provides multiservice packet transport along with TDM channels. In addition to terminating OC-12 traffic, this OC-192 ring connects gigabit Ethernet, frame relay, T1, and ATM networks.

To achieve this type of multiservice transport, traditional approaches permanently (or semi-permanently, with bandwidth configurable under software control) allocate separate circuits for carrying OC-12, gigabit Ethernet, T1, and ATM traffic.
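
The inverse multiplexing of 8.1.2 can be sketched as a round-robin split with per-fragment sequence numbers. This is a minimal illustration under assumptions: the sequence tag, the round-robin policy, and the function names are hypothetical, and real inverse multiplexing protocols differ in how they tag fragments and compensate for differential delay between tributaries.

```python
import heapq
from typing import Iterable, List, Tuple

Fragment = Tuple[int, bytes]  # (sequence number, data); hypothetical tag format


def inverse_mux(packets: Iterable[bytes], n_links: int) -> List[List[Fragment]]:
    """Spread packets round-robin over n_links, tagging each with a sequence number."""
    links: List[List[Fragment]] = [[] for _ in range(n_links)]
    for seq, pkt in enumerate(packets):
        links[seq % n_links].append((seq, pkt))
    return links


def recombine(links: List[List[Fragment]]) -> List[bytes]:
    """Rebuild the original stream at the receiver.

    Each tributary delivers its own fragments in order, so a k-way merge on
    the sequence number restores the sender's ordering.
    """
    merged = heapq.merge(*links, key=lambda frag: frag[0])
    return [pkt for _, pkt in merged]


if __name__ == "__main__":
    stream = [b"p0", b"p1", b"p2", b"p3", b"p4"]
    tributaries = inverse_mux(stream, 3)  # e.g. three low-rate VTs
    print(tributaries[0])                 # [(0, b'p0'), (3, b'p3')]
    assert recombine(tributaries) == stream
```

The sequence numbers only restore order; every tributary in the group remains provisioned at its full rate even when the source is idle.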

Higher transport capacity is achieved by inverse multiplexing high-bandwidth data over many STS-1 channels, or by virtually concatenating several STS-1 channels [8]. Virtual tributaries have fixed bandwidth, and using them singly or as a group creates (higher) fixed-bandwidth channels.

Packet-based traffic such as LAN traffic is bursty in nature, and bandwidth usage changes drastically from packet to packet. Average bandwidth usage is typically quite low, resulting in significant waste when the traffic is carried over a fixed-bandwidth pipe. Whenever fixed-bandwidth channels are provisioned for transporting data packets, the fiber capacity is poorly utilized. Statistically, average use of a 10 Mbps Ethernet link, for example, has been observed to be only about 20% [9]. With inverse multiplexing or virtual concatenation, when 7 VT1.5 virtual tributaries (each about 1.5 Mbps) are combined into a high-bandwidth pipe to carry 10 Mbps traffic, only about 20% of this capacity is used on average, and the rest of the pipe is idle most of the time.

When virtual concatenation or inverse multiplexing is not used, an entire STS-1 (51.84 Mbps) channel must be allocated to transport 10 Mbps traffic. The average load of about 2 Mbps (20% of 10 Mbps) then occupies only about 4% of the STS-1, and the remaining 96% of the channel capacity is largely unusable for anything else.

This inefficiency was of little concern when only PDH was transported, since PDH channels were used for leased lines and telephony applications. Network capacity was increased as leased lines and telephony circuits increased, and because the number of these circuits rarely went down over time, the provisioned capacity remained fully utilized.
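
The arithmetic behind these figures can be checked with a few lines of code. The sketch follows the DS0 reasoning of Section 8 (one byte position per 125-microsecond frame is 64 kbps) and uses the 20% average load and nominal 1.5 Mbps VT1.5 rate quoted above; the 27-byte VT1.5 and 810-byte STS-1 frame sizes are standard SONET values added here as assumptions, and the names are hypothetical.

```python
DS0_BPS = 8 * 8000  # 8 bits per byte position x 8000 frames/s = 64,000 bit/s


def channel_rate_bps(bytes_per_frame: int) -> int:
    """Rate of a channel that owns bytes_per_frame byte positions per 125 us frame."""
    return bytes_per_frame * DS0_BPS


def utilization(avg_load_bps: float, capacity_bps: float) -> float:
    """Average fraction of a provisioned pipe that actually carries traffic."""
    return avg_load_bps / capacity_bps


STS1_BPS = channel_rate_bps(90 * 9)     # assumed 810 bytes/frame -> 51.84 Mbit/s
VT15_BPS = channel_rate_bps(3 * 9)      # assumed 27 bytes/frame -> 1.728 Mbit/s gross
AVG_LOAD_BPS = 10_000_000 * 20 // 100   # about 20% of a 10 Mbps Ethernet [9]

if __name__ == "__main__":
    print(f"STS-1 rate: {STS1_BPS / 1e6} Mbit/s")       # 51.84
    print(f"VT1.5 gross rate: {VT15_BPS / 1e6} Mbit/s")  # 1.728
    print(f"7 x VT1.5 at nominal 1.5 Mbit/s: {utilization(AVG_LOAD_BPS, 7 * 1_500_000):.1%}")  # 19.0%
    print(f"whole STS-1: {utilization(AVG_LOAD_BPS, STS1_BPS):.1%}")  # 3.9%
```

The computed 3.9% corresponds to the "about 4%" in the text, and the seven-VT1.5 case lands near 19%, consistent with the "about 20%" quoted.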

As more packet data is transported on these SONET links, more T1/T3 channels are being replaced. Because average packet data bandwidth usage is low, a large portion of fiber capacity is used inefficiently. It is therefore important to devise a protocol that takes advantage of the bursty nature of packet traffic and uses the remaining fiber bandwidth for other applications or for additional packet traffic, so that the fiber is fully utilized.

8.1.4 Using entire SONET SPE for Packet Data

Another way to use SONET for packet transport removes support for PDH (T1/T3) and uses the entire SONET payload for transporting data packets. No virtual tributaries (VTs) are configured, and the entire SPE (Synchronous Payload Envelope) is used for sending data such as IP and ATM. When ATM cells are sent, PDH (such as T1) support can be provided by the ATM Circuit Emulation Service (CES) protocol. Techniques to support PDH using IP are still evolving; these methods are tricky because IP packet size varies from packet to packet and consistent timing is hard to maintain.

The type of data sent inside the SONET SPE is identified in the following ways:

o The Path Signal Label (PSL), carried in the C2 byte of the Path Overhead (POH), holds a one-byte value that denotes the type of data being transported (such as POS or ATM).

o Because numerous new protocols for SONET are evolving, PSL values cannot be defined for all of them. Instead, an administrative configuration between nodes establishes the type of data sent in the path.

This method has the same bandwidth usage limitations as the VT approach to bandwidth allocation. Fixed bandwidth, whether provided through VTs or the entire SONET SPE, always results in very low average bandwidth utilization for bursty packet data.
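
The two identification methods above can be combined in a small lookup, sketched here under stated assumptions. The table lists only a few well-known C2 code points (0x13 for ATM, and 0x16 and 0xCF for PPP in HDLC-like framing with and without scrambling, as used by POS per RFC 2615); it is deliberately incomplete, and the administrative configuration table and function name are hypothetical.

```python
from typing import Mapping, Optional

# Partial, illustrative C2 (Path Signal Label) table; NOT exhaustive.
PSL_CODES = {
    0x00: "unequipped",
    0x01: "equipped, non-specific payload",
    0x13: "ATM cell mapping",
    0x16: "PPP/HDLC with x^43+1 scrambling (POS, RFC 2615)",
    0xCF: "PPP/HDLC without scrambling (RFC 1619 compatibility)",
}
NON_SPECIFIC = 0x01


def payload_type(c2: int, path_id: str, admin_config: Mapping[str, str]) -> Optional[str]:
    """Identify SPE contents from the C2 byte, else from administrative configuration."""
    if c2 in PSL_CODES and c2 != NON_SPECIFIC:
        return PSL_CODES[c2]
    # Non-specific or unlisted label: fall back to whatever the operator has
    # configured for this path (hypothetical configuration table).
    return admin_config.get(path_id)


if __name__ == "__main__":
    config = {"ring-east/sts-3": "HDT"}
    print(payload_type(0x13, "ring-east/sts-1", config))  # ATM cell mapping
    print(payload_type(0x01, "ring-east/sts-3", config))  # HDT
```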

8.1.5 Bandwidth Considerations for Data over Fiber Networks

In direct Data-over-Fiber (DoF) networks, packets are sent with individual frame delineation and without an encapsulating synchronous frame (such as that provided by SONET). With no timing or framing restrictions, each data packet (whether an IP packet, an ATM cell, or any other data type) is transmitted on its own with its own delineation. There are no bandwidth management requirements per se, so the fiber (or a wavelength) can be used up to its capacity, within the limits of the transmitting and receiving nodes.

8.2 Traditional Data Transport Protocols

Once bandwidth has been allocated in the SONET SPE, different types of data can be sent on the different bandwidth channels. When virtual tributaries are configured (with or without virtual concatenation), individual VT channels can be used for sending different types of data: while one or more VTs transmit PDH, others may carry ATM cells and the remainder IP traffic, and so on.

8.2.1 Packet-Over-SONET (POS)

Packet data such as IP packets (or other protocol types) is encapsulated using the Packet-over-SONET (POS) protocol (RFC 2615) [3] and delimited by HDLC framing. Many packets can be placed inside a single SONET SPE. Packet transport using POS supports only packet protocols such as IP and cannot carry T1/T3 or ATM cells.

Network connectivity and operation for POS is provided by PPP (Point-to-Point Protocol). On SONET networks, connections between end points are provisioned and administratively established as point-to-point interfaces: all traffic originating from station A on a provisioned link terminates at station B at the other end of the link. Inherent limitations of PPP make link and bandwidth management on SONET and DoF networks impossible, because PPP cannot adjust link usage based on bandwidth requirements and actual traffic patterns.
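
The HDLC delimiting that POS relies on can be illustrated with a minimal byte-stuffing sketch in the style of PPP in HDLC-like framing (RFC 1662), which RFC 2615 uses: frames are bounded by the 0x7E flag, and any 0x7E or 0x7D byte in the data is sent as 0x7D followed by the byte XORed with 0x20. The sketch omits the address, control, and FCS fields and any control-character escaping, and the function names are hypothetical.

```python
FLAG = 0x7E    # HDLC frame delimiter
ESCAPE = 0x7D  # control escape


def hdlc_stuff(payload: bytes) -> bytes:
    """Wrap payload in flags, escaping any flag or escape bytes it contains."""
    out = bytearray([FLAG])
    for b in payload:
        if b in (FLAG, ESCAPE):
            out += bytes([ESCAPE, b ^ 0x20])
        else:
            out.append(b)
    out.append(FLAG)
    return bytes(out)


def hdlc_unstuff(frame: bytes) -> bytes:
    """Recover the payload from one flag-delimited frame."""
    if len(frame) < 2 or frame[0] != FLAG or frame[-1] != FLAG:
        raise ValueError("frame must start and end with 0x7E")
    out = bytearray()
    escaped = False
    for b in frame[1:-1]:
        if escaped:
            out.append(b ^ 0x20)
            escaped = False
        elif b == ESCAPE:
            escaped = True
        else:
            out.append(b)
    if escaped:
        raise ValueError("frame ends in the middle of an escape sequence")
    return bytes(out)


if __name__ == "__main__":
    framed = hdlc_stuff(bytes([0x45, 0x7E, 0x7D]))
    print(framed.hex())  # 7e457d5e7d5d7e
    assert hdlc_unstuff(framed) == bytes([0x45, 0x7E, 0x7D])
```

In the worst case, a payload made entirely of 0x7E or 0x7D bytes, stuffing doubles the size of the data between the flags.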

Because POS links cannot share load, one VT may be overloaded with traffic while others are underutilized. With POS, a SONET network looks like a group of permanently configured point-to-point links, each with no relationship to the others for traffic sharing or bandwidth management.

8.2.2 ATM VP Multiplexing

Another way to fully utilize SONET bandwidth is to fill the payload area with ATM cells, using a technique known as ATM VP (Virtual Path) multiplexing [9]. All types of data are encoded in ATM cells, which are then placed inside the SONET SPE, and the protocol characteristics of ATM are used to provide different types of services. For example, T1 is provisioned using ATM Circuit Emulation Service (CES), IP and other protocols are transmitted using multiprotocol encapsulation over ATM [4], frame relay is transported using FR-ATM interworking protocols, and so on.

Small, fixed-size ATM cells also work efficiently with cross-connect and add/drop multiplexer devices that switch traffic to different destinations. Using ATM cells is an ideal way to route any type of service to any network: once protocols are converted to ATM, all cells have the same size and structure, and optical nodes can easily route them based on their VPI/VCI without looking at the payload.

However, there are many limitations in using ATM-based multiplexing for multiservice transport over optical networks. Some of these are as follows:

o Methods for OAM (Operations, Administration, and Maintenance) of ATM networks differ from those of SONET networks, making management of the two cumbersome.

o ATM network routing and switch-to-switch signaling paths differ from IP network routes, resulting in significant additional operational complexity.

o It has been found [1] that a good percentage of network traffic consists of small IP packets of around 40-44 bytes. With IP over ATM (RFC 2684, Multiprotocol Encapsulation over ATM AAL5), such a packet slightly exceeds what fits in the 48-byte payload of a single ATM cell; for example, a 40-byte packet plus an 8-byte LLC/SNAP header and the 8-byte AAL5 trailer totals 56 bytes. Two ATM cells must then be sent, the second carrying mostly padding and ATM overhead. This, together with the 5-byte header on every cell, means that extra SONET bandwidth must be allocated for data transport.

o Using ATM for different services requires implementing ATM interworking for all related protocols (such as IP-over-ATM, Frame Relay-ATM interworking, circuit emulation, etc.).

8.2.3 Ethernet-like Treatment of SONET

To overcome the limitations of POS (such as fixed-bandwidth provisioning and poor link management), many new protocols have been proposed that treat a SONET network as a large shared Ethernet-type network. Most of these protocols treat the nodes sharing the network as Ethernet end-points that identify each other by MAC address.

In traditional SONET networks, a frame originating from a node must travel all the way around the ring until the sending node removes it, wasting bandwidth. Ethernet-based approaches, such as the Spatial Reuse Protocol (SRP) [7], avoid this waste by letting the destination node strip the packet on a destination MAC address match, so that downstream nodes can reuse the bandwidth freed by the removed packet.

Such Ethernet-like transport, however, works only with packet-oriented protocols and leaves issues such as guaranteed bandwidth and QoS to network layer protocols such as IP. Also, as with POS, a ring carrying these protocols cannot support PDH, real-time ATM traffic, frame relay, PPP, etc. SONET rings usually span a large area, and it would be highly inefficient if they could connect only Ethernet networks at their end points.
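
The bandwidth argument for destination stripping can be made concrete with a toy model of a unidirectional ring in which link i carries traffic from node i to node i+1. This is a simplified, hypothetical sketch, not SRP itself: it ignores SRP's fairness algorithm, the counter-rotating ring, and ring wrapping.

```python
from typing import Iterable, List, Tuple


def ring_links_used(src: int, dst: int, n_nodes: int, destination_strip: bool) -> List[int]:
    """Links a frame occupies; link i runs from node i to node (i + 1) % n_nodes."""
    # With source stripping the frame circles the whole ring back to its sender.
    hops = (dst - src) % n_nodes if destination_strip else n_nodes
    return [(src + i) % n_nodes for i in range(hops)]


def flows_fit_without_sharing(flows: Iterable[Tuple[int, int]], n_nodes: int,
                              destination_strip: bool) -> bool:
    """True if no ring link is needed by more than one of the given flows."""
    used = [link
            for src, dst in flows
            for link in ring_links_used(src, dst, n_nodes, destination_strip)]
    return len(used) == len(set(used))


if __name__ == "__main__":
    flows = [(0, 2), (3, 5)]  # two transfers on a six-node ring
    print(ring_links_used(0, 2, 6, destination_strip=True))   # [0, 1]
    print(ring_links_used(0, 2, 6, destination_strip=False))  # [0, 1, 2, 3, 4, 5]
    print(flows_fit_without_sharing(flows, 6, True))   # True
    print(flows_fit_without_sharing(flows, 6, False))  # False
```

With destination stripping the two transfers use disjoint links and could each run at the full link rate; with source stripping every frame occupies the whole ring.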

These protocols do away with the TDM-style operation of SONET to support the requirements of newer packet-oriented data transport applications. Nodes are addressed using MAC addresses, with Ethernet-like framing. Packet transport achieves better bandwidth utilization because it benefits from statistical multiplexing of bursty packet data. In addition, destination nodes strip packets from the ring, allowing bandwidth reuse on downstream links.

In Ethernet-centric models, both control and data planes use Ethernet-like framing, and IP-based operations are commonly used for sending all types of data. In addition to forcing IP onto non-IP protocols, this model requires complex hardware and software for MAC address lookup (for both slow and high-speed data paths), IP routing protocols, and other IP-based operations that were developed for traditional Ethernet networks.

Numerous limitations make this approach a poor choice for an efficient, general-purpose data link layer for optical networks:

o Ethernet-like framing imposes a specific layer 2 technology on all types of applications, adding complexity for non-Ethernet traffic such as ATM, frame relay, PPP, and raw byte streams.

o An IP-centric model (common with Ethernet-style framing) does not blend well with transport of ATM, T1/T3, raw byte streams, lower-rate SONET, and other data streams. These technologies require additional transport mechanisms over IP, and it may not be possible to support these services if the IP-based transport is too complex.

o End-to-end resource allocation and signaling performed by IP do not work well with other protocols such as ATM or PDH. For instance, ATM networks require PNNI for signaling, and network routes and bandwidth allocations made by IP routing protocols and RSVP may not match the network configuration demanded by ATM networks.
   o  Similarly, routing of TDM-style traffic may not be possible with
      IP techniques, since providers of these services prefer to use
      their own routes for provisioning.

   o  MPLS does not help multiservice transport under this approach,
      since MPLS is carried inside Ethernet-type L2 frames. Different
      types of services must still be carried inside Ethernet-type
      frames. Even when an LSP is set up for transporting a certain type
      of traffic, each node must first look for an Ethernet MAC address
      match before looking for an MPLS label match.

   Over a long-haul optical network mix, transporting data from one end
   node to another requires all-IP network operation. Multiservice
   transport would require conversion at every network boundary, along
   with extensive interworking operations.

9. A Unified Data Link Layer for Optical Networks

   Data link layer networking requirements for optical networks differ
   from those of legacy networks such as Ethernet. In Ethernet, a packet
   originating from a source node goes onto a link where many stations
   receive the packet at the same time. Nodes use hardware with MAC
   address lookup capability, so that only packets meant for the node
   are processed and received. Other nodes receive the packet at the
   same time but ignore it once they examine the first few bytes and
   find no MAC address match.
   [Diagram not recoverable from the source text: a SONET/SDH ring of
   nodes S1-S4 connected to optical nodes W1-W5 and a LAN.]

          Figure 3: Data Link Operation for Optical Networks and LANs

   On an Ethernet network, therefore, a consistent data link layer
   mechanism based on MAC addresses is essential: the MAC address
   selects one particular node out of the many that receive the packet.

   The situation is quite different in optical networks. An optical link
   terminates at a single node, and that node examines the packets to
   determine their destination(s). Unlike OXC (Optical Cross-Connect)
   nodes, which switch an entire optical link without looking inside the
   packets, normal optical switching nodes perform an O-E-O operation to
   switch packets onto optical links. Packets are delivered to a node
   through an optical link, converted into electrical signals, and
   analyzed (after appropriate frame extraction, such as for SONET
   networks); they are then switched to the appropriate output port,
   where they are converted back to optics.

   In Figure 3, for example, all packets coming from SONET to W2 must be
   examined electrically before deciding which of the downstream nodes
   (W3, W4, or W5) is the recipient. The same is true of all the optical
   nodes in this network. The LAN side is different: there, all nodes
   receive packets at the same time. The same node-to-node data
   transport also applies to SONET rings, where packets are delivered
   to the receive port of a node and then sent out on the appropriate
   transmit port.
   Since packets travel from node to node rather than to many nodes at
   once, data link requirements can be simplified for optical networks.
   With these characteristics in mind, it is possible to design a
   unified data link layer for optical networks. With a small core
   header carrying the parameters necessary for NBMA networks, existing
   data link layer and protocol technologies can be reused to provide
   true multiservice transport for all types of optical networks. Being
   able to support all legacy protocols and data formats over optical
   networks is attractive, since it preserves native data traffic over
   short- and long-haul optical networks.

9.1 Data Link Header for Control Packets

   MAC addresses are used for sending control packets and for node
   addressing. This allows IP-based control protocols to be used for
   node adjacency discovery, route selection, MPLS label assignment and
   distribution, and any other signaling operations. MAC addresses are
   also present in the initial packets of a session. Since control
   packets are infrequent, there is no need for extensive MAC address
   lookup hardware; control packets and initial packets can be
   processed with simple hardware and software.

9.2 Data Link Layer for Data Packets

   Once an LSP has been set up and labels have been assigned for a
   flow, there is no need to use MAC addresses for sending data packets
   on optical networks. The MPLS labels themselves can serve as data
   link layer addresses, since labels are unique per interface between
   two nodes. For ordinary data (with no MPLS set up yet), the usual
   data link addressing works as before.
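Because labels are unique only per interface, a node's forwarding state has to be keyed on the incoming interface as well as the label. The following Python sketch shows one way to model this; the class and interface names (`LabelTable`, `NextHop`, "east", "west", and so on) are hypothetical illustrations, not structures defined by HDT.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class NextHop:
    out_interface: str
    out_label: int


class LabelTable:
    """Per-interface label forwarding table (illustrative only).

    Because labels are unique only per interface, entries are keyed on the
    (incoming interface, incoming label) pair rather than on the label alone.
    """

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, int], NextHop] = {}

    def install(self, in_interface: str, in_label: int, hop: NextHop) -> None:
        if not 0 <= in_label < 2 ** 20:
            raise ValueError("MPLS labels are 20-bit values")
        self._entries[(in_interface, in_label)] = hop

    def lookup(self, in_interface: str, in_label: int) -> Optional[NextHop]:
        return self._entries.get((in_interface, in_label))


table = LabelTable()
# The same label value arriving on two interfaces identifies two different LSPs.
table.install("east", 100, NextHop("west", 200))
table.install("west", 100, NextHop("south", 300))
print(table.lookup("east", 100))   # NextHop(out_interface='west', out_label=200)
print(table.lookup("north", 100))  # None
```

A miss in the table corresponds to the case above where no MPLS path has been set up yet, and the node would fall back to the packet's native data link addressing.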
   All that is needed is a consistent way of transporting MPLS labels as
   a data link header, so that receiving nodes can tell whether to
   process the MPLS labels or the native packet's data link header. Once
   this is done, MPLS labels become the addressing mechanism for
   high-speed data switching (regardless of payload), and many
   operations become easy to perform on networks:

   o  Optical network links can be treated as label switched paths
      regardless of the type of data.

   o  Multiple data types can be sent on an LSP.

   o  Segments of a network can be LSPs, and MPLS-based protection
      mechanisms can easily be applied to all types of data.

   o  Existing nodes with native packets can all participate in the
      network without being forced to adopt Ethernet-style framing, as
      some protocols would require.

   o  Fault recovery and restoration can be done for all data types.

   o  Backup links can be used as alternative LSPs, and these backup
      links can carry data traffic to maximize use of network resources.

10. Hybrid Data Transport

   This draft proposes a data transport method, called the Hybrid Data
   Transport (HDT) protocol, which provides a unified multiservice data
   transport mechanism for optical networks. Using HDT, an optical
   network can be used to its full capacity with dynamic bandwidth
   management on a packet-by-packet basis. HDT supports end-to-end
   native data transmission of any type (such as Ethernet, IP, ATM,
   PDH, raw data, etc.), spatial reuse of bandwidth, instant bandwidth
   allocation (with 64 kbps granularity), and seamless operation over
   point-to-point and ring networks with any mix of SONET or direct
   data-over-fiber (DoF) networks. Multiple types of data (such as IP,
   Ethernet, PPP, frame relay, ATM, or even a raw byte stream) can be
   sent over a single fiber link (such as a WDM wavelength) or inside a
   SONET SPE.

   In Figure 4 below, multiple networks are connected using a SONET
   ring.
   Nodes on the ring can terminate packets and cells using the data link
   information contained in layer 2. A gigabit Ethernet packet entering
   at S-1 is received at S-4 when a destination MAC address match is
   found. Intermediate nodes S-2 and S-3 need not have any interfaces
   with gigabit Ethernet support; they detect an Ethernet frame and pass
   it on. Before forwarding the frame, these and other intermediate
   nodes decrement a TTL (Time-to-Live) value in the frame header.

                                ATM (VPI/VCI)
                                      ^
                                      |
                                    +-+-+
                          +-------->| S |-----+
                          | +-------| 6 |<--+ |
                          | |       +---+   | |
                          | v               ^ v
       (MPLS or MAC)    +---+               +---+
    Gigabit Ethernet ---+ S |               | S +--> ATM (VPI/VCI)
     (VPI/VCI) ATM -----+ 1 |               | 5 +--> T1 (MPLS)
                        +---+               +---+
                          ^ |               ^ |
                          | v               | v
                        +---+               +---+
         (MPLS) T1 -----+ S |               | S +--> ATM (VPI/VCI)
    (DLCI) Frame Relay -+ 2 |               | 4 +--> Gb Ethernet
                        +---+               +---+    (MAC or MPLS)
                          ^ |               ^ |
                          | |       +---+   | |
                          | +------>| S |---+ |
                          +---------| 3 |<----+
                                    +-+-+
                                      |  (DLCI)
                                      v
                                 Frame Relay

        Figure 4: Multiple Native Data Link Layer Support on a Fiber

   When the frame reaches S-4, the node detects that the frame contains
   an Ethernet packet. The destination MAC address is compared for a
   match, and the frame is taken off the ring. At the same time that a
   gigabit Ethernet frame is traveling inside a SONET SPE, ATM cells,
   frame relay, T1/T3, lower-rate SONET, or any other data stream can
   also travel within the same SPE and reach their destination nodes.

   In a ring configuration, packets can be destination-stripped to
   achieve bandwidth reuse. Stripping at a destination node can be based
   on the data link layer information contained within the native
   packet, if available. Examples of such link layer parameters are the
   Ethernet MAC address, ATM VPI/VCI, frame relay DLCI, an MPLS label
   stack carried as a shim header, etc.
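The per-node behaviour described above (strip on a link layer address match, otherwise decrement the TTL and pass the frame on) can be sketched as follows. This is an illustrative Python model under stated assumptions: addresses are tagged with their kind so that, for example, a DLCI and an MPLS label with the same numeric value never collide, and a frame whose TTL reaches zero after decrementing is discarded. The draft does not specify the TTL field width or this exact ordering, and the MAC address used is a documentation-range placeholder.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Hashable, Set, Tuple


class Action(Enum):
    STRIP = auto()    # link layer address matches: take the frame off the ring
    FORWARD = auto()  # not for this node: pass it on with a decremented TTL
    DISCARD = auto()  # TTL exhausted: remove the frame from the ring


@dataclass
class RingFrame:
    # (kind, value) pairs keep address spaces apart, e.g. ("mac", ...),
    # ("vpi_vci", (vpi, vci)), ("dlci", 100), ("mpls", label).
    link_addr: Tuple[str, Hashable]
    ttl: int


def ring_receive(frame: RingFrame, local_addrs: Set[Tuple[str, Hashable]]) -> Action:
    """Per-node decision for a frame arriving from the ring (hypothetical model)."""
    if frame.link_addr in local_addrs:
        return Action.STRIP
    frame.ttl -= 1
    if frame.ttl <= 0:
        return Action.DISCARD
    return Action.FORWARD


# A gigabit Ethernet frame added at S-1 and addressed to a station behind S-4.
s4_addrs = {("mac", "00:00:5e:00:53:04")}
frame = RingFrame(("mac", "00:00:5e:00:53:04"), ttl=8)
for node_addrs in (set(), set(), s4_addrs):   # visits S-2, S-3, then S-4
    action = ring_receive(frame, node_addrs)
print(action, frame.ttl)  # Action.STRIP 6
```

Keeping the decision to a set-membership test on a tagged address reflects the point that intermediate nodes need no full protocol stack for traffic that is merely passing through.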
   If no node receives the frame, the TTL counter inside the frame
   eventually expires and the frame is taken off the ring (or a
   point-to-point fiber). On ring networks, frames whose packets
   identify the sending node (such as Ethernet) can be source-stripped
   instead of waiting for TTL expiration. The sending node gets the
   frame back when no other node has received it and taken it off the
   ring; it then compares the source MAC address to its own address,
   and if a match is found it takes the frame off the ring.

   Multiservice operation on data-over-fiber networks is similar to that
   of a SONET ring. Multiple frames containing different types of
   packets can travel on the same fiber (or wavelength) to maximize the
   available bandwidth on the link. Intermediate nodes can switch these
   frames based on the protocol(s) supported.

   With the OPTIONAL support for MPLS within the data transport frame
   structure (external to the native data packet), it is possible to
   provide multiservice transport with unified MPLS switching. This
   OPTIONAL support for MPLS works independently of, and/or in
   conjunction with, traditional MPLS shim headers.

                                             +---+
                                             |   |     +----------+
                                       /-----+ C +     | Ethernet |
                                      /      |   |     +----------+
   +----------+ +-----------+      2 /       +---+
   | Ethernet | | ATM cells |       /
   +----------+ +-----------+      /
             +---+        +---+   /                 +---+
     ------->|   |   1    |   |  /        3         |   |     +-------------+
     ------->| A +--------+ B +---------------------+ D +     | Frame Relay |
     ------->|   |        |   |  \                  |   |     +-------------+
             +---+        +---+   \                 +---+
                     MPLS          \ 4
                   Switching        \                     +-------------+
                                     \                    | Frame Relay |
                                      \      +---+        +-------------+
                                       \     |   |        +-----------+
                                        \----+ E +        | ATM cells |
                                             |   |        +-----------+
                                             +---+

        Figure 5: Multiservice Transport with Unified MPLS Switching

   This MPLS support is payload-independent in frame structure and data
   transport characteristics.
   This results in a unified architecture for fault resilience,
   bandwidth sharing, and control and data plane support across all
   payload types on optical networks. For a payload that does not have
   any specific data link layer, such as TDM, a raw byte stream, or
   SONET/SDH, MPLS can be used for routing the data across optical
   networks.

   Subsequent sections of this draft document the framing structure,
   the types of WAN configurations that can support this transport
   protocol, and a detailed functional description of each feature.

11. Motivations for HDT

   o  HDT is proposed as a common payload transport mechanism for all
      types of optical networks, with or without MPLS support.

   o  One motivation for this protocol is to enable general-purpose
      MPLS-based optical switches and wavelength switching devices that
      work on the MPLS label alone, without having to handle legacy
      end-node protocols such as Ethernet, frame relay, ATM, PPP,
      AppleTalk, SONET/SDH, or even raw data streams.

   o  Continued support for SONET/SDH and PDH on all networks. Many newer
      protocols are gravitating towards Ethernet-based models and no
      longer support other types of data. HDT is designed to work with
      all types of packets.

   o  Instant bandwidth allocation anywhere in a SONET network with the
      smallest possible granularity of 64 kbps (DS0).

   o  A consistent and unified control plane for setting up light paths
      with traffic engineering parameters for sending all types of data.

   o  Support for alternative LSP setups that can carry all types of
      critical data on backup links, instead of individual LSPs for
      different protocols.

   o  Common LSPs for traffic-engineered links across different
      protocols, leading to fewer RSVP-TE refreshes at each node.

12. Advantages of HDT

   o  A unified transport with the same size and structure regardless of
      the type of data inside the payload area.

   o  All existing data link layer technologies can be used without any
      modifications.
   o  The frame structure is unchanged as the frame travels from a
      point-to-point network to a ring network, across any mix of SONET
      and non-SONET framing.

   o  A single fiber link can be used for sending different kinds of
      traffic up to the full capacity of the link.

   o  There is no need to set up VT or sub-VT channels in advance. ATM
      cells, IP (and other protocol) packets, PPP, frame relay, NxDS0,
      T1/T3, and others can be mixed inside the SPE on a
      packet-by-packet basis.

   o  PDH channels such as T1/E1 can be dynamically allocated anywhere
      inside the SONET payload in 64 kbps increments. Bandwidth can be
      reserved with 64 kbps granularity for any data type on a
      packet-by-packet basis.

   o  All types of network protocols can use a SONET SPE or a WDM link.
      HDT can take many packets of one or more data types and put them
      inside a single SONET SPE or data-over-fiber frame while
      dynamically preserving the timing dependency of data such as PDH.

   o  It is not necessary to terminate the whole payload capacity at
      each node.

   o  Any type of packet can be dropped at a node, and a similar or a
      different type of traffic can be added to the SONET payload. For
      instance, an IP packet can be dropped at a node and the node can
      reuse the packet area for inserting ATM cells, frame relay, PDH
      traffic, or even a raw bit stream. The new packet can use the
      packet area immediately as a fixed-bandwidth allocation (bandwidth
      equal to NxDS0, where N is the number of bytes of that payload
      space in each SONET frame), regardless of the type of packet.

   o  Direct data-over-fiber networks can be easily supported without
      changes to the HDT frame structure, with full link monitoring and
      fault management.

   o  Protocol-independent support for MPLS labels. Optical nodes can
      switch and add/drop packets with only MPLS label-switching logic
      and some extra hardware for initial packets.
      The nodes do not need to implement high-speed protocol parsers
      just to reach the MPLS labels.

   o  Any node (or set of nodes) on the SONET network can use variable
      portions of the SONET payload to create fixed-bandwidth channels
      for any type of data. Nodes beyond this set can use the same
      sections of the SONET payload for other packets with best-effort
      delivery.

13. HDT Frame Structure Overview

   A payload is encapsulated with a header that identifies the packet
   and OPTIONALLY contains MPLS labels and/or OAM bytes. A detailed view
   of the frame structure is shown below. Detailed bit descriptions are
   given later in this draft in [26].

   +===============+-----------------------------------------+ (bytes)
   |               | Core Header (cHdr) (32 bits)            |   4
   |               +-----------------------------------------+
   |               |(Optional) Next Fragment offset (32 bits)|   4
   |    Payload    +-----------------------------------------+
   |  Header (PH)  |(Optional) MPLS/OAM bytes (N x 32 bits)  |   N x 4
   |               +-----------------------------------------+
   |               | Core Header CRC cHEC (CRC-16)           |   2
   +===============+-----------------------------------------+
   |               |                                         |
   |     User      ~~~                                     ~~~
   |    Payload    |              Payload data               |
   |     Data      ~~~                                     ~~~
   |               |                                         |
   |               +-----------------------------------------+
   |               |(Optional) Payload CRC (CRC-32)          |   4
   +===============+-----------------------------------------+

          Figure 6: HDT Frame Structure with Native Payload Transport

   The frame structure shown in Figure 6 is uniform for all types of
   payload data. Core encapsulation consists of a 32-bit header (cHdr)
   and a 16-bit CRC (cHEC) computed over the cHdr bytes. The header
   structure is uniform across all packet types and optical networks
   (both SONET and DoF).

13.1 Payload Header (PH) Structure

13.1.1 Core Header (cHdr)

   +=====+------+-----------------+------+
   |cHdr | cHEC | ....Payload.... | pCRC |
   +=====+------+-----------------+------+
      4     2                        4    (bytes)

   The core header precedes every packet (with the OPTIONAL exception of
   single ATM cell transport when using SDL frame delineation, described
   later) and carries all information required for payload
   identification, bandwidth management, and switching of packets
   across any mix of optical networks. The minimum length of the payload
   header (cHdr plus cHEC) is 6 bytes.

13.1.2 Next Fragment Offset

   Using HDT, bandwidth can be provisioned instantly for any data type
   at any point in a SONET ring. As described later in this draft, when
   a packet occupies a fixed location within a SONET SPE, a newly added
   packet may need to be fragmented to go around the fixed packet(s) in
   order to use any free space in the provisioned area of the SPE. When
   this fragmentation happens, the HDT header contains a 32-bit offset
   marking the beginning of the next fragment.

   +-----+=================+----------+------+-----------+------+
   |cHdr | Fragment offset | ..MPLS.. | cHEC |..Payload..| pCRC |
   +-----+=================+----------+------+-----------+------+
      4           4             Nx4       2                   4   (bytes)
             (OPTIONAL)     (OPTIONAL)

   This offset is measured from the start of the current packet (or
   packet fragment). A 32-bit offset is required to locate fragments
   that may be more than 65,536 bytes apart in a SONET SPE (for OC-192
   and above).

13.1.3 Header with MPLS labels

   +-----+==========+------+-----------------+------+
   |cHdr | ..MPLS.. | cHEC | ....Payload.... | pCRC |
   +-----+==========+------+-----------------+------+
      4       Nx4      2                        4   (bytes)
          (OPTIONAL)

   Bits inside cHdr indicate the presence of MPLS labels and any other
   optional parameters. While native payload data such as Ethernet, ATM,
   PPP, and frame relay can continue to carry MPLS as a shim header,
   MPLS labels MAY be independently and simultaneously stacked between
   the cHdr and cHEC fields.
   These labels can be the same ones that would otherwise have been put
   inside the shim header, or they could belong to an intervening VPN or
   a service provider network. Instead of building specialized data
   link layer logic and then applying MPLS labels for switching packets,
   intermediate nodes can switch packets in the high-speed data path
   using MPLS labels alone.

   Once MPLS labels have been configured for a path using RSVP-TE,
   CR-LDP, or generalized MPLS signaling [9], multiple packet types can
   be sent using the same MPLS label, if so desired for bandwidth
   sharing. Because MPLS label switching is done _outside_ the payload
   (rather than by inserting the MPLS label stack between the data link
   and the network layer), traditional data link layer limitations do
   not apply (and need not apply, as described later) to optical nodes.

   ATM cells from an ATM network (one cell or a collection of cells) can
   be switched and sent over non-ATM networks on traffic-engineered MPLS
   links to another ATM network. Similarly, native Ethernet packets from
   one local area network can be MPLS-switched across any mix of
   optical networks to another local area network without any
   modification to the original packet.

   Because MPLS labels can be used for packet switching regardless of
   the type of packet, efficient and economical MPLS-based optical fiber
   protection schemes [5] can be used for protection switching. Since
   these labels do not depend on the type of packet, all data can be
   switched onto the protection fiber.

   Traffic from some sources (containing any type of packet such as ATM,
   Ethernet, multimedia streams, PDH, etc.) can be prioritized by
   putting it on labels that are switched on high-speed links, while
   other flows are switched on low-speed links.

13.1.4 Header with OAM bytes

   +-----+=========+------+-----------------+------+
   |cHdr | ..OAM.. | cHEC | ....Payload.... | pCRC |
   +-----+=========+------+-----------------+------+
      4       M        2                        4   (bytes)
         (OPTIONAL)

   On data-over-fiber networks (such as WDM), OAM bytes can be sent
   in-band with payload data. These bytes MAY be sent between the cHdr
   and cHEC fields. The OAM bytes need not accompany every payload; they
   are inserted only when required. OAM bytes for administration and
   health monitoring are typically included far less often than packets
   are transferred, especially at high data rates.

13.1.5 MPLS labels followed by OAM bytes

   +-----+==========+=========+------+-----------------+------+
   |cHdr | ..MPLS.. | ..OAM.. | cHEC | ....Payload.... | pCRC |
   +-----+==========+=========+------+-----------------+------+
      4       Nx4        M        2                        4   (bytes)
               (OPTIONAL)

   MPLS labels may also be followed by OAM bytes. Using this method,
   in-band OAM information can be sent on a specific LSP. In addition to
   normal in-band OAM operations, out-of-band OAM can be performed using
   a route on a different fiber (or wavelength) or a low-bandwidth LSP,
   while high-speed data travels on high-bandwidth links.

   The end of the MPLS labels (and the beginning of the OAM bytes) is
   determined by the end-of-stack bit in the bottom label.

13.1.6 Core Header HEC (cHEC)

   +-----+----------+---------+======+-----------------+------+
   |cHdr | ..MPLS.. | ..OAM.. | cHEC | ....Payload.... | pCRC |
   +-----+----------+---------+======+-----------------+------+
      4       Nx4        M        2                        4   (bytes)
               (OPTIONAL)

   The Core Header HEC is a CRC-16 (2 bytes) computed over all 4 bytes
   of cHdr and any MPLS and OAM bytes.

13.2 User Payload

   The user payload area immediately follows the cHEC field in the HDT
   payload header (PH).

13.2.1 Payload

   The payload can contain any data type from end nodes or core
   networks. The user payload data MAY be followed by a payload CRC.
   Whether a CRC field follows the payload is indicated by the cHdr
   fields. A few sample payload types are as follows:

   o  Ethernet and IEEE 802.3, with or without MPLS shim header
   o  ATM cell(s)
   o  PPP, Frame Relay, with or without MPLS shim header
   o  MAN networking and newer data transport protocols over SONET,
      such as SRP [7] and other variants
   o  SONET/SDH
   o  T1/T3
   o  Raw byte stream

13.2.2 Payload CRC (pCRC)

   If a CRC field is present, it MUST be based on the standard ITU-T
   CRC-32. The presence or absence of pCRC is indicated by a cHdr field
   parameter.

14. HDT Framing Examples

   Frame structures for some common data types are as follows:

   Native packet data:

   +=====+======+-----------------------------+------+
   |cHdr | cHEC | Ethernet/IP/PPP/Frame Relay | pFCS |
   +=====+======+-----------------------------+------+
      4     2                                     4

   ATM Cells:

   +=====+======+-----+-----+-----+-----+-----+-----+
   |cHdr | cHEC | ATM | ATM | ATM | ATM | ATM | ATM |
   +=====+======+-----+-----+-----+-----+-----+-----+
      4     2     53    53    53    53    53    53

   ATM Cells with external MPLS:

   +=====+==========+======+-----+-----+-----+-----+
   |cHdr | ..MPLS.. | cHEC | ATM | ATM | ATM | ATM |
   +=====+==========+======+-----+-----+-----+-----+
      4       Nx4      2     53    53    53    53

   Packet Data with external MPLS & OAM:

   +=====+======+======+======+--------------------------------+------+
   |cHdr | MPLS | OAM  | cHEC | Ethernet/Frame Relay/PPP, etc. | pFCS |
   +=====+======+======+======+--------------------------------+------+
      4     Nx4     M      2                                        4

   Ethernet with MPLS shim header, OAM, and external MPLS stack:

   +=====+======+======+======+----+------------------+----------+------+
   |cHdr | MPLS | OAM  | cHEC | L2 | MPLS Shim Header | IP, etc. | pFCS |
   +=====+======+======+======+----+------------------+----------+------+
      4     Nx4     M      2   <--- Ethernet Frame with MPLS --->    4

   o  ATM cells are transported in their entirety.

   o  Ethernet packets are sent natively, with or without MPLS shim
      headers.
   o  The HDT MPLS stack is external to the actual payload in all cases,
      and its use is OPTIONAL.

15. Transmission of HDT Frames

15.1 SONET Networks

   HDT frames are transmitted in the SONET payload area. Consequently,
   these frames can be sent over an entire SPE in concatenated mode, or
   over virtual tributaries (with or without virtual concatenation).
   Each frame consists of a header and a payload (with the exception of
   a single ATM cell, when used with SDL delineation).

15.1.1 Concatenated SONET

   +-----+-----+----------------------------------------------------+
   |     |     | T1     | IP             | ATM cells    | NxDS0     |
   |  T  |  P  +----------------------------------------------------+
   |  O  |  O  | PPP     | IP (Fixed BW)   | Raw Data Stream        |
   |  H  |  H  +----------------------------------------------------+
   |     |     | Ethernet     | Low rate SONET/SDH     | PPP        |
   |     |     +----------------------------------------------------+
   |     |     | Frame Relay  | Ethernet               | ATM cells  |
   +-----+-----+----------------------------------------------------+

             Figure 7: Multiple Data Types in a single SONET SPE

   Frames are sent inside the SONET payload area, as shown above. Each
   frame has a header as described earlier. Frames are allowed to cross
   SPE boundaries.

15.1.2 Non-concatenated - Inverse Multiplexing

   With non-concatenated SONET, HDT frames can be sent over one or more
   Virtual Tributaries (VTs) or lower-rate SONET channels. While one or
   more VTs (or channels) carry HDT frames, others can continue to carry
   PDH, ATM, or POS (Packet-over-SONET) data as usual. End nodes as well
   as intermediate SONET ADMs can add/drop channels without any changes.
   To create higher data rates, frames can be sent over multiple
   channels using inverse multiplexing over the channels.
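Two pieces of arithmetic sit behind the statements that frames may cross SPE boundaries and (Section 13.1.2) that a 32-bit fragment offset is needed at OC-192 rates. The Python sketch below computes the size of an STS-Nc SPE (9 rows by N x 87 columns, including path overhead and fixed stuff) and, treating each SPE as a flat byte container (a simplification, not HDT's actual mapping), reports which back-to-back frames straddle a container boundary. The function names are hypothetical.

```python
from typing import List


def sts_nc_spe_bytes(n: int) -> int:
    """Bytes in one STS-Nc SPE per 125-microsecond frame: 9 rows x (N x 87) columns.

    The count includes path overhead and fixed-stuff columns.
    """
    return 9 * 87 * n


def frames_crossing_boundaries(frame_lengths: List[int], container_bytes: int) -> List[int]:
    """Indices of back-to-back frames that straddle a container boundary."""
    crossing = []
    offset = 0
    for i, length in enumerate(frame_lengths):
        first, last = offset, offset + length - 1   # byte positions in the stream
        if first // container_bytes != last // container_bytes:
            crossing.append(i)
        offset += length
    return crossing


for n in (3, 12, 48, 192):
    size = sts_nc_spe_bytes(n)
    print(f"STS-{n}c SPE: {size} bytes, fits a 16-bit offset: {size <= 0xFFFF}")

print(frames_crossing_boundaries([100, 100, 100], 150))  # [1]
```

An STS-48c SPE holds 37,584 bytes, within the range of a 16-bit offset, whereas an STS-192c SPE holds 150,336 bytes; this is consistent with the draft's choice of a 32-bit next-fragment offset.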
15.1.3 Virtual Concatenation over SONET

   To create higher data rates, one or more lower-rate SONET channels
   may be combined using virtual concatenation into a high-bandwidth
   pipe. This pipe can then be used for sending HDT frames, while other
   channels continue to carry other data. Virtual concatenation
   requires SONET hardware modifications only at the two end nodes and
   is fully transparent to intermediate nodes.

15.2 Data-over-Fiber and WDM Networks

   On Data-over-Fiber (DoF) links, HDT frames are transmitted the same
   way as over SONET, except that there is no encapsulating envelope
   for housing the frames. HDT frames can travel across a mix of SONET
   and DoF without any frame modifications.

   The SDL framing mechanism uses a length/CRC pair as a header and
   frame delimiter. Its robust CRC and frame locator technique make it a
   good candidate for direct data-over-fiber networks without SONET
   framing. Over time, the use of OAM packets may obviate the need for
   complex SONET framing and link management overhead. In the case of
   direct data-over-fiber networks, the HDT protocol structure runs
   unmodified over fiber. Because HDT packet framing is the same for
   both SONET and non-SONET networks, optical node design is simplified
   in a long-haul network that combines WDM point-to-point links
   without SONET framing and a complex interconnection of SONET rings.

16. Protocol Implementation Considerations

           +--------------------+       +--------------------+
   Legacy  |                    |       |                    |  Legacy
   System  |  ATM  IP  PPP  T1  |       |  ATM  IP  PPP  T1  |  System
           |    \   |   |   /   |       |    \   |   |   /   |
           +--------------------+       +--------------------+
                 \  |   |  /                  \  |   |  /
               +-----------+                +-----------+
  (Optional)   |   MPLS    |                |   MPLS    |  (Optional)
               | Switching |                | Switching |
               +-----+-----+                +-----+-----+
                     |                            |
             +-------+------+             +-------+------+
             |  HDT Encaps  |             |  HDT Encaps  |
             |  (HDLC/SDL   |             |  (HDLC/SDL   |
             |   Framing)   |             |   Framing)   |
             +-------+------+             +-------+------+
                     |                            |
                   +-+-+                        +-+-+
                   |   |                        |   |
                   | A +==========~~~~~=========+ B |
                   |   |                        |   |
                   +---+                        +---+
                          (Mix of SONET & DoF)

             Figure 8: Multiservice Transport Network using HDT

   A typical system network configuration is shown in Figure 8.

   Implementing HDT transport in a system requires only one additional
   logic block at its input and output ports, for HDT encapsulation.
   Since HDT allows MPLS label transport that is independent of the
   layer 2 protocol, generic MPLS switching logic can be used to switch
   all types of data.

   Use of SDL requires a hardware implementation for SDL delineation and
   frame synchronization. If HDLC is used for frame delineation,
   existing hardware can be used for generating and processing HDT
   packets. Once a packet has been de-encapsulated, normal hardware and
   software operations can be performed for processing native packets
   such as Ethernet, ATM, frame relay, PPP, PDH, and any other byte
   stream.

17. Preserving Payload CRC for Robust Transmission

   Carrying packets with the traditional MPLS shim header structure over
   short- and long-haul optical networks has serious implications for
   line error detection and fault location. When an MPLS shim header is
   used (between the layer 2 and layer 3 fields), the payload CRC (for
   the entire packet) must be re-computed at every node where an MPLS
   label add/drop/swap operation takes place.
   Any data corruption (due to memory problems or any other device
   failure) within a node before the packet is sent on the output port
   is then permanently hidden as the packet travels downstream. Even if
   a data corruption problem is detected, it is almost impossible to
   find out which node caused the error. As payload sizes grow (such as
   for jumbo frames, and multiprotocol over frame relay and ATM, where
   the payload size can be about 9K bytes), the probability of errors
   due to node hardware faults while bytes are stored and sent out on
   the transmit port increases.

17.1 Payload Error Suppression with MPLS Shim Headers

   Consider MPLS shim header operation over an optical network.

           +---+      +---+         +---+          +---+
           |   |  1   |   |    2    |   |    3     |   |
   ------->| A +------+ B +---------+ C +----------+ D +---->
           |   |      |   |         |   |          |   |
           +---+      +---+         +---+          +---+

   +----+------------------+----------+------+
   | L2 | MPLS Shim Header | IP, etc. | pFCS |
   +----+------------------+----------+------+
   <--- Ethernet Frame with MPLS --->     4

   At node A:

   o  The payload enters A.
   o  The payload CRC is verified; the frame is correct.
   o  After MPLS label lookup, label add/drop/swap is done.
   o  An internal memory or device error corrupts the packet data
      within A.
   o  As node A sends the packet out (with a new MPLS label), the
      payload CRC is re-computed over the corrupted data and appended
      after the last payload byte.

   At node B:

   o  The payload enters B.
   o  The payload CRC is verified.
   o  The corrupted payload is received as a correct frame.
   o  MPLS label operations are done.
   o  The packet is sent out with a re-computed CRC.
   o  The errors are hidden from this node and all downstream nodes.

   At downstream nodes:

   o  C and D may never find out that there was an error, especially if
      the byte errors occur within the payload data area and not in the
      network headers.
   o  Even if C, D, or other downstream nodes find out about the error,
      they cannot isolate the faulty node.
      This is because no node ever detected an error.
   o  If payload errors occur infrequently, isolating the faulty node
      may be even harder.

17.2 Fault Detection and Isolation with an External MPLS stack

   An ideal solution is to check the CRC at the input, but not to
   re-compute it on output. An efficient way to do this is to separate
   the header CRC from the payload CRC. The header CRC is then
   re-computed easily and quickly at intermediate nodes, while the
   payload CRC is preserved end to end.

   +=====+=============+======+----+-----------------+------+
   |cHdr | .. MPLS ..  | cHEC | L2 |    IP, etc.     | pFCS |
   +=====+=============+======+----+-----------------+------+
      4        Nx4         2   <-- Ethernet Frame with pFCS -->

   In HDT, MPLS labels are carried inside the payload header, and the
   payload header bytes have their own CRC for error checks. As packets
   go from node to node, only the cHEC value changes when MPLS labels
   are modified, leaving the payload CRC (pFCS) value unchanged. Whenever
   a node corrupts bytes in the payload data, the downstream node
   immediately detects the error and discards the packet.

18. Data Transport over Optical Networks using HDT

18.1 Protocol-independent operation for MPLS

   In current MPLS stack encoding methods [11], MPLS labels are carried
   inside a packet, and their existence is signaled by a unique protocol
   identifier. For multiservice transport over optical nodes, however,
   having to go deeper into the frame at intermediate nodes to get MPLS
   labels requires all nodes to be aware of every protocol. For
   instance, to learn of Ethernet MPLS shim headers, PPP MPLS, etc., a
   node must know the packet protocol to find out whether MPLS labels
   are present and how they are encoded.
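The protocol awareness that shim-header encodings demand becomes visible when the position of the top label is written out for each case. In this Python sketch the Ethernet case must check the EtherType (0x8847 for MPLS unicast) and skip an 802.1Q tag if one is present, and the PPP case must check the PPP protocol number (0x0281); both are simplified (a single VLAN tag; uncompressed PPP address/control and protocol fields). The HDT case is hypothetical: it assumes the first external label immediately follows the 4-byte cHdr with no fragment offset, and it takes the cHdr presence flag as a parameter because the bit layout is not given in this section.

```python
import struct
from typing import Optional

ETHERTYPE_VLAN = 0x8100          # IEEE 802.1Q tag
ETHERTYPE_MPLS_UNICAST = 0x8847  # MPLS unicast over Ethernet
PPP_PROTO_MPLS_UNICAST = 0x0281  # MPLS unicast over PPP


def label_of(entry: bytes) -> int:
    """Label value: top 20 bits of a 32-bit MPLS label stack entry."""
    (word,) = struct.unpack("!I", entry[:4])
    return word >> 12


def top_label_ethernet(frame: bytes) -> Optional[int]:
    offset = 12                                  # skip destination + source MAC
    (etype,) = struct.unpack("!H", frame[offset:offset + 2])
    if etype == ETHERTYPE_VLAN:                  # only a single 802.1Q tag handled
        offset += 4
        (etype,) = struct.unpack("!H", frame[offset:offset + 2])
    if etype != ETHERTYPE_MPLS_UNICAST:
        return None
    return label_of(frame[offset + 2:offset + 6])


def top_label_ppp(frame: bytes) -> Optional[int]:
    # Assumes uncompressed address/control (FF 03) and a 2-byte protocol field,
    # with flags and byte stuffing already removed.
    (proto,) = struct.unpack("!H", frame[2:4])
    return label_of(frame[4:8]) if proto == PPP_PROTO_MPLS_UNICAST else None


def top_label_hdt(frame: bytes, mpls_present: bool) -> Optional[int]:
    # Hypothetical: first external label right after the 4-byte cHdr (no
    # fragment offset); mpls_present stands in for the cHdr presence bit.
    return label_of(frame[4:8]) if mpls_present else None


entry = struct.pack("!I", (1000 << 12) | (1 << 8) | 64)  # label 1000, S=1, TTL 64
eth = bytes(12) + struct.pack("!H", ETHERTYPE_MPLS_UNICAST) + entry
ppp = b"\xff\x03" + struct.pack("!H", PPP_PROTO_MPLS_UNICAST) + entry
hdt = bytes(4) + entry
print(top_label_ethernet(eth), top_label_ppp(ppp), top_label_hdt(hdt, True))  # 1000 1000 1000
```

Each additional native protocol adds another parser alongside the first two functions, while the HDT lookup stays at a fixed position regardless of payload.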
   While physical-medium-dependent link layer protocols such as
   Ethernet must use Ethernet-style framing with embedded MPLS labels,
   optical transport and switching nodes need not do so. Because MPLS
   labels are used for switching packets at nodes and are not part of
   the payload, carrying pure MPLS fields outside the payload simplifies
   node design significantly. With this technique, high-speed switching
   logic can be designed without incorporating protocol-specific
   knowledge of packets.

   Note that HDT is part of the optical framing, just as POS
   (Packet-over-SONET) is for data packets. MPLS labels can be examined
   and processed by optical nodes outside the actual payload. If optical
   nodes are MPLS capable, for instance, packets can be carried from
   network to network without performing an inter-working function
   (IWF) at the network boundaries.

18.2 Transport without MPLS

   With its support for per-packet payload identification and robust
   frame delineation, HDT can send multiple types of packets over any
   mix of SONET and data-over-fiber (DoF) networks.

                 +---+                      +---+
   ATM ----------+   |                      |   +---- ATM
   Ethernet -----+   |      Any mix of      |   +---- Ethernet
   Raw Data -----+ A +======================+ B +---- Raw Data
                 |   +======================+   |
   PPP ----------+   |   SONET and/or DoF   |   +---- PPP
   Frame Relay --+   |                      |   +---- Frame Relay
                 +---+                      +---+

              Figure 9: Data Mixing over the Same Fiber

   Since each data type is clearly identified by a uniform header, all
   types of packets can be mixed over the same fiber (or wavelength),
   with or without SONET support.

18.3 Transport using MPLS

   The use of MPLS in the HDT header is OPTIONAL. HDT supports current
   MPLS label stack encoding schemes in addition to, or in lieu of, an
   external MPLS label stack for data transport over optical networks.
18.3.1 MPLS as a Shim Header within Layer 2 Frames

   In this case, HDT provides the payload header for payload
   identification, bandwidth allocation, and other parameters. The MPLS
   label stack is carried in a shim header within native protocols such
   as Ethernet, ATM, Frame Relay, or PPP. With this method, networks can
   transmit multiple native protocol packets over the same fiber (or
   wavelength) and use MPLS switching with native data packets.

18.3.2 MPLS in Payload Header for True Multiservice Transport

   True multiservice transport with traffic engineering can be achieved
   by carrying MPLS labels in the HDT header fields. De-linking MPLS
   from layer 2 dependencies over optical networks allows MPLS to be
   used without the constraints of traditional data link layer
   technologies. Any type of payload, including PDH, SONET/SDH, IP, ATM,
   and raw byte streams, can be switched using MPLS. The MPLS labels are
   carried in an outer header that also contains payload identification,
   among other parameters.

                 +---+                      +---+
   ATM ----------+   |                      |   +---- ATM
   Ethernet -----+   |                      |   +---- Ethernet
                 |   +----------------------+   |
   Raw Data -----+ A |         LSP          | B +---- Raw Data
                 |   +----------------------+   |
   PPP ----------+   |                      |   +---- PPP
   Frame Relay --+   |                      |   +---- Frame Relay
                 +---+                      +---+

     Figure 10: Statistical Multiplexing of Services over the Same LSP

   Because HDT provides per-packet payload identification, the same
   MPLS labels (belonging to one LSP) can be used to send different
   LAN/WAN data types, including raw data streams. A bounded set of
   LSPs, limited only by the provisioned link capacity, can be created
   with different bandwidth allocations, with each LSP carrying
   multiservice traffic.
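   Figure 10 can be modeled in a few lines of Python. Transit nodes
   forward on the label alone, and only the egress node reads the
   Payload Identifier (PI) codes listed in Section 26.1.1 to hand each
   payload to the right service. This is an illustrative sketch: the
   HdtFrame type, the label table (lfib) format, and the function names
   are hypothetical. Framing, CRCs, and multi-label stacks are left out.

```python
from collections import defaultdict
from typing import NamedTuple

# Payload Identifier (PI) codes from Section 26.1.1.
PI_NAMES = {
    0b00001: "ATM",
    0b00010: "PPP",
    0b00011: "Ethernet",
    0b00100: "PDH",
    0b00101: "Frame Relay",
    0b00110: "IP",
    0b00111: "Raw",
    0b01000: "SONET/SDH",
}
PI_NULL = 0b00000


class HdtFrame(NamedTuple):
    label: int      # top MPLS label carried in the HDT header
    pi: int         # 5-bit Payload Identifier from cHdr
    payload: bytes  # native packet; transit nodes never look inside


def transit_switch(frame: HdtFrame, lfib: dict[int, tuple[int, int]]) -> tuple[int, HdtFrame]:
    """Forward on the label alone: (out_port, out_label) = lfib[in_label]."""
    out_port, out_label = lfib[frame.label]
    return out_port, frame._replace(label=out_label)


def egress_demux(frames: list[HdtFrame]) -> dict[str, list[bytes]]:
    """At the LSP egress, deliver each payload to the service named by its PI."""
    by_service = defaultdict(list)
    for frame in frames:
        if frame.pi == PI_NULL:
            continue  # null packet: the area is reusable, nothing to deliver
        by_service[PI_NAMES.get(frame.pi, "Reserved")].append(frame.payload)
    return dict(by_service)
```

   The label table is the same for every service. That reflects the
   point of Figure 10: one LSP carries ATM, Ethernet, raw data, and
   other traffic together.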
18.3.3 Interconnection of MPLS Networks over Public Networks

   [Figure 11 diagram: two enterprise networks (LANs using MPLS shim
   headers) attach through LSRs R1 and R2 to a public network of
   point-to-point and ring links between switches S1-S4; within the
   public network, LSPs using an external MPLS stack carry the
   different data flows.]

             Figure 11: Interconnection of MPLS Domains

   By using external MPLS labels in the HDT header, two private
   enterprise MPLS-based networks (running legacy protocols with MPLS
   shim headers) can be connected over a public network. In this case,
   the LSRs on the two private networks shown in Figure 11 use an MPLS
   label stack that spans the two networks transparently. When a packet
   enters the public network, LSRs R1 and R2 running HDT use another
   label stack 'outside' the native packet from the private network, as
   shown below.

   +=====+======+======+----+-----------------+----------+------+
   |cHdr |.MPLS.| cHEC | L2 |MPLS Shim Header | Payload  | pFCS |
   +=====+======+======+----+-----------------+----------+------+
      4     N      2    <-- Ethernet Frame with MPLS --->    4
     (PUBLIC)                     (ENTERPRISE)

      Figure 12: MPLS Label Stacks for Interconnection of Domains

   Packets coming from the private networks do not, however, need to
   carry an MPLS stack. Native packets can be transported along with
   packets that have shim headers, since MPLS switches within the public
   network do not look at the payload. Since HDT allows multiservice
   transport, many different types of data can be put in one LSP in the
   public network.
   The public network may set up different LSPs with various traffic
   engineering and QoS parameters, and assign different types of packets
   from the private networks to them for transport across the public
   network.

19. Single MPLS Control Plane for Multiservice Transport

   A single LSP can carry multiple protocols, with a uniform frame
   structure for all types of traffic. Therefore, a single optical
   control plane can provision a path (or wavelength) using a
   consistent traffic-engineering signaling protocol such as RSVP-TE or
   CR-LDP. This path can then be used to send all types of traffic and
   fully utilize the provisioned link. Thus, a single IP-based control
   plane can be used for all types of data transported using MPLS.
   Because data is transported opaquely, only one control plane is
   needed for multiservice transport.

20. Minimization of Number of LSPs

   Since a single LSP can carry multiple types of data, fewer LSPs are
   required to create traffic-engineered links. A consistent, unified
   control plane can set up light paths with traffic engineering
   parameters for all types of data. For non-SONET-style fault
   resilience and recovery, a minimal number of alternative LSPs can be
   set up to carry all types of critical data on backup links. There is
   no need to set up individual LSPs for different protocols. Fewer LSPs
   for carrying all the different protocols lead to lower management
   overhead and fewer RSVP-TE refreshes at each node.

   Different LSPs can be created with different traffic engineering and
   QoS parameters, and different types of data can then be sent on them
   depending on the requirements of the data traffic. For instance, a
   high-bandwidth LSP can be provisioned to transport a mix of some IP
   flows, ATM traffic, a raw byte stream, and some frame relay traffic.
   Other IP flows, other ATM traffic, and other frame relay flows can
   be assigned to another LSP with a differently provisioned bandwidth.

21. Multiservice Switching over WDM Links

   Many topologies are under development for switching WDM links based
   on MPLS labels. One of the premises of HDT is that MPLS would become
   a common switching mechanism for all types of data flows. This
   premise assumes more research into MPLS and wider deployment of MPLS
   in optical networks for data switching, traffic engineering, route
   aggregation, and fault protection.

   [Figure 13 diagram: over WDM link 1, node A sends node B a stream of
   frames. Each frame begins with an MPLS label (M) and carries
   Ethernet, ATM cells, Frame Relay, or raw data. Node B performs
   MPLS-based WDM switching and forwards each labeled frame onto output
   link 2, 3, or 4.]

          Figure 13: Multiservice Transport over Each WDM Link

   For optical networks, MPLS will become the dominant mechanism for
   switching packet data, wavelengths, optical TDM, and anything else
   that requires switching data over a traffic-engineered path. MPLS
   labels as a common header allow uniform processing of all types of
   payload, without regard to payload content. MPLS labels are
   established using common signaling methods. Once the labels are
   assigned, traffic can be sent on them with MPLS as the only data
   link layer used for switching.

   In the figure above, different types of data arrive at the switch at
   node B. The switching logic at node B looks at the labels and
   switches frames to the appropriate output links (after label
   modification).

22. Frame Delineation Methods

   HDT frames can be delineated over optical networks using any robust
   frame delineation mechanism, such as HDLC (0x7E flags) or SDL (Simple
   Data Link, RFC 2823).

22.1 HDLC

   HDLC-based delineation and scrambling follow the Packet-over-SONET
   (POS) protocol defined in RFC 2615 [3], without the PPP encapsulation
   requirements.

   +====+-----+------+-----------------+------+====+
   | 7E |cHdr | cHEC | ....Payload.... | pCRC | 7E |
   +====+-----+------+-----------------+------+====+
     1     4     2                        4     1    (bytes)

                 Figure 14: HDLC Delineation

   Unlike POS, HDT has no PPP HDLC header (such as FF 03), because PPP
   is one of the many types of payload that can be carried inside HDT.
   In a system, the HDT encapsulation is always processed before the
   payload is reached. Therefore, an HDLC frame without an HDT header
   (for example, a plain POS frame containing PPP) cannot be
   transported.

   Delineation based on HDLC has many limitations when working with
   large frames and the higher speeds of optical networks:

   o  Every outgoing byte must be monitored, and byte stuffing must be
      performed to prevent data bytes from emulating the flag. The
      receiver also must monitor every incoming byte to perform the
      de-stuffing. Processing every byte at OC-48/192/768 speeds
      requires hardware running at high speeds.
   o  Malicious long packets (killer packets) can defeat scrambling,
      causing loss of synchronization. This is more likely now because
      newer protocols, such as multiprotocol over frame relay and jumbo
      Ethernet frames, use increasingly large packets.
   o  Without advance knowledge of the length, variable-size queues
      cannot be used efficiently for handling packets on incoming
      ports. The length of an HDLC frame can only be determined when the
      terminating 0x7E is encountered. Since nodes have no a priori
      knowledge of the frame length, they must allocate a buffer of the
      maximum size for every incoming frame.
   o  Recovery from loss of synchronization is difficult, because a
      single value (0x7E) acts both as the start marker and the end
      marker of a packet.
   o  HDLC is inherently bandwidth inefficient. Byte stuffing makes the
      number of bytes transmitted larger than the number of bytes in
      the packet. In the worst case, when every byte must be escaped,
      the frame doubles in size.

22.2 SDL (Simple Data Link, RFC 2823)

   SDL framing (RFC 2823 [6]) prefixes a packet with a 32-bit word. The
   first 16 bits of this word (LHdr) hold the length of the packet that
   follows. The other 16 bits (LHEC) contain a CRC-16 (Cyclic
   Redundancy Check) calculated on the 16-bit length field, as shown
   below.

   <--- SDL --->|<------------- Packet -------------->|
   +=====+======+-----+------+-----------------+------+
   |LHdr | LHEC |cHdr | cHEC | ....Payload.... | pCRC |
   +=====+======+-----+------+-----------------+------+
      2      2     4     2            N            4  (bytes)

                  Figure 15: SDL Delineation

   SDL provides a robust, CRC-16-based frame boundary delineation
   mechanism. It addresses the POS issues listed above: robustness under
   high bit error rate (BER) conditions, variable packet size
   expansion, and malicious long packet scrambler manipulation. Packets
   are located by hunting for a length/CRC match, much as ATM cells are
   located by HEC synchronization. The next packet within a SONET
   payload or on a DoF link is located by jumping forward by the length
   value and again looking for a length/CRC match. If data is corrupted
   at the location of a length/CRC field, the hardware begins a
   byte-by-byte hunt for the length/CRC construct until a match is
   found.

22.2.1 Length (LHdr)

   This field is 16 bits long and contains the length (in bytes) of all
   the bytes following the LHEC field, including the cHdr, cHEC,
   payload, and pCRC fields. The SDL specification (RFC 2823) defines
   the length field to include all bytes following the LHEC (length
   CRC) field up to the end of the payload, but not including the pCRC.
   It requires advance negotiation or provisioning of the presence or
   absence of the pCRC in all frames. This scheme is sufficient for
   sending a single type of packet. In multiservice transport, however,
   some data types may not have an explicit CRC field at the end of the
   payload. Examples are ATM cell(s), SONET/SDH frame(s), and
   T1/T3/NxDS0 frames. In addition, in most systems the frame
   delineation logic is separate from the CRC computation/verification
   logic. To accommodate this variation, and to decouple frame
   delineation from the payload CRC, this draft specifies that the LHdr
   length value MUST include the length of the pCRC field (4 bytes), if
   present. With this scheme, the frame delineation mechanism is
   simpler: to get to the next frame, the delineation logic simply skips
   over the number of bytes given by the LHdr value.

   Length field values can range from 0 to 0xFFFF, allowing a maximum
   of 65535 bytes to be sent (including the HDT header, payload, and
   pCRC).

   The minimum number of bytes following the LHEC field is 7. This is
   equal to the minimum size of an HDT header (6 bytes) plus 1 byte of
   payload (with no pCRC). Consequently, LHdr values of 0-6 (inclusive)
   are reserved; values 0-3 are used for the special SDL frames
   described in Section 25.

22.2.2 Length HEC (LHEC)

   This field is 16 bits long and contains an ITU-T CRC-16 calculated on
   the 16-bit LHdr field.

22.2.3 Payload CRC (pCRC)

   The payload CRC is an OPTIONAL field. When present, it MUST be an
   ITU-T CRC-32. The presence or absence of the payload CRC is indicated
   in the payload header cHdr.

23. DC-balancing

   The four-octet length/CRC construct is DC balanced by an
   exclusive-OR (also known as "modulo 2 addition") with the hex value
   B6AB31E0. This is the maximum-transition, minimum-sidelobe,
   Barker-like sequence of length 32. No other scrambling is done on the
   header itself.

24. Scrambling

   By default, an independent, self-synchronous X^43 + 1 scrambler is
   used on the data portion of the message, including the 32-bit CRC
   (if present). This is done in exactly the same manner as with the
   ATM X^43 + 1 scrambler on an ATM channel. The scrambler is not
   clocked while SDL header bits are transmitted. Thus, data scrambling
   MAY be implemented entirely independently of the SDL framing, and the
   data stream may be pre-scrambled before the SDL framing marks are
   inserted.

   SDL also defines an OPTIONAL set-reset scrambler. It is an
   independent X^48 + X^28 + X^27 + X + 1 scrambler, initialized to all
   ones when the link enters the PRESYNCH state and reinitialized if
   its value ever becomes all zero bits.

25. Special SDL Frames

   Length values 0-3 (inclusive) are reserved for special SDL frames.
   For these special frames, the length value (LHdr) does not reflect
   the actual number of bytes in the payload. Instead, the payload has
   a fixed length determined by the frame type (given by the LHdr
   value).

25.1 Null/Idle Frame (Length = 0)

   A null frame is a 4-byte length/CRC construct with the LHdr field
   equal to zero.

      +------+------+
      | 0000 | LHEC |
      +------+------+
         2      2     (bytes)

   Null frames are used as fillers when there are no packets to be
   sent.

25.2 Scrambler State (Length = 1)

      +------+------+-----------------+--------+
      | 0001 | LHEC | Scrambler State | CRC-16 |
      +------+------+-----------------+--------+
         2      2           6             2      (bytes)

   The special value of 1 for the packet length is reserved to transfer
   the scrambler state from the transmitter to the receiver for the
   optional set-reset scrambler. In this case, the SDL header is
   followed by six octets (48 bits) of scrambler state and a CRC-16.
   Neither the scrambler state nor the CRC is scrambled.

25.3 A/B Messages (Length = 2, Reserved)

      +------+------+-----------------+--------+
      | 0002 | LHEC | ... Payload ... | CRC-16 |
      +------+------+-----------------+--------+
         2      2           6             2      (bytes)

   In the SDL specification, the special values 2 and 3 for the packet
   length are reserved for "A" and "B" messages. These are also six
   octets long, followed by two octets of CRC-16. All eight of these
   octets are scrambled. These messages are reserved for use by link
   maintenance protocols, in a manner analogous to ATM OAM cells.

   This draft uses the "B" message code (length = 3) for sending a
   single ATM cell. The "A" message code (length = 2) remains reserved.

25.4 Single ATM Cell Transport (Length = 3)

   A length value of 3 is used for sending a single ATM cell.

      +------+------+-----------------+
      | 0003 | LHEC |    ATM Cell     |
      +------+------+-----------------+
         2      2           53          (bytes)

   To save on overhead, no HDT header is used for sending a single ATM
   cell. Only SDL delineation is used. The ATM cell is sent in its
   entirety, with no payload CRC.

26. HDT Payload Header (PH) Structure Details

   +===============+-------------------------------------------+ bytes
   |               | Core Header (cHdr) (32 bits)              |   4
   |               +-------------------------------------------+
   |               | (Optional) Next Fragment Offset (32 bits) |   4
   |    Payload    +-------------------------------------------+
   |    Header     | (Optional) MPLS/OAM bytes (N x 32 bits)   | N x 4
   |               +-------------------------------------------+
   |               | Core Header CRC cHEC (CRC-16)             |   2
   +===============+-------------------------------------------+
   |     User      ~~~                                       ~~~
   |    Payload    |               Payload data                |
   |     Data      ~~~                                       ~~~
   |               +-------------------------------------------+
   |               | (Optional) Payload CRC (CRC-32)           |   4
   +===============+-------------------------------------------+

26.1 Core Header (cHdr)

   As discussed earlier, the HDT core header (cHdr) is a 32-bit value
   that precedes every payload (except a single ATM cell sent with SDL
   delineation). It provides all the parameters needed to describe the
   type of data being transported and to advise nodes on how to process
   the packet.
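   The SDL rules of Sections 22.2 through 25 can be summarized in a
   short Python sketch of framing and hunting. The rules covered are:
   LHdr counts every byte after LHEC, including pCRC; LHEC is a CRC-16
   over LHdr; the construct is XORed with B6AB31E0; and special frames
   have fixed sizes. It is a minimal sketch under stated assumptions,
   not a conforming SDL implementation. The CRC-16 conventions
   (polynomial 0x1021, zero initial value, no final XOR) are assumed,
   and the X^43 + 1 payload scrambler is omitted. A receiver's
   synchronization state machine is reduced to a single loop, and the
   helper names are hypothetical.

```python
import struct
from typing import Optional

DC_BALANCE = 0xB6AB31E0  # Section 23: XORed onto the length/CRC construct
MIN_DATA_LEN = 7         # Section 22.2.1: LHdr values 0-6 are reserved
# Body bytes that follow the header for the special frames of Section 25.
SPECIAL_BODY = {0: 0, 1: 8, 2: 8, 3: 53}


def crc16_itu(data: bytes, crc: int = 0x0000) -> int:
    """CRC-16 with polynomial 0x1021, MSB first (initial value assumed zero)."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


def sdl_header(length: int) -> bytes:
    """LHdr | LHEC, DC-balanced by XOR with B6AB31E0."""
    word = (length << 16) | crc16_itu(struct.pack(">H", length))
    return struct.pack(">I", word ^ DC_BALANCE)


def sdl_frame(hdt_packet: bytes) -> bytes:
    """Prefix an HDT packet (cHdr ... payload [pCRC]); LHdr includes pCRC."""
    if not MIN_DATA_LEN <= len(hdt_packet) <= 0xFFFF:
        raise ValueError("HDT packet must be 7..65535 bytes long")
    return sdl_header(len(hdt_packet)) + hdt_packet


def decode_header(buf: bytes, pos: int) -> Optional[int]:
    """Return LHdr if buf[pos:pos+4] is a valid length/CRC construct, else None."""
    if pos + 4 > len(buf):
        return None
    word = struct.unpack_from(">I", buf, pos)[0] ^ DC_BALANCE
    length, lhec = word >> 16, word & 0xFFFF
    return length if crc16_itu(struct.pack(">H", length)) == lhec else None


def hunt(buf: bytes) -> list[tuple[int, bytes]]:
    """Hunt byte by byte for a length/CRC match, then jump frame to frame.

    Returns (LHdr, body) for every non-null frame; null frames are skipped.
    """
    frames, pos = [], 0
    while pos + 4 <= len(buf):
        length = decode_header(buf, pos)
        if length is None or 4 <= length < MIN_DATA_LEN:
            pos += 1  # no match (or reserved value): slide one byte, retry
            continue
        end = pos + 4 + SPECIAL_BODY.get(length, length)
        if end > len(buf):
            break  # frame continues beyond the bytes available so far
        if length != 0:
            frames.append((length, bytes(buf[pos + 4:end])))
        pos = end
    return frames
```

   For simplicity, this sketch trusts the first length/CRC match it
   finds. A production receiver would typically confirm several
   consecutive matches before declaring synchronization.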
   Bit(s)    Description
   ------    -----------
   D31:D24   Header Length in bytes (HLEN) [8 bits]
   D23:D16   Time-to-Live (TTL) [8 bits]
   D15:D14   Reserved for future use [2 bits]
   D13:D12   Tail-end padding (TEP) [2 bits]
               00: No padding
               01: 1-byte padding
               10: 2-byte padding
               11: 3-byte padding
   D11       Payload CRC (pCRC) [1 bit]
               0: No payload CRC (32-bit pCRC) at the end of payload
               1: Payload CRC (pCRC) present at the end of payload
   D10       Fragmentation Identifier (FI) [1 bit]
               0: Complete packet (no fragmentation), or last fragment
               1: Packet fragment
   D9        Multi-frame Rate Channel (MF) [1 bit]
               0: No rate channel
               1: Rate channel, spanning multiple frames
   D8        Bandwidth Allocation (BA) [1 bit]
               0: Do not allocate bandwidth for this packet
               1: Allocate bandwidth for this packet
   D7:D5     Header Extension (HEX) [3 bits]
               000: No additional header data bytes
               001: MPLS labels
               010: OAM bytes
               011: MPLS labels followed by OAM bytes
               100-111: Reserved for future use
   D4:D0     Payload Identifier (PI) [5 bits]

   Each of these parameters is described below.

26.1.1 Payload Identifier (PI) [D4:D0]

   These bits indicate the type of data in the payload. Payload
   identification is provided to help systems process packets. HDT
   itself does not treat packets differently based on this value.

   Bits          Description
   ----          -----------
   00000         Null packet
   00001         ATM cell(s)
   00010         PPP
   00011         Ethernet / IEEE 802.3
   00100         PDH (T1/T3)
   00101         Frame Relay
   00110         IP
   00111         Raw byte stream
   01000         SONET/SDH
   01001-11111   Reserved for future use

   Payloads fall into two categories:

   o  Null packets: unlike SDL null frames (which are marked with
      LHdr = 0), these are packets that have been dropped at a node,
      leaving the packet area reusable at that node or at any downstream
      node.
   o  ATM/IP/Ethernet/Frame Relay/PPP/PDH/raw packets: standard packet
      or PDH data.
      Raw packets are any streams of data bytes, not necessarily having
      any protocol structure. These bytes are usually switched using
      MPLS labels.

26.1.2 Header Extension (HEX) [D7:D5]

   These bits indicate the presence of any additional bytes in the HDT
   header between the cHdr and cHEC fields. Currently, values are
   defined for MPLS and OAM fields; other values can be added in the
   future.

   Bits      Description
   ----      -----------
   000       The header contains no additional data bytes.

   001       MPLS labels.

   010       OAM bytes. The OAM bytes can be sent in-band with a packet,
             allowing a packet to carry OAM bytes to a destination node
             or network. Instead of a repetitive frame or a special
             packet type, any packet can carry OAM information.
             Consequently, not every packet needs to carry OAM bytes.
             They can be sent only when needed, at the intervals
             required for link management and monitoring. At other
             times, only packet data is sent, with no OAM overhead. The
             payload area MAY be empty when OAM bytes are sent in the
             header. In that case, the frame carries only OAM bytes,
             which can be sent to the destination node using MPLS
             labels.

   011       MPLS labels followed by OAM bytes. In this case, the MPLS
             labels precede the OAM bytes, allowing MPLS to route both
             the OAM bytes and the packet payload across a network. The
             end of the MPLS labels (and the beginning of the OAM bytes)
             is determined by the end-of-stack bit in the bottom label.
             When the last label has been popped at a node, the node
             changes the HEX value from 011 (MPLS+OAM) to 010 (OAM
             only).

   100-111   Reserved for future use. These values can be used for
             sending other types of control and signaling information.
             Without them, such information would require complex
             modifications to native packets and the creation of
             layer 2 shim headers.
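   A bit-level sketch of the core header layout in Section 26.1 may
   help implementers check field positions. The Python below is
   illustrative only. The CoreHeader class and its field names are
   hypothetical, and HLEN is taken as given by the caller rather than
   computed. The only header-extension rule modeled is the change from
   011 (MPLS+OAM) to 010 (OAM only) described in Section 26.1.2.

```python
from dataclasses import dataclass, replace

HEX_OAM, HEX_MPLS_OAM = 0b010, 0b011


@dataclass(frozen=True)
class CoreHeader:
    hlen: int           # D31:D24 header length (HLEN)
    ttl: int            # D23:D16 time-to-live
    tep: int = 0        # D13:D12 tail-end padding bytes (0-3)
    pcrc: bool = False  # D11 payload CRC present
    fi: bool = False    # D10 packet fragment
    mf: bool = False    # D9 multi-frame rate channel
    ba: bool = False    # D8 bandwidth allocation
    hdr_ext: int = 0    # D7:D5 header extension (HEX)
    pi: int = 0         # D4:D0 payload identifier

    def pack(self) -> int:
        """Assemble the 32-bit cHdr word; reserved bits D15:D14 stay zero."""
        for name, value, width in (("hlen", self.hlen, 8), ("ttl", self.ttl, 8),
                                   ("tep", self.tep, 2), ("hdr_ext", self.hdr_ext, 3),
                                   ("pi", self.pi, 5)):
            if not 0 <= value < (1 << width):
                raise ValueError(f"{name}={value} does not fit in {width} bits")
        return ((self.hlen << 24) | (self.ttl << 16) | (self.tep << 12)
                | (int(self.pcrc) << 11) | (int(self.fi) << 10)
                | (int(self.mf) << 9) | (int(self.ba) << 8)
                | (self.hdr_ext << 5) | self.pi)

    @classmethod
    def unpack(cls, word: int) -> "CoreHeader":
        """Split a 32-bit cHdr word back into its fields."""
        return cls(hlen=(word >> 24) & 0xFF, ttl=(word >> 16) & 0xFF,
                   tep=(word >> 12) & 0x3, pcrc=bool(word & (1 << 11)),
                   fi=bool(word & (1 << 10)), mf=bool(word & (1 << 9)),
                   ba=bool(word & (1 << 8)), hdr_ext=(word >> 5) & 0x7,
                   pi=word & 0x1F)


def bottom_label_popped(h: CoreHeader) -> CoreHeader:
    """Section 26.1.2: after the last label is popped, HEX 011 becomes 010."""
    return replace(h, hdr_ext=HEX_OAM) if h.hdr_ext == HEX_MPLS_OAM else h
```

   For example, CoreHeader(hlen=10, ttl=16, pcrc=True, hdr_ext=0b011,
   pi=0b00011).pack() yields 0x0A100863. The HLEN value of 10 in that
   example is an arbitrary illustrative choice.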
26.1.3 Bandwidth Allocation (BA) [D8]

   The Bandwidth Allocation/Reuse (BA) bit allows any packet inside a
   SONET frame to instantly reserve guaranteed bandwidth on a
   packet-by-packet basis. Use of this bit allows a node to instantly
   provision and secure a position within a SONET SPE for a packet.
   This is similar to a virtual tributary, except that now any packet
   can act as a virtual tributary whose length is the length of the
   packet.

   Traditional SONET architectures define virtual tributaries of fixed
   size and location. The bandwidth provided by a virtual tributary is
   equal to NxDS0, where N is the number of bytes the tributary
   occupies in each 125-microsecond SONET frame. Higher-bandwidth
   LAN/WAN traffic has been sent over these low-bandwidth tributaries
   using complex mechanisms such as inverse multiplexing and virtual
   concatenation. Provisioning these circuit-switched virtual
   tributaries takes a long time. Sending packet-switched data over them
   also makes the network more complex to configure and maintain.

   With the BA bit, a sending node can send an entire packet inside a
   SONET SPE and mark it for bandwidth allocation. Subsequent nodes
   preserve the location of this packet inside the SONET SPE, creating
   a virtual tributary on the fly. The size of this tributary is
   determined by the size of the packet (plus HDT overhead bytes). The
   packet is eventually taken off the ring: normally it is stripped off
   at the destination node, and for multicast packets the sending node
   removes it. At that point, the area occupied by the packet can be
   reused for any other type of data.

   Bit   Description
   ---   -----------
   0     No bandwidth allocation. The starting location of the packet
         inside a SONET frame is not guaranteed as it goes down the
         network. Downstream nodes can readjust the location of packets
         in the SONET SPE as packets are removed and added at nodes.
         Depending on the packet's priority relative to other packets
         pending transmission, the packet may even miss the current
         SONET frame and be delayed until the next frame cycle.

   1     A sending node (or any downstream node along the path) sets
         this bit to reserve the packet's location in the SONET SPE.
         The bandwidth reserved on the link for this packet is equal to
         NxDS0, where N is the number of bytes in this packet. With
         this bit set to 1, downstream nodes do not change the starting
         location of the packet within the SONET frame, giving a packet
         repetition period of exactly 125 us. A higher-bandwidth packet
         stream is sent as multiple (different) packets in the same SPE.

26.1.4 Multi-Frame Rate Channel (MFRC) [D9]

   This bit, in conjunction with the BA bit (D8) set to 1, provides a
   multi-frame rate channel for low-bandwidth requirements such as DS0,
   2xDS0, etc. These are typically used to provide data communication
   channels, HDLC facility data links, and other applications. Details
   on multi-frame rate channels are given in Section 34.

   When this bit is set to 1, the payload does not contain a complete
   packet. Instead, it contains byte(s) from a packet (such as an HDLC
   frame). These multi-frame rate channels appear at the same location
   in every SONET SPE. No payload CRC field follows the payload bytes.
   CRC values, if present, are contained in the data bytes that go
   through the rate channel, and the pCRC bit (D11) is set to 0.

   Multi-frame rate channels are also created when the HLEN value is in
   the range 1-5. In that case, the rest of the cHdr is absent, and all
   bytes following HLEN contain user data that go through the rate
   channel.

26.1.5 Fragment Indication (FI) [D10]

   On a direct data-over-fiber link, or when there are no
   fixed-bandwidth packets in a SONET SPE, packet data can be filled in
   as it arrives from the system or the incoming network side.
   All bytes belonging to a packet are sent contiguously, with packets
   sent one after another.

   On SONET, however, some packets may be fixed in location within the
   SPE. In that case, a packet dropped at an intermediate node leaves a
   space that may be either smaller or larger than a new packet to be
   added. If the empty space is larger than the new packet, the new
   packet is added and the remaining area is filled with idle packet(s)
   and/or another packet that fits there. If the space is smaller, the
   new packet is fragmented by simply splitting its bytes (no
   network-level fragmentation and no duplication of network headers),
   with the fragments continued in other areas of the SPE.

   When fragmentation takes place, the payload headers of the first and
   subsequent fragments indicate the presence of any following
   fragments. A 0 in the FI field means that the payload area contains
   either a complete packet or the last fragment in a fragmentation
   sequence. A 1 in the FI field indicates a packet fragment that is
   followed by further fragments. When FI is 1, a 4-byte Next Fragment
   Offset parameter follows the cHdr field. The FI bit is unused and
   set to 0 if none of the packets in an SPE requires a fixed-bandwidth
   allocation.

26.1.6 Payload CRC [D11]

   This bit indicates whether a payload CRC (CRC-32) is present at the
   end of the payload. A 0 indicates that there is no CRC-32 payload CRC
   at the end of the user payload data. A 1 indicates that a CRC-32 over
   the payload bytes is present at the end of the payload.

26.1.7 Tail-end Padding (TEP) [D13:D12]

   On SONET, fewer than 4 bytes may sometimes be left between
   consecutive packets, making it impossible to place an SDL null frame
   (which needs 4 bytes). This can happen when the subsequent packet has
   a fixed-bandwidth allocation (discussed later) and cannot be moved
   within the SPE.
   In this case, these leftover bytes (1, 2, or 3) are appended as
   padding at the end of the preceding packet and are indicated as
   tail-end padding in that packet's TEP field. When all packets to be
   sent inside a SONET SPE are normal data packets (IP, ATM, or any
   other type) with no fixed-bandwidth requirements, the TEP bits are
   unused and set to 00.

26.1.8 Reserved Bits [D15:D14]

   Reserved for future use.

26.1.9 Time-to-Live (TTL) [D23:D16]

   The sending node sets this 8-bit field to a value that MUST be at
   least equal to the number of nodes on the network. To keep operation
   consistent across the hybrid mix of modern optical network types,
   nodes on both ring and linear networks set and process the TTL
   field. When a packet passes through a node, the node decrements the
   TTL field by one. If the TTL count reaches zero, the packet is taken
   off the network.

26.1.10 Header Length (HLEN) [D31:D24]

   The header length includes the remaining 3 bytes of the core header
   cHdr (cHdr is 4 bytes long, and HLEN is its first byte), any
   additional payload header data bytes such as MPLS/OAM bytes, and the
   16-bit cHEC.

           +----------------------------------+
           |       Bytes covered by HLEN      |
           v                                  v
   +======+=======+==========+==========+======+-----------+------+
   | HLEN |D23:D0 | ..MPLS.. | ..OAM..  | cHEC |..Payload..| pCRC |
   +======+=======+==========+==========+======+-----------+------+
      1      3       N x 4        M        2                  4  (bytes)
                                                          (OPTIONAL)
   <---- cHdr ---->

       Figure 16: Header Length (HLEN) Field in the HDT Header

   The smallest core header is 6 bytes (4 bytes of cHdr + 2 bytes of
   cHEC), so the smallest value the HLEN field can have is 6. The value
   of HLEN is between 6 and 255 (inclusive), giving a maximum header
   size of 256 bytes (including the 1-byte HLEN field). Values 1-5 are
   reserved for creating rate channels (starting at DS0 for value 1 and
   increasing by one DS0 for each increment).

27. Data Transport Operations

   A device supporting the Hybrid Data Transport protocol works much
   the same way as any other transport protocol over optical networks.
   Since packets can be transported natively and transparently over
   HDT, the operations for processing Ethernet, ATM, POS
   (Packet-over-SONET), PDH, and any other packet types are unchanged.
   The only change in implementation is the addition of an HDT header
   to packets. This header allows multiservice transport within a SONET
   SPE, or over a single fiber (or wavelength) for data-over-fiber
   networks.

   There are two cases to consider for data transport using HDT:

   o  There are no PDH channels (such as T1/T3) or lower-order SONET/SDH
      channels to be sent along with packets. All packets are
      statistically multiplexed over SONET with no specific timing
      relationship between successive packets. The payload typically
      consists of Ethernet (10M/100M/1G), ATM, Frame Relay, PPP, other
      protocols, and raw data streams.
   o  In addition to the data types of the first case, some packets or
      data types (such as PDH, SONET/SDH, etc.) need guaranteed
      bandwidth on a dynamic basis. Other data types are statistically
      multiplexed and transmitted in the remaining areas of the SONET
      SPE.

   The first case is straightforward and is described here. The second
   case is described in the following section.

   Consider a typical system in which many legacy protocols from a host
   system need to go to the optical network. These may come from a
   physical port, such as an Ethernet port, or through internal LAN/WAN
   protocol translation, such as PPP/Frame Relay/ATM.
                 +----------+     +------+
  ATM -----------+          |     |      |
  Ethernet ------+          |     |      |
                 |   Host   |     | Line |
  Raw Data ------+  System  +=====+ Card +---- SONET/SDH or WDM
                 |          +=====+      |
  PPP -----------+          |     |      |
  Frame Relay ---+          |     |      |
                 +----------+     +------+

           Figure 17: Multiservice Transport System

27.1 Transmit Operation

In a transmit operation, a node takes inputs from one or more sources, adds an HDT header to each packet, encapsulates each packet with an SDL length/CRC construct (or HDLC framing), and then places the resulting frames inside a SONET SPE or directly onto a fiber (or wavelength). A RECOMMENDED implementation approach is as follows:

o Receive the input data packet payload. The payload can be IP, Ethernet, PPP, frame relay, ATM (one or more cells), T1/T3, NxDS0, SONET/SDH, or just a raw byte stream.

o Add an MPLS label stack, if needed.

o Create an HDT header for the packet with the following parameters: a data type identifier, a bandwidth allocation (if needed), bits indicating whether MPLS and/or OAM bytes are present, a bit indicating whether a payload CRC is present, and finally a header length value that includes the MPLS/OAM bytes.

o Compute the cHEC (CRC-16) over the header bytes and add it to the packet, or add a cHEC (CRC-16) placeholder so that the line card can compute and fill in the CRC-16 as the packet is transmitted.

o If HDLC framing is used, send the packet to the line card for scrambling, byte stuffing, and HDLC framing.

o If SDL framing is used, place a length/CRC construct in front of the packet, with the length (LHdr) equal to the entire packet including any tail-end CRC-32. Compute the LHEC (CRC-16) over LHdr and place it in the LHEC field, or add an LHEC (CRC-16) placeholder so that the line card can compute and fill in the CRC-16 as the frame is transmitted.

o The line card computes the payload CRC (if needed), computes CRC values and places them in the placeholders (as programmed), and scrambles the HDT header, payload, and CRC (as programmed).
o On SONET, the frame is sent in a tributary (or a lower-rate SONET channel) in non-concatenated mode, on a higher-bandwidth virtually concatenated channel, or inside a concatenated payload. On data-over-fiber networks, the frame is sent directly after appropriate signal coding.

27.2 Receive Operation

       +------------------------------------+
       v                                    v
+======+=======+==========+==========+======+-----------+------+
| HLEN |D23: D0| ..MPLS.. | ..OAM..  | cHEC |..Payload..| pCRC |
+======+=======+==========+==========+======+-----------+------+
   1       3       Nx4         M        2                  4  (bytes)
                                                       (OPTIONAL)
<---- cHdr ---->

                 Figure 18: HDT Receive Operation

When no fixed-bandwidth payloads are involved (BA bit = 0 in the core header cHdr), the receive operation is quite straightforward:

o A receiver uses standard HDLC delineation in the case of HDLC framing, and SDL length/CRC hunting and synchronization in the case of SDL framing.

o Once a frame has been located, the core header HEC (cHEC) over the header bytes is checked. If the cHEC shows an error, the entire payload that follows MUST be discarded.

o If the core header (cHdr) bits indicate the presence of MPLS labels, the packet is handed over to standard MPLS switching logic. Any OAM bytes following the MPLS labels are processed after the packet reaches its destination.

o If there are no MPLS labels, the payload (containing native ATM cells, Ethernet, IP, PPP, etc.) is passed to the system together with its payload identification. From the start of the payload area, the byte stream contains native packets, which legacy system logic can process.

o If there are no MPLS labels but the header carries OAM bytes followed by packets/cells, the OAM bytes are delivered to the node for processing. Nodes can use the L2 information contained in the payload packet (such as a MAC address, ATM VPI/VCI, frame relay DLCI, etc.) to process or forward the OAM bytes.

28. End-to-end OAM Support

In HDT, OAM bytes can OPTIONALLY be sent inside the header. These OAM bytes can be sent in any of the following ways:

o With native packets: the OAM bytes are delivered to a destination node on a destination network using the layer 2/3 information contained in the packet.

o Using MPLS labels in the HDT header: the OAM bytes are sent on any LSP chosen by the MPLS labels, while the payload area contains data packets.

o Using MPLS labels in the HDT header, without data: the OAM bytes are sent on any LSP chosen by the MPLS labels, and the payload area need not contain any data packets. This method is used to send just OAM bytes at periodic intervals for link monitoring and management.

This method of sending end-to-end OAM bytes has major benefits for both SONET and data-over-fiber networks. Recall that in SONET networks the overhead bytes do not cross ring boundaries, which makes it difficult to monitor other networks. Also, in an optical network mix of SONET and non-SONET networks, a node otherwise has no way to conduct OAM operations on every network in the mix. By attaching OAM bytes to packets and providing a way to route them to any node, HDT makes it easy to monitor nodes and networks anywhere. In an optical network mix, this can be used to detect link failures in both ring and point-to-point networks. Failures in LSPs can be detected by sending OAM bytes on them at the required intervals.

Since any type of payload can be used, networks running any type of traffic can be monitored using this method. For instance, if Ethernet-based transmission is used in metropolitan networks without any SONET framing, periodic OAM bytes can be sent along with Ethernet packets for network monitoring. On a mesh network, these bytes can be sent to any network using layer 2/3 addressing or MPLS.
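Failure detection by periodic OAM bytes, as described above, can be sketched as a simple hello timer. The following Python is a minimal illustration under stated assumptions, not a mechanism defined by this draft: the `HelloMonitor` class, its `interval` and `miss_limit` parameters, and the rule that a path is declared failed after `miss_limit` silent intervals are all hypothetical, since the draft says only that OAM bytes are sent at the required intervals.

```python
from __future__ import annotations


class HelloMonitor:
    """Hypothetical liveness check driven by periodic OAM hello bytes."""

    def __init__(self, interval: float, miss_limit: int = 3) -> None:
        self.interval = interval      # assumed hello period, in seconds
        self.miss_limit = miss_limit  # assumed number of missed hellos tolerated
        self.last_seen: dict[str, float] = {}

    def hello_received(self, path: str, now: float) -> None:
        """Record an OAM hello arriving on `path` (an LSP or L2/L3 route)."""
        self.last_seen[path] = now

    def failed_paths(self, now: float) -> list[str]:
        """Paths whose last hello is older than miss_limit intervals."""
        deadline = self.interval * self.miss_limit
        return sorted(p for p, t in self.last_seen.items() if now - t > deadline)


mon = HelloMonitor(interval=1.0, miss_limit=3)
mon.hello_received("LSP-A", now=0.0)
mon.hello_received("LSP-B", now=2.5)
print(mon.failed_paths(now=4.0))  # ['LSP-A']
```

The path names here are placeholders; a path could be an LSP selected by MPLS labels or a destination reached through layer 2/3 addressing. How the hello itself is encoded in the OAM bytes is outside the scope of this sketch.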
By sending OAM bytes only at the required intervals, rather than with every packet, overhead is minimized for non-SONET networks.

29. ATM cell transport

29.1 Single Cell

A single ATM cell can be sent with the least amount of overhead using special SDL framing, as described in [25.4].

29.2 Multiple ATM Cells

One or more ATM cells can be sent together, framed inside an HDT payload, as shown in [14]. The set of cells is switched as one entity from node to node using MPLS switching. When it reaches the destination ATM network, a de-framer takes the cells out and delivers them to the ATM switch.

29.3 ATM over Frame Relay

Because ATM cells can now be sent as a group (as shown in []), the group can also be sent across a non-ATM network, as long as the packet carries external information for routing the data. Over pure MPLS switching networks, the group can be switched using the MPLS labels in the HDT header fields.

Whenever an ATM node must interface with a node inside a frame relay network, the normal frame relay/ATM interworking function (IWF) specifications should be followed for proper translation. However, no interworking functions need be performed when two ATM networks are connected through an intervening frame relay network. In this case, one or more ATM cells are sent as a group inside the payload area of a frame relay packet. The format of this group is similar to the format shown in [14], with the cHEC field removed, as shown below:

   +------+-----+-----+-----+-----+-----+-----+
   | cHdr | ATM | ATM | ATM | ATM | ATM | ATM |
   +------+-----+-----+-----+-----+-----+-----+

The HLEN field inside the cHdr gives the length of the payload above (cHdr and ATM cells). For transmission over frame relay, the frame format is similar to RFC 1973 [12]:

   +----+-------+---------+-------+------+-----+-----+-----+-----+----+
   | 7E | Q.922 | Control | NLPID | cHdr | ATM | ... | ATM | CRC | 7E |
   +----+-------+---------+-------+------+-----+-----+-----+-----+----+

ATM cells are transmitted without any modification. The NLPID value for HDT has not been established yet. The cHEC field has been removed for transmission over frame relay because the payload bytes are protected by the CRC at the end of the frame relay frame. The cHdr header has been retained so that ATM and other types of packet data can be sent in the payload area with the same NLPID value. Bits inside cHdr define the type of packet being sent using this mechanism.

30. Instant Bandwidth Allocation with Statistical Multiplexing

In addition to providing multiservice transport (with or without MPLS) over a single fiber (or wavelength) for data-over-fiber networks, or over a SONET SPE, HDT supports dynamic provisioning and tear-down of TDM-style channels alongside multiservice packet data. With this support, NxDS0, T1/T3, lower-order SONET, and any other TDM channel can be sent along with any type of packet data over SONET networks. TDM and packet data types can be statistically multiplexed in any combination.

Bandwidth can be provisioned instantly, either for the entire ring or across a few nodes in one or both directions (see Section 31). At any instant, and between any two nodes, whenever a TDM channel is not needed, the bandwidth it would have used can be used for packet data (or, for that matter, for other TDM channels). Bandwidth can be provisioned on a per-packet basis at the fine granularity of one DS0 (64 Kbps), which is the bandwidth taken up by a single byte on a SONET network. As noted earlier, HDT frames can be sent over an entire SONET payload area, within a single tributary (or STS-1/3/12/48 channel), or within a virtually concatenated high-bandwidth channel.
Consequently, a system can continue to use traditional TDM channels on other tributaries while using HDT on selected tributaries (or STS-1/3/12/48 channels) or on a virtually concatenated high-bandwidth channel. For instance, one can allocate a high-bandwidth channel for HDT and use it to provision multiservice transport for packet data and TDM channels, all statistically multiplexed, while other SONET tributaries are still used for TDM, POS (Packet-over-SONET), or ATM transport as usual.

Bandwidth is allocated by provisioning a series of bytes inside a SONET payload for carrying the data. Since each byte corresponds to a 64 Kbps (DS0) channel, N bytes inside the SONET payload give a bandwidth of NxDS0. Higher bandwidth is achieved by allocating more bytes, by having more packets from a source node occur in the SONET payload, or by a combination of both.

   [Diagram: six nodes, S-1 through S-6, on an OC-192 SONET ring. Inputs: Gigabit Ethernet and OC-12 at S-1; T1 and Frame Relay at S-2. Outputs: Frame Relay at S-3; OC-12 and Gigabit Ethernet at S-4; ATM and T1 at S-5; ATM at S-6.]

           Figure 19: TDM and Packet Transport over SONET

Consider the case of a gigabit Ethernet link between S-1 and S-4 over SONET, as shown above. Traditional bandwidth allocation schemes for gigabit Ethernet allocate the N bytes required to achieve gigabit speeds by taking these bytes from different STS-1/3 channels. These channels are then combined using either inverse multiplexing or virtual concatenation. However, the N bytes allocated for the gigabit Ethernet transport are not always filled completely with Ethernet packets.
Ethernet traffic is bursty, packets vary in size, and depending on the traffic load the packets do not use all of the provisioned space every time. Average link utilization is therefore often quite low, resulting in poor use of the provisioned bandwidth. A better use of link bandwidth is to statistically multiplex other traffic into the remaining bandwidth whenever it is available.

+-+-+---------------------------+ +-+-+---------------------------+
| | +----+------------+---------+ | | +----+-------------+--------+
|T|P| T1 | Ethernet-1 |   ATM   | |T|P| T1 | Frame Relay |  PPP   |
|O|O+----+-+----------+-----+---+ |O|O+----+-+-----------+----+---+
|H|H| PPP  | Ethernet-2(BW) |Raw| |H|H| PPP  | Ethernet-2(BW) |Raw|
| | +-----++----------------+---+ | | +-----++----------------+---+
| | | ATM |      SONET/SDH      | | | | ATM |      SONET/SDH      |
| | +-----+-------+-------+-----+ | | +-----+-------+-------+-----+
| | | Frame Relay | NxDS0 | ATM | | | | Ethernet-3  | NxDS0 | ATM |
+-+-+-------------+-------+-----+ +-+-+-------------+-------+-----+
          |<- 125uS (Low rate SONET/SDH) -->|
           |<-- 125uS (Ethernet Fixed BW) -->|
                  |<-------- 125uS (NxDS0) -------->|
    |<--------- 125uS (T1) ---------->|
|<------ 125uS Inter-frame ------>|

       Figure 20: Multiservice Packet and TDM Transport using HDT

The procedure for transmitting packets with guaranteed bandwidth over SONET is similar to that for sending any normal packet. A SONET payload with typical multiservice transport is shown in Figure 20. Note that while the location and contents of normal data packets change from frame to frame, fixed-bandwidth packets always repeat at the same location within the SONET payload, creating a 125uS repetitive pattern.
The steps required for guaranteed-bandwidth operation are as follows.

30.1 Dynamic Bandwidth Region (DBR) Allocation at Nodes

+-+-+-------+------+------+------+ +-+-+-------+------+------+------+
| | |       |      |      |      | | | |       |      |      |      |
|T|P|  DBR  | DBR  | DBR  | Free | |T|P|  DBR  | DBR  | DBR  | Free |
|O|O|       |      |      |      | |O|O|       |      |      |      |
|H|H| Node  | Node | Node |      | |H|H| Node  | Node | Node |      |
| | |   1   |  3   |  N   |      | | | |   1   |  3   |  N   |      |
| | |       |      |      |      | | | |       |      |      |      |
| | |       |      |      |      | | | |       |      |      |      |
+-+-+-------+------+------+------+ +-+-+-------+------+------+------+
|<------ 125uSec Inter-frame ----->|

    Figure 21: Allocation of Dynamic Bandwidth Region (DBR) for Nodes

The first step in providing instant bandwidth provisioning and data transmission between two nodes, or among a set of nodes, is allocating permissible bandwidth limits at the sending node.

For all fixed-bandwidth traffic, a sending node (node S-1 in this example) is allocated a set of N bytes, called a Dynamic Bandwidth Region (DBR). A DBR of N bytes corresponds to a bandwidth of NxDS0. The size and location of this DBR can be changed at any time at the sending node. The concept of a DBR is similar to a VT (virtual tributary); however, unlike a VT, a DBR can change both in size and in location within a SONET SPE. A DBR can be as large as the entire allowable payload area inside the SONET SPE. In addition, unlike a VT, a DBR can be shared by all nodes and all types of _normal_, bandwidth-insensitive traffic whenever it is not fully utilized.

A sending node MUST use the bytes within its DBR to send all data with a fixed-bandwidth transmission requirement. The node MAY use its DBR for sending normal data traffic if, at any instant, it has no packets to send from a fixed-bandwidth source. At any instant, the sending node, or any downstream node, MAY use the unused area in a DBR for sending any normal, bandwidth-insensitive traffic.
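The DBR rules above can be summarized in a short Python sketch. The `DBR` record and the `allocate` and `may_place` helpers are hypothetical, and modeling each DBR as a single contiguous byte range is an assumption; the overlap check reflects the non-overlapping allocation described for the other nodes, and the 774-byte payload in the example is an assumed size chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

DS0_BPS = 8 * 8000  # one SPE byte in every 125 us frame = 64 kbit/s


@dataclass(frozen=True)
class DBR:
    """Hypothetical model of a node's Dynamic Bandwidth Region."""
    node: str
    offset: int  # first byte of the region within the SPE payload
    length: int  # N bytes, giving N x DS0 of guaranteed bandwidth

    @property
    def end(self) -> int:
        return self.offset + self.length

    def bandwidth_bps(self) -> int:
        return self.length * DS0_BPS


def allocate(dbrs: list[DBR], new: DBR, spe_payload_bytes: int) -> list[DBR]:
    """Add a DBR, rejecting regions that leave the SPE or overlap others."""
    if new.offset < 0 or new.length <= 0 or new.end > spe_payload_bytes:
        raise ValueError("DBR must lie inside the SPE payload")
    if any(new.offset < d.end and d.offset < new.end for d in dbrs):
        raise ValueError("DBRs must not overlap")
    return dbrs + [new]


def may_place(node: str, ba_bit: int, start: int, size: int,
              dbrs: list[DBR]) -> bool:
    """BA=1 frames must lie inside the sender's own DBR.

    BA=0 frames are not restricted here; free-space tracking is not modeled.
    """
    if ba_bit == 0:
        return True
    return any(d.node == node and d.offset <= start and start + size <= d.end
               for d in dbrs)


dbrs = allocate([], DBR("S-1", offset=0, length=24), spe_payload_bytes=774)
dbrs = allocate(dbrs, DBR("S-3", offset=24, length=2), spe_payload_bytes=774)
print(dbrs[0].bandwidth_bps())           # 1536000 (24 x DS0)
print(may_place("S-1", 1, 4, 10, dbrs))  # True
print(may_place("S-3", 1, 4, 10, dbrs))  # False
```

BA = 0 frames are deliberately left unrestricted because normal traffic MAY use any unused area, including other nodes' DBRs.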
Other nodes on the SONET network MAY also be allocated non-overlapping fixed-bandwidth sets, depending on network and traffic requirements and on admission permissions for fixed-bandwidth data transport. A node that does not need to send any fixed-bandwidth traffic MAY choose not to have a DBR allocated, yielding this bandwidth to other nodes that may need to send fixed-bandwidth traffic.

The number of bytes in the DBRs of different nodes MAY differ. The only upper limit on the size of a DBR is the size of the SONET SPE. The total bandwidth of a DBR is adjustable in increments of one byte (DS0). With this method, there is no need to allocate bandwidth in increments of an STS-1 or a similarly coarse granularity. One can precisely allocate bandwidth for a gigabit Ethernet, a 100M Ethernet, a fractional T1, or even a single DS0 channel. Each DBR can carry multiple HDT packets, each containing the same or different types of data, such as SONET, T1, frame relay, IP, Ethernet, ATM, or any raw byte stream.

Provisioning and allocation of DBRs to different nodes could be done through administrative means, RSVP extensions, or other signaling means. This is for further study.

30.2 Example of a Multiservice Transport Network

The operation of dynamic bandwidth management with statistical multiplexing is best illustrated with an example of a metropolitan or campus network over OC-48, as shown in Figure 22.
   [Diagram: six nodes, S-1 through S-6, on an OC-48 SONET ring. Inputs: Frame Relay, 100M Ethernet, 1G Ethernet and ATM at S-1; T3 and 1G Ethernet at S-2. Outputs: Frame Relay at S-3; ATM at S-4; 100M Ethernet and T3 at S-5; ATM at S-6.]

+-+-+------+------+------+------+ +-+-+------+------+------+------+
| | |      |      |      |      | | | |      |      |      |      |
|T|P| DBR  | DBR  | DBR  | Free | |T|P| DBR  | DBR  | DBR  | Free |
|O|O|      |      |      |      | |O|O|      |      |      |      |
|H|H| S-1  | S-5  | S-3  |      | |H|H| S-1  | S-5  | S-3  |      |
| | | S-2  | S-6  |      |      | | | | S-2  | S-6  |      |      |
| | |      |      |      |      | | | |      |      |      |      |
| | | 1.2G | 150M | 128K |      | | | | 1.2G | 150M | 128K |      |
| | |      |      |      |      | | | |      |      |      |      |
+-+-+------+------+------+------+ +-+-+------+------+------+------+

           Figure 22: Multiservice Transport over OC-48

In this example network, S-1 and S-2 are the end points of a gigabit Ethernet link. S-1 and S-5 connect 100M Ethernet networks, and S-1 and S-3 have a 128 Kbps frame relay link. In addition, S-2 and S-5 have a T3 circuit-switched link. Many other variations of this network configuration are possible.

It is RECOMMENDED that the end nodes of a fixed-bandwidth link share the same area in the SPE. In the configuration above, S-1 and S-2 are allocated an aggregate bandwidth equal to their largest bandwidth need. As a better, fine-tuned alternative, S-1 and S-2 can share a 1G DBR area, and S-1 and S-2 can share additional DBR areas for the 100M, T3, frame relay, and any other interface needs. S-3 gets a DBR allocation for 128 Kbps of bandwidth. Recall that bandwidth can be allocated with a fine granularity of DS0 (64 Kbps). All of these DBR areas can be freely used by any node anywhere on the ring to transmit normal data packets (with BA bit = 0).
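The DBR sizes in Figure 22 follow from the DS0 granularity: each DBR byte recurs once per 125 us frame, so it carries 8 bits x 8000 frames/s = 64 kbit/s. The Python sketch below rounds each rate up to whole DS0 bytes and checks that the regions fit. It assumes, beyond what the figure states, that the OC-48 ring carries an STS-48c whose payload holds 37,440 bytes per frame (2396.16 Mbit/s); the `demands` dictionary simply restates the figure's labels.

```python
import math

DS0_BPS = 8 * 8000              # one byte per 125 us frame = 64 kbit/s
STS48C_PAYLOAD_BYTES = 37_440   # assumed STS-48c payload bytes per frame


def dbr_bytes(rate_bps: int) -> int:
    """Smallest N with N x DS0 >= rate_bps."""
    return math.ceil(rate_bps / DS0_BPS)


# DBR rates as labeled in Figure 22.
demands = {
    "S-1/S-2": 1_200_000_000,  # 1.2G
    "S-5/S-6": 150_000_000,    # 150M
    "S-3": 128_000,            # 128K
}

sizes = {name: dbr_bytes(rate) for name, rate in demands.items()}
used = sum(sizes.values())
free = STS48C_PAYLOAD_BYTES - used
assert free >= 0, "DBRs exceed the SPE payload"

print(sizes)       # {'S-1/S-2': 18750, 'S-5/S-6': 2344, 'S-3': 2}
print(used, free)  # 21096 16344
```

Under these assumptions, the remaining 16,344 bytes per frame correspond to the "Free" area of the figure.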
Unassigned area in the SPE is free, and any node can use it to send any type of traffic.

30.3 Data Transmission at a Sending Node

o To send a packet, a sending node prepares a core header (cHdr) for the packet. The node specifies the type of data and the presence of any MPLS labels and/or OAM bytes in the cHdr parameter bits.

o The node sets the BA bit (bit D8) to 1 to denote guaranteed-bandwidth transmission for the packet.

o The completed HDT frame is sent at an available location within the node's DBR.

o To fully utilize the provisioned bandwidth, the node MAY use the remaining area in its own DBR, or any other place within the SONET SPE, for other normal packets.

30.4 Processing at an Intermediate Node

Processing at an intermediate node is relatively straightforward. (In Figure 19, the intermediate nodes are S-2 and S-3 for sending node S-1 and destination node S-4.)

o A node examines the packet's data link information (the MPLS label at the top of the stack in the HDT header, or the ATM VPI/VCI, Ethernet MAC address, or frame relay DLCI) to determine whether it is the destination node for this packet. If a match is found, the node is a destination node and processes the packet as described in the next section.

o As SONET frames containing fixed-bandwidth packets go around the ring, intermediate nodes detect these packets by checking the BA bit [bit D8].

o If the BA bit is 0, the packet is treated as a normal packet, and the node can place it anywhere inside an outgoing SONET SPE.

o If the BA bit is 1, the node records the offset of this packet and preserves it in the outgoing SPE, providing a 125uS repetition rate for the packet.

o The intermediate node MAY add its own data (fixed or variable bandwidth) to the SPE, depending on the availability of bandwidth. If the data requires a fixed-bandwidth allocation, the node can add it only within its own allocated DBR.
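The intermediate-node steps above can be sketched in Python as one pass over a received SPE. The sketch is hypothetical: frames are modeled as `HdtFrame` records with an offset, a size, the BA bit, and a single `dest` key standing in for the MPLS label, VPI/VCI, MAC address, or DLCI match. Normal frames are repacked first-fit into the gaps left by the pinned fixed-bandwidth frames, which is one possible policy, not one this draft prescribes.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class HdtFrame:
    offset: int  # byte offset of the frame within the SPE payload
    size: int    # frame length in bytes
    ba: int      # BA bit (D8): 1 = fixed (guaranteed) bandwidth
    dest: str    # stand-in for the MPLS label / VPI/VCI / MAC / DLCI match


def process_spe(node: str, incoming: list[HdtFrame], spe_size: int):
    """One hypothetical intermediate-node pass; returns (delivered, out, pending)."""
    delivered = [f for f in incoming if f.dest == node]  # handled as in 30.5
    transit = [f for f in incoming if f.dest != node]

    # BA = 1: keep the recorded offset so the frame recurs every 125 us.
    fixed = sorted((f for f in transit if f.ba == 1), key=lambda f: f.offset)
    normal = [f for f in transit if f.ba == 0]

    # Free gaps [start, end) around the pinned fixed-bandwidth frames.
    gaps, cursor = [], 0
    for f in fixed:
        if f.offset > cursor:
            gaps.append([cursor, f.offset])
        cursor = max(cursor, f.offset + f.size)
    if cursor < spe_size:
        gaps.append([cursor, spe_size])

    # BA = 0: may go anywhere; here, first fit into the free gaps.
    outgoing, pending = list(fixed), []
    for f in normal:
        for gap in gaps:
            if gap[1] - gap[0] >= f.size:
                outgoing.append(replace(f, offset=gap[0]))
                gap[0] += f.size
                break
        else:
            pending.append(f)  # no room in this SPE; wait for a later one
    outgoing.sort(key=lambda f: f.offset)
    return delivered, outgoing, pending


spe = [HdtFrame(0, 10, 1, "S-4"), HdtFrame(10, 6, 0, "S-2"),
       HdtFrame(20, 8, 1, "S-4"), HdtFrame(30, 5, 0, "S-4")]
delivered, out, pending = process_spe("S-2", spe, spe_size=40)
print([(f.offset, f.ba) for f in out])  # [(0, 1), (10, 0), (20, 1)]
```

First-fit repacking is only one choice; a node could equally leave normal frames where they arrived, since only BA = 1 frames are required to keep their offsets.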
30.5 Processing at a Destination Node

o A node examines the packet's data link information (the MPLS label at the top of the stack in the HDT header, or the ATM VPI/VCI, Ethernet MAC address, or frame relay DLCI) to determine whether it is the destination node for this packet.

o It clears the PI (Payload Identifier) bits (bits D4: D0) to mark the area occupied by the packet as null and reusable.

o The node receives the packet and takes it off the ring, if required. Multicast packets are not taken off the ring.

31. Segmented Dynamic Bandwidth Allocation

HDT also allows segmented bandwidth allocation over a section of a SONET network. Consider an application in which a server (node A) sends real-time multimedia video data to a client (node B). While guaranteed bandwidth is needed in the (A->B) direction, the (B->A) direction does not need it. The traffic in the (B->A) direction may consist simply of requests and acknowledgments from the client to the server.

   [Diagram: an OC-48 SONET ring of nodes S-1 through S-6. Node B (client, 100M Ethernet) is attached at S-1 and node A (server, 100M Ethernet) is attached at S-5. The side of the ring through S-2, S-3 and S-4 is marked "Guaranteed Bandwidth"; the side through S-6 is marked "Bandwidth released".]

           Figure 23: Bandwidth allocation on a partial link

When node A sends a packet with guaranteed bandwidth, it sets the BA bit. When B sends a packet to A, it clears the BA bit if it does not need guaranteed bandwidth. The packets in the (B->A) direction travel as normal packets, allowing other nodes to use the released bandwidth for time-critical packets. Note that even if B sets the BA bit to 1, a lot of bandwidth is released to the network.
If B is a client sending acknowledgments and requests, these packets are significantly smaller than the packets coming in the (A->B) direction. Since the bandwidth consumed is only NxDS0 (where N is the number of bytes occupied by a packet), the (B->A) link releases a lot of bandwidth to the 'pool'.

32. Fragmentation of Packets

Support of a PDH-type channel requires a fixed starting location for the channel in every frame. If PDH support is not needed, packets in any mix can be placed anywhere inside the SPE, achieving excellent bandwidth utilization without much complexity.

When a data packet is sent inside an SPE that already carries a mix of fixed-bandwidth channels, there may be usable areas in the SPE between fixed-bandwidth channels that can be used for sending packet data or other fixed-bandwidth channels. Referring to Figure 22, when fixed-bandwidth packets are destination-stripped at their destination nodes and packets of different sizes are added, spaces are likely to exist between two fixed-bandwidth packets that cannot fully accommodate a new packet to be added at a node.

   [Diagram: an SPE (TOH/POH followed by payload) carrying T1, two FB (fixed-bandwidth) packets, DS0, PPP, ATM, SONET/SDH and Ethernet packets. A packet that does not fit in a single gap is split into "Pkt Frag 1", "Pkt Frag 2" and "Pkt Last Frag", placed in the gaps around the fixed-bandwidth packets, with arrows linking each fragment to the next.]

       Figure 24: Fragmentation over a fixed-bandwidth packet boundary

In this case, part of the packet is stored in the available space, and the packet is continued in the next available empty space in the SPE.
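The fragmentation scheme of this section can be sketched in Python as follows. The `Fragment` record, the `fragment_into_gaps` helper, and the 14-byte per-fragment overhead (SDL LHdr and LHEC, cHdr, cHEC, and the 32-bit Next Fragment Offset) are assumptions made for illustration; the FI and Next Fragment Offset handling follows the rules given in this section.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed per-fragment overhead: SDL LHdr + LHEC (4), cHdr (4), cHEC (2),
# and the 32-bit Next Fragment Offset (4).
FRAG_OVERHEAD = 14


@dataclass
class Fragment:
    offset: int       # start of this fragment within the SPE payload
    data: bytes       # payload bytes carried by this fragment
    fi: int           # Fragment Indication: 1 on every fragment but the last
    next_offset: int  # next fragment's start minus this start (0 if last)


def fragment_into_gaps(payload: bytes, gaps: list[tuple[int, int]],
                       overhead: int = FRAG_OVERHEAD) -> list[Fragment]:
    """Split `payload` across free [start, end) gaps, in SPE byte order."""
    if not payload:
        raise ValueError("empty payload")
    frags: list[Fragment] = []
    pos = 0
    for start, end in sorted(gaps):
        room = end - start - overhead
        if room <= 0:
            continue  # too small: such space is filled with idle packets
        chunk = payload[pos:pos + room]
        frags.append(Fragment(start, chunk, fi=1, next_offset=0))
        pos += len(chunk)
        if pos == len(payload):
            break
    if pos < len(payload):
        raise ValueError("not enough free space in this SPE")
    for cur, nxt in zip(frags, frags[1:]):
        cur.next_offset = nxt.offset - cur.offset
    frags[-1].fi = 0  # a trailing CRC-32, if present, lands in this fragment
    return frags


frags = fragment_into_gaps(bytes(range(50)), [(0, 20), (40, 44), (60, 120)])
print([(f.offset, len(f.data), f.fi, f.next_offset) for f in frags])
# [(0, 6, 1, 60), (60, 44, 0, 0)]
```

Reassembly walks the same chain: starting at the first fragment, add its Next Fragment Offset to reach the next one, until a fragment with FI = 0 is found.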
All fragments look like complete packets, with proper SDL framing and fragment length information, and each fragment points to the next by storing the starting address of the next fragment in its Next Fragment Offset field. Fragmentation of HDT frames allows the entire SPE to be filled to full capacity with different types of data. If a particular space in the SPE is too small to hold an HDT header, a fragment offset, and some payload bytes, the space is filled with idle packets. Idle packets for SDL are length/CRC constructs with the length value set to 0000. For HDLC-based delineation, the space is filled with the delimiter flag.

Fragmentation of a packet is straightforward in SONET because all bytes in an SPE are transmitted sequentially, so recovering the fragments and reassembling them poses no problem. If the payload has a CRC-32 at the end, it appears only in the last fragment. Each fragment except the last has the Fragment Indication (FI) bit set to 1 in its HDT header. The 32-bit Next Fragment Offset value is set to the offset of the next fragment from the beginning of the current fragment.

33. Tail-end Padding

There may be an unused set of bytes between two fixed-bandwidth frames. This free space is usually created when the frame occupying that space is dropped at a node. When a new frame added in the free space is smaller than the free space, the remaining area must be filled with valid data. When HDLC framing is used, this remaining space is simply filled with the delimiter flag. In the case of SDL framing [refer to section 22.2], the following steps are taken, depending on the number of bytes in the remaining area:

o 1-3 bytes: these bytes are set to 00 and included in the added frame as tail-end padding. The Tail-end Padding (TEP) bits in cHdr are set to the number of bytes padded.

o 4 bytes: a Null/Idle SDL frame is inserted in the four bytes.
o 5+ bytes: an SDL frame is placed with LHdr set to (number of bytes - 4), where 4 is the length of the SDL LHdr + LHEC fields. The first byte following these is the HLEN byte of cHdr. The HLEN field is set to the number of bytes that follow the first 4 bytes. Note that HLEN values of 0-5 are not used for sending a packet; instead, these values are used to provide rate channels. An HLEN value of 0 gives exactly one byte of payload, giving a single DS0 channel.

34. Multi-Frame Rate Channels

A natural extension of HDT support for dynamically creating fixed-bandwidth channels is the creation of smaller rate channels such as DS0, 2xDS0, etc. These are typically used for providing data communication channels, HDLC facility data links, and other applications.

   +------+------+======+======+-----------------+
   | LHdr | LHEC | cHdr | cHEC | Channel Byte(s) |
   +------+------+======+======+-----------------+
       2      2      4      2           N          (bytes)

This bit set to 1 indicates that the payload does not contain a complete packet; instead, the payload contains byte(s) from a packet (such as an HDLC frame). These multi-frame rate channels appear at the same location in every SONET SPE. There is no CRC for the payload in the HDT frame; the CRC-32 (or CRC-16) is part of the user's HDLC packet, for instance, and appears within the channel bytes.

Operation of these channels is similar to that of the data communication channels provided by SONET. However, these channels are more powerful than what SONET provides in the following ways:

o Any number of channels can be created within an SPE.

o These channels coexist with any other type of data being transmitted in the SPE.

o These rate channels can be of any size, from DS0 all the way up to the full capacity of the SPE.

35. Bandwidth Reuse on SONET

For packet transport over SONET rings, it is advantageous to take a packet off the ring once it has reached its destination.
As discussed in [8.2.3], new protocols such as the Spatial Reuse Protocol (SRP [7]) use destination MAC addresses to select packets for destination stripping, freeing the bandwidth they occupied. Rather than supporting only Ethernet-style framing, HDT allows any type of data to be destination-stripped. As shown in Figure 25, different types of layer 2 frames can be destination-stripped at their respective destination nodes to free up bandwidth on a SONET network.

   [Diagram: a SONET ring of nodes S-1 through S-6. Traffic enters at S-1 (Gigabit Ethernet, MPLS or MAC; ATM, VPI/VCI) and at S-2 (T1, MPLS; Frame Relay, DLCI) and is destination-stripped where it leaves the ring: ATM (VPI/VCI) at S-4, S-5 and S-6; Gb Ethernet (MAC or MPLS) at S-4; T1 (MPLS) at S-5; Frame Relay (DLCI or MPLS) at S-3. Cross-ring flows are labeled VPI/VCI, MAC and DLCI.]

     Figure 25: Bandwidth Reuse on SONET for Multiprotocol Traffic

Either the layer 2 information contained within the protocol frame (such as the ATM VPI/VCI, frame relay DLCI, or Ethernet MAC address) or the MPLS label provided in the HDT header can be used for destination-stripping the packet. Raw byte streams (clear-channel data) would normally use MPLS labels for destination stripping. When MPLS is used, instead of being switched cell by cell, multiple ATM cells can be carried as an ensemble from one node to another, guided by MPLS switching.

36. Fault-resilient Packet Networks with Recovery and Restoration

Due to its protocol-independent frame format and MPLS encoding, HDT provides a homogeneous way of creating fault-resilient optical networks for any combination of SONET and data-over-fiber networks.
With multiservice transport capability, alternative LSPs are created with traffic engineering parameters for an aggregation of critical traffic that needs to be protected on a segment. One can take different data flows (of different network protocol types) and put them on an alternative LSP link. Different alternative LSP links going to different destinations can be set up, and these LSPs can take different priorities and mixes of traffic when a fault occurs.

A network diagram with a hybrid of point-to-point (PTP) and ring networks is shown below. A fixed interconnection that goes through the entire SONET ring forces all traffic passing through the SONET network to go to all nodes in case of a fault. In the diagram below, an optimal path for a packet going from (A) to (B) could be W1 to S1 to S2 to W2. If the S1-S2 link were to fail, it may not be desirable to wrap the fiber at S1 and S2 and route the traffic over S1-S4-S3-S2. It may be much preferable to send all or part of the critical data over the point-to-point WDM link W1-W2. Therefore, a simple fiber wrap-around on a standard SONET link may not always be optimal. Network providers often choose to keep cheaper point-to-point alternative routes as backup links in case a few of their key nodes or links fail. MPLS labels are set up for network layout, traffic engineering, and alternative route selection. In the example above, an LSP (Label Switched Path) is set up for W1-S1-S2-W2.

   [Diagram: SONET Ring 1 (S1, S4, S3, S2) and a second ring, Ring 2, attached at S3 and containing S5 and S6; WDM nodes W1 and W2. (A), from a PTP link or ring, enters at W1, which connects to S1 over a PTP link and to W2 over another PTP link; W2 leads to (B) and to other PTP or ring networks. Labeled paths: LSP-W1S1, LSP-S1S3, LSP-S4S3 and LSP-W1W2.]

           Figure 26: Fault Recovery and Restoration

The purpose of this draft is not to describe methods for setting up alternative LSPs; fault recovery and service restoration have been discussed in [5, 10]. HDT provides the following features to achieve fault recovery and restoration:

o HDT provides a unified way of switching different types of data on backup LSPs.

o Because HDT allows an LSP to be shared by different types of services, the number of LSPs to be set up for fault recovery and restoration is minimized.

o It allows transmission of OAM bytes (SONET or non-SONET) from any end node to another end node on any network. While SONET has standard methods for link failure detection, such features are not available in non-SONET networks. In this example, OAM bytes can be sent on the W1-W2 link along with in-band MPLS labels and data packets. Alternatively, only OAM bytes can be sent using MPLS labels on the desired link(s) to determine link and/or node health.

o Standard OAM bytes can be used to send periodic hello messages to confirm the health of a link. These OAM bytes can be routed on the network using native packets or MPLS labels, as described in [section 28].

Protection and link health monitoring are thus not limited to SONET networks: standard OAM bytes can be used for any type of network mix.

37. Security Considerations

The reliability of public SONET/SDH networks depends on well-behaved traffic that does not disrupt the synchronous data recovery mechanisms.
This protocol relies on SDL and HDLC, whose framing and scrambling options are used to shape the distribution of the transmitted data so that SONET/SDH design assumptions are unlikely to be violated. Other security issues are not addressed in this document.

38. Acknowledgments

The author would like to sincerely acknowledge reviews and comments by Enrique J. Hernandez-Valencia (Bell Labs), Bobby Cates (NASA Ames), Dirceu Cavendish (NEC), Ben Crosby (Alcatel), Tom Moore (ADC Telecom), Luc Roy (Alidian), Raj Sharma (Luminous), Davide Trivellin (Overtek), and Necdet Uzun (Auroranetics); and by Gary Cochran, Ed Grivna, Nilam Ruparelia, and Paul Scott (Cypress).

39. Intellectual Property Considerations

Cypress Semiconductor Corporation may own intellectual property on some of the technologies disclosed in this document. In the event that Cypress obtains such patent rights, Cypress intends to license them on reasonable and non-discriminatory terms in accordance with the intellectual property rights procedures of the IETF standards process.

Lucent Technologies Inc. may own intellectual property on some of the technologies discussed and used in this document.

40. Author's Address

   Pankaj K. Jha
   Cypress Semiconductor
   3901 N First Street
   San Jose, CA 95134
   USA

   Phone: 408 432 7091
   Fax:   408 943 2949
   Email: pkj@cypress.com

41. Full Copyright Statement

Copyright (C) The Internet Society. All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works.
However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

42. References

   1.  Doshi, B., Dravida, S., Hernandez-Valencia, E., Matragi, W.,
       Qureshi, M., Anderson, J., and Manchester, J., "A Simple Data
       Link Protocol for High Speed Packet Networks", Bell Labs
       Technical Journal, Vol. 4, No. 1, pp. 85-104, January-March 1999.

   2.  Anderson, J., Manchester, J., Rodriguez-Moral, A., and
       Veeraraghavan, M., "Protocols and Architectures for IP Optical
       Networking", Bell Labs Technical Journal, Vol. 4, No. 1,
       pp. 105-124, January-March 1999.

   3.  Malis, A. and Simpson, W., "PPP over SONET/SDH", RFC 2615,
       June 1999.

   4.  Grossman, D. and Heinanen, J., "Multiprotocol Encapsulation over
       ATM Adaptation Layer 5", RFC 2684, September 1999.

   5.  Haskin, D. and Krishnan, R., "A Method for Setting an Alternative
       Label Switched Paths to Handle Fast Reroute",
       draft-haskin-mpls-fast-reroute-05.txt, November 2000.

   6.  Carlson, J., Langner, P., Hernandez-Valencia, E. J., and
       Manchester, J., "PPP over Simple Data Link (SDL) using SONET/SDH
       with ATM-like framing", RFC 2823, May 2000.

   7.  Tsiang, D. and Suwala, G., "The Cisco SRP MAC Layer Protocol",
       RFC 2892, August 2000.

   8.  Jones, N. and Murton, C., "Extending PPP over SONET/SDH with
       virtual concatenation, high order and low order payloads",
       draft-ietf-pppext-posvcholo-02.txt, June 2000.

   9.  Wu, T.-H., "Cost-effective Network Solution", IEEE
       Communications Magazine, pp. 64-73, September 1993.

   10. Sharma, V., et al., "Framework for MPLS-based Recovery",
       draft-ietf-mpls-recovery-frmwrk-00.txt, September 2000.

   11. Rosen, E., "MPLS Label Stack Encoding",
       draft-ietf-mpls-label-encaps-08.txt, July 2000.

   12. Simpson, W., "PPP in Frame Relay", RFC 1973, June 1996.