Internet Engineering Task Force                          Ramesh Bhandari
Internet-Draft                                     Siva Sankaranarayanan
                                                               Eve Varma
                                                     Lucent Technologies
Expiration Date: May 2001                                 November, 2000

      High Level Requirements for Optical Shared Mesh Restoration
               draft-bhandari-optical-restoration-00.txt

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

1. Abstract

In this draft, we provide the high-level requirements for optical shared mesh restoration within the optical transport network.

2. Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119.

3. Introduction

Because of the enormous volume of traffic that optical networks are expected to carry, driven by the continued explosive growth of data-oriented applications, optical network survivability has become an issue of paramount importance. At the same time, there is a continuing drive to maximize efficiency and minimize cost in large networks.
Very fast restoration mechanisms such as 1+1 schemes (with restoration times on the order of tens of milliseconds) exist, but because of the network resources they consume, alternative options are essential. With the availability of large optical cross-connects, shared mesh network restoration at the optical layer is a versatile approach that should be considered. Simulations [1] have shown that shared mesh networks require much less additional capacity than rings. The trade-off for this lower resource consumption has been service restoration time. However, mesh-based restoration is not inherently "slow"; if appropriate architectural requirements are established in a timely manner, it should be possible to enable fast restoration times (e.g., restoration times comparable to those provided by SONET ring-based infrastructures). Within this contribution, we provide architectural requirements that enable fast and efficient optical mesh restoration.

4. Optical Mesh Network Architecture

This draft focuses on next-generation optical networks based on ITU-T Recommendations G.872 [2] and G.709 [3]. Optical mesh networks consist of optical cross-connects (OXCs) interconnected by DWDM links. Associated with these OXCs are controllers that facilitate communications among them. (Note that these controllers may be internal or external to the controlled OXCs, and a one-for-one relationship is not assumed.) An optical channel (OCh) connection through the optical transport network (OTN) is established along a route having capacity (wavelength availability) between its designated ingress and egress points. The OCh connection between the source and destination OXCs consists of a series of OXCs interconnected by OCh link connections, and a signaling mechanism is used to configure each OXC appropriately during OCh network connection establishment.
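
As a rough illustration of the connection establishment described above, the following Python sketch finds a route between ingress and egress OXCs that uses only links with a free wavelength, and then reserves one wavelength on each link of that route. The function names (find_route, establish_och), the "free" map, and the toy topology are illustrative assumptions, not part of any signaling protocol. The sketch also assumes wavelength conversion at every OXC, so no wavelength-continuity constraint is modeled.

```python
from collections import deque


def find_route(free, src, dst):
    """Breadth-first search over links that still have a free wavelength.

    `free` maps an undirected link, stored as frozenset({a, b}), to the
    number of unused wavelengths on that DWDM link (hypothetical model).
    Returns the list of OXC names from src to dst, or None if no route
    with available capacity exists.
    """
    # Build adjacency from links with spare wavelengths only.
    adj = {}
    for link, count in free.items():
        if count > 0:
            a, b = tuple(link)
            adj.setdefault(a, []).append(b)
            adj.setdefault(b, []).append(a)
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the parent pointers back to the ingress OXC.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adj.get(node, [])):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def establish_och(free, src, dst):
    """Reserve one wavelength on each link of the chosen route."""
    route = find_route(free, src, dst)
    if route is None:
        return None
    for a, b in zip(route, route[1:]):
        free[frozenset((a, b))] -= 1
    return route


# Toy topology: four OXCs; the wavelength counts are made up.
free = {
    frozenset(("A", "B")): 1,
    frozenset(("B", "D")): 1,
    frozenset(("A", "C")): 2,
    frozenset(("C", "D")): 2,
}
first = establish_och(free, "A", "D")
second = establish_och(free, "A", "D")
print(first, second)
```

In this toy case the first connection exhausts the wavelengths on the route through B, so the second connection is forced onto the route through C. A real controller would perform the reservation through signaling between OXC controllers rather than by editing a shared table.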
Note that an optical channel transparently carries a variety of client signals (e.g., IP, SONET/SDH, ATM, GbE), and provides OAM capabilities such as tandem connection monitoring (TCM) and end-to-end signal integrity checking. Thus, an optical channel traversing a series of optical subnetworks can be monitored at various points along the route, typically at subnetwork boundaries, as well as at the OCh termination points (end-points).

When an OCh network connection breaks down because of one or more failed OCh link connections or a failed OXC node, the affected traffic needs to be restored using an alternate route. There are two ways in which this restoration may be performed:

1) reroute around the point of failure, e.g., a failed link connection

2) reroute from the tandem connection monitoring (TCM) or OCh termination points.

The first method requires fault localization before restoration actions are initiated; i.e., it is necessary to pinpoint the precise location of the fault along the OCh network connection so that rerouting can be performed around it. Relatively quick fault isolation might be provided by digitally monitoring OCh overhead at every optical NE (ONE); however, this builds a dependency upon digital processing throughout the entire OTN. It adds cost through the proliferation of OEO conversion throughout the network solely for maintenance reasons (as opposed to impairment mitigation), not to mention the additional digital monitoring equipment needed to determine performance degradation. Alternatively, controlling the expense by sharing the monitoring equipment over many optical channels leads to an unacceptably large fault detection time [4]. More significantly, this method inhibits evolution towards transparent optical networks. Further, fault localization in transparent optical networks may be complicated by the non-linear interactions typical of such networks.
This can result in time-consuming correlations to identify the root cause of signal impairments.

The second method of restoration involves rerouting from the OCh terminations or the TCM points, and therefore does not require fault isolation to occur before restoration actions are initiated. It is expected to be fast because loss of signal can be detected accurately at the TCM and OCh termination points, from which signaling may then be initiated to restore the traffic on an alternate path. Since the exact location of the fault along the primary path is unknown, the alternate path has to be "physically-disjoint" from the primary path. We further note that this approach is conducive to evolution towards increasingly large transparent (all-optical, no OEO) subnetworks in two ways: it avoids embedding dependencies on digital processing within the OTN, and it is tailored to the needs of all-optical networks.

In what follows, we assume restoration is path-based, i.e., it takes place from the TCM or OCh termination points, and that the alternate path is physically-disjoint. It is important to mention that, if the primary path traverses multiple subnetworks or operator domains, then because of the monitoring at the edge of each domain, restoration may be performed within the affected domain. This would also avoid the need for signaling interworking between multiple domains.

Clearly, to effect restoration on these alternate disjoint paths, spare capacity must be reserved on each link of the path. For the network to be efficient, this spare capacity must be shared among multiple working paths. For fast and scalable optical network restoration, it is also desirable to maintain the network state in a distributed manner. Below we point out some high-level requirements for restoration at the optical layer.

5.
Requirements for Fast Optical Mesh Restoration

Any optical mesh restoration scheme must

- Be independent of the OCh client (e.g., IP, ATM, SDH/SONET, GbE).

- Avoid dependency of restoration action initiation on non-time-critical functions. Therefore, it should not require fault localization to occur before restoration actions are initiated.

  -> Restoration must be triggered from the TCM or OCh termination points.

  -> The alternate path must be physically disjoint; by physically disjoint, we mean not only node and link disjoint, but also span-disjoint.

- Scale in the event of catastrophic failures such as fiber cable cuts.

  -> Appropriate mechanisms must be used that can restore the (expected) large amount of affected traffic rapidly and cost-effectively, e.g., in a core network application domain encompassing up to a few hundred nodes per subnetwork and thousands of point-to-point demands.

- Utilize a robust and efficient signaling mechanism.

  -> The signaling network must remain functional after a failure in the transport and/or signaling network infrastructure.

Clearly, for restoration to be carried out effectively, it is necessary for the connection controllers to have information on the network topology (such as link state and wavelength availability) as well as on physical aspects of the transport network such as fiber spans and span-sharing links. Appropriate algorithms are needed to determine physically disjoint paths for restoration (see, e.g., [5]), since restoration must take place from the TCM or OCh termination points. To ensure that paths are actually physically disjoint (i.e., node, link, and span disjoint), the span-sharing link topologies or Shared Risk Link Groups (SRLGs) [5-6] of the actual physical fiber network must be understood. For special high quality services [7], another key consideration involves regions of failure, specified by the corresponding radii of failure.
This is because, for such services, diverse routes should not pass through a region where there is a risk of both the primary and alternate paths failing simultaneously due to catastrophic disasters such as earthquakes or floods.

Appropriate mechanisms (see, e.g., [5]) and algorithms may need to be developed to expedite the restoration process and to make the restorable mesh network cost-effective by sharing spare capacity. Approaches to gathering information on network topology are currently under consideration within various fora (e.g., via the use of appropriate extensions to OSPF (see, e.g., [8])).

6. References

[1] S. Baroni et al., Proc. Conference on Optical Fiber Communications, Paper TuK2, March 2000.

[2] Agreed revisions to Version 2 of G.872 per October 1999 Q19/13 Meeting, provided to T1X1.5 for information, ftp://ftp.t1.org/pub/t1x1/2000x15/0x150500.pdf

[3] Draft ITU-T Recommendation G.709, Oct. 2000 version submitted for approval at the Feb. 2001 SG 15 meeting, provided to T1X1.5 for information, ftp://ftp.t1.org/pub/t1x1/x1.5/0x152460.doc

[4] G. Newsome, "Maintenance Philosophy for the OTN", T1X1.5/99-108R1

[5] R. Bhandari, "Survivable Networks: Algorithms for Diverse Routing", Kluwer Academic Publishers (1999)

[6] S. Chaudhuri et al., "Control of Lightpaths in an Optical Network", Internet Draft <draft-chaudhuri-ip-olxc-control-00.txt>, February 2000

[7] H. Ishimatsu et al., "Carrier Needs Regarding Survivability and Maintenance for Switched Optical Networks", submitted in this meeting.

[8] G. Wang et al., "Extensions to OSPF/IS-IS for Optical Networking", Internet Draft, March 2000

7. Authors' Contact Information

Ramesh Bhandari
Lucent Technologies
bhandari1@lucent.com

Sivakumar Sankaranarayanan
Lucent Technologies
ssnarayanan@lucent.com

Eve Varma
Lucent Technologies
evarma@lucent.com
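
The requirement in Section 5 that the alternate path be node, link, and span disjoint from the primary path can be illustrated with a small Python sketch. It uses a simple two-step heuristic: compute a primary path, then remove every intermediate node, every link, and every link that shares a risk group with the primary before searching again. This is not one of the algorithms of [5]; a two-step search of this kind can fail to find a disjoint pair in topologies where one exists, which is one reason dedicated diverse-routing algorithms are needed. The link names, risk-group labels, and function names below are hypothetical.

```python
from collections import deque


def shortest_path(links, src, dst, banned_links=(), banned_nodes=()):
    """Fewest-hop path by BFS, skipping banned links and nodes.

    `links` maps a link name to (end_a, end_b, set_of_risk_groups).
    Returns the list of link names along the path, or None.
    """
    adj = {}
    for name, (a, b, _srlgs) in links.items():
        if name in banned_links or a in banned_nodes or b in banned_nodes:
            continue
        adj.setdefault(a, []).append((b, name))
        adj.setdefault(b, []).append((a, name))
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            hops = []
            while parent[node] is not None:
                prev, link = parent[node]
                hops.append(link)
                node = prev
            return hops[::-1]
        for nxt, link in sorted(adj.get(node, [])):
            if nxt not in parent:
                parent[nxt] = (node, link)
                queue.append(nxt)
    return None


def disjoint_alternate(links, src, dst, primary):
    """Alternate path sharing no node, link, or risk group with `primary`."""
    used_srlgs = set()
    used_nodes = set()
    for name in primary:
        a, b, srlgs = links[name]
        used_srlgs |= set(srlgs)
        used_nodes |= {a, b}
    used_nodes -= {src, dst}  # the end points are necessarily shared
    banned = {n for n, (_a, _b, s) in links.items()
              if n in primary or used_srlgs & set(s)}
    return shortest_path(links, src, dst, banned, used_nodes)


# Toy network: links A-B and A-C leave node A in the same fiber span "s1".
links = {
    "A-B": ("A", "B", {"s1"}),
    "B-Z": ("B", "Z", {"s2"}),
    "A-C": ("A", "C", {"s1"}),
    "C-Z": ("C", "Z", {"s3"}),
    "A-D": ("A", "D", {"s4"}),
    "D-Z": ("D", "Z", {"s5"}),
}
primary = shortest_path(links, "A", "Z")
alternate = disjoint_alternate(links, "A", "Z", primary)
print(primary, alternate)
```

In this toy network the route through C is node and link disjoint from the primary route through B, but it is rejected because link A-C shares span s1 with link A-B; the alternate is therefore routed through D.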