Internet Draft                                            J. Wroclawski
Expires September, 2000                                          MIT LCS
draft-ietf-issll-ds-map-00.txt                                 A. Charny
                                                           Cisco Systems
                                                             March, 2000

    Integrated Service Mappings for Differentiated Services Networks

Status of this Memo

This document is an Internet Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

To learn the current status of any Internet-Draft, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.ietf.org (US East Coast), nic.nordu.net (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific Rim).

This document is a product of the ISSLL working group of the Internet Engineering Task Force. Please address comments to the group's mailing list at issll@mercury.lcs.mit.edu, with a copy to the authors.

Copyright (C) The Internet Society (2000). All Rights Reserved.

Abstract

This document describes mappings of IETF Integrated Services onto IETF Differentiated Services networks. These mappings allow appropriately engineered and configured Differentiated Services network clouds to play the role of "network elements" in the Integrated Services framework, and thus to be used as components of an overall end-to-end Integrated Services QoS solution.

1. Introduction

The IETF Integrated Services framework [INTSERV] defines mechanisms and interfaces for providing network Quality of Service control useful for applications that require more predictable network service than is available with the traditional best-effort IP delivery model. Provision of end-to-end QoS control in the Intserv model is based on the concatenation of "network elements" along the data transmission path. When all of the concatenated network elements implement one of the defined Intserv "services" [G, CL], the resulting data transmission path will deliver a known, controlled QoS defined by the particular Intserv service in use.

The IETF Differentiated Services framework [DIFFSERV] defines a number of mechanisms for differentiating traffic streams within a network and providing different levels of delivery service to those streams. These mechanisms include differentiated per-hop queuing and forwarding behaviors, as well as behaviors such as traffic classification, metering, policing and shaping that are intended to be used at the edge or boundary of a diffserv cloud. Crucially, the Differentiated Services framework manages traffic forwarding behavior within a diffserv cloud at the aggregate level, rather than the per-application-flow level.
The availability of Differentiated Services per-hop and cloud-edge behaviors, together with additional mechanisms to statically or dynamically limit the absolute level of traffic within a traffic class, allows an IETF Differentiated Services network cloud to act as a network element within the Integrated Services framework. In other words, an appropriately designed, configured and managed Diffserv network cloud can act as one component of an overall end-to-end QoS controlled data path using the Integrated Services framework, and therefore support the delivery of Intserv QoS services.

This document is one of a set that together describe the usage of Differentiated Services networks in this manner. This document describes methods for implementing Intserv using Diffserv network behaviors and mechanisms. Companion documents [RSVPAGGR, DCLASS] define extensions to the RSVP signaling protocol [RSVP] that are useful in this environment. It is recommended that readers be familiar with the overall framework in which these mappings and protocols are expected to be used; this framework is described fully in [ISDSFRAME].

Within this document, Section 2 describes the overall approach and discusses issues that are independent of the class of Intserv service being implemented. Section 3 discusses implementation of the Controlled Load service. Section 4 discusses implementation of a mathematically correct Guaranteed service, and presents information about the performance and limitations of this implementation. Section 5 discusses implementation of close approximations to the Guaranteed service that may be acceptable in some circumstances and may allow more efficient use of network resources. Section 6 briefly describes the relationship of the mechanisms described here to the Intserv Null Service [NULL].

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

2. Basics

2.1. Components

Figure 1 shows the basic use of a Diffserv network cloud as an Intserv network element.

   [Figure 1: reference network, showing hosts attached to Intserv-capable
   edge routers ER1 and ER2, which connect through the boundary routers
   BR1 and BR2 of the Diffserv cloud.]

The figure shows that required Intserv network element functions are mapped to the Diffserv cloud as follows:

- Traffic scheduling. The Intserv traffic scheduling function is supported by appropriately selected, configured, and provisioned PHB's within the Diffserv network. These PHB's, when concatenated along the path of traffic flow, must provide a scheduling result that adequately approximates the result defined by the Intserv service. In general, the PHB concatenation will only be able to approximate the defined Intserv service over a limited range of operating conditions (level of traffic, allocated resources, and the like). In that case, other elements of the network, such as shapers and policers, must ensure that the traffic conditions seen by the PHB's stay within this range.

- Traffic classification. The Intserv framework requires that each network element (re)classify arriving traffic into flows for further processing. This requirement is based on the architectural assumption that network elements should be independent, and not depend on other network elements for correct operation.

  NOTE: the Intserv framework does not specify the granularity of a flow. Intserv is often associated with per-application or per-session end-to-end flows, but in fact any collection of packets that can be described by an appropriate classifier can be treated as an Intserv traffic flow.
  When Intserv is mapped to Diffserv, packets must be classified into flows, policed, shaped, and marked with the appropriate DSCP before they enter the interior of the diffserv cloud. Strictly speaking, the independence requirement stated above implies that the ingress boundary router of each diffserv cloud must implement an MF classifier to perform the classification function. However, in keeping with the diffserv model, it is permissible to push the flow classification function further towards the edge of the network if appropriate agreements are in place. For example, flows may be classified and marked by the upstream edge router if the Diffserv network is prepared to trust this router.

- Policing and shaping. In terms of location in the network, these functions are similar to traffic classification. A strict interpretation of the Intserv framework would require that the ingress boundary router of the diffserv cloud perform these functions. In practice, they may be pushed to an upstream edge router if appropriate agreements are in place.

  Note that moving the shaping function upstream of the diffserv ingress boundary router may result in poorer overall QoS performance. This is because if shaping is performed at the boundary router, a single shaper can be applied to all of the traffic in the service class, whereas if the shaping is performed upstream, separate shapers will be applied to the traffic from each upstream node. As discussed further in Section 4, the single shaper may be preferable in some circumstances.

- Admission control. The quantitative Intserv services (Guaranteed and Controlled Load) require that some form of admission control limit the amount of arriving traffic relative to the available resources. Two issues are of interest: the method used by the diffserv cloud to determine whether sufficient resources are available, and the method used by the overall network to query the diffserv cloud about this availability.

  Within the cloud, the admission control *mechanism* is closely related to resource allocation. If some form of static resource allocation (provisioning) is used, the admission control function can be performed by any network component that is aware of this allocation, such as a properly configured boundary router. If resource allocation within the network cloud is dynamic (a dynamic "bandwidth broker" or signaling protocol), then this protocol can also perform the admission control function, by refusing to admit new traffic when it determines that it cannot allocate new resources to match.

  The admission control *mechanism* used is independent of the admission control *algorithm* used to determine whether sufficient resources are available to admit a new traffic flow. The algorithm used may range from simple peak-rate allocation to a complex statistical measurement-based approach. The choice of algorithm is dependent on the Intserv service to be supported. Admission control algorithms appropriate for each service are discussed in the service specific sections below.

  The admission control mechanism used within the diffserv cloud is also independent of the mechanism used by the outside world to request service from the cloud. As an example, end-to-end RSVP might be used together with any form of interior admission control mechanism - static provisioning, a central bandwidth broker, or aggregate RSVP internal signaling. A simple sketch of the static-provisioning case appears below.
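To make the static-provisioning case concrete, the following sketch (illustrative only; the class name and interface are assumptions of this document, not defined by Intserv or Diffserv) shows a minimal peak-rate admission control check, as might run in a boundary router that knows the bandwidth provisioned for a service class within the cloud:

   # Hypothetical sketch of peak-rate admission control against a
   # statically provisioned bandwidth level.
   class StaticAdmissionControl:
       def __init__(self, provisioned_rate_bps):
           # Rate set aside for this class by the provisioning process.
           self.provisioned_rate_bps = provisioned_rate_bps
           self.allocated_bps = 0.0

       def admit(self, tspec_rate_bps):
           # Peak-rate allocation: admit only while the sum of admitted
           # Tspec rates stays within the provisioned level.
           if self.allocated_bps + tspec_rate_bps <= self.provisioned_rate_bps:
               self.allocated_bps += tspec_rate_bps
               return True
           return False

       def release(self, tspec_rate_bps):
           # Called when a previously admitted flow departs.
           self.allocated_bps = max(0.0, self.allocated_bps - tspec_rate_bps)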
2.2. Per-Cloud versus Per-Path Control

The key to providing absolute, quantitative QoS services within a diffserv network is to ensure that at each hop in the network the resources allocated to the PHB's used for these services are sufficient to handle the arriving traffic. As described above, this can be done through a spectrum of mechanisms ranging from static provisioning to dynamic per-hop signaling within the cloud. Two situations are possible:

- With per-cloud provisioning, sufficient resources are made available in the network so that traffic arriving at an ingress point can flow to *any* egress point without violating the PHB resource allocation requirements. In this case, admission control and traffic management decisions need not be based on destination information.

- With per-path provisioning, resources are made available in the network to ensure that the PHB resource allocation requirements will not be violated if traffic arriving at an ingress point flows to one (in the unicast case) specific egress point. This requires that admission control and resource allocation mechanisms take into account the egress point of traffic entering the network, but results in more efficient resource utilization.

Two points are important to note:

- Both approaches are valuable, but all functions must adopt the same approach. In particular, if resource allocation is per-path, then traffic shaping and policing, and hence classification, must be destination-aware as well.

- The per-cloud vs. per-path decision is independent of decisions about static vs. dynamic provisioning. It is often assumed that dynamic provisioning is necessarily per-path, while static provisioning is more likely to be per-cloud. In reality, all four combinations may be useful in differing circumstances.

3. Implementation of the Controlled Load Service

3.1. Summary of CL Requirements

The essence of the Controlled Load service is that traffic using it experiences the performance expected of an unloaded network. The CL specification [CL] refines this definition:

- Controlled Load traffic is described by a token bucket Tspec. When traffic is conformant to the Tspec, network elements will forward it with queuing delay not greater than that caused by the traffic's own burstiness - that is, the result of the source emitting a burst of size B into a logical network with capacity R. Further, in doing this no packets will be discarded due to queue overflow. Statistically rare deviations from this ideal behavior are permitted. A measure of the "quality" of a CL service is how rare these deviations are. A sketch of the token-bucket conformance test appears after this list.

  NOTE: the actual behavior requirements stated in the CL spec are slightly more detailed than what is presented here.

- Network elements must not assume that the arrival of nonconformant traffic for a specific controlled-load flow will be unusual, or indicative of error. In certain circumstances large numbers of packets will fail the conformance test *as a matter of normal operation*. Some aspects of the behavior of a CL network element in the presence of nonconformant traffic are specified. (These circumstances include elements carrying traffic from adaptive applications that use the CL service to provide a floor on performance but constantly try to do better, and elements acting as the "split points" of a multicast distribution tree or carrying multi-source aggregate flows, such as those generated by RSVP's wildcard or shared-explicit reservation styles supporting a shared reservation.)
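The following minimal token-bucket conformance test illustrates the Tspec policing just described (a sketch only; the per-packet interface and parameter names are this document's assumptions, not taken from [CL]):

   # Hypothetical sketch: token-bucket conformance test for an (r, b) Tspec.
   class TokenBucket:
       def __init__(self, rate_bytes_per_s, depth_bytes):
           self.rate = rate_bytes_per_s    # Tspec token rate r
           self.depth = depth_bytes        # Tspec bucket depth b
           self.tokens = depth_bytes       # bucket starts full
           self.last = 0.0                 # time of the last update, seconds

       def conformant(self, pkt_bytes, now):
           # Refill tokens at rate r, capped at depth b.
           self.tokens = min(self.depth,
                             self.tokens + (now - self.last) * self.rate)
           self.last = now
           if pkt_bytes <= self.tokens:
               self.tokens -= pkt_bytes
               return True
           # Nonconformant packets are not necessarily errors; see below.
           return False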
In the presence of nonconformant packets arriving for one or more controlled-load flows, each network element must ensure locally that the following requirements are met:

1) The network element MUST continue to provide the contracted quality of service to those controlled-load flows not experiencing excess traffic.

2) The network element SHOULD prevent excess controlled-load traffic from unfairly impacting the handling of arriving best-effort traffic.

3) Consistent with points 1 and 2, the network element MUST attempt to forward the excess traffic on a best-effort basis if sufficient resources are available.

These points lead to two observations about a well implemented CL service.

- CL traffic can be sorted into "delay classes" based on burstiness. Highly bursty flows, having a large ratio of Tspec parameters B/R, should expect to experience more queuing delay than their low-burstiness counterparts. Thus, a good CL implementation will sort the offered CL traffic into sub-classes that are expecting roughly equivalent delay, and queue these subclasses independently to achieve this result.

- The CL specification leaves open the precise treatment of nonconformant traffic, giving only the minimum requirements listed above.

  NOTE: The phrase "best effort basis" in the portion of the CL spec quoted above has sometimes been taken to mean "the traffic must be placed in the best effort traffic class and treated identically to BE traffic". This interpretation is incorrect. It is easy to see this at one level, because if nonconformant CL traffic from non-adaptive applications is simply lumped in with adaptive best-effort traffic it will tend to unfairly impact that traffic, in contravention of point 2). However, the intent of the specification is more general. An appropriate reading is "nonconformant CL traffic should be transmitted, when possible, in the way that is most advantageous to users and applications, subject to the requirements on non-interference with other traffic". This allows the CL service to be used both to provide a specific QoS for non-adaptive applications and to provide a "floor" or minimum QoS for adaptive applications.

3.2. Implementation of CL using the AF Per-Hop Behavior

The CL service can be supported most effectively using an appropriately designed and configured Assured Forwarding PHB implementation [AF] as the data forwarding element. This approach SHOULD be used whenever possible. The basics of the AF-based approach are as follows; a sketch of the ingress classification and marking steps appears at the end of this section.

- Sort the offered CL traffic into delay classes based on the B/R ratio of the Tspec. The packets of each delay class will be forwarded using a separate instance of the AF PHB.

- For each delay class, construct an aggregate Tspec for the admitted traffic according to the rule for summing Tspecs given in [CL]. This Tspec will be used to police the traffic for conformance at the ingress to the diffserv cloud.

- For each delay class, police arriving packets against the token bucket Tspec derived above. Mark conforming packets with a DSCP indicating the selected AF instance, and highest priority forwarding within that instance. Mark nonconformant packets with a DSCP indicating the selected AF instance, and lowest priority forwarding within that instance.

- At each node within the diffserv network, configure each AF instance appropriately by:

  a) setting the actual queue size (or alternatively the dropping parameters for high priority packets) to limit queuing delay to the delay class's target.
  (In other words, packets that have been delayed beyond the class target should be dropped.)

  b) setting the dropping parameters for low priority packets to drop such packets as soon as any significant non-transient queuing of these packets is detected.

  c) setting the service rate of the AF instance to a bandwidth sufficient to meet the delay and loss behavior requirements of the CL spec when only high-priority packets are present.

- Implement an admission control algorithm that ensures that at each hop in the network the level of conformant traffic offered to each AF instance is equal to or less than that provisioned for in item (c) above (or alternatively dynamically allocates more bandwidth to the relevant AF instance when required).

In addition to these basic actions, two subtleties with the use of AF must be observed. First, the relationship between different AF instances, and between AF and other PHBs, must be more tightly constrained than is required by the base AF specification.

- Bandwidth should be allocated between AF and BE (and any other relevant PHB's) in such a way that AF cannot simply steal all best-effort bandwidth on demand. A simple WFQ or CBQ scheduler can meet this requirement.

- The bandwidth allocation relationship between different AF instances must be known. Two likely relationships are:

  o Bandwidth is allocated to each AF instance independently, as with a WFQ scheduler.

  o Bandwidth is allocated across the AF instances used for CL service on a priority basis, with the AF instance supporting the lowest delay class of CL having the highest priority.

  Either of these approaches may be used. However, the choice of approach affects the admission control decision, and must be taken into account. In the first case, admission control decisions may be made for each CL delay class independently. In the second case, admission control decisions for high priority classes will affect lower priority classes, and this interaction must be accounted for.

The second subtlety is that the implementation of AF must service the AF classes in a timely manner, by ensuring that the bandwidth allocated to an AF instance is made available at a time-scale substantially shorter than the delay target of the class. This requirement is slightly stronger than that stated in the AF specification. In practice, any implementation using a common queuing algorithm is likely to be able to meet this requirement unless other PHB's, such as EF, are served at higher priority. When that is true, the traffic seen by the higher priority PHB will also require limiting and shaping in order to ensure that the CL AF instances receive bandwidth on a timely basis.

The overall result of this procedure is an implementation of the CL service with the following characteristics:

- Conformant CL traffic is carried according to the CL requirements.

- Resources are used efficiently by aggregating traffic with similar requirements, while supporting multiple delay classes for traffic with widely differing requirements.

- Nonconformant CL traffic is carried whenever resources permit, and is not reordered with respect to the CL flow's conformant traffic.

- Nonconformant CL traffic is not able to disrupt traffic of other classes, particularly BE.
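The ingress classification and marking steps referenced above can be sketched as follows (illustrative only: the delay-class threshold is an arbitrary example value, the sketch reuses the TokenBucket conformance test from section 3.1, and only the AFxy codepoint names come from [AF]):

   # Hypothetical sketch: sorting CL flows into delay classes and marking
   # packets into an AF instance with low or high drop precedence.
   # AFxy codepoints from RFC 2597, indexed by [class][precedence]; "low"
   # drop precedence corresponds to highest forwarding priority within
   # the instance.
   AF_DSCP = {1: {"low": "AF11", "high": "AF13"},
              2: {"low": "AF21", "high": "AF23"}}

   def delay_class(tspec_b, tspec_r, threshold_s=0.05):
       # Sort by the Tspec burst-to-rate ratio B/R (the flow's
       # self-induced delay); the 50 ms boundary is an example only.
       return 1 if tspec_b / tspec_r <= threshold_s else 2

   def mark(pkt_bytes, cls, aggregate_policer, now):
       # aggregate_policer is a TokenBucket carrying the summed Tspec of
       # the delay class, per the second step in the list above.
       if aggregate_policer.conformant(pkt_bytes, now):
           return AF_DSCP[cls]["low"]    # conformant: low drop precedence
       return AF_DSCP[cls]["high"]       # nonconformant: high drop precedence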
3.2.1. CL/AF Admission Control Approaches

3.3. Implementation of CL using the EF Per-Hop Behavior

It is also possible to implement an approximation of the Controlled Load service using the Diffserv Expedited Forwarding [EF] PHB as the traffic scheduling element. This approach is not preferred, because of two significant limitations, and therefore SHOULD NOT be used unless the AF-based approach is not available.

- Because there is only one EF scheduling class per node, it is impossible to sort the Controlled Load traffic into queuing delay classes, as described above for the AF implementation. Instead, all CL traffic must be handled as one scheduling class, and sufficient resources must be allocated to the class to cause *all* CL traffic to meet the queuing delay expectations of the most demanding flows.

- Because the EF PHB requires a hard limit on the amount of traffic passing through it, a CL service implemented using EF cannot handle nonconformant (over-Tspec) traffic gracefully, as can be done with AF. Instead, nonconformant traffic must either be discarded at the ingress of the Diffserv cloud or remarked into a different behavior aggregate, and thus potentially reordered in transit. Either of these behaviors is less desirable than the one obtained from the AF-based implementation above.

Notwithstanding these limitations, it may be useful to implement a CL approximation based on the EF PHB when the Diffserv network does not support the AF PHB, or when the implementation of the AF PHB cannot assure the forwarding of traffic in a sufficiently timely manner. In this case:

- All CL traffic is marked with a DSCP corresponding to the EF PHB.

- A single aggregate Tspec for all CL traffic is computed for each network ingress.

- Arriving CL traffic is policed against this Tspec, and nonconformant traffic is either discarded or remarked as BE, at the preference of the network operator.

- At each hop within the network, the EF PHB must receive a bandwidth allocation sufficient to meet the requirements given in the EF specification when the arriving CL traffic is at the Tspec level for that point within the network.

- The topology of the network must be designed so that the instantaneous queuing delay caused by fan-in to a node will exceed the CL requirements rarely or never. In practice, this will be a concern only with very high fan-in topologies.

4. Implementation of the Guaranteed Service

The Guaranteed service [G] offers a strict mathematical assurance of both throughput and queuing delay, assuming only that the network is functioning correctly. A key concept of the Guaranteed service is that "error terms", referred to as C and D in the specification, are provided by the network element to the customer, allowing the customer to calculate the bandwidth it must request from the network in order to achieve a particular queuing delay target. Thus, the two important tasks in implementing a Guaranteed service network element are providing the traffic scheduling, policing, and shaping functions needed to support a hard bound on performance, and characterizing the network element's error terms so that the customer of the service can accurately characterize the network path and deduce what level of resources must be requested.

Our strategy for implementing these capabilities within a diffserv cloud revolves around the use of the EF PHB for Guaranteed traffic, together with the shaping and policing functions necessary to obtain a performance bound in this context. The basic traffic policing and shaping requirements for Guaranteed service are discussed more fully in the service specification.
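To illustrate the role of the error terms, the following sketch shows the customer-side calculation they enable, using a simplified form of the Guaranteed service delay bound from [G] (the peak-rate refinement and per-hop MTU terms are omitted for brevity; the numbers in the example are arbitrary):

   # With token bucket (r, b) and accumulated path error terms Ctot
   # (bytes) and Dtot (seconds), [G] bounds end-to-end queuing delay,
   # in simplified form, by:
   #
   #     delay <= b/R + Ctot/R + Dtot,    for service rate R >= r
   #
   # Solving for R gives the rate a customer must request to meet a
   # queuing delay target.
   def required_rate(b_bytes, r_bytes_per_s, ctot_bytes, dtot_s, target_s):
       slack = target_s - dtot_s
       if slack <= 0:
           return None   # fixed delay alone already exceeds the target
       return max(r_bytes_per_s, (b_bytes + ctot_bytes) / slack)

   # Example: b = 10000 bytes, r = 125000 bytes/s (1 Mbit/s),
   # Ctot = 5000 bytes, Dtot = 0.02 s, target 0.1 s:
   #     R = (10000 + 5000) / 0.08 = 187500 bytes/s.
   print(required_rate(10_000, 125_000, 5_000, 0.02, 0.1))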
Delay through a Diffserv cloud can be roughly classified into propagation and serialization delay, shaping/reshaping delays at the boundary, and queuing delay inside the cloud. In order to determine the error terms C_dc and D_dc for the Diffserv cloud, needed for the determination of the end-to-end delay, each of these delay components needs to be evaluated.

The difficulty in characterizing C_dc and D_dc is that, unlike in the Intserv model, where the C and D terms are a local property of the router, in the case of a Diffserv cloud these terms depend not only on the topology of the cloud, but also on the internal traffic characteristics of potentially _all_ EF traffic in the cloud. Hence, the existence of upper bounds on delay through the cloud implies centralized knowledge of the topology of the cloud and of its traffic characterization. In turn, dependence of the delay bounds on traffic characterization at any ingress point to the cloud implies the existence of a policy that defines traffic characterization rules, as well as implementation mechanisms at _all_ ingress points in the network that enforce that policy.

These considerations imply that determination of the bound on the delay through the Diffserv cloud should be performed off-line, perhaps as part of a traffic management algorithm, based on knowledge of the topology, traffic patterns, shaping policies, and other relevant parameters of the cloud. These parameters are discussed in the following sections with respect to each delay component. Once the delay bounds are determined, the corresponding error terms C_dc and D_dc are configured into the appropriate Intserv-capable edge routers, as discussed below.

4.1. Propagation and Serialization Delay

These delay components can be bounded by modeling the Diffserv cloud as a sequence of at most h links, each of length at most C. The parameters (h, C) determine the so-called "diameter" of the cloud. Knowledge of this diameter can then be used to obtain upper bounds on the propagation and serialization delay through the cloud.

4.2. Shaping Delay

The Diffserv EF PHB assumes that traffic entering the Diffserv region is conditioned at the Diffserv cloud boundary. In the framework of Figure 1, shaping is expected to take place at the ingress edge router ER1, and optionally at the boundary router BR1. The granularity of such shaping is implementation dependent, and can range from microflow shaping to aggregate shaping. The granularity of aggregation can be "all EF traffic between a particular ingress-egress pair", frequently referred to as the "pipe model", or "all EF traffic originating at a given ingress to all possible destinations", frequently referred to as the "hose model".

In addition to ingress shaping, the Diffserv model allows re-shaping of traffic at the egress point. As for ingress shaping, egress shaping can be implemented either at BR2 or ER2. The effect of different choices of the location and granularity of shaping on the delay guarantees that can be provided by a Diffserv cloud is discussed in section 4.5.2. In this section we consider the effect of these choices on the C and D terms advertised by the Intserv-capable routers ER1 and ER2. Note that the Intserv-capable router downstream from the Diffserv cloud (ER2 in the reference network of Figure 1) is responsible for exporting the C and D terms of the Diffserv cloud.
4.2.1. Shaping at the Edge Routers

If shaping is performed at the ingress edge router ER1, and reshaping, if any, is performed at ER2, but no shaping is implemented inside the Diffserv cloud, then the shaping/reshaping delay is part of the total delay advertised by the edge routers ER1 and ER2, and hence the corresponding C and D terms are exported by the Intserv-capable edge routers. These are denoted C_is, D_is, C_es, D_es respectively, where the indices _is and _es denote "ingress shaper" and "egress shaper". The values of these parameters are implementation dependent. Since the Diffserv cloud itself does not perform any shaping in this case, its C_dc should be set to zero. The determination of the value of D_dc and the factors affecting it are discussed in section 4.4 below.

4.2.2. Shaping at the Boundary Routers

In the case where shaping is performed by the boundary routers, shaping and reshaping delay become part of the delay of the Diffserv cloud and hence must be accounted for in the C_dc and D_dc error terms. Note that depending on the shaping implementation, the rate-dependent error term may not necessarily be zero, and hence ingress shaping may add a non-zero component to the C_dc value of the Diffserv cloud. Since the ingress shaping delay depends on the shaping implementation and shaping granularity at the border router, and since different border routers may implement different shaping algorithms, it seems natural to delegate the responsibility for exporting the error terms for ingress shaping delay to the ingress edge router(s) attached to the border router.

It is important to note that in the case of aggregate shaping, the shaping delay may be a function of the combined burst and combined rate of all microflows comprising the shaped aggregate (note that the aggregate may consist of microflows arriving from different ingress points). To enable the existence of a meaningful upper bound on the shaping delay, the shapers at the edge routers must be configured in such a way as to ensure the existence of a bound on the shaping delay at the boundary router. This may be accomplished by imposing a policy such as "token bucket parameters of all flows requiring G support entering the diffserv cloud from any edge router must satisfy the condition (r >= r_min, b <= b_max)". Such conditions enable a token bucket characterization of the aggregate stream, which in combination with the properties of the shaping implementation enables the computation of an upper bound on the shaping delay for a particular microflow (see the sketch at the end of this section).

If the egress boundary router implements reshaping on an aggregate basis, then, just as in the case of ingress shaping, the egress reshaping delay of a microflow depends on the combined rate and burstiness of the aggregate being reshaped. Aggregate burstiness depends, among other things, on the parameters of the ingress shapers and on the delay bound of the diffserv cloud incurred by all microflows after the last shaping point. The C and D terms corresponding to egress boundary shaping must be configured at the egress edge router, which is responsible for exporting the egress shaping component of the C and D terms of the Diffserv cloud. In addition, just as in section 4.2.1, the egress edge router is responsible for exporting the D_dc component of the delay inside the diffserv cloud which is not due to shaping or reshaping delays.
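The kind of edge policy just described, and the aggregate characterization it enables, can be sketched as follows (the policy constants are arbitrary examples; the summing rule shown - aggregate rate and depth as the sums of the per-flow rates and depths - is the simple token bucket case of the Tspec summing discussed in [CL]):

   # Hypothetical sketch: per-flow edge policy and resulting aggregate
   # token bucket characterization at the boundary shaper.
   R_MIN = 32_000   # bytes/s: minimum rate a flow may reserve (example)
   B_MAX = 8_000    # bytes: maximum bucket depth a flow may carry (example)

   def policy_ok(r, b):
       # Edge-router admission policy: (r >= r_min, b <= b_max).
       return r >= R_MIN and b <= B_MAX

   def aggregate_tspec(flows):
       # Traffic jointly conforming to token buckets (r_i, b_i) also
       # conforms to (sum r_i, sum b_i), bounding the shaped aggregate.
       return (sum(r for r, _ in flows),
               sum(b for _, b in flows))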
4.2.3. Shaping inside the Diffserv Cloud

While the Diffserv model does not prevent shaping inside the cloud as well as at the boundaries, this draft concentrates on the most common case, in which all internal interfaces of any node in the diffserv cloud implement work-conserving aggregate class-based scheduling only.

4.3. Queuing Delay

Queuing delay experienced by a given packet has two causes: contention with other packets in the scheduler, and interruption of service experienced by the scheduler as a whole. A typical example of the latter is the delay in a single-processor system when the processor schedules tasks other than the packet scheduler. If a bound on this latter portion of the delay is known for all routers inside the diffserv cloud, then the contribution of this delay component can be bounded by multiplying this bound by the maximum hop count h. The component of the queuing delay due to contention with other packets in the link scheduler is discussed in detail in section 4.4. For brevity, in the rest of this draft the term queuing delay refers to just the portion of the queuing delay due to contention with other packets in the scheduler.

4.4. Queuing Delay Bounds in the Diffserv Cloud

The main difficulty in obtaining hard delay bounds for a cloud of arbitrary topology arises from the assumption of aggregate scheduling inside the cloud. When a packet of some flow f traverses a sequence of aggregate queues, its worst case delay may depend on the traffic of other flows which do not share even a single queue with flow f. Moreover, the delay of a packet p of flow f at time t may be affected by flows whose last packets exited the network long before the first packet of flow f entered the network [CHARNY].

The ability to provide hard delay bounds in a Diffserv cloud with aggregate scheduling must rely on the cooperation of all devices in the cloud, as well as strict constraints on the traffic entering the cloud. It has been demonstrated that knowledge of the following parameters, global to the cloud, is essential to the ability to provide strict queuing delay guarantees across the Diffserv cloud [CHARNY], [LEBOUDEC]:

- limited number of hops of any flow across the cloud (denoted h)

- low (bounded) ratio of the load of EF traffic to the service rate of the EF queue on any link in the cloud (denoted u)

- minimum rate of the shaped aggregate (denoted r_min)

- maximum token bucket depth of an edge-to-edge aggregate (denoted b_max)

- minimum service rate of the EF queue (denoted S)

- maximum deviation of the amount of service of the EF queue from the ideal fluid service at rate S (denoted E)

Currently, the only known delay bound that holds for an arbitrary topology and arbitrary route distribution is given in [LEBOUDEC] by

   D = (E/S + u*b_max/r_min) * h/(1 - u*(h-1))

which holds for any utilization u < 1/(h-1). This bound holds for the case when the capacity of any single link is substantially smaller than the total capacity of all interfaces of any router. (The bound may be slightly improved if the capacity of a single link is not negligible compared to the total router capacity [LEBOUDEC].) Unfortunately, this bound explodes as u approaches 1/(h-1). Some knowledge of either the topology or the routes in the cloud may yield an improved bound.
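A worked evaluation of this bound may be useful (a sketch; the parameter values are arbitrary examples, and E/S is supplied directly as a time in seconds):

   # D = (E/S + u*b_max/r_min) * h / (1 - u*(h-1)),  valid for u < 1/(h-1).
   def delay_bound(e_over_s, u, b_max, r_min, h):
       if u >= 1.0 / (h - 1):
           raise ValueError("bound holds only for u < 1/(h-1)")
       return (e_over_s + u * b_max / r_min) * h / (1.0 - u * (h - 1))

   # Example: h = 6 hops, u = 0.1, E/S = 1 ms, b_max = 10000 bytes,
   # r_min = 125000 bytes/s:
   #     per-hop term: 0.001 + 0.1 * 0.08 = 0.009 s
   #     D = 0.009 * 6 / (1 - 0.5) = 0.108 s
   print(delay_bound(0.001, 0.1, 10_000, 125_000, 6))

Note how quickly the bound degrades: with the same parameters but u = 0.19, just under the validity limit of 1/(h-1) = 0.2, D rises to about 1.9 seconds.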
For example, for a class of network topologies which includes multistage networks, it can be shown [CHARNY] that the bound is given by

   D = (E/S + u*b_max/r_min) * ((1+u)^h - 1)/u

While this bound holds for any utilization, due to the exponential term the delay grows very quickly as the utilization u increases. Unfortunately, at the moment no bound is known for a general topology with utilization greater than 1/(h-1). It can be shown [CHARNY] that for utilization values greater than 1/(h-1), for any value of delay D one can always construct a network such that the delay in that network is greater than D. This implies that either no bound exists at all, or, if a bound does exist, it must depend on some additional characteristics of the network other than just h and u.

The practical implication of these results is that, barring new results on delay bounds, the amount of traffic requiring end-to-end Guaranteed service across the diffserv cloud should be rather small. Furthermore, it also implies that if a substantial amount of other EF traffic is present in the network, then in order to ensure strict delay bounds for GS traffic, buffering and scheduling mechanisms must exist that separate the GS traffic using the EF PHB from other traffic using the EF PHB.

4.5. Relationship to Bandwidth Allocation Techniques and Traffic Conditioning Models

4.5.1. Availability of Sufficient Bandwidth

As discussed in Section 4.4, in order to provide a strict delay bound across the Diffserv cloud, the ratio of the EF load to the service rate of the EF queue has to be deterministically bounded on all links in the network. This can be ensured either by signaled admission control (for example using RSVP aggregation techniques [RSVPAGGR]) or by a static provisioning mechanism. It should be noted that if provisioning is used, then to ensure a deterministic load/service-rate ratio on all links the network should be strongly overprovisioned, to account for possible inaccuracy of traffic matrix estimates. In either case, deterministic availability of sufficient bandwidth on all links is a necessary condition for the ability to provide deterministic delay guarantees.

4.5.2. Effect of Shaping Granularity on Delay Bounds

A related, although distinct, issue for the ability to provide deterministic delay guarantees is the granularity of the ingress shaping. The implications of the different choices on the resulting delay bounds are discussed in the following subsections.

4.5.2.1. Per-Microflow Shaping

The known worst case delay bound is linear in the ratio b_max/r_min. In the case of microflow shaping, the minimal rate of a microflow can be quite small, resulting in a large delay bound. There is therefore a substantial advantage in aggregating many small microflows into an aggregate and shaping the aggregate as a whole. While in principle there is a range of choices for aggregation, this document considers only two: edge-to-edge aggregation and edge-to-everywhere aggregation.

4.5.2.2. Shaping of Edge-to-Edge Aggregates

This type of shaping is natural for explicit bandwidth reservation techniques. In this case r_min and b_max relate to the rate and token bucket depth of the border-to-border aggregates. Since the delay bound is linear in b_max/r_min, aggregating as many microflows sharing the same border-to-border pair as possible increases r_min, and hence decreases the delay bound. Locating the shaper at the border router is therefore beneficial for reducing the edge-to-edge delay bound.
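A small numeric comparison (again with arbitrary example values, reusing delay_bound() from the sketch in section 4.4) illustrates the aggregation argument:

   # Ten microflows, each (r = 12500 bytes/s, b = 1000 bytes), crossing
   # h = 6 hops at utilization u = 0.1 with E/S = 1 ms.
   #
   # Shaped as one edge-to-edge aggregate: r_min = 125000, b_max = 10000.
   aggregate = delay_bound(0.001, 0.1, 10_000, 125_000, 6)   # 0.108 s

   # Shaped per microflow, r_min is set by the slowest flow on any link;
   # if one flow reserves only 1250 bytes/s, the bound grows accordingly:
   per_flow = delay_bound(0.001, 0.1, 1_000, 1_250, 6)       # 0.972 s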
4.5.2.3. Shaping of Edge-to-Everywhere Aggregates

This type of shaping is frequently assumed in conjunction with bandwidth provisioning. The effect of this choice on delay bounds depends on exactly how provisioning is done. One possibility for provisioning the network is to estimate the edge-to-edge demand matrix for EF traffic and ensure that there is sufficient capacity to accommodate this demand, assuming that the traffic matrix is accurate enough. Another option is to make no assumption about the edge-to-edge EF traffic distribution, but rather to admit a certain amount of EF traffic at each ingress edge, regardless of the destination edge, and provision the network in such a way that even if _all_ traffic from _all_ sources happens to pass through a single bottleneck link, the capacity of that link is sufficient to ensure the appropriate load to service rate ratio for the EF traffic.

Depending on which of these two provisioning choices is made, shaping of the edge-to-everywhere aggregate has opposite effects on the delay bound. In the case of "edge-to-edge provisioning", the bandwidth of any link may be sufficient to accommodate the _actual_ load of EF traffic while remaining within the target utilization bound. Hence, it is the minimal rate and the maximum burst size of the _actual_ edge-to-edge aggregates sharing any link that affect the delay bound. However, aggregate edge-to-everywhere shaping may result in individual substreams of the shaped aggregate being shaped to a much higher rate than the expected rate of those substreams. When the edge-to-everywhere aggregate splits inside the network into different substreams going to different destinations, each of those substreams may in the worst case have substantially larger burstiness than the token bucket depth of the aggregate edge-to-everywhere stream. This results in a substantial increase of the worst case delay over the edge-to-edge shaping model. Moreover, in this case the properties of the ingress shapers do not provide sufficient information to bound the worst case delay, since it is the burstiness of the _substreams_ inside the shaped aggregates that is needed, but is unknown.

In contrast, if "worst case" provisioning is assumed, the network is provisioned in such a way that each link can accommodate all the traffic even if all edge-to-everywhere aggregates end up sharing that link. In this case the r_min and b_max of the edge-to-everywhere aggregates should be used without modification in the formula for the delay bound. Intuitively, in this case the actual traffic distribution can only be better than the worst case, in which all the aggregate traffic at a given ingress is destined to the same "worst case egress". Note that the "worst case" provisioning model targeting a particular utilization bound results in substantially more overprovisioning than "point-to-point" provisioning using an estimated traffic matrix, or explicit point-to-point bandwidth allocation using signaled admission control.

4.6. Concatenation of Diffserv Clouds

In the case where one or more Diffserv clouds are concatenated via an Intserv-capable node, the total delay is simply a concatenation of the delays computed for each individual intserv-diffserv-intserv segment along the path. However, obtaining an end-to-end delay bound for a concatenation of Diffserv clouds via nodes implementing aggregate scheduling is a more complicated problem which requires further research.
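When clouds are concatenated via Intserv-capable nodes as just described, the error terms accumulate additively, as they do across ordinary Intserv network elements [G]. A minimal sketch (the segment values are illustrative):

   # Per-segment error terms exported for each intserv-diffserv-intserv
   # segment along the path: (C in bytes, D in seconds).
   segments = [(5_000, 0.020),   # cloud 1
               (8_000, 0.035)]   # cloud 2

   ctot = sum(c for c, _ in segments)   # 13000 bytes
   dtot = sum(d for _, d in segments)   # 0.055 s
   # ctot and dtot then feed the required_rate() calculation sketched in
   # the introduction to section 4.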
5. Implementation of Resource Efficient Close Approximations to the Guaranteed Service

6. Relationship to the Null Service

The Intserv "Null Service" [NULL] differs from the other defined services by not expressing any quantitative network performance requirements. Use of the Null Service where an Intserv service class is required allows an application or host requesting QoS control service to express policy-related information to the network without making a specific quantitative QoS request. The assumption is that the network policy management and control elements will use this information to select an appropriate QoS for the requesting entity, and take whatever action is required to provide this QoS.

One possibility is that the network policy mechanisms will determine that a quantitative end-to-end QoS is appropriate for this entity, and that this QoS can be provided using Intserv mechanisms. In this case, the Null service selector can be replaced, at the first hop router or elsewhere along the path, with a different Intserv service class and related parameter information. Once this occurs, the situation with respect to the use of Diffserv networks to provide the desired QoS is identical to that described above for these other services.

A second alternative is that the network policy mechanisms determine that the requesting entity should receive a relative, rather than absolute (quantitative), level of service. In this case, the packets are marked with the appropriate DSCP, but the admission control actions described above are not necessary.

7. Security Considerations

8. References

[AF] Heinanen, J., Baker, F., Weiss, W., Wroclawski, J., "Assured Forwarding PHB Group", RFC 2597, June 1999.

[CHARNY] Charny, A., "Delay Bounds in a Network with Aggregate Scheduling", work in progress, ftpeng.cisco.com/ftp/acharny/aggregate_delay_v4.ps

[CL] Wroclawski, J., "Specification of the Controlled-Load Network Element Service", RFC 2211, September 1997.

[DCLASS] Bernet, Y., "Usage and Format of the DCLASS Object With RSVP Signaling", Internet Draft draft-ietf-issll-dclass-00.txt

[DIFFSERV] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., Weiss, W., "An Architecture for Differentiated Services", RFC 2475, December 1998.

[EF] Jacobson, V., Nichols, K., Poduri, K., "An Expedited Forwarding PHB", RFC 2598, June 1999.

[G] Shenker, S., Partridge, C., Guerin, R., "Specification of Guaranteed Quality of Service", RFC 2212, September 1997.

[GENCHAR] Shenker, S., Wroclawski, J., "General Characterization Parameters for Integrated Service Network Elements", RFC 2215, September 1997.

[INTSERV] Braden, R., Clark, D., Shenker, S., "Integrated Services in the Internet Architecture: an Overview", RFC 1633, June 1994.

[ISDSFRAME] Bernet, Y., et al., "A Framework for Integrated Services Operation over Diffserv Networks", Internet Draft draft-ietf-issll-diffserv-rsvp-03.txt

[LEBOUDEC] Le Boudec, J.-Y., "A Proven Delay Bound in a Network with Aggregate Scheduling", work in progress, http://ica1www.epfl.ch/PS_files/ds2.ps

[NULL] Bernet, Y., Smith, A., Davie, B., "Specification of the Null Service Type", Internet Draft draft-ietf-issll-nullservice-00.txt

[RSVP] Braden, R., Zhang, L., Berson, S., Herzog, S., Jamin, S., "Resource Reservation Protocol (RSVP) - Version 1 Functional Specification", RFC 2205, September 1997.

[RSVPAGGR] Baker, F., Iturralde, C., Le Faucheur, F., Davie, B., "Aggregation of RSVP for IPv4 and IPv6 Reservations", Internet Draft draft-ietf-issll-rsvp-aggr-02.txt

[RSVPINTSERV] Wroclawski, J., "The use of RSVP with IETF Integrated Services", RFC 2210, September 1997.
9. Authors' Addresses

John Wroclawski
MIT Laboratory for Computer Science
545 Technology Sq.
Cambridge, MA 02139, USA
Phone: +1 617 253 7885
EMail: jtw@lcs.mit.edu

Anna Charny
Cisco Systems
250 Apollo Drive
Chelmsford, MA 01824, USA
Phone: +1 978 244 8000
EMail: acharny@cisco.com

10. Full Copyright

Copyright (C) The Internet Society (2000). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.