Internet Draft






TE Working Group                                           B. Christian
Internet Draft                                                    UUNET
Document: draft-christian-tewg-measurement-00.txt             B. Davies
Category: Informational                                           UUNET
                                                                 H. Tse
                                                                  UUNET
                                                               Jul 2000



           Operational measurements for Traffic Engineering


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026 [1].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts. Internet-Drafts are draft documents valid for a maximum of
   six months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt
   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.



1. Abstract

   This memo describes the measurements needed to accomplish Traffic
   Engineering (TE) in IP networks.  It is intended to aid vendors in
   choosing which information to provide, to assist network operators
   in determining which information to request, and to demonstrate how
   measurements are used to accomplish TE.  The memo also briefly
   describes some methods for using the measured variables and some
   methods for gathering the information.





Christian/Davies/Tse   Informational - Dec2000                       1


               draft-christian-tewg-measurement-00.txt       July 2000





2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC-2119 [2].


3. What is Traffic Measurement?

   Traffic Measurement (TM) is defined for the purposes of this
   document as a means of characterizing a flow of IP packets from one
   point to another.  The characteristics of a traffic flow can be
   loosely defined as Throughput, Loss, Delay, Path, and Lifetime.
   These characteristics should be represented in every device that
   carries a flow of IP traffic.  Delay variation and other measures
   are derived from these characteristics.  A traffic flow can be
   defined at an arbitrary level of granularity.  For example, traffic
   can be measured on a physical link or on one of the virtual links it
   carries; a physical link with many virtual links aggregates a number
   of smaller traffic flows.  Conversely, a flow can also be carried
   over an aggregate of physical links, in schemes such as link
   bundling or Equal-Cost Multi-Path (ECMP).

   The measurement of traffic is meant to "facilitate reliable network
   operations." [AWD1]   Traffic measurement provides a means for
   capacity planning as well as a means to work around congestion.
   Traffic measurement standards need to be protocol independent and
   should be portable across platforms.  Traffic measurement is
   performed with the goals of modifying the path of traffic,
   allocating capacity, reducing congestion, and observing trends.


4. Advantages of TE measurements

   4.1. Real-time and long-term TE measurement

   TE measurements are instrumental in providing real-time as well as
   long-term proactive TE.  Network performance may be evaluated by
   examining TE measurements.  Measurements such as throughput compared
   with maximum bandwidth can indicate link utilization and link
   congestion on the network.  Because network conditions are
   transient, measurements must capture the real-time characteristics
   of the network in order to be effective.

   Over a period of time, measurement metrics should be able to support
   long-term TE.  Long-term TE considers traffic growth patterns,
   congestion issues, and traffic peak patterns.  Traffic growth and
   peak patterns can be derived from measurements such as throughput
   and peak rate.  Measurements must facilitate proactive TE strategies
   to optimize the network or to avoid undesirable network conditions.

   4.2. Measurements for traffic management

   4.2.1. Load balancing

   Performing TE requires the ability to optimize network traffic flows
   and to balance network traffic across multiple trunks.  During load
   balancing, traffic is partitioned at the incoming interface onto
   multiple virtual paths.  In the case of virtual links, secondary
   link(s) that meet the appropriate requirements may be created, based
   on the TE measurements, to accommodate load balancing.  Measurement
   of available bandwidth, loss, and delay is critical in determining
   the feasibility of creating secondary connections.
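
   The partitioning step can be illustrated with a short Python sketch.
   It is not a vendor algorithm: it assumes that a flow is identified
   by its 5-tuple and that each candidate path is weighted by its
   measured available bandwidth in whole Mbps.  Hashing the 5-tuple
   keeps every packet of a flow on one path, avoiding reordering, while
   distinct flows spread across the paths roughly in proportion to the
   weights.  The function name select_path is hypothetical.

```python
import hashlib


def select_path(five_tuple, paths):
    """Pick a path for a flow.

    five_tuple: (src_ip, dst_ip, protocol, src_port, dst_port)
    paths: list of (path_name, available_bw_mbps) with integer weights
    """
    total = sum(bw for _, bw in paths)
    if total <= 0:
        raise ValueError("no path has available bandwidth")
    key = "|".join(str(field) for field in five_tuple).encode()
    digest = hashlib.sha256(key).digest()
    # Map the flow to a point in [0, total) and return the path whose
    # share of the total available bandwidth covers that point.
    point = int.from_bytes(digest[:8], "big") % total
    for name, bw in paths:
        if point < bw:
            return name
        point -= bw
    raise RuntimeError("unreachable: point is always below total")
```

   For example, with paths [("LSP-A", 70), ("LSP-B", 30)], roughly 70
   percent of distinct flows are expected to map to LSP-A, and a given
   flow always maps to the same path until the weights change.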

   Measured values such as available bandwidth change constantly.
   Because traffic flows change dynamically, the network will not
   remain in a steady load-balanced state.  In order to approach a
   load-balanced steady state, TE measurements are needed to determine
   recomputation and optimization intervals.

   4.2.2. Policy-based TE measurements

   Policy-based TE provides flexibility in the specification of the
   network optimization objectives and constraints.  Policy can be
   adjusted or fine-tuned on a continuous basis.  Policy attributes of
   a network path include priority, preemption, resilience, resource
   classes, and policing.

   Policy-based TE measurements should compare the metric values with
   the thresholds based on the policy to trigger the appropriate
   actions.  Policy-based measurements can be used to identify
   potential network traffic issues.  Comparison of the measurements
   and policy-based thresholds can be set up statically at a predefined
   time interval or dynamically at event occurrence.
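
   The comparison of measurements against policy thresholds might look
   like the following minimal Python sketch.  The metric names, limits,
   and action labels are hypothetical examples, not PIB definitions.
   The same function can be called from a timer (static comparison at
   a predefined interval) or from an event handler such as a
   preemption notification (dynamic comparison).

```python
from dataclasses import dataclass


@dataclass
class PolicyThreshold:
    metric: str    # name of a measured variable, e.g. "utilization"
    limit: float   # value above which the policy is violated
    action: str    # hypothetical action label, e.g. "reroute"


def triggered_actions(measurements, policy):
    """Return the actions whose thresholds are exceeded.

    measurements: dict mapping metric name -> latest measured value.
    Metrics missing from the measurements are skipped.
    """
    actions = []
    for threshold in policy:
        value = measurements.get(threshold.metric)
        if value is not None and value > threshold.limit:
            actions.append(threshold.action)
    return actions
```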

   For instance, in the event of path preemption, the traffic pattern
   can be impacted and the traffic flow changes.  Measurements should
   be compared with the threshold values to ensure that proper actions
   are taken if the preemption induces undesirable effects on the
   traffic pattern.  Policy-based TE should be in compliance with the
   Policy Information Base (PIB) specifications.

   Constraint-based routing (CBR) TE specifies a finer subset of the
   policy-based TE.  CBR takes place when all the specified constraints
   are met by the TE measurements.  Measurements must provide traffic
   characteristics in order to facilitate constraint-based routing
   comparison.  Constraint specifications can include peak rate,
   committed rate, and service levels.  Policy-based TE measurements,
   such as bandwidth availability, can be compared with the peak rate
   and committed rate constraints to determine whether they are met.
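
   A minimal Python sketch of this comparison follows.  It assumes that
   the peak rate is at least the committed rate and, as a policy choice
   made only for illustration, treats the committed rate as the hard
   constraint when pruning links; the function names are hypothetical.

```python
def constraint_status(available_bw, committed_rate, peak_rate):
    """Classify a link by which rate constraint its available bandwidth meets."""
    if available_bw >= peak_rate:
        return "peak"
    if available_bw >= committed_rate:
        return "committed"
    return "unmet"


def feasible_links(links, committed_rate, peak_rate):
    """Prune links that cannot carry even the committed rate.

    links: dict mapping link name -> measured available bandwidth (Mbps).
    """
    return [name for name, bw in links.items()
            if constraint_status(bw, committed_rate, peak_rate) != "unmet"]
```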

   4.2.3. Measurements for Path Protection/Restoration

   Fault detection, path protection, and restoration are imperative in
   an operations environment.  TE measurements are essential to ensure
   these mechanisms are in place.  Faults can be identified using TE
   measurements such as packet loss or low throughput.  Notifications
   may be generated automatically based on the observed value of these
   variables.

   Other metrics can determine the amount of spare capacity for
   different failure recovery scenarios.  For example:

    a. Prior to restoring traffic to the original path
    b. Prior to creating the protection path

   Examination of TE measurement metrics can also be used to ensure
   that there is no overlap of the primary and secondary paths.
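
   Both checks, overlap between primary and secondary paths and spare
   capacity on the protection path, can be sketched in Python as
   follows.  Paths are assumed to be lists of node names and spare
   capacity is assumed to be keyed by undirected link; all names are
   hypothetical.

```python
def links_of(path):
    """Undirected links of a path given as a list of node names."""
    return {frozenset(hop) for hop in zip(path, path[1:])}


def shared_links(primary, secondary):
    """Links that appear on both the primary and secondary paths."""
    return links_of(primary) & links_of(secondary)


def shared_transit_nodes(primary, secondary):
    """Intermediate nodes (excluding the end points) used by both paths."""
    return set(primary[1:-1]) & set(secondary[1:-1])


def has_spare_capacity(path, spare_mbps, demand_mbps):
    """True if every link on the path has at least demand_mbps spare capacity.

    spare_mbps: dict mapping frozenset({node_a, node_b}) -> spare Mbps.
    """
    return all(spare_mbps.get(link, 0) >= demand_mbps for link in links_of(path))
```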


5. Throughput

   Throughput is a measure of the amount of traffic that passes between
   a set of end points, where end points can be logically or physically
   defined.  The amount of traffic is measured as the quantity of bits
   that pass over a period of time, usually represented as bits per
   second (BPS).  Another facet of throughput is packets per second
   (PPS).  PPS is infrequently used; however, PPS in conjunction with
   BPS allows the operator to determine average packet sizes.  Average
   packet size is an important measure because some vendors' equipment
   has problems forwarding small packets at line rate.
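
   The average packet size follows directly from the two rates measured
   over the same interval: average size in bytes = BPS / (8 x PPS).
   For example, 100 Mbps carried at 50,000 PPS averages 250-byte
   packets.  A one-function Python sketch (name hypothetical):

```python
def average_packet_size_bytes(bps, pps):
    """Average packet size in bytes from bits/s and packets/s over one interval."""
    if pps == 0:
        return 0.0  # no packets in the interval
    return bps / (8 * pps)
```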

   Both medium-term and long-term TE require a measure of throughput,
   both for intervention when bandwidth availability decreases and for
   planning future capacity needs.  Throughput measurement will also be
   important in situations where new software creates the demand for
   dynamic IP flow controls.  See [AWD2] for a more detailed
   explanation of TE over time.

   Throughput for general usage is best measured at a regular interval.
   Most operators choose 5 minutes as their interval.  This provides an
   approach that is granular without being so aggressive that the
   amount of recorded data becomes overwhelming.  The 5-minute interval
   is best suited to periods when active traffic measurement
   (measurement with direct network operator involvement) is not being
   performed.  A 5-minute interval provides enough data to identify
   daily, weekly, and monthly trends.  This data is used to predict
   capacity needs and to identify points of rising congestion.  During
   periods of active traffic measurement, intervals of 5 seconds are
   not uncommon.  Active throughput measurement is undertaken in order
   to work with points of congestion: the operator identifies flows and
   chooses alternate paths or other modifications of flow parameters.
   Active throughput measurement also provides a means of monitoring
   changes to network parameters, and their impact on traffic, during
   production traffic engineering efforts.
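
   Interval throughput is typically derived from two samples of an
   octet counter.  The Python sketch below assumes the counters are
   polled via SNMP from the IF-MIB (ifInOctets is 32 bits, ifHCInOctets
   is 64 bits); this memo does not mandate SNMP, and the function name
   is hypothetical.  The same function serves 5-minute and 5-second
   intervals.

```python
def rate_bps(prev_octets, curr_octets, interval_s, counter_bits=64):
    """Average bits per second between two samples of an octet counter.

    The modulo arithmetic corrects for at most one counter wrap between
    samples; a counter reset (for example after a reboot) is not
    detected here.
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    delta = (curr_octets - prev_octets) % (2 ** counter_bits)
    return delta * 8 / interval_s
```

   A 32-bit octet counter on a 1 Gbps link wraps in about 34 seconds,
   so with 5-minute polling it can wrap several times between samples;
   64-bit counters avoid this ambiguity.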

   Vendors provide various levels of throughput measurement.  Some
   vendors measure throughput as the amount of IP traffic passed, while
   others include link-layer encapsulation.  With differing methods it
   becomes necessary to remember which vendor's equipment is being
   measured and to adjust appropriately.  An example is a switch
   compared with a router.  Many switches report the throughput of
   their own protocol (such as ATM), which is greater than the
   throughput available to an IP packet encapsulated within that
   protocol.  The measure of throughput that relates most closely to
   what an IP packet perceives would include only the IP packet.
   Additional encapsulation can create a false sense of capacity, since
   some switching methods (ATM, for example) consume significant
   amounts of bandwidth.  This suggests that the best method for
   representing IP traffic is to subtract all additional encapsulation
   from the measurements.  Doing so requires that the amount of space
   used for encapsulation be well known, which is the case for most
   encapsulation methods.
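
   As an illustration of ATM overhead, the Python sketch below computes
   the bytes on the wire for one IP packet carried over AAL5.  It
   assumes LLC/SNAP encapsulation of routed IP (8 bytes, RFC 2684);
   VC-multiplexed encapsulation adds no such header.  The AAL5 trailer
   is 8 bytes and the PDU is padded to a whole number of 48-byte cell
   payloads, each sent in a 53-byte cell.  Function names are
   hypothetical.

```python
import math

ATM_CELL_BYTES = 53       # 5-byte header + 48-byte payload
ATM_CELL_PAYLOAD = 48
AAL5_TRAILER_BYTES = 8
LLC_SNAP_BYTES = 8        # assumption: LLC/SNAP encapsulation of routed IP


def atm_wire_bytes(ip_packet_bytes, llc_snap=True):
    """Bytes on the ATM link needed to carry one IP packet over AAL5."""
    pdu = ip_packet_bytes + AAL5_TRAILER_BYTES
    if llc_snap:
        pdu += LLC_SNAP_BYTES
    cells = math.ceil(pdu / ATM_CELL_PAYLOAD)  # AAL5 pads to whole cells
    return cells * ATM_CELL_BYTES


def ip_fraction(ip_packet_bytes, llc_snap=True):
    """Fraction of ATM-layer throughput that is IP packet data."""
    return ip_packet_bytes / atm_wire_bytes(ip_packet_bytes, llc_snap)
```

   Under these assumptions a 40-byte packet occupies 106 bytes on the
   wire (about 38 percent IP data) and a 1500-byte packet occupies 1696
   bytes (about 88 percent), so cell-level throughput overstates IP
   throughput by an amount that depends on the packet size mix.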

   The 95th percentile of the throughput samples is used to determine
   flow utilization.  The percentile allows the capacity planner to
   determine future needs while avoiding the statistical anomalies that
   are inherent in packet networks.  Network operators use 95 percent
   utilization to set alarms as well as to determine that flows are
   approaching their predefined thresholds.
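
   A Python sketch of the 95th percentile over a series of interval
   throughput samples follows.  It uses the nearest-rank definition,
   which is one common choice; other definitions interpolate between
   samples.  Function names are hypothetical.

```python
def percentile_95(samples):
    """Nearest-rank 95th percentile of a list of rate samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100   # ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def p95_utilization(samples_bps, capacity_bps):
    """95th-percentile throughput as a fraction of link capacity."""
    return percentile_95(samples_bps) / capacity_bps
```

   A 30-day month of 5-minute samples contains 8,640 values; with the
   nearest-rank method the highest 432 samples (36 hours of traffic)
   are discarded before the percentile is taken.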

6. Loss

   A flow has certain requirements it must satisfy in order to be
   considered a quality service, and the degree of loss is an important
   factor.  No Internet service (or its component flows) will always be
   100% loss free; therefore, the loss constraint must be defined based
   on network dynamics and internal system constraints (topology,
   bandwidth, etc.).  What is acceptable loss?  None is the preferred
   answer, but that is not always practical or possible.

   Generically, loss can be viewed as a quality attribute of a flow.
   The loss attribute of a flow, when compared to the predetermined
   constraints allows for problem determination.  Accounting and
   measurement (real-time and long-term) provide the necessary
   information for developing a solution and finding the best possible
   resolution based on the system constraints.

   At Layer 3, traceroute and ping allow the user to see loss and
   latency.  Traceroute at Layer 2 (in an overlay network) can allow
   the user to see problems at Layer 2.  Physical outages and errors
   can lead to any number of higher-level errors.

   Loss can be caused by outages in a bandwidth-guaranteed TDM system
   (such as SONET/SDH), where no statistical gain is generally
   achieved.  Loss can also occur in statistical systems where demand
   outweighs supply, for example when:

      (input port 1 rate + input port 2 rate) > output port 1 rate
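
   The effect of this inequality can be sketched with a simple fluid
   model of one output port with a finite buffer and tail drop.  This
   is a swapped-in illustration, not a model of any particular device:
   it ignores packet boundaries and randomized discard (RED), and the
   function name is hypothetical.

```python
def fluid_drops(arrivals, capacity, buffer_size):
    """Per-interval drops at one output port under a fluid approximation.

    arrivals: traffic offered to the output port in each interval
              (the sum over all input ports), in bits.
    capacity: bits the output port can transmit per interval.
    buffer_size: bits of queue the port can hold between intervals.
    """
    backlog = 0
    drops = []
    for offered in arrivals:
        backlog += offered
        backlog -= min(backlog, capacity)        # transmit up to line rate
        dropped = max(0, backlog - buffer_size)  # tail-drop the excess
        backlog -= dropped
        drops.append(dropped)
    return drops
```

   With two inputs offering 6 units each per interval to a port of
   capacity 10 and a buffer of 3, the drops per interval are 0, 1, 2:
   the buffer absorbs the first excess, after which the sustained
   oversubscription of 2 units per interval is lost.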

   Protection schemes (such as 1+1, 1:1, and N:1) can be used to
   mitigate TDM loss.  Buffering, scheduling, and randomized discard
   strategies can be used to mitigate statistical loss.

   The values needed to mitigate, plan for, and resolve a flow's loss
   attribute include:

   Per traffic class loss statistics (e.g., UBR/ABR/VBR/CBR, multiple
   FECs, Diffserv)
   -Intentional loss (RED, policy, contract enforcement)
   -Unintentional loss (buffer over-utilization, congestion, etc.)
   -Total loss (cause independent)

   Per flow loss statistics (VC, DLCI, LSP)
   -Intentional loss
   -Unintentional loss
   -Total loss

   Per interface loss statistics
   -Intentional loss
   -Unintentional loss
   -Total loss
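
   One way to hold these statistics is a small counter record per
   traffic class, flow, or interface, sketched below in Python.  Total
   loss is counted independently of cause, so any drops not attributed
   to either cause remain visible as "unclassified".  The class and key
   names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LossCounters:
    intentional: int = 0    # e.g. RED, policing, contract enforcement
    unintentional: int = 0  # e.g. buffer over-utilization, congestion
    total: int = 0          # every drop, counted independently of cause

    def record(self, packets, intentional=None):
        """Count dropped packets; intentional=None means the cause is unknown."""
        self.total += packets
        if intentional is True:
            self.intentional += packets
        elif intentional is False:
            self.unintentional += packets

    @property
    def unclassified(self):
        """Drops included in the total but not attributed to either cause."""
        return self.total - self.intentional - self.unintentional


# Keyed by (scope, name): scope is "class", "flow", or "interface".
loss_stats = defaultdict(LossCounters)
```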

7. Delay

   Measurement of delay, defined as the time it takes for a packet to
   travel from source to destination, is a must for any IP forwarding
   device.  Delay directly affects the responsiveness of protocols such
   as TCP across the network.  Round-trip packet delay may, in some
   cases, not be equal to twice the one-way packet delay because paths
   can be asymmetric.  On an uncongested network, the measured delay
   reflects propagation and transmission delay.  Delay measurement is
   increasingly useful as the use of real-time, delay-sensitive
   applications grows.

   Along with end-to-end delay, buffer delay should also be taken into
   consideration and measured separately. Buffer delay is defined for
   the purposes of this document as the time it takes for a node to
   transfer/switch a packet from the ingress to the egress interface.
   This value is dependent on the type/bandwidth of the ingress and
   egress interfaces.  Vendors have different implementations of the
   memory pools used for packet buffering (e.g., per-interface buffers
   or a global pool of memory buffers), resulting in different values
   when measuring buffer delay.  In other words, different vendors can
   have different ingress-to-egress transit times.

   Measurement of buffer delay makes it possible to determine the
   amount of time involved in transiting a device.  This helps
   operators determine congestion points as well as equipment
   performance under load.  In test scenarios, separate measurement of
   buffer delay is largely academic since, in most situations, the test
   path will not have a measurable speed-of-light delay, so the
   end-to-end delay is essentially buffer delay.  Sending alerts based
   on buffer delay provides a means of detecting congestion without
   relying on tools such as ping, which can add to the problem.  Ping
   and similar tools are also external indicators of performance issues
   and may not monitor all paths through the network (with ECMP, for
   example).  Increasing buffer sizes will increase the potential
   buffer delay, and some vendors provide methods for doing this.

   Application-level programs such as ping and traceroute provide a
   means of measuring end-to-end delay.  Most network management
   systems rely on ping to monitor the performance of a given path.
   Methodologies for delay measurement at the node level will vary
   depending on vendor implementation.  If all the nodes in the path of
   a packet are closely synchronized to a GPS clock, NTP (Network Time
   Protocol) time-stamps can be used to measure one-way packet delay.
   The source node places a time-stamp in the packet and sends it
   towards the destination.  The destination node time-stamps the
   packet when it is received.  The difference between the two
   time-stamps, after any adjustment for differences in clock
   synchronization, is the one-way packet delay.  The process can be
   repeated periodically, with 3 to 5 packets sent in each instance.
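
   The arithmetic for one group of probe packets can be sketched in
   Python as follows.  It assumes the clock adjustment is known as a
   single offset (destination clock minus source clock) for the group;
   the function names are hypothetical.

```python
def one_way_delays(send_ts, recv_ts, clock_offset=0.0):
    """One-way delays in seconds for a group of probe packets.

    send_ts, recv_ts: source and destination time-stamps (seconds) for
    the same packets, in the same order.  clock_offset is the
    destination clock minus the source clock.
    """
    return [(recv - sent) - clock_offset for sent, recv in zip(send_ts, recv_ts)]


def summarize(delays):
    """Minimum, mean, and maximum of one probe group."""
    return {
        "min": min(delays),
        "mean": sum(delays) / len(delays),
        "max": max(delays),
    }
```

   The minimum of a group is often taken as an estimate of the fixed
   propagation and transmission delay, while the spread above it
   reflects queueing (buffer) delay along the path.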

   In addition to buffer delay, delay measurements can be affected by
   frame translation.  When IP traffic is switched or routed from one
   device to another, a segmentation and reassembly (SAR) process may
   take place to translate the frame format.  This adds delay to the
   switching or routing.  Delay metrics for TE measurement can be
   optimized by engineering flows to avoid unnecessary frame
   translation or SAR.


8. Path

   Path can be described as the hops that packets in a flow take from
   the ingress node to the egress node.  It is not uncommon for there
   to be three separate layers of path information: the physical layer,
   the switched layer, and the IP layer.  Programs such as traceroute
   and ping can provide a record of the nodes that a packet traverses.
   However, ping and traceroute provide only IP-layer information.  In
   addition, when a node receives a traceroute UDP packet, or a ping
   with the record-route option set, the packet leaves the switching
   path, and the information regarding the node's switching environment
   is lost.

   Path information makes it possible to determine a flow's preferred
   topology.  Maintaining a history of previously preferred paths makes
   it possible to determine where a flow has previously been carried
   and helps operators prepare for network failures.  Historical path
   information is used to determine failure scenarios that would cause
   overload, based on the aggregate of the flows that could move onto
   failover links (links that are preferred during outages).
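
   A failure scenario of this kind can be evaluated with the Python
   sketch below.  It assumes each flow record carries its measured rate,
   its current (primary) links, and a failover path taken from its path
   history; the record layout and function name are hypothetical.

```python
from collections import defaultdict


def overloaded_links(flows, failed_link, capacity):
    """Links that would exceed capacity if failed_link went down.

    flows: list of dicts with keys
        "rate"     - measured throughput of the flow (Mbps)
        "primary"  - list of link names the flow currently uses
        "failover" - list of link names from its path history
    capacity: dict mapping link name -> capacity (Mbps).
    """
    load = defaultdict(float)
    for flow in flows:
        path = flow["failover"] if failed_link in flow["primary"] else flow["primary"]
        for link in path:
            load[link] += flow["rate"]
    return {link: total for link, total in load.items() if total > capacity[link]}
```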

   Hop count generally indicates how many nodes a packet has traversed
   on its way to its destination.  Simply counting the number of hops
   that a flow commonly uses, and sending an alert when the count
   exceeds a threshold, makes it possible to determine that a path has
   reached an unreasonable length or that the network state has
   changed.
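
   A minimal Python sketch of this check takes the most frequently
   observed hop count as the flow's usual value.  The margin that
   counts as "unreasonable" is a hypothetical parameter, not a value
   from this memo.

```python
from collections import Counter


def check_hop_count(history, current, max_extra_hops=2):
    """Compare the current hop count with the count the flow usually has.

    history: hop counts observed for this flow in the past (non-empty).
    max_extra_hops: hypothetical threshold for an unreasonable length.
    """
    usual = Counter(history).most_common(1)[0][0]
    return {
        "changed": current != usual,
        "too_long": current > usual + max_extra_hops,
    }
```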

9. Lifetime

   The lifetime of a flow is simply the measurement of the total time
   that the flow exists.  As stated before, a flow can exist on a
   physical or logical interface and could be permanent (such as a
   backbone connection) or dynamic (perhaps a VPN connection at certain
   times of day).  The lifetime can be used in several ways to help
   facilitate reliable network operations.

   In a perfect world, a permanent flow would have an infinite
   lifetime.  In reality, link outages, equipment failures, or
   scheduled maintenance will always cause a flow to have a finite
   lifetime.  By tracking the lifetime of a flow, its performance and
   reliability may be characterized.  The information gleaned from
   flow lifetimes could be applied to a network monitoring tool to
   alert operators to potential problems at lower OSI layers.
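
   Flow lifetimes and the resulting availability can be computed from a
   time-ordered list of up/down events, as in the Python sketch below.
   The event format and function names are hypothetical, and the
   observation window is assumed to start no later than the first
   event.

```python
def flow_lifetimes(events, end_time):
    """Durations (seconds) of each period during which a flow existed.

    events: time-ordered (timestamp, "up" | "down") pairs for one flow.
    A flow still up at end_time contributes a period ending at end_time.
    """
    spans = []
    start = None
    for timestamp, state in events:
        if state == "up" and start is None:
            start = timestamp
        elif state == "down" and start is not None:
            spans.append(timestamp - start)
            start = None
    if start is not None:
        spans.append(end_time - start)
    return spans


def availability(events, start_time, end_time):
    """Fraction of the observation window during which the flow existed."""
    return sum(flow_lifetimes(events, end_time)) / (end_time - start_time)
```

   The last lifetime of a flow that is still up is truncated at the end
   of the window, so averaging lifetimes over a short window will
   understate how long flows actually last.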

   Dynamic flow lifetime information is also very useful to operators
   or capacity planners.  The range of specialized IP services offered
   continues to grow, and planners will need to be able to maximize the
   use of their network resources (while minimizing loss of course).
   By understanding the lifetime of flows on the network it is possible
   to optimize traffic to use the network to the fullest extent while
   still maintaining an acceptable level of quality.

10. Applications of TE measurement

   Over a period of time, static and dynamic measurement metrics should
   be able to provide data for long-term TE.  Long term TE includes
   traffic growth patterns, congestion issues and traffic peak
   patterns.  Traffic growth and peak patterns can be derived from
   measurements such as peak and average rate.  Measurements must
   facilitate proactive TE strategy planning to optimize the network
   and to avoid undesirable network conditions.

   It is incumbent on the operator to determine the intervals at which
   measurements should be taken.  The rate of change in the 95th
   percentile (throughput change over time) should cue the network
   operator to increase the frequency of TE efforts.  An operator in
   the summer months may adjust flow parameters on a monthly basis and
   in the winter months the operator may need to adjust on a weekly
   basis.  Tracking the rate of change over time will help the operator
   predict this type of behaviour.
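
   Tracking the rate of change can be as simple as the Python sketch
   below, which computes the relative change between consecutive
   95th-percentile values.  The 10 percent cut-off and the weekly or
   monthly review intervals are hypothetical choices, not
   recommendations of this memo.

```python
def p95_growth(p95_series):
    """Relative change between consecutive 95th-percentile values."""
    return [(later - earlier) / earlier
            for earlier, later in zip(p95_series, p95_series[1:])]


def review_interval_days(growth, fast_growth=0.10):
    """Hypothetical rule: review weekly while growth is fast, else monthly."""
    if growth and growth[-1] > fast_growth:
        return 7
    return 30
```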

   Policy-based TE measurements should compare metric values with
   thresholds based on the policy to trigger the appropriate actions.
   The policy-based measurements should be able to alert operators to
   potential traffic issues.  The comparison of measurements and
   policy-based thresholds can be set up statically at a predefined
   interval or dynamically at event occurrence.  For instance, in the
   event of path preemption, the traffic pattern can be impacted and
   the traffic flow changed.  Measurements should be compared with the
   threshold values to ensure that proper actions are taken if the
   preemption induces an undesirable effect on the traffic pattern.
   Policy-based TE should be in compliance with Policy Information Base
   (PIB) specifications.

   Constraint-based routing (CBR) TE specifies a finer subset of
   policy-based TE.  CBR takes place when all the specified constraints
   are met by the TE measurements.  Measurements must provide the
   explicit traffic characteristics in order to perform the comparison
   for CBR.  Constraint specifications can include peak rate, committed
   rate and service levels.  Policy-based TE measurements, such as
   bandwidth availability, can be compared with the peak rate and
   committed rate constraints to determine if they are met.


11. Additional TE measurement considerations


   11.1. Protocol-independent link bundling considerations

   In order to reduce the overhead of managing multiple virtual links
   that originate and terminate at the same ingress and egress points,
   there is a proposal to aggregate these links for network
   optimization.  Component links have the same constraints, resource
   classes, and attributes, and the multiple virtual links are treated
   as a single IP link.  TE measurements, such as bandwidth
   availability and throughput, should take into account the
   measurements for the bundled virtual links.
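
   Bundle-level measurements can be aggregated from the component links
   as in the Python sketch below (names hypothetical).  Capacity,
   throughput, and available bandwidth add up across components, but a
   flow that is not split across component links can use at most the
   largest single component's available bandwidth, so that value is
   reported separately.

```python
def bundle_metrics(components):
    """Aggregate TE measurements for a link bundle.

    components: dict mapping component link name ->
                (capacity_mbps, throughput_mbps).
    """
    available = [capacity - used for capacity, used in components.values()]
    return {
        "capacity": sum(capacity for capacity, _ in components.values()),
        "throughput": sum(used for _, used in components.values()),
        "available": sum(available),
        # Largest amount a single unsplit flow could still be given.
        "max_single_flow": max(available),
    }
```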


   There are ongoing discussions on virtual link/channel bundling for
   various standards under development or enhancement, such as MPLS and
   optical networking.  TE measurements for virtual link/channel bundling
   should be protocol independent and media independent to ensure
   portability and commonality in the measurements.

   11.2. Feedback mechanisms for topology state considerations


   As part of constraint-based routing, all nodes require topology
   state information.  TE measurements should provide information such
   as link availability and the maximum constraints/resources that each
   link can meet.  Topology information, such as throughput, loss, and
   bandwidth availability, changes continuously in a large-scale
   environment.  Information distribution is usually based on flooding
   or on a predetermined algorithm for topology changes.  Distributing
   and updating topology information takes time, while bandwidth
   measurements can change immediately.  As a result, not every node
   will have the same topology view.  In a large-scale operations
   environment, discrepancies between the topology information held by
   different nodes can be a problem in the event of failure or during
   recovery.


   TE measurements should consider the recent proposal for the
   signaling protocol to carry the actual bandwidth availability of
   every link that it traverses.  This topology feedback mechanism will
   require additional TE measurements to provide the actual information
   as part of the reverse-flowing messages.  These RSVP TLV-type
   measurements should be protocol independent.  In addition to
   feedback on the actual bandwidth, future TE measurements should
   consider information on the actual utilization, current congestion,
   and number of channels or wavelengths available as part of the
   feedback mechanism.
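
   Once per-link available bandwidth has been collected along a path,
   the receiver can identify the bottleneck, as in the Python sketch
   below.  It does not model any RSVP object or TLV encoding; the input
   format and function name are hypothetical.

```python
import math


def bottleneck(per_link_available):
    """Return (link, available_mbps) for the most constrained link on a path.

    per_link_available: ordered (link_name, available_mbps) pairs, as a
    reverse-flowing message might collect them hop by hop.
    """
    worst_link, worst_bw = None, math.inf
    for link, available in per_link_available:
        if available < worst_bw:
            worst_link, worst_bw = link, available
    return worst_link, worst_bw
```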

   11.3. Optical network considerations

   Optical network development is adding new dimensions to TE
   measurements.  As the role of optical switches in traditional data
   router/switch networks increases, TE measurements need to provide
   information on optical performance.

   Optical performance measurements for TE should include loss of
   signal (LOS), bit error rate (BER), insertion loss, optical
   signal-to-noise ratio (OSNR), optical channel registration, optical
   compliance deviation, and optical power level.  This information can
   be distributed to the edge devices that interface between the
   optical layer and the data layer.  With these optical network
   measurements and IP data TE measurements, virtual paths/channels can
   be managed dynamically and performance can be optimized.

   The development of a traffic engineering control plane function in
   the optical network will require additional TE measurements.  There
   can be similarities in TE measurements for optical channels and
   labels, specifically resource availability and constraints for
   network dimensioning.


   11.4. ICMP extensions for one-way performance metrics

   TE measurements should consider the proposed extension of ICMP for
   one-way traffic measurements.  The new ICMP messages, types 41 and
   42, are probe request and probe reply messages, respectively.  They
   can provide information on one-way delay based on timestamp
   information and on one-way loss rate based on the encoded sequence
   number.  One-way delay and one-way loss can be useful in TE one-way
   performance metric measurements.
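
   Deriving a one-way loss rate from encoded sequence numbers can be
   sketched in Python as follows.  The sketch does not model the ICMP
   message format; it assumes, hypothetically, that the probes in one
   test are numbered consecutively from 0.

```python
def one_way_loss_rate(sent_count, received_seqs):
    """Loss rate for probes numbered 0 .. sent_count - 1.

    received_seqs: sequence numbers seen at the receiver, in arrival
    order.  Duplicates are counted once; out-of-range numbers are ignored.
    """
    if sent_count <= 0:
        raise ValueError("no probes sent")
    received = {seq for seq in received_seqs if 0 <= seq < sent_count}
    return (sent_count - len(received)) / sent_count
```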

   11.5. New requirement considerations

   Internet application development is increasing the complexity of
   TE metrics.  An example is TE for multicast, which requires
   measurements to facilitate traffic optimization when multicast and
   unicast traffic co-exist.  TE measurements for multicast need to
   provide information on constraints such as network utilization,
   channel availability, delay, loss, and throughput when creating the
   multicast tree.  Similarly, additional TE measurement considerations
   are needed for Voice over IP (VoIP) applications.

12. Acknowledgments

   Special thanks to Syed Malik, Josh Wepman, Brad Volz, Roshan
   Winslow, and Rick Glasser from UUNET.  And yet more thanks to Ed
   Balas and Mark Davisson from Caimis and to Abha Ahuja from the
   University of Michigan.



13. Authors' Addresses

   Blaine Christian
   UUNET
   Blaine@uu.net

   Brian Davies
   UUNET
   Daviesb@uu.net

   Heidi Tse
   UUNET
   Htse@uu.net




14. References

   [AWD1] D. Awduche, J. Malcolm, J. Agogbua, M. O'Dell, J. McManus,
   "Requirements for Traffic Engineering over MPLS", RFC 2702,
   September 1999

   [AWD2] D. Awduche, A. Chiu, A. Elwalid, I. Widjaja, X. Xiao, "A
   Framework for Internet Traffic Engineering", Work in Progress, May
   2000
