#### **DESIGN HINTS AND ISSUES**

# High Performance Design: XC4000XL-1 FPGAs Exceed 100 MHz

In addition to being the world's highest-density FPGAs, the Xilinx XC4000XL-1 devices are also the world's fastest. They offer internal system clocks above 100 MHz and I/O speeds above 70 MHz. This combination of speed and density comes with low power and total compatibility with 3.3 volt or 5.0 volt logic.

The increase in speed can be quite substantial. Designs for the XC4000E-3 family will run 80-100% faster on the equivalent XC4000XL-1 devices. The pin compatibility among all XC4000 Series devices makes it simple to test actual design speeds — just retarget any design for an existing XC4000 Series FPGA to the appropriate XC4000XL-1 device using the Alliance Series or Foundation Series software.

### **Article Summary**

This article describes the achievable performance (maximum clock frequency) in top-of-the-line FPGAs. It analyzes the performance of seven typical sub-functions and lists the achievable performance levels for the fastest available Xilinx XC4000XL device, compared with the fastest available Altera 10K100 device. All data was derived from the manufacturers' worst-case timing analyzers.

The remainder of this article describes the dramatic performance impact of three different design styles. It shows that you can often double the performance of the FPGA by spending some effort on optimizing the design structure for the specific FPGA architecture.

An expanded version of this article is available on WebLINX (www.xilinx.com), as an application note, under the title "Speed metrics for high performance FPGAs."

**Selected Component Frequency Measurements**

| Freq.          | Explanation                                             | XC4062XL-1 | 10K100-3 |
|----------------|---------------------------------------------------------|------------|----------|
| Fio(int)       | Clocked I/O referenced to internal clock                | 196 MHz    | na       |
| Fio(ext)       | Clocked I/O referenced to external clock                | 74 MHz     | 54 MHz   |
| Fio(lut)       | Clocked I/O to CLB regs (referenced to external clock)  | 31 MHz     | 29 MHz   |
| Fdst(4,4)      | Distance within 4 rows and 4 columns                    | 196 MHz    | 156 MHz  |
| Fdst(0,128)    | Distance across largest chip horizontally or vertically | 79 MHz     | 71 MHz   |
| Fdst(64,128)/2 | Distance across largest chip diagonally and back        | 28 MHz     | 28 MHz   |
| Flut(4,2)      | Two cascaded 4-input LUTs between registers             | 130 MHz    | 82 MHz   |
| Flut(4,4)      | Four cascaded 4LUTs between registers                   | 73 MHz     | 49 MHz   |
| Flut(4,8)      | Eight cascaded 4LUTs between registers                  | 36 MHz     | 27 MHz   |
| Fmux(2)        | 64:32 mux between registers                             | 131 MHz    | 105 MHz  |
| Fmux(8)        | 64:8 mux between registers                              | 80 MHz     | 60 MHz   |
| Fmux(64)       | 64:1 mux between registers                              | 56 MHz     | 38 MHz   |
| Fequ(4)        | 16 x 4-bit AND terms between registers                  | 164 MHz    | 86 MHz   |
| Fequ(16)       | 4 x 16-bit AND terms between registers                  | 81 MHz     | 54 MHz   |
| Fequ(64)       | 1 x 64-bit AND term between registers                   | 30 MHz     | 17 MHz   |
| Fadd(1,5)      | 5-bit adder between registers                           | 135 MHz    | 148 MHz  |
| Fadd(1,32)     | 32-bit adder between registers                          | 73 MHz     | 43 MHz   |
| Fadd(4,32)     | 4 cascaded 32-bit adders between registers              | 32 MHz     | 21 MHz   |
| Fmem(16)       | 16-bit, 16-element dual-port RAM between registers      | 128 MHz    | na       |
| Fmem(128)      | 16-bit, 128-element dual-port RAM between registers     | 68 MHz     | 25 MHz   |
| Fmem(1024)     | 16-bit, 1024-element dual-port RAM between registers    | 40 MHz     | na       |


# **FPGA Component Speeds**

To determine the maximum speed of the components used in FPGA designs, a set of test designs was created. These designs, written in VHDL, measure fundamental aspects of FPGA performance. The following components were entered and tested for frequency:

- I/O: three configurations of I/O pins and clocks.
- Interconnect: registers separated by "N" rows and columns.
- State Machines: 1 to 6 levels (3-, 4-, and 5-input look-up tables).
- Multiplexers: 64:32, 64:16, 64:8, 64:4, 64:2, and 64:1 mux.
- Constant Comparators ("AND" terms): 4-, 8-, 16-, 32-, and 64-bit AND terms.
- Adders: 4-, 8-, 16-, 24-, and 32-bit adders as well as 2- and 4-bit cascaded adders.
- Memory: Dual Port RAMs, 16 bits wide; 16, 32, 64, 128, 256, 512, and 1024 elements deep.

## **FPGA Design Style Affects Performance**

In general, FPGA designs with a low ratio of registers to look-up tables (LUTs) run at lower clock rates than designs with equal numbers of registers and LUTs. Even higher clock rates can be achieved if additional registers are used to break up interconnect delays. Design styles can be characterized as low, medium, and high frequency based on the register-to-LUT ratio. They might also be called "easy," "medium," and "difficult." It is important to understand that this difference is not affected by the design entry method. It is just as easy to include registers in a VHDL design as in a schematic. In fact, high-level tools can include register re-timing methods which can significantly increase system frequency.
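The effect of extra registers on clock rate can be illustrated with the cascaded-LUT measurements reported in this article for the XC4062XL-1 (36, 73, and 130 MHz for 8, 4, and 2 logic levels). The following is a minimal sketch, not a timing model: the helper `pipelined_freq_mhz` is hypothetical and assumes that an evenly split path behaves like the measured path with the same number of logic levels, ignoring placement and routing effects.

```python
# Measured XC4062XL-1 frequencies (MHz) for cascaded 4-input LUTs between
# registers, taken from the component measurements in this article.
FLUT_MHZ = {2: 130.0, 4: 73.0, 8: 36.0}  # logic levels -> frequency


def period_ns(freq_mhz: float) -> float:
    """Convert a clock frequency in MHz to a clock period in ns."""
    return 1000.0 / freq_mhz


def pipelined_freq_mhz(levels: int, stages: int) -> float:
    """Frequency after splitting `levels` LUT levels evenly into `stages`.

    Hypothetical helper (an assumption, not part of any tool): it simply
    looks up the measured frequency for levels // stages logic levels.
    """
    if levels % stages != 0:
        raise ValueError("levels must divide evenly into stages")
    return FLUT_MHZ[levels // stages]


# An 8-level path left as-is, split once, and split three times.
for stages in (1, 2, 4):
    f = pipelined_freq_mhz(8, stages)
    print(f"{stages} stage(s): {f:.0f} MHz, period {period_ns(f):.1f} ns")
```

Under these assumptions, each added register stage roughly doubles the clock rate of the path, at the cost of one more clock cycle of latency and a higher register-to-LUT ratio.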

#### "No Problems" Design Style

If you've ever done a low-speed design for an FPGA, you know how convenient it is to ignore logic depth, pipelining, and placement issues. Logic synthesizers will often generate designs in this style because pipelining and logic placement are not automatically handled. The "no problem" design style requires that timing and placement not be an issue; if the design passes functional simulation, then it will route and meet the non-demanding timing.

#### Medium Frequency FPGA Design Style

Most designs intended for FPGAs fall into this design style. Your designs will tend to fall into this category if you use one-hot state machines and Global Low Skew (GLS) buffered clocks, register all your big data-path components, and practice moderate floor-planning.

#### High Frequency Design Style

A high clock frequency allows little margin for such things as routing delay or carry propagation. To work at this level, the physical aspect of a design must be considered. It may mean adding registers to cover interconnect delay, or detailed floor-planning.

**Design Style Summaries**

| Design Style | Characteristic Freq. | Registers/LUTs | Interconnect Distance | Design Effort | Normalized Freq. | Density |
|--------------|----------------------|----------------|-----------------------|---------------|------------------|---------|
| Low Freq     | Fmin                 | ~0.5           | Long                  | Lowest        | 0.5              | Highest |
| Medium Freq  | Ftyp                 | =1.0           | Medium                | Medium        | 1.0              | Medium  |
| High Freq    | Fmax                 | ~2.0           | Short                 | Highest       | 2.0              | Lowest  |

### **FPGA System Frequency Definitions**

For maximum frequency designs, the type of functionality available to you is restricted. In fact, the types of components that run at the same maximum frequency can be used to define these design styles in a formal sense.

The components can be adders, I/O pins, state machines, or anything else you can build in an FPGA. For proper operation, all the components used must run at the selected system frequency. If the types of components used in a design are known, you can estimate the speed of a new design without detailed knowledge of the actual design. Alternatively, you can limit the types of components used in a design to ensure hitting a target frequency.

The following table defines the types of components that are available within each design style; the selected components are generally compatible with each other, and a formal definition allows frequency measurements to be taken.



| Component      | Parameters Defined                                        | High Freq.                                        | Medium Freq.                                             | Low Freq.                                                              |
|----------------|-----------------------------------------------------------|---------------------------------------------------|----------------------------------------------------------|------------------------------------------------------------------------|
| State machines | Number of cascaded 4LUTs                                  | 2 logic levels                                    | 4 logic levels                                           | 8 logic levels                                                         |
| Multiplexers   | Number of input bits / number of output bits              | 64 bits / 32 bits                                 | 64 bits / 8 bits                                         | 64 bits / 1 bit                                                        |
| "AND-OR" terms | Number of input bits / number of cascaded AND-OR terms    | 4 bits / 1 level                                  | 16 bits / 1 level                                        | 64 bits / 2 levels                                                     |
| Adders         | Number of input bits / number of cascaded adders          | 4 bits / 1 level                                  | 32 bits / 1 level                                        | 32 bits / 4 levels                                                     |
| Inputs/Outputs | Type of input / type of output / timing reference for clock | "NoDelay" inputs / "Fast" outputs / internal clock | "Full Delay" inputs / "Fast" outputs / external GLS clock | "Full Delay" inputs via 4LUT / "Slow" outputs via 4LUT / external GLS clock |
| Memory         | Number of locations in dual-ported memory                 | 16 elements                                       | 128 elements                                             | 1024 elements                                                          |
| Interconnect   | Distance between registers                                | 4 CLBs                                            | 64 CLBs                                                  | 128 CLBs                                                               |
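The two uses of these definitions described above, estimating the speed of a design from its component types and restricting component types to reach a target frequency, can be sketched in a few lines. This is an illustrative sketch only: the dictionary holds a subset of the XC4062XL-1 component measurements reported in this article, and the function names `usable_components` and `estimated_speed_mhz` are hypothetical.

```python
# Subset of the XC4062XL-1 component frequencies (MHz) reported in this
# article. Names follow the article's Fxxx(...) notation.
COMPONENT_MHZ = {
    "Fio(ext)": 74,
    "Fdst(0,128)": 79,
    "Flut(4,2)": 130,
    "Flut(4,4)": 73,
    "Flut(4,8)": 36,
    "Fmux(8)": 80,
    "Fequ(16)": 81,
    "Fadd(1,32)": 73,
    "Fmem(16)": 128,
    "Fmem(128)": 68,
}


def usable_components(target_mhz, measured=COMPONENT_MHZ):
    """Return the component types whose measured frequency meets the target."""
    return sorted(name for name, mhz in measured.items() if mhz >= target_mhz)


def estimated_speed_mhz(components, measured=COMPONENT_MHZ):
    """Estimate design speed as the slowest component type the design uses."""
    return min(measured[name] for name in components)


# Which component types remain available for a 75 MHz target?
print(usable_components(75))

# Estimated speed of a design built from a 32-bit adder, a 64:8 mux,
# and a 128-element dual-port RAM.
print(estimated_speed_mhz(["Fadd(1,32)", "Fmux(8)", "Fmem(128)"]))
```

The estimate is only as good as the assumption that a real design's critical path resembles one of the measured components; it is a planning aid, not a substitute for running the timing analyzer.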

#### **System Frequency Measurements**

The system frequencies for the three associated design styles can now be measured. First the component frequencies required for a design style are measured, then the system frequency is determined. The system frequency is defined as the minimum speed of all the components necessary for each design style. To illustrate the point that Xilinx XC4000XL-1 devices are the world's fastest high-density FPGAs, these same measurements were made for a competitor's FPGA; the Altera 10K100-3 device is roughly the same size as the Xilinx XC4052XL.

| Frequency | Design Style     | XC4062XL-1 | 10K100-3 |
|-----------|------------------|------------|----------|
| Fmax      | High frequency   | 128 MHz    | 82 MHz   |
| Ftyp      | Typical system   | 68 MHz     | 43 MHz   |
| Fmin      | Low frequency    | 28 MHz     | 17 MHz   |
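The definition of system frequency as the minimum over the required components can be checked against the XC4062XL-1 component measurements reported in this article. The sketch below is illustrative: the grouping of measured components into the three design styles is an assumption based on the component definitions, and `system_frequency` is a hypothetical helper name.

```python
# XC4062XL-1 component frequencies (MHz) from this article, grouped by
# design style. The grouping itself is an assumption made for illustration.
XC4062XL_1_MHZ = {
    "high": {
        "Fio(int)": 196, "Fdst(4,4)": 196, "Flut(4,2)": 130, "Fmux(2)": 131,
        "Fequ(4)": 164, "Fadd(1,5)": 135, "Fmem(16)": 128,
    },
    "medium": {
        "Fio(ext)": 74, "Fdst(0,128)": 79, "Flut(4,4)": 73, "Fmux(8)": 80,
        "Fequ(16)": 81, "Fadd(1,32)": 73, "Fmem(128)": 68,
    },
    "low": {
        "Fio(lut)": 31, "Fdst(64,128)/2": 28, "Flut(4,8)": 36, "Fmux(64)": 56,
        "Fequ(64)": 30, "Fadd(4,32)": 32, "Fmem(1024)": 40,
    },
}


def system_frequency(components):
    """Return (limiting component, frequency): the slowest component wins."""
    name = min(components, key=components.get)
    return name, components[name]


ftyp = system_frequency(XC4062XL_1_MHZ["medium"])[1]
for style, components in XC4062XL_1_MHZ.items():
    limiter, mhz = system_frequency(components)
    # Normalized frequency is expressed relative to the medium style.
    print(f"{style}: {mhz} MHz (limited by {limiter}), "
          f"normalized {mhz / ftyp:.2f}")
```

With this grouping the minima come out at 128, 68, and 28 MHz, matching the Fmax, Ftyp, and Fmin values listed for the XC4062XL-1, and the normalized values (about 1.88 and 0.41) are close to the nominal 2.0 and 0.5 used to characterize the design styles.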