# XC4000XL FPGAs Interface to SDRAMs at 100 MHz

by Brad Taylor

Xilinx XC4000XL FPGAs can easily interface to modern systems running at 80 MHz. However, some applications require even higher I/O speed. Devices such as SDRAMs, SSRAMs, and GigaBit Ethernet ICs require I/O speeds of up to 133 MHz using 3.3-V TTL signaling. This article describes how the unique I/O system of the XC4000XL FPGA enables you to build a full-speed SDRAM controller.

SDRAMs are becoming the new standard for large memory devices. This trend follows the introduction of EDO DRAM several years ago, which replaced page-mode DRAMs. EDO DRAMs run at 33 MHz (roughly double the speed of typical page-mode DRAMs). The new SDRAMs run at 66-125 MHz, and are now being used for main memory storage in PCs. They are quickly becoming low-cost devices, selling for less than \$3/MB.

#### The Synchronous SDRAM interface

SDRAMs are clocked and fully synchronous, referencing all I/O transactions to the positive clock edge. The timing model is very simple, with all pins behaving the same. For a *write* operation, the data, address, or control information to a 100 MHz SDRAM must be present on the pin 3.0 ns before the clock edge (T<sub>su</sub>) and held valid until 1.0 ns past that clock edge (T<sub>hold</sub>). For a *read* operation, a read request is clocked into the device. Three clock periods later, data will appear on the data pins. This data is guaranteed to be valid 7.5 ns after the third clock edge (T<sub>co</sub>) and will be held valid for 3.0 ns past that clock edge (T<sub>oh</sub>). Figure 1 shows these timing relationships.

#### "No-delay" Input Modes

All XC4000 FPGAs can capture input data using input flip-flops (IFFs). By default, the inputs are configured to include an additional delay that balances the clock delay. The purpose of this delay is to eliminate the need to hold data valid after a clock edge. However, when interfacing to SDRAMs, this additional delay may be unnecessary because the SDRAM holds the data valid after a clock transition. Xilinx XC4000 FPGAs have a special input mode known as "no-delay" to support this requirement. The advantage of this no-delay mode is that it significantly reduces the input set-up time, and thus allows much faster operation.

### XC4000XL No-delay Setup and Hold Requirements

The no-delay setup requirement (the time data must be stable before the FPGA clock-pin edge) is less than 1.7 ns for XC4000XL-09 FPGAs.



The no-delay hold requirement (the time data must be held stable after the FPGA clock-pin edge) is no longer than the clock delay from the clock pad to the IOB clock input node. For the XC4020XL-09 (and all smaller devices) the normal clock delay from the global low-skew clock distribution network (BUFGLS) will always be less than 3 ns. Larger FPGAs such as the XC4085XL can have clock delays of less than 3 ns for hundreds of IOBs when you use special I/O clock buffers. In both cases, the clock delay to the IFF (and thus the FPGA input hold requirement) can be kept below the 3 ns value for which the SDRAM is guaranteed to hold data valid after a clock edge.

Figure 1. The XC4000XL family's unique I/O system enables the SDRAM controller to operate at full speed.

Figure 2. Fast



### XC4000XL High-speed I/O Clock Distribution Features

XC4000XL FPGAs contain special internal clock buffers known as global early buffers (BUFGEs). These buffers can distribute an early clock to I/O pins. For even the largest XC4000XL FPGAs, the BUFGEs can be used to distribute a clock to up to 64 IOBs in less than 2.5 ns (-09 speed grade). There are BUFGE clock buffers in each corner of the FPGA, and each of these buffers can distribute a clock to the IOBs in the quadrant it occupies.

Figure 3. Early Clock



More than one BUFGE may be used in parallel if it is necessary to distribute an early I/O clock to the IOBs in more than one quadrant.

### Fast Capture Latches Enable the Use of Early I/O Clock

The data that has been captured by the early clock (BUFGE) must be transferred to the logic inside the FPGA, which is clocked by the slower global low-skew clock (BUFGLS). XC4000XL IOBs contain a special fast capture latch (FCL) option which can hold data until it is transferred to the normal IFF in the IOB, which is clocked by the slower global clock. Once the IOBs are configured in this special early capture mode, the operation is transparent.

As a result, data is captured with minimal set-up time and a limited hold time with respect to the clock pin. Yet, it is available at the IFF output, synchronous with the global low-skew clock (BUFGLS) used for all internal logic. *See Figure 2.* 

### Early Clocks Get the Data out of the FPGA On Time

In addition to reducing input hold time, the early I/O clock buffers also speed up FPGA output times. The FPGA's clock-pad-to-output-valid pin-to-pin delay (T<sub>co</sub>) is the sum of the clock delay to the IOB clock node and the clock-to-pad delay. If the I/O clock delay is less than 2.5 ns, T<sub>co</sub> will be less than 6 ns for XC4000XL-09 FPGAs. This is respectable performance for an FPGA and is equivalent to that of the fastest TTL devices. Inside the FPGA, data must be transferred from BUFGLS-clocked registers to the BUFGE-clocked IOB output registers. *See Figure 3.*

A race condition between the data and the early I/O clock could exist if the BUFGLS delay were significantly less than the BUFGE delay. The XC4000XL architecture prevents this possibility as long as the timing tools indicate that the BUFGE delay is less than the BUFGLS delay.

### Time Constraints Ensure Correct Data Transfer

Because Xilinx software supports static timing constraints, it is easy to ensure the proper transfer of data from the BUFGLS-clocked registers to the BUFGE-clocked registers. You can do this by reducing the timing constraint for the selected paths by the GLS clock delay.
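As a rough illustration of the adjustment described above, the following Python sketch computes a tightened path budget. The helper name `early_clock_path_budget_ps` and the example delays (3000 ps for BUFGLS and 2500 ps for BUFGE, taken from the upper bounds quoted earlier in this article) are assumptions for illustration only; actual delays come from the timing tools.

```python
def early_clock_path_budget_ps(period_ps, bufgls_delay_ps, bufge_delay_ps):
    """Return a tightened budget (in ps) for a BUFGLS-to-BUFGE register path.

    Hypothetical helper: it follows the rule stated in the text, which
    reduces the timing constraint by the full GLS clock delay.
    """
    # The transfer is only race-free if the early clock really arrives first.
    if bufge_delay_ps >= bufgls_delay_ps:
        raise ValueError("BUFGE delay must be less than BUFGLS delay")
    return period_ps - bufgls_delay_ps


# Example: 100 MHz clock (10 ns period) with assumed worst-case clock delays.
budget_ps = early_clock_path_budget_ps(10_000, 3_000, 2_500)
print(f"Constrain BUFGLS-to-BUFGE paths to {budget_ps / 1000:.1f} ns")
```

Working in integer picoseconds keeps the arithmetic exact and avoids floating-point rounding in the comparison.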

### Clock Loading Adjustment can Reduce T<sub>co</sub> to 5.5 ns

*The 1998 Xilinx Data Book* contains a "capacitive load factor" table (page 4-77) that lists output delays in the presence of high capacitive loads such as those presented by SDRAM modules (up to 100 pF). For example, a 100 pF load increases the output delay by 1.8 ns. Conversely, a 35 pF load reduces the clock-to-output delay by 0.5 ns. With a clock delay of 2.5 ns, the output delay is therefore a maximum of 6.0 ns - 0.5 ns = 5.5 ns.
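The load adjustment above is simple arithmetic, sketched here in Python. Only the two adjustment values quoted in the text are included; the names `TCO_REFERENCE_PS`, `LOAD_ADJUST_PS`, and `tco_ps` are hypothetical, and the 100 pF result is derived by the same arithmetic rather than quoted from the data book.

```python
# All times in integer picoseconds so the sums are exact.
TCO_REFERENCE_PS = 6_000  # T_co bound from the text with a 2.5 ns early clock delay

# Adjustments quoted in the text from the capacitive load factor table
# (load in pF -> change in output delay in ps). Other loads are not listed here.
LOAD_ADJUST_PS = {35: -500, 100: 1_800}


def tco_ps(load_pf):
    """Return the load-adjusted clock-to-output delay in ps (illustrative only)."""
    return TCO_REFERENCE_PS + LOAD_ADJUST_PS[load_pf]


for load in sorted(LOAD_ADJUST_PS):
    print(f"{load} pF load: T_co = {tco_ps(load) / 1000:.1f} ns")
```

A lookup table is used instead of interpolation because the text gives only two data points.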

### Output Hold Time Required by High Speed Devices

FPGA and PLD vendors have traditionally not published minimum output hold times. Because 0.5 ns to 1 ns of input data hold is required by many high-performance devices, Xilinx will soon publish a minimum output hold time. This parameter (T<sub>OH</sub>) is expected to be in the range of 1 ns for BUFGE-clocked outputs and 2 ns for BUFGLS-clocked outputs.

### Board Delay, Clock Skew, and Clock Jitter

In addition to the on-chip delays, there is always a certain delay between the devices on the PC board (typically about 150 ps/inch). Many systems rely on reflective switching for data to reach its final value. This requires a round trip and increases the value to 300 ps/inch. Fixed clock skew and random clock jitter between devices must also be taken into account to ensure valid data transfer between devices.
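A small Python sketch shows how the per-inch figures above translate into a board delay. The function name `board_delay_ps` and the 3-inch trace length are assumptions chosen for illustration.

```python
ONE_WAY_PS_PER_INCH = 150  # typical propagation delay quoted in the text


def board_delay_ps(trace_length_in, reflective_switching=False):
    """Estimate trace delay in ps; reflective switching needs a round trip."""
    trips = 2 if reflective_switching else 1
    return trace_length_in * ONE_WAY_PS_PER_INCH * trips


# Example: a hypothetical 3-inch trace between the FPGA and the SDRAM.
print(board_delay_ps(3))                             # one-way delay in ps
print(board_delay_ps(3, reflective_switching=True))  # round-trip delay in ps
```

The result can be compared against the slack figures in the transfer checks to see how much trace length a design can tolerate.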

#### Putting it all Together

By taking advantage of these high-performance I/O features, even the largest XC4000XL FPGAs can meet aggressive timing specifications. These parameters are "pin-to-pin" specifications in that they are referenced to the clock and I/O pins only. Tsu, Thold, Tco, and Toh are defined in **Table 1**.

### FPGA to SDRAM Transfer at 100 MHz

$$T_{co}(\text{fpga}) + T_{su}(\text{sdram}) = 6.0\ \text{ns} + 3.0\ \text{ns} = 9.0\ \text{ns}$$

(This allows 1.0 ns slack for board delay and clock jitter.)

$$T_{oh}(\text{fpga}) \ge T_{hold}(\text{sdram}): \quad 1.0\ \text{ns} \ge 1.0\ \text{ns}$$

(Requires the board delay to compensate for clock jitter.)

### Check the SDRAM to FPGA Transfer at 100 MHz

$$T_{co}(\text{sdram}) + T_{su}(\text{fpga}) = 7.5\ \text{ns} + 1.7\ \text{ns} = 9.2\ \text{ns}$$

(This allows 0.8 ns slack for board delay and clock jitter.)

$$T_{oh}(\text{sdram}) \ge T_{hold}(\text{fpga}): \quad 3.0\ \text{ns} \ge 2.5\ \text{ns}$$

(This allows 0.5 ns slack.)
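The two transfer checks above use the same arithmetic, which the following Python sketch makes explicit. The `PinTiming` class and `transfer_slack_ps` function are hypothetical names; the numbers are the ones used in the checks above (including the 6.0 ns FPGA clock-to-output figure), and board delay, skew, and jitter are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PinTiming:
    """Pin-to-pin parameters in integer picoseconds."""
    tco_ps: int    # clock to output valid
    toh_ps: int    # output hold after clock
    tsu_ps: int    # input setup before clock
    thold_ps: int  # input hold after clock


def transfer_slack_ps(source, dest, period_ps):
    """Return (setup_slack, hold_slack) for a source-to-dest transfer."""
    setup_slack = period_ps - (source.tco_ps + dest.tsu_ps)
    hold_slack = source.toh_ps - dest.thold_ps
    return setup_slack, hold_slack


PERIOD_PS = 10_000  # 100 MHz
FPGA = PinTiming(tco_ps=6_000, toh_ps=1_000, tsu_ps=1_700, thold_ps=2_500)
SDRAM = PinTiming(tco_ps=7_500, toh_ps=3_000, tsu_ps=3_000, thold_ps=1_000)

fpga_to_sdram = transfer_slack_ps(FPGA, SDRAM, PERIOD_PS)
sdram_to_fpga = transfer_slack_ps(SDRAM, FPGA, PERIOD_PS)
print("FPGA to SDRAM (setup, hold) slack in ps:", fpga_to_sdram)
print("SDRAM to FPGA (setup, hold) slack in ps:", sdram_to_fpga)
```

A hold slack of zero, as in the FPGA to SDRAM direction, means the design depends on board delay to cover clock jitter, which matches the note in the text.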

### Table 1: Pin-to-Pin I/O Parameters

| Device        | Toh    | Tco (35 pF) | Tsu    | Thold  |
|---------------|--------|-------------|--------|--------|
| SDRAM (10 ns) | 3.0 ns | 7.5 ns      | 3.0 ns | 1.0 ns |
| XC4085XL-09*  | 1.0 ns | 5.5 ns      | 1.7 ns | 2.5 ns |

\* The FPGA specifications assume that the XC4000XL FPGA uses up to four early clock buffers for I/O with up to 64 IOB clock loads each, is configured with FCL no-delay inputs and fast outputs, and has its outputs loaded with 50 pF each.

### Beyond 100 MHz

Other features, such as PLLs, or known clock skew between the various clocks, can be used to further increase I/O performance. For smaller FPGAs, such as the newly introduced XC4002XL, 133 MHz operation is easily achieved. The I/O parameters obtained by using these techniques demonstrate that FPGAs are compatible with high-speed devices such as SDRAMs. The I/O performance that was formerly obtainable only with custom devices or high-performance ASICs is now available to Xilinx FPGA users.