

XAPP262 (v2.4) December 13, 2002

# Synthesizable QDR SRAM Controller

Author: Olivier Despaux

# Summary

Quad Data Rate (QDR<sup>™</sup>) Synchronous Static RAM (SRAM) is one of the highest bandwidth solutions available for networking and telecommunications applications. This low-cost, high-performance solution is ideal for applications requiring memory buffering, traffic management, look-up tables, or linked lists. This application note describes an implementation of a QDR SRAM controller, enabling up to 400 Mb/s (DDR400) simultaneous read and write speed in a Virtex<sup>™</sup>-II -5 speed grade device using a source-synchronous solution.

# Introduction

Micron Technology, Cypress, Hitachi, NEC, Samsung, and IDT jointly created the QDR specification in response to an overwhelming demand for higher bandwidth SRAMs. The main feature of the QDR SRAM is that the data inputs and outputs are separate and operate simultaneously. In this application note, Xilinx proposes an interface to this SRAM family.

Because each data bus operates on two words of data per clock cycle, the data rate for each bus doubles the standard data rate. With both buses operating in parallel, the device operates on four bus-widths of data per clock cycle, hence, the Quad Data Rate name. The minimum set is data operating on two words, i.e., two times the device bus width.
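The "four bus-widths per clock cycle" arithmetic can be illustrated with a short calculation. This is a minimal sketch: the function name is hypothetical, and the 18-bit bus at 167 MHz is simply the configuration targeted later in this application note.

```python
def qdr_bandwidth_bits_per_s(bus_width_bits: int, clock_hz: float) -> dict:
    """Peak bandwidth of a QDR SRAM (hypothetical helper for illustration).

    Each bus (read or write) moves two words per clock cycle, and both
    buses run in parallel, giving four bus-widths per clock cycle.
    """
    words_per_clock_per_bus = 2  # DDR on each data bus
    per_bus = bus_width_bits * words_per_clock_per_bus * clock_hz
    return {"per_bus": per_bus, "total": 2 * per_bus}


bw = qdr_bandwidth_bits_per_s(bus_width_bits=18, clock_hz=167e6)
print(f"per bus: {bw['per_bus'] / 1e9:.3f} Gb/s")
print(f"read + write: {bw['total'] / 1e9:.3f} Gb/s")
```

For an x18 device at 167 MHz, this gives roughly 6 Gb/s on each bus and roughly 12 Gb/s with both buses active.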

# QDR SRAM Review

# **Basics**

This section provides a general overview of QDR SRAM technology. It also includes design recommendations. Designers comfortable interfacing this type of memory with FPGAs can go directly to the Controller Design section. QDR SRAMs are designed for the networking market. Table 1 summarizes QDR SRAM specifications. Consult the memory manufacturer's data sheet to check the memory device characteristics.

| Parameter              | Description                                              |
|------------------------|----------------------------------------------------------|
| Burst Mode (DDR Mode)  | 2-word burst devices                                     |
|                        | 4-word burst devices                                     |
| I/O Terminations       | HSTL class I, 1.5V or 1.8V                               |
| Data Buses             | Separate and concurrent Write (D) and Read (Q) buses     |
| Device Density         | 9 Mb, 18 Mb, 36Mb                                        |
| Internal Pipeline      | Two stage pipeline (low initial latency)                 |
| Frequency 2-word Burst | 100 - 167 MHz (Double Address Rate (DAR) address bus)    |
| Frequency 4-word Burst | 100 - 200 MHz (Single Address Rate (SAR) address bus)    |
| Core Voltage           | 2.5V                                                     |
| Data Clocks C, $\overline{C}$ | Output data from the SRAM are synchronized with respect to C and $\overline{C}$ for source-synchronous systems. |
| QDR SRAM Suppliers     | Cypress, Hitachi, IDT, Micron, NEC, Samsung              |

Table 1: QDR SRAM Specification Summary

<sup>© 2002</sup> Xilinx, Inc. All rights reserved. All Xilinx trademarks, registered trademarks, patents, and further disclaimers are as listed at http://www.xilinx.com/legal.htm. All other trademarks and registered trademarks are the property of their respective owners. All specifications are subject to change without notice.

NOTICE OF DISCLAIMER: Xilinx is providing this design, code, or information "as is." By providing the design, code, or information as one possible implementation of this feature, application, or standard, Xilinx makes no representation that this implementation is free from any claims of infringement. You are responsible for obtaining any rights you may require for your implementation. Xilinx expressly disclaims any warranty whatsoever with respect to the adequacy of the implementation, including but not limited to any warranties or representations that this implementation is free from any implied warranties of merchantability or fitness for a particular purpose.

QDR SRAMs were specifically created for applications having a nearly equal ratio of read and write cycles occurring at almost the same time. Conventional DDR SRAMs are most efficient in applications employing data streaming, or where the read/write ratio is greater than three. The choice of devices with either a 2-word burst or a 4-word burst depends on the address rate and the data write placement.

#### **Address Rate**

The 2-word burst QDR SRAM can indefinitely sustain both a 2-word read and a 2-word write each clock cycle. Internally, the first half-clock cycle is used to execute the read function, and the second half-clock cycle is used to execute the write function. The address bus is shared for the read and write data ports, so a DAR operation is necessary. The rising edge of a master-clock signal "K" is used to register the read address. The falling edge of this clock signal "K" is used to register the write address.

This application note revision targets 2-word burst QDR SRAM devices able to handle a 167 MHz clock in Virtex-II FPGAs (-5 speed grade). The density of the memory device used is 9 Mb.

QDR SRAMs have read and write data buses, both operating in DDR mode. On the write bus, the clock needs to be center aligned with the data. This physical placement is advantageous for the memory device.

#### Write Data Placement

Although a minor consideration in terms of system performance, it is necessary to determine the placement of write data in the design. The address rate is SAR in 4-word burst devices. Four-word burst QDR SRAMs read the write addresses on the rising edge of K. Two-word burst QDR SRAMs read the write addresses on the falling edge of K. The data to store is read one clock period later than the write command in 4-word burst QDR SRAMs. Conversely, 2-word burst QDR SRAMs do not have latency (Figure 1).



Figure 1: QDR 2-Word versus 4-Word Burst Timing Diagram

## **Internal Architecture**

The key objective of QDR architecture is to clearly distinguish read and write ports. The QDR architecture is designed to offer the best performance on alternate read and write cycles. Note that if an address read and write occurs in the same clock period, then the data obtained at the memory read port output will be the data written during this same clock period. See Figure 2.
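The same-cycle read/write behavior described above can be sketched as a small behavioral model. This is only an illustration under stated assumptions: the class and method names are hypothetical, pipeline latency and clock edges are ignored, and unwritten locations are assumed to read as zero.

```python
class QdrSramModel:
    """Behavioral sketch of a 2-word burst QDR SRAM (hypothetical model).

    Timing is abstracted to one call per K clock period; only the
    separate read/write ports and the same-address case are modeled.
    """

    def __init__(self):
        self.mem = {}  # address -> (word0, word1)

    def clock_cycle(self, read_addr=None, write_addr=None, write_burst=None):
        """Perform at most one read and one write in a single clock period."""
        if write_addr is not None:
            if write_burst is None or len(write_burst) != 2:
                raise ValueError("a write needs a 2-word burst")
        read_burst = None
        if read_addr is not None:
            if write_addr is not None and write_addr == read_addr:
                # Same address read and written in the same period:
                # the read port returns the data being written.
                read_burst = tuple(write_burst)
            else:
                # Assumption: unwritten locations read as zero.
                read_burst = self.mem.get(read_addr, (0, 0))
        if write_addr is not None:
            self.mem[write_addr] = tuple(write_burst)
        return read_burst


sram = QdrSramModel()
sram.clock_cycle(write_addr=0x10, write_burst=(0x111, 0x222))
# Read 0x10 while writing 0x20 in the same clock period.
first = sram.clock_cycle(read_addr=0x10, write_addr=0x20, write_burst=(0x333, 0x444))
# Read and write the same address in the same clock period.
second = sram.clock_cycle(read_addr=0x20, write_addr=0x20, write_burst=(0x555, 0x666))
print(first, second)
```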



Figure 2: QDR SRAM Internal Architecture Overview

## **Read And Write Operations**

One of the main advantages of SRAM devices over SDRAM devices is the simplicity of their control signals. For example, there is no need to send a refresh command to the memory device.

### Address Bus

In 2-word burst devices, the read address location is considered on the rising edge of K clock and the write address location on the falling edge of K clock. The address bus runs in DDR mode. Four-word burst devices run in SDR mode on the address bus. This type of access explains the speed capability difference between 2-word burst and 4-word burst devices.

### **Data Clock Relationship**

Write: Data must be center aligned with respect to the K clock when it is sent.

Read: Data are sent with a guaranteed delay interval with respect to the C clock.

Figure 3 shows read and write timing diagrams. Table 2 shows the QDR timing parameters.




Figure 3: Read and Write Timing Diagrams

Table 2: QDR Timing Parameters

| Parameter                                                                                                              | Symbol                  |
|------------------------------------------------------------------------------------------------------------------------|-------------------------|
| **Clock**                                                                                                              |                         |
| Clock Cycle Time (K, $\overline{K}$, C, $\overline{C}$)                                                                | $t_{KHKH}$              |
| Clock High time (K, $\overline{K}$, C, $\overline{C}$)                                                                 | $t_{KHKL}$              |
| Clock Low time (K, $\overline{K}$, C, $\overline{C}$)                                                                  | $t_{KLKH}$              |
| Clock to clock (K$\uparrow \rightarrow \overline{K}\uparrow$, C$\uparrow \rightarrow \overline{C}\uparrow$)            | $t_{KH\overline{K}H}$   |
| Clock to data clock (K$\uparrow \rightarrow$ C$\uparrow$, $\overline{K}\uparrow \rightarrow \overline{C}\uparrow$)     | $t_{KHCH}$              |
| **Output Times**                                                                                                       |                         |
| C, $\overline{C}$ High to Output Valid                                                                                 | $t_{CHQV}$              |
| C, $\overline{C}$ High to Output Hold                                                                                  | $t_{CHQX}$              |
| C High to Output High-Z                                                                                                | $t_{CHQZ}$              |
| C High to Output Low-Z                                                                                                 | $t_{CHQX1}$             |
| **Setup Times**                                                                                                        |                         |
| Address valid to K rising edge                                                                                         | $t_{AVKH}$              |
| Control inputs valid to K rising edge                                                                                  | $t_{IVKH}$              |
| Data-in valid to K, $\overline{K}$ rising edge                                                                         | $t_{DVKH}$              |
| **Hold Times**                                                                                                         |                         |
| K rising edge to address hold                                                                                          | $t_{KHAX}$              |
| K rising edge to control inputs hold                                                                                   | $t_{KHIX}$              |
| K, $\overline{K}$ rising edge to data-in hold                                                                          | $t_{KHDX}$              |

## **Special Features**

The QDR SRAM incorporates a serial boundary scan test access port (TAP). This feature includes a TAP controller, an instruction register, a boundary scan register, a bypass register, and an ID register. The JEDEC Test Access Port (JTAG) operates at 2.5V. To disable this feature, the TCK pin must be tied Low ( $V_{SS}$ ).

For more information about JTAG functionality and capabilities for QDR SRAMs, refer to the memory vendor's data sheet.

## **Device Pins**

Table 3 is a brief summary of the pins of QDR devices. Please refer to the memory vendor data sheet for more details.

| Symbol                    | Number of Pins (x18) | Type              | Name                                                |
|---------------------------|----------------------|-------------------|-----------------------------------------------------|
| SA                        | 18                   | Input             | Synchronous Address Inputs                          |
| R                         | 01                   | Input             | Synchronous Read                                    |
| W                         | 01                   | Input             | Synchronous Write                                   |
| BW0                       | 01                   | Input             | Synchronous Byte Writes D0-D7                       |
| BW1                       | 01                   | Input             | Synchronous Byte Writes D8-D15                      |
| K and $\overline{K}$ pair | 02                   | Input             | Input Clock                                         |
| C and $\overline{C}$ pair | 02                   | Input             | Output Clock                                        |
| TMS                       | 01                   | Input             | JTAG Test Input 1                                   |
| TDI                       | 01                   | Input             | JTAG Test Input 2                                   |
| TCK                       | 01                   | Input             | JTAG Function Enable                                |
| V<sub>REF</sub>           | 02                   | Input             | HSTL Input Reference Voltage                        |
| ZQ                        | 01                   | Input             | Output Impedance Matching Input                     |
| D                         | 18                   | Input             | Synchronous Data Inputs                             |
| DNU                       | 02                   | Output            | Do Not Use                                          |
| Q                         | 18                   | Output            | Synchronous Data Output                             |
| TDO                       | 01                   | Output            | IEEE 1149.1 Test Output                             |
| V<sub>DD</sub>            | 10                   | Supply            | Nominal Power Supply (2.5V) <sup>(1)</sup>          |
| V<sub>DDQ</sub>           | 16                   | Supply            | Isolated Output Buffer Supply (1.8V) <sup>(1)</sup> |
| V<sub>SS</sub>            | 25                   | Supply            | Ground Power Supply (0V) <sup>(1)</sup>             |
| NC <sup>(1)</sup>         | 41                   | NC <sup>(1)</sup> | No Connection                                       |

Table 3: QDR SRAM Pin Summary

#### Notes:

1. To improve noise immunity and power dissipation, no connect (NC) pins should be connected to ground (GND) on the printed circuit board (PCB). A clean board design is necessary. Use proper capacitors for power supply decoupling, and nominal voltages for supply, voltage references, and logic levels. Xilinx recommends using 1.8V for the HSTL logic levels for better switching characteristics using Virtex-II devices, and 1.5V using Virtex-II Pro devices.

# QDR SRAM Interface Review

## Interface Specification and Overview

This section presents an implementation of a QDR SRAM controller with a Virtex-II FPGA. The controller has a user interface and a QDR SRAM interface. The design may need to be modified depending on the requirements or performance expectations of the designer.

The specifications of this design are the following:

- The 2-word burst design targets a 167-MHz clock in a Virtex-II -5 speed grade device.
- The design is source synchronous and requires the use of K clocks on the transmitter side and C clock on the receiver side.
- The controller has been developed using Micron models and has been successfully tested with Cypress HDL models for 2-word burst devices.
- The controller has an asynchronous Reset capability.
- The design requires two DCMs.
- The estimated dissipated power for one 9 Mb x 18 device in HSTL Class I, 1.8V DCI buffers is 600 mW.

Table 4 presents a snapshot of this implementation's controller performance, synthesis, and place and route results:

#### Table 4: Implementation Performance

|                                                            | Speed Grade |     |     |
|------------------------------------------------------------|-------------|-----|-----|
| Virtex-II Device XC2V1000 - FF896                          | -4          | -5  | -6  |
| Clock minimum functioning frequency (MHz)                  | 150         | 167 | 167 |
| Timing budget margin at this frequency (ps) <sup>(1)</sup> | 400         | 580 | 580 |
| Number of Slices                                           | 150         | 150 | 150 |
| Number of BUFGs                                            | 3           | 3   | 3   |
| Number of Digital Clock Managers (DCMs)                    | 2           | 2   | 2   |
| Number of I/Os                                             | 63          | 63  | 63  |

#### Notes:

1. Using the source-synchronous timing numbers.

Figure 4 is a high-level block diagram of the QDR SRAM controller. The entity *QDR\_ctrl* is the top-level QDR controller block containing the address, data send, and receive modules. *USER\_gui* passes the signals to the main controller either directly or using a pipeline stage as needed by the designer.



Figure 4: Top-Level Architecture Block Diagram

# DDR I/Os

## **Implementation Details**

Virtex-II SelectI/O<sup>™</sup> inputs and outputs have hardware support for DDR operations on both the transmit and receive sides. Figure 5 shows an implementation example of the architecture. All inputs and outputs to the memory interface can be registered within the IOB to minimize clock-to-out delays.





# Clocking scheme

As in most synchronous designs, all control and data signals are sent from the FPGA to the memory device. A carefully designed clock-forwarding scheme removes the need to scrutinize the clock-to-out timing parameters in the FPGA. Skew between clock, data, and address signals is negligible if the clock is forwarded to the memory device using the Double Data Rate flip-flops (FDDR primitives) inside the FPGA I/Os. Figure 6 shows the actual clock path.



Figure 6: Clock Forwarding Scheme

The timing parameters, including the clock-tree skew involved in the timing budget analysis, are covered in the timing section. Duty cycle distortion depends on the clock frequency of the interface. Simulations show that the duty-cycle distortion slightly exceeds the 5% tolerance range starting at 167 MHz, and more so at higher frequencies. Note: A worst-case duty cycle distortion of 6% to 8% should be considered at 200 MHz. IBIS or HSPICE simulations give more accurate values. To limit board skew, trace routing and length must match as closely as possible for all signals of the interface.

## Using Several Devices on the Same Bus

Careful PCB layout is necessary to reduce the skew between clocks and data buses. For example, the trace length between the write data bus and the K clock should be the same.

Figure 7 is an example of two devices connected. Having several loads on the same bus will decrease the signal integrity characteristics of the interface.



Figure 7: Example of Connection Between one FPGA and two QDR SRAMs.

## Data path

## **Transmit Side**

At the memory device I/Os, data should be center aligned with the clock. Figure 8 presents the basic waveforms used to write data to the memory.



When accessing a memory device, the setup and hold times need to be considered. Figure 9 presents an example of a data path implementation for a write operation.



Figure 9: Write Data Path

## **Receive Side**

Since the system is source synchronous, the data recapture stage uses the deskew clock coming from the memory device. The DCM is configured to apply a phase offset to the input clock so that the data is captured inside the valid data window. A detailed analysis of the phase offset value is presented in the timing section. Figure 10 shows the basic waveforms used to read data from the memory.



Figure 10: Read Waveforms

Figure 11 presents an example of read data path and Figure 12 presents the clock resources used to achieve the data capture.



Figure 11: Read Data Path

Figure 12 illustrates the data capture clock generation. The defined phase shift values must either be specified in the user constraints file (as in the present reference design) or in the source files using the appropriate syntax for the synthesis tool.



Figure 12: Data Capture Clock Generation
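As a rough illustration of how a phase shift expressed in nanoseconds could be translated into a DCM setting, the sketch below assumes the Virtex-II fixed phase-shift mode, where the `PHASE_SHIFT` attribute is an integer from -255 to 255 in units of 1/256 of the input clock period. The helper name is hypothetical, and whether the reference design's constraint values were derived exactly this way is an assumption; the delays used are the -5 speed grade values listed in Table 8.

```python
def dcm_phase_shift_steps(delay_ns: float, clkin_period_ns: float) -> int:
    """Convert a delay into 1/256-period steps (hypothetical helper).

    Assumes fixed-mode phase shifting with an integer attribute in the
    range -255..255, each step being 1/256 of the input clock period.
    """
    steps = round(256 * delay_ns / clkin_period_ns)
    if not -255 <= steps <= 255:
        raise ValueError("delay is outside the fixed phase-shift range")
    return steps


# Table 8 values for a -5 speed grade device.
steps_167 = dcm_phase_shift_steps(delay_ns=5.37, clkin_period_ns=6.0)
steps_200 = dcm_phase_shift_steps(delay_ns=4.87, clkin_period_ns=5.0)
print(steps_167, steps_200)
```

Because the step size is a fraction of the clock period, the same attribute value corresponds to a different absolute delay at each clock frequency, so the value has to be recomputed whenever the interface frequency changes.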

Figure 13 shows the different locations of the read waveforms in Figure 14. The clock domain change is not detailed in Figure 13. Figure 14 also includes the clock domain change from the deskew clock to the design clock. The more closely the lengths and characteristics of all traces match, the better the skew and jitter results. Not all FDs are shown in Figure 13.



Figure 13: System Diagram




The parameter t<sub>PSDCM\_HSTL\_I</sub> represents the pin-to-pin setup time of Virtex-II devices in HSTL Class I. The clock inside the FPGA is ahead with respect to the same clock outside the device. The waveforms are drawn using typical and average values; jitter and skew characteristics are not shown. The skew information is available in the memory vendor's data sheet, in the Virtex-II data sheet, and in the **Timing Budget** section.

## **Address Path**

The address path module depends essentially on the burst mode of the memory device. The 2-word burst mode is used in this design, so the address bus runs in DDR mode. Figure 15 presents the signals on the address bus.



Figure 15: 2-Word Burst Address Signals

The reference design is implemented using a 9 Mb device, 512k x18. The bus width can be increased by replacing the bus width value as needed. Although the controller looks like a 36-bit SDR device from the user point of view, the external data exchange rate is DDR with an 18-bit bus width between the controller and the memory. The number of traces on the PCB is halved compared with using separate SDR bus I/O devices.
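The relationship between the 36-bit SDR user view and the 18-bit DDR memory bus can be sketched as a split and merge of one user word. This is an illustration only: the function names are hypothetical, and sending the lower 18 bits first is an assumption, since the word ordering is not stated here.

```python
BUS_WIDTH = 18
MASK18 = (1 << BUS_WIDTH) - 1


def split_user_word(word36: int) -> tuple:
    """Split one 36-bit SDR user word into two 18-bit DDR bus words.

    Assumption: the lower 18 bits go out first and the upper 18 bits
    second. The actual ordering in the reference design may differ.
    """
    if not 0 <= word36 < (1 << (2 * BUS_WIDTH)):
        raise ValueError("user word must fit in 36 bits")
    return word36 & MASK18, (word36 >> BUS_WIDTH) & MASK18


def merge_ddr_words(first: int, second: int) -> int:
    """Rebuild the 36-bit user word from the two captured bus words."""
    return ((second & MASK18) << BUS_WIDTH) | (first & MASK18)


user_word = (0x2ABCD << BUS_WIDTH) | 0x12345
first_word, second_word = split_user_word(user_word)
print(hex(first_word), hex(second_word))
```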

## **Timing Considerations**

### **Clocking Scheme and Clock Forwarding**

By forwarding the clock to the memory, the designer can disregard the FPGA's clock-to-out timing parameter. This considerably improves the speed capability of the controller. Using a source-synchronous design on the receive side enlarges the data valid window. Figure 16 shows the clocking scheme.



Figure 16: Clocking Scheme

- Path A: The clock signals are sent perfectly aligned along with the data by using the clock-forwarding scheme.
- Path B: This trace connects path A and path C directly. It should be as short as possible.
- Path C: This trace is connected on both sides.
  - On the memory side, it is connected to the C and $\overline{C}$ inputs. In this case, the memory device sends the data with respect to this clock.
  - On the FPGA input side, the "C" deskew clock signal feeds a DCM for the clocking operation on the receiver.
  - Memory vendors recommend routing the C and $\overline{C}$ inputs close together on the PCB to avoid noise issues. If C needs to be routed to the FPGA, make sure the tool routes it to a general I/O user pin for the correct termination.

Another alternative is to keep path A but to drive the original K pair signals, or a phase-shifted version of them, on path C (C pair). This version is not developed in this application note.

#### **Transmit and Receive Data**

As the memory vendor guarantees a specific relationship between "C" and the data placement on the Read bus, this deskew clock signal is used for capturing data inside the FPGA. A clock phase shift is applied to center the deskew clock and the valid data window.

#### Initialization

The QDR SRAM memory behaves like other SRAMs. Special devices have an Echo Clock feature. Those devices have an on-chip embedded Delay-Locked Loop (DLL) and a delay at power up is necessary for the clock circuits to lock and for all levels to stabilize. The reference design uses a conventional QDR SRAM device without Echo Clocks. All clock signals are used to achieve the source-synchronous design. On the FPGA side, after the power-up process, it is necessary to wait for the reset and for the DCMs to lock. For both the memory device and Virtex-II device, this process should take less than 0.1 second.

## **Timing Budget**

### Transmit Side

To write to the memory, the data is sent on the Write bus in DDR mode. The data is center aligned with the K pair. The K pair is forwarded to the memory. The clock tree and package skew parameters of the FPGA will slightly reduce the width of the valid data window. Table 5 presents the different timing analysis parameters.

| XC2V1000 -5                        |                       | Worst Case Value (ps) |
|------------------------------------|-----------------------|-----------------------|
| Channel-to-channel Skew            |                       |                       |
| Clock Tree                         | T<sub>CKSKEW</sub>    | 100 <sup>(1)</sup>    |
| Package                            | T<sub>PKGSKEW</sub>   | 112                   |
| Duty Cycle Distortion              | T<sub>DCD_CLK0</sub>  | 140                   |
| Jitter                             |                       | 200                   |
| Clkout_Phase_Offset <sup>(2)</sup> |                       | 140                   |
| Timing Budget Total                |                       | 692                   |

#### Table 5: QDR x36 Transmit Side Timing Budget at a 166.67 MHz Clock Frequency

#### Notes:

- 1. Clock tree skew for the entire device. As the memory interface will be contained in a restricted area of the FPGA die, T<sub>CKSKEW</sub> will be significantly smaller.
- 2. Only two DCM outputs are used for this design. In the worst case, this configuration can add 140 ps of phase offset to the timing budget.

Table 6 is the worst case timing budget margin. As shown, there is a comfortable margin in the transmit side of the FPGA.

#### Table 6: Worst Case Timing Budget Margin for Data Transmit

|                      | 200 MHz QDR SRAM                                                               | 167 MHz QDR SRAM                                                               |
|----------------------|--------------------------------------------------------------------------------|--------------------------------------------------------------------------------|
| Valid Data Window    | $t_{VDW} = 2.50 - 0.70 = 1.80 \text{ ns}$                                      | $t_{VDW} = 3.00 - 0.70 = 2.30 \text{ ns}$                                      |
| Memory Sample Window | $t_{MSW} = t_{DVKH_{200}} + t_{KHDX_{200}} = 0.60 + 0.60 = 1.20 \text{ ns}$    | $t_{MSW} = t_{DVKH_{167}} + t_{KHDX_{167}} = 0.70 + 0.70 = 1.40 \text{ ns}$    |
| Timing Budget Margin | $t_{VDW} - t_{MSW} = 1.80 - 1.20 = 0.60 \text{ ns}$                            | $t_{VDW} - t_{MSW} = 2.30 - 1.40 = 0.90 \text{ ns}$                            |

Notes:

$t_{T/2}$ = half clock period,

$t_{DVKH_{200}} = 0.6 \text{ ns}$ (Samsung), $t_{DVKH_{167}} = 0.7 \text{ ns}$ (Micron), $t_{KHDX_{200}} = 0.6 \text{ ns}$ (Samsung), $t_{KHDX_{167}} = 0.7 \text{ ns}$ (Micron)
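The arithmetic behind Table 5 and Table 6 can be reproduced with a short script. This is a sketch rather than part of the reference design: the names are hypothetical, all values are in picoseconds, and rounding the 692 ps total up to 700 ps is an assumption made to match the 0.70 ns figure used in Table 6.

```python
# Worst-case transmit-side contributions from Table 5, in picoseconds.
TX_BUDGET_PS = {
    "clock_tree_skew": 100,
    "package_skew": 112,
    "duty_cycle_distortion": 140,
    "jitter": 200,
    "clkout_phase_offset": 140,
}


def round_up_10ps(value_ps: int) -> int:
    """Round up to the next 10 ps (assumed rounding, 692 ps -> 700 ps)."""
    return -(-value_ps // 10) * 10


def transmit_margin_ps(half_period_ps: int, setup_ps: int, hold_ps: int) -> int:
    """Valid data window minus the memory sample window, as in Table 6."""
    budget = round_up_10ps(sum(TX_BUDGET_PS.values()))
    valid_data_window = half_period_ps - budget
    memory_sample_window = setup_ps + hold_ps  # t_DVKH + t_KHDX
    return valid_data_window - memory_sample_window


tx_total = sum(TX_BUDGET_PS.values())
margin_167 = transmit_margin_ps(half_period_ps=3000, setup_ps=700, hold_ps=700)
margin_200 = transmit_margin_ps(half_period_ps=2500, setup_ps=600, hold_ps=600)
print(tx_total, margin_167, margin_200)
```

Substituting the setup and hold values from another vendor's data sheet gives the corresponding margin for that device.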

#### **Receiver Side**

There are two options when determining the receiver side timing budget.

- Use the timing numbers listed in the Virtex-II or Virtex-II Pro data sheet. These numbers are very conservative and give a simple timing budget. However, a design with a small timing margin might not meet timing under particular conditions.
- Use the timing numbers listed in the source-synchronous data sheet. These numbers are optimized and very accurate. Evaluating the timing budget margin becomes more difficult as more details are included, but using the source-synchronous data sheet numbers can help to meet timing.

Table 7 is a summary of the pin-to-pin setup and hold time parameters for the conventional and source-synchronous data sheets. Continue to consult the source-synchronous data sheet for updates, as the data in this table is subject to change.

Table 7: Source-Synchronous Summary

| Virtex-II Speed Grade               | -6 Setup           | -6 Hold | -5 Setup           | -5 Hold | -4 Setup           | -4 Hold | Units |
|-------------------------------------|--------------------|---------|--------------------|---------|--------------------|---------|-------|
| Synchronous for HSTL Class I        | 1.98               | -1.28   | 2.02               | -1.32   | 2.32               | -1.24   | ns    |
| Source-Synchronous                  | TBD                | TBD     | TBD                | TBD     | TBD                | TBD     | ns    |
| Sample Window Synchronous           | 700.00             |         | 700.00             |         | 1080.00            |         | ps    |
| Sample Window<br>Source-Synchronous | TBD <sup>(1)</sup> |         | TBD <sup>(1)</sup> |         | TBD <sup>(1)</sup> |         | ps    |

#### a. Using Conventional Data Sheet Numbers

The Virtex-II data sheet shows a sample window of 700 ps. This parameter indicates the total sampling error of the Virtex-II DDR input registers across voltage, temperature, technology migration, package skew, or clock tree skew.

Internal clock phase shift analysis:

- QDR SRAM memory vendors certify (worst case) the availability of data t<sub>CHQV</sub> after a C edge clock.
- The data bus is considered busy for a delay of at least t<sub>CHQX</sub> on the same edge.
- From the previous two points, data must start to be valid on the data bus some time between t<sub>CHQV</sub> and t<sub>CHQX</sub>. The average value is t<sub>CHQVavg</sub> = (t<sub>CHQV</sub> - t<sub>CHQX</sub>)/2.
- To align C clock and data at the FPGA inputs, C is delayed by t<sub>CHQX</sub> + t<sub>CHQVavg</sub>.
- The data is captured in the middle of the valid data window. A 90° phase shift  $t_{90PS}$  on the C clock sets the data in the middle of the theoretical valid data window. The delay is now  $t_{CHQX} + t_{CHQVavg} + t_{90PS}$ .
- Center the data capture instant right in the middle of the sample window. To capture data inside the FPGA, C is delayed by:

 $t_{\text{SAMPLE}} = ||t_{\text{SETUP}}| - |t_{\text{HOLD}}||$ 

The total delay for the C phase shift inside the FPGA is:

 $t_{PHASESHIFT} = t_{CHQX} + t_{CHQVavg} + t_{90PS} + t_{PSDCM_{HSTL}}$

The added delay necessary on the receive side is shown in Table 8.

Table 8: Phase Shift Value for C Clock (ns)

| Device: XC2V40 to XC2V8000 | Speed Grade -6 | Speed Grade -5 | Speed Grade -4      |
|----------------------------|----------------|----------------|---------------------|
| 100 MHz                    | 6.58           | 6.62           | 6.92                |
| 133 MHz                    | 5.96           | 6.00           | 6.30                |
| 167 MHz                    | 5.33           | 5.37           | 5.67                |
| 200 MHz                    | 4.83           | 4.87           | 5.17 <sup>(1)</sup> |

Notes:

1. Due to the I/O switching capabilities of these high-speed devices, do not use Virtex-II -4 devices for 200 MHz clock interfaces with the first generation of QDR SRAM devices.
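The phase shift derivation above can be checked numerically against the 167 MHz and 200 MHz rows of Table 8. This sketch makes several assumptions: the function name is hypothetical, all values are in picoseconds, the setup times are the "Synchronous for HSTL Class I" values from Table 7, and $t_{CHQV}$ is taken as 2.5 ns (167 MHz device) and 2.2 ns (200 MHz device). Those $t_{CHQV}$ values are not stated in the derivation; they are chosen because they reproduce the table entries.

```python
def c_clock_phase_shift_ps(period_ps: int, t_chqx_ps: int,
                           t_chqv_ps: int, t_setup_ps: int) -> int:
    """Total C clock phase shift, following the steps listed above."""
    t_chqv_avg = (t_chqv_ps - t_chqx_ps) // 2  # (t_CHQV - t_CHQX) / 2
    t_90ps = period_ps // 4                    # 90 degree phase shift
    return t_chqx_ps + t_chqv_avg + t_90ps + t_setup_ps


# Pin-to-pin setup times per speed grade (Table 7, synchronous HSTL Class I).
SETUP_PS = {"-6": 1980, "-5": 2020, "-4": 2320}

# 167 MHz device: 6 ns period, t_CHQX = 1.2 ns, assumed t_CHQV = 2.5 ns.
shift_167 = {grade: c_clock_phase_shift_ps(6000, 1200, 2500, setup)
             for grade, setup in SETUP_PS.items()}
# 200 MHz device: 5 ns period, t_CHQX = 1.0 ns, assumed t_CHQV = 2.2 ns.
shift_200 = {grade: c_clock_phase_shift_ps(5000, 1000, 2200, setup)
             for grade, setup in SETUP_PS.items()}
print(shift_167)
print(shift_200)
```

With these inputs, the results agree with the 167 MHz and 200 MHz rows of Table 8 for all three speed grades.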

#### b. Using Source-Synchronous Timing Numbers

With a source-synchronous data sheet, the width of the sample window is reduced by shifting the setup and hold time values. The Virtex-II source-synchronous data sheet shows a sample window of  $t_{SAMP} = 500$  ps. This parameter indicates the total sampling error of the Virtex-II DDR input registers across voltage, temperature, or technology migration. The test measurement methodology uses the DCM to capture the DDR input registers' edges of operation.

These measurements include:

- CLK0 and CLK180 DCM jitter
- Worst-case duty-cycle distortion
- DCM accuracy (phase offset)
- DCM phase shift resolution

The measurements do not include package or clock tree skew. Table 9 presents an analysis using two different packages for an interface to devices with data bus widths of up to 36 bits. The speed grade of the FPGA has an insignificant impact on these numbers.

Table 9: QDR x36 Receive Side Timing Budget at a 166.67 MHz Clock Frequency

| XC2V1000 FF896 -5       | Symbol               | Worst Case Value (ps) | Typical Value (ps) |
|-------------------------|----------------------|-----------------------|--------------------|
| Sample Window           | T <sub>SAMP</sub>    | 500                   | 500                |
| Channel-to-channel Skew |                      |                       |                    |
| Clock Tree              | T <sub>CKSKEW</sub>  | 100 <sup>(1)</sup>    | 25 <sup>(2)</sup>  |
| Package                 | T <sub>PKGSKEW</sub> | 112                   | 112                |
| Timing Budget Total     | -                    | 712                   | 637                |

Notes:

- 1. Clock tree skew for the entire device.
- 2. Clock tree skew for the specific components chosen for this particular interface, using placement constraints.
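The totals in Table 9 are the sum of the sample window and the two skew terms. The short Python sketch below reproduces that arithmetic. The variable and function names are illustrative only and are not part of the reference design.

```python
# Receive-side timing budget terms from Table 9, in picoseconds.
T_SAMP_PS = 500          # Virtex-II DDR input sample window
T_PKGSKEW_PS = 112       # package skew
T_CKSKEW_WORST_PS = 100  # clock tree skew for the entire device
T_CKSKEW_TYP_PS = 25     # clock tree skew with placement constraints


def timing_budget_ps(t_samp, t_ckskew, t_pkgskew):
    """Sum the sample window and the channel-to-channel skew terms."""
    return t_samp + t_ckskew + t_pkgskew


worst_case_ps = timing_budget_ps(T_SAMP_PS, T_CKSKEW_WORST_PS, T_PKGSKEW_PS)
typical_ps = timing_budget_ps(T_SAMP_PS, T_CKSKEW_TYP_PS, T_PKGSKEW_PS)

print(f"worst case: {worst_case_ps} ps")
print(f"typical:    {typical_ps} ps")
```

The two results match the "Timing Budget Total" row of Table 9.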

The memory vendor specifies the valid data window.

 $t_{VDW167} = t_{T/2} - t_{CHQZ_{167}} + t_{CHQX_{167}}$ 

Table 10 presents the timing budget for the receive side of the interface.

#### Table 10: Worst Case Timing Budget Margin for Data Receive

|                      | 200 MHz QDR SRAM                         | 167 MHz QDR SRAM                         |
|----------------------|------------------------------------------|------------------------------------------|
| Valid Data Window    | 2.50 – 2.20 + 1.00 = 1.30 ns             | 3.00 – 2.50 + 1.20 = 1.70 ns             |
| FPGA Sample Window   | $t_{SW} = 2.50 - 0.72 = 1.78 \text{ ns}$ | $t_{SW} = 3.00 - 0.72 = 2.28 \text{ ns}$ |
| Timing Budget Margin | 1.30 – 0.72 = 0.58 ns                    | 1.70 – 0.72 = 0.98 ns                    |

Notes:

 $t_{T/2}$  = half clock period,

 $t_{CHQZ_{200}} = 2.2 \text{ ns}$  (Samsung),  $t_{CHQZ_{167}} = 2.5 \text{ ns}$  (Micron),  $t_{CHQX_{200}} = 1.0 \text{ ns}$  (Samsung),  $t_{CHQX_{167}} = 1.2 \text{ ns}$  (Micron)
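The entries in Table 10 can be checked with a few lines of Python. This is only a sketch of the arithmetic: the memory parameters are the ones quoted in the notes above, the 0.72 ns budget is the worst-case total from Table 9 rounded as in Table 10, and the function and dictionary names are illustrative.

```python
def valid_data_window_ns(half_period_ns, t_chqz_ns, t_chqx_ns):
    """t_VDW = t_T/2 - t_CHQZ + t_CHQX, all in nanoseconds."""
    return half_period_ns - t_chqz_ns + t_chqx_ns


# Worst-case FPGA timing budget from Table 9 (712 ps), rounded to 0.72 ns
# as in Table 10.
FPGA_BUDGET_NS = 0.72

# Memory parameters quoted in the notes to Table 10.
MEMORIES = {
    "200 MHz": {"half_period_ns": 2.50, "t_chqz_ns": 2.20, "t_chqx_ns": 1.00},
    "167 MHz": {"half_period_ns": 3.00, "t_chqz_ns": 2.50, "t_chqx_ns": 1.20},
}

# Round to 10 ps so floating-point noise does not show up in the results.
windows = {name: round(valid_data_window_ns(**p), 2) for name, p in MEMORIES.items()}
margins = {name: round(vdw - FPGA_BUDGET_NS, 2) for name, vdw in windows.items()}

for name in MEMORIES:
    print(f"{name}: window {windows[name]:.2f} ns, margin {margins[name]:.2f} ns")
```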

It is still possible to use this interface with a 200 MHz clock for 4-word burst high-speed devices. In this case, using a Virtex-II -5 speed grade device or faster ensures sufficient HSTL I/O switching performance.

Valid data window for data sent by the memory device:

 $t_{VDW200} = t_{T/2} - t_{CHQZ_{200}} + t_{CHQX_{200}} = 1.30 \text{ ns.}$ 

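There is still a positive margin of 0.66 ns, which corresponds to the typical timing budget of 637 ps in Table 9 (1.30 – 0.637 ≈ 0.66 ns). With the worst-case budget, the margin is 0.58 ns, as shown in Table 10.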

Since the valid data window is enlarged in QDR II SRAM devices, there is more margin when interfacing to that generation of memory devices.

- 1. System level timing analysis
  - The system level timing analysis completes the data capture analysis by including the parameters involved in the system design. These parameters depend mainly on the PCB design and on board-level skew and jitter. The analysis must ensure that the PCB timing requirements are smaller than the available timing margin.
- 2. Internal clock phase shift analysis:

The method applied in the previous section requires known values for the setup and hold times of Virtex-II source-synchronous designs. If the values are not available, a bench calibration is needed to set the value of the clock phase shift on the receive side. The designer can set the starting point of the phase shift using the values in Table 8.
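The system level check described in item 1 amounts to comparing the board-level contributions against the margin left in Table 10. The sketch below illustrates this with placeholder board numbers. The skew and jitter values are hypothetical, and adding them linearly is a simplifying assumption, not a method taken from this application note.

```python
# Worst-case receive margins from Table 10, in nanoseconds.
MARGIN_NS = {"200 MHz": 0.58, "167 MHz": 0.98}

# Hypothetical board-level numbers; replace with values from the PCB analysis.
HYPOTHETICAL_BOARD_SKEW_NS = 0.45
HYPOTHETICAL_BOARD_JITTER_NS = 0.20


def board_fits(margin_ns, board_skew_ns, board_jitter_ns):
    """Return True when the board terms (summed linearly) fit in the margin."""
    return board_skew_ns + board_jitter_ns < margin_ns


results = {
    name: board_fits(margin, HYPOTHETICAL_BOARD_SKEW_NS, HYPOTHETICAL_BOARD_JITTER_NS)
    for name, margin in MARGIN_NS.items()
}

for name, ok in results.items():
    print(f"{name}: {'fits' if ok else 'exceeds margin'}")
```

With these placeholder values, the board terms fit within the 167 MHz margin but exceed the 200 MHz margin.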

### **Clock Domain Change**

The first step in the receive side is capturing the data. Data capture is done with respect to C clock and the controller has the internal K clock as a reference. It is necessary to realign the received data to K clock or the main clock for the design.

The current controller uses D flip-flops to make the clock domain change. This is one of several possible solutions; alternatives include FIFOs built from D flip-flops, SRL16s, or block RAMs. CoreGen, included in the Xilinx ISE software, facilitates the implementation of the FIFO. Although appropriate for particular applications, these alternative solutions are not developed in this reference design. They require an additional study of the speed characteristics of the internal components used during implementation.

The board trace delay parameter, $t_{BD}$, is decisive in this analysis. The relationship between the K and internal C clock signals is $t_{CDC} = t_{IOCKP / HSTL} + t_{PHASESHIFT} + 2 \times t_{BD}$, modulo the clock period. The term $2 \times t_{BD}$ accounts for the board delay of both the transmit and receive signals.

For example, in the current design,  $t_{CDC}$  is roughly equal to (1.6 + (2 x  $t_{BD}$ )) ns for a period of 6 ns:

 $t_{CDC} = (2.99 + 0.21) + (1.20 + 0.68 + 1.50 + 1.00) + 2 \times t_{BD}$ 
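The "roughly 1.6 ns" figure follows from reducing the fixed terms modulo the 6 ns period. The Python sketch below reproduces this. It assumes that the first parenthesized group in the equation corresponds to $t_{IOCKP / HSTL}$ and the second to $t_{PHASESHIFT}$, and the 0.5 ns board delay in the second call is a hypothetical value chosen only to show the effect of $t_{BD}$.

```python
def t_cdc_ns(t_iockp_hstl_ns, t_phaseshift_ns, t_bd_ns, period_ns):
    """t_CDC = t_IOCKP/HSTL + t_PHASESHIFT + 2 * t_BD, modulo the clock period."""
    return (t_iockp_hstl_ns + t_phaseshift_ns + 2.0 * t_bd_ns) % period_ns


# Terms from the example equation (assumed grouping, see the text above).
T_IOCKP_HSTL_NS = 2.99 + 0.21
T_PHASESHIFT_NS = 1.20 + 0.68 + 1.50 + 1.00
PERIOD_NS = 6.0

# Fixed part only (t_BD = 0): 7.58 ns modulo 6 ns.
fixed_part_ns = t_cdc_ns(T_IOCKP_HSTL_NS, T_PHASESHIFT_NS, 0.0, PERIOD_NS)

# Hypothetical 0.5 ns board delay, which adds 1.0 ns for the round trip.
with_board_ns = t_cdc_ns(T_IOCKP_HSTL_NS, T_PHASESHIFT_NS, 0.5, PERIOD_NS)

print(f"fixed part:     {fixed_part_ns:.2f} ns")
print(f"with t_BD=0.5:  {with_board_ns:.2f} ns")
```

Because the result wraps at the clock period, a small change in $t_{BD}$ can move $t_{CDC}$ from the end of one period to the start of the next, which is why the clocking of the realignment stage may need to be rearranged.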

Data is captured on the 90° phase shift with respect to K for D0, and on the 270° phase shift for D1. Both data registers are aligned in SDR mode (acting like a pipeline) by the second level of D flip-flops. Since $t_{CDC}$ will vary, it becomes necessary to rearrange the clocks of this stage to complete the clock domain change properly. As the designer has a precise idea of the value of $t_{PHASESHIFT}$, the unknown parameter is $t_{BD}$. Bench calibration is one way to determine this value.

## **Timing Analysis Summary**

The worst case numbers used in this example can be substantially different from the best case. The designer should readjust these values and the clocking scheme (especially for the **Clock Domain Change**). Two levels of performance can be achieved using the general type of design developed in this application note:

- Conventional synchronous design is the easiest type of design and allows the controller to run at 133 MHz. Only one DCM is necessary on the transmit side. The same DCM can be used on the receive side to capture data.
- Source-synchronous design is developed in this application note. It allows a 167 MHz interface for a 2-word burst memory device and a 200 MHz interface for a 4-word burst device. With a high-speed grade Virtex-II device, this can be done with a significant margin.

In any case, a detailed timing analysis is the first step in designing a high-performance memory interface. The PCB characteristics, along with any new speed files, must be considered to ensure that timing budget margin is available.

# Reference Design Notes

The reference design is available in VHDL and Verilog at: <a href="http://ftp.xilinx.com/pub/applications/xapp/xapp262.zip">http://ftp.xilinx.com/pub/applications/xapp/xapp262.zip</a>.

## Reference Design Description

- The top-level architecture is described in QDR\_burst\_2\_body.
- The address path is described in Address\_burst\_2.
- C\_Generator is the entity that provides the clock for capturing the data on the receive side.
- Clk\_generator is the entity that implements the clock generation for the memory device and the internal logic.
- The data path on the receive side and the clock domain change are implemented in Read\_burst\_2.
- The data path on the transmit side is implemented in the Write\_burst\_2 module.
- A testbench for the reference design is provided in the package; its architecture is named QDR\_Ctrl\_TB.
- Another testbench is provided to check the direct data placement and exchange with the provided HDL model for QDR SRAMs.
- The package includes the Micron Technology MT54V512H18A HDL model.
- The package also contains scripts for simulation and a .ucf constraints file.

### **Reference Design Notes and Advice**

- The functional simulations use ModelTech ModelSim 5.6. Synthesis uses Synplicity Synplify 7.1; mapping, place, and route use Xilinx ISE 5.1i.
- To reduce the skew on the I/O flip-flops clock tree, Xilinx recommends placing the buses on adjacent pads in the same I/O bank. In this design, all signals are on Bank 6 and Bank 7. If the size of the bus increases, it may be necessary to follow the SSO guideline and insert other signals in the bus implementation. Also, the system I/O rules must be followed to achieve mapping; in this design, the input for the deskew clock is placed on Bank 5.

- The current design should be able to run at a much higher speed than the FPGA's I/Os can handle (>300 MHz with a medium level of effort and few timing constraints). Nevertheless, depending on the design changes or on the resources used for a FIFO buffer, the speed may decrease significantly. Set the appropriate timing constraints on the critical path in the interface design.
- IBIS or HSPICE simulations are strongly recommended to check for any signal integrity problems and to check the Virtex-II I/O switching speed.

## **Abbreviations**

Table 11: Glossary of Abbreviations

| Abbreviation | Description                                            |
|--------------|--------------------------------------------------------|
| BST          | Boundary Scan Test (IEEE 1149.1)                       |
| CLB          | Configurable Logic Block                               |
| DAR          | Double Address Rate                                    |
| DCM          | Digital Clock Manager                                  |
| DDR          | Double Data Rate                                       |
| DLL          | Delay-Locked Loop                                      |
| FIFO         | First In, First Out                                    |
| GND          | Ground                                                 |
| HSTL         | High-Speed Transceiver Logic                           |
| JEDEC        | Joint Electron Device Engineering Council              |
| JTAG         | Joint Test Action Group                                |
| PCB          | Printed Circuit Board                                  |
| PLL          | Phase-Locked Loop                                      |
| QDR SRAM     | Quad Data Rate Synchronous Static Random Access Memory |
| SAR          | Single Address Rate                                    |
| SSO          | Simultaneous Switching Outputs                         |
| TAP          | Test Access Port                                       |

# References

The following documents are recommended:

- 1. Micron Technology Inc., QDR SRAM Technical Guide, Technical Note TN-54-01, Rev. 4/01, 2001, <u>http://www.micron.com</u>
- 2. Micron Technology Inc., 9Mb QDR SRAM 2-Word Burst, MT54V512H18A, Revision 3/00, 2000, <u>http://www.micron.com</u>
- 3. Cypress Semiconductor Corporation, 9-Mb Pipelined SRAM with QDR Architecture CY7C1302V25, March 28, 2000, <u>http://www.cypress.com</u>
- 4. Xilinx Inc., Virtex-II Platform FPGA Handbook, Revision 1.3, December 2001, <u>www.xilinx.com</u>.
- 5. Xilinx Inc., Virtex-II Platform FPGA Data Sheet, DS031-1 (Rev 1.7), DS031-2 (Rev 1.9), DS031-3 (Rev 2.0), DS031-4 (Rev 1.6), November 2001, <u>www.xilinx.com</u>.
- 6. Xilinx Inc., Using the Virtex SelectI/O<sup>™</sup> Feature, Application Note XAPP133, Rev 2.5, September 7, 2000, <u>http://www.xilinx.com/xapp/xapp133.pdf</u>.

More detailed information about QDR SRAM is available on the QDR consortium web site at <u>http://www.qdrsram.com</u>. These memory devices are offered by:

- Cypress Semiconductor Corporation at http://www.cypress.com
- HITACHI, Ltd. at <u>http://semiconductor.hitachi.com/memory.html</u>
- Integrated Device Technology, Inc. at <u>http://www.idt.com</u>
- Micron Technology, Inc. at <u>http://www.micron.com/</u>
- NEC Corporation at <u>http://www.ic.nec.co.jp/memory/index\_e.html</u>
- SAMSUNG Electronics Company, Inc. at http://www.samsungelectronics.com

# Conclusion

The guidelines in this application note help to achieve the best performance when interfacing QDR devices with Virtex-II FPGAs. Virtex-II devices support up to DDR400 QDR interfaces, as shown in the timing analysis section. The current reference design covers building a Virtex-II based high-performance controller. It also gives a starting point for building a QDR II SRAM interface. The reference design package provides design resources for DDR I/Os, the DCM, and System I/O interfaces to meet performance expectations.

# Revision History

The following table shows the revision history for this document.

| Date     | Version | Revision                                              |
|----------|---------|-------------------------------------------------------|
| 01/15/01 | 1.0     | Initial Xilinx release.                               |
| 02/27/02 | 2.0     | Overall revisions to code and document.               |
| 03/28/02 | 2.1     | Changes in many figures and tables.                   |
| 04/24/02 | 2.2     | Changes to figures and tables, added Verilog support. |
| 10/23/02 | 2.3     | Updates to many figures and tables.                   |
| 12/13/02 | 2.4     | Updated Figure 6 for clarity.                         |