## **DESIGN HINTS AND ISSUES**

## Synchronous RAM Improves System Speed



The XC4000 FPGA architecture allows two of the memory look-up-table-based function generators in each logic block to be used as small blocks of RAM or ROM. These distributed RAM blocks, 16 or 32 addresses deep, are a popular feature of the XC4000 devices.

These RAM blocks offer sub-5 ns access times and write pulse widths; they can greatly improve system performance by avoiding inter-chip delays. On-chip, distributed memory facilitates the efficient implementation of register banks, status registers, shift registers and high-speed FIFO buffers that bridge the gap between subsystems that have different access times and data burst rates.

However, writing into these fast, asynchronous RAM blocks can pose timing challenges. While a READ operation is simple (data is available a short time after the address inputs are stable), a WRITE operation must be timed more carefully. The actual writing is controlled by a Write Enable (WE) or write strobe signal. This signal must exceed a specified minimum width, and it must not start until the address inputs at the RAM cell have been stable for a specified address set-up time. The address inputs must remain stable for the duration of the WE pulse and for the specified address hold time after WE returns to its inactive state. The data to be written must be stable for a specified set-up time before the end of WE and for a short hold time after the end of WE. Violating any of these specifications can result in wrong data being written into the selected location, or in data being written into a location that was not selected.
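The five requirements just listed can be sketched as a small timing checker. This is only an illustration: the names `WriteSpec` and `write_violations`, and all the nanosecond values, are assumptions made for the example and are not taken from any data sheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteSpec:
    """Illustrative asynchronous-write limits in ns (not data-sheet values)."""
    t_as: float  # address set-up before WE goes active
    t_wp: float  # minimum WE pulse width
    t_ah: float  # address hold after WE goes inactive
    t_ds: float  # data set-up before the end of WE
    t_dh: float  # data hold after the end of WE


def write_violations(spec, addr_stable, addr_change, we_start, we_end,
                     data_stable, data_change):
    """Return the names of the requirements violated by one write cycle.

    All arguments are absolute times in ns, measured at the RAM cell.
    """
    checks = {
        "address set-up": we_start - addr_stable >= spec.t_as,
        "WE pulse width": we_end - we_start >= spec.t_wp,
        "address hold": addr_change - we_end >= spec.t_ah,
        "data set-up": we_end - data_stable >= spec.t_ds,
        "data hold": data_change - we_end >= spec.t_dh,
    }
    return [name for name, ok in checks.items() if not ok]


# Hypothetical limits and two hypothetical write cycles.
SPEC = WriteSpec(t_as=2.0, t_wp=4.0, t_ah=2.0, t_ds=4.0, t_dh=2.0)
clean = write_violations(SPEC, addr_stable=0.0, addr_change=12.0,
                         we_start=3.0, we_end=8.0,
                         data_stable=2.0, data_change=11.0)
late_address = write_violations(SPEC, addr_stable=2.0, addr_change=12.0,
                                we_start=3.0, we_end=8.0,
                                data_stable=2.0, data_change=11.0)
```

In the second cycle the address settles only 1 ns before WE starts, so the checker reports an address set-up violation while every other requirement is still met.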

Such timing requirements are common to all conventional static RAMs. A relatively easy challenge with 35-ns SRAMs, timing becomes more difficult when dealing with 4-ns RAM blocks located inside an FPGA. In a typical synchronous system, timing requirements are easily met if a write operation can be stretched over two or more system clock periods; however, most systems require higher performance for their memory functions.

To perform a write operation in only one clock period, the circuit must be designed to:

- generate the address and route it to the memory block,
- generate a WE signal and route it to the RAM block so that it arrives several nanoseconds after all address inputs are stable,
- terminate WE and route it so that it ends at the RAM cell while the address remains stable for several more ns, and
- generate proper data timing with respect to the trailing edge of WE.

If all of this must be achieved within one clock period — with only two identifiable clock edges — the user faces several difficult problems.

An elegant solution, shown on page 8-139 of the Data Book, uses the low-skew global clock network as a synchronous active-High write strobe. Address and data can be generated synchronously by the falling edge of this global clock, while routing delays can be controlled to be less than the clock Low time, thus meeting the address set-up time requirement. WE timing is well controlled, and the natural delay in generating and routing the next address provides sufficient address hold time. Since the write strobe is unconditional, a write operation occurs on every clock pulse. To prevent the writing of new data, a multiplexer must connect the old data back to the data input. This method results in a robust, single-clock synchronous design. However, it sacrifices performance: only half of one clock period is available for address routing. The data multiplexer also sacrifices density.
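The role of the data multiplexer in this scheme can be shown with a minimal behavioral sketch. The class name `StrobedRam` and its method are hypothetical, and the model ignores all timing; it only shows that an unconditional write strobe needs the old data recirculated when no new data should be stored.

```python
class StrobedRam:
    """Behavioral model of a small RAM that is written on every clock pulse."""

    def __init__(self, depth=16):
        self.cells = [0] * depth

    def clock_pulse(self, addr, new_data, write):
        # Multiplexer: pass the new data only when a write is wanted,
        # otherwise feed the stored value back to the data input.
        data_in = new_data if write else self.cells[addr]
        # The write strobe is unconditional, so this happens every cycle.
        self.cells[addr] = data_in


ram = StrobedRam()
ram.clock_pulse(addr=3, new_data=1, write=True)   # stores 1
ram.clock_pulse(addr=3, new_data=0, write=False)  # rewrites the old 1
```

Without the multiplexer, the second pulse would overwrite location 3 with 0, which is why this method costs extra logic for every data bit.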

## Synchronous RAM

The XC4000E family features a synchronous RAM option that provides more performance and density. When the synchronous RAM option is invoked, writing to the RAM is like writing to a flip-flop or register. The user connects a clock signal - preferably a low-skew global clock net - to the RAM block and selects one clock edge (for example, the rising edge). The write requirements are now simple: Address, Data, and the WE control signal must be available and stable at the RAM block input a short set-up time before the active clock edge (see figure). There is no hold-time, and there are no other timing requirements.

Internal to the RAM block, Address, Data and WE drive latches that are transparent when the clock is Low, and latched when the clock is High. When a WE signal is recognized, the rising clock edge generates a short internal write pulse that writes data into the addressed RAM location. The width of this pulse naturally adapts to the requirements of the RAM cell. A slow device at high temperature and low supply voltage generates a wider pulse than a fast device at low temperature and high supply voltage.
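The latch behavior described above can be approximated by a short behavioral model. This is a sketch under simplifying assumptions: the class `SyncRam` and its `drive` method are invented for the example, time is reduced to a sequence of input samples, and the self-timed write pulse is modeled as an instantaneous store.

```python
class SyncRam:
    """Behavioral model of an edge-triggered write with transparent latches."""

    def __init__(self, depth=16):
        self.cells = [0] * depth
        self.clk = 0
        self.latched = (0, 0, 0)  # (address, data, WE) held by the latches

    def drive(self, clk, addr, data, we):
        """Apply one sample of the clock level and the three inputs."""
        # Latches follow the inputs while the clock is Low and hold the
        # last value once the clock is High.
        if self.clk == 0 or clk == 0:
            self.latched = (addr, data, we)
        rising_edge = self.clk == 0 and clk == 1
        self.clk = clk
        if rising_edge:
            held_addr, held_data, held_we = self.latched
            if held_we:
                # Stands in for the short internal write pulse.
                self.cells[held_addr] = held_data

    def read(self, addr):
        # Reads bypass the latches and are purely combinational.
        return self.cells[addr]


ram = SyncRam()
ram.drive(clk=0, addr=5, data=1, we=1)  # inputs set up while clock is Low
ram.drive(clk=1, addr=5, data=1, we=1)  # rising edge: location 5 is written
ram.drive(clk=1, addr=6, data=1, we=1)  # clock still High: inputs ignored
ram.drive(clk=0, addr=6, data=1, we=0)
ram.drive(clk=1, addr=6, data=1, we=0)  # rising edge with WE Low: no write
```

Only the first rising edge stores data; input changes while the clock is High, and rising edges with WE inactive, leave the memory untouched.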

Because the same clock and the same clock edge are used to generate address, data and WE and to clock the RAM, the system designer has a whole clock period to generate and distribute these signals. (See related article on page 30.) Thus, interfacing to the RAM is the same as interfacing to other registers. As a result, system speed can be almost doubled, while logic density can be increased substantially, compared with older asynchronous design methods. Synchronous mode does not affect the read operation. The read address bypasses the address latches, so synchronous and asynchronous read operations are identical.
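The speed argument can be made concrete with a rough timing budget. The function `max_clock_mhz` and the delay values are assumptions chosen for illustration; the model ignores clock duty-cycle variation and any difference in set-up times between the two methods, so it gives an upper bound on the gain rather than a measured figure.

```python
def max_clock_mhz(path_delay_ns, setup_ns, usable_fraction):
    """Highest clock rate when only `usable_fraction` of each period is
    available for generating and routing address, data and WE."""
    min_period_ns = (path_delay_ns + setup_ns) / usable_fraction
    return 1000.0 / min_period_ns


# Hypothetical 8 ns of logic and routing delay plus 2 ns of set-up time.
strobe_mhz = max_clock_mhz(8.0, 2.0, usable_fraction=0.5)  # clock-as-strobe
sync_mhz = max_clock_mhz(8.0, 2.0, usable_fraction=1.0)    # synchronous RAM
print(strobe_mhz, sync_mhz)  # prints 50.0 100.0
```

With identical delays, using the whole period instead of half of it doubles the clock rate in this idealized model, which is consistent with the "almost doubled" claim once real-world margins are included.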

## Synchronous, Dual-Port Mode

In synchronous, dual-port mode, the F address drives the address latches in both the F and the G RAMs; the data input is common to both RAMs. Identical data is thus written into both RAMs, but the G read address can be used to read data independently, and present it at the G' output.

This addressing mechanism is ideal for FIFOs, with the F address used for writing and the G address used for reading. The user then only has to design the control circuitry for detecting the extreme situations of full and empty.
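A minimal behavioral sketch of such a FIFO follows. The class `DualPortFifo` and the occupancy counter used to detect full and empty are assumptions made for this example; real control circuitry could equally compare the two address pointers.

```python
class DualPortFifo:
    """FIFO built on a dual-port RAM: F address writes, G address reads."""

    def __init__(self, depth=16):
        self.depth = depth
        self.f_ram = [0] * depth  # both RAMs always hold identical data
        self.g_ram = [0] * depth
        self.f_addr = 0           # write pointer
        self.g_addr = 0           # read pointer
        self.count = 0            # occupancy, used for full/empty detection

    def empty(self):
        return self.count == 0

    def full(self):
        return self.count == self.depth

    def push(self, value):
        """Write one word; return False if the FIFO is full."""
        if self.full():
            return False
        # The F address and the common data input write both RAMs.
        self.f_ram[self.f_addr] = value
        self.g_ram[self.f_addr] = value
        self.f_addr = (self.f_addr + 1) % self.depth
        self.count += 1
        return True

    def pop(self):
        """Read one word through the G address; return None if empty."""
        if self.empty():
            return None
        value = self.g_ram[self.g_addr]
        self.g_addr = (self.g_addr + 1) % self.depth
        self.count -= 1
        return value


fifo = DualPortFifo(depth=4)
accepted = [fifo.push(v) for v in (10, 20, 30, 40, 50)]  # fifth push is refused
first = fifo.pop()
```

Because the write and read pointers are independent, a word can be written and another read in the same clock cycle; the counter only guards the two extreme conditions.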

Synchronous dual-port mode achieves the highest possible speed, and can be run with asynchronous read timing, but it sacrifices storage capacity. For large and relatively slow FIFOs, the synchronous single-port mode may be the more efficient choice.

Small, distributed RAMs offer an attractive system solution for registers, shift registers and FIFO buffers. Previous fully asynchronous designs had demanding timing constraints that limited their speed and density. The new synchronous distributed RAM feature in the XC4000E family makes distributed RAMs as easy to use as flip-flops and registers, and allows RAM operations at 70 MHz with a 14 ns synchronous clock cycle time (-3 speed grade). The optional dual-port mode makes FIFO designs fast and simple.



Figure: Edge-Triggered RAM Write Cycle

