

# **Color Space Converter**

Author: Latha Pillai

# **Summary**

This application note describes three ways to implement the YCrCb Color Space to RGB Color Space conversion necessary in many video designs. The first implementation shows how one might simply write Behavioral Verilog to describe the conversion equations and then synthesize to a silicon target. The second implementation uses the Xilinx feature of embedded RAM functioning as a Look-up Table (LUT), or ROM, to store all possible intermediate results for the terms in the three equations. Since three of the seven total terms are identical, only five ROMs are needed. The third implementation makes use of the embedded multiplier in the Virtex™-II device to do the color space conversion. Again, only five multipliers are used. The Verilog model using the embedded multiplier is synthesized, placed, and routed. The design has a clock performance of about 181 MHz (5.512 ns minimum period) after place and route, using simple constraints.

# Color Space Definition

The human eye has three types of photoreceptor cells called cones. Stimulating the cells causes the human brain to "perceive" color. Colors can be specified, created, and visualized using different color formats or "color spaces."

Different color spaces have historically evolved for different applications. In each case, a color space was chosen for reasons that may no longer be applicable. Maybe a choice was made on a particular color space because the math elements needed to process were simpler or faster. Maybe a certain choice was better because it required less storage and bandwidth on digital buses.

Whatever historical reasons caused color space choices in the past, the convergence of computers, the Internet, and a wide variety of video devices, all using different color representations, is forcing the digital designer today to convert between them. The objective is to have a common color space that all inputs are converted to before algorithms and processes are executed. The converters are useful for a number of markets, such as image processing and filtering. Their basic function is to convert from one color space to another. This application note describes one such conversion.

# Two Color Space Examples

# **RGB Color Space**

RGB color space is a simple and robust color definition used in computer systems and the Internet to help ensure that a color is correctly mapped from one platform to another without significant loss of color information. RGB uses three numerical components to represent a color. This color space can be thought of as a three-dimensional coordinate system whose axes correspond to the three components: R (red), G (green), and B (blue). RGB is the color space that computer displays use. RGB corresponds most closely to the behavior of the human eye.

RGB is an additive color system. The three primary colors red, green, and blue are added to form the desired color. Each component has a range of 0 to 255, with all three 0s producing black and all three 255s producing white.

# YCbCr Color Space

YCbCr Color Space was developed as part of the Recommendation ITU-R BT.601 for worldwide digital component video standard and is used in television transmissions. YCbCr is a scaled and offset version of the YUV color space where Y represents luminance (or brightness), U represents color, and V represents the saturation value. Here the RGB color space is separated into a luminance part (Y) and two chrominance parts (Cb and Cr).

© 2001 Xilinx, Inc. All rights reserved. All Xilinx trademarks, registered trademarks, patents, and disclaimers are as listed at <a href="http://www.xilinx.com/legal.htm">http://www.xilinx.com/legal.htm</a>. All other trademarks and registered trademarks are the property of their respective owners. All specifications are subject to change without notice.

As mentioned earlier, the historical reasons for this choice, over RGB, were to reduce storage and bandwidth. The eye is more sensitive to change in brightness than change in color. Engineers found that 60 to 70 percent of luminance or brightness is found in the "green color." In the chrominance part Cb and Cr, the brightness information can be removed from the blue and red colors.

To generate the same color in the RGB format, all three color components should be of equal bandwidth. This requires more storage space and bandwidth. Also, processing an image in the RGB space is more complex, since any change in the color of a pixel requires all three RGB values to be read, the calculations performed, and the results stored. If the color information is stored in the intensity and color format, some of the processing steps can be made faster.

The result is that Cb and Cr provide the hue and saturation information of the color and Y provides the brightness information of the color. Y is defined to have a range of 16 to 235, and Cb and Cr have a range of 16 to 240 with 128 equal to zero. Because the eye is less sensitive to Cb and Cr, engineers did not need to transmit Cr and Cb at the same rate as Y. Less storage and bandwidth were needed, which reduced design costs.

# Converting from YCrCb to RGB

A color in the YCrCb color space is converted to the RGB color space using the following equations:

 $\begin{aligned} R' &= 1.164(Y - 16) + 1.596(Cr - 128) \\ G' &= 1.164(Y - 16) - 0.813(Cr - 128) - 0.392(Cb - 128) \\ B' &= 1.164(Y - 16) + 2.017(Cb - 128) \end{aligned}$ 

Where R'G'B' are gamma-corrected RGB values.
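
As a point of reference, the equations can be evaluated directly in software. The following is a minimal floating-point sketch for 8-bit inputs, with the blue term taken as 2.017(Cb - 128) to match the term list in the block RAM section. The function name `ycbcr_to_rgb` and the final round-and-clamp step to the 0 to 255 range are assumptions made for illustration; they are not part of the reference design.

```python
def ycbcr_to_rgb(y, cb, cr):
    """Floating-point reference model of the three conversion equations."""
    luma = 1.164 * (y - 16)  # term shared by all three outputs
    r = luma + 1.596 * (cr - 128)
    g = luma - 0.813 * (cr - 128) - 0.392 * (cb - 128)
    b = luma + 2.017 * (cb - 128)
    # Assumption: results are rounded and clamped to the 8-bit range 0..255.
    return tuple(max(0, min(255, round(v))) for v in (r, g, b))


print(ycbcr_to_rgb(16, 128, 128))   # nominal black -> (0, 0, 0)
print(ycbcr_to_rgb(235, 128, 128))  # nominal white -> (255, 255, 255)
```

Such a model is mainly useful for generating expected values when checking a hardware implementation in simulation.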

Figure 1 shows a direct mapping of the above three equations. Notice that three of the seven terms are identical: 1.164(Y - 16) appears in each equation. This term is computed once and fed to the output adders for the R', G', and B' results.



Figure 1: Block Diagram Showing Math Elements
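
The sharing of the luminance term can be sketched as two stages: five distinct products followed by three output adders. The names `five_terms` and `output_adders` and the dictionary keys are hypothetical, chosen only to mirror the structure described above.

```python
def five_terms(y, cb, cr):
    """Compute the five distinct products; the Y product is reused three times."""
    return {
        "y": 1.164 * (y - 16),
        "cr_r": 1.596 * (cr - 128),
        "cr_g": 0.813 * (cr - 128),
        "cb_g": 0.392 * (cb - 128),
        "cb_b": 2.017 * (cb - 128),
    }


def output_adders(t):
    """Combine the five products into the three unrounded outputs."""
    r = t["y"] + t["cr_r"]
    g = t["y"] - t["cr_g"] - t["cb_g"]
    b = t["y"] + t["cb_b"]
    return r, g, b


terms = five_terms(235, 128, 128)
print(len(terms), output_adders(terms))
```

Because only five products are formed, each of the hardware implementations that follow needs five multiplier-like resources rather than seven.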

# Virtex-II Implementation Examples

The high-density on-chip memory in Virtex-II designs increases overall system bandwidth by providing fast and resource-efficient FIFO buffers, shift registers, and CAMs. With embedded multipliers and improved arithmetic functions, Virtex-II solutions deliver over 600 billion MACs/s of Xtreme DSP performance.

There are up to 192 18 x 18 signed multipliers in a single device, supporting up to 36-bit signed multiplications. Cascading these multipliers supports even larger numbers. The multipliers can be combinatorial or pipelined, running between 140 MHz and 250 MHz depending on bit width. These features make Virtex-II devices the ideal choice for implementing the color space converter.

#### **Verilog Examples**

As mentioned at the start of this application note, there are three different implementation examples. The following are the results of synthesizing and implementing each example.

# Implementation Using Behavioral Verilog (gen\_model.\*)

In this implementation, the basic YCrCb2RGB conversion equations are synthesized using Synplicity. All the signals are registered at the input and at the output. The synthesized EDIF file is then placed and routed using Design Manager. A timing constraint of 10 ns was given to the place and route tool. The implementation results are listed in the following tables.

#### Notes:

1. See Verilog file, gen\_model.v.

## Design Summary

| Target device  | XC2V1000 |
|----------------|----------|
| Target package | FG256    |
| Target speed   | -5       |

## **Design Statistics**

| Number of errors                            | 0                      |
|---------------------------------------------|------------------------|
| Number of warnings                          | 3                      |
| Number of slices                            | 162 out of 5,120 (3%)  |
| Number of slices containing unrelated logic | 0 out of 162 (0%)      |
| Number of slice flip-flops                  | 53 out of 10,240 (1%)  |
| Total number of 4-input LUTs                | 253 out of 10,240 (2%) |
| Number used as LUTs                         | 247                    |
| Number used as a route-through              | 6                      |
| Number of bonded IOBs                       | 68 out of 172 (39%)    |
| IOB flip-flops                              | 15                     |
| Number of GCLKs                             | 1 out of 16 (6%)       |
| Total equivalent gate count for design      | 3,487                  |
| Additional JTAG gate count for IOBs         | 3,264                  |

## Timing Summary

| Minimum period                           | 14.156 ns (Maximum frequency: 70.641 MHz) |
|------------------------------------------|-------------------------------------------|
| Minimum input arrival time before clock  | 0.785 ns                                  |
| Minimum output required time after clock | 12.993 ns                                 |

## Implementation Using Block RAM as Look-Up ROM (ram\_model.\*)

Y, Cb, and Cr are 10 bits wide and so have a range of 0 to 1023. This gives the following values for each of the terms in the R, G, and B equations:

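 $\begin{aligned} 1.164(Y - 16) &= 1.164[(0 - 16)\ \text{to}\ (1023 - 16)] = 1.164(-16\ \text{to}\ 1007) \\ 1.596(Cr - 128) &= 1.596[(0 - 128)\ \text{to}\ (1023 - 128)] = 1.596(-128\ \text{to}\ 895) \\ 0.813(Cr - 128) &= 0.813[(0 - 128)\ \text{to}\ (1023 - 128)] = 0.813(-128\ \text{to}\ 895) \\ 0.392(Cb - 128) &= 0.392[(0 - 128)\ \text{to}\ (1023 - 128)] = 0.392(-128\ \text{to}\ 895) \\ 2.017(Cb - 128) &= 2.017[(0 - 128)\ \text{to}\ (1023 - 128)] = 2.017(-128\ \text{to}\ 895) \end{aligned}$ 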

Each of these terms is calculated for all possible input values. The results can then be stored in a 16-bit wide, 1024-deep RAM. Five RAMs are used for the five terms. The address lines to each RAM are driven by the input signal used in that term (Y, Cr, or Cb), and the output of the RAM is the data stored at the addressed location. The outputs of the RAMs are then summed by adders. The block diagram for this method is shown in Figure 2, and the implementation results follow.



Figure 2: Implementation Using RAM
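
The table contents for the five RAMs can be generated ahead of time. The sketch below precomputes each term for all 1024 input codes and stores it as a 16-bit two's complement word. The number of fractional bits (`FRAC_BITS = 4`), the rounding, and all function and table names are assumptions for illustration; the encoding actually used in ram\_model.v may differ.

```python
FRAC_BITS = 4   # assumption: fractional bits kept in each ROM word
DEPTH = 1024    # one entry per 10-bit input code
WIDTH = 16      # ROM word width in bits


def build_rom(coeff, offset):
    """Precompute coeff * (code - offset) for every input code."""
    rom = []
    for code in range(DEPTH):
        value = round(coeff * (code - offset) * (1 << FRAC_BITS))
        # Every scaled term must fit in a signed 16-bit word.
        assert -(1 << (WIDTH - 1)) <= value < (1 << (WIDTH - 1))
        rom.append(value & 0xFFFF)  # store as a two's complement pattern
    return rom


def signed16(word):
    """Interpret a 16-bit pattern as a signed integer."""
    return word - 0x10000 if word & 0x8000 else word


ROM_Y = build_rom(1.164, 16)
ROM_CR_R = build_rom(1.596, 128)
ROM_CR_G = build_rom(0.813, 128)
ROM_CB_G = build_rom(0.392, 128)
ROM_CB_B = build_rom(2.017, 128)


def convert(y, cb, cr):
    """Look up the five terms by address, add them, and drop the fraction."""
    ty = signed16(ROM_Y[y])
    r = ty + signed16(ROM_CR_R[cr])
    g = ty - signed16(ROM_CR_G[cr]) - signed16(ROM_CB_G[cb])
    b = ty + signed16(ROM_CB_B[cb])
    return tuple(v >> FRAC_BITS for v in (r, g, b))


print(convert(16, 128, 128))  # (0, 0, 0)
```

No clamping is applied in this sketch, so out-of-range inputs produce out-of-range outputs. With the scaling chosen here the largest stored magnitude comes from the 2.017(Cb - 128) term and still fits within 16 bits.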

#### Implementation Results Using Block RAM in Virtex-II Device

The model with the instantiated block RAM was synthesized using Synplicity and the resulting EDIF file was placed and routed using Design Manager. A timing constraint of 5 ns was given to the place and route tool. The implementation results (push button) for the color space converter using the instantiated block RAM are as follows:

#### Notes:

1. See Verilog file, ram\_model.v.

### Design Summary

| Target device  | XC2V1000 |
|----------------|----------|
| Target package | FG256    |
| Target speed   | -5       |

#### **Design Statistics**

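|                                             | 0                     |
|---------------------------------------------|-----------------------|
|                                             | 0                     |
|                                             | 23                    |
| Number of slices                            | 44 out of 5,120 (1%)  |
| Number of slices containing unrelated logic | 0 out of 44 (0%)      |
| Number of slice flip-flops                  | 10 out of 10,240 (1%) |
| Total number of 4-input LUTs                | 60 out of 10,240 (1%) |
| Number used as LUTs                         | 55                    |
| Number used as a route-through              | 5                     |
| Number of bonded IOBs                       | 68 out of 172 (39%)   |
| IOB flip-flops                              | 26                    |
| Number of block RAMs                        | 5 out of 40 (12%)     |
| Number of GCLKs                             | 1 out of 16 (6%)      |
| Total equivalent gate count for design      | 328,655               |
| Additional JTAG gate count for IOBs         | 3,264                 |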

#### Timing Summary

| Minimum period                           | 9.740 ns (Maximum frequency: 102.669 MHz) |
|------------------------------------------|-------------------------------------------|
| Minimum input arrival time before clock  | 1.365 ns                                  |
| Minimum output required time after clock | 11.889 ns                                 |

## Implementation Using Embedded Multiplier (mult\_model.\*)

The block diagram for the implementation using the embedded multiplier is shown in Figure 3. A two's complement circuit is provided to handle the negative results of the (Y-16), (Cr-128), and (Cb-128) subtractions. The two's complement circuit can be omitted if the inputs are assumed to be in two's complement format.
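
The role of the two's complement step can be illustrated with a small software model of a signed 18 x 18 multiply. This is a sketch under stated assumptions: the coefficients are scaled by 2^8 and rounded (`COEFF_FRAC = 8`), and the helper names `to_twos`, `from_twos`, and `mult18` are hypothetical. The coefficient format used in mult\_model.v is not given in this note.

```python
OPERAND_BITS = 18  # operand width of the embedded multiplier
COEFF_FRAC = 8     # assumption: fractional bits in each coefficient


def to_twos(value, bits=OPERAND_BITS):
    """Encode a signed integer as a two's complement bit pattern."""
    assert -(1 << (bits - 1)) <= value < (1 << (bits - 1))
    return value & ((1 << bits) - 1)


def from_twos(word, bits=OPERAND_BITS):
    """Decode a two's complement bit pattern back to a signed integer."""
    sign = 1 << (bits - 1)
    return (word ^ sign) - sign


def mult18(a_word, b_word):
    """Signed multiply of two 18-bit patterns; the product fits in 36 bits."""
    return from_twos(a_word) * from_twos(b_word)


# Coefficients as 18-bit patterns: 298, 409, 208, 100, 516 after scaling.
C_Y, C_CR_R, C_CR_G, C_CB_G, C_CB_B = (
    to_twos(round(c * (1 << COEFF_FRAC)))
    for c in (1.164, 1.596, 0.813, 0.392, 2.017)
)


def convert(y, cb, cr):
    """Five signed multiplies followed by the three output adders."""
    ys, cbs, crs = to_twos(y - 16), to_twos(cb - 128), to_twos(cr - 128)
    ty = mult18(ys, C_Y)
    r = ty + mult18(crs, C_CR_R)
    g = ty - mult18(crs, C_CR_G) - mult18(cbs, C_CB_G)
    b = ty + mult18(cbs, C_CB_B)
    return tuple(v >> COEFF_FRAC for v in (r, g, b))


print(hex(to_twos(-16)))      # 0x3fff0, the 18-bit pattern for Y = 0
print(convert(16, 128, 128))  # (0, 0, 0)
```

If the inputs already arrive as signed offsets, the `to_twos` step on the inputs is unnecessary, which corresponds to omitting the two's complement circuit.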



#### Implementation Results Using Embedded Multiplier in Virtex-II Device

The model with the instantiated multiplier was synthesized using Synplicity and the resulting EDIF file was placed and routed using Design Manager. A timing constraint of 5 ns was given to the place and route tool. The implementation result (push button) for the color space converter using the instantiated multiplier is as follows:

#### Notes:

1. See Verilog file, mult\_model.v.

#### Design Summary

| Target device  | XC2V1000 |
|----------------|----------|
| Target package | FG256    |
| Target speed   | -5       |

#### **Design Statistics**

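| Number of errors                            | 0                      |
|---------------------------------------------|------------------------|
| Number of warnings                          | 2                      |
| Number of slices                            | 137 out of 5,120 (2%)  |
| Number of slices containing unrelated logic | 0 out of 137 (0%)      |
| Number of slice flip-flops                  | 183 out of 10,240 (1%) |
| Total number of 4-input LUTs                | 144 out of 10,240 (1%) |
| Number used as LUTs                         | 142                    |
| Number used as a route-through              | 2                      |
| Number of bonded IOBs                       | 68 out of 172 (39%)    |
| IOB flip-flops                              | 36                     |
| Number of MULT18X18s                        | 5 out of 40 (12%)      |
| Number of GCLKs                             | 1 out of 16 (6%)       |
| Total equivalent gate count for design      | 23,168                 |
| Additional JTAG gate count for IOBs         | 3,264                  |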

#### Timing Summary

| Minimum period                           | 5.512 ns (Maximum frequency: 181.422 MHz) |
|------------------------------------------|-------------------------------------------|
| Minimum input arrival time before clock  | 5.193 ns                                  |
| Minimum output required time after clock | 10.027 ns                                 |
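
The maximum frequencies quoted in the three timing summaries are the reciprocals of the reported minimum periods. A quick check, with the dictionary keys borrowed from the reference file names purely as labels:

```python
def max_frequency_mhz(min_period_ns):
    """Convert a minimum clock period in ns to a maximum frequency in MHz."""
    return 1000.0 / min_period_ns


# Minimum periods taken from the three timing summaries in this note.
MIN_PERIOD_NS = {"gen_model": 14.156, "ram_model": 9.740, "mult_model": 5.512}

for name, period in MIN_PERIOD_NS.items():
    print(f"{name}: {max_frequency_mhz(period):.3f} MHz")
```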

# Reference Design

The VHDL and Verilog reference designs for this application note are available on the Xilinx web site in a .zip file:

ftp://ftp.xilinx.com/pub/applications/xapp/xapp283.zip

# Conclusion

The results of the synthesis and implementations demonstrate how the three examples trade off one math resource for another. The Behavioral Verilog describing the conversion equations uses a resource available in Virtex, Virtex-E, and Virtex-II devices, known as "MULT\_AND", to form the basis of the multiplies in the equations. No block RAM or embedded multipliers are consumed. In the second example, the math resource used is block RAM/ROM, again available in all Virtex families and their derivatives. Finally, the Virtex-II family now provides the most flexible math resource for DSP in the form of an embedded, high-speed, two's complement multiplier.

# Revision History

The following table shows the revision history for this document.

| Date     | Version | Revision               |
|----------|---------|------------------------|
| 07/11/01 | 1.0     | Initial Xilinx release |