

## **Major Advances**

A number of advances have occurred since the von Neumann architecture was proposed:

- Microprocessors
- Solid-state RAM

CSCI 4717 – Computer Architecture

- Family concept: separating the architecture of a machine from its implementation

RISC Processors - Page 2 of 46


## Major Advances (continued)

- Microprogrammed control unit
  - Microcode allows simple programs to be executed from firmware as the action for an instruction
  - Microcode eases the task of designing and implementing the control unit
- Cache memory: speeds up the memory hierarchy
- Pipelining: reduces the percentage of idle components
- Multiple processors: speed through parallelism


### Semantic Gap

- Difference between operations performed in HLL and those provided by architecture
- Example: the VAX provides a case/switch at the assembly-language level, implemented in hardware
- Problems
  - inefficient execution of code
  - excessive machine program code size
  - increased complexity of compilers
- Predominant operations
  - Movement of data
  - Conditional statements

### Measuring Effects of Instructions

- Dynamic occurrence: relative number of times instructions tend to occur as the compiled program executes
- Static occurrence: number of times instructions are seen in the program text (this is a useless measurement)
- Machine-instruction weighted: relative amount of machine code executed as a result of this instruction (based on dynamic occurrence)
- Memory-reference weighted: relative amount of memory references executed as a result of this instruction (based on dynamic occurrence)
- Procedure call is the most time consuming
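As a rough sketch of how a weighted column can be derived from dynamic occurrence, the snippet below multiplies each statement type's dynamic frequency by an average cost per statement and renormalizes. The function name and all numbers are hypothetical, chosen only to illustrate the arithmetic; they are not taken from Stallings.

```python
def weighted_frequency(dynamic, cost):
    """Weight dynamic occurrence by a per-statement cost, then renormalize to 1."""
    raw = {stmt: dynamic[stmt] * cost[stmt] for stmt in dynamic}
    total = sum(raw.values())
    return {stmt: amount / total for stmt, amount in raw.items()}


# Hypothetical dynamic occurrence (fraction of executed statements).
dynamic = {"ASSIGN": 0.5, "CALL": 0.25, "IF": 0.25}
# Hypothetical average number of machine instructions per statement.
instructions_per_stmt = {"ASSIGN": 2, "CALL": 12, "IF": 4}

weighted = weighted_frequency(dynamic, instructions_per_stmt)
for stmt, share in weighted.items():
    print(f"{stmt}: {share:.0%}")
```

With these made-up costs, CALL accounts for only a quarter of the executed statements but the majority of the executed machine instructions, which is the kind of shift the weighted columns are meant to expose.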


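#### **Operations** (continued)

| Statement | Dynamic Occurrence (Pascal) | Dynamic Occurrence (C) | Machine-Instruction Weighted (Pascal) | Machine-Instruction Weighted (C) | Memory-Reference Weighted (Pascal) | Memory-Reference Weighted (C) |
|-----------|-----|-----|-----|-----|-----|-----|
| ASSIGN    | 45% | 38% | 13% | 13% | 14% | 15% |
| LOOP      | 5%  | 3%  | 42% | 32% | 33% | 26% |
| CALL      | 15% | 12% | 31% | 33% | 44% | 45% |
| IF        | 29% | 43% | 11% | 21% | 7%  | 13% |
| GOTO      | -   | 3%  | -   | -   | -   | -   |
| OTHER     | 6%  | 1%  | 3%  | 1%  | 2%  | 1%  |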

#### Table 13.2 from Stallings


## Operands

- Integer constants
- Scalars (80% of scalars were local to procedure)
- Array/structure


- Lunde, A. "Empirical Evaluation of Some Features of Instruction Set Processor Architectures." Communications of the ACM, March 1977.
  - Each instruction references 0.5 operands in memory
  - Each instruction references 1.4 registers
  - These numbers depend highly on architecture (e.g., number of registers, etc.)


#### Operands (continued)

| Operand type     | Pascal | C   | Average |
|------------------|--------|-----|---------|
| Integer constant | 16%    | 23% | 20%     |
| Scalar variable  | 58%    | 53% | 55%     |
| Array/structure  | 26%    | 24% | 25%     |

Table 13.3 from Stallings

#### Procedure calls

| Percentage of Executed Procedure Calls With | Compiler, Interpreter, and Typesetter | Small Nonnumeric Programs |
|---------------------------------------------|---------------------------------------|---------------------------|
| >3 arguments                                | 0-7%                                  | 0-5%                      |
| >5 arguments                                | 0-3%                                  | 0%                        |
| >8 words of arguments and local scalars     | 1-20%                                 | 0-6%                      |
| >12 words of arguments and local scalars    | 1-6%                                  | 0-3%                      |

Table 13.4 from Stallings

This implies that the number of words required when calling a procedure is not that high.

### Reduced Instruction Set Computer (RISC)

Characteristics of a RISC architecture (reduced instruction set is not the only one):

- Limited/simple instruction set (will become clearer later)
- Large number of general-purpose registers and/or use of a compiler designed to optimize the use of registers; this saves operand references to memory
- Optimization of the pipeline through better instruction design, needed because of the high proportion of conditional branch and procedure call instructions


## **Results of Research**

This research suggests:


- Trying to close the semantic gap (CISC) is not necessarily the answer to optimizing processor design
- A set of general techniques or architectural characteristics can be developed to improve performance.




## **Register Windows**

- The hardware solution for making more registers available to a process is to increase the number of registers
  - May slow decoding
  - Should decrease the number of memory accesses
- Allocate registers first to local variables
- A procedure call will force registers to be saved into fast memory (cache)
- As shown in Table 13.4 (slide 9), only a small number of parameters and local variables are typically required




## Register Windows (continued)


- Because the temporary registers of one window are shared with the parameter registers of the next call, no data has to be moved to pass parameters
- We begin to see why compiler writers would make better processor architects
- To make the number of registers appear unbounded, the architecture should allow older activations to be stored in memory



## Register Windows (continued)

- When a window must be freed, an interrupt occurs to store the oldest window in memory
- Only the parameter registers and local registers need to be stored
- Temporary registers are shared with the parameter registers of the next call
- An interrupt is also used to restore a window after the newest function completes
- An N-window register file can hold only N-1 procedure activations
- Research showed that with N = 8, a save or restore is needed on only about 1% of calls and returns
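The overflow and underflow behaviour described above can be sketched with a toy model. The class below is hypothetical and tracks only counts of activations, not register contents; it assumes one window per activation and that a restore happens when a return lands in a caller whose window was spilled.

```python
class RegisterWindowFile:
    """Toy model: N windows hold at most N-1 activations; older ones spill to memory."""

    def __init__(self, num_windows):
        self.capacity = num_windows - 1  # N windows hold N-1 activations
        self.resident = 0                # activations currently in registers
        self.spilled = 0                 # activations saved to memory
        self.saves = 0                   # overflow interrupts taken
        self.restores = 0                # underflow interrupts taken

    def call(self):
        if self.resident == self.capacity:
            # Overflow: store the oldest window to memory to free a slot.
            self.resident -= 1
            self.spilled += 1
            self.saves += 1
        self.resident += 1

    def ret(self):
        self.resident -= 1
        if self.resident == 0 and self.spilled > 0:
            # Underflow: the caller's window is in memory, so restore it.
            self.spilled -= 1
            self.resident += 1
            self.restores += 1


rwf = RegisterWindowFile(num_windows=8)
for _ in range(10):   # nest ten calls deep
    rwf.call()
for _ in range(10):   # unwind completely
    rwf.ret()
print(rwf.saves, rwf.restores)
```

In this run only the three calls that go deeper than seven activations trigger a save, and each is matched by one restore on the way back. Programs whose call depth stays within a narrow band would trigger none.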


### Register Windows - Global Variables

- Question: Where do we put global variables?
- Could keep global variables in memory
- For frequently accessed global variables, however, this is inefficient
- Solution: create an additional set of registers for global variables (fixed in number and available to all procedures)


## Problems with Register Windows

- Increased hardware burden
- Compiler needs to determine which variables get the nice, high-speed registers and which go to memory


## Register Windows versus Cache

- It could be said that register windows are similar to a high-speed memory or cache for procedure data
- This is not necessarily a valid comparison


## Register Windows versus Cache (continued)

- There are, however, some areas where register windows are the better choice
  - The register file more closely mimics software, which typically operates within a narrow range of procedure calls, whereas caches may thrash under certain circumstances
  - The register file wins the speed war when it comes to decoding logic
  - Good compiler design can take better advantage of register windows than of a cache
- Solution: use a register file and an instruction-only cache






## **CISC versus RISC**


- Complex instructions are possibly more difficult to associate directly with an HLL instruction; many compilers may just take the simpler, more reliable way out
- Optimization is more difficult with complex instructions
- Compilers tend to favor more general, simpler commands, so the expected savings in speed may not be realized either

## CISC versus RISC (continued)

CISC programs may take less memory

- Not necessarily an advantage now that memory is cheap
- Can still be an advantage because smaller programs cause fewer page faults
- May be shorter only in the assembly-language view, not necessarily in the number of bits of machine code

#### Additional Design Distinctions

- Further characteristics of RISC
  - One instruction per cycle
  - Register-to-register operations
  - Simple addressing modes
  - Simple instruction formats
- There is no clear-cut design for one or the other
- Many processors contain characteristics of both RISC and CISC


### RISC – One Instruction per Cycle

- Cycle = machine cycle, the time needed to:
  - Fetch two operands from registers (very simple addressing mode)
  - Perform an ALU operation
  - Store the result in a register
- Microcode should not be necessary at all; instructions are hardwired
- Format of instruction is fixed and simple to decode
- Burden is placed on the compiler rather than the processor; the compiler runs once, the application runs many times




#### Simple addressing modes

- Register
- Displacement
- PC-relative
- No indirect addressing, because it requires two memory accesses
- No more than one memory-addressed operand per instruction
- Unaligned addressing not allowed, i.e., addresses must fall on 2- or 4-byte boundaries
- Simplifies the control unit
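To make the displacement and alignment rules concrete, here is a minimal sketch. Both helper names are hypothetical, and the alignment check assumes the access size is a power of two (2 or 4 bytes, as in the list above).

```python
def effective_address(base, displacement):
    """Displacement addressing: register contents plus a constant offset."""
    return base + displacement


def is_aligned(address, size):
    """True if an access of `size` bytes at `address` is naturally aligned.

    Assumes `size` is a power of two, so the low bits of an aligned
    address are all zero.
    """
    return (address & (size - 1)) == 0


addr = effective_address(0x1000, 8)
print(hex(addr), is_aligned(addr, 4), is_aligned(0x1002, 4))
```

A machine that forbids unaligned accesses can treat a failed check as an error instead of splitting the access into two memory operations, which is one way the restriction keeps the control unit simple.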



| Processor   | Number of instruction sizes | Max instruction size in bytes | Number of addressing modes | Indirect addressing | Load/store combined with arithmetic | Max number of memory operands | Unaligned addressing allowed | Max number of MMU uses | Number of bits for integer register specifier | Number of bits for FP register specifier |
|-------------|----|----|-----|-----|-----|---|------|----|----|----|
| AMD29000    |    | 4  |     |     | no  |   |      |    | 8  | 3* |
| MIPS R2000  | 1  | 4  | 1   | no  | no  | 1 | no   | 1  | 5  | 4  |
| SPARC       | 1  | 4  | 2   | no  | no  | 1 | no   | 1  | 5  | 4  |
| MC88000     | 1  | 4  | -   | no  | no  | 1 | no   | 1  | 5  | 4  |
| HP PA       | 1  | 4  | 10* | no  | no  | 1 | no   | 1  | 5  | 4  |
| IBM RT/PC   | 2* | 4  | 1   | no  | no  | 1 | no   | 1  | 4* | 3* |
| IBM RS/6000 | 1  | 4  | 4   | no  | no  | 1 | yes  | 1  | 5  | 5  |
| Intel i860  | 1  | 4  | 4   | no  | no  | 1 | no   | 1  | 5  | 4  |
| IBM 3090    | 4  | 8  | 23  | no* | yes | 2 | yes  | 4  | 4  | 2  |
| Intel 80486 | 12 | 12 | 15  | no* | yes | 2 | yes  | 4  | 3  | 3  |
| NSC 32016   | 21 | 21 | 23  | yes | yes | 2 | yes  | 4  | 3  | 3  |
| MC68040     | 11 | 22 | 44  | yes | yes | 2 | yes  | 8  | 4  | 3  |
| VAX         | 56 | 56 | 22  | yes | yes | 6 | yes  | 24 | 4  | 0  |
| Clipper     | 4* | 8* | 9*  | no  | no  | 1 | 0    | 2  | 4* | 3* |
| Intel 80960 | 2* | 8* | 9*  | no  | no  | 1 | yes* | -  | 5  | 3* |





## **Delayed Branch**

- Traditional pipelining discards the instruction loaded into the pipe after a branch
- Delayed branching executes the instruction loaded into the pipe after a branch
- A NOOP can be used if no instruction can be found to execute after the JUMP. This way, no special circuitry is needed to clear the pipe.
- It is left up to the compiler to rearrange instructions or add NOOPs


#### **Delayed Branch (continued)**

| Address | Normal Branch | Delayed Branch | Optimized Delayed Branch |
|---------|---------------|----------------|--------------------------|
| 100     | LOAD X,A      | LOAD X,A       | LOAD X,A                 |
| 101     | ADD 1,A       | ADD 1,A        | JUMP 105                 |
| 102     | JUMP 105      | JUMP 106       | ADD 1,A                  |
| 103     | ADD A,B       | NOOP           | ADD A,B                  |
| 104     | SUB C,B       | ADD A,B        | SUB C,B                  |
| 105     | STORE A,Z     | SUB C,B        | STORE A,Z                |
| 106     |               | STORE A,Z      |                          |
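The three columns of the table can be checked with a small interpreter. This is only a sketch under assumed semantics: operands are read as (source, destination), `LOAD X,A` copies memory location X into register A, and with delayed branching exactly one instruction after a JUMP executes before the jump takes effect. The `run` and `value` helpers are hypothetical names.

```python
def value(operand, regs):
    """Return a register's contents, or the operand itself if it is a constant."""
    return regs[operand] if operand in regs else operand


def run(program, delayed, x):
    """Run a toy program; return (value stored to Z, instructions executed)."""
    regs = {"A": 0, "B": 0, "C": 0}
    memory = {"X": x}
    pc = min(program)
    pending = None   # jump target waiting for the delay slot to finish
    executed = 0
    while pc in program:
        op, *args = program[pc]
        next_pc = pc + 1
        if pending is not None:
            # This instruction sits in the delay slot: jump once it is done.
            next_pc, pending = pending, None
        if op == "LOAD":
            regs[args[1]] = memory[args[0]]
        elif op == "ADD":
            regs[args[1]] += value(args[0], regs)
        elif op == "SUB":
            regs[args[1]] -= value(args[0], regs)
        elif op == "STORE":
            memory[args[1]] = regs[args[0]]
        elif op == "JUMP":
            if delayed:
                pending = args[0]
            else:
                next_pc = args[0]
        # NOOP matches none of the cases above and does nothing.
        executed += 1
        pc = next_pc
    return memory["Z"], executed


normal = {100: ("LOAD", "X", "A"), 101: ("ADD", 1, "A"), 102: ("JUMP", 105),
          103: ("ADD", "A", "B"), 104: ("SUB", "C", "B"), 105: ("STORE", "A", "Z")}
with_noop = {100: ("LOAD", "X", "A"), 101: ("ADD", 1, "A"), 102: ("JUMP", 106),
             103: ("NOOP",), 104: ("ADD", "A", "B"), 105: ("SUB", "C", "B"),
             106: ("STORE", "A", "Z")}
optimized = {100: ("LOAD", "X", "A"), 101: ("JUMP", 105), 102: ("ADD", 1, "A"),
             103: ("ADD", "A", "B"), 104: ("SUB", "C", "B"), 105: ("STORE", "A", "Z")}

result_normal = run(normal, delayed=False, x=41)
result_noop = run(with_noop, delayed=True, x=41)
result_optimized = run(optimized, delayed=True, x=41)
print(result_normal, result_noop, result_optimized)
```

All three versions store X + 1 to Z. The NOOP version executes five instructions, while the optimized version fills the delay slot with useful work and gets back to four.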






#### Problem 13.6 from Textbook

`S := 0; for K := 1 to 100 do S := S - K;`

translates to

| Label | Instruction | Operands      | Comment                |
|-------|-------------|---------------|------------------------|
|       | LD          | R1, 0         | ;keep value of S in R1 |
|       | LD          | R2, 1         | ;keep value of K in R2 |
| LP    | SUB         | R1, R1, R2    | ;S := S - K            |
|       | BEQ         | R2, 100, EXIT | ;done if K = 100       |
|       | ADD         | R2, R2, 1     | ;else increment K      |
|       | JMP         | LP            | ;back to start of loop |

Where should the compiler add NOOPs or rearrange instructions?









