# **CS321: Computer Architecture**

# Introduction



#### **Arijit Mondal**

Dept. of Computer Science & Engineering, Indian Institute of Technology Patna

arijit@iitp.ac.in

#### **Class information**

- Monday: 0900-1000
- Tuesday: 1000-1100
- Wednesday: 1100-1200
- Room No: 301



# **Introduction**

| Abstraction layers (top to bottom) |
|------------------------------------|
| Application                        |
| Algorithms                         |
| Programming language               |
| Operating systems                  |
| Instruction set architecture       |
| Microarchitecture                  |
| Register transfer level            |
| Gates                              |
| Circuits                           |
| Physics                            |

- Application Requirements:
  - Suggest how to improve architecture
  - Provide revenue to fund development

 Architecture provides feedback to guide application and technology research directions

- Technology Constraints:
  - Restrict what can be done efficiently
  - New technologies make new architectures possible

#### **Abstraction**

- Abstraction helps us deal with complexity
  - Hide lower-level details
- Instruction set architecture (ISA)
  - The hardware/software interface
- Application binary interface (ABI)
  - The ISA plus the system software interface
- Implementation
  - The details underlying an interface

#### **Components of a Computer**

- Same components for all kinds of computers
  - Server, desktop, embedded systems
- Input/output support
  - User interface devices: keyboard, mouse, display
  - Storage devices: hard disk, CD/DVD, flash
  - Network adapters: for communicating with other computers
- Inside the computer
  - Arithmetic logic unit (ALU)
  - Program control unit
  - Memory
  - Datapath

#### **IAS Computer**

[Figure: block diagram of the IAS computer showing main memory, the arithmetic logic unit, the program control unit, and input/output.]



### **Expanded structure of IAS Computer**



# Top level view of computer



### **Basic instruction cycle**



```
High level language:
    g = h * i;
    k = j + i;
    g = h[1];

        |  Compiler
        v

Assembly language:
    MUL R0, R1, R2
    ADD R3, R4, R2
    LDR R3, [R0, #4]

        |  Assembler
        v

Hardware language:
    0000101101000010101
    1010101111100101010
    1010101011110000011
```

- High level language
  - Easy to code & debug
  - Close to problem domain
  - Provides productivity
- Assembly language
  - Textual representation of instructions
- Hardware language
  - Binary data
  - Encoded instructions and data

#### **Machine Model**



# **Understanding Performance**

- Algorithms
  - Determine the number of operations executed
- Programming language, compiler, architecture
  - Determine the number of machine instructions executed per operation
- Processor and memory systems
  - Determine how fast instructions are executed
- I/O systems
  - Determine how fast I/O operations are performed

### **Performance**

- Response time
  - How long it takes to finish a task
- Throughput
  - Total work done per unit time (e.g. tasks or transactions per hour)
- How are response time and throughput affected by
  - Replacing the processor with a faster version?
  - Adding more processors?

# Relative performance

- Performance is defined as 1/Execution time
- "X is $n$ times faster than Y" means
  - $\text{Performance}_X / \text{Performance}_Y = \text{Execution time}_Y / \text{Execution time}_X = n$
- Example: Time taken to run a program
  - 10 ns on machine X and 15 ns on machine Y
  - $\text{Execution time}_Y / \text{Execution time}_X = 15 / 10 = 1.5$
  - So, X is 1.5 times faster than Y
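
The ratio above can be checked with a few lines of Python. This is a minimal sketch; the function name `relative_performance` is a hypothetical helper introduced here for illustration, not something from the lecture.

```python
def relative_performance(exec_time_x: float, exec_time_y: float) -> float:
    """Return n such that X is n times faster than Y.

    Performance is 1 / execution time, so
    Performance_X / Performance_Y = exec_time_y / exec_time_x.
    """
    if exec_time_x <= 0 or exec_time_y <= 0:
        raise ValueError("execution times must be positive")
    return exec_time_y / exec_time_x


# Values from the example: 10 ns on machine X, 15 ns on machine Y.
n = relative_performance(exec_time_x=10e-9, exec_time_y=15e-9)
print(f"X is {n:.1f} times faster than Y")
```

A ratio below 1 would mean that X is the slower machine.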

- Elapsed time (Wall clock time)
  - Total time to complete a task including I/O, memory access, disk access, OS overhead, etc.
- CPU time
  - The time the CPU spends computing this task
  - Does not include I/O time, other jobs' share
  - Can be further subdivided into user CPU time and system CPU time
- Different programs are affected differently by CPU and system performance
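
The difference between elapsed time and CPU time can be observed directly. The sketch below uses Python's standard `time` module: `time.perf_counter()` measures wall-clock time, and `time.process_time()` measures the user plus system CPU time of the current process. The helper names `measure` and `mostly_waiting` are assumptions made for this illustration, and the sleep merely stands in for an I/O wait.

```python
import time


def measure(task):
    """Run task() and return (elapsed_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # user + system CPU time of this process
    task()
    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    return elapsed, cpu


def mostly_waiting():
    """A task dominated by waiting rather than computing."""
    time.sleep(0.05)                    # stands in for I/O: uses no CPU time
    sum(i * i for i in range(10_000))   # a small amount of real computation


elapsed, cpu = measure(mostly_waiting)
print(f"elapsed = {elapsed:.3f} s, CPU = {cpu:.3f} s")
```

For a task like this, the elapsed time is dominated by the wait, while the CPU time stays small. A compute-bound task would show the two values much closer together.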

- Operation is controlled by a constant rate clock
  - Clock period is the duration of one clock cycle (e.g. $300\,\text{ns} = 300 \times 10^{-9}\,\text{s}$)
  - Clock frequency is the number of cycles per second (e.g. $4\,\text{GHz} = 4 \times 10^{9}\,\text{Hz}$)
  - Clock period = 1 / Clock frequency

- CPU time = CPU clock cycles $\times$ Clock period = $\frac{\text{CPU clock cycles}}{\text{Clock frequency}}$
- Performance can be improved by
  - Reducing the number of clock cycles
  - Increasing the clock frequency
- Hardware designers must often trade off clock frequency against cycle count
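
To make the trade-off concrete, here is a small sketch with hypothetical numbers (they are not from the lecture): machine A runs a program in 10 s at 2 GHz, and a redesign B should finish in 6 s but needs 1.2 times as many clock cycles. The code works out the clock frequency B would need.

```python
def cpu_time(clock_cycles: float, clock_frequency_hz: float) -> float:
    """CPU time in seconds = CPU clock cycles / clock frequency."""
    return clock_cycles / clock_frequency_hz


# Hypothetical machine A: 2 GHz clock, 10 s of CPU time for the program.
freq_a = 2e9
cycles_a = 10.0 * freq_a        # cycles = CPU time x clock frequency

# Hypothetical redesign B: 6 s target, but 1.2x as many cycles as A.
target_time_b = 6.0
cycles_b = 1.2 * cycles_a
freq_b = cycles_b / target_time_b   # frequency needed to meet the target

print(f"A: {cpu_time(cycles_a, freq_a):.1f} s at {freq_a / 1e9:.1f} GHz")
print(f"B: {cpu_time(cycles_b, freq_b):.1f} s at {freq_b / 1e9:.1f} GHz")
```

Under these assumptions B needs twice the clock frequency of A, because the extra cycles eat into the gain from the faster clock.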

#### **Instruction count and CPI**

- Clock cycles = Instruction count $\times$ Cycles per instruction (CPI)
- CPU time = Instruction count $\times$ CPI $\times$ Clock period = $\frac{\text{Instruction count} \times \text{CPI}}{\text{Clock frequency}}$
- Instruction count for a program
  - Depends on ISA, compiler, program
- Average cycles per instruction
  - Determined by CPU hardware
  - Different instructions have different CPI
  - Average CPI is affected by instruction mix

# **CPI example**

- Machine A: Clock period 250ps, CPI 2.0
- Machine B: Clock period 500ps, CPI 1.2
- Same set of instructions
- Which is faster?
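
One way to answer the question is to apply CPU time = Instruction count × CPI × Clock period to both machines. In the sketch below the instruction count is an arbitrary, hypothetical value; since both machines run the same instructions, it cancels out of the ratio. The helper name `cpu_time_ps` is also an assumption of this example.

```python
def cpu_time_ps(instruction_count: int, cpi: float, clock_period_ps: float) -> float:
    """CPU time in picoseconds = instruction count x CPI x clock period."""
    return instruction_count * cpi * clock_period_ps


INSTRUCTIONS = 1_000_000  # hypothetical; identical for both machines

time_a = cpu_time_ps(INSTRUCTIONS, cpi=2.0, clock_period_ps=250)  # machine A
time_b = cpu_time_ps(INSTRUCTIONS, cpi=1.2, clock_period_ps=500)  # machine B

if time_a < time_b:
    faster, ratio = "A", time_b / time_a
else:
    faster, ratio = "B", time_a / time_b

print(f"Machine {faster} is faster by a factor of {ratio:.2f}")
```

Per instruction, A needs 2.0 × 250 ps = 500 ps and B needs 1.2 × 500 ps = 600 ps, so A is faster by a factor of 1.2 even though its CPI is higher.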

#### **CPI in more detail**

- Different instruction classes take different numbers of cycles
- Clock cycles = $\sum_{i=1}^{n} (\text{CPI}_i \times \text{Instruction count}_i)$
- Weighted average CPI = $\frac{\text{Clock cycles}}{\text{Instruction count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction count}_i}{\text{Instruction count}} \right)$

- Which code sequence executes the most instructions?
- Compute average CPI for each sequence.
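
The table of code sequences for this exercise is not reproduced in these notes, so the sketch below uses illustrative numbers only: three hypothetical instruction classes A, B and C with CPI 1, 2 and 3, and two hypothetical code sequences. It applies the clock-cycle sum and the weighted-average CPI formula.

```python
def clock_cycles(cpi_by_class: dict, count_by_class: dict) -> int:
    """Clock cycles = sum over classes of CPI_i x instruction count_i."""
    return sum(cpi_by_class[cls] * n for cls, n in count_by_class.items())


def average_cpi(cpi_by_class: dict, count_by_class: dict) -> float:
    """Weighted average CPI = clock cycles / total instruction count."""
    total_instructions = sum(count_by_class.values())
    return clock_cycles(cpi_by_class, count_by_class) / total_instructions


# Illustrative values, not taken from the lecture.
CPI = {"A": 1, "B": 2, "C": 3}
SEQUENCE_1 = {"A": 2, "B": 1, "C": 2}   # 5 instructions
SEQUENCE_2 = {"A": 4, "B": 1, "C": 1}   # 6 instructions

for name, seq in (("sequence 1", SEQUENCE_1), ("sequence 2", SEQUENCE_2)):
    print(name,
          "instructions:", sum(seq.values()),
          "cycles:", clock_cycles(CPI, seq),
          "average CPI:", average_cpi(CPI, seq))
```

With these numbers, sequence 2 executes more instructions but needs fewer cycles, which shows why instruction count alone does not determine performance.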


## **Performance summary**

- CPU time = $\frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$
- Performance depends on
  - Algorithm: affects instruction count (IC), possibly CPI
  - Programming language: affects IC, CPI
  - Compiler: affects IC, CPI
  - Instruction set architecture: affects IC, CPI, clock period

#### **Performance: Power**

- Power $\propto$ Capacitive load $\times$ Voltage$^2$ $\times$ Frequency
- Suppose a new CPU has the following
  - 85% of capacitive load of old CPU
  - 15% reduction in voltage, 15% reduction in frequency

$$\frac{P_{new}}{P_{old}} = \frac{0.85 \times C_{old} \times (0.85 \times V_{old})^2 \times 0.85 \times F_{old}}{C_{old} \times V_{old}^2 \times F_{old}} = 0.85^4 \approx 0.52$$

- Constraints
  - Further reduction in voltage may not be possible
  - Dissipation of heat
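
The power ratio above can be reproduced numerically. This is a minimal sketch of the proportionality Power ∝ Capacitive load × Voltage² × Frequency; the function name `power_ratio` and its parameter names are assumptions introduced here.

```python
def power_ratio(cap_scale: float, volt_scale: float, freq_scale: float) -> float:
    """Ratio P_new / P_old when load, voltage and frequency are each scaled.

    Power is proportional to C x V^2 x F, so voltage enters squared.
    """
    return cap_scale * volt_scale ** 2 * freq_scale


# New CPU: 85% of the capacitive load, voltage and frequency each reduced by 15%.
ratio = power_ratio(cap_scale=0.85, volt_scale=0.85, freq_scale=0.85)
print(f"P_new / P_old = {ratio:.2f}")
```

Because voltage is squared, a voltage reduction lowers power more than an equal relative reduction in load or frequency would.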

## MIPS as performance metric

- MIPS: Millions of Instructions Per Second
  - Does not account for
    - Differences in ISAs between computers
    - Differences in complexity between instructions

$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Instruction count}}{\frac{\text{Instruction count} \times \text{CPI}}{\text{Clock frequency}} \times 10^6} = \frac{\text{Clock frequency}}{\text{CPI} \times 10^6}$$

- CPI varies between programs on a given CPU
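
The following sketch illustrates why MIPS is a weak metric: the same CPU yields different MIPS ratings for programs with different CPI. The 4 GHz clock and the two CPI values are hypothetical, and `mips` is a helper name assumed for this example.

```python
def mips(clock_frequency_hz: float, cpi: float) -> float:
    """MIPS = clock frequency / (CPI x 10^6)."""
    return clock_frequency_hz / (cpi * 1e6)


CLOCK_HZ = 4e9  # hypothetical 4 GHz CPU

# Two hypothetical programs with different instruction mixes, hence different CPI.
mips_program_1 = mips(CLOCK_HZ, cpi=2.0)
mips_program_2 = mips(CLOCK_HZ, cpi=1.25)

print(f"program 1: {mips_program_1:.0f} MIPS")
print(f"program 2: {mips_program_2:.0f} MIPS")
```

The hardware is identical in both cases, so a single MIPS figure says little unless the program and the ISA are also specified.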

## Multiprocessors

- Multicore multiprocessors
  - More than one processor per chip
- Requires explicitly parallel programming
  - Compare with instruction level parallelism
    - Hardware executes multiple instructions simultaneously
    - Hidden from programmer
  - Hard to do
    - Programming for performance
    - Load balancing
    - Optimizing communication and synchronization