







| Instruction Level Pipelining                                                                                                                                      |       | Ins      | stru    | ctior        | n Le    | vel     | Pipe       | lini       | ng:        | Big      | Picture    | ;      |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------|----------|---------|--------------|---------|---------|------------|------------|------------|----------|------------|--------|
| <ul> <li>Pipelining is also applied to Instruction Processing</li> </ul>                                                                                          |       | • Each s | stage o | of the I     | nstruc  | ction F | Proces     | ssing      | Cycle      | takes    | 1 clock cy | /cle   |
| <ul> <li>In instruction processing, each instruction goes through</li> </ul>                                                                                      |       | ≻ 1      | 1 clock | cycle =      | = x tim | e units | per s      | tage       |            |          |            |        |
| F->D->EA->OP->EX->S cycle                                                                                                                                         |       | Eor ear  | ch eta  | <u>ao on</u> | o nha   | so of   | inetru     | ction i    | e carr     | ied ou   | t and the  | stages |
| The instruction cycle is divided into stages                                                                                                                      |       | are ove  | erlappe | ed           | e pria  | SE UI   | instru     | CIUTI      | s can      | ieu ou   |            | slayes |
| One stage could contain more than one phase of the instruction cycle or one phase can be divided into two stages                                                  |       |          |         | Cycle 1      | Cycle 2 | Cycle 3 | Cycle 4    | Cycle 5    | Cycle 6    | Cycle 7  |            |        |
|                                                                                                                                                                   |       |          |         | S1           | 52      | 53      | S4         | \$5        | 56         |          |            |        |
| <ul> <li>If an instruction is in a particular stage of the cycle, the rest of the stages<br/>are idle</li> </ul>                                                  |       |          |         | lostruc      | tion 1  |         |            | 00         |            |          |            |        |
|                                                                                                                                                                   |       |          |         | mstruc       |         |         |            |            |            | -        |            |        |
| We exploit this idleness to allow instructions to be executed in parallel                                                                                         |       |          |         |              | S1      | S2      | <b>S</b> 3 | <b>S</b> 4 | <b>S</b> 5 | S6       |            |        |
| <ul> <li>From the Laundry Example, we know that throughput increase also<br/>allows reduction in completion time, hence overall program execution time</li> </ul> |       |          |         |              | Instruc | tion 2  |            |            |            |          |            |        |
| can be lowered                                                                                                                                                    |       |          | S1. F   | etch ir      | nstruc  | tion    |            | S4.        | Fetch      | n opera  | inds       |        |
| • Queb percellel evenution is called instruction lovel ninglining                                                                                                 |       |          | S2 D    | ecode        | onco    | de      |            | <b>S</b> 5 | Exec       | ute      |            |        |
| • Such parallel execution is called instruction-level pipelining                                                                                                  |       |          | S3. E   | valuat       | e Ado   | dress   |            | S6.        | Store      | e result |            |        |
| СІТ 595                                                                                                                                                           | 9 - 5 | CIT 595  |         |              |         |         |            |            |            |          |            | 9 - 6  |
|                                                                                                                                                                   |       |          |         |              |         |         |            |            |            |          |            |        |



## **Theoretical Speedup due to Pipelining**

If we take the time required to complete n tasks without a pipeline and divide it by the time it takes to complete n tasks using a pipeline, we find:

Speedup S = 
$$\frac{nt_n}{(k+n-1)t_p} \rightarrow t_n = k \times t_p$$

If we take the limit as n approaches infinity, (k + n - 1) approaches n, which results in a theoretical speedup of:

Speedup 
$$S = \frac{kt_p}{t_p} = k$$

CIT 595

9 - 8









| Impact on Clock Cycle due to Pipelining                                                                                                                                           |      |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|
| • Again for pipelining, the <i>clock</i> is sequencing the stages (instructions move in lock step fashion)                                                                        |      |
| • For pipelining to work correctly, we want to make sure<br>that all work done in one stage gets done on time before<br>it moves to next stage                                    |      |
| • Hence, the clock cycle time should be as long as time it takes through the longest pipe stage (this also includes the time for capturing data into registers in between stages) |      |
| Clock cycle time = $max(t_1, t_2, t_3, t_4, t_5, t_6)$                                                                                                                            |      |
| СІТ 595 9                                                                                                                                                                         | - 13 |























| Solution to Example 1                                                                            |                         |      |        |                      |            |         |           |         |          |            |
|--------------------------------------------------------------------------------------------------|-------------------------|------|--------|----------------------|------------|---------|-----------|---------|----------|------------|
| <ul> <li>Naive approach, introduce delay in the pipeline till instruction i1 finishes</li> </ul> |                         |      |        |                      |            |         |           |         |          |            |
| <ul> <li>Also stop any new instructions from being fetched</li> </ul>                            |                         |      |        |                      |            |         |           |         |          |            |
| • Also                                                                                           | know                    | n as | Pipeli | ine <mark>S</mark> t | all or     | Bubb    | le        |         |          |            |
| cycle (                                                                                          | cycle 0 1 2 3 4 5 6 7 8 |      |        |                      |            |         |           |         |          |            |
| i1                                                                                               | S1                      | S2   | S3     | S4                   | <b>S</b> 5 | S6      |           |         |          |            |
| i2                                                                                               |                         | S1   | S2     |                      |            |         | S3        | S4      | S5       | S6         |
|                                                                                                  |                         |      |        |                      |            | S1: Ins | structio  | n Fetch | I        |            |
|                                                                                                  |                         |      |        |                      |            | S2: De  | ecode     |         |          |            |
|                                                                                                  |                         |      |        |                      |            | S3: Re  | egister l | Fetch   |          |            |
|                                                                                                  |                         |      |        |                      |            | S4: Ex  | ecute/E   | Evaluat | e Addr   |            |
|                                                                                                  |                         |      |        |                      |            | S5: Me  | emory A   | Access  | (LD/ST   | )          |
| CIT 595                                                                                          |                         |      |        |                      |            | S6: W   | rite Bac  | k (regi | ster wri | ie) 9 - 25 |





















| Solution to Example 3                                                                                                                        |       |       |        |                |        |       |        |        |      |        |
|----------------------------------------------------------------------------------------------------------------------------------------------|-------|-------|--------|----------------|--------|-------|--------|--------|------|--------|
| • Com                                                                                                                                        | plete | Data  | Forw   | <i>l</i> ardir | ng not | poss  | ible i | n this | case | è      |
| C                                                                                                                                            | /cle  | 0 1   | 2      | 2 3            | 3 4    | . 5   | 5 6    | 5 7    | 7    |        |
|                                                                                                                                              | i1    | S1    | S2     | S3             | S4     | S5    | S6     |        |      |        |
|                                                                                                                                              | i2    |       | S1     | S2             | S3     | S4    | S5     | S6     |      |        |
| Value from memory (to be put in R1) is received<br>from memory at end of cycle 5 for i1, but i2<br>needs value of R1 at beginning of cycle 4 |       |       |        |                |        |       |        |        |      |        |
| <ul> <li>Stall</li> </ul>                                                                                                                    | for o | ne cy | cle ar | nd the         | en Fo  | rward | I      |        |      |        |
| cycle                                                                                                                                        | 0     | 1     | 2      | 3              | 4      | 5     | 6      | 7      | 8    |        |
|                                                                                                                                              | S1    | S2    | S3     | S4             | S5     | S6    |        |        |      |        |
| i2                                                                                                                                           |       | S1    | S2     | S3             |        | S4    | S5     | S6     |      |        |
| CIT 595                                                                                                                                      |       |       |        |                |        |       |        |        |      | 9 - 36 |





| Control Hazards                                                                                                                                               | Control Hazard                                                                                                                                                                                                            |    |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----|
| <ul> <li>Occurs when we need to make a decision<br/>based on the result of instruction while others<br/>are executing</li> </ul>                              | • The problem with branch instructions is that:<br>1. We find out that instruction is Branch Instr only after<br>Decode Stage (S2) and by then we have already fetche<br>the next sequential instr.                       | ed |
| <ul> <li>Branch Instructions are instructions that make decision</li> <li>Alter the flow of execution in a program and give rise to control hazard</li> </ul> | <ul> <li>2. Branch address is resolved only in the <i>Evaluate</i> Address phase (S4)</li> <li>≻So we have to stall the pipeline till we know which address to fetch from i.e. PC + 1 or Branch Target address</li> </ul> | t  |
|                                                                                                                                                               | 3. The instr. before the branch instr. will set <i>Condition Code</i> (NZP) one cycle before or the same cycle the branch address is resolved                                                                             |    |
| CIT 595 9 - 39                                                                                                                                                | СІТ 595                                                                                                                                                                                                                   | 9  |

9 - 40





















| Prediction helps reduce impa                                                                                       | act on Avg. CPI                           |
|--------------------------------------------------------------------------------------------------------------------|-------------------------------------------|
| <ul> <li>Each branch instruction takes 1 cycle to co<br/>cycles (due to branch delays caused)</li> </ul>           | omplete + additional                      |
| Lets say 20% instructions are branches                                                                             |                                           |
| <ul> <li>Assume that branch predictor is 90% accurate</li> </ul>                                                   | rate                                      |
| • Pipeline without Prediction<br>CPI <sub>branch</sub> = Fraction Branches * (1 + Addit<br>= (0.2) * (1 + 3) = 0.8 | tional Cycles)                            |
| Pipeline with Prediction                                                                                           |                                           |
| $CPI_{branch} = Fraction Branches * (1 + 1 + (Rate * Additional Cycles))= (0.2) *(2 + (190) * 2)$                  | (Misprediction                            |
| = 0.44                                                                                                             | Assume instr. right after                 |
|                                                                                                                    | BR always cause 1<br>delay (conservative) |
| CIT 595                                                                                                            | 9-51                                      |

| Compiler + Prediction helps reduce impact on Avg. CPI                                                                                                                                                                                |   |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---|
| <ul> <li>Lets say 20% instructions are branches</li> <li>Assume the compiler only finds one useful instruction on average</li> <li>&gt; Hence additional cycles needed is 2 due to NOPs</li> </ul>                                   |   |
| • Pipeline with Compiler<br>CPI <sub>branch</sub> = Fraction Branches * (1 + Additional Cycles)<br>= (0.2) * (1 + 2) = 0.6                                                                                                           |   |
| <ul> <li>Now, if we kick in the predictor (90% accurate) for 2 NOPs<br/>CPI<sub>branch</sub> = Fraction Branches * (1 + [Misprediction Rate * Additional<br/>Cycles])<br/>= (0.2) *[1 + [ (190) * 2])<br/>= 0.24         </li> </ul> |   |
| CIT 595 9 - 5                                                                                                                                                                                                                        | 2 |





## Delays due to JUMP instruction Additional delay of 2 cycles will be incurred one instruction that will eventually be aborted and one stall for not fetching next instruction after discovering Jump instruction Branch Prediction cannot help in this case as we are not waiting on a condition Any penalty if to be reduced will fall on compiler i.e. find useful instructions to do in the 2 delay slots

## **Exceptions** Exceptions are used for signaling a certain condition You already know 1. I/O request: device requests attention from CPU 2. System call or Supervisor call from software (TRAP in LC3, I/O functions in C) 3. Arithmetic: Integer or FP, overflow, underflow, division by zero 4. Invalid opcode: CPU was given an wrongly formatted instruction 5. Memory protection: read/write/execute forbidden on requested address Yet to learn (or may be heard) 1. Page fault: requested virtual address was not present in main memory 2. Misaligned address: bus error 3. Hardware malfunction: component failure **CIT 595** 9 - 56



| Handling All other Exceptions                                                                                                                                    |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <ul> <li>Let the instruction(s) before exception condition<br/>instruction complete</li> </ul>                                                                   |
| <ul> <li>Abort the instructions after excepting instruction</li> </ul>                                                                                           |
| <ul> <li>Save the state of the machine (esp. PC of the<br/>excepting condition instruction) to return back and<br/>continue from faulting instruction</li> </ul> |
| <ul> <li>Start fetching instructions in memory where the<br/>exception handling routine instructions are kept</li> </ul>                                         |
| • This is known as implementing <i>precise exceptions</i> i.e. undo all instructions after the excepting instructions and restart from the excepting instruction |
| CIT 595 9 - 54                                                                                                                                                   |
|                                                                                                                                                                  |
|                                                                                                                                                                  |

|                               | Pipelining Complications                                                            |  |
|-------------------------------|-------------------------------------------------------------------------------------|--|
| • Due to<br>multiple<br>cycle | o overlapping of instruction execution,<br>e exceptions can occur in the same clock |  |
| Stage                         | Problem Exceptions Occurring                                                        |  |
| Fetch                         | Memory-protection violation, Page fault on instruction                              |  |
|                               | fetch, misaligned memory access                                                     |  |
| Decode                        | Undefined Instruction                                                               |  |
| Execute                       | Arithmetic exception                                                                |  |
| Memory<br>Access              | Memory-protection violation, Page fault on instruction                              |  |
| 100000                        | fetch, misaligned memory access                                                     |  |
| 595                           | 9 - 55                                                                              |  |

CIT 595







