#### MIPS, single-cycle implementation

**Instruction-Level Parallelism** 

- A single job (program/thread)
- Execute multiple instructions simultaneously
- As many as we can
- w/ minimal added hardware

#### **Complex System Design**

- Modularity
  - simple functions w/ guaranteed behavior
  - simple interfaces w/ good abstractions
- Hierarchical composition
  - low-level complexity is hidden
  - ===> 1 billion components that work!

- Reusable building blocks
- Generally applicable
- Customizable
- Isolation allows evolution

## Implement MIPS/LC3\*

- Generally decompose digital systems into two kinds of operation
  - Things that deal with the real data (Datapath)
  - Things that control the stuff operating on the real data (Control)

- Find a decomposition that is simple and efficient
  - Some are obvious, others can be more subtle
- We will start simple
  - Add stuff to improve performance

Remove annoying instructions



(Figure: branch offset $\rightarrow$ 18 bits; jump target $\rightarrow$ 28 bits.)





Clock skew shifts the start of the critical-path delay for some paths.

Total delay: $T_{\text{clock}} = t_{\text{crit,max}} + t_{\text{skew,max}} + t_{\text{hold}} + t_{\text{setup}}$

Instruction Fetch, PC++



PC <=== PC + 1, ISA = { ALU\_op, LDR/STR, BR (includes TRAP, JSRR) } -> simplify

## Design ISA for ILP, fast ops

Instructions are fixed length

- Don't need to decode the first instruction to find the next one
  - Always add 4 bytes to the instruction pointer (PC)
  - e.g., unlike x86

- Register specifiers are always in the same place
  - Destination moves around some, but
  - Source registers are always in the same place
    - Or you don't need that register
  - Can fetch the registers BEFORE you decode instruction
    - Feed bits directly from the instruction memory
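A minimal Python sketch of why these two properties help, assuming the standard MIPS32 field positions (rs in bits 25:21, rt in 20:16, rd in 15:11). The helper names `next_pc` and `register_fields` are hypothetical; the point is that neither one looks at the opcode, so the next fetch address and the register indices are available before decode finishes.

```python
def next_pc(pc: int) -> int:
    """Every instruction is 4 bytes, so the next PC is known before decoding."""
    return (pc + 4) & 0xFFFFFFFF


def register_fields(instr: int) -> tuple[int, int, int]:
    """Read rs, rt, rd from fixed bit positions without examining the opcode.

    For I-format instructions the rd slot holds immediate bits; extracting it
    anyway is harmless because the value is simply not used.
    """
    rs = (instr >> 21) & 0x1F  # instr[25:21]
    rt = (instr >> 16) & 0x1F  # instr[20:16]
    rd = (instr >> 11) & 0x1F  # instr[15:11]
    return rs, rt, rd


# add $t0, $t1, $t2 encodes as 0x012A4020 (rs = $t1 = 9, rt = $t2 = 10, rd = $t0 = 8)
print(register_fields(0x012A4020))  # (9, 10, 8)
print(hex(next_pc(0x00400000)))     # 0x400004
```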

# Datapath: R-Format Instructions



#### Datapath: Load I-format, LW

Extend the datapath to support other immediate operations and address calculation.



- Extender handles either sign or zero extension
- MUX selects between ALU result and Memory output
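A sketch of the extender and the write-back MUX, assuming MIPS32 conventions: LW, SW, and ADDI sign-extend their 16-bit immediate, while logical immediates such as ANDI and ORI zero-extend. The function names and the `mem_to_reg` flag (named after the usual textbook control signal) are illustrative assumptions, not taken from the notes.

```python
def extend16(imm16: int, signed: bool) -> int:
    """Widen a 16-bit immediate to 32 bits, as the datapath's extender does."""
    imm16 &= 0xFFFF
    if signed and imm16 & 0x8000:
        return imm16 | 0xFFFF0000  # sign extension: copy bit 15 into bits 31..16
    return imm16                   # zero extension (also covers non-negative signed values)


def lw_address(base: int, imm16: int) -> int:
    """LW/SW address: R[rs] + sign-extended offset, computed by the ALU (32-bit wraparound)."""
    return (base + extend16(imm16, signed=True)) & 0xFFFFFFFF


def write_back(alu_result: int, mem_data: int, mem_to_reg: bool) -> int:
    """MUX in front of the register file: memory output for LW, ALU result otherwise."""
    return mem_data if mem_to_reg else alu_result


print(hex(extend16(0xFFFC, signed=True)))   # 0xfffffffc
print(hex(extend16(0xFFFC, signed=False)))  # 0xfffc
print(hex(lw_address(0x10010010, 0xFFFC)))  # 0x1001000c  (offset -4)
```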



#### Datapath: Store I-format, SW

The value from Read Register 2 is passed on to memory as the store data.

The memory address is calculated just as in the LW case.

Could we use LDR/STR? How?
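One way to read the question: LC-3's LDR and STR compute BaseR + SEXT(offset6), the same base-plus-offset shape as LW/SW, only with a 6-bit offset and 16-bit words. The sketch below assumes the LC-3 encodings (opcode in bits 15:12, DR/SR in 11:9, BaseR in 8:6, offset6 in 5:0); the helper names are hypothetical, and LDR's condition-code update is left out.

```python
LDR, STR = 0b0110, 0b0111  # LC-3 opcodes, instr[15:12]


def sext(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of value to a Python int."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def lc3_mem_op(instr: int, regs: list[int], mem: dict[int, int]) -> None:
    """Execute LC-3 LDR/STR on a base + offset path like MIPS LW/SW.

    Simplification: LDR's condition-code update (N/Z/P) is omitted.
    """
    opcode = (instr >> 12) & 0xF
    r = (instr >> 9) & 0x7       # DR for LDR, SR for STR: instr[11:9]
    base = (instr >> 6) & 0x7    # BaseR: instr[8:6]
    addr = (regs[base] + sext(instr, 6)) & 0xFFFF  # ALU add, 16-bit wraparound
    if opcode == LDR:
        regs[r] = mem.get(addr, 0)
    elif opcode == STR:
        mem[addr] = regs[r]
    else:
        raise ValueError("not an LDR/STR instruction")


regs = [0] * 8
regs[1], regs[2] = 0x3000, 0x1234
mem: dict[int, int] = {}
lc3_mem_op(0x747F, regs, mem)  # STR R2, R1, #-1  -> mem[x2FFF] = x1234
lc3_mem_op(0x667F, regs, mem)  # LDR R3, R1, #-1  -> R3 = x1234
print(hex(regs[3]))            # 0x1234
```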



### I-format, Datapath: IF + Branch

(Figure: IF + branch datapath. Labels: 30-bit PC register with PC++; opcode is BR; isZero; BR cond.; 16-bit immediate through SEXT; zero-extend; Instruction [31–0]. Margin notes: LC3? NZP? What phase: decode, ALU ..?)

(We can squeeze more range out of offsets: instruction addresses are word-aligned, so their low 2 bits are always 00 and can be ignored.)
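A sketch of the offset trick, assuming standard MIPS BEQ/BNE semantics: the 16-bit offset counts words relative to PC + 4, so shifting it left by 2 yields an 18-bit byte offset. The function name is hypothetical.

```python
def branch_target(pc: int, imm16: int) -> int:
    """MIPS BEQ/BNE target: (PC + 4) + (sign-extended offset << 2).

    The offset counts words, so its low two 00 bits are never stored;
    a 16-bit field therefore reaches +/- 2^17 bytes.
    """
    offset = imm16 & 0xFFFF
    if offset & 0x8000:
        offset -= 0x10000  # sign-extend to a negative Python int
    return (pc + 4 + (offset << 2)) & 0xFFFFFFFF


print(hex(branch_target(0x00400000, 3)))       # 0x400010  (3 words past PC + 4)
print(hex(branch_target(0x00400010, 0xFFFF)))  # 0x400010  (offset -1: branch to itself)
```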

#### Datapath: IFU + Jump J-format, JMP

- MUX selects the pseudodirect jump target: JMP can go anywhere within either OS or USER space.
  - Low 2 bits are always 00: ignore them.
  - Upper 4 bits == x8 (supervisor mode) or x0 (user mode).
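A sketch of the pseudodirect target, assuming standard MIPS J/JAL semantics: 26 stored bits, shifted left by 2 to make 28, with the upper 4 bits copied from PC + 4. In the scheme described here those upper bits are x8 for OS code and x0 for user code, so a jump never leaves its own space. The function name is hypothetical.

```python
def jump_target(pc: int, target26: int) -> int:
    """MIPS J/JAL pseudodirect target.

    Low 2 bits: always 00 (word-aligned), so they are not stored.
    Upper 4 bits: copied from PC + 4, so the jump stays in the current
    region (x0... for user code, x8... for OS code in these notes).
    """
    upper4 = (pc + 4) & 0xF0000000
    return upper4 | ((target26 & 0x03FFFFFF) << 2)


print(hex(jump_target(0x00400018, 0x0100000)))  # 0x400000    (user space)
print(hex(jump_target(0x80000100, 0x20)))       # 0x80000080  (OS space)
```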



#### Putting It All Together: Your first processor





ROM is turned sideways.

This is just a portion of the decode ROM and the ALU-control decode ROM. 6-bit FUNC ===> 64 arith/logic operations per opcode for R-format instructions. "ADD" is a 3-bit code that selects ADD as the ALU's output. Simplify? Some opcodes use the ALU for logic/arith, some for address calc. LC3 -> get rid of: ALU decode? Immed. values?

#### Single-Cycle MIPS Processor Summary

- Advantages
  - Simple control logic
  - All instructions execute in 1 cycle (CPI = 1)
  - Minimal hardware
- Disadvantages
  - Each component is idle most of the time while the wave of logic signals traverses the entire circuit.
  - Cycle time is the sum of the component delays along the longest path ===> very slow clock rate (CR).
  - The slowest instruction (LW) determines the CR for all instructions.
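A small calculation of why LW sets the clock for everyone. The component delays below are hypothetical round numbers in picoseconds, not measurements, and the per-instruction paths are a simplified view of the single-cycle datapath; `overhead_ps` stands in for the skew, hold, and setup terms of the total-delay formula.

```python
# Hypothetical component delays in picoseconds (illustrative only).
DELAY_PS = {"imem": 200, "regread": 100, "alu": 150, "dmem": 200, "regwrite": 50}

# Components each instruction class passes through in a single-cycle datapath.
PATHS = {
    "R-format": ["imem", "regread", "alu", "regwrite"],
    "LW":       ["imem", "regread", "alu", "dmem", "regwrite"],
    "SW":       ["imem", "regread", "alu", "dmem"],
    "BEQ":      ["imem", "regread", "alu"],
}


def path_delay(instr: str) -> int:
    """Sum of component delays along one instruction's path."""
    return sum(DELAY_PS[c] for c in PATHS[instr])


def single_cycle_period(overhead_ps: int = 0) -> int:
    """One clock fits every instruction, so the slowest path sets the period.

    overhead_ps: clocking overhead (skew + hold + setup in the notes' formula).
    """
    return max(path_delay(i) for i in PATHS) + overhead_ps


for name in PATHS:
    print(name, path_delay(name))  # R-format 500, LW 700, SW 650, BEQ 450
print(single_cycle_period())       # 700
```

With CPI = 1, every instruction pays the full 700 ps here, even BEQ, whose own path needs only 450 ps; that idle slack is the cost listed under the disadvantages.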

#### LC4: a 1-cycle, simplified LC3 ISA



**ADD, AND, NOR, MOV (pass-through)**: opcodes 0000, 0001, 0010, 0011 (ALUk == low 2 bits of the opcode)

| instr[15:12] | instr[11:9] | instr[8:6] | instr[5:3] | instr[2:0] |
|---|---|---|---|---|
| OPcode | SR1 | SR2 | DST | unused |

- `ADD R3, R1, R2 ;--- R3 <=== R1 + R2` : `[0000 001 010 011 xxx]`
- `MOV R3, xx, R2 ;--- R3 <=== R2` : `[0011 xxx 010 011 xxx]`

**LDR, STR**: opcodes 1001, 1010 (ALUk == 2'b00)

| instr[15:12] | instr[11:9] | instr[8:6] | instr[5:0] |
|---|---|---|---|
| OPcode | baseR | DST/SRC | offset |

- `LDR R2, R1, x7 ;--- DMEM[ R1 + x7 ] ===> R2` : `[1001 001 010 000111]`

**LEA, LIM**: opcodes 1011, 1100 (ALUk == 2'b00 for LEA, 2'b11 for LIM)

| instr[15:12] | instr[11:9] | instr[8:6] | instr[5:0] |
|---|---|---|---|
| OPcode | unused | DST | offset |

- `LEA R2, PC, x7 ;--- R2 <=== PC + x7` : `[1011 xxx 010 000111]`
- `LIM R2, xx, x7 ;--- R2 <=== x7` : `[1100 xxx 010 000111]`

**BRR**: opcode 1111

| instr[15:12] | instr[11:9] | instr[8:6] | instr[5:0] |
|---|---|---|---|
| OPcode | baseR | CND | offset |

- `BRR R2, R1, x7 ;--- PC <=== R1 + x7 (taken if R2 < 0)` : `[1111 001 010 000111]`
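A minimal single-cycle LC4 interpreter, sketched from the field layouts above. Several details are not pinned down by the notes and are assumptions here: 16-bit registers and word-addressed memory, PC <=== PC + 1 after each instruction (as in LC3), a sign-extended 6-bit offset, LEA adding the offset to the PC of the current instruction, STR's operand order mirroring LDR's, and BRR testing only the sign bit of R[CND]. The function names and the test program are hypothetical.

```python
MASK = 0xFFFF  # assumption: 16-bit registers, addresses, and words, as in LC3


def sext6(instr: int) -> int:
    """Sign-extend the 6-bit offset field instr[5:0] (assumed to be signed)."""
    field = instr & 0x3F
    return field - 0x40 if field & 0x20 else field


def alu(aluk: int, a: int, b: int) -> int:
    """ALUk: 00 ADD, 01 AND, 10 NOR, 11 pass B through (used by MOV and LIM)."""
    if aluk == 0b00:
        return (a + b) & MASK
    if aluk == 0b01:
        return a & b
    if aluk == 0b10:
        return ~(a | b) & MASK
    return b & MASK


def step(pc: int, regs: list[int], imem: dict[int, int], dmem: dict[int, int]) -> int:
    """Execute one LC4 instruction (one clock cycle) and return the next PC."""
    instr = imem[pc]
    op = (instr >> 12) & 0xF
    f11_9, f8_6, f5_3 = (instr >> 9) & 7, (instr >> 6) & 7, (instr >> 3) & 7
    off = sext6(instr) & MASK          # sign-extended offset as a 16-bit pattern
    next_pc = (pc + 1) & MASK          # assumption: word-addressed, PC <=== PC + 1

    if op <= 0b0011:                   # ADD AND NOR MOV: ALUk = low 2 opcode bits
        regs[f5_3] = alu(op & 0b11, regs[f11_9], regs[f8_6])
    elif op == 0b1001:                 # LDR: DMEM[baseR + offset] ===> DST
        regs[f8_6] = dmem.get(alu(0b00, regs[f11_9], off), 0)
    elif op == 0b1010:                 # STR: SRC ===> DMEM[baseR + offset]
        dmem[alu(0b00, regs[f11_9], off)] = regs[f8_6]
    elif op == 0b1011:                 # LEA: DST <=== PC + offset (current PC, assumed)
        regs[f8_6] = alu(0b00, pc, off)
    elif op == 0b1100:                 # LIM: DST <=== offset, via the ALU pass-through
        regs[f8_6] = alu(0b11, 0, off)
    elif op == 0b1111:                 # BRR: PC <=== baseR + offset, taken if R[CND] < 0
        if regs[f8_6] & 0x8000:        # sign bit set: negative 16-bit value
            next_pc = alu(0b00, regs[f11_9], off)
    else:
        raise ValueError(f"opcode {op:04b} is not in this sketch")
    return next_pc


# Hypothetical test program, hand-assembled with the field layouts above.
imem = {
    0: 0b1100_000_001_000111,   # LIM R1, xx, x7   R1 <=== 7
    1: 0b1100_000_010_000011,   # LIM R2, xx, x3   R2 <=== 3
    2: 0b0000_001_010_011_000,  # ADD R3, R1, R2   R3 <=== 10
    3: 0b1010_001_011_000001,   # STR R3, R1, x1   DMEM[8] <=== R3
    4: 0b1001_001_100_000001,   # LDR R4, R1, x1   R4 <=== DMEM[8]
    5: 0b0010_000_000_101_000,  # NOR R5, R0, R0   R5 <=== xFFFF
    6: 0b1111_000_101_000000,   # BRR R5, R0, x0   taken: PC <=== 0
}
regs, dmem, pc = [0] * 8, {}, 0
for _ in range(len(imem)):
    pc = step(pc, regs, imem, dmem)
print(regs, dmem, pc)  # [0, 7, 3, 10, 10, 65535, 0, 0] {8: 10} 0
```

Because ALUk for the four ALU ops is just the low 2 bits of the opcode, this sketch needs no separate ALU-control ROM for them; LDR, STR, LEA, and LIM force ALUk to 2'b00 or 2'b11.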