

### Motivation #2: Memory Management for Multiple Programs







Names: memory addresses (as before) and disk addresses;

Objects: data/instruction "pages" (the same, but bigger than cache blocks)





A name is a key to look up its location. Mapping: key ===> location

- Why Mapping/Translation?
  - Multiple programs
  - Program written w/o knowing where it will be in memory
  - OS moves programs around
  - Problem w/ physical memory: relocation == load editing?





Segment register could point anywhere in memory

MAR content is address relative to Segment register's pointer.

Swap Segment register's value ===> Switch to new memory area

Programs both "live" in own space, but each "thinks" its space starts at address 0.
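A minimal sketch of segment-register relocation, assuming physical memory is a plain Python list. The names `Machine` and `segment_base` are hypothetical and chosen for illustration only.

```python
class Machine:
    """Toy machine: one segment register relocates every memory access."""

    def __init__(self, size):
        self.memory = [0] * size      # physical memory
        self.segment_base = 0         # segment register (hypothetical name)

    def physical(self, mar):
        # MAR holds an address relative to the segment register's pointer.
        return self.segment_base + mar

    def load(self, mar):
        return self.memory[self.physical(mar)]

    def store(self, mar, value):
        self.memory[self.physical(mar)] = value


m = Machine(64)

m.segment_base = 0      # program A's area starts at physical address 0
m.store(0, 111)         # A writes "its" address 0

m.segment_base = 32     # swap the register value: switch to program B's area
m.store(0, 222)         # B also writes "its" address 0

m.segment_base = 0
a_value = m.load(0)     # A still sees its own data
m.segment_base = 32
b_value = m.load(0)     # B sees its own data
```

Both programs use address 0, yet the two stores land in different physical locations (0 and 32) because only the register value changed.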

## It's all caching (review, new view of old stew)



# A System with Virtual Memory

Address Translation



OS manages Page Tables

[Figure: address translation through per-process page tables; surviving labels: "Pages", "Table", "Process 2:", "M-1", "N-1"]

#### Page Table *issues*

--- Where, physically?

memory? SRAM? Hardware?

table in memory?

how many memory accesses to read a word? (ignoring L1, L2, ...)

--- Page Tables R/W?

program rewrites page table (accidentally)?

not R/W? how to store pointer values?

===> protection bits per page: Kernel Mode: R/W, User Mode: not R/W ===> Where are bits? How accessed?

--- Share physical memory?

interleaving: long latency, do other work. OS has work to do, too.

--- I/O?

use virtual addresses? Memory mapped device registers? long, slow I/O for disk blocks (pages)?



Protection through Access Permissions

Add more bits to Page Table Entry (PTE)
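A minimal sketch of per-page protection bits, assuming a PTE modeled as a small record. The field names (`user_read`, `user_write`) and the rule that kernel mode may read and write every present page are assumptions for illustration, not a real hardware layout.

```python
from dataclasses import dataclass


class ProtectionFault(Exception):
    """Raised when an access violates the PTE's permission bits."""


@dataclass
class PTE:
    frame: int                 # physical frame number
    present: bool = True
    user_read: bool = False    # extra permission bits (hypothetical layout)
    user_write: bool = False


def check_access(pte, write, kernel_mode):
    """Return the frame number if the access is allowed, else raise."""
    if not pte.present:
        raise ProtectionFault("page not present")
    if kernel_mode:
        return pte.frame       # kernel mode: R/W on every present page
    allowed = pte.user_write if write else pte.user_read
    if not allowed:
        raise ProtectionFault("user-mode access denied")
    return pte.frame


page_table_page = PTE(frame=7)                                  # holds PTEs: no user access
user_data_page = PTE(frame=8, user_read=True, user_write=True)  # ordinary data page
```

With this layout a user program can still store pointer values in its own data pages, while the page holding the page table is writable only in kernel mode.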



2<sup>18</sup> frames



4 KB frame

Bigger VM space? 64-bit: 2<sup>64</sup> B = (16G) GB
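The sizes on this slide can be checked with a few lines of arithmetic. This sketch assumes 4 KB frames and 2^18 physical frames, and reads the 64-bit figure as 16 G times 1 GB.

```python
KB = 2**10
GB = 2**30

frame_size = 4 * KB
num_frames = 2**18

physical_bytes = num_frames * frame_size        # total physical memory
offset_bits = frame_size.bit_length() - 1       # bits to address a byte within a frame
frame_bits = num_frames.bit_length() - 1        # bits to name a frame

va64_bytes = 2**64
va64_in_gb = va64_bytes // GB                   # number of GB in a 64-bit space

print(physical_bytes // GB, "GB physical;", offset_bits, "+", frame_bits, "address bits")
print("64-bit space =", va64_in_gb // GB, "G x GB")
```

So 2^18 frames of 4 KB make 1 GB of physical memory (a 30-bit physical address), while a 64-bit virtual space is 2^34 GB.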



OS turns off VM translation to directly access physical memory.





Speed it up:

1. PTBR <== page-table-location pointer. Do this once at program startup.
2. Cache PTEs!





### TLB Misses -> TLB exception handler

## Read PT, get PTE

- If page is in memory
  - Load the PTE to TLB and retry instruction
  - Could be handled in hardware?
    - Can get complex for more complicated page table structures
  - Or in software
    - Raise a special exception, with optimized handler
    - This is what MIPS does using a special vectored interrupt
- If page is not in memory (page fault)
  - OS handles fetching the page and updating the page table

Load PTE to TLB

Then restart the faulting instruction
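The miss-handling flow above can be sketched as a toy simulation. This is a sketch under stated assumptions: the class `ToyMMU`, the FIFO eviction policy, and the dictionary-based page table are hypothetical choices, and the disk read itself is not modeled.

```python
class ToyMMU:
    def __init__(self, page_table, free_frames, tlb_entries=2):
        self.page_table = page_table        # vpn -> frame number, or None if on disk
        self.free_frames = list(free_frames)
        self.tlb = {}                       # cached PTEs: vpn -> frame
        self.tlb_entries = tlb_entries
        self.counts = {"hit": 0, "miss": 0, "fault": 0}

    def _os_page_fault(self, vpn):
        # OS fetches the page (disk read not modeled) and updates the page table.
        self.counts["fault"] += 1
        self.page_table[vpn] = self.free_frames.pop()

    def _tlb_fill(self, vpn):
        if len(self.tlb) >= self.tlb_entries:
            oldest = next(iter(self.tlb))   # FIFO eviction (an assumption)
            del self.tlb[oldest]
        self.tlb[vpn] = self.page_table[vpn]    # load PTE to TLB

    def translate(self, vpn):
        while True:                         # each pass is one attempt of the instruction
            if vpn in self.tlb:
                self.counts["hit"] += 1
                return self.tlb[vpn]
            self.counts["miss"] += 1        # TLB miss: read PT, get PTE
            if self.page_table.get(vpn) is None:
                self._os_page_fault(vpn)    # page not in memory
            self._tlb_fill(vpn)             # then loop: restart the instruction


mmu = ToyMMU({0: 5, 1: None}, free_frames=[9])
f0 = mmu.translate(0)         # TLB miss, page in memory
f1 = mmu.translate(1)         # TLB miss and page fault
f0_again = mmu.translate(0)   # TLB hit
```

Every successful translation ends with a TLB hit on the retried attempt, which is why the hit count is one higher than the number of first-try hits.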















Is PT same size as before? How do we save space with this?





Do collisions bother us? Typically, how many?

Is this scheme too slow?

Why? When does this happen?

How many instructions are involved anyway?

Where is the real Page Table anyway?

Do we need one?

### Real Example: Intel P6

- Internal Designation for Successor to Pentium
  - Which had internal designation P5
- Fundamentally Different from Pentium
  - Out-of-order, superscalar operation
  - Designed to handle server applications
    - Requires high performance memory system
- Resulting Processors
  - PentiumPro 200 MHz (1996)
  - Pentium II (1997)
    - Incorporated MMX instructions
    - L2 cache on same chip
  - Pentium III (1999)
    - Incorporated Streaming SIMD Extensions
  - Pentium M 1.6 GHz (2003)
    - Low power for mobile
    - The base for Intel Core and Core 2

Adapted from Computer Systems: A Programmer's Perspective (CS:APP), Bryant and O'Hallaron

### P6 memory system







P6 page directory entry (PDE) one 32-bit Word



Page base address: 20 most significant bits of physical page address (forces pages to be 4 KB aligned)

Avail: available for system programmers

G: global page (don't evict from TLB on task switch)

D: dirty (set by MMU on writes)

A: accessed (set by MMU on reads and writes)

CD: cache disabled or enabled

WT: write-through or write-back cache policy for this page

U/S: user/supervisor

R/W: read/write

P: page is present in physical memory (1) or not (0)
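A minimal sketch that unpacks these fields from one 32-bit entry. The bit positions are not given in the text above; they are taken from the standard IA-32 4 KB page-entry layout (P in bit 0, R/W in bit 1, U/S in bit 2, WT in bit 3, CD in bit 4, A in bit 5, D in bit 6, G in bit 8, Avail in bits 9-11, base in bits 31-12). The function name `decode_entry` is hypothetical.

```python
def decode_entry(entry):
    """Split a 32-bit IA-32 page-table entry into its named fields."""
    return {
        "P": entry & 1,                 # present
        "R/W": (entry >> 1) & 1,        # read/write
        "U/S": (entry >> 2) & 1,        # user/supervisor
        "WT": (entry >> 3) & 1,         # write-through
        "CD": (entry >> 4) & 1,         # cache disabled
        "A": (entry >> 5) & 1,          # accessed
        "D": (entry >> 6) & 1,          # dirty
        "G": (entry >> 8) & 1,          # global
        "Avail": (entry >> 9) & 0x7,    # available to system programmers
        "base": entry & 0xFFFFF000,     # 20 MSBs: 4 KB-aligned physical address
    }


fields = decode_entry(0x12345067)
```

Masking with `0xFFFFF000` clears the low 12 bits, which is exactly why the page base is forced to be 4 KB aligned.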

When P=0: bits 31-1 are available for the OS (page location in secondary storage); bit 0 is P=0.





Case 0/1 (page table missing, page present): Read PDE, find PT disk address; read PT page from disk; restart (after restart: Case 1/1).



Case 0/0 (page table and page both missing): Page fault for PT as in Case 0/1; restart (after restart: Case 1/0).
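A minimal sketch of the two-level lookup behind these cases, assuming the usual IA-32 split of a 32-bit virtual address into a 10-bit directory index, a 10-bit table index, and a 12-bit offset. The data structures (a list for the page directory, a dict from physical base address to page table) and the function names are hypothetical; the walk only reports which level is missing and does not model the disk reads.

```python
PRESENT = 1
BASE_MASK = 0xFFFFF000


def split(va):
    """Return (directory index, table index, page offset) for a 32-bit address."""
    return (va >> 22) & 0x3FF, (va >> 12) & 0x3FF, va & 0xFFF


def walk(page_dir, page_tables, va):
    """Return ('ok', physical address) or ('fault', reason)."""
    d, t, offset = split(va)
    pde = page_dir[d]
    if not pde & PRESENT:
        return ("fault", "page table not present")      # PT bit is 0
    pte = page_tables[pde & BASE_MASK][t]
    if not pte & PRESENT:
        return ("fault", "page not present")            # PT present, page missing
    return ("ok", (pte & BASE_MASK) | offset)           # both present


page_dir = [0] * 1024
page_dir[1] = 0x00010000 | PRESENT          # directory slot 1 -> page table at 0x10000
table = [0] * 1024
table[3] = 0x00ABC000 | PRESENT             # table slot 3 -> frame at 0xABC000
page_tables = {0x00010000: table}

both_present = walk(page_dir, page_tables, 0x00403ABC)
page_missing = walk(page_dir, page_tables, 0x00404000)
table_missing = walk(page_dir, page_tables, 0x00800000)
```

After the OS services a "page table not present" fault and the instruction restarts, the same walk proceeds to the second check, which is how one case turns into another.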



Goal: enforce an invariant so that shared data is present once, at most (in same page table).









No limit checking

---- can overrun segment

No protection

---- can write segment registers

Segment registers implicit

---- instruction fetch: uses CS

---- data access: uses DS

---- stack operation: uses SS

Programmer's perspective:

---- Segment's address is 0

---- Offset is address

Seg Reg is offset into table

---- Entries are descriptors

Too slow? Fix it:

---- extend Seg Regs

---- cache Descriptor in extended Seg Reg

---- Check limits, PID, R/W, ...

Also:

---- Special Segs for Calls

---- "conforming" ==> change mode

---- 8k segments @ 4GB

#### Flat Addressing:

- set all Descriptors:
  - BASE == x00000000
  - LIMIT == xFFFFFFFF
- 1-to-1 w/ 32-bit MAR
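A minimal sketch of descriptor-based translation with limit checking, showing why a flat setup maps offsets 1-to-1 onto physical addresses. It assumes a simplified table keyed by segment register name and a limit expressed as the highest valid offset; real selectors and descriptor encodings carry more fields. All names here are hypothetical.

```python
from dataclasses import dataclass


class SegmentFault(Exception):
    """Raised when an offset exceeds the segment's limit."""


@dataclass(frozen=True)
class Descriptor:
    base: int
    limit: int      # highest valid offset within the segment


def translate(table, seg_reg, offset):
    desc = table[seg_reg]
    if offset > desc.limit:                     # limit check
        raise SegmentFault(hex(offset))
    return (desc.base + offset) & 0xFFFFFFFF    # 32-bit linear address


# Flat addressing: every descriptor covers the whole 32-bit space from 0.
FLAT = Descriptor(base=0x00000000, limit=0xFFFFFFFF)
flat_table = {"CS": FLAT, "DS": FLAT, "SS": FLAT}

# For contrast: a small 4 KB data segment placed at 0x10000.
small_table = {"DS": Descriptor(base=0x10000, limit=0x0FFF)}
```

With the flat table, `translate(flat_table, "DS", x)` returns `x` for any 32-bit `x`; with the small table, offsets past `0x0FFF` raise instead of overrunning the segment.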



---- Seg Selects can be written

Writing CS, DS, SS changes segments, but only via selecting different segment descriptors in the table.

Descriptor table is OS controlled.

---- Also available in IA-32 (x86)

Paging mode (2-level and 3-level)

"Real" mode (all physical 20-bit address w/ 16-bit segment + 16-bit offsets)

**Paged Segments (paging + segmentation):**

Segment Descriptor points to Page Directory
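For the "Real" mode entry above, the 20-bit physical address is formed by shifting the 16-bit segment left by 4 bits and adding the 16-bit offset. A minimal sketch, with a hypothetical function name and the 8086-style wraparound at 1 MB assumed:

```python
def real_mode_address(segment, offset):
    """20-bit physical address from a 16-bit segment and a 16-bit offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must fit in 16 bits")
    return ((segment << 4) + offset) & 0xFFFFF   # wrap to 20 bits, as on the 8086


addr_a = real_mode_address(0x1234, 0x0010)
addr_b = real_mode_address(0x1000, 0x2350)   # a different pair, same byte
```

Because segments overlap every 16 bytes, many segment:offset pairs name the same physical byte, and nothing checks limits or protection.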

**Reference:** 

http://pdos.csail.mit.edu/6.828/2005/readings/i386/s05\_01.htm



Approach II: minimal "OS"

- Monitor provides mapping: Virtual Machine (VMa) ===> partitioned resources
- Monitor is small, simple, reliable
- Each guest runs in its own VMa
- Guest instructions run w/o emulation; guest code is in HW ISA
- Each VMa has its own OS, which manages resources separately:
  - processes
  - memory
  - disk space
  - cpu scheduling



#### Advantages

- Monitor-1, Monitor-2: identical virtual machines; host HW can be different (degree?)
- Guests isolated: VMM is safer than OS; bugs/attacks limited to one VMa
- Guest migration, multiple guests ===> bulk efficiencies: shared computing resources, uptime, load balancing
- Legacy apps ===> legacy VMa
- Guest OS configuration matches guest's apps
- Different OS per guest
- Checkpointing: suspend/restart/rollback



| App                     | App                 | App | App                 | App |                     |  |
|-------------------------|---------------------|-----|---------------------|-----|---------------------|--|
| Operating<br>system     | Operating<br>system |     | Operating<br>system |     | Operating<br>system |  |
| Virtual machine monitor |                     |     |                     |     |                     |  |
| Hardware                |                     |     |                     |     |                     |  |



- Virtualizable if:
  1. Can execute directly on HW
  2. VMM controls resources
- Monitor runs in kernel mode
- Guest runs in user mode
- ===> privileged instructions trap to monitor's handler
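A minimal sketch of trap-and-emulate, assuming a toy instruction set. The operation names (`add`, `disable_interrupts`, `enable_interrupts`, `set_ptbr`) and the dictionary holding per-guest virtual CPU state are hypothetical; a Python exception stands in for the hardware trap.

```python
class PrivilegedTrap(Exception):
    """Stands in for the hardware trap taken in user mode."""

    def __init__(self, op, arg):
        super().__init__(op)
        self.op = op
        self.arg = arg


PRIVILEGED_OPS = {"disable_interrupts", "enable_interrupts", "set_ptbr"}


def execute_in_user_mode(op, arg, regs):
    """Unprivileged instructions run directly; privileged ones trap."""
    if op in PRIVILEGED_OPS:
        raise PrivilegedTrap(op, arg)
    if op == "add":
        regs["acc"] += arg
    else:
        raise ValueError(f"unknown op {op!r}")


def monitor_handler(trap, vcpu):
    """Monitor emulates the instruction against the guest's virtual state."""
    if trap.op == "disable_interrupts":
        vcpu["interrupts_enabled"] = False
    elif trap.op == "enable_interrupts":
        vcpu["interrupts_enabled"] = True
    elif trap.op == "set_ptbr":
        vcpu["ptbr"] = trap.arg


def run_guest(program):
    regs = {"acc": 0}
    vcpu = {"interrupts_enabled": True, "ptbr": 0}
    traps = 0
    for op, arg in program:
        try:
            execute_in_user_mode(op, arg, regs)
        except PrivilegedTrap as trap:
            monitor_handler(trap, vcpu)     # the real hardware state is untouched
            traps += 1
    return regs, vcpu, traps


regs, vcpu, traps = run_guest([
    ("add", 2),
    ("disable_interrupts", None),
    ("set_ptbr", 0x4000),
    ("add", 3),
])
```

The scheme depends on every sensitive instruction actually trapping in user mode; an instruction that silently behaves differently instead of trapping would bypass `monitor_handler`.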

#### OR

---- Binary translation (static or runtime): Replace problematic instructions

#### OR

---- Add new hardware modes.

HW Platform 1 $\neq$ HW Platform 2

What problem instructions? Can't we trap all of them? If so -> x86 virtualization!



### VMware METHODS

#### vmkernel:

- boot loader
- x86 abstraction
- IO stacks (storage, network)
- memory scheduler
- cpu scheduler

#### VMM (vmkernel privileged process):

- Trapping, translation
- one per VM



Figure 1: The ESX hypervisor: one vmkernel per host, and one VMM per virtual machine.

#### The Evolution of an x86 Virtual Machine Monitor

Ole Agesen, Alex Garthwaite, Jeffrey Sheldon, Pratap Subrahmanyam

### from VMware - Server Version

For example, VMware's vSphere ESX hypervisor is comprised of the *vmkernel* and a VMM. The vmkernel contains a boot loader, an x86 hardware abstraction layer, I/O stacks for storage and networking, as well as memory and CPU schedulers. To run a VM, the vmkernel loads the VMM, which encapsulates the details of virtualizing the x86 architecture, including all 16 and 32 bit legacy modes as well as 64 bit long mode. The VM executes directly on top of the VMM, touching the hypervisor only through the VMM surface area.



--- can't run original OS binaries ===> keeping up w/ the Joneses?



### I/O Virtualization

- Issue: lots of I/O devices
- Problem: Writing device drivers for all I/O devices in the VMM layer is not a feasible option
- Insight: Device drivers are already written for popular operating systems
- Solution: Present virtual I/O devices to guest VMs and channel I/O requests to a trusted host VM running a popular OS







- G-OS: page fault, context switches, 100s of cycles
- VMM: examine G-PT (find G-PA), 100s of cycles
- VMM: find H-Phys-Addr, 100s of cycles
- VMM: allocate/fill shadow PT, 100s of cycles
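The four steps above can be sketched as a toy shadow page table fill. This is a sketch under stated assumptions: the class `ShadowPaging`, its dictionary-based tables, and the free-page list are hypothetical, cycle costs are not modeled, and a missing guest mapping (a true guest page fault) is out of scope.

```python
class ShadowPaging:
    """Toy VMM state for one guest: guest PT, gPA->hPA map, shadow PT."""

    def __init__(self, guest_pt, free_host_pages):
        self.guest_pt = guest_pt                    # gVA page -> gPA page (guest-owned)
        self.gpa_to_hpa = {}                        # gPA page -> hPA page (VMM-owned)
        self.shadow_pt = {}                         # gVA page -> hPA page (used by hardware)
        self.free_host_pages = list(free_host_pages)
        self.hidden_faults = 0

    def _handle_hidden_fault(self, gva_page):
        self.hidden_faults += 1                     # step 1: page fault into the VMM
        gpa = self.guest_pt[gva_page]               # step 2: walk guest PT in software
        if gpa not in self.gpa_to_hpa:              # step 3: find hPA, allocate on first touch
            self.gpa_to_hpa[gpa] = self.free_host_pages.pop(0)
        self.shadow_pt[gva_page] = self.gpa_to_hpa[gpa]     # step 4: fill shadow PT

    def access(self, gva_page):
        if gva_page not in self.shadow_pt:
            self._handle_hidden_fault(gva_page)
        return self.shadow_pt[gva_page]


vmm = ShadowPaging(guest_pt={0: 10, 1: 11, 2: 10}, free_host_pages=[100, 101])
first = vmm.access(0)     # hidden fault, allocates a host page
again = vmm.access(0)     # shadow PT already filled, no fault
alias = vmm.access(2)     # new gVA, same gPA: fault, but no new allocation
```

Only the first touch of each guest virtual page pays the fault cost; later accesses go straight through the shadow entry, and the guest never observes the fault.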

> we must examine what happens when the guest accesses a particular gVA. First, the memory access causes a page fault (several hundred cycles in the circa 2002 processors). Then, the VMM walks the guest's page tables in software to determine the gPA backing that gVA (again costing a few hundred cycles). Next, the VMM determines the hPA that backs that gPA. Often, this step is fast, but upon first touch it requires the host OS to allocate a backing page. Finally, the VMM allocates a shadow page table for the mapping and wires it into the shadow page table tree. The page fault and the subsequent shadow page table update are analogous to a normal TLB fill in that they are invisible to the guest,



